Frequently Asked Questions

PromptKey is a comprehensive platform for evaluating large language models (LLMs) across any provider. It allows user-centric evaluation with custom grading parameters tailored to specific workloads and offers powerful tools for managing prompts, datasets, and model performance analysis.

PromptKey is designed for teams, businesses, and researchers who need to evaluate and optimize LLM prompts and responses. Users can invite subject matter experts to grade responses and leverage LLMs as judges to enhance evaluation consistency.

PromptKey supports user grading with customizable grading parameters. It also enables expert reviews and uses LLM-based evaluation informed by human feedback to ensure high-quality and objective assessments.

  1. Identify datasets, models, and presets: Users select the datasets, models, and any existing presets for the evaluation. Custom grading parameters specific to the workload are configured.
  2. Understand presets: A preset is a model configuration that includes parameters like max_tokens, temperature, top_p, and top_k. These settings control the behavior and output quality of the model. Each model can have up to three presets, offering flexibility to test different configurations (see the example sketch after this list).
  3. Kick off the evaluation: Once everything is set, the user starts the evaluation process.
  4. Generate and store LLM responses: The system generates LLM responses based on the provided prompts and datasets, and the results are securely stored.
  5. User and expert grading: Users can review and grade the responses. Subject matter experts (SMEs) can also be invited to provide their evaluations.
  6. Full evaluation with LLM as a judge: Once human grading is complete, users can trigger a full evaluation where the LLM acts as a judge, using the human feedback to evaluate other records and ensure consistency.
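
For illustration, a pair of presets might look like the sketch below. The field names and values are assumptions based on the parameters listed above, not PromptKey's exact schema:

```python
# Illustrative only: hypothetical preset definitions using the parameters
# described above (max_tokens, temperature, top_p, top_k). PromptKey's actual
# preset schema may differ.
preset_conservative = {
    "name": "conservative",  # hypothetical label for this preset
    "max_tokens": 512,       # cap on the length of the generated response
    "temperature": 0.2,      # lower values -> more deterministic output
    "top_p": 0.9,            # nucleus sampling cutoff
    "top_k": 40,             # restrict sampling to the 40 most likely tokens
}

preset_creative = {
    "name": "creative",
    "max_tokens": 1024,
    "temperature": 0.9,
    "top_p": 0.95,
    "top_k": 100,
}

# Up to three such presets can be attached to a model, letting the same prompt
# and dataset be evaluated under different generation settings.
```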

Yes, evaluations can be revisited, and additional grading or evaluations can be performed as needed to refine results and improve accuracy.

Users can sign up using their Google or GitHub accounts for quick and secure access.

Projects are the core organizational structure in PromptKey. Prompts and datasets are part of projects, making it easy to manage and evaluate different workloads in an organized way.

Yes, PromptKey allows you to store prompts with variables and create multiple versions, making it easy to iterate and refine over time.

Datasets for the variables can be uploaded in JSON or CSV format, and you can also create and manage different dataset versions.
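
As a rough illustration, a prompt with variables and a matching dataset might look like the sketch below. The variable syntax and column names are assumptions for this example, not a documented PromptKey format:

```python
import csv
import json

# Hypothetical prompt template with three variables; PromptKey's variable
# syntax may differ from the {curly-brace} style used here.
prompt_template = (
    "Summarize the following {document_type} for a {audience} audience:\n\n{text}"
)

# Each dataset row supplies values for the prompt's variables.
rows = [
    {"document_type": "support ticket", "audience": "engineering", "text": "..."},
    {"document_type": "contract clause", "audience": "legal", "text": "..."},
]

# JSON upload
with open("dataset_v1.json", "w") as f:
    json.dump(rows, f, indent=2)

# CSV upload (same rows, one column per variable)
with open("dataset_v1.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["document_type", "audience", "text"])
    writer.writeheader()
    writer.writerows(rows)
```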

Datasets are generally created under a project. When creating a prompt, users can either select an existing dataset from any project or add a new one. Any newly added dataset automatically appears in the datasets tab for that project.

Yes, all datasets from all projects can be viewed at the top level or organization level. This makes it easy to manage and access datasets without being restricted to a single project view.

Absolutely! PromptKey supports dataset versioning, allowing you to maintain multiple iterations of your datasets and revert or compare versions as needed.

Users can securely store their API keys for different LLM providers. These keys are used exclusively for generating LLM responses within the platform, and robust security measures are in place to protect them.

Today, only one API key can be stored per provider. In the future, we plan to support multiple keys per provider.

Yes, you can store and manage API keys for multiple LLM providers. The system will automatically use the appropriate key based on your evaluation setup.

All added API keys are viewable in the configuration tab at the top (organization) level.

Your API keys are encrypted before being stored in our database. We use industry-standard encryption protocols to ensure your keys cannot be accessed in plain text form.

We use strong encryption algorithms to protect your API keys. The encryption keys themselves are securely stored in a dedicated Key Management Service (KMS) vault, separated from the encrypted data.

Your API keys are only decrypted at the moment they’re needed to execute an evaluation job. After the job completes, the decrypted keys are immediately removed from memory.
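
The behavior described above matches a standard envelope-encryption pattern. The sketch below shows the general idea with a local stand-in for the KMS; it is illustrative only and not PromptKey's actual implementation:

```python
from cryptography.fernet import Fernet

# Stand-in for a KMS vault: in a real deployment the data key would be
# wrapped/unwrapped by a managed service, never stored alongside the data.
class FakeKMS:
    def __init__(self):
        self._master_key = Fernet.generate_key()

    def wrap(self, data_key: bytes) -> bytes:
        return Fernet(self._master_key).encrypt(data_key)

    def unwrap(self, wrapped_key: bytes) -> bytes:
        return Fernet(self._master_key).decrypt(wrapped_key)

kms = FakeKMS()

# Encrypt the provider API key with a fresh data key, then wrap the data key.
data_key = Fernet.generate_key()
encrypted_api_key = Fernet(data_key).encrypt(b"sk-example-provider-key")
wrapped_data_key = kms.wrap(data_key)
# Only encrypted_api_key and wrapped_data_key are persisted in the database.

# At evaluation time: unwrap the data key, decrypt, use, then discard.
plaintext_key = Fernet(kms.unwrap(wrapped_data_key)).decrypt(encrypted_api_key)
# ... run the evaluation job with plaintext_key ...
del plaintext_key  # drop the decrypted key from memory once the job completes
```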

No one on our team has direct access to your plain text API keys. The system architecture is designed to maintain key confidentiality throughout the entire process.

Yes, you can remove your API keys at any time through the configuration tab. Once deleted, the encrypted keys are permanently removed from our database.

In the unlikely event of a security incident, attackers would only have access to encrypted API keys, not the actual keys themselves, as the decryption keys are stored separately in the KMS vault.

No. Your API keys are strictly used only for the evaluation jobs you initiate. We never use your keys for any other purpose.

No, all provider API keys are treated with the same high level of security, regardless of which LLM provider they belong to.

Pricing is job-specific and depends on the number of tokens processed. This ensures you only pay for the resources you use.

Yes, PromptKey provides an approximate cost based on the dataset size, the expected LLM response size, and other available details such as the grading parameters and the number of SME responses. The final cost may vary slightly based on actual LLM token usage and job duration.
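
As a rough sketch of how such an estimate can be computed, consider the function below. The per-token rates and the formula are illustrative assumptions, not PromptKey's actual pricing:

```python
def estimate_job_cost(
    num_rows: int,
    avg_prompt_tokens: int,
    avg_response_tokens: int,
    price_per_1k_input: float = 0.0005,   # assumed illustrative rate, USD
    price_per_1k_output: float = 0.0015,  # assumed illustrative rate, USD
) -> float:
    """Rough pre-run estimate; actual cost depends on real token counts."""
    input_cost = num_rows * avg_prompt_tokens / 1000 * price_per_1k_input
    output_cost = num_rows * avg_response_tokens / 1000 * price_per_1k_output
    return input_cost + output_cost

# e.g. 500 dataset rows, ~400 prompt tokens and ~250 response tokens per row
print(f"${estimate_job_cost(500, 400, 250):.2f}")
```
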
PromptKey charges the credit card stored on file for each evaluation job.

PromptKey provides comprehensive dashboards that offer insights into costs, latency, content moderation, token usage, and overall model performance.

Yes, PromptKey’s dashboards track historical data, helping you monitor improvements and spot trends in model performance and costs.

Yes, you can invite subject matter experts to participate in the grading process. Their feedback is incorporated into the evaluation, enhancing the quality and relevance of model assessments.

At the top or organization level (click the navigation at the top left), you can open the configuration tab where all users are visible.

  1. Invite experts: After LLM responses are generated and stored, users can invite subject matter experts (SMEs) via email to participate in grading.
  2. Expert access: Experts receive an email invitation and, upon logging in, can see the projects they’ve been invited to.
  3. Select a project: Experts choose the project they want to grade and access the list of LLM-generated responses.
  4. Unbiased grading: Experts see the request made to the LLM and the corresponding response, along with the custom grading parameters defined by the user. However, they will not see which model the response is from, ensuring unbiased ratings.
  5. Grade responses: Experts provide their evaluation for each record and can easily move on to the next one.

Yes, an SME can be invited to participate in the grading process for multiple projects. They will see a list of all the projects they’ve been invited to when they log in and can select the project they wish to grade.

No, SMEs will not be informed of the model that generated the response. This ensures that their evaluations remain unbiased and focus solely on the quality of the responses.

PromptKey’s LLM-as-a-judge feature uses human feedback as a foundation to assess other records, creating a more scalable and consistent evaluation process.
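
One common way to implement this pattern is to show the judge model a few human-graded examples alongside the grading parameters before asking it to score an ungraded record. The sketch below illustrates that prompt construction under those assumptions; it is not PromptKey's internal implementation:

```python
def build_judge_prompt(grading_parameters, human_graded_examples, record):
    """Assemble a judging prompt that anchors the LLM on human feedback.

    grading_parameters: list of criterion names (e.g. ["accuracy", "tone"])
    human_graded_examples: records already scored by users or SMEs
    record: the ungraded request/response pair to be judged
    """
    lines = ["You are grading LLM responses on: " + ", ".join(grading_parameters), ""]
    lines.append("Here are examples graded by human reviewers:")
    for ex in human_graded_examples:
        lines.append(f"Request: {ex['request']}")
        lines.append(f"Response: {ex['response']}")
        lines.append(f"Human scores: {ex['scores']}")
        lines.append("")
    lines.append("Now grade the following record using the same criteria and scale:")
    lines.append(f"Request: {record['request']}")
    lines.append(f"Response: {record['response']}")
    return "\n".join(lines)
```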

Our evaluation system allows you to create custom grading parameters tailored to your specific workload. You can define success criteria, scoring rubrics, and evaluation metrics that matter most to your use case.

Yes, you can create custom grading parameters specific to your workload. This ensures that evaluations align with your actual business needs and use cases rather than generic metrics.
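
For illustration, a set of custom grading parameters for a customer-support summarization workload might look like this. The structure is a hypothetical example, not PromptKey's schema:

```python
# Hypothetical grading parameters for a customer-support summarization
# workload; field names and scales are illustrative assumptions.
grading_parameters = [
    {
        "name": "factual_accuracy",
        "description": "The summary contains no claims absent from the ticket",
        "scale": (1, 5),
    },
    {
        "name": "completeness",
        "description": "All customer-reported issues are captured",
        "scale": (1, 5),
    },
    {
        "name": "tone",
        "description": "Neutral, professional wording",
        "scale": (1, 5),
    },
]
```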

PromptKey supports multiple evaluation approaches:

  • Internal team evaluations
  • Subject matter expert invitations
  • LLM-based automated evaluations that incorporate human feedback
  • Collaborative grading across teams