📖 Lessons
Introduction to RAG
Learn what Retrieval-Augmented Generation is and why it's essential for AI applications
Vector Databases Basics
Understand vector databases and how they enable semantic search in RAG systems
Embeddings Explained
Learn how text is converted into vectors and how to choose the right embedding model
Chunking Strategies
Learn how to split documents effectively for optimal RAG retrieval performance
Similarity Search Techniques
Master different search methods to find the most relevant chunks in your RAG system
Retrieval Optimization
Advanced techniques to improve RAG retrieval quality and reduce hallucinations
Reranking Strategies
Learn advanced reranking techniques to improve the relevance of retrieved results
Workshop: Building Your First RAG System
Build a complete RAG system from scratch with document indexing and question answering
RAG Evaluation and Testing
Learn how to systematically measure and improve your RAG system's quality
Production RAG Systems
Deploy, scale, and maintain RAG systems in production environments
🎯 Missions
M-034 Build RAG Evaluation Suite
CloudDocs Inc deployed a RAG system but has no way to measure quality. Users complain about irrelevant answers, but there's no data to guide improvements. Build an evaluation suite with test queries, ground truth answers, and automated metrics (precision, recall, MRR).
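The three metrics this mission asks for can be sketched in plain Python. The function names and the dict-shaped inputs below are illustrative assumptions, not the mission's starter code:

```python
def precision_at_k(retrieved, relevant, k):
    # Fraction of the top-k retrieved IDs that are relevant.
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def recall_at_k(retrieved, relevant, k):
    # Fraction of ALL relevant IDs that appear in the top-k.
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)

def mrr(results, ground_truth):
    # Mean reciprocal rank across queries.
    # results: {query: ranked list of doc IDs}
    # ground_truth: {query: set of relevant doc IDs}
    total = 0.0
    for query, retrieved in results.items():
        relevant = ground_truth[query]
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first hit counts
                break
    return total / len(results)
```

For example, a query whose first relevant document appears at rank 2 contributes 0.5 to the MRR average.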
M-037 Build Two-Stage Retrieval with Reranking
LegalTech AI's RAG system retrieves 10 documents but only 3 are relevant (precision@10 = 0.30). The bi-encoder is fast but imprecise. Implement a two-stage pipeline: fast bi-encoder retrieval followed by cross-encoder reranking to improve precision.
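The pipeline shape this mission describes can be sketched model-agnostically by injecting both scorers as callables. In practice `bi_score` would be a dot product over precomputed bi-encoder embeddings and `cross_score` a cross-encoder forward pass; the toy scorers in the usage below are stand-ins, and all names here are assumptions:

```python
def two_stage_retrieve(query, corpus, bi_score, cross_score,
                       first_stage_k=10, final_k=3):
    # corpus: {doc_id: text}
    # Stage 1: cheap scoring over the whole corpus (fast, imprecise).
    candidates = sorted(corpus,
                        key=lambda d: bi_score(query, corpus[d]),
                        reverse=True)[:first_stage_k]
    # Stage 2: expensive rescoring of the shortlist only (slow, precise).
    reranked = sorted(candidates,
                      key=lambda d: cross_score(query, corpus[d]),
                      reverse=True)
    return reranked[:final_k]
```

The key cost property: the expensive scorer runs on at most `first_stage_k` documents per query, not on the whole corpus.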
M-029 Build Your First Document Q&A System
Nebula Corp has a collection of product FAQ documents but no way to search them intelligently. Users type questions and get nothing useful back. Build a basic RAG pipeline: embed the documents, find the most relevant ones for a user query using cosine similarity, and construct a prompt that includes the retrieved context so the LLM can generate a grounded answer.
M-026 Build Your First Similarity Search
Nebula Corp has a knowledge base of product descriptions stored as embedding vectors, but no way to search them. Build a similarity search function that takes a query vector, compares it against all stored document vectors using cosine similarity, and returns the top-K most relevant results ranked by score.
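The core of this mission fits in a few lines of standard-library Python (a production system would use NumPy or a vector database instead; the names below are illustrative):

```python
import math

def cosine_similarity(a, b):
    # cos(a, b) = (a · b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_search(query_vec, documents, k=3):
    # documents: {doc_id: vector}; returns [(doc_id, score)] best-first.
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A query vector identical in direction to a stored document scores exactly 1.0; an orthogonal one scores 0.0.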
M-031 Compare Embedding Models for Domain-Specific RAG
MedTech AI's RAG system uses a general-purpose embedding model (MiniLM) but struggles with medical terminology. 'myocardial infarction' and 'heart attack' aren't recognized as similar. Test different embedding models and measure which performs best on medical queries.
M-028 Fix the Embedding Service
A junior developer at Nebula Corp submitted a PR for the embedding service, but it has several bugs. Review the code, identify the issues, and fix them before this goes to production.
M-032 Implement Hybrid Search for Better Accuracy
TechDocs Inc's RAG system misses exact keyword matches. A query for 'ERR_CONNECTION_REFUSED' returns generic networking docs instead of the specific error code documentation. Implement hybrid search combining semantic and keyword search to improve precision.
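One common way to combine a semantic ranking with a keyword ranking (not necessarily the method this mission prescribes) is reciprocal rank fusion, which needs only the ranked ID lists, not comparable scores:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: list of ranked doc-ID lists, e.g. one from semantic
    # search and one from BM25-style keyword search.
    # k=60 is the constant from the original RRF paper.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both lists accumulate score from both, so an exact keyword hit like `ERR_CONNECTION_REFUSED` can no longer be drowned out by purely semantic neighbors.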
M-027 Implement Metadata Filtering for Multi-Tenant RAG
SecureDoc's RAG system has a critical security bug: users can see documents from other organizations! The vector database returns results from all tenants. Implement metadata filtering to ensure users only retrieve documents they have access to.
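The essential fix is a pre-filter: restrict the candidate set by tenant before ranking, so cross-tenant documents can never surface regardless of similarity. A minimal sketch (the index shape and dot-product similarity here are assumptions; real vector databases expose this as a metadata filter on the query):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def filtered_search(query_vec, index, tenant_id, k=3):
    # index: list of {"id": ..., "vector": [...],
    #                 "metadata": {"tenant_id": ...}}
    # Filter FIRST: other tenants' documents never enter
    # the candidate set, so they cannot leak into results.
    allowed = [e for e in index if e["metadata"]["tenant_id"] == tenant_id]
    scored = sorted(allowed, key=lambda e: dot(query_vec, e["vector"]),
                    reverse=True)
    return [e["id"] for e in scored[:k]]
```

Filtering after retrieval (post-filter) is the common buggy variant: a top-k fetch followed by a tenant check can silently return fewer than k results and still risks leakage if the check is ever skipped.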
M-036 Implement Query Decomposition for Complex Questions
AnalyticsPro's RAG system fails on multi-part questions. A query like 'Compare pricing between Pro and Enterprise plans and explain which includes API access' returns incomplete answers. Implement query decomposition to break complex questions into focused sub-queries.
M-030 Optimize Chunking Strategy for Better Retrieval
DataFlow Inc's RAG system has poor retrieval accuracy (precision@5 = 0.45). The current fixed-size chunking splits documents mid-sentence, breaking context. Implement and test different chunking strategies to improve retrieval quality above the target threshold.
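The contrast the mission is built around can be sketched as two chunkers: the naive fixed-size version that splits mid-sentence, and a sentence-aware version that packs whole sentences up to a size limit (function names and the regex sentence splitter are illustrative assumptions):

```python
import re

def fixed_size_chunks(text, size=200, overlap=50):
    # Naive: slice every `size` chars, stepping by size - overlap,
    # with no regard for sentence boundaries.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def sentence_chunks(text, max_chars=200):
    # Sentence-aware: split on sentence-ending punctuation, then
    # pack whole sentences into chunks of at most max_chars.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

The sentence-aware variant never breaks context mid-sentence, which is exactly the failure mode blamed for the low precision@5 here.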
M-033 Optimize Context Window Packing for RAG
Nebula Corp's RAG system retrieves 10 chunks but naively concatenates them all, often exceeding the LLM's context window and getting truncated. Important information at the end gets cut off. Implement a smart context packer that: estimates token counts, prioritizes the most relevant chunks, and fits as much high-quality context as possible within the token budget — without exceeding it.
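The three steps named above (estimate tokens, prioritize by relevance, respect the budget) can be sketched as one greedy packer. The whitespace token estimate is a deliberate simplification; a real implementation would use the target model's tokenizer (e.g. tiktoken for OpenAI models):

```python
def pack_context(chunks, budget_tokens,
                 estimate=lambda text: len(text.split())):
    # chunks: [(relevance_score, text)], in any order.
    # Greedily keep the highest-scoring chunks that still fit;
    # skipped chunks do not block smaller ones behind them.
    packed, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate(text)
        if used + cost <= budget_tokens:
            packed.append(text)
            used += cost
    return packed, used
```

Unlike naive concatenation, nothing is ever truncated mid-chunk: a chunk is either included whole or dropped, and the budget is never exceeded.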
M-035 Optimize RAG Pipeline Costs
Nebula Corp's RAG pipeline is burning through API credits. The current implementation sends full documents to the LLM for every query. Refactor the pipeline to reduce cost while maintaining answer quality above the threshold.
🔧 Workshops
W-012 Create a Custom Reranker
Build a reranker that improves RAG accuracy from 60% to 85% using cross-encoders.
W-011 Implement Semantic Caching
Build a semantic cache that reduces API costs by 70% using embedding similarity.
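The core idea is small enough to sketch: store (query embedding, answer) pairs and return a cached answer when a new query's cosine similarity to any stored query clears a threshold. The class name, threshold default, and injected `embed` callable are all assumptions for illustration:

```python
import math

class SemanticCache:
    def __init__(self, embed, threshold=0.95):
        # embed: any callable mapping text -> vector.
        self.embed = embed
        self.threshold = threshold
        self.entries = []  # list of (vector, answer)

    def _cos(self, a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    def get(self, query):
        # Linear scan; a real cache would use an ANN index.
        vec = self.embed(query)
        for cached_vec, answer in self.entries:
            if self._cos(vec, cached_vec) >= self.threshold:
                return answer  # cache hit: no LLM call needed
        return None

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))
```

On a hit, the expensive LLM (and retrieval) call is skipped entirely, which is where the cost savings come from; the threshold trades hit rate against the risk of serving a stale or mismatched answer.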
W-009 Metadata Extraction Pipeline
Build a pipeline to automatically extract and enrich metadata from documents using LLMs.
W-010 RAG Debugging Tool
Create a debugger that visualizes retrieval results, shows chunk overlap, and identifies gaps.