📖 Lessons
Introduction to AI Observability
Understand why traditional monitoring fails for LLM applications and what AI observability actually means
Tracing Fundamentals
Learn how traces, spans, and generations capture the full lifecycle of LLM requests
Instrumenting LLM Calls
Add observability to your LLM application code with minimal effort using SDKs and decorators
Langfuse Deep Dive
Master Langfuse — the leading open-source LLM observability platform for tracing, evaluation, and prompt management
Cost Tracking and Optimization
Monitor per-request costs, set budget alerts, and optimize spending across models and features
Latency Analysis and Normalization
Understand why raw latency is misleading for LLMs and how to normalize, measure, and optimize response times
Tracing Agents and Multi-Turn Conversations
Add observability to complex agentic workflows, tool-calling loops, and multi-turn conversation sessions
Metrics, Dashboards, and Alerting
Build production dashboards and set up alerts for cost spikes, quality drops, and latency regressions
Grafana for LLM Dashboards
Build production-grade LLM monitoring dashboards with Grafana, Prometheus, and Tempo — the open-source observability stack
Evaluation in Production
Run continuous quality evaluation on live traffic using LLM-as-judge, user feedback, and automated scoring
OpenTelemetry for LLMs
Use the OpenTelemetry standard for vendor-neutral LLM instrumentation with OpenLLMetry and semantic conventions
Workshop: Build an Observability Pipeline
Hands-on workshop: instrument an LLM application with Langfuse, build a cost dashboard, and set up quality alerts
🎯 Missions
M-073 Build a Per-Request Cost Tracker
Nebula Corp's LLM spending is out of control — they have no idea which features or users are driving costs. Build a cost tracking system that calculates per-request costs, aggregates by dimension (model, feature, user), and flags requests that exceed budget thresholds. The pricing table and trace data are provided, but the cost calculation and aggregation logic is missing.
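The core of this mission is straightforward arithmetic over token counts. A minimal sketch of the missing pieces, assuming a hypothetical pricing table and trace records shaped like plain dicts (the prices and field names here are illustrative, not the mission's actual data):

```python
# Illustrative pricing table: USD per 1M tokens (real prices vary by provider).
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model, input_tokens, output_tokens):
    """Cost of one request: token counts times the per-million-token price."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def aggregate_costs(traces, dimension):
    """Sum per-request costs grouped by a trace field ('model', 'feature', 'user')."""
    totals = {}
    for t in traces:
        cost = request_cost(t["model"], t["input_tokens"], t["output_tokens"])
        totals[t[dimension]] = totals.get(t[dimension], 0.0) + cost
    return totals

def over_budget(traces, threshold_usd):
    """Flag requests whose individual cost exceeds a budget threshold."""
    return [t for t in traces
            if request_cost(t["model"], t["input_tokens"], t["output_tokens"]) > threshold_usd]
```

The same `request_cost` helper serves all three requirements, so the pricing table stays in one place.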
M-075 Build an Online Evaluation Pipeline
Nebula Corp needs to continuously monitor the quality of their AI responses in production. Build an evaluation pipeline that scores responses using multiple criteria (relevance, groundedness, safety), samples production traffic at a configurable rate, detects quality regressions by comparing recent scores against a baseline, and generates alerts when quality drops below thresholds.
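Two of these pieces, sampling and regression detection, can be sketched in a few lines. This is a simplified illustration under assumed inputs (scores as floats in 0-1, a precomputed baseline mean), not the mission's actual interface:

```python
import random

def should_sample(rate):
    """Evaluate only a configurable fraction of production traffic (rate in [0, 1])."""
    return random.random() < rate

def detect_regression(recent_scores, baseline_mean, threshold=0.1):
    """Flag a regression when the mean of recent scores drops more than
    `threshold` below the baseline mean."""
    if not recent_scores:
        return False
    recent_mean = sum(recent_scores) / len(recent_scores)
    return baseline_mean - recent_mean > threshold
```

Comparing means against a fixed baseline is the simplest possible detector; a fuller pipeline would track each criterion (relevance, groundedness, safety) separately and use a rolling window for the baseline.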
M-076 Build an OpenTelemetry LLM Exporter
Nebula Corp wants vendor-neutral observability. Build a lightweight OpenTelemetry-compatible span exporter for LLM calls. The exporter should capture LLM-specific semantic conventions (gen_ai.* attributes), batch spans for efficient export, and format them as OTLP-compatible JSON. This lets them send traces to any backend — Langfuse, Jaeger, Grafana Tempo — without changing application code.
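A rough sketch of the span shape and batching behavior. The `gen_ai.request.model` and `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` attribute names come from the OpenTelemetry GenAI semantic conventions; the surrounding JSON structure below is a simplified stand-in for full OTLP, and the class and field names are this sketch's own:

```python
import json
import uuid

def llm_span(model, input_tokens, output_tokens, start_ns, end_ns):
    """Build an OTLP-style span dict carrying gen_ai.* attributes."""
    return {
        "traceId": uuid.uuid4().hex,
        "spanId": uuid.uuid4().hex[:16],
        "name": f"chat {model}",
        "startTimeUnixNano": start_ns,
        "endTimeUnixNano": end_ns,
        "attributes": [
            {"key": "gen_ai.request.model", "value": {"stringValue": model}},
            {"key": "gen_ai.usage.input_tokens", "value": {"intValue": input_tokens}},
            {"key": "gen_ai.usage.output_tokens", "value": {"intValue": output_tokens}},
        ],
    }

class BatchExporter:
    """Buffer spans and flush them as one OTLP-compatible JSON payload."""
    def __init__(self, batch_size=10):
        self.batch_size = batch_size
        self.buffer = []
        self.exported = []  # in practice: POST each payload to an OTLP endpoint

    def add(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            payload = {"resourceSpans": [{"scopeSpans": [{"spans": self.buffer}]}]}
            self.exported.append(json.dumps(payload))
            self.buffer = []
```

Because the payload follows the OTLP JSON layout (`resourceSpans` → `scopeSpans` → `spans`), any OTLP-speaking backend can in principle ingest it, which is the vendor-neutrality the mission is after.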
M-071 Build Your First LLM Trace
Nebula Corp's chatbot has no observability — when users report wrong answers, the team has no way to see what the model received or returned. Implement a basic tracing system that captures LLM calls with their inputs, outputs, token counts, and latency. The skeleton has a trace store and an LLM wrapper, but the actual trace capture logic is missing.
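The missing capture logic is essentially a timing wrapper around the LLM call. A minimal sketch, assuming the LLM wrapper returns a dict with `text` and `usage` keys and the trace store is a simple list (both are stand-ins for the mission's skeleton, not its real API):

```python
import time

TRACE_STORE = []  # stand-in for the mission's trace store

def traced_llm_call(llm_fn, prompt, **kwargs):
    """Call the LLM and record input, output, token counts, and latency."""
    start = time.perf_counter()
    response = llm_fn(prompt, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    TRACE_STORE.append({
        "input": prompt,
        "output": response["text"],
        "input_tokens": response["usage"]["input_tokens"],
        "output_tokens": response["usage"]["output_tokens"],
        "latency_ms": latency_ms,
    })
    return response
```

With every call recorded this way, "what did the model receive and return?" becomes a lookup instead of guesswork.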
M-074 Normalize LLM Latency Metrics
Nebula Corp's monitoring dashboard shows raw latency for LLM calls, but the numbers are misleading — a 5-second response generating 800 tokens looks 'slow' while a 500ms response generating 10 tokens looks 'fast'. Build a latency normalization system that calculates tokens-per-second throughput, Time to First Token (TTFT), and identifies actual performance bottlenecks by comparing normalized metrics.
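The normalization itself is a small calculation. A sketch, assuming total latency and TTFT are measured in milliseconds (function and field names are this example's own):

```python
def normalize_latency(total_ms, ttft_ms, output_tokens):
    """Turn raw latency into comparable metrics: Time to First Token (TTFT)
    plus tokens-per-second throughput over the generation phase."""
    generation_ms = total_ms - ttft_ms
    tps = output_tokens / (generation_ms / 1000) if generation_ms > 0 else 0.0
    return {"ttft_ms": ttft_ms, "tokens_per_second": tps}
```

Applied to the example above: the 5-second / 800-token response streams at roughly 167 tokens/s, while the 500 ms / 10-token response manages about 33 tokens/s (assuming 200 ms TTFT for both). Normalized, the "slow" response is the faster model.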
M-072 Trace a Multi-Step Agent
Nebula Corp's customer support agent makes multiple LLM calls and tool executions per request, but there's no visibility into what happens between the user's question and the final answer. Build a tracing wrapper for the agent loop that captures each LLM call, tool execution, and the overall trace with cumulative metrics. Include loop detection to flag when the agent calls the same tool with the same arguments more than twice.
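The loop-detection requirement can be sketched as counting (tool, arguments) pairs. This assumes tool calls are dicts with `tool` and `args` keys and that argument values are hashable; both are simplifications of the mission's actual trace format:

```python
from collections import Counter

def detect_tool_loops(tool_calls, max_repeats=2):
    """Flag (tool, args) pairs invoked more than `max_repeats` times.
    Args are normalized to a sorted tuple so dict key order doesn't matter."""
    counts = Counter(
        (call["tool"], tuple(sorted(call["args"].items())))
        for call in tool_calls
    )
    return [key for key, n in counts.items() if n > max_repeats]
```

Run over the full list of tool executions in a trace, this catches the classic failure mode where an agent retries the same search with identical arguments forever.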
🔧 Workshops
W-021 Agent Trace Analyzer
Build a tool that analyzes agent execution traces to identify performance issues, detect loops, calculate per-turn costs, and generate optimization recommendations. Takes raw agent traces as input and produces actionable insights about agent behavior patterns.
W-020 LLM Observability Dashboard Builder
Build a real-time observability dashboard for LLM applications. Create a monitoring interface that displays trace timelines, cost breakdowns by model and feature, latency percentile charts, and quality score trends. Uses Langfuse data with a custom visualization layer.