📖 Lessons
Introduction to AI Observability
Understand why traditional monitoring fails for LLM applications and what AI observability actually means
Tracing Fundamentals
Learn how traces, spans, and generations capture the full lifecycle of LLM requests
Instrumenting LLM Calls
Add observability to your LLM application code with minimal effort using SDKs and decorators
Langfuse Deep Dive
Master Langfuse — the leading open-source LLM observability platform for tracing, evaluation, and prompt management
Cost Tracking and Optimization
Monitor per-request costs, set budget alerts, and optimize spending across models and features
Latency Analysis and Normalization
Understand why raw latency is misleading for LLMs and how to normalize, measure, and optimize response times
Tracing Agents and Multi-Turn Conversations
Add observability to complex agentic workflows, tool-calling loops, and multi-turn conversation sessions
Metrics, Dashboards, and Alerting
Build production dashboards and set up alerts for cost spikes, quality drops, and latency regressions
Grafana for LLM Dashboards
Build production-grade LLM monitoring dashboards with Grafana, Prometheus, and Tempo — the open-source observability stack
Evaluation in Production
Run continuous quality evaluation on live traffic using LLM-as-judge, user feedback, and automated scoring
OpenTelemetry for LLMs
Use the OpenTelemetry standard for vendor-neutral LLM instrumentation with OpenLLMetry and semantic conventions
Workshop: Build an Observability Pipeline
Hands-on workshop: instrument an LLM application with Langfuse, build a cost dashboard, and set up quality alerts
🎯 Missions
M-073 Build a Per-Request Cost Tracker
Nebula Corp's LLM spending is out of control — they have no idea which features or users are driving costs. Build a cost tracking system that calculates per-request costs, aggregates by dimension (model, feature, user), and flags requests that exceed budget thresholds. The pricing table and trace data are provided, but the cost calculation and aggregation logic is missing.
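The core of this mission is straightforward arithmetic over token counts. A minimal sketch of the missing pieces, assuming a hypothetical pricing table and trace records shaped like plain dicts (the prices and field names here are illustrative, not the mission's actual data):

```python
# Illustrative pricing table: USD per 1M tokens (real prices vary by provider).
PRICING = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model, input_tokens, output_tokens):
    """Cost of one request: token counts times the per-million-token price."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def aggregate_costs(traces, dimension):
    """Sum per-request costs grouped by a trace field ('model', 'feature', 'user')."""
    totals = {}
    for t in traces:
        cost = request_cost(t["model"], t["input_tokens"], t["output_tokens"])
        totals[t[dimension]] = totals.get(t[dimension], 0.0) + cost
    return totals

def over_budget(traces, threshold_usd):
    """Flag requests whose individual cost exceeds a budget threshold."""
    return [t for t in traces
            if request_cost(t["model"], t["input_tokens"], t["output_tokens"]) > threshold_usd]
```

The same `request_cost` helper serves all three requirements, so the pricing table stays in one place.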
M-075 Build an Online Evaluation Pipeline
Nebula Corp needs to continuously monitor the quality of their AI responses in production. Build an evaluation pipeline that scores responses using multiple criteria (relevance, groundedness, safety), samples production traffic at a configurable rate, detects quality regressions by comparing recent scores against a baseline, and generates alerts when quality drops below thresholds.
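Two of these pieces, sampling and regression detection, can be sketched in a few lines. This is a simplified illustration under assumed inputs (scores as floats in 0-1, a precomputed baseline mean), not the mission's actual interface:

```python
import random

def should_sample(rate):
    """Evaluate only a configurable fraction of production traffic (rate in [0, 1])."""
    return random.random() < rate

def detect_regression(recent_scores, baseline_mean, threshold=0.1):
    """Flag a regression when the mean of recent scores drops more than
    `threshold` below the baseline mean."""
    if not recent_scores:
        return False
    recent_mean = sum(recent_scores) / len(recent_scores)
    return baseline_mean - recent_mean > threshold
```

Comparing means against a fixed baseline is the simplest possible detector; a fuller pipeline would track each criterion (relevance, groundedness, safety) separately and use a rolling window for the baseline.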
M-076 Build an OpenTelemetry LLM Exporter
Nebula Corp wants vendor-neutral observability. Build a lightweight OpenTelemetry-compatible span exporter for LLM calls. The exporter should capture LLM-specific semantic conventions (gen_ai.* attributes), batch spans for efficient export, and format them as OTLP-compatible JSON. This lets them send traces to any backend — Langfuse, Jaeger, Grafana Tempo — without changing application code.
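A rough sketch of the span shape and batching behavior. The `gen_ai.request.model` and `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` attribute names come from the OpenTelemetry GenAI semantic conventions; the surrounding JSON structure below is a simplified stand-in for full OTLP, and the class and field names are this sketch's own:

```python
import json
import uuid

def llm_span(model, input_tokens, output_tokens, start_ns, end_ns):
    """Build an OTLP-style span dict carrying gen_ai.* attributes."""
    return {
        "traceId": uuid.uuid4().hex,
        "spanId": uuid.uuid4().hex[:16],
        "name": f"chat {model}",
        "startTimeUnixNano": start_ns,
        "endTimeUnixNano": end_ns,
        "attributes": [
            {"key": "gen_ai.request.model", "value": {"stringValue": model}},
            {"key": "gen_ai.usage.input_tokens", "value": {"intValue": input_tokens}},
            {"key": "gen_ai.usage.output_tokens", "value": {"intValue": output_tokens}},
        ],
    }

class BatchExporter:
    """Buffer spans and flush them as one OTLP-compatible JSON payload."""
    def __init__(self, batch_size=10):
        self.batch_size = batch_size
        self.buffer = []
        self.exported = []  # in practice: POST each payload to an OTLP endpoint

    def add(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            payload = {"resourceSpans": [{"scopeSpans": [{"spans": self.buffer}]}]}
            self.exported.append(json.dumps(payload))
            self.buffer = []
```

Because the payload follows the OTLP JSON layout (`resourceSpans` → `scopeSpans` → `spans`), any OTLP-speaking backend can in principle ingest it, which is the vendor-neutrality the mission is after.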
M-071 Build Your First LLM Trace
Nebula Corp's chatbot has no observability — when users report wrong answers, the team has no way to see what the model received or returned. Implement a basic tracing system that captures LLM calls with their inputs, outputs, token counts, and latency. The skeleton has a trace store and an LLM wrapper, but the actual trace capture logic is missing.
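The missing capture logic is essentially a timing wrapper around the LLM call. A minimal sketch, assuming the LLM wrapper returns a dict with `text` and `usage` keys and the trace store is a simple list (both are stand-ins for the mission's skeleton, not its real API):

```python
import time

TRACE_STORE = []  # stand-in for the mission's trace store

def traced_llm_call(llm_fn, prompt, **kwargs):
    """Call the LLM and record input, output, token counts, and latency."""
    start = time.perf_counter()
    response = llm_fn(prompt, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    TRACE_STORE.append({
        "input": prompt,
        "output": response["text"],
        "input_tokens": response["usage"]["input_tokens"],
        "output_tokens": response["usage"]["output_tokens"],
        "latency_ms": latency_ms,
    })
    return response
```

With every call recorded this way, "what did the model receive and return?" becomes a lookup instead of guesswork.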
M-074 Normalize LLM Latency Metrics
Nebula Corp's monitoring dashboard shows raw latency for LLM calls, but the numbers are misleading — a 5-second response generating 800 tokens looks 'slow' while a 500ms response generating 10 tokens looks 'fast'. Build a latency normalization system that calculates tokens-per-second throughput, Time to First Token (TTFT), and identifies actual performance bottlenecks by comparing normalized metrics.
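The normalization itself is a small calculation. A sketch, assuming total latency and TTFT are measured in milliseconds (function and field names are this example's own):

```python
def normalize_latency(total_ms, ttft_ms, output_tokens):
    """Turn raw latency into comparable metrics: Time to First Token (TTFT)
    plus tokens-per-second throughput over the generation phase."""
    generation_ms = total_ms - ttft_ms
    tps = output_tokens / (generation_ms / 1000) if generation_ms > 0 else 0.0
    return {"ttft_ms": ttft_ms, "tokens_per_second": tps}
```

Applied to the example above: the 5-second / 800-token response streams at roughly 167 tokens/s, while the 500 ms / 10-token response manages about 33 tokens/s (assuming 200 ms TTFT for both). Normalized, the "slow" response is the faster model.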
M-072 Trace a Multi-Step Agent
Nebula Corp's customer support agent makes multiple LLM calls and tool executions per request, but there's no visibility into what happens between the user's question and the final answer. Build a tracing wrapper for the agent loop that captures each LLM call, tool execution, and the overall trace with cumulative metrics. Include loop detection to flag when the agent calls the same tool with the same arguments more than twice.
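The loop-detection requirement can be sketched as counting (tool, arguments) pairs. This assumes tool calls are dicts with `tool` and `args` keys and that argument values are hashable; both are simplifications of the mission's actual trace format:

```python
from collections import Counter

def detect_tool_loops(tool_calls, max_repeats=2):
    """Flag (tool, args) pairs invoked more than `max_repeats` times.
    Args are normalized to a sorted tuple so dict key order doesn't matter."""
    counts = Counter(
        (call["tool"], tuple(sorted(call["args"].items())))
        for call in tool_calls
    )
    return [key for key, n in counts.items() if n > max_repeats]
```

Run over the full list of tool executions in a trace, this catches the classic failure mode where an agent retries the same search with identical arguments forever.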
🔧 Workshops
W-021 Agent Trace Analyzer
Build a tool that analyzes agent execution traces to identify performance issues, detect loops, calculate per-turn costs, and generate optimization recommendations. Takes raw agent traces as input and produces actionable insights about agent behavior patterns.
W-020 LLM Observability Dashboard Builder
Build a real-time observability dashboard for LLM applications. Create a monitoring interface that displays trace timelines, cost breakdowns by model and feature, latency percentile charts, and quality score trends. Uses Langfuse data with a custom visualization layer.