
AI Observability

📖 12 lessons · 🎯 6 missions · 🔧 2 workshops · 🚀 1 project · ⏱️ ~16 hours

📖Lessons

1
beginner · 📖 15 min · lesson

Introduction to AI Observability

Understand why traditional monitoring fails for LLM applications and what AI observability actually means

observability · monitoring · llm-ops · introduction · fundamentals
2
beginner · 📖 18 min · lesson

Tracing Fundamentals

Learn how traces, spans, and generations capture the full lifecycle of LLM requests

tracing · spans · generations · opentelemetry · fundamentals
🔒
beginner · 📖 20 min · lesson · PRO

Instrumenting LLM Calls

Add observability to your LLM application code with minimal effort using SDKs and decorators

instrumentation · sdk · langfuse · decorators · openai · anthropic
🔒
intermediate · 📖 22 min · lesson · PRO

Langfuse Deep Dive

Master Langfuse — the leading open-source LLM observability platform for tracing, evaluation, and prompt management

langfuse · self-hosted · tracing · evaluation · prompt-management
🔒
intermediate · 📖 18 min · lesson · PRO

Cost Tracking and Optimization

Monitor per-request costs, set budget alerts, and optimize spending across models and features

cost · budget · tokens · optimization · pricing
🔒
intermediate · 📖 18 min · lesson · PRO

Latency Analysis and Normalization

Understand why raw latency is misleading for LLMs and how to normalize, measure, and optimize response times

latency · performance · tokens-per-second · ttft · normalization
🔒
intermediate · 📖 22 min · lesson · PRO

Tracing Agents and Multi-Turn Conversations

Add observability to complex agentic workflows, tool-calling loops, and multi-turn conversation sessions

agents · multi-turn · sessions · tool-calling · tracing
🔒
intermediate · 📖 18 min · lesson · PRO

Metrics, Dashboards, and Alerting

Build production dashboards and set up alerts for cost spikes, quality drops, and latency regressions

metrics · dashboards · alerting · monitoring · production
🔒
intermediate · 📖 22 min · lesson · PRO

Grafana for LLM Dashboards

Build production-grade LLM monitoring dashboards with Grafana, Prometheus, and Tempo — the open-source observability stack

grafana · prometheus · tempo · dashboards · monitoring · production
🔒
advanced · 📖 20 min · lesson · PRO

Evaluation in Production

Run continuous quality evaluation on live traffic using LLM-as-judge, user feedback, and automated scoring

evaluation · llm-judge · feedback · scoring · quality
🔒
advanced · 📖 20 min · lesson · PRO

OpenTelemetry for LLMs

Use the OpenTelemetry standard for vendor-neutral LLM instrumentation with OpenLLMetry and semantic conventions

opentelemetry · openllmetry · standards · instrumentation · vendor-neutral
🔒
intermediate · 📖 45 min · lesson · PRO

Workshop: Build an Observability Pipeline

Hands-on workshop: instrument an LLM application with Langfuse, build a cost dashboard, and set up quality alerts

workshop · hands-on · langfuse · dashboard · pipeline

🎯Missions

🔒
intermediate · 🎯 25–40 min · mission · Rank 08 · PRO

M-073 · Build a Per-Request Cost Tracker

Nebula Corp's LLM spending is out of control — they have no idea which features or users are driving costs. Build a cost tracking system that calculates per-request costs, aggregates by dimension (model, feature, user), and flags requests that exceed budget thresholds. The pricing table and trace data are provided, but the cost calculation and aggregation logic is missing.
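The core of this mission can be sketched in a few lines. Everything here is illustrative: the pricing numbers, trace fields, and function names are assumptions, not the mission's actual skeleton.

```python
from collections import defaultdict

# Hypothetical pricing table: USD per 1M tokens (numbers are made up).
PRICING = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def request_cost(model, input_tokens, output_tokens):
    """Cost of one request: token counts times the per-million-token rate."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

def aggregate(traces, dimension):
    """Sum per-request costs grouped by one dimension: 'model', 'feature', or 'user'."""
    totals = defaultdict(float)
    for t in traces:
        totals[t[dimension]] += request_cost(t["model"], t["input_tokens"], t["output_tokens"])
    return dict(totals)

traces = [
    {"model": "gpt-4o", "feature": "chat", "user": "u1",
     "input_tokens": 1000, "output_tokens": 500},
    {"model": "gpt-4o-mini", "feature": "search", "user": "u1",
     "input_tokens": 2000, "output_tokens": 100},
]
print(aggregate(traces, "feature"))  # {'chat': 0.0075, 'search': 0.00036}
```

Budget flagging is then a filter over `request_cost` against a threshold per dimension.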

🔒
advanced · 🎯 30–40 min · mission · Rank 08 · PRO

M-075 · Build an Online Evaluation Pipeline

Nebula Corp needs to continuously monitor the quality of their AI responses in production. Build an evaluation pipeline that scores responses using multiple criteria (relevance, groundedness, safety), samples production traffic at a configurable rate, detects quality regressions by comparing recent scores against a baseline, and generates alerts when quality drops below thresholds.
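The sampling and regression-detection halves of the pipeline might look like the sketch below; the threshold, window sizes, and function names are illustrative assumptions, and scoring itself (LLM-as-judge calls) is stubbed out as plain numbers.

```python
import random
import statistics

def should_sample(rate):
    """Sample production traffic at a configurable rate in [0.0, 1.0]."""
    return random.random() < rate

def detect_regression(recent_scores, baseline_scores, threshold=0.1):
    """Flag a regression when the recent mean score drops more than
    `threshold` below the baseline mean."""
    if not recent_scores or not baseline_scores:
        return False
    return statistics.mean(recent_scores) < statistics.mean(baseline_scores) - threshold

baseline = [0.90, 0.88, 0.92, 0.91]   # historical quality scores
recent   = [0.75, 0.78, 0.72]         # scores from the last window
print(detect_regression(recent, baseline))  # True: mean dropped ~0.15
```

In the full mission this comparison would run per criterion (relevance, groundedness, safety), with an alert fired for each flagged criterion.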

🔒
advanced · 🎯 30–40 min · mission · Rank 08 · PRO

M-076 · Build an OpenTelemetry LLM Exporter

Nebula Corp wants vendor-neutral observability. Build a lightweight OpenTelemetry-compatible span exporter for LLM calls. The exporter should capture LLM-specific semantic conventions (gen_ai.* attributes), batch spans for efficient export, and format them as OTLP-compatible JSON. This lets them send traces to any backend — Langfuse, Jaeger, Grafana Tempo — without changing application code.
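A stripped-down sketch of the two pieces: a span builder using the OpenTelemetry GenAI semantic-convention attribute names (`gen_ai.*` is the real convention namespace), and a batching exporter that serializes spans as simplified OTLP-style JSON. The class and field layout here is a simplified assumption, not the real OpenTelemetry SDK interface.

```python
import json
import time

def llm_span(model, input_tokens, output_tokens, system="openai"):
    """Build a span dict with GenAI semantic-convention attributes."""
    return {
        "name": f"chat {model}",
        "startTimeUnixNano": str(time.time_ns()),  # OTLP JSON encodes int64 as string
        "attributes": [
            {"key": "gen_ai.system", "value": {"stringValue": system}},
            {"key": "gen_ai.request.model", "value": {"stringValue": model}},
            {"key": "gen_ai.usage.input_tokens", "value": {"intValue": str(input_tokens)}},
            {"key": "gen_ai.usage.output_tokens", "value": {"intValue": str(output_tokens)}},
        ],
    }

class BatchExporter:
    """Buffer spans and emit an OTLP-style JSON payload once the batch fills."""
    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self.buffer = []
        self.exported = []  # stands in for an HTTP POST to the backend

    def add(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            payload = {"resourceSpans": [{"scopeSpans": [{"spans": self.buffer}]}]}
            self.exported.append(json.dumps(payload))
            self.buffer = []
```

Because the payload shape follows OTLP, any OTLP-ingesting backend (Langfuse, Jaeger, Grafana Tempo) could receive it without application changes, which is the mission's point.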

4
beginner · 🎯 20–35 min · mission · Rank 08

M-071 · Build Your First LLM Trace

Nebula Corp's chatbot has no observability — when users report wrong answers, the team has no way to see what the model received or returned. Implement a basic tracing system that captures LLM calls with their inputs, outputs, token counts, and latency. The skeleton has a trace store and an LLM wrapper, but the actual trace capture logic is missing.
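The missing capture logic amounts to a timing wrapper around the model call; a minimal sketch, assuming the LLM function returns text plus token counts (the store, field names, and fake model here are illustrative, not the mission's skeleton).

```python
import time
import uuid

TRACE_STORE = []  # stand-in for the mission's trace store

def traced_llm_call(llm_fn, prompt):
    """Wrap an LLM call and record its input, output, token counts, and latency."""
    start = time.perf_counter()
    response = llm_fn(prompt)  # assumed shape: {"text", "input_tokens", "output_tokens"}
    latency_ms = (time.perf_counter() - start) * 1000
    TRACE_STORE.append({
        "trace_id": str(uuid.uuid4()),
        "input": prompt,
        "output": response["text"],
        "input_tokens": response["input_tokens"],
        "output_tokens": response["output_tokens"],
        "latency_ms": latency_ms,
    })
    return response

# Fake model so the sketch runs without an API key.
def fake_llm(prompt):
    return {"text": "42", "input_tokens": len(prompt.split()), "output_tokens": 1}

traced_llm_call(fake_llm, "what is the answer")
```

With that record in place, "what did the model receive and return?" becomes a lookup by trace ID instead of guesswork.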

🔒
intermediate · 🎯 25–40 min · mission · Rank 08 · PRO

M-074 · Normalize LLM Latency Metrics

Nebula Corp's monitoring dashboard shows raw latency for LLM calls, but the numbers are misleading — a 5-second response generating 800 tokens looks 'slow' while a 500ms response generating 10 tokens looks 'fast'. Build a latency normalization system that calculates tokens-per-second throughput, Time to First Token (TTFT), and identifies actual performance bottlenecks by comparing normalized metrics.
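The normalization itself is simple arithmetic; a sketch using the mission's own example numbers (the function name and returned fields are illustrative):

```python
def normalize(latency_ms, output_tokens, ttft_ms=None):
    """Convert raw latency into throughput. Raw latency alone penalizes
    long generations; tokens/second and TTFT separate model speed from
    generation length."""
    tps = output_tokens / (latency_ms / 1000) if latency_ms > 0 else 0.0
    return {"tokens_per_second": tps, "ttft_ms": ttft_ms, "latency_ms": latency_ms}

# The mission's example: the "slow" 5 s call is actually the faster generator.
long_gen  = normalize(5000, 800)  # 160 tokens/s
short_gen = normalize(500, 10)    #  20 tokens/s
print(long_gen["tokens_per_second"] > short_gen["tokens_per_second"])  # True
```

Comparing tokens-per-second and TTFT across models then isolates real bottlenecks (slow first token vs. slow generation) that raw latency hides.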

🔒
intermediate · 🎯 30–45 min · mission · Rank 08 · PRO

M-072 · Trace a Multi-Step Agent

Nebula Corp's customer support agent makes multiple LLM calls and tool executions per request, but there's no visibility into what happens between the user's question and the final answer. Build a tracing wrapper for the agent loop that captures each LLM call, tool execution, and the overall trace with cumulative metrics. Include loop detection to flag when the agent calls the same tool with the same arguments more than twice.
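The loop-detection piece reduces to counting (tool, arguments) pairs; a minimal sketch using the mission's "more than twice" threshold (the class and method names are illustrative assumptions):

```python
from collections import Counter

class LoopDetector:
    """Flag when the agent calls the same tool with the same arguments
    more than `max_repeats` times."""
    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.calls = Counter()

    def record(self, tool_name, args):
        """Count this call; return True once the pair exceeds the limit."""
        key = (tool_name, tuple(sorted(args.items())))  # hashable, order-independent
        self.calls[key] += 1
        return self.calls[key] > self.max_repeats

d = LoopDetector()
assert d.record("search", {"q": "refund policy"}) is False  # 1st call
assert d.record("search", {"q": "refund policy"}) is False  # 2nd call
assert d.record("search", {"q": "refund policy"}) is True   # 3rd call: loop
```

In the full tracing wrapper, a `True` result would attach a loop flag to the current span rather than abort the agent, so the stuck behavior shows up in the trace.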