🧪 TDD Challenge · advanced · ⏱️ 35–50m · ⭐ 225 XP

M-081 Build a RAGAS-Style RAG Evaluator

Description

Nebula Corp needs to evaluate the quality of its RAG pipeline using metrics inspired by the RAGAS framework. Build an evaluator that computes three scores: faithfulness (is the answer grounded in the retrieved context?), answer relevancy (does the answer address the question?), and context precision (is the retrieved context relevant to the question?). The evaluator should produce a report with per-question scores and aggregate scores across the dataset.
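The challenge does not say how answer relevancy and context precision should be computed, so the following is only a minimal sketch under stated assumptions. RAGAS itself relies on an LLM and embeddings for these metrics; here a simple token-overlap proxy stands in for both. The names computeAnswerRelevancy, computeContextPrecision, coverage, the stopword list, and the 0.5 threshold are all hypothetical choices, not part of the challenge.

```typescript
// Stopword list is an assumption of this sketch, not given by the challenge.
const STOP = new Set([
  "a", "an", "the", "of", "in", "on", "to", "is", "are", "and", "or",
  "what", "how", "do", "i", "can", "you", "for",
]);

// Lowercase, split on non-alphanumerics, drop empty strings and stopwords.
function tokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOP.has(t));
}

// Fraction of the source's content tokens that also occur in the target.
function coverage(source: string, target: string): number {
  const src = tokens(source);
  if (src.length === 0) return 0;
  const vocab = new Set(tokens(target));
  return src.filter((t) => vocab.has(t)).length / src.length;
}

// Lexical proxy (assumption): how much of the question the answer touches on.
function computeAnswerRelevancy(question: string, answer: string): number {
  return coverage(question, answer);
}

// Rank-aware precision: average of precision@k over the ranks k that hold a
// relevant context. "Relevant" here means coverage >= threshold (assumption).
function computeContextPrecision(
  question: string,
  contexts: string[],
  threshold = 0.5,
): number {
  let hits = 0;
  let sum = 0;
  contexts.forEach((ctx, i) => {
    if (coverage(question, ctx) >= threshold) {
      hits += 1;
      sum += hits / (i + 1); // precision@k at this relevant rank
    }
  });
  return hits === 0 ? 0 : sum / hits;
}
```

Because the precision score is rank-aware, a relevant context retrieved in first position contributes more than the same context retrieved third. A real evaluator would likely replace coverage with an embedding similarity or an LLM judgment.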

Test Cases (4)

Faithful answer scores high

Answer grounded in context should score high

Input: computeFaithfulness('You can get a full refund within 30 days of purchase.', ['Customers can request a full refund within 30 days of purchase.'])

Expected: CONTAINS:1

Unfaithful answer scores low

Answer contradicting context should score lower

Input: computeFaithfulness('The API allows unlimited requests.', ['Free tier: 100 req/hr. Pro: 1000 req/hr.'])

Expected: CONTAINS:0
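One possible way to approach the two faithfulness cases above is a lexical proxy: the fraction of the answer's content words that appear in the retrieved context. This is a sketch, not the challenge's reference solution. RAGAS proper extracts claims from the answer and verifies them with an LLM; the stopword list and the contentTokens helper below are assumptions made for illustration, and the exact return format the checker expects is not specified in the challenge.

```typescript
// Words too common to count as evidence. This list is an assumption.
const STOPWORDS = new Set([
  "a", "an", "the", "of", "in", "on", "to", "is", "are", "and", "or",
  "you", "can", "get", "it", "for", "with", "be", "this", "that",
]);

// Lowercase, split on non-alphanumerics, drop empty strings and stopwords.
function contentTokens(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0 && !STOPWORDS.has(t));
}

// Fraction of the answer's content tokens found anywhere in the contexts.
// Returns a number in [0, 1]; an answer with no content tokens scores 0.
function computeFaithfulness(answer: string, contexts: string[]): number {
  const answerTokens = contentTokens(answer);
  if (answerTokens.length === 0) return 0;

  const contextVocab = new Set<string>();
  for (const ctx of contexts) {
    for (const token of contentTokens(ctx)) contextVocab.add(token);
  }

  const supported = answerTokens.filter((t) => contextVocab.has(t)).length;
  return supported / answerTokens.length;
}

// The two inputs from the test cases above:
const faithful = computeFaithfulness(
  "You can get a full refund within 30 days of purchase.",
  ["Customers can request a full refund within 30 days of purchase."],
); // 1: every content word of the answer occurs in the context

const unfaithful = computeFaithfulness(
  "The API allows unlimited requests.",
  ["Free tier: 100 req/hr. Pro: 1000 req/hr."],
); // 0: no content word of the answer occurs in the context
```

Note the limit of this proxy: the second case scores 0 only because the answer shares no vocabulary with the context. An answer that contradicts the context while reusing its words (for example, a different refund window) would still score high, which is why claim-level verification is the stronger approach.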

Dataset evaluation produces report

Should produce aggregate scores

Input: testDatasetEval()

Expected: CONTAINS:avgOverall
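The only report field this test names is avgOverall, so the rest of the report shape is open. As a sketch, the aggregation step might look like the following, assuming per-sample metric scores have already been computed and that the overall score is the unweighted mean of the three metrics. The buildReport function, the interfaces, and every field name other than avgOverall are hypothetical.

```typescript
// Per-question metric scores, each assumed to lie in [0, 1].
interface SampleScores {
  question: string;
  faithfulness: number;
  answerRelevancy: number;
  contextPrecision: number;
}

interface ScoredSample extends SampleScores {
  overall: number; // unweighted mean of the three metrics (assumption)
}

interface EvalReport {
  samples: ScoredSample[];
  avgFaithfulness: number;
  avgAnswerRelevancy: number;
  avgContextPrecision: number;
  avgOverall: number;
}

// Arithmetic mean; an empty list yields 0 rather than NaN.
function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((a, b) => a + b, 0) / values.length;
}

// Attach an overall score to each sample, then average each column.
function buildReport(scores: SampleScores[]): EvalReport {
  const samples: ScoredSample[] = scores.map((s) => ({
    ...s,
    overall: mean([s.faithfulness, s.answerRelevancy, s.contextPrecision]),
  }));
  return {
    samples,
    avgFaithfulness: mean(samples.map((s) => s.faithfulness)),
    avgAnswerRelevancy: mean(samples.map((s) => s.answerRelevancy)),
    avgContextPrecision: mean(samples.map((s) => s.contextPrecision)),
    avgOverall: mean(samples.map((s) => s.overall)),
  };
}
```

Serializing the result with JSON.stringify yields a string containing the avgOverall key, which is the kind of output a CONTAINS check could match. Whether the metrics should be weighted equally is a design choice the challenge leaves open.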

Weakest sample identified

The unfaithful answer about rate limits should be weakest

Input: testWeakestSample()

Expected: CONTAINS:rate limits
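Identifying the weakest sample can be sketched as a minimum search over per-question overall scores. The findWeakestSample function and the ScoredQuestion shape are hypothetical, and the scores and question strings in the usage lines are made-up illustration values, not results from the challenge's dataset.

```typescript
// Minimal shape needed to rank samples; richer sample types also fit.
interface ScoredQuestion {
  question: string;
  overall: number;
}

// Return the sample with the lowest overall score, or undefined when the
// list is empty. Ties keep the earliest sample.
function findWeakestSample<T extends ScoredQuestion>(samples: T[]): T | undefined {
  let weakest: T | undefined;
  for (const sample of samples) {
    if (weakest === undefined || sample.overall < weakest.overall) {
      weakest = sample;
    }
  }
  return weakest;
}

// Illustrative scores only (assumed values).
const weakest = findWeakestSample([
  { question: "What is the refund policy?", overall: 0.9 },
  { question: "What are the API rate limits?", overall: 0.2 },
  { question: "How do I reset my password?", overall: 0.7 },
]);
// weakest is the "rate limits" sample, since 0.2 is the lowest score
```

Returning the whole sample rather than just its index lets the report print the question text alongside the score.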

Related Lessons

📖 RAGAS (developer-tools)