🧪 TDD Challenge · advanced · ⏱️ 35–50m · ⭐ 225 XP
M-081 Build a RAGAS-Style RAG Evaluator
Description
Nebula Corp needs to evaluate their RAG pipeline's quality using metrics inspired by the RAGAS framework. Build an evaluator that computes faithfulness (is the answer grounded in context?), answer relevancy (does it address the question?), and context precision (is the retrieved context relevant?). Produce a comprehensive evaluation report with per-question and aggregate scores.
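As a rough illustration of how the three metrics could be scored, here is a minimal sketch that uses plain token overlap instead of an LLM judge. Only `computeFaithfulness` and its `(answer, contexts)` call shape come from the test cases; `computeAnswerRelevancy`, `computeContextPrecision`, `tokenize`, `overlap`, and the stopword list are hypothetical names and choices. The actual RAGAS framework relies on LLM-based judgments, so treat this as a stand-in heuristic, and note that the grader may expect a different output format.

```javascript
// Hypothetical stopword list (an assumption); tune it for your own data.
const STOPWORDS = new Set([
  'a', 'an', 'the', 'is', 'are', 'of', 'to', 'in', 'on', 'for',
  'and', 'or', 'can', 'you', 'it', 'be', 'do', 'does', 'what', 'how',
]);

// Lowercase, split on non-alphanumeric characters, drop stopwords.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t && !STOPWORDS.has(t));
}

// Fraction of source tokens that also appear among the target tokens.
function overlap(sourceTokens, targetTokens) {
  if (sourceTokens.length === 0) return 0;
  const target = new Set(targetTokens);
  const hits = sourceTokens.filter((t) => target.has(t)).length;
  return hits / sourceTokens.length;
}

// Faithfulness: share of the answer's content words found in the context.
function computeFaithfulness(answer, contexts) {
  return overlap(tokenize(answer), tokenize(contexts.join(' ')));
}

// Answer relevancy: share of the question's content words the answer covers.
function computeAnswerRelevancy(question, answer) {
  return overlap(tokenize(question), tokenize(answer));
}

// Context precision: share of retrieved chunks that share at least one
// content word with the question (not rank-aware, unlike the RAGAS metric).
function computeContextPrecision(question, contexts) {
  if (contexts.length === 0) return 0;
  const q = tokenize(question);
  const relevant = contexts.filter((c) => overlap(q, tokenize(c)) > 0).length;
  return relevant / contexts.length;
}
```

With this heuristic, the refund answer from the first test case shares most of its content words with its context, while the rate-limit answer shares none with its context and scores 0.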
Test Cases (4)
Faithful answer scores high
Answer grounded in context should score high
Input: computeFaithfulness('You can get a full refund within 30 days of purchase.', ['Customers can request a full refund within 30 days of purchase.'])
Expected: CONTAINS:1
Unfaithful answer scores low
Answer contradicting context should score lower
Input: computeFaithfulness('The API allows unlimited requests.', ['Free tier: 100 req/hr. Pro: 1000 req/hr.'])
Expected: CONTAINS:0
Dataset evaluation produces report
Should produce aggregate scores
Input: testDatasetEval()
Expected: CONTAINS:avgOverall
Weakest sample identified
The unfaithful answer about rate limits should be weakest
Input: testWeakestSample()
Expected: CONTAINS:rate limits
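The last two test cases imply a report with aggregate scores and a weakest sample. The sketch below shows one possible shape for that step, assuming per-question metric scores have already been computed. Only the `avgOverall` key appears in the test cases; `buildReport`, the other field names, the equal weighting of the three metrics, and the sample questions and scores are all hypothetical and made up for illustration.

```javascript
// Average of a numeric array; 0 for an empty array.
function mean(values) {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// `scored` is an array of
// { question, faithfulness, answerRelevancy, contextPrecision }.
function buildReport(scored) {
  // Per-question overall score: unweighted mean of the three metrics
  // (an assumption; the challenge does not specify the weighting).
  const perQuestion = scored.map((s) => ({
    ...s,
    overall: mean([s.faithfulness, s.answerRelevancy, s.contextPrecision]),
  }));

  const aggregate = {
    avgFaithfulness: mean(perQuestion.map((s) => s.faithfulness)),
    avgAnswerRelevancy: mean(perQuestion.map((s) => s.answerRelevancy)),
    avgContextPrecision: mean(perQuestion.map((s) => s.contextPrecision)),
    avgOverall: mean(perQuestion.map((s) => s.overall)),
  };

  // Weakest sample: lowest overall score; the first one wins ties.
  const weakest = perQuestion.reduce(
    (min, s) => (s.overall < min.overall ? s : min),
    perQuestion[0],
  );

  return { perQuestion, aggregate, weakest: weakest ?? null };
}

// Illustrative input with made-up scores.
const report = buildReport([
  { question: 'What is the refund policy?', faithfulness: 0.9, answerRelevancy: 0.8, contextPrecision: 1 },
  { question: 'What are the API rate limits?', faithfulness: 0, answerRelevancy: 0.5, contextPrecision: 1 },
]);
console.log(JSON.stringify(report.aggregate));
console.log(report.weakest.question);
```

Keeping scoring separate from aggregation means the report logic can be tested with fixed numbers, independent of how each metric is computed.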