Rank 08 · The Arbiter
TDD Challenge · intermediate · 30–45m · 175 XP
M-067: Build an AI Evaluation Metrics Calculator
Description
Nebula Corp needs to measure their chatbot's quality with standard metrics. Build a metrics calculator that computes precision, recall, F1 score, and semantic similarity for AI-generated responses compared to ground truth answers. The calculator should handle edge cases and produce a comprehensive evaluation report.
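As a rough illustration of the token-overlap metrics described above, the following sketch computes precision, recall, and F1 between a predicted answer and a reference answer. The challenge does not specify a tokenizer or how empty inputs should score, so the helper names `tokenize` and `overlapCount`, the lowercase alphanumeric tokenization, the multiset-style overlap counting, and the choice to return 0 for empty inputs are all assumptions made for this example. It is not the challenge's reference solution.

```javascript
// Assumed tokenizer: lowercase, split on any run of non-alphanumeric characters.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Count tokens shared by both lists, respecting multiplicity
// (a reference token can be matched at most once).
function overlapCount(predTokens, refTokens) {
  const remaining = new Map();
  for (const t of refTokens) {
    remaining.set(t, (remaining.get(t) || 0) + 1);
  }
  let overlap = 0;
  for (const t of predTokens) {
    const left = remaining.get(t) || 0;
    if (left > 0) {
      overlap += 1;
      remaining.set(t, left - 1);
    }
  }
  return overlap;
}

// Fraction of predicted tokens that also appear in the reference.
function precision(predicted, reference) {
  const pred = tokenize(predicted);
  if (pred.length === 0) return 0; // assumed edge-case behaviour
  return overlapCount(pred, tokenize(reference)) / pred.length;
}

// Fraction of reference tokens that were recovered by the prediction.
function recall(predicted, reference) {
  const ref = tokenize(reference);
  if (ref.length === 0) return 0; // assumed edge-case behaviour
  return overlapCount(tokenize(predicted), ref) / ref.length;
}

// Harmonic mean of precision and recall; 0 when both are 0.
function f1Score(predicted, reference) {
  const p = precision(predicted, reference);
  const r = recall(predicted, reference);
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

console.log(precision('the big cat sat', 'the cat sat on the mat')); // 0.75
```

In the partial-overlap example, three of the four predicted tokens ("the", "cat", "sat") occur in the reference, so precision is 3/4. Recall is 3/6 because the reference has six tokens.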
Test Cases (4)
Perfect match has F1 of 1
Identical texts should have perfect F1
Input: f1Score('the cat sat on the mat', 'the cat sat on the mat')
Expected: CONTAINS:1
No overlap has F1 of 0
Completely different texts should have zero F1
Input: f1Score('hello world', 'goodbye universe')
Expected: 0
Partial overlap computed correctly
Most predicted tokens appear in reference
Input: precision('the big cat sat', 'the cat sat on the mat')
Expected: CONTAINS:0.
Batch evaluation works
Should average metrics across pairs
Input: testBatchEval()
Expected: CONTAINS:avgF1
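The batch test only says that metrics should be averaged across pairs and that the output contains `avgF1`. One possible shape for such an evaluator is sketched below. The function name `batchEvaluate`, the `{ predicted, reference }` pair format, the `count` field, and the `avg<Name>` key scheme are hypothetical choices for this example, since the page does not show what `testBatchEval()` calls internally. The demo uses a stand-in exact-match scorer so that the block runs on its own.

```javascript
// Average each metric over a list of { predicted, reference } pairs.
// `metrics` maps a display name (e.g. "F1") to a scoring function
// with the signature (predicted, reference) => number.
function batchEvaluate(pairs, metrics) {
  const report = { count: pairs.length };
  for (const [name, score] of Object.entries(metrics)) {
    const total = pairs.reduce(
      (sum, { predicted, reference }) => sum + score(predicted, reference),
      0
    );
    // Assumed edge case: an empty batch averages to 0 rather than NaN.
    report[`avg${name}`] = pairs.length === 0 ? 0 : total / pairs.length;
  }
  return report;
}

// Stand-in scorer for the demo only: 1 on exact match, otherwise 0.
const exactMatch = (predicted, reference) =>
  predicted.trim() === reference.trim() ? 1 : 0;

const report = batchEvaluate(
  [
    { predicted: 'hello world', reference: 'hello world' },
    { predicted: 'hello world', reference: 'goodbye universe' },
  ],
  { ExactMatch: exactMatch }
);

console.log(JSON.stringify(report)); // {"count":2,"avgExactMatch":0.5}
```

Passing your own `f1Score` implementation under the key `F1` would produce an `avgF1` field, which is the substring the batch test looks for.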