🧪 TDD Challenge · intermediate · ⏱️ 30–45 min · ⭐ 175 XP

M-067: Build an AI Evaluation Metrics Calculator

Description

Nebula Corp needs to measure its chatbot's quality with standard metrics. Build a metrics calculator that computes precision, recall, F1 score, and semantic similarity for AI-generated responses compared with ground-truth answers. The calculator should handle edge cases, such as texts with no overlapping tokens, and produce a comprehensive evaluation report.
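The challenge does not define the metrics formally. One common token-overlap formulation, used for example in SQuAD-style answer scoring, treats the prediction ŷ and the reference y as bags of tokens. The definitions below adopt that assumption. Here c_x(t) is the count of token t in text x, and |x| is the number of tokens in x. Cosine similarity over token-count vectors is shown only as a lexical stand-in for semantic similarity. An embedding-based scorer would replace the count vectors with sentence embeddings.

```latex
\begin{aligned}
\operatorname{overlap}(\hat{y}, y) &= \sum_{t} \min\big(c_{\hat{y}}(t),\, c_{y}(t)\big) \\
P &= \frac{\operatorname{overlap}(\hat{y}, y)}{|\hat{y}|}, \qquad
R = \frac{\operatorname{overlap}(\hat{y}, y)}{|y|} \\
F_1 &= \begin{cases} \dfrac{2PR}{P + R} & \text{if } P + R > 0 \\[4pt] 0 & \text{otherwise} \end{cases} \\
\operatorname{sim}(\hat{y}, y) &= \frac{\mathbf{c}_{\hat{y}} \cdot \mathbf{c}_{y}}{\lVert \mathbf{c}_{\hat{y}} \rVert \, \lVert \mathbf{c}_{y} \rVert}
\end{aligned}
```

As a check, take the partial-overlap test case with whitespace tokens: ŷ = "the big cat sat" and y = "the cat sat on the mat". The overlap is 3, |ŷ| = 4 and |y| = 6. That gives P = 0.75, R = 0.5 and F1 = 2(0.75)(0.5) / 1.25 = 0.6.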

Test Cases (4)

Perfect match has F1 of 1
Identical texts should have perfect F1
Input: f1Score('the cat sat on the mat', 'the cat sat on the mat')
Expected: output contains "1"
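Below is a minimal TypeScript sketch of the scoring functions the tests call. It assumes lowercase whitespace tokenization and bag-of-tokens matching, since the challenge specifies neither. The helpers `tokenize` and `overlap` are hypothetical. Only `precision` and `f1Score` appear in the tests, and `recall` simply follows the same naming pattern.

```typescript
// Hypothetical tokenizer: lowercase, split on whitespace, drop empty strings.
// The challenge does not specify tokenization; this is an assumption.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Hypothetical helper: count shared tokens, treating both texts as multisets.
// Each reference token can be matched at most once.
function overlap(pred: string[], ref: string[]): number {
  const refCounts = new Map<string, number>();
  ref.forEach((t) => refCounts.set(t, (refCounts.get(t) ?? 0) + 1));
  let shared = 0;
  pred.forEach((t) => {
    const c = refCounts.get(t) ?? 0;
    if (c > 0) {
      shared += 1;
      refCounts.set(t, c - 1); // consume one matching reference token
    }
  });
  return shared;
}

// Fraction of predicted tokens that also appear in the reference.
function precision(prediction: string, reference: string): number {
  const p = tokenize(prediction);
  return p.length === 0 ? 0 : overlap(p, tokenize(reference)) / p.length;
}

// Fraction of reference tokens recovered by the prediction.
function recall(prediction: string, reference: string): number {
  const r = tokenize(reference);
  return r.length === 0 ? 0 : overlap(tokenize(prediction), r) / r.length;
}

// Harmonic mean of precision and recall, defined as 0 when both are 0.
function f1Score(prediction: string, reference: string): number {
  const p = precision(prediction, reference);
  const r = recall(prediction, reference);
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

console.log(f1Score("the cat sat on the mat", "the cat sat on the mat")); // 1
```

For identical texts, all six predicted tokens match, so precision and recall are both 1 and the harmonic mean is 1.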
No overlap has F1 of 0
Completely different texts should have zero F1
Input: f1Score('hello world', 'goodbye universe')
Expected: 0
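When the texts share no tokens, precision and recall are both 0. The unguarded formula 2PR / (P + R) then becomes 0 / 0, which evaluates to NaN in JavaScript instead of the expected 0. Here is a small guard, written as a hypothetical helper `f1FromPR`:

```typescript
// Hypothetical helper: harmonic mean of precision (p) and recall (r),
// defined as 0 when both are 0 so that "no overlap" scores 0 rather than NaN.
function f1FromPR(p: number, r: number): number {
  if (p + r === 0) return 0; // avoid 0 / 0
  return (2 * p * r) / (p + r);
}

console.log(f1FromPR(0, 0)); // 0
console.log((2 * 0 * 0) / (0 + 0)); // NaN: the unguarded formula
console.log(f1FromPR(0.75, 0.5)); // 0.6
```

Conventions differ for the case where both texts are empty. Some evaluators score it as a perfect match rather than 0. Whichever rule you pick, pin it down with its own test.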
Partial overlap computed correctly
3 of the 4 predicted tokens (the, cat, sat) appear in the reference
Input: precision('the big cat sat', 'the cat sat on the mat')
Expected: output contains "0." (a fractional score; 0.75 with whitespace tokenization)
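Counting tokens as a multiset means each predicted token is credited at most as many times as it occurs in the reference. This clipping stops a prediction from inflating its precision by repeating a common word. The sketch below is self-contained and uses hypothetical names (`counts`, `clippedPrecision`). It assumes the same lowercase whitespace tokenization as above.

```typescript
// Build a token -> count table (a bag of words).
function counts(tokens: string[]): Map<string, number> {
  const m = new Map<string, number>();
  tokens.forEach((t) => m.set(t, (m.get(t) ?? 0) + 1));
  return m;
}

// Hypothetical token-level precision with clipped counts:
// sum over token types of min(count in prediction, count in reference),
// divided by the number of predicted tokens.
function clippedPrecision(prediction: string, reference: string): number {
  const pred = prediction.toLowerCase().split(/\s+/).filter(Boolean);
  if (pred.length === 0) return 0; // empty prediction: define precision as 0
  const refCounts = counts(reference.toLowerCase().split(/\s+/).filter(Boolean));
  let shared = 0;
  counts(pred).forEach((n, token) => {
    shared += Math.min(n, refCounts.get(token) ?? 0);
  });
  return shared / pred.length;
}

console.log(clippedPrecision("the big cat sat", "the cat sat on the mat")); // 0.75
console.log(clippedPrecision("the the the the", "the cat")); // 0.25, not 1
```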
Batch evaluation works
Should average metrics across pairs
Input: testBatchEval()
Expected: output contains "avgF1"
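For the batch report, one reasonable reading of "average metrics across pairs" is a macro-average. Each (prediction, reference) pair is scored on its own, and the per-pair scores are then averaged, so every pair counts equally whatever its length. The sketch below assumes that reading. `scorePair`, `batchEvaluate` and the `BatchReport` fields other than `avgF1` are hypothetical. `testBatchEval` is only a guess at what the harness calls.

```typescript
interface PairScores { precision: number; recall: number; f1: number; }
interface BatchReport { count: number; avgPrecision: number; avgRecall: number; avgF1: number; }

// Assumed tokenization: lowercase, whitespace-separated.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Score one pair with bag-of-tokens overlap (each reference token matched at most once).
function scorePair(prediction: string, reference: string): PairScores {
  const pred = tokenize(prediction);
  const ref = tokenize(reference);
  const refCounts = new Map<string, number>();
  ref.forEach((t) => refCounts.set(t, (refCounts.get(t) ?? 0) + 1));
  let shared = 0;
  pred.forEach((t) => {
    const c = refCounts.get(t) ?? 0;
    if (c > 0) { shared++; refCounts.set(t, c - 1); }
  });
  const precision = pred.length > 0 ? shared / pred.length : 0;
  const recall = ref.length > 0 ? shared / ref.length : 0;
  const f1 = precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  return { precision, recall, f1 };
}

// Macro-average each metric over all pairs; an empty batch yields zeros.
function batchEvaluate(pairs: Array<[string, string]>): BatchReport {
  const n = pairs.length;
  if (n === 0) return { count: 0, avgPrecision: 0, avgRecall: 0, avgF1: 0 };
  let sumP = 0, sumR = 0, sumF = 0;
  for (const [prediction, reference] of pairs) {
    const s = scorePair(prediction, reference);
    sumP += s.precision;
    sumR += s.recall;
    sumF += s.f1;
  }
  return { count: n, avgPrecision: sumP / n, avgRecall: sumR / n, avgF1: sumF / n };
}

// Hypothetical harness entry point: returns the report as JSON text.
function testBatchEval(): string {
  const report = batchEvaluate([
    ["the cat sat on the mat", "the cat sat on the mat"], // F1 = 1
    ["hello world", "goodbye universe"], // F1 = 0
  ]);
  return JSON.stringify(report);
}

console.log(testBatchEval()); // {"count":2,"avgPrecision":0.5,"avgRecall":0.5,"avgF1":0.5}
```

A micro-average, which pools the token counts from all pairs before dividing, is the main alternative. It gives longer responses more weight, so state in the report which one you use.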
