🧪 TDD Challenge · intermediate · ⏱️ 30–45m · ⭐ 175 XP

M-067: Build an AI Evaluation Metrics Calculator

Description

Nebula Corp needs to measure its chatbot's quality with standard metrics. Build a metrics calculator that computes precision, recall, F1 score, and semantic similarity for AI-generated responses compared against ground-truth answers. The calculator should handle edge cases and produce a comprehensive evaluation report.
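One possible starting point is sketched below. The challenge does not specify tokenization or how overlap is counted, so this sketch assumes lowercase, whitespace-separated tokens and a clipped (multiset) overlap, and it returns 0 whenever either text is empty. The helpers `tokenize`, `countTokens`, and `overlapCount` are hypothetical names, and `cosineSimilarity` is only a lexical stand-in for semantic similarity (a real implementation would likely compare embeddings).

```javascript
// Assumption: tokens are lowercase, whitespace-separated words.
function tokenize(text) {
  return String(text || '').toLowerCase().split(/\s+/).filter(Boolean);
}

// Count how often each token occurs.
function countTokens(tokens) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  return counts;
}

// Clipped overlap: a token matches at most as often as it appears in both texts.
function overlapCount(predTokens, refTokens) {
  const refCounts = countTokens(refTokens);
  let overlap = 0;
  for (const [token, n] of countTokens(predTokens)) {
    overlap += Math.min(n, refCounts.get(token) || 0);
  }
  return overlap;
}

// Share of predicted tokens that also appear in the reference.
function precision(predicted, reference) {
  const pred = tokenize(predicted);
  if (pred.length === 0) return 0; // edge case: nothing was predicted
  return overlapCount(pred, tokenize(reference)) / pred.length;
}

// Share of reference tokens that the prediction recovered.
function recall(predicted, reference) {
  const ref = tokenize(reference);
  if (ref.length === 0) return 0; // edge case: empty reference
  return overlapCount(tokenize(predicted), ref) / ref.length;
}

// Harmonic mean of precision and recall; 0 when both are 0.
function f1Score(predicted, reference) {
  const p = precision(predicted, reference);
  const r = recall(predicted, reference);
  return p + r === 0 ? 0 : (2 * p * r) / (p + r);
}

// Assumption: cosine similarity of term-frequency vectors as a rough proxy
// for semantic similarity. It only measures word overlap, not meaning.
function cosineSimilarity(a, b) {
  const ca = countTokens(tokenize(a));
  const cb = countTokens(tokenize(b));
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [token, n] of ca) {
    dot += n * (cb.get(token) || 0);
    normA += n * n;
  }
  for (const n of cb.values()) normB += n * n;
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

Treating two empty texts as a score of 0 is a design choice; some evaluation scripts score that case as 1, so check which convention the tests expect.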

Test Cases (4)

Perfect match has F1 of 1

Identical texts should have perfect F1

Input: f1Score('the cat sat on the mat', 'the cat sat on the mat')

Expected: CONTAINS:1

No overlap has F1 of 0

Completely different texts should have zero F1

Input: f1Score('hello world', 'goodbye universe')

Expected: 0

Partial overlap computed correctly

Most predicted tokens appear in reference

Input: precision('the big cat sat', 'the cat sat on the mat')

Expected: CONTAINS:0.
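To see why this case yields a value starting with "0.", it can help to trace which predicted tokens find a match. The sketch below uses a hypothetical helper, `explainPrecision`, and assumes lowercase whitespace tokenization with each reference token usable at most once.

```javascript
// Hypothetical helper: shows which predicted tokens match the reference.
function explainPrecision(predicted, reference) {
  const split = (s) => s.toLowerCase().split(/\s+/).filter(Boolean);

  // Remaining unmatched occurrences of each reference token.
  const remaining = new Map();
  for (const t of split(reference)) {
    remaining.set(t, (remaining.get(t) || 0) + 1);
  }

  const matched = [];
  const unmatched = [];
  const predTokens = split(predicted);
  for (const t of predTokens) {
    const left = remaining.get(t) || 0;
    if (left > 0) {
      matched.push(t);
      remaining.set(t, left - 1); // consume one reference occurrence
    } else {
      unmatched.push(t);
    }
  }

  const precision =
    predTokens.length === 0 ? 0 : matched.length / predTokens.length;
  return { matched, unmatched, precision };
}

// 'the', 'cat', 'sat' match; 'big' does not, so precision is 3 / 4 = 0.75.
console.log(explainPrecision('the big cat sat', 'the cat sat on the mat'));
```

Three of the four predicted tokens appear in the reference, so precision is 0.75 under this counting scheme. A set-based overlap would give the same precision here, though recall would differ because "the" appears twice in the reference.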

Batch evaluation works

Should average metrics across pairs

Input: testBatchEval()

Expected: CONTAINS:avgF1
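The challenge does not show what `testBatchEval()` looks like, so the following is only a guess at one workable shape: a hypothetical `evaluateBatch` averages per-pair scores into a report object, and `testBatchEval` serializes that report so the output contains the `avgF1` key. The pair format (`predicted`, `reference`), the other field names, and the sample pairs are assumptions.

```javascript
// Assumption: lowercase, whitespace-separated tokens.
function tokenize(text) {
  return String(text || '').toLowerCase().split(/\s+/).filter(Boolean);
}

// Score one prediction against its reference using clipped token overlap.
function scorePair(predicted, reference) {
  const pred = tokenize(predicted);
  const ref = tokenize(reference);
  const refCounts = new Map();
  for (const t of ref) refCounts.set(t, (refCounts.get(t) || 0) + 1);

  let overlap = 0;
  for (const t of pred) {
    const left = refCounts.get(t) || 0;
    if (left > 0) {
      overlap += 1;
      refCounts.set(t, left - 1);
    }
  }

  const precision = pred.length ? overlap / pred.length : 0;
  const recall = ref.length ? overlap / ref.length : 0;
  const f1 =
    precision + recall ? (2 * precision * recall) / (precision + recall) : 0;
  return { precision, recall, f1 };
}

// Hypothetical report shape: macro-averages across all pairs.
function evaluateBatch(pairs) {
  if (pairs.length === 0) {
    return { count: 0, avgPrecision: 0, avgRecall: 0, avgF1: 0 };
  }
  const totals = { precision: 0, recall: 0, f1: 0 };
  for (const { predicted, reference } of pairs) {
    const s = scorePair(predicted, reference);
    totals.precision += s.precision;
    totals.recall += s.recall;
    totals.f1 += s.f1;
  }
  return {
    count: pairs.length,
    avgPrecision: totals.precision / pairs.length,
    avgRecall: totals.recall / pairs.length,
    avgF1: totals.f1 / pairs.length,
  };
}

// Assumed test driver: returns the report as a JSON string.
function testBatchEval() {
  const report = evaluateBatch([
    { predicted: 'the cat sat on the mat', reference: 'the cat sat on the mat' },
    { predicted: 'hello world', reference: 'goodbye universe' },
  ]);
  return JSON.stringify(report);
}
```

With one perfect pair and one pair with no overlap, the averaged F1 is 0.5. Averaging per-pair scores (a macro average) is one choice; pooling token counts across all pairs before computing the metrics would give a different result.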
