🧪 TDD Challenge · beginner · ⏱️ 15–30m · ⭐ 100 XP

M-063 Build Your First LLM Eval Scorer

Description

Nebula Corp's AI team has no way to measure whether its LLM outputs are any good. Build a simple evaluation scorer that checks LLM responses against expected answers using three metrics: exact match, keyword containment, and a basic similarity score based on word overlap. Checks like these are the foundation of most eval pipelines.

Test Cases (3)

Exact match works

Should match after trimming and lowercasing

Input: exactMatch(' Hello World ', 'hello world')

Expected: true
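
One way to satisfy this test case is sketched below. It is a minimal sketch, not the challenge's reference solution: the parameter names `actual` and `expected` are assumptions, and the only normalization applied is the trimming and lowercasing named in the test description.

```typescript
// Minimal sketch: compare two strings after trimming and lowercasing.
// The parameter names are illustrative; the challenge only fixes the
// function name and the argument order shown in the test input.
function exactMatch(actual: string, expected: string): boolean {
  // Normalize both sides the same way so the comparison is symmetric.
  const normalize = (s: string): string => s.trim().toLowerCase();
  return normalize(actual) === normalize(expected);
}
```

With this sketch, `exactMatch(' Hello World ', 'hello world')` evaluates to `true`, while strings that differ in anything other than case or surrounding whitespace do not match.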

Keyword containment check

Should find all keywords in the text

Input: containsAllKeywords('The quick brown fox jumps over the lazy dog', ['quick', 'fox', 'dog'])

Expected: true
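
A possible implementation of the keyword check is sketched below. Two choices here are assumptions, since the test case does not pin them down: matching is case-insensitive, and keywords are matched as substrings rather than whole words (so `'cat'` would also match inside `'category'`).

```typescript
// Minimal sketch: return true only if every keyword occurs in the text.
// Assumptions (not stated by the challenge): case-insensitive comparison
// and substring matching rather than whole-word matching.
function containsAllKeywords(text: string, keywords: string[]): boolean {
  const haystack = text.toLowerCase();
  // every() short-circuits on the first missing keyword.
  return keywords.every((keyword) => haystack.includes(keyword.toLowerCase()));
}
```

Under these assumptions, the call in the test case returns `true`, and adding a keyword that is absent from the text (for example `'cat'`) makes it return `false`. An empty keyword list returns `true`, because `every()` is vacuously true on an empty array.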

Word overlap scoring

4 of the 6 expected words appear in the response: 4/6 ≈ 0.67

Input: wordOverlapScore('the cat sat on the mat', 'the cat is on a mat')

Expected: STARTS_WITH:0.6
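
The overlap score for this test case could be computed as in the sketch below. Several details are assumptions rather than part of the challenge: words are split on whitespace after lowercasing, the denominator counts every word in the expected answer, and an empty expected answer scores 0.

```typescript
// Minimal sketch: fraction of expected words that also appear in the response.
// Assumptions (not stated by the challenge): whitespace tokenization,
// lowercasing, and a score of 0 when the expected answer has no words.
function wordOverlapScore(actual: string, expected: string): number {
  const tokenize = (s: string): string[] =>
    s.toLowerCase().split(/\s+/).filter((word) => word.length > 0);

  const actualWords = new Set(tokenize(actual));
  const expectedWords = tokenize(expected);
  if (expectedWords.length === 0) return 0; // avoid dividing by zero

  // Count the expected words that the response contains.
  const found = expectedWords.filter((word) => actualWords.has(word)).length;
  return found / expectedWords.length;
}
```

For the test input, the expected words are the, cat, is, on, a, mat; the response contains the, cat, on, and mat, so the function returns 4/6, whose string form begins with `0.6`.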
