Harvey Eval Study

Overview

Learn Legal Benchmark Eval with a single task

The current local learning sample keeps only one representative task; the full benchmark repository is not required.

13 source documents

50 rubric items

2 deliverables

All-pass scoring

The benchmark turns a real legal work product into atomic expert checks; every check is judged independently.

Original task → Matter packet → Deliverables → Expert rubric → Pass / Fail → Final score

Original Task

Key point: the model is not answering a question; it is producing a legal work product.

Click a material name to preview it on this page; open "original" only when you need the source Office file.

Rubric & Eval

Final score: 0.0

Passed: 47 / 50

Failed: 3

Rule: all checks must pass

Sample verdicts are for learning only. Pass rate is diagnostic; final score is 1.0 only when every checkpoint passes.
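The difference between the diagnostic pass rate and the all-pass final score can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual scoring code: the function names `pass_rate` and `final_score` are assumptions, and the verdict list simply reproduces the 47 passed / 3 failed counts shown above.

```python
from typing import Iterable


def pass_rate(verdicts: Iterable[bool]) -> float:
    """Diagnostic only: fraction of rubric checks that passed."""
    results = list(verdicts)
    if not results:
        raise ValueError("at least one verdict is required")
    return sum(results) / len(results)


def final_score(verdicts: Iterable[bool]) -> float:
    """All-pass rule: 1.0 only when every check passed, otherwise 0.0."""
    results = list(verdicts)
    if not results:
        raise ValueError("at least one verdict is required")
    return 1.0 if all(results) else 0.0


# Hypothetical verdict list matching the sample counts: 47 passed, 3 failed.
sample_verdicts = [True] * 47 + [False] * 3

print(f"pass rate:   {pass_rate(sample_verdicts):.2f}")
print(f"final score: {final_score(sample_verdicts):.1f}")
```

Under these assumptions, a 94% pass rate still yields a final score of 0.0, which is why the pass rate is only useful for diagnosing where a deliverable fell short.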

ID	Category	Issue	Deliverable	Pass condition	Fail condition	Verdict	Judge note

The MVP uses light linking only: deliverable names can filter the table. A full source-document cross-index can wait.
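As a rough sketch of that light linking, the rubric table can be modeled as a list of records and filtered by deliverable name. The field names mirror the table columns above; the `RubricItem` class, the `filter_by_deliverable` helper, and every row value are hypothetical placeholders, not data from the actual task.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RubricItem:
    """One rubric row; fields follow the table columns."""
    id: str
    category: str
    issue: str
    deliverable: str
    pass_condition: str
    fail_condition: str
    verdict: str  # "pass" or "fail"
    judge_note: str = ""


def filter_by_deliverable(items: List[RubricItem], deliverable: str) -> List[RubricItem]:
    """Return the rows whose deliverable name matches, ignoring case and outer spaces."""
    wanted = deliverable.strip().casefold()
    return [item for item in items if item.deliverable.strip().casefold() == wanted]


# Placeholder rows for illustration only; none of these values come from the real rubric.
ROWS = [
    RubricItem("R-01", "category-x", "issue-1", "Deliverable A", "pass-1", "fail-1", "pass"),
    RubricItem("R-02", "category-y", "issue-2", "Deliverable B", "pass-2", "fail-2", "fail"),
    RubricItem("R-03", "category-x", "issue-3", "Deliverable A", "pass-3", "fail-3", "pass"),
]

for row in filter_by_deliverable(ROWS, "deliverable a"):
    print(row.id, row.verdict)
```

Matching on the deliverable name alone keeps the table usable without a source-document index; a cross-index could later be added as an extra field on each row.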