Sample task
Overview
Learn Legal Benchmark Eval with a single task
Summary
The current local learning sample keeps only one representative task; the full benchmark repository is not required.
Workflow
The benchmark turns a real legal work product into atomic expert checks; every check is judged independently.
Original task → Matter packet → Deliverables → Expert rubric → Pass / Fail → Final score
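The idea of atomic, independently judged checks can be sketched as follows. This is a minimal illustration, not the benchmark's actual implementation: the `Check` record, the `judge_all` helper, and the toy `keyword_judge` are all hypothetical names, and a real judge would be an expert or a model rather than a substring test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Check:
    """One atomic expert check (hypothetical shape, for illustration)."""
    check_id: str
    deliverable: str      # name of the deliverable the check applies to
    pass_condition: str   # what must be present for the check to pass


def judge_all(
    checks: List[Check],
    deliverables: Dict[str, str],
    judge: Callable[[Check, str], bool],
) -> Dict[str, bool]:
    """Judge every check on its own; no verdict depends on another verdict."""
    verdicts = {}
    for check in checks:
        # A missing deliverable is treated as empty text (assumption).
        text = deliverables.get(check.deliverable, "")
        verdicts[check.check_id] = judge(check, text)
    return verdicts


def keyword_judge(check: Check, text: str) -> bool:
    """Toy stand-in for an expert or model judge: case-insensitive substring match."""
    return check.pass_condition.lower() in text.lower()
```

Because each call to `judge` receives only one check and the text of its deliverable, the checks can be run in any order, or in parallel, without changing the verdicts.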
Harvey Eval Study / Original Task
Review Data Room for Acquisition Red Flags
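Task instruction (original text)
Key point: the model is not answering a question; it is producing a legal work product.
Deliverables
Matter packet: 13 files
Click a file name to preview it on this page; open the original only when you need the source Office file.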
Rubric & Eval
50 expert checkpoints + evaluation result in one table
Sample verdicts are for learning only. Pass rate is diagnostic; final score is 1.0 only when every checkpoint passes.
| ID | Category | Issue | Deliverable | Pass condition | Fail condition | Verdict | Judge note |
|---|---|---|---|---|---|---|---|
MVP uses light linking only: deliverable names can filter the table. Full source-document cross-index can wait.
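The difference between the diagnostic pass rate and the all-or-nothing final score, together with the light deliverable filter, can be sketched as below. This is a hedged illustration under stated assumptions: the function names are hypothetical, rows are modelled as dictionaries keyed by the table's column names, filtering is by exact deliverable name, and the final score is assumed to be 0.0 whenever any checkpoint fails (the page only states that it is 1.0 when every checkpoint passes).

```python
from typing import Dict, List


def pass_rate(verdicts: List[bool]) -> float:
    """Diagnostic only: fraction of checkpoints that passed."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


def final_score(verdicts: List[bool]) -> float:
    """1.0 only when every checkpoint passes; 0.0 otherwise (assumption)."""
    return 1.0 if verdicts and all(verdicts) else 0.0


def filter_by_deliverable(rows: List[Dict[str, str]], name: str) -> List[Dict[str, str]]:
    """Light linking: keep only the rows whose Deliverable column equals `name`."""
    return [row for row in rows if row["Deliverable"] == name]
```

For example, with 50 checkpoints of which 49 pass, `pass_rate` reports 0.98 while `final_score` is 0.0 under the assumption above, which is why the pass rate is useful for diagnosis but is not the score.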