Sample task


Static HTML One-task study

Overview

Learn Legal Benchmark Eval through a single task

Summary

This local study sample keeps only one representative task; the full benchmark repository is not required.

13 source documents
50 rubric items
2 deliverables
All-pass scoring

Workflow

The benchmark turns a real legal work product into atomic expert checks; every check is judged independently.

Original task
Matter packet
Deliverables
Expert rubric
Pass / Fail
Final score
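
The "judged independently" step in the workflow above can be sketched as follows. This is a minimal illustration, not the benchmark's actual judge: `judge_all` and `keyword_judge` are hypothetical names, and the keyword match is a toy stand-in for an expert or model-based judge.

```python
from typing import Callable, Dict, List

# A judge takes (deliverable text, one rubric check) and returns pass/fail.
# Hypothetical signature, assumed for illustration only.
Judge = Callable[[str, str], bool]


def judge_all(deliverable_text: str, checks: List[str], judge: Judge) -> Dict[str, bool]:
    """Judge every check on its own; no verdict depends on another verdict."""
    return {check: judge(deliverable_text, check) for check in checks}


def keyword_judge(deliverable_text: str, check: str) -> bool:
    """Toy judge: passes when the check phrase appears in the deliverable."""
    return check.lower() in deliverable_text.lower()


memo = "The memo flags a change-of-control clause in the supply agreement."
results = judge_all(memo, ["change-of-control", "indemnity cap"], keyword_judge)
```

Because each check is atomic, a failure on one item does not alter how any other item is judged; the verdicts are only combined at the final scoring step.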

Harvey Eval Study / Original Task

Review Data Room for Acquisition Red Flags

Task instruction (original text)

Key point: the model is not answering a question; it is producing a legal work product.

Deliverables

    Matter packet: 13 files

    Click a document name to preview it on this page; open the original only when you need the source Office file.


      Rubric & Eval

      50 expert checkpoints + evaluation result in one table

      Final score: 0.0
      Passed: 47 / 50
      Failed: 3
      Rule: all checks must pass

      Sample verdicts are for learning only. Pass rate is diagnostic; final score is 1.0 only when every checkpoint passes.
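
The all-pass rule can be made concrete with a short sketch. The `Verdict` class, the function names, and the check IDs below are assumptions for illustration; only the numbers (50 checks, 47 passed, 3 failed) come from the sample above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Verdict:
    check_id: str  # hypothetical ID format
    passed: bool


def pass_rate(verdicts: List[Verdict]) -> float:
    """Diagnostic only: the fraction of checkpoints that passed."""
    if not verdicts:
        return 0.0
    return sum(v.passed for v in verdicts) / len(verdicts)


def final_score(verdicts: List[Verdict]) -> float:
    """All-pass rule: 1.0 only when every checkpoint passes, otherwise 0.0."""
    return 1.0 if verdicts and all(v.passed for v in verdicts) else 0.0


# Reproduce the sample result: 50 checks, the first 3 fail, the other 47 pass.
verdicts = [Verdict(f"R{i:02d}", passed=(i > 3)) for i in range(1, 51)]
rate = pass_rate(verdicts)
score = final_score(verdicts)
```

With these inputs the pass rate is 47/50 while the final score stays at 0.0, which is why the page reports both numbers separately.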

      ID | Category | Issue | Deliverable | Pass condition | Fail condition | Verdict | Judge note

      The MVP uses light linking only: deliverable names can be used to filter the table. A full cross-index to the source documents is deferred.
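
Filtering the rubric table by deliverable might look like the sketch below. The field names mirror the table columns; `RubricRow`, `filter_by_deliverable`, and every row value (IDs, "Deliverable A", "Deliverable B") are placeholders, since the page does not list the actual rubric rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RubricRow:
    id: str
    category: str
    issue: str
    deliverable: str
    pass_condition: str
    fail_condition: str
    verdict: str
    judge_note: str


def filter_by_deliverable(rows: List[RubricRow], name: str) -> List[RubricRow]:
    """Light linking: keep only the rows that target the named deliverable."""
    return [row for row in rows if row.deliverable == name]


# Placeholder rows, not taken from the real rubric.
rows = [
    RubricRow("C01", "category", "issue", "Deliverable A", "...", "...", "Pass", ""),
    RubricRow("C02", "category", "issue", "Deliverable B", "...", "...", "Fail", ""),
    RubricRow("C03", "category", "issue", "Deliverable A", "...", "...", "Pass", ""),
]
selected = filter_by_deliverable(rows, "Deliverable A")
```

An exact-match filter on the deliverable name is enough for this level of linking; a cross-index to source documents would need an extra field per row.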