Caselaw Access Project – Overview

Project Summary

Problem: Our common law is not freely accessible online. This lack of access to the law impairs justice and equality and stifles innovation.

Goal: Transform the official print versions of all historical U.S. court decisions into digital files made freely accessible online. Encourage and assist federal and state courts in making all prospective court decisions freely accessible online.

Scope:

  • All official reported decisions of the federal courts
  • All official reported decisions of the courts of every state
  • All territorial and pre-statehood decisions in the Harvard Law School Library (HLSL) collection
  • Estimated 43,000 volumes and 40MM pages

Process:

  1. Get the books from HLSL or Harvard Depository
  2. Scan the books using a high-speed scanner (~450K pages per week)
  3. Preserve the books in long-term underground storage
  4. Convert the scanned images into machine-readable text files
  5. Extract the individual cases into individual text files
  6. Redact headnotes and other editorial content
  7. Make the redacted images and text files freely accessible online

Projected Timeline:

  • 2015: Ramp up digitization production
  • 2016 (projected): digitize 25MM-30MM pp → publish CA, NY, MA, IL, TX, Federal
  • 2017 (projected): digitize remaining 10MM-15MM pp → publish everything
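
As a rough cross-check of the timeline against the stated scanning rate, the following minimal sketch computes how many weeks of scanning each page count implies. It assumes a constant rate of 450K pages per week with no downtime, which the source does not state; the function name `weeks_to_scan` is hypothetical and used only for illustration.

```python
import math

# Figures taken from the Scope and Process lists above.
TOTAL_PAGES = 40_000_000      # "40MM pages"
PAGES_PER_WEEK = 450_000      # "~450K pages per week"


def weeks_to_scan(pages: int, pages_per_week: int = PAGES_PER_WEEK) -> int:
    """Whole weeks needed to scan `pages`, assuming a constant weekly rate."""
    return math.ceil(pages / pages_per_week)


if __name__ == "__main__":
    # Full collection, then the low and high ends of the 2016 and 2017 ranges.
    for label, pages in [
        ("all pages", TOTAL_PAGES),
        ("2016 low", 25_000_000),
        ("2016 high", 30_000_000),
        ("2017 low", 10_000_000),
        ("2017 high", 15_000_000),
    ]:
        print(f"{label}: {weeks_to_scan(pages)} weeks")
```

Under that assumption the full 40MM pages come to about 89 weeks of scanning. The 25MM-30MM figure listed for 2016 works out to more than 52 weeks at this rate, so it may include pages scanned during the 2015 ramp-up or rely on a higher rate; the source does not say which.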

Harvard – Ravel Agreement – Key Terms

Funding:

  • Ravel pays total costs of digitization

Digitization Responsibilities:

  • Harvard responsible for scanning books
  • Ravel (via vendor) responsible for converting scanned images to text files

Data Ownership and License:

  • Harvard owns the resulting data
  • Ravel gets a temporary exclusive license to commercially exploit redacted files
    • Maximum duration of exclusive commercial license is 8 years
    • Early expiration of exclusive commercial license if:
      • Ravel does not meet its obligations
      • A given jurisdiction publishes its future court decisions online in an acceptable format (Illinois and Arkansas have already satisfied this condition)

Data Access Rights and Obligations:

  • Harvard
    • Harvard may provide anyone with public access to the redacted files, subject to a bulk access limitation
    • Harvard may provide Harvard community members and outside research scholars with free bulk access to the entire dataset, provided they accept contractual prohibitions on redistribution
  • Ravel
    • Ravel will provide ongoing free public access to the redacted files, subject to a bulk access limitation
    • Ravel will provide developers ongoing API access to the redacted files
      • Free access for non-profit developers
      • Paid access for for-profit developers

Other Notable Terms:

  • Harvard has a 4% equity interest in Ravel, with any proceeds going to a sustainability fund to support the project.
  • If Ravel stops offering public access, Harvard will be able to provide it instead, using the necessary Ravel software.

 

4 comments

  1. Andrew Hodel says:

    I’ll help, email me.

    I could help as a developer with steps 3-7 of the process.

  2. […] Sourced through Scoop.it from: etseq.law.harvard.edu […]

  3. Bob Gault says:

    I too have some OCR conversion software – can help with conversion/redaction/transfer/etc.

  4. Samuel A. Abady says:

    You list “All territorial and pre-statehood decisions.” Although not stated, I assume this is meant to include all D.C. decisions, and decisions of courts in Puerto Rico, as neither is a territory like the Virgin Islands and Guam, but both are state-like jurisdictions within the U.S.

    Here’s a suggestion of special interest to historians. During the Civil War, the Confederacy operated courts which issued opinions on everything from contract disputes to desertion by CSA soldiers. The Project should include all published decisions issued by Confederate courts.