Quantify missing UK court decisions relative to the Cambridge Law Corpus

Determine how many UK court decisions are not included in the Cambridge Law Corpus, given that UK courts do not report every decision they make and that online publication is incomplete. This count is needed to assess the corpus’s completeness and representativeness.

Background

The Cambridge Law Corpus (CLC) comprises 258,146 court cases from the United Kingdom, focused on England and Wales and combined UK jurisdictions. The dataset includes publicly available judgments and excludes certain courts (e.g., those of Scotland and Northern Ireland) as well as European judgments.

The authors state that UK courts do not report all decisions they make, implying that some judgments are not publicly available and thus cannot be included. As a result, the corpus cannot claim to be comprehensive or representative, and the number of missing court decisions remains unknown. Establishing this number would help quantify the dataset’s coverage and inform its use in research.
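To make "coverage" concrete, the following is a minimal sketch of how the missing count would translate into a coverage figure once it is known. The function name `coverage_fraction` and the constant `CLC_SIZE` are illustrative names introduced here, not anything defined by the authors, and the missing count is an input because its true value is unknown.

```python
# Hypothetical helper: not part of the Cambridge Law Corpus or its tooling.
CLC_SIZE = 258_146  # number of cases in the corpus, as stated above


def coverage_fraction(included: int, missing: int) -> float:
    """Return the share of all decisions that are in the corpus.

    `missing` is the number of decisions absent from the corpus. Its real
    value is unknown, so any number passed here is an assumption.
    """
    if included < 0 or missing < 0:
        raise ValueError("counts must be non-negative")
    total = included + missing
    if total == 0:
        raise ValueError("total number of decisions must be positive")
    return included / total
```

Calling `coverage_fraction(CLC_SIZE, m)` for a range of candidate values `m` shows how sensitive any completeness claim is to the unknown number of unreported decisions.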

References

As courts in the UK do not report all decisions they make, it is currently not possible to know how many other court decisions there are that are not in our dataset.

The Cambridge Law Corpus: A Dataset for Legal AI Research (2309.12269 - Östling et al., 2023), in Composition, answer to “Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?”