UK-StatuteCorpus v1.0 (2020–2024): passage-level corpus, Eval100 evaluation dataset, and distillation set
Open Access

UK-StatuteCorpus v1.0 is a passage-level corpus of UK statutory instruments from 2020–2024 (124,796 passages). The release also includes Eval100, a 100-query evaluation dataset with three graded relevant passages per query (rel=3/2/1) and rationales, and a distillation set of 5,221 query–passage examples with Voyage rerank-2.5 teacher scores for training neural rerankers. Contains public sector information licensed under the Open Government Licence v3.0 (legislation.gov.uk, The National Archives).
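Graded labels of this kind (rel=3/2/1) are commonly scored with nDCG. The following is a minimal sketch of that metric, not the authors' evaluation code: the record does not describe the file layout or name a metric, so the passage IDs, the in-memory `labels` dictionary, the linear gain, and the cutoff `k` are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2 rank discount (rank starts at 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_ids: List[str], labels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k using linear gain; unlabelled passages count as relevance 0."""
    gains = [labels.get(pid, 0) for pid in ranked_ids[:k]]
    ideal = sorted(labels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical query with three graded passages, mirroring the 3/2/1 scheme.
labels = {"p_a": 3, "p_b": 2, "p_c": 1}

# A ranking that places the labelled passages in ideal order.
perfect = ["p_a", "p_b", "p_c", "p_x"]
# A ranking that buries the most relevant passage.
swapped = ["p_c", "p_x", "p_b", "p_a"]

score_perfect = ndcg_at_k(perfect, labels)
score_swapped = ndcg_at_k(swapped, labels)
```

With only three labelled passages per query, any passage outside the labelled set is treated as non-relevant here. That is a simplifying assumption and may understate a system that retrieves relevant but unannotated passages.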

Descriptions

Resource type
Dataset
Contributors
Creator: Alshehri, Amal Saad 1
Contact person: Alshehri, Amal Saad 1
Creator: Atapour-Abarghouei, Amir 1
Creator: Eken, Can 1
Creator: Bencomo, Nelly 1
1 Durham University, UK
Research methods
We segmented statutory instruments into passages and associated each passage with structured metadata (e.g., year, legislation type, source URL). Eval100 was created by constructing 100 statutory questions and annotating three passages per query with graded relevance labels (3/2/1) and short rationales. A teacher reranker (Voyage rerank-2.5) was used to score query–passage pairs to form a 5,221-example distillation dataset for training smaller reranking models.
Other description
Research data created by the four authors is licensed under CC BY 4.0. UK Government data is licensed under the Open Government Licence v3.0 (OGL 3.0).
Keyword
Statutory instruments
Legal information retrieval
Neural re-ranking
Subject
Neural networks (Computer science)--Research
Deep learning (Machine learning)
Identifier
ark:/32150/r14x51hj064
doi:10.15128/r14x51hj064
Rights
Creative Commons Attribution 4.0 International (CC BY)

Publisher
Durham University

File Details

Depositor
A. Alshehri
Date Modified
19 December 2025, 11:12:41
Characterization
File format: zip (ZIP Format)
Mime type: application/zip
File size: 69883189 bytes
Last modified: 2025-12-19 08:56:43+00:00
Filename: uk-statutecorpus_v1.0.zip
Original checksum: 29907ffa565a5e421c851ab2bc661423