Collections

This file is in the following collections:

MuLD: The Multitask Long Document Benchmark [dataset]

OpenSubtitles - train [dataset] Open Access

MuLD (Multitask Long Document Benchmark) is a set of 6 NLP tasks in which every input consists of at least 10,000 words. The benchmark covers a wide variety of task types, including translation, summarization, question answering, and classification. Output lengths range from a single-word classification label up to an output longer than the input text.

Descriptions

Resource type
Dataset
Contributors
Data collector: Hudson, G Thomas 1
1 Durham University
Keyword
nlp
multitask
long document
Identifier
ark:/32150/r2x920fw88t
Rights
MIT Licence (MIT)

Publisher
Durham University
Date Created

File Details

Depositor
G.T. Hudson
Date Uploaded
Date Modified
3 May 2022, 13:05:24
Audit Status
Audits have not yet been run on this file.
Related Files
hotpot_annotated_train.json
narrativeqa_train.json.bz2
vlsp_test.json.bz2
style_change_validation.json.bz2
hotpot_annotated_valid.json
narrativeqa_test.json.bz2
style_change_train.json.bz2
narrativeqa_validation.json.bz2
character_id_test.json.bz2
style_change_test.json.bz2
character_id_validation.json.bz2
opensubtitles_test.json.bz2
character_id_train.json.bz2
Characterization
File format: x-bzip2 (bzip2 compressed data, block size = 900k, BZ2, Bzip2)
Mime type: application/x-bzip2
File size: 423341613 bytes
Last modified: 2022-04-26 16:12:39+01:00
Filename: opensubtitles_train.json.bz2
Original checksum: 5ffb30163a8ba21757b837a6a85e19a3
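The characterization above gives enough information to check a downloaded copy before using it. The following is a minimal Python sketch, not part of the repository record. It assumes that the 32-hex-digit checksum is an MD5 digest, that the file size is given in bytes, and that the decompressed JSON is UTF-8 text. The function names `md5_of_file`, `verify_download`, and `peek` are hypothetical helpers introduced here for illustration.

```python
import bz2
import hashlib
from pathlib import Path

# Values copied from the record above.
FILENAME = "opensubtitles_train.json.bz2"
EXPECTED_SIZE = 423341613  # assumption: the listed size is in bytes
EXPECTED_MD5 = "5ffb30163a8ba21757b837a6a85e19a3"  # assumption: MD5 (32 hex digits)


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so the whole archive never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True when both the size and the digest match the expected values."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


def peek(path, n_chars=200):
    """Decompress only the start of the archive and return it as text.

    Assumption: the payload is UTF-8. The layout of the JSON inside the
    archive is not described in the record, so this does not try to parse it.
    """
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return handle.read(n_chars)


if __name__ == "__main__":
    if Path(FILENAME).exists():
        print("checksum ok:", verify_download(FILENAME))
        print(peek(FILENAME))
```

Reading only the first few hundred characters with `peek` is a cheap way to inspect the structure of the file before deciding how to parse it, since the full archive is several hundred megabytes even when compressed.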