Where can I find good data sets for text summarization?

The DUC (Document Understanding Conference) datasets are the de facto standard that the NLP community uses for evaluating summarization systems. Many papers use DUC-2003 as the training set and DUC-2004 as the test set. Although these datasets are old, they are still considered very challenging.

What are the two fields in a Twitter data set?

Twitter data sets are often shared as just two fields, user_id and tweet_id, rather than as full tweet content. To reconstruct the dataset, one queries the API with those two keys.
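As a rough sketch of that reconstruction step, the snippet below reads (user_id, tweet_id) pairs from CSV text and groups the tweet IDs into batches for a lookup call. The `fetch_batch` callable, the CSV header names, and the batch size of 100 are assumptions made for illustration; they are not part of any particular Twitter client library, so substitute whatever lookup function and limits your API access provides.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Tuple


def read_id_pairs(csv_text: str) -> List[Tuple[str, str]]:
    """Parse CSV text with 'user_id' and 'tweet_id' columns into pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["user_id"], row["tweet_id"]) for row in reader]


def batched(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each with at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def hydrate(
    pairs: List[Tuple[str, str]],
    fetch_batch: Callable[[List[str]], Dict[str, dict]],
    batch_size: int = 100,  # assumed per-request limit; check your API
) -> Dict[str, dict]:
    """Rebuild tweets from ID pairs using a caller-supplied lookup function.

    `fetch_batch` is hypothetical: it takes a list of tweet IDs and returns
    a dict mapping each ID it could find to the tweet payload. IDs that are
    no longer available (for example, deleted tweets) are simply absent.
    """
    tweet_ids = [tweet_id for _, tweet_id in pairs]
    tweets: Dict[str, dict] = {}
    for batch in batched(tweet_ids, batch_size):
        tweets.update(fetch_batch(batch))
    return tweets
```

Passing the lookup function in as a parameter keeps the sketch independent of any one API version and makes it easy to try out with a stub before spending real API quota.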

Are there any datasets of the top 10,000 words used in tweets?

Here is a dataset of relative word frequencies for the top 10,000 words in 890 million tweets, broken down by county: https://sites.google.com/site/wordmapperinfo/ There is also a list of Twitter datasets and related resources, released under CC0.

Where can I find the Lerman Twitter 2010 dataset?

Lerman Twitter 2010 Dataset [2.8m] – Contains tweets with URLs that were posted on Twitter during October 2010. In addition to tweets, links of tweeting users were followed, allowing the reconstruction of the follower graph of active (tweeting) users. Twitter_2010 {?} [2m] – Released by Kristina Lerman at USC.

Is it possible to summarise a speech in NLP?

Summarising a speech is more art than science, some might argue. But recent advances in NLP could well test the validity of that argument. In particular, Hugging Face’s (HF) transformers summarisation pipeline has made the task easier, faster and more efficient to execute.
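For a sense of what using that pipeline looks like, here is a minimal sketch. It assumes the `transformers` package and a backend such as PyTorch are installed, and that the default summarisation model can be downloaded on first use. The `summarise` wrapper, its parameter defaults, and the optional `summariser` argument are illustrative choices, not part of the library.

```python
from typing import Callable, Optional


def summarise(
    text: str,
    summariser: Optional[Callable] = None,
    max_length: int = 60,
    min_length: int = 10,
) -> str:
    """Return a short summary of `text`.

    If no summariser is supplied, build the Hugging Face summarisation
    pipeline with its default model (downloaded on first use).
    """
    if summariser is None:
        # Imported lazily so this module loads even without transformers.
        from transformers import pipeline

        summariser = pipeline("summarization")
    # The pipeline returns a list with one dict per input text.
    result = summariser(text, max_length=max_length, min_length=min_length)
    return result[0]["summary_text"]


if __name__ == "__main__":
    speech = (
        "Thank you all for coming today. Over the past year our team has "
        "expanded into three new regions, doubled the number of people we "
        "serve, and launched a training programme for new volunteers. "
        "None of this would have been possible without your support."
    )
    print(summarise(speech, max_length=40, min_length=10))
```

Summarisation models accept only a limited amount of input, so a full-length speech may need to be split into sections and summarised piece by piece.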

When to use standard datasets for natural language processing?

It is helpful to use standard datasets that are well understood and widely used, so that you can compare your results with others and see whether you are making progress. In this post, you will discover a suite of standard datasets for natural language processing tasks that you can use when getting started with deep learning.

Which is a dataset for machine translation?

Machine translation is the task of translating text from one language to another. Below are some good beginner machine translation datasets. Aligned Hansards of the 36th Parliament of Canada: pairs of sentences in English and French. European Parliament Proceedings Parallel Corpus 1996-2011: sentence pairs for a suite of European languages.
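Parallel corpora like these are commonly distributed as two plain-text files in which line N of one file is the translation of line N of the other. The sketch below loads such a pair of files into sentence pairs. The line-aligned layout, the UTF-8 encoding, and the file names in the usage comment are assumptions; check the documentation that ships with the corpus you download.

```python
from typing import List, Tuple


def read_lines(path: str) -> List[str]:
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_sentence_pairs(source_path: str, target_path: str) -> List[Tuple[str, str]]:
    """Pair up line-aligned source and target sentences.

    Raises ValueError if the files have different line counts, since that
    means the alignment cannot be trusted. Pairs where either side is blank
    are skipped.
    """
    source_lines = read_lines(source_path)
    target_lines = read_lines(target_path)
    if len(source_lines) != len(target_lines):
        raise ValueError(
            f"line counts differ: {len(source_lines)} vs {len(target_lines)}"
        )
    pairs = []
    for source, target in zip(source_lines, target_lines):
        source, target = source.strip(), target.strip()
        if source and target:
            pairs.append((source, target))
    return pairs


# Example usage with hypothetical file names:
# pairs = load_sentence_pairs("corpus.en", "corpus.fr")
# print(pairs[0])
```

Checking the line counts before pairing is a cheap guard: a single missing line would otherwise shift every later sentence against the wrong translation.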