Slovak Categorized News Corpus

This corpus is a first attempt at creating a representative sample of contemporary Slovak from various domains, designed for easy searching and automated processing.

It contains a selection of news articles processed with our NLP tools.

The corpus annotation process can be tried out in the online demo.

The second part of the effort is an information retrieval evaluation set built for the corpus.


Please send a request to obtain a download link.


D. Hládek, J. Staš, J. Juhár: Slovak Categorized News Corpus, LREC 2014 (PDF, poster)