Slovak Categorized News Corpus
This corpus is intended as the first representative sample of the contemporary Slovak language from various domains, designed for easy searching and automated processing.
It contains a selection of news articles, processed by our NLP tools.
The corpus annotation process can be tried out using the online demo. It consists of the following steps:
- Token boundary identification
- Sentence boundary identification
- Morphological analysis
- Named entity recognition
- Named entity transcription
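
The first two steps above can be illustrated with a minimal sketch. It is not the tooling used to build this corpus: the regular expression, the function names `tokenize` and `split_sentences`, and the sample sentence are all assumptions made for illustration. It shows only the general idea of marking token boundaries as character offsets and then closing a sentence at terminal punctuation.

```python
import re

# Hypothetical rule: a token is a run of word characters (Unicode-aware,
# so Slovak diacritics stay inside the word) or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


def tokenize(text):
    """Return (token, start, end) tuples marking token boundaries."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def split_sentences(tokens):
    """Group tokens into sentences, ending a sentence after '.', '!' or '?'."""
    sentences, current = [], []
    for tok, _start, _end in tokens:
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack final punctuation
        sentences.append(current)
    return sentences


text = "Bratislava je hlavné mesto Slovenska. Leží na Dunaji!"
tokens = tokenize(text)
sentences = split_sentences(tokens)
```

A rule this simple splits incorrectly after abbreviations such as "napr.", so a production pipeline would need an abbreviation list or a trained model; the later steps (morphological analysis, named entity recognition and transcription) require language resources and are not sketched here.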
The second part of this effort is an information retrieval evaluation set for the corpus.
To obtain a download link, please send a request to firstname.lastname@example.org.