Prispevki za novejšo zgodovino

This work by Luka Terčon, Kaja Dobrovoljc, Nikola Ljubešić is licensed under Creative Commons Attribution-ShareAlike 4.0 International
We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of South Slavic languages that is based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza over Stanza and give a detailed description of the model training process for the latest release of the pipeline, version 2.2. We also report the pipeline’s performance scores for different languages and language varieties. CLASSLA-Stanza exhibits consistently high performance across all supported languages and outperforms its parent pipeline Stanza on all supported tasks. Finally, we present the pipeline’s new functionality for efficient processing of web data and describe the efficiency of the pipeline in annotating written transcripts of spoken data.