Prispevki za novejšo zgodovino

CLASSLA-Stanza

The Next Step for Linguistic Processing of South Slavic Languages

CLASSLA-Stanza

Naslednji korak za jezikovno procesiranje južnoslovanskih jezikov

Author(s): Luka Terčon, Kaja Dobrovoljc, Nikola Ljubešić

Co-author(s): Jure Gašparič (editor-in-chief), Mojca Šorn (editor), Andreja Jezernik (language editor), Cody J. Inglis (language editor), Studio S.U.R (language editor, translator)

Year: 2025

Publisher(s): Inštitut za novejšo zgodovino, Ljubljana

Source(s): Prispevki za novejšo zgodovino, 2025, no. 3

Language(s): Slovenian, English

Type(s) of material: text

Keywords: južnoslovanski jeziki, avtomatsko procesiranje jezika, označevalni cevovod, jezikovno označevanje, South Slavic languages, automatic linguistic processing, annotation pipeline, linguistic annotation

Identifier: https://doi.org/10.51663/pnz.65.3.05

Rights:

This work by Luka Terčon, Kaja Dobrovoljc, Nikola Ljubešić is licensed under Creative Commons Attribution-ShareAlike 4.0 International

Files (1)

Name: PNZ_03_2025.pdf

Size: 12.31 MB

Format: PDF

Permanent link: https://hdl.handle.net/11686/file61994


Description

We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of South Slavic languages, based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza with respect to Stanza and give a detailed description of the model training process for the latest 2.2 release of the pipeline. We also report performance scores produced by the pipeline for different languages and language varieties. CLASSLA-Stanza exhibits consistently high performance across all supported languages and outperforms its parent pipeline Stanza on all supported tasks. We also present the pipeline’s new functionality that enables efficient processing of web data and describe the efficiency of the pipeline for annotating written transcripts of spoken data.

Metadata (13)

identifier
- https://hdl.handle.net/11686/71604
title
- CLASSLA-Stanza
- The Next Step for Linguistic Processing of South Slavic Languages
- CLASSLA-Stanza
- Naslednji korak za jezikovno procesiranje južnoslovanskih jezikov
creator
- Luka Terčon
- Kaja Dobrovoljc
- Nikola Ljubešić
contributor
- Jure Gašparič (editor-in-chief)
- Mojca Šorn (editor)
- Andreja Jezernik (language editor)
- Cody J. Inglis (language editor)
- Studio S.U.R (language editor, translator)
subject
- južnoslovanski jeziki
- avtomatsko procesiranje jezika
- označevalni cevovod
- jezikovno označevanje
- South Slavic languages
- automatic linguistic processing
- annotation pipeline
- linguistic annotation
description
- We present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of South Slavic languages, which is based on the Stanza natural language processing pipeline. We describe the main improvements in CLASSLA-Stanza with respect to Stanza and give a detailed description of the model training process for the latest 2.2 release of the pipeline. We also report performance scores produced by the pipeline for different languages and language varieties. CLASSLA-Stanza exhibits consistently high performance across all the supported languages and outperforms its parent pipeline Stanza at all the supported tasks. We also present the pipeline’s new functionality that enables efficient processing of web data and describe the efficiency of the pipeline for annotating written transcripts of spoken data.
- In this article, we present CLASSLA-Stanza, a pipeline for automatic linguistic annotation of South Slavic languages that is based on the Stanza natural language processing pipeline. We describe all the main improvements that CLASSLA-Stanza brings compared with Stanza and give a detailed description of the model training process for version 2.2, the latest version of the tool. We also report the pipeline’s results for different languages and language varieties. CLASSLA-Stanza achieves consistently high results for all supported languages and surpasses the results of its source pipeline Stanza for all supported languages. We also present a new feature of the pipeline that enables efficient processing of web texts, and describe the pipeline’s efficiency in annotating speech transcripts.
publisher
- Inštitut za novejšo zgodovino
date
- 2025
type
- text
identifier
- identifier: https://doi.org/10.51663/pnz.65.3.05
language
- Slovenian
- English
isPartOf
- https://hdl.handle.net/11686/71598
rights
- license: ccBySa
