Prispevki za novejšo zgodovino

Computational Analysis of Slovenian Historical Newspapers (1771–1914)

Linguistic, Thematic, and Nation-building Insights

Co-author(s): Jure Gašparič (editor-in-chief), Mojca Šorn (editor), Andreja Jezernik (copy editor), Cody J. Inglis (copy editor), Studio S.U.R (copy editing, translation)
Year: 2025
Publisher(s): Inštitut za novejšo zgodovino, Ljubljana
Language(s): Slovenian, English
Type(s) of material: text
Identifier: https://doi.org/10.51663/pnz.65.3.02
Rights: CC license

This work by Ajda Pretnar Žagar is licensed under Creative Commons Attribution-ShareAlike 4.0 International

Files (1)
Name: PNZ_03_2025.pdf
Size: 12.31 MB
Description

This paper presents a computational linguistic analysis of sPeriodika, a historical corpus of Slovenian periodicals published between 1771 and 1914. Using keyword analysis and diachronic analysis, we explore the linguistic, thematic, and historical dimensions of ten prominent newspapers in the corpus. Our findings reveal the centrality of these newspapers in shaping Slovenian nation-building during the post-1848 period, while also highlighting the diverse thematic orientations of individual periodicals, including agriculture, pedagogy, literature, and advertising. Moreover, the study examines the challenges posed by low-quality Optical Character Recognition (OCR) in historical text digitisation and its implications for linguistic and content analysis. By combining computational methods with historical inquiry, this research provides insights into the evolution of the Slovenian language, the media’s role in nation-building, and the potential for improving OCR-based textual resources.

Metadata (13)
  • identifier: https://hdl.handle.net/11686/71601
    • title
      • Računalniška analiza slovenskih zgodovinskih časopisov (1771–1914)
      • Jezikovni, tematski in državotvorni uvidi
      • Computational Analysis of Slovenian Historical Newspapers (1771–1914)
      • Linguistic, Thematic, and Nation-building Insights
    • creator
      • Ajda Pretnar Žagar
    • contributor
      • Jure Gašparič (editor-in-chief)
      • Mojca Šorn (editor)
      • Andreja Jezernik (copy editor)
      • Cody J. Inglis (copy editor)
      • Studio S.U.R (copy editing, translation)
    • subject
      • corpus linguistics
      • Slovenian newspapers
      • OCR errors
      • keyword analysis
      • historical periodicals
    • description
      • This paper presents a computational linguistic analysis of sPeriodika, a historical corpus of Slovenian periodicals published between 1771 and 1914. Using keyword analysis and diachronic analysis, we examined the linguistic, thematic, and historical dimensions of the ten most prominent newspapers in the corpus. The findings reveal the central role of these newspapers in shaping the Slovenian national awakening in the period after 1848, while also highlighting the diverse thematic orientations of individual periodicals, such as agriculture, pedagogy, literature, and advertising. In addition, the study addresses the challenges posed by poor-quality optical character recognition (OCR) in the digitisation of historical texts, and their consequences for linguistic and content analysis. By combining computational methods with historical research, the study offers insight into the development of the Slovenian language, the role of the media in shaping national identity, and the possibilities for improving OCR-based textual resources.
    • publisher
      • Inštitut za novejšo zgodovino
    • date
      • 2025
    • type
      • text
    • identifier
      • identifier: https://doi.org/10.51663/pnz.65.3.02
    • language
      • Slovenian
      • English
    • isPartOf
    • rights
      • license: ccBySa