From PEC to PEC24: a new reference corpus for Italian
Spina S; Zanda F; Fioravanti I
2025-01-01
Abstract
This article introduces the PEC24, an extension of the Perugia corpus, as a new reference corpus for Italian. The update mainly concerns the size of the corpus, which now consists of approximately 47 million tokens, with the addition of over 100,000 texts. The PEC24 maintains the same structure as its predecessor: it is divided into 10 sections, representing ten different written and spoken genres. After reviewing the spoken, written, and web corpora available for Italian, the article describes the internal composition of each section of the corpus and explains how the corpus was annotated. Because the PEC24 is available and searchable online, the article also illustrates how it can be queried. The PEC24 represents a significant advancement in the panorama of Italian corpora, offering a representative and more comprehensive resource for linguistic research and corpus-based studies.

File | Size | Format |
---|---|---|
Spina_Zanda_Fioravanti+From+PEC+to+PEC24+a+new+reference+corpus+for+Italian_updatedef..pdf (open access; type: publisher's version (PDF); license: Creative Commons) | 675.45 kB | Adobe PDF |