Alma Mater Studiorum Università di Bologna
          
   
CORIS/CODIS

CORIS/CODIS, a corpus of written Italian, is available online for research purposes. The project, designed and coordinated by R. Rossini Favretti, was started in 1998 with the aim of creating a representative and sizeable general reference corpus of written Italian that would be easily accessible and user-friendly. CORIS contains 150 million words and has been updated every three years by means of a built-in monitor corpus. It consists of a collection of authentic and commonly occurring texts in electronic format, chosen for their representativeness of modern Italian.

The project is the result of research carried out at the University of Bologna. It was made possible by technological development, the opportunity to build on previous experience, and the long period of preparatory study that preceded the planning and construction phases.

It is aimed at a broad spectrum of potential users, from Italian language scholars to Italian and foreign students engaged in linguistic analysis based on authentic data and, from a wider perspective, all those interested in intra- and/or interlinguistic analysis.

Design and implementation of a CORpus di Italiano Scritto

References

R. Rossini, F. Tamburini, A. Zaninello (2011), Exploiting corpus evidence for automatic sense induction, Actas del III Congreso de la Asociación Española de Lingüística de Corpus, Universitat Politècnica de València.

Grandi N., Montermini F., Tamburini F. (2011), "Annotating large corpora for studying Italian derivational morphology". Lingue e Linguaggio, X.2, 227-244.

R. Rossini Favretti (2009), "Corpus data and frame semantics", in Abstracts of the International conference of the Linguistics Society of Belgium on Framing: from grammar to application, Antwerp, p. 17. (Presentation)

R. Rossini Favretti, F. Tamburini (2009), "Exploring register variation through corpus evidence", in Abstracts of DGfS 2009 Workshop on Corpus, Colligation, Register Variation, Osnabrück, p. 155. (Presentation)

F. Tamburini (2009). "PoS-tagging Italian texts with CORISTagger". In Proc. of EVALITA 2009, AI*IA Workshop on Evaluation of NLP and Speech Tools for Italian, Reggio Emilia, December 2009.

R. Rossini Favretti (2008), "Grounding frame elements identification in corpus collocational patterns", in Proceedings of the 41st Meeting of the British Association of Applied Linguistics, Swansea, pp. 91-92. (Presentation)

F. Tamburini, C. Seidenari, A. Bolognesi, R. Bernardi (2008). "Italian Lexical-Classes Definition Using Automatic Methods." In Rossini Favretti R. (ed.), Frames, Corpora and Knowledge Representation, Bologna: Bononia University Press, 95-120.

R. Bernardi, A. Bolognesi, C. Seidenari, F. Tamburini (2008). "Learning an Italian Categorial Grammar." In Rossini Favretti R. (ed.), Frames, Corpora and Knowledge Representation, Bologna: Bononia University Press, 185-200.

R. Rossini Favretti (2008). "Text, collocations and frames", in Rossini Favretti R. (ed.), Frames, Corpora and Knowledge Representation, Bologna, BUP, pp. 79-94.

R. Rossini Favretti (ed.) (2008), Frames, Corpora and Knowledge Representation, Bologna, BUP, 301 pp.

R. Rossini Favretti, F. Tamburini, D. Proietti (2007), "Strumenti di esplorazione: i corpora", Tradurre per l'Europa, Accademia della Crusca, Florence, Italy.

R. Rossini Favretti (2007), "Multilinguismo e comunicazione in rete", in G. Brandi (ed.), Annali del Collegio Superiore. Anno Accademico 2007/08, Bologna, BUP, pp. 175-185.

F. Tamburini (2007). "CORISTagger: a high-performance PoS tagger for Italian." Intelligenza Artificiale, IV(2), 14-15.

Onelli C., Proietti D., Seidenari C., Tamburini F. (2006). "The DiaCORIS project: a diachronic corpus of written Italian". In Proc. 5th International Conference on Language Resources and Evaluation - LREC 2006, Genova, 1212-1215.

Bernardi R., Bolognesi A., Seidenari C., Tamburini F. (2006). "POS tagset design for Italian". In Proc. 5th International Conference on Language Resources and Evaluation - LREC 2006, Genova, 1396-1401.

Bernardi R., Bolognesi A., Seidenari C., Tamburini F. (2005). "Automatic induction of a POS tagset for Italian". In Proc. Australasian Language Technology Workshop 2005, Sydney, 176-183.

Tamburini F. (2004). "Building Distributed Language Resources by Grid Computing". In Proc. 4th International Conference on Language Resources and Evaluation - LREC 2004, Lisbon, 1217-1220.

Bernardi R., Bolognesi A., Tamburini F., and Moortgat M. (2004). "Categorial Type Logic meets Dependency Grammar to annotate an Italian corpus". In Proc. Recent Advances in Dependency Grammars Workshop - COLING 2004, Geneva, 57-64.

Tamburini F. (2002). "A dynamic model for reference corpora structure definition". In Proc. Third International Conference on Language Resources and Evaluation - LREC2002, Las Palmas, Canary Islands, Spain, 1847-1850.

R. Rossini Favretti, F. Tamburini, C. De Santis (2002). "A corpus of written Italian: a defined and a dynamic model." In A. Wilson, P. Rayson, T. McEnery (eds.), A Rainbow of Corpora: Corpus Linguistics and the Languages of the World, Lincom-Europa, Munich.

F. Tamburini, C. De Santis, E. Zamuner (2002). "Identifying phrasal connectives in Italian using quantitative methods." In S. Nuccorini (ed.), Phrases and Phraseology - Data and Descriptions, Bern: Peter Lang.

R. Rossini Favretti (2002), "Corpus linguistics and Italian studies", in S. Nuccorini (ed.), Phrases and Phraseology - Data and Descriptions, Bern: Peter Lang, pp. 27-43.

R. Rossini Favretti (2001), "La linguistica dei corpora in Europa: prospettive di analisi", in Lingua e Stile, XXVI, 2, pp. 367-81.

R. Rossini Favretti (2001), "Interpretation and Representation in the Discourse of Economics", in P.L. Porta, R. Scazzieri, A. Skinner (eds.), Knowledge, Institutions and the Division of Labour, Cheltenham: Edward Elgar, pp. 65-74.

R. Rossini Favretti (2000). "Progettazione e costruzione di un corpus di italiano scritto: CORIS/CODIS", in R. Rossini Favretti (ed.), Linguistica e informatica. Multimedialità, corpora e percorsi di apprendimento, Bulzoni, Roma, pp. 39-56.

F. Tamburini (2000). "Annotazione grammaticale e lemmatizzazione di corpora in italiano", in R. Rossini Favretti (ed.), Linguistica e informatica. Multimedialità, corpora e percorsi di apprendimento, Bulzoni, Roma, pp. 57-73.