The Longitudinal lEarner cOrpus in Italiano, Deutsch & English (LEONIDE)

LEONIDE brings together the written language productions of the students who participated in the study of the project SMS in a linguistically annotated and searchable corpus. By making the corpus available to the research community, it allows reference to the students' texts and further research on the data. Due to the design of the study, the corpus makes it possible to observe the students' written competences in three different languages (Italian, German and English) over all three years of lower secondary school education. It can thus be used for longitudinal research as well as for research on the students' multilingual competences, while still supporting more traditional forms of learner corpus research that investigate only L2 language productions. The corpus furthermore combines two text genres: a picture story retelling task and an opinion text on different aspects of the pupils' lives and public discourse. Each student produced one text per text genre and language in every year. The opinion texts were written with complementary tasks for German learners of Italian and Italian learners of German, and the same task was used in year 1 and year 3 to allow targeted comparisons.

Corpus details: The corpus has an overall size of approx. 240 000 tokens spread over 2 510 texts with an average text size of 94 tokens (min. 1, max. 517 tokens). The corpus contains texts from 163 students, 81 from schools with Italian as the main language of instruction and 82 from schools with German as the main language of instruction, representing 4 school classes for each of the two major school systems of the Province of South Tyrol/Alto Adige. Complete text sets, where the participant handed in all texts for every language, every text genre, and every year (18 in total), are available for 94 students. Subdivided by language, the corpus contains 844 Italian, 833 German and 833 English texts. A set of person-related metadata moreover provides information about e.g. age, gender, first language(s), school and possible special needs of the students.
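The figures above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the variable names are hypothetical and not part of the corpus or its tooling, and the numbers are copied from the description above.

```python
from itertools import product

# Study design as described above: one text per language, genre and year.
LANGUAGES = ("Italian", "German", "English")
GENRES = ("picture story", "opinion text")
YEARS = (1, 2, 3)

# Every combination a student with a complete set would have handed in.
design_cells = list(product(LANGUAGES, GENRES, YEARS))
texts_per_complete_student = len(design_cells)

# Text counts per language, as reported in the corpus details.
texts_by_language = {"Italian": 844, "German": 833, "English": 833}
total_texts = sum(texts_by_language.values())

# Students per main language of instruction, as reported.
students_by_school_language = {"Italian": 81, "German": 82}
total_students = sum(students_by_school_language.values())

print(f"Texts in a complete set: {texts_per_complete_student}")
print(f"Total texts: {total_texts}")
print(f"Total students: {total_students}")
```

The per-language counts add up to the reported 2 510 texts, and the 3 x 2 x 3 design yields the 18 texts of a complete set.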

Usage: As the corpus documents the development of plurilingual competences of individual learners, it allows for contrastive longitudinal research on the development of young learners' writing skills in different languages while taking person-related metadata into account. Moreover, the corpus is a valuable resource for language teachers who want to create and improve teaching material and language courses, as the large amount of authentic longitudinal data reflects the sequencing of language skills over three consecutive years in three languages.

Availability: The corpus will be available for corpus queries via an ANNIS search interface and as a download for academic purposes (ACA-BY-NC-NORED 1.0) at the Eurac Research CLARIN Centre by the end of 2020.

For further information and corpus access, please refer to our Learner Corpus Portal PORTA.

