Category Archives: Corpus linguistics & applied linguistics research

Plenary: From data literacy to AI literacy: examining engagement with corpora and technology in language education

Prof Pascual Pérez-Paredes will deliver a keynote at the 2nd EUt+ International Conference on Languages (EUt LC 2024): Merging New Trends and Consolidating Good Practices in Languages for Specific Purposes, held at UPCT, Cartagena, Spain, June 26-28, 2024.

Abstract

In this talk, I discuss recent developments in what I have described elsewhere as Broad scope DDL (BsDDL) (Pérez-Paredes, 2024), an alternative approach that places learners’ learning ecology (Pérez-Paredes, 2022b) at the centre of the learning process and in which a variety of language data sources, such as corpora, Gen AI and Large Language Models (LLMs), coexist. This ecology acknowledges the important role of new digital literacies and of the symbolically mediated practices, involving different types of knowledge and skills, that come into play when learners engage with texts in electronically mediated environments (Kern, 2021).

References

Boulton, A. (2021). Research in data-driven learning. In P. Pérez-Paredes & G. Mark (Eds.), Beyond concordance lines: Corpora in language education (pp. 9-34). John Benjamins.

Boulton, A., & Cobb, T. (2017). Corpus use in language learning: A meta‐analysis. Language Learning, 67(2), 348-393.

Boulton, A., & Vyatkina, N. (2021). Thirty years of data-driven learning: Taking stock and charting new directions over time. Language Learning & Technology, 25(3), 66-89.

Boulton, A., & Vyatkina, N. (2023). Expanding Methodological Approaches in DDL Research. TESOL Quarterly.

British Council, The. (2023). Artificial intelligence and English language teaching: Preparing for the future. URL: https://www.teachingenglish.org.uk/publications/case-studies-insights-and-research/artificial-intelligence-and-english-language

Curry, N., Baker, P., & Brookes, G. (2024). Generative AI for corpus approaches to discourse studies: a critical evaluation of ChatGPT. Applied Corpus Linguistics, 4(1).

Kern, R. (2021). Twenty-five years of digital literacies in CALL. Language Learning & Technology, 25(3), 132–150.

Mizumoto, A. (2023). Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use. Applied Corpus Linguistics, 3(3), 100074.

Pérez-Paredes, P. (2010). Corpus Linguistics and Language Education in Perspective: Appropriation and the Possibilities Scenario. In T. Harris & M. Moreno Jaén (Eds.), Corpus Linguistics in Language Teaching (pp. 53-73). Peter Lang.

Pérez-Paredes, P. (2022a). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning, 35(1-2), 36-61.

Pérez-Paredes, P. (2022b). How learners use corpora. In R. R. Jablonkai & E. Csomay (Eds.), The Routledge Handbook of Corpora and English Language Teaching and Learning (pp. 390-405). Routledge.

Pérez-Paredes, P. (2024). Data-driven learning in informal contexts? Embracing Broad Data-driven learning (BDDL) research. In P. Crosthwaite (Ed.), Corpora for Language Learning: Bridging the Research-Practice Divide. Routledge.

The Core Metadata Schema for L2 data

The Core Metadata Schema for L2 data: Collaborative efforts towards improved data findability, metadata quality and study comparability in L2 research

Dr Magali Paquot, UCLouvain

October 30, 18:00 (Madrid time) / 17:00 (UK time)

Registration: https://umurcia.zoom.us/webinar/register/WN_a6Wkw7llSG2HrvJ9yIGKvQ

You can check out the 2021 and 2022 talks here:

https://www.youtube.com/channel/UCKjKIIQL6u1mXD2V9ZaT-_Q/featured

Abstract

The Core Metadata Schema for L2 data consists of a comprehensive set of variables that encapsulate crucial information about L2 data. It is organized into several sections that describe specific aspects of a learner corpus. These include administrative details (e.g. authors or license), corpus design, text-related variables, learner-related variables, in-built annotation (e.g. details about manual or automatic annotation), information about annotators or transcribers (e.g. native language or language repertoire) and task-related details (e.g. instructions, time constraints) (Paquot et al., 2023). It is the result of extensive collaboration between learner corpus compilers at the Centre for English Corpus Linguistics (UCLouvain, Belgium) and EURAC Research (Bolzano, Italy), and a research data infrastructure expert and member of CLARIN’s metadata taskforce (König et al., 2022; Frey et al., 2023).

In this presentation, I will discuss the underlying rationale for the development of such a resource and present its second version. This will give me the opportunity to clarify how we have tried to involve learner corpus researchers in this initiative and to reiterate our hope that the LCR community will collaborate with us to refine the schema and align it with the evolving needs of the field.

References

Frey, J.-C., König, A., Stemle, E. & M. Paquot (2023). A core metadata schema for L2 data. Paper presented at the 32nd Conference of the European Second Language Association (EUROSLA), 30 August – 2 September 2023, University of Birmingham, UK.

König, A., Frey J.-C., Stemle, E., Glaznieks, A. & M. Paquot (2022). Towards standardizing LCR metadata. Paper presented at Learner Corpus Research 6, 22-24 September 2022, University of Padua, Italy.

Paquot, M., König, A., Stemle, E. & J.-C. Frey (2023). Core Metadata Schema for Learner Corpora, https://doi.org/10.14428/DVN/4CDX3P

Dr Magali Paquot is a permanent FNRS research associate at the Centre for English Corpus Linguistics, Institut Langage et Communication, UCLouvain, and an affiliate member of the Corpus Linguistics Lab, University of Florida. She holds a PhD in Linguistics (Université catholique de Louvain) and a degree in Natural Language Processing (Université de Liège). Her research interests include (but are not limited to) corpus linguistics, learner corpus research, vocabulary, phraseology (collocations, lexical bundles, …), pedagogical lexicography, electronic lexicography, terminology, EAP (English for Academic Purposes), ESP (English for Specific Purposes), EFL (English as a Foreign Language), SLA (Second Language Acquisition), linguistic complexity and L1 influence.

This online event is organized by the Universidad de Murcia and the E020-07 research group (Lenguajes de especialidad, corpus lingüísticos y lingüística inglesa aplicada a la ingeniería del conocimiento).

Coordination: Prof Pascual Pérez-Paredes & Dr Carlos Ordoñana Guillamón

Contrastive approaches in corpus linguistics research

Dr Niall Curry, Manchester Metropolitan University

October 11, 18:00 (Madrid time) / 17:00 (UK time)

This talk is part of the Corpus linguistics & applied linguistics research 2023 online event.

Registration: https://umurcia.zoom.us/webinar/register/WN_d68rw3V_TnOGNWDg6sXHnw

Abstract

Comparability is a core criterion underpinning corpus linguistics research. From using a reference corpus to determine keywords to comparing across time, space, and language, corpus linguistics often draws on different data sets to tell us what is special about the language we are studying. This view has become so naturalised within corpus linguistics methodologies that discussions of comparability in corpus research are quite uncommon. This challenge of addressing comparability is long-standing in fields like contrastive analysis, which came to prominence and fell into decline owing to advances and limitations in methodological approaches, in part related to issues of comparability. In its most recent rise, as corpus-based contrastive linguistics, research has sought to merge contrastive and corpus linguistics approaches to address the weaknesses identified in contrastive analysis methodologies and enhance perspectives on comparability in corpus linguistics research. Merging contrastive and corpus linguistics approaches, this talk presents case studies with a view to interrogating issues of comparability in corpus analysis and establishing theoretical bases from which to draw meaningful comparisons across multilingual discourses. Specifically, the talk sheds light on the methodological pitfalls we encounter in comparing corpora representing a range of contexts and variables, the impact that our methods of analysis can have on our findings, and the importance of contextually situating contrastive studies from epistemological and ontological perspectives. The findings of the talk are intended to offer points of reflection for anyone applying contrastive approaches in corpus linguistics research, both across languages and across language varieties.

Dr Niall Curry is a Senior Lecturer in TESOL and Applied Linguistics within the Department of Languages, Information and Communications, at Manchester Metropolitan University. Currently, he is researching language relating to global crises and global issues. He is particularly interested in investigating how knowledge of these issues and crises is socially and discursively constructed across contexts, times, languages, and cultures with a view to understanding better how global issues vary across local contexts, and for international and local audiences. His areas of focus include (but are not limited to) issues such as climate, health, economics, and education. In parallel, Niall is conducting research on applied linguistics and TESOL related issues, spanning foci on register, genre, metadiscourse, materials development, and digital pedagogies.

Multiple correspondence analysis and corpus linguistics research

Dr Isobelle Clarke, Lancaster University

October 25, 17:30 (Madrid time) / 16:30 (UK time)

This talk is part of the Corpus linguistics & applied linguistics research 2023 online event.

Registration: https://umurcia.zoom.us/webinar/register/WN_s0aPEXAFTQe_App0qS7Erg

Abstract

In this talk, I will describe what Multiple Correspondence Analysis (MCA) is and how it can be used for the Multi-Dimensional Analysis of short texts, as well as for corpus-assisted discourse analysis in an approach called Keyword Co-occurrence Analysis, drawing on the results of my own research on tweets (Clarke & Grieve, 2019; Clarke, 2022) and discourses of Islam in the UK press (Clarke et al., 2021, 2022). I will then go on to demonstrate how the results can be used to track communicative functions and discourses over time in diachronic analyses. Finally, I will discuss the limitations of MCA in these tasks.

Dr Isobelle Clarke’s research interests include corpus linguistics, forensic linguistics, sociolinguistics, news discourse and discourse analysis. Her previous research covers language variation on social media, especially Twitter, and authorship analysis. Her current research examines the representation of Islam in the press, second language learner language and spoken language. Dr Clarke received a Leverhulme Early Career Fellowship to investigate anti-science discourses, such as anti-vaccination discourse, climate change denial, and anti-GMO discourse.

New book chapter: Using corpus linguistics and grounded theory to explore EMI stakeholders’ discourse

Using corpus linguistics and grounded theory to explore EMI stakeholders’ discourse

Niall Curry & Pascual Pérez-Paredes

In Qualitative Research Methods in English Medium Instruction for Emerging Researchers: Theory and Case Studies of Contemporary Research.
Edited by Samantha M. Curle and Jack K. H. Pun. Routledge.

Typically, thematic analyses of interview data employ common-sense approaches (King et al., 2018). Such an approach requires the researcher to identify distinctive themes and to observe some degree of repetition of those themes. As the process involves multiple stages of description, interpretation and synthesis, and requires that analyses consider themes within and across a number of texts, there is value in investigating the affordances of corpus linguistic approaches to interview analysis, given that corpus linguistics shares these considerations.

This chapter shows how corpus linguistics methodology can offer a nuanced approach to thematic coding when used in synchrony with frameworks such as critical grounded theory (Hadley, 2017) and the ROAD-MAPPING framework (Dafouz & Smit, 2020). We argue that the use of keyword analysis to generate initial field codes for thematic analysis can reveal specific points in interviews and focus groups in which important themes are discursively constructed. We draw on a number of prior studies (e.g. Curry & Pérez-Paredes, 2021; Pérez-Paredes & Curry, forthcoming) with a view to demonstrating the reflexivity and value of this approach as a way to make sense of complex data and to inform the use of existing analytical and theoretical approaches to EMI (Dafouz & Smit, 2020) and teacher identity (Martel & Wang, 2014). Overall, we reinforce the view that corpus linguistics research methods can inform a systemic view of mixed methods research (Hashemi, 2019), arguing that the use of advanced techniques of data analysis can favour a dynamic interpretation of thematic analyses (King et al., 2018).
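As a rough illustration of how keyword analysis might surface candidate codes from interview transcripts, here is a minimal Python sketch. It assumes log-likelihood as the keyness statistic, a simple regex tokenizer, and hypothetical function names (`tokenize`, `log_likelihood`, `keywords`); the chapter summary above does not specify which statistic or software the authors used, so this is not their procedure.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a: frequency in the target corpus, b: frequency in the reference corpus,
    c: target corpus size in tokens, d: reference corpus size in tokens.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top_n=10):
    """Return words relatively more frequent in the target, ranked by keyness."""
    tf, rf = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scored = []
    for word, a in tf.items():
        b = rf.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    interviews = tokenize("EMI lecturers said EMI classes demand more EMI training")
    reference = tokenize("lecturers said classes demand more time and more staff")
    for word, score in keywords(interviews, reference, top_n=3):
        print(word, round(score, 2))
```

In a workflow of this kind, the top-ranked words would serve only as starting points: the researcher would return to the passages in which they occur and decide whether they support a field code.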

Corpus linguistics and the discursive construction of migrants

Online talk November 9, 12:00 (Madrid time) / 11:00 (UK time)

Dr Charlotte Taylor, University of Sussex

This talk is part of the Corpus & applied linguistics research 2022 online event.

Free registration link.

Abstract

The ways in which migration is framed in our public spaces influence who we think of when we think of migration, how we think about those people, and what kinds of responses are considered appropriate to their movement. In this paper, I want to show how corpus linguistics can be used to build a fuller picture of how migration is framed. I will start by addressing how corpus linguistics can help determine ‘who’ we should analyse when investigating the discursive construction of migrant groups. Then I will move on to what corpus linguistics, combined with discourse analysis, can tell us about ‘how’ they are represented. Finally, I will discuss how we can use corpus linguistics to compare such representations across cultures and languages.

Charlotte Taylor is Senior Lecturer in English Language & Linguistics at the University of Sussex. Her work is broadly concerned with language and persuasion and the rhetorical uses of language. Her current project examines the role of memory in migration discourses and unpicks the ways in which our contemporary public discourses both recycle past frames, and elide and re-shape past experiences. She also has a keen interest in methodological issues in corpus and discourse work. Her book-length publications include Corpus Approaches to Discourse (with Anna Marchi), Exploring Absence and Silence in Discourse (with Melani Schroeter), Patterns and Meanings in Discourse (with Alan Partington & Alison Duguid) and Mock Politeness in English and Italian. She is working on a new monograph titled Migration Discourses and Memory which will be coming out in 2023.

You can check out the 2021 talks here:

https://www.youtube.com/channel/UCKjKIIQL6u1mXD2V9ZaT-_Q/featured

This online event is organized by the Universidad de Murcia and the E020-07 research group (Lenguajes de especialidad, corpus lingüísticos y lingüística inglesa aplicada a la ingeniería del conocimiento).

A corpus-friendly analysis of fragmentary constructions in English

Online talk. October 26, 18:00 (Madrid time) / 17:00 (UK time)

Prof Javier Pérez-Guerra, Universidade de Vigo

This talk is part of the Corpus & applied linguistics research 2022 online event.

Free registration link.

Abstract

The concept of ‘fragment’ covers a wide array of structures of a very diverse nature, from interjections and headings to lists. In this talk, ‘fragment’ is narrowed down to encompass only stand-alone constructions which, despite their reduced, non-canonical, fragmentary structure, convey a propositional meaning comparable to that of complete sentences, as in ‘Well done to Giles!’ and ‘Good old Hendon next stop.’ The talk presents a theoretical (couched within Cognitive Construction Grammar) and an empirical characterisation of fragmentary expressions in contemporary English, based on the corpus analysis of sentence fragments in written and spoken English in recent diachrony (BNC1994 and BNC2014).

Javier Pérez-Guerra is Professor in the English Department of the University of Vigo (Spain). Javier is the coordinator of the LVTC (Language Variation and Textual Categorisation) research group at this institution, and the principal investigator of a number of research projects funded mainly by the Spanish Ministry of Innovation and Science. His areas of specialisation are information packaging in the clause, multidimensional approaches to register variation as applied to earlier periods of English, the study of grammatical variation between Early Modern and Present-day English by means of computational techniques, and the impact of performance preferences and ease of processing on the design of grammars.
