Category Archives: Corpus linguistics

New book chapter: Using corpus linguistics and grounded theory to explore EMI stakeholders’ discourse

Using corpus linguistics and grounded theory to explore EMI stakeholders’ discourse

Niall Curry & Pascual Pérez-Paredes

In Qualitative Research Methods in English Medium Instruction for Emerging Researchers: Theory and Case Studies of Contemporary Research.
Edited by Samantha M. Curle and Jack K. H. Pun. Routledge.

Thematic analyses of interview data typically rely on common-sense approaches (King et al., 2018), which require the researcher to identify distinctive themes and to observe some degree of repetition across them. Because the process involves multiple stages of description, interpretation and synthesis, and because themes must be considered within and across a number of texts, there is value in investigating what corpus linguistic approaches can offer interview analysis, given that corpus linguistics shares these concerns.

This chapter shows how corpus linguistics methodology can offer a nuanced approach to thematic coding when used in tandem with frameworks such as critical grounded theory (Hadley, 2017) and the ROAD-MAPPING framework (Dafouz & Smit, 2020). We argue that using keyword analysis to generate initial field codes for thematic analysis can reveal the specific points in interviews and focus groups at which important themes are discursively constructed. We draw on a number of prior studies (e.g. Curry & Pérez-Paredes, 2021; Pérez-Paredes & Curry, forthcoming) to demonstrate the reflexivity and value of this approach as a way to make sense of complex data and to inform the use of existing analytical and theoretical approaches to EMI (Dafouz & Smit, 2020) and teacher identity (Martel & Wang, 2014). Overall, we reinforce the view that corpus linguistics research methods can inform a systemic view of mixed methods research (Hashemi, 2019), arguing that advanced data analysis techniques can favour a dynamic interpretation of thematic analyses (King et al., 2018).
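To make the idea of keyword analysis more concrete, here is a minimal sketch. It is not the authors' tooling: it assumes Dunning's log-likelihood (G2) as the keyness statistic, a common choice in corpus linguistics but not one this summary specifies, and a deliberately crude regex tokenizer. The function names `tokenize`, `log_likelihood` and `keywords` are hypothetical. The sketch ranks the words that are relatively more frequent in an interview transcript (the target) than in a reference corpus; such top-ranked words are candidates for initial field codes, which the researcher would then examine in context.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (a deliberately crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) for a word occurring a times in a target
    corpus of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target: str, reference: str, top_n: int = 10) -> list[tuple[str, float]]:
    """Hypothetical helper: rank words over-used in target relative to reference.
    Assumes both texts contain at least one token."""
    t_counts = Counter(tokenize(target))
    r_counts = Counter(tokenize(reference))
    c = sum(t_counts.values())
    d = sum(r_counts.values())
    scored = []
    for word, a in t_counts.items():
        b = r_counts.get(word, 0)
        # Keep positive keywords only: relatively more frequent in the target.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

In practice the target would be a full set of interview or focus group transcripts and the reference a much larger general corpus; with very small texts like a single answer, the scores say little.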

Using ChatGPT (twice) to answer one of my own undergrad essay titles

I’ve been teaching a module on mass media, discourse analysis and corpus linguistics for a few years. Among other activities, I set a selection of essay titles for my undergrad students, who look through the titles and choose one. My assumption is that they go for the title on which they can show a sophisticated understanding of the underlying question or issue. Essentially, their choice of title depends on the quality and depth of their reading and their engagement with the module content. This year, however, there is a new stakeholder in academia and higher education: OpenAI’s ChatGPT.

One of the essay titles I regularly use is this one: How different is corpus linguistics from the main approaches to critical discourse analysis (CDA)? The essay is meant to prompt students to reflect on the broad field of critical discourse analysis, the wide range of methodologies used in CDA and, in particular, the contribution of corpus linguistics to CDA. Most of the background reading consists of Paul Baker’s (2006) Using Corpora in Discourse Analysis and a selection of extracts from research papers and books. Students are expected to write some 800 words.

I thought I should try ChatGPT before my students did. This is what ChatGPT had to say about my essay title:

Corpus linguistics and critical discourse analysis (CDA) are two distinct but related areas of study within linguistics and communication studies.

Corpus linguistics is a subfield of computational linguistics and is concerned with the study of language as it is used in real-world texts. Researchers in corpus linguistics use computerized text databases (corpora) to study patterns of language use and to identify lexical, grammatical, and discourse-level features of language.

Critical discourse analysis, on the other hand, is an interdisciplinary approach to the study of language and power that seeks to understand how language is used to maintain and reproduce existing power relations in society. CDA is concerned with the ways in which language is used to legitimize or challenge political and social inequalities, and it typically focuses on texts and discourse at the level of discourse or ideology.

While both corpus linguistics and CDA use text and discourse as their primary data, the approaches differ in their focus and goals. Corpus linguistics is primarily concerned with describing patterns of language use in texts, while CDA is concerned with explaining how language is used to maintain and reproduce social power relations.

That being said, the two approaches can complement each other, and some researchers use corpus linguistic methods to support their CDA research. For example, they may use corpus data to identify patterns of language use that reflect or reinforce social inequalities, and then use CDA methods to analyze these patterns in greater depth and to understand the ways in which they contribute to the maintenance of power relations in society.

Some quick-and-dirty conclusions

-Overall, the text is well structured, shows thematic progression and, for the most part, reflects a broad understanding of the question.

-The text is well below the expected 800 words.

-The answer doesn’t come with a list of references or sources “used”.

-The text is vague. Appeals to authority are vague too and, for obvious reasons, the text shows no engagement with the specific set readings.

-All in all, I suspect that students are very likely to try this tool. As most of them are L2 speakers of English, a well-written outline could help them produce a first draft. Less industrious students may be tempted to copy and paste the answer. Students on this course are expected to read a selection of sources and consider some outstanding issues surrounding the use of corpora, corpus linguistics and CDA. This may easily take 40 or 50 hours of their time. Obtaining the reply above literally took me 20 seconds.

-I submitted the same question 2 hours after my initial prompt and got a slightly different answer. I’m not entirely sure, but this may mean that two students who submit the same question are likely to get different texts. This is the second answer I got:

Corpus linguistics and critical discourse analysis (CDA) are two distinct but complementary fields of study within linguistics that share some overlapping concerns, but also have some important differences.

Corpus linguistics is a method of language study that uses large collections of naturally occurring language data (corpora) to investigate patterns and trends in language use. This approach emphasizes the importance of quantifiable data and the use of computational tools to analyze the data, and seeks to understand language in terms of patterns of usage rather than focusing on the social and ideological context in which language is produced and consumed.

CDA, on the other hand, is an interdisciplinary field that seeks to examine the ways in which language is used to reproduce and reinforce power relations in society. This approach emphasizes the importance of understanding language in its social and cultural context and the ways in which language can be used to influence and shape public opinion, policy, and behavior. CDA often employs qualitative and interpretive methods, such as discourse analysis, to understand the relationships between language, power, and society.

While corpus linguistics and CDA both aim to understand the relationship between language and society, they approach this relationship from different perspectives and with different methods. Corpus linguistics is more focused on the descriptive and quantifiable aspects of language use, while CDA is more focused on the interpretive and critical aspects of language use and its relationship to power. The two approaches can be used together to gain a more complete understanding of the relationship between language and society.

-However, I can easily see potential creative uses for ChatGPT and similar tools. I see some value in giving students a generated text and asking them to critique it, drawing on their engagement with the readings and their own understanding of the issues at play.

-Similarly, there is some pedagogic value in comparing the two answers and critiquing them.
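For a comparison of the two answers, one possible corpus-style starting point (an assumption, not a method described in this post) is to compare their vocabularies: the share of word types they have in common (Jaccard similarity) and the words each answer uses that the other does not. The helper names `word_types` and `compare_answers` are hypothetical, and the tokenizer is deliberately crude.

```python
import re


def word_types(text: str) -> set[str]:
    """Set of lower-cased word types in a text (crude, assumed tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def compare_answers(first: str, second: str) -> dict:
    """Hypothetical helper: vocabulary overlap between two answers."""
    a, b = word_types(first), word_types(second)
    union = a | b
    return {
        # Jaccard similarity: shared word types divided by all word types.
        "jaccard": len(a & b) / len(union) if union else 1.0,
        # Word types found in one answer but not in the other.
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
    }
```

Pasting the two ChatGPT answers in as `first` and `second` gives students a concrete list of lexical differences to discuss; the critique itself still depends on their reading.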

A couple of interesting resources

Bron Eager has created ChatGPT for educators, a free course that shows how to use this resource to facilitate the use of case studies in education.

Anna Mills’ webinar “What to Do about ChatGPT? Next Steps for Educators” explores ChatGPT’s capabilities in relation to academic writing and how educators can respond.

Pascual Pérez-Paredes

9/02/2023

Corpus linguistics workshop, 16/12/2022, IULMA

Hands-on, in-person corpus linguistics workshop on AntConc: “A Corpus analysis toolkit for concordancing and text analysis”, taught by Professor Pascual Pérez Paredes.

Organised by the Instituto Interuniversitario de Lenguas Modernas Aplicadas de la Comunitat Valenciana (Inter-University Institute for Applied Modern Languages of the Valencian Community).

Venue and date: Universidad Jaume I, Castellón, 16 December 2022

Registration: https://docs.google.com/forms/d/e/1FAIpQLSfjn84Nq8UeBJNJC84ll7FSQ2APA09nNdW5cglZYKM7_7jJJQ/viewform