
Personal Pronoun We and Other Key Items in Non-Native English Learners Academic Writing: A Corpus-Driven Study


DOCUMENT INFORMATION

Basic information

Title Personal Pronoun We and Other Key Items in Non-Native English Learners Academic Writing: A Corpus-Driven Study
Author Trinh Ngoc Thanh
Supervisor Dr. Ai-Ling Wang
University Tamkang University
Major English
Type dissertation
Year of publication 2019
City Taipei
Number of pages 126
File size 5.74 MB

Structure

  • CHAPTER 1: INTRODUCTION
    • 1.1 Research Purposes
    • 1.2 Research Background
    • 1.3 Statement of the Problems
    • 1.4 Research Summary
    • 1.5 Research Contributions
  • CHAPTER 2: LITERATURE REVIEW
    • 2.1 Research Paradigms in Corpus Linguistics
    • 2.2 Research Implementations of Corpus-Driven Approach
    • 2.3 Empirical Corpus Findings on Person Pronouns
  • CHAPTER 3: RESEARCH METHODOLOGY
    • 3.1 Review of Methods of Investigation
    • 3.2 Review of Research Issues
    • 3.3 Research Procedures
    • 3.4 Pilot study
  • CHAPTER 4: FINDINGS
    • 4.1 The Overuse of We in Chinese Learner’s English Essay Corpus
    • 4.2 The Distribution of We-Clause in Chinese Learner’s English Essay Corpus
    • 4.3 Person Pronouns from Multiple Corpora Comparison
    • 4.4 Exploring I and We Tri-Grams from Multiple Corpora Sources
  • CHAPTER 5: DISCUSSION
    • 5.1 Patterns of Pronoun Usage
    • 5.2 Patterns of Lexical Items Usage
    • 5.3 Patterns of n-grams
  • CHAPTER 6: CONCLUSION
    • 6.1 Efficiency of Frequency-Based Findings
    • 6.2 Limitations of the Study
    • 6.3 Direction for Future Studies

Content

INTRODUCTION

Research Purposes

This dissertation aims to explore the significance of frequency data within an inductive analysis using a corpus-driven approach. It specifically addresses the practical application of corpus-driven analysis to identify linguistic patterns that emerge from frequency data, ensuring reliability in data interpretation based on frequency distribution and recurring linguistic patterns. Consequently, the central research question guiding this dissertation is formulated as follows:

To what extent can corpus-driven analysis reveal the linguistic patterns and enhance the reliability of data interpretation?

With this theoretical question as guidance, the dissertation addresses three theoretical concerns regarding the corpus-driven approach to text data analysis.

First, it is the view of frequency data as the primary concern in corpus-driven approach (Biber,

Reliance on frequency-based criteria in corpus-driven approaches may lead to the exclusion of significant corpus evidence when minimum frequency filters are applied (Xiao, 2009). Additionally, the role of pre-existing theories in inductive corpus-driven analysis raises concerns, particularly regarding the claim that the corpus-driven perspective exaggerates its independence from preconceived theories and intuition (Xiao, 2009).

The integration of qualitative and quantitative paradigms in corpus linguistics research raises concerns about its effectiveness. In situations where the context of writing samples is unclear, corpus linguists can utilize a quantitative analysis as a bottom-up approach to identify textual patterns. However, without subsequent qualitative analysis, deeper insights into the written discourse and genres remain inaccessible.

This dissertation focuses on the overuse of the personal pronoun "we," utilizing a corpus-driven approach. In addition to examining "we," the study also analyzes other personal pronouns and lexical items to explore data from student academic writing.

As a first step in setting up the theoretical platform of the dissertation, the following sections briefly review the research background and the research problems of the dissertation.

Section 1.2 explores the concept of corpus representativeness and the dynamics between writers and their topics, while Section 1.3 focuses on the challenges faced by non-native English learners, particularly highlighting the excessive use of "we" as a key area of investigation in the dissertation.

Research Background

This section explores the concept of corpus representativeness, highlighting its traditional and modern interpretations. It emphasizes the significance of sample selection and size in effectively analyzing unique characteristics within corpus data.

The traditional view of corpus representativeness

Corpus representativeness is defined by how well a corpus reflects the linguistic diversity of specific populations. To achieve this, the selected samples must exhibit two key dimensions: first, a balanced representation of various genres within the text types, and second, the demographic characteristics of the population being studied.

In evaluating the balance of genres within collected text samples, Biber (1993) proposed two key criteria for assessing corpus representativeness: the range of text types and the distribution of linguistic features. The first criterion examines the diversity of resources, such as newspapers, fiction, novels, and academic writing, from which the text data is drawn. The second criterion focuses on the distribution of linguistic characteristics, both internally within individual texts and externally across different text types.

Atkins, Clear, and Ostler (1992) highlight the importance of population focus in determining the representativeness of a language corpus. They emphasize that generalizations about language, aligned with the sampling method, are crucial for ensuring the specialization of collected writing samples. The extent to which results from these samples reflect the entire population is assessed through both internal and external criteria. According to Atkins et al. (1992), incorporating internal criteria is essential for achieving accurate generalizations.

The relationship between language and context is essential, as linguistic features alone do not account for text variation. Non-linguistic factors, such as social and situational contexts, play a crucial role in analyzing textual features, highlighting the importance of considering these external criteria in any comprehensive text analysis.

Considering the criteria for the sampling method, Atkins et al. (1992) distinguished between receptive language (i.e., language for the purpose of hearing and reading) and productive language.

The relationship between a sample and its population is crucial in understanding language use for speaking and writing. It is important to consider both receptive and productive aspects, as the prevalence of language activities, such as writing, can vary significantly within a large community. Additionally, the context in which samples are collected, such as emails, business correspondence, and private conversations, plays a vital role in assessing their value in representing the unique characteristics of the language corpus.

The modern view of corpus representativeness

Corpus representativeness from a traditional viewpoint has a major focus on avoiding

Chomsky (1957) criticized corpora for being mere snapshots of a population, which can lead to a lack of representativeness due to the exclusion of certain natural language forms. To create a representative corpus, it is essential to adhere to three key characteristics: first, the data must be selected from a suitable sampling frame encompassing diverse text genres; second, the corpus should explicitly define the population's level and context; and third, it must account for appropriate text length and category variation (McEnery & Wilson, 1996).

The modern perspective on the relationship between human language intuitions and corpus representativeness highlights the challenge of measuring corpus representativeness due to the lack of standardized metrics for population and size. Rapp (2014) suggests that corpus representativeness can be evaluated through key statistics: (a) word frequency, which refers to the raw frequency of individual words; (b) word co-occurrence, which examines the association between a single word and other words within the corpus; and (c) common contexts of words, which concerns the relatedness between a single word and the group of words representing a similar semantic aspect.
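As an illustrative aside, the first two of these statistics can be sketched in a few lines of Python. This is a minimal sketch and not Rapp's (2014) procedure: the sample sentence, the function names, and the window size are assumptions made purely for illustration.

```python
from collections import Counter


def word_frequencies(tokens):
    """Raw frequency of each word form in a tokenized text."""
    return Counter(tokens)


def cooccurrences(tokens, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical learner sentence, whitespace-tokenized (an assumption).
text = "we think that we should learn english because we need english"
tokens = text.split()

freq = word_frequencies(tokens)
near_we = cooccurrences(tokens, "we", window=1)
```

Here `freq` holds the raw counts and `near_we` the immediate neighbours of "we". A real study would normalize the counts by corpus size and use an association measure rather than raw co-occurrence counts.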

The second consideration of the modern view is given to theoretical assumptions regarding the relationship between the quantity and quality of the text samples. According to Pastor and Seghiri (2010), the importance of text quality is closely linked to its reliability, emphasizing the need to avoid low-quality content. To improve the relevance of information, it is essential to make appropriate selections from the data domain. Pastor and Seghiri (2010, p. 116) specifically addressed issues related to size in this context.

According to McEnery and Wilson (2001), a corpus in modern linguistics is best defined as a finite collection of machine-readable text that is carefully sampled to be highly representative of the specific language variety being studied.

Pastor and Seghiri (2010) argue that a smaller corpus can effectively represent specialized language areas, as its compilation can adequately reveal specific language patterns and properties. They illustrate this with the Turicor corpus, suggesting that a smaller corpus may contain a higher density of specialized vocabulary and collocations compared to a larger corpus. This highlights potential weaknesses in large corpora, such as issues related to the frequency of subjects and the alignment of content with the compiler's expectations (Baker, 2006).

This section outlines the selection of a methodological approach for studying linguistic phenomena, contrasting traditional and modern perspectives on corpus representativeness. The dissertation adopts a corpus-driven approach for linguistic analysis, emphasizing its key characteristics.

The corpus-driven approach differs from the corpus-based approach, as highlighted by Tognini-Bonelli (2001). While the corpus-based approach focuses on exemplifying and validating existing linguistic theories informed by prior studies, the corpus-driven approach emphasizes the linguistic representation of frequency and text co-occurrence. This method seeks to systematize linguistic categories based on the relationships between items and their contexts.

Tognini-Bonelli (2001) highlighted key differences between corpus-based and traditional approaches, emphasizing that the corpus-based method focuses on the probabilistic aspects of language usage. This perspective leads to a more flexible approach, prioritizing real-world data over strict systemization. Consequently, validating findings from the corpus-based approach requires alignment with established theories to ensure reliability.

According to McEnery, Xiao, and Tono (2006), corpus-driven linguists differentiate their approach by emphasizing the types of corpora utilized and their perspectives on existing theories and intuitions. They argue that corpus balance and representativeness are not essential, highlighting that a limited selection of texts can significantly influence a corpus's thematic or stylistic balance. Additionally, they assert that utilizing very large corpora offers minimal benefits when investigating frequent linguistic features.

Statement of the Problems

This section utilizes insights from World Englishes and English as a Lingua Franca (ELF) to focus on non-native English learners as the primary research subject of the dissertation. It addresses the complexities of speaker identity while emphasizing the importance of preserving local diversity, linguistic nationalism, and cultural nationalism. By examining the indexical values of English, the study investigates language use across various boundaries, highlighting the unique concerns associated with non-native English learners (Pennycook).

Concerns about the integration of global English with local discourse highlight its role in constructing indexical values, such as representing modernity, facilitating commercial trade, advancing scientific and technological progress, and promoting human rights. This blending underscores the significance of English in contemporary society, as noted by various scholars (Ammon, 2001; Toolan, 2003; Graddol, 2006; Seargeant & Erling, 2011).

Non-native English learners play a crucial role in the study of "ethnography of communication," highlighting the relationship between linguistic signs and their contextual usage (Kroskrity, 2004, p. 500). This focus suggests the importance of analyzing how language operates within specific social contexts.

The investigation of indexical values in language use is framed within specific contexts, including settings, topics, institutions, and the sociocultural backgrounds of speakers (Kroskrity, 2004, p. 500). This approach utilizes semiotic-functional models to explore how these elements influence language and meaning.

Kroskrity’s (2004) suggestion of situating the investigation of indexical values in particular contexts and within semiotic-functional models can be formulated in terms of two main factors: the sociocultural groups of language users and the relevant speech community.

The indication of non-native English learners with reference to the former factor goes in line with

Coleman (2011) emphasizes the significance of focusing on developing countries as key research sites to explore the role of English in fostering development. Many of these nations serve as benchmarks for establishing Standard English usage, particularly for assessing English proficiency in Expanding Circle countries, where English operates as a foreign language across various domains (Lowenberg, 2002). Additionally, the frequent use of the personal pronoun "we" can be noted in discussions surrounding this topic.

This section of the dissertation examines the phenomenon of non-native English learners' overuse of the pronoun "we." As noted by Pavlidou (2014), the usage of "we" encompasses complexities that extend beyond typical issues related to person deixis and reference. Consequently, this study will define the overuse of "we" by identifying its common indicators.

The overuse of the pronoun "we" can be analyzed through relevant statistics on first-person pronoun usage, highlighting trends in their frequency. Studies indicate a notable underuse of "we" and other first-person plural pronouns compared to the prevalent use of "I" and other first-person singular forms. This shift reflects a growing individualism in language, as corpus studies reveal a significant disparity in the usage patterns of these pronouns.

A study by Twenge, Campbell, and Gentile (2012) analyzed pronoun usage in American books from 1960 to 2008, revealing a significant shift towards individualism. The research utilized N-gram data from the Google Book Database, which indicated a 10% decline in the use of first-person plural pronouns (such as "we" and "us") and a notable 42% increase in first-person singular pronouns (like "I" and "me").

Research indicates a significant overuse of first person singular pronouns compared to first person plural pronouns. This trend highlights a shift towards a less hierarchical and more self-focused cultural perspective, as noted by Twenge et al. (2012). Additionally, the decline in the use of "we" may be linked to the rising prominence of second person pronouns, with Twenge et al. (2012) noting a dramatic quadrupling in the usage of "you" and "yours" in American literature from 1960 to 2008. This shift reflects a broader trend of increasing individualism in society.

The use of second-person plural pronouns reflects a growing trend in written discourse that engages readers in a conversational manner, indicating a shift towards individualism.

Research by Uz (2014) examined the correlation between personal pronouns and individualism, utilizing n-gram data from Google Books. The study analyzed variables such as country-specific individualism scores, total pronoun usage, and the frequency of first-person singular and plural pronouns. Results indicated no significant correlation between individualism scores and total pronoun usage; however, a strong positive correlation was found between individualism scores and the use of first-person singular pronouns (r = .62, p = .05, n = 10 countries). Further regression analysis reinforced this finding, showing a stronger relationship between first-person singular pronouns and individualism scores (β = .74, p = .02), while total pronoun usage was deemed an unreliable indicator of individualism (β = .34, p = .21).

Corpus studies by Twenge et al. (2012) and Uz (2014) reveal limitations in using the first-person plural pronoun "we" to measure individualism based on pronoun usage statistics. Key issues include the diverse text genres within the database, which complicate the comparison of pronoun frequency across different contexts. Consequently, while "we" may indicate individualism in specific sub-corpora, such as Google Books n-gram, the lack of qualitative data limits the understanding of pronoun usage patterns. Relying solely on frequency data and statistical tests fails to capture the significant effects of overusing "I" or underusing "we" on individualism scores, highlighting the necessity for qualitative analysis to bridge this research gap.

The index of self-reference

On the self-reference to group indexicality, Muhlhausler and Harre (1990, p 178) defined that: “By selecting we rather than other pronominal form a speaker introduce a bond with his/her

Incorporating the overuse of "we" in discourse can serve two key pragmatic functions: (a) it creates a sense of group indexicality, and (b) it fosters a feeling of collectivity. By utilizing "we," speakers can shift responsibility away from themselves, thereby drawing others into a shared obligation. This dynamic highlights the importance of understanding how self-reference operates in both spoken and written communication.

The pragmatic function of self-reference distinguishes between inclusive and exclusive uses of the term "we," based on the relationships among the addressee, other participants, and third parties (Pavlidou, 2014, p. 4). Overusing "we" can replace the referential values of other pronouns, such as using "we" to mean "I," "you," or "they," each serving specific self-referential purposes (Table 1, Pavlidou).

The second index of self-reference in the usage of "we" relates to the concept of collectivity, which is closely tied to the speaker's identity (Pavlidou, 2014). This categorization divides "we" into two types: the relational "we," which reflects personal connections between speakers, and the collective "we," which encompasses larger groups and distinguishes between in-groups and out-groups (Sedikides & Brewer, 2001, p. 2).

Three examples are cited from Pavlidou (2014) to illustrate the index of self-reference.

Research Summary

This dissertation focuses on a corpus-driven analysis of the first person plural pronoun "we," along with other significant pronouns and lexical items. The research is organized into three main sections, each exploring different aspects of these linguistic elements.

Chapters 2 and 3 cover the literature review, research methodology, and research methods that precede the main studies. Chapter 2 starts with a review of the transition from quantitative sociolinguistics to the quantitative and qualitative research paradigms in corpus linguistics, lays out research implementations of the corpus-driven approach, and reviews empirical findings from research on person pronouns. Chapter 3 lays out the research methodology and states the specific components of the research methods: (a) the corpus database, (b) narrow usage of the corpus database, and (c) data analysis. Chapter 3 also includes a pilot study that makes a preliminary examination of the use of the first person plural pronoun we by Chinese EFL learners before the quantitative findings are reported in Chapter 4.

Chapter 4 of the dissertation reports quantitative findings from four research sub-topics in the following sequence:

Study 4.1 examines the linguistic feature of overusing we in English writing by Chinese learners.

The study is divided into two halves. The first half categorizes the overuse of "we" into frequently and less frequently used indexes using z-score and standard deviation methods. In the second half, the researcher conducts a deeper analysis of the areas where "we" is overused, applying additional criteria for assessment.
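The z-score step can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the per-essay rates are invented, the cut-off of one standard deviation is hypothetical, and the population standard deviation is used, none of which is specified in this summary of Study 4.1.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread, so every z-score is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def classify(values, threshold=1.0):
    """Label each value 'frequent' if its z-score exceeds the (assumed) threshold."""
    return [
        "frequent" if z > threshold else "less frequent"
        for z in z_scores(values)
    ]


# Hypothetical occurrences of "we" per 1,000 words in five essays.
we_per_1000_words = [10.0, 12.0, 11.0, 13.0, 30.0]

scores = z_scores(we_per_1000_words)
labels = classify(we_per_1000_words)
```

With these invented rates only the last essay lies more than one standard deviation above the mean, so it alone is labelled "frequent".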

Study 4.2 reconsiders the methodological issue with the coding scheme of we in past research.

Past research has shown inconsistencies in coding selection and subjective overuse of first-person pronouns in data interpretation due to intuition-based approaches. This study focuses on analyzing the distribution of dynamic and stative types of we-clauses, utilizing Pearson Residuals as the primary statistical method to categorize them into frequently and less frequently used indexes for the frequency of "we."
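For readers unfamiliar with the statistic, a Pearson residual for a cell of a contingency table is (observed − expected) / √expected, where the expected count is the row total times the column total divided by the grand total. The following Python sketch shows the computation; the 2 × 2 table of dynamic and stative we-clauses is invented for illustration and does not come from the dissertation.

```python
from math import sqrt


def pearson_residuals(table):
    """Return (observed - expected) / sqrt(expected) for every cell.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            res_row.append((observed - expected) / sqrt(expected))
        residuals.append(res_row)
    return residuals


# Hypothetical counts: rows = two sub-corpora, columns = (dynamic, stative).
we_clauses = [[30, 10], [20, 40]]
residuals = pearson_residuals(we_clauses)
```

A positive residual marks a cell that occurs more often than independence would predict, and a negative one marks a cell that occurs less often. Residuals beyond roughly ±2 are commonly read as notable departures, though the cut-off applied in Study 4.2 is not stated here.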

Study 4.3 extends the scope of investigation to multiple types of person pronouns in the context of a multiple corpora comparison. Apart from the common comparison between native language and interlanguage under the Contrastive Interlanguage Analysis (CIA) framework, this study also focuses on the comparison between interlanguages. To suit the stated research scope, data collection includes a written learner corpus of English essays from four groups of EFL learners (Chinese, Japanese, Korean, and Thai) and a group of native English learners. Study 4.3 aims to draw conclusions about the major language patterns that emerge from the corpus data, using χ² tests and effect sizes to test predictive models of similar language patterns among groups of learners.
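A χ² test of independence together with an effect size can be sketched as follows. The counts are invented, and Cramér's V is chosen here only as one common effect size for contingency tables; this summary does not say which effect size the dissertation reports.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi_square(table) / (n * k))


# Hypothetical counts: rows = two learner groups, columns = ("I", "we").
pronoun_counts = [[50, 30], [30, 50]]

stat = chi_square(pronoun_counts)
v = cramers_v(pronoun_counts)
```

For a 2 × 2 table there is one degree of freedom, so the statistic is compared against the critical value 3.841 at the .05 level. The sketch stops short of computing a p-value, which would normally come from a statistics library.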

Study 4.4 examines n-grams, which are identified primarily on the basis of frequency and ranking and therefore raise a theoretical concern about how to qualify their overuse by particular group(s) of language users.

Borrowing a research design outlined in Ishikawa (2016), this study closely examines the use of first-person pronouns in English essays and speeches among EFL learners from China, Japan, and Thailand. It specifically reassesses the overuse of the I and we tri-gram clusters, aiming to identify similar patterns in the current findings. The reliability of these patterns is evaluated to enhance our understanding of EFL learners' pronoun usage.
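Extracting and ranking I and we tri-grams can be sketched in Python as below. The sentence, the whitespace tokenization, and the lower-casing are assumptions for illustration. The sketch does not reproduce Ishikawa's design, whose details are not given in this summary.

```python
from collections import Counter


def trigrams_starting_with(tokens, first_words):
    """Count three-word sequences whose first word is in `first_words`."""
    counts = Counter()
    lowered = [t.lower() for t in tokens]  # case-insensitive matching
    for i in range(len(lowered) - 2):
        if lowered[i] in first_words:
            counts[tuple(lowered[i:i + 3])] += 1
    return counts


# Hypothetical learner text, whitespace-tokenized (an assumption).
sample = "I think that we should study and I think that we can"
tokens = sample.split()

counts = trigrams_starting_with(tokens, {"i", "we"})
ranked = counts.most_common()
```

The final "we can" is skipped because fewer than three tokens remain, so `ranked` lists ("i", "think", "that") first with a count of 2. Ranking by raw frequency alone is exactly what raises the concern about qualifying an item as overused.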

Chapter 5 discusses the quantitative findings through scatter plot visualization, the general linkage between person pronouns and other key items, selected text data from the corpus resources, patterns of n-grams, and concordance observation. Chapter 6, the final chapter of the dissertation, discusses the efficiency of frequency-based findings in the corpus-driven approach together with the limitations of the study and suggests potential directions for future studies.

Research Contributions

This dissertation contributes the following outcomes to the field of corpus linguistics:

It is the theoretical reconsideration of the notion of methodological triangulation in corpus linguistic research (Baker & Egbert, 2016) and of the role of frequency data in corpus-driven analysis.

It is in particular the practical reconsideration to employ mixed method research design (Creswell,

The dissertation explores the integration of quantitative and qualitative research paradigms in corpus linguistics, illustrating the transition from statistical measurement to word reconstruction through text pattern analysis. It presents a comprehensive investigation that combines various types of empirical evidence across three stages within a systemic linguistic framework.

In quantitative analysis, it is essential to establish linguistic variables for measurement, identify patterns of variation from these measurements, and elaborate on the quantitative findings to connect them with the construction of text within corpora data.

This dissertation explores the use of the personal pronoun "we" and other critical elements in evaluating the academic writing of non-native English learners. It emphasizes two significant issues in the English as a lingua franca (ELF) debate: first, the concern that promoting core ELF features may inadvertently reintroduce a monolithic English variety, conflicting with the pluralism of the World Englishes paradigm; second, skepticism regarding whether teaching these core features to Expanding Circle speakers can genuinely challenge the dominance of authoritative English varieties and transform native/non-native speaker dynamics. Addressing these issues is essential for understanding how English is utilized with both homogeneity and heterogeneity among non-native learners.

LITERATURE REVIEW

Research Paradigms in Corpus Linguistics

This section first explains the expansion of the quantitative research paradigm from quantitative sociolinguistics to corpus linguistics in terms of research scope and research principle.

It then reviews the evolution of qualitative research within corpus linguistics, highlighting the advancement of complex methodologies for text analysis that utilize corpora as primary databases. The discussion begins with an exploration of quantitative sociolinguistics as a foundational element in this development.

Quantitative sociolinguistics focuses on the analysis of language varieties, particularly social dialects within specific urban contexts, utilizing interview data to interpret linguistic styles. It emphasizes the quantitative measurement of linguistic variables and provides referential explanations for nonfunctional aspects of these variants (Biber, 2009). Research within the Labovian paradigm typically examines frequency data related to linguistic variables, contextualizing linguistic phenomena through socio-economic, demographic, and geographical factors. By concentrating on phonological variables derived from spoken data, Labovian studies predominantly illustrate that phonological variants are influenced by style shifting and are closely tied to the interactions of different speaker groups with their situational contexts.

Moreover, quantitative sociolinguistics also emphasizes the notion of dealing with referential explanations for the nonfunctional aspects of the variants in a systematic way.

The Labovian research paradigm links systematic variability to the frequency of linguistic variants, categorizing them into high and low frequency (Wolfram, 2006). For example, Labov's (1963) study of Martha's Vineyard analyzed the centralization of the diphthongs /ay/ and /aw/ according to factors such as age, geographical location, occupation, and ethnicity. These social factors serve as independent variables that condition the coexistence of different variants of /ay/ and /aw/.

Quantitative sociolinguistics primarily focuses on social varieties to explain linguistic variants, yet it falls short in detailing why certain variants are preferred over others across different language varieties For instance, Labov's (1963) study relied on lexical questionnaires, interviews, and reading texts to analyze the /ay/ and /aw/ variants, but it lacked deeper interpretations regarding the frequency of these linguistic variables, particularly in relation to contextual factors like pragmatic features and the information processing of the data collected.

Even if proper contextual factors are identified, quantitative sociolinguists might face challenges when language contexts are of no relevance to the occurrence of casual speech (Milroy, 1987). Especially when there is a high reliance on linguistic cues such as the topic of conversation (Trudgill, 1974), the avoidance of presenting data under the continuum from casual to formal style (Macaulay, 1977) may fail to infer aspects of competence and of performance in the examination of language production (Ellis, 1994).

The expansion of the quantitative research paradigm in corpus linguistics

Corpus linguistics represents an advancement in the quantitative paradigm of sociolinguistics, sharing the belief that linguistic choices are not merely random variations. Instead, it posits that linguistic variables are governed by specific rules and conditions.

Corpus linguistics research explores a diverse array of linguistic variants across various registers and forms, both spoken and written. This approach provides a systematic framework for examining these variants within the context of established linguistic constructs (Biber, 2009).

A crucial aspect of corpus linguistics research is determining how adequately text data collections can represent a corpus. Granger (1998) identified two main criteria for establishing corpus representativeness in learner corpus design: (a) characteristics of the language situation and (b) attributes of language learners. The language situation features are organized from the broadest medium classification, such as written or spoken corpora, down to specific genres like argumentative and narrative writing. The attributes of language learners include categories such as age, sex, mother tongue, religion, other foreign languages, level, learning context, and practical experience.

The development of learner corpora arose from addressing limitations in error analysis, such as the extensive manual effort required for word calculation, challenges with diverse learner types, and a focus on errors that overlooked learner performance (Granger, 1998) For corpus linguists, utilizing word frequency, word co-occurrence, and common word contexts is crucial for assessing the representativeness of learner corpora data (Rapp, 2014).

Assessing language performance through corpus data presents limitations in representativeness Chomsky (1957) argued that a corpus merely reflects a snapshot of a population, making it unlikely to accurately represent language norms Similarly, Rapp (2014) highlighted the absence of a standard for measuring both the population and the suitability of corpus data for specific linguistic analyses Consequently, creating a representative corpus is challenged by the need for an appropriate sampling frame that encompasses various genres, forms, text lengths, and category variations defined by the population's context (McEnery & Wilson, 1996) This underscores the growing importance of qualitative research within the corpus linguistics paradigm.

Traditional qualitative text analysis is characterized by two main perspectives: structural and functional The structural perspective examines text hierarchy, distinguishing between micro-level elements, such as sentence networks, and macro-level elements, which encompass the overall meaning of the text (Kintsch & van Dijk, 1978) This perspective also includes a classification of analysis methods into bottom-up approaches, which progress from clauses to the whole text, and top-down approaches, which do the opposite (Mann & Thompson, 1988) Conversely, the functional perspective focuses on the flow of information, analyzing how topics and sub-topics progress (Lautamatti, 1987) and the relationship between given and new information in text organization (Prince, 1981).

The qualitative research paradigm in corpus linguistics builds on prior theoretical developments in qualitative text analysis to define research goals and scopes By employing an inductive approach, qualitative corpus linguistics aims to provide in-depth analyses of linguistic norms and language varieties, considering the backgrounds of interlocutors as well as spatial and temporal contexts (Hasko, 2012) Additionally, corpus-integrated qualitative analysis plays a crucial role in minimizing subjectivity in data interpretation during discourse analysis.

Traditional text analysis utilizes a coding system that integrates function-based categories within a sentence network. For example, the Kintsch and van Dijk (1978) model employs a coding system that highlights macrostructures through function-based labels such as deletion, generalization, and construction. Additionally, the coding system for superstructure categorizes genres, with news discourse featuring labels like headlines, lead, context, and event. Another coding approach prioritizes the organization of information based on the relevance of given information, which is reflected in the development of the given and new theory, categorizing information into new, inferred, and given formations (Prince).

Qualitative corpus analysis enhances the understanding of text patterns by linking them to other textual patterns and contextual elements (Biber, Conrad, & Reppen, 1998). This method, characterized by its focused approach within smaller databases, serves as a complement to quantitative research methodologies (Hasko, 2012). By integrating qualitative insights, corpus linguists can gain a broader perspective on linguistic norms and facilitate deeper evaluations in subsequent stages of corpus analysis (Mittelberg, Farmer, & Waugh).

The complexity of qualitative corpus analysis is heightened by the integration of computer-aided software, which facilitates the identification of text patterns through features like keyword-in-context (KWIC) searches and the expansion of nodes in collocations or word combinations within concordance lines (Pollach, 2012). This approach not only combines two paradigms but also allows for more accurate generalizations from small samples to larger populations (Schmied, 1993).

On the quantitative research paradigm in corpus linguistics

Research Implementations of Corpus-Driven Approach

This section elucidates the significance of the corpus-driven approach in analyzing frequency data, as highlighted in the dissertation. It addresses two primary theoretical concerns: the comparison between corpus-based and corpus-driven methodologies, and the discussion of word frequency and its broader implications in corpus-driven research. Furthermore, it explores how mixed methods research design can improve the reliability of frequency-based analyses within the corpus-driven framework.

A deductive direction in the analysis of corpus data characterizes the corpus-based approach.

The corpus-based approach is rooted in the validation of existing linguistic theories, drawing insights from prior studies (Tognini-Bonelli, 2011). This approach is primarily deductive, focusing on theory confirmation and hypothesis testing within a specific scope of corpus data (Cheng, 2012). By narrowing its research focus, the corpus-based approach can effectively investigate topics such as the distribution of parts of speech (like nouns, verbs, and adjectives) as well as the functional aspects of linguistic variation, all while reconsidering register differences as key parameters influencing textual patterns (Biber, 2009).

The corpus-driven approach utilizes an inductive method to uncover textual patterns with little to no preconceived notions, making corpus evidence essential for developing or revising linguistic theories (Tognini-Bonelli, 2001). In this approach, the balance and representativeness of the corpus are less critical, as large corpora provide limited advantages; instead, frequency data plays a crucial role, serving as the primary filter for data analysis (Xiao, 2009).

Contradictory perspectives exist concerning the two methodologies for analyzing corpus evidence. Specifically, the corpus-based approach is regarded as effective for corpus annotation and tagging, which helps reveal textual patterns (Xiao, 2009).

The corpus-driven approach is particularly effective for analyzing large corpora when used alongside corpus analysis tools, offering greater potential for theory development and modification compared to the corpus-based approach.

The selection of an approach in corpus linguistics is influenced by the definitions and frameworks established by researchers, as highlighted by Biber (2009). A key aspect of this discipline involves analyzing word frequency within a corpus-driven methodology.

This section explores the impact of corpus investigation on the selection of frameworks and corpus evidence, focusing on word frequency as a foundational element. It discusses the role of frequency data within the context of a corpus-driven approach, highlighting its significance in guiding research and analysis.

Starting from the claim that word frequency is of primary concern in the corpus-driven approach (Biber, 2009), this section takes the following insights into consideration. First, word frequency is treated as the minimal unit of corpus analysis in the keywords approach.

According to Baker and Egbert (2016), utilizing a corpus-driven approach is essential for linguistic analysis. Cheng (2012) emphasizes the necessity of generating word frequency lists before comparing key word lists from both observed and reference corpora. Furthermore, Tribble and colleagues suggest that creating a key word list that includes word frequency information serves as an effective foundation for exploring various text dimensions.

Word frequency is typically displayed in a list format, organized either alphabetically or by frequency (Hunston, 2006). For data presentation, raw frequencies within a corpus are straightforward to interpret, while normalized frequencies allow corpus linguists to compare word frequency across various corpus sizes by standardizing the raw data (Gries, 2015). In terms of data interpretation, word frequency is valuable for highlighting the linguistic norms of specific speaker or writer groups, reflecting both regular and exceptional language use (Hyland, 2015).
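The distinction between raw and normalized frequency can be illustrated with a minimal sketch. The tokenizer, the function name `normalized_frequencies`, and the normalization base of 1,000 words are assumptions made here for illustration; the source does not specify which tool or base the study used.

```python
import re
from collections import Counter


def normalized_frequencies(text, per=1000):
    """Return {word: (raw_count, count per `per` tokens)} for a text.

    Illustrative only: tokenization is a simple lowercase regex split,
    and the base `per` is a hypothetical choice.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    # Normalized frequency = raw count / corpus size * base
    return {word: (raw, raw / total * per) for word, raw in counts.items()}


if __name__ == "__main__":
    sample = "We think we know what we want"
    table = normalized_frequencies(sample)
    # List the most frequent words first, as in a frequency-ordered word list
    for word, (raw, norm) in sorted(table.items(), key=lambda kv: -kv[1][0]):
        print(f"{word}\t{raw}\t{norm:.1f}")
```

Because the normalized value divides by corpus size, the same word can be compared across a small learner corpus and a much larger reference corpus on a common scale.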

In corpus-driven data analysis, two key theoretical concerns arise regarding word frequency extraction. Firstly, high-frequency items may only represent common usage within a limited set of writing samples, failing to capture broader language usage phenomena (Baron, Rayson, & Archer, 2009). Secondly, the emergence of linguistic patterns can be influenced by factors such as language choice, register variance, and context, suggesting that word frequency is intertwined with the randomness or non-randomness of language use (Hunston, 2006).

c. Going beyond the scope of word frequency in the corpus-driven approach

This section examines n-grams as an extension of word frequency within a corpus-driven approach. Language patterns often consist of combinations of two or more words (Cheng, 2012), leading to the concept of n-grams, also referred to as lexical bundles or clusters. This concept originates from the study of formulaic patterns, focusing on the frequent occurrence of multi-word combinations that create familiar language use (Hyland, 2008). Structurally, n-grams are defined as combinations of at least two words, categorized as 2-grams, 3-grams, and 4-grams. The presentation of n-grams, organized by frequency, serves as an expanded framework for understanding single word frequency lists (Rayson, 2015).
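As a rough illustration of how a frequency-ordered n-gram list can be derived from a token sequence, consider the sketch below. The function name `ngram_counts` and the sample sentence are hypothetical and are not taken from the dissertation's corpora or tools.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-word sequences (n-grams) in a list of tokens."""
    # Slide a window of width n across the tokens; each window is one n-gram
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    tokens = "we can see that we can do more".split()
    for n in (2, 3, 4):
        print(f"{n}-grams:")
        # most_common() orders the list by frequency, like a word list
        for gram, freq in ngram_counts(tokens, n).most_common(3):
            print("  ", " ".join(gram), freq)
```

Sorting by frequency mirrors the single-word frequency list, only with multi-word units as the rows.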

The expanded concept of word frequency is rooted in Sinclair's (1991) research on corpus-driven language analysis, utilizing concordance as a method for data visualization. This approach allows corpus linguists to uncover valuable insights into language patterns and usage by examining the occurrence and co-occurrence of words within a text (Wynne, 2008). Concordance lines display the search term centrally, flanked by its surrounding context on both sides, and can include single words, n-grams, or part-of-speech tags (Rayson, 2015).
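The layout of a concordance line, with the search term in the center and context on both sides, can be sketched as follows. This is a simplified single-word KWIC routine under assumed names (`kwic`, `window`); dedicated concordancers offer far more options, and nothing here reflects the specific software used in the study.

```python
def kwic(tokens, node, window=3):
    """Return keyword-in-context lines as (left context, node, right context)."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


if __name__ == "__main__":
    tokens = "in this essay we argue that we need more data".split()
    for left, node, right in kwic(tokens, "we", window=2):
        # Right-align the left context so the node lines up in a column
        print(f"{left:>20}  {node}  {right}")
```

Aligning the node word in one column is what lets recurring patterns to its left and right become visible at a glance.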

The third expanded scope of word frequency focuses on keyness as a widely used methodology in corpus-driven analysis. Keyness can be identified through two primary methods: comparing word frequency lists from the examined corpus with those from a reference corpus, where corpus linguists utilize statistical measures such as χ² and log-likelihood to generate a keyness value for each word (Culpeper & Demmen, 2015).

The reference corpus can be equal to or larger than the target corpus, allowing for keyness identification to be expanded into a multi-corpora comparison.
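To make the keyness comparison concrete, the sketch below computes a log-likelihood value for one word from its frequency in a target corpus and in a reference corpus. It uses the widely cited two-term formulation (expected frequencies derived from the pooled rate); this is an assumption, since the source does not state which variant of the statistic was applied, and the function and parameter names are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood keyness value for a single word.

    Expected frequencies assume the word occurs at the same rate
    in both corpora (the null hypothesis).
    """
    pooled_rate = (freq_target + freq_ref) / (size_target + size_ref)
    expected_target = size_target * pooled_rate
    expected_ref = size_ref * pooled_rate
    ll = 0.0
    # A zero observed frequency contributes nothing (limit of x*ln(x) is 0)
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


if __name__ == "__main__":
    # Hypothetical counts: a word occurring 120 times in 10,000 tokens
    # versus 300 times in 100,000 tokens of a reference corpus
    print(round(log_likelihood(120, 10_000, 300, 100_000), 2))
```

A value of zero means the word occurs at the same relative rate in both corpora; larger values indicate a stronger frequency difference, which would then be inspected through concordance lines rather than interpreted on its own.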

A comprehensive approach to studying keyness involves combining a keyness word list with simultaneous concordance analyses, which enhances data interpretation and helps characterize the unique language features of specific groups of writers or speakers (Baron, Rayson, & Archer, 2009; Hyland, 2015).

d. Emphasizing reliability through triangulation in the corpus-driven approach

Empirical Corpus Findings on Person Pronouns

This section offers a document-based analysis of empirical studies on person pronouns, drawing from the literature reviewed in the works of Leedham and Fernandez-Parra (2017), Hyland and Jiang (2017), Yeo and Ting (2014), O'Boyle (2014), Paquot and Granger (2012), and Carter-Thomas and Chambers.

Recent studies by McCrostie (2008), Martinez (2005), and Fortanet (2004) highlight four key areas of empirical findings regarding academic written and spoken discourses. These areas focus on the characteristics of personal pronouns in academic contexts, the visibility of writers and readers through the use of pronouns, the functional categories of these pronouns, and the external factors that influence their patterns. Understanding these aspects is crucial for enhancing academic communication and engagement.

Corpus research on person pronouns in academic contexts reveals their general features and associations with writers' purposes. Studies on academic writing, including student work and published journal articles, indicate that authors often use pronouns to enhance their self-promotion in the establishment of their arguments. This methodological triangulation combines frequency data with analysis of form and presentation to provide insights into the usage of person pronouns in scholarly texts.

Hyland's works (e.g., Hyland, 2001; Hyland, 2004; Hyland, 2012) on first-person pronouns highlighted their role in establishing authority and personal projection in academic writing. Hyland's studies examined how writers adapt to social and rhetorical conventions, as well as their interactions with academic writing patterns. These aspects align with the self-promotional function of first-person pronouns, as noted by Harwood (2005a).

Student writers establish their roles in academic discourse through their association with individual or group perspectives, which is reflected in their use of first-person pronouns. Research by Tang and John (1999) on university essay writings revealed that first-year Singaporean students often used "we" and "us" to emphasize group identity. This suggests that first-person pronouns are commonly employed to guide arguments rather than express personal opinions, potentially due to concerns about the informal nature of these pronouns and the challenge of balancing formality and informality in academic writing (Chang & Swales, 1999).

The investigation of person pronouns in academic writing raises three theoretical concerns. Firstly, there is a significant issue regarding the incompleteness of comparisons between non-native and native English speakers, particularly due to the potential unfairness in comparing corpora, such as published academic articles versus student essays (McCrostie, 2008). Secondly, the emphasis on person pronouns varies across different academic disciplines, influencing their usage in writing.

Writers in natural sciences tend to adopt a more informal writing style compared to their counterparts in social sciences (Hyland & Jiang, 2017). Additionally, the use of specific vocabulary plays a significant role in analyzing the characteristics of personal pronouns.

Hyland and Jiang (2017) found a link between the increasing usage of first-person pronouns and a small rise of 2% in informal features, such as sentences starting with conjunctions and conjunctive adverbs, dating from 1965.

Meanwhile, corpus studies of person pronouns in academic spoken discourse were conducted with a collection of university lectures as the main corpus resource. According to Yeo and Ting (2014), studies in the research direction of lecture comprehension and student

Research indicates that both technology-based and traditional classroom settings significantly influence the implementation of interactive techniques during lectures Studies have highlighted the effectiveness of lecturer questioning as a means to improve lecture comprehensibility, while others emphasize the use of personal pronouns to foster interactivity and enhance student engagement.

The investigation of person pronouns is crucial as they play a significant role in facilitating interactive situations between the speaker and listener (Yeo & Ting, 2014). Contrary to the common assumption that "I" refers to the speaker and "you" to the listener, the meaning of these pronouns is largely influenced by the context of shared information and the focus of attention (Biber et al., 1999; Yeo & Ting, 2014). Understanding this dynamic is essential for enhancing writer/reader visibility.

The second aspect of corpus research on personal pronouns in academic writing examines the visibility of writers and readers in scholarly discourse. Research in this area highlights the importance of considering readers as a primary focus for academic writers during the text creation process.

Studies on the presence of readers compared novice and expert writers, or L2 and L1 English students (Tang & John, 1999; Ivanic & Camps, 2001).

In the comparison between L2 and L1 English students specifically, L2 student writers tended to use first-person pronouns more heavily than native English students (McCrostie, 2008; Lee & Chen, 2009).

Petch-Tyson (1998) investigated the visibility of writers and readers in academic writing, revealing that non-native English learners (specifically French, Dutch, Swedish, and Finnish students) tended to overuse first and second person pronouns in their argument essays compared to native English speakers, such as American learners.

There are three theoretical explanations for the relationship between personal pronouns and writer/reader visibility. Firstly, the construction of writer identity plays a significant role in the writing process, as it is often complex and multifaceted (Leedham & Fernandez-Parra, 2017). Secondly, the perceived time and distance between writers and readers may require writers to establish a certain level of interpersonal involvement in order to maintain intelligibility and clarity of meaning and content (Chafe, 1982).

The use of personal pronouns in academic writing is closely connected to how writers conceptualize their audience, as it reflects their awareness of speech events and their efforts to build rapport with readers (Fortanet, 2004).

The relationship between personal pronouns and visibility in writing can be explained by the linguistic resources they provide for understanding speech events (Halliday, 1994). However, the role of personal pronouns as reference devices can sometimes be ambiguous. Biber et al. (1999) highlighted the vagueness of the first-person plural pronoun "we," noting its overlap with "you" in terms of reference. This ambiguity in first-person pronouns is partly influenced by style manuals (Hyland, 2001) or the rhetorical choices of the writers (Martinez, 2005).

c. Functional categories of person pronouns

The third strand of corpus research on person pronouns examines functional categories through two key aspects: context independence and discourse function. Hyland and Jiang (2017) highlighted the context independence, particularly in relation to the degree of informality in academic writing, while Fortanet (2004) and Yeo and Ting (2014) explored the discourse functions of person pronouns in both academic written and spoken contexts.

RESEARCH METHODOLOGY

FINDINGS

DISCUSSION

CONCLUSION

Date posted: 11/07/2021, 16:49
