Introduction
Problem identification
Since the implementation of the 'open door' policy, Vietnam has attracted a significant influx of foreign investors, highlighting the critical importance of the English language in the country's industrialization and modernization efforts. As Vietnamese individuals recognize the necessity of English for effective communication in a globalized environment, the motivation to learn the language has shifted from a mere academic requirement to a strategic career move. Many are eager to acquire English skills to enhance their job prospects in foreign or joint-venture companies and to prepare for overseas assignments. In response to this demand, numerous foreign language schools have emerged, offering A, B, and C certificates to assess and validate learners' English proficiency. These certifications are increasingly valued as they can lead to improved career opportunities and promotions.
Founded in 1995, CTFLC has provided English classes to thousands of learners, awarding A, B, and C certificates to many of them. Similar to other foreign language centers in Vietnam, CTFLC offers a structured program where students progress from beginner to advanced levels through three distinct stages. Each level requires six months of study, divided into two three-month stages, with examinations held every three months. Upon passing these exams, learners receive certificates issued by MOET, which are recognized nationwide, validating their English proficiency at beginner (Level A), intermediate (Level B), and advanced (Level C) levels.
Despite having the necessary official permissions and English proficiency, CTFLC teachers do not create the A, B, and C English test versions for exams. Instead, these tests are developed by the CED under the MOET. This approach is taken to ensure standardized testing, which upholds fairness for all candidates and maintains the integrity of the exams. Additionally, it helps CTFLC mitigate negative rumors that may arise from competition with other centers.
CTFLC has maintained a strong reputation for quality and transparency, receiving no criticism from other centers regarding its testing procedures. However, recent complaints from candidates have highlighted concerns about varying difficulty levels among different versions of the written tests administered at different times, suggesting that these versions may not be consistent in content and challenge.
The researcher, serving as a teacher at the center, believes it is crucial to evaluate the reliability of the test materials developed by CED, MOET. This necessity prompted the initiation of the study.
Purpose of the study
To ensure fairness among examinees taking exams of the same level at different times, the CED of MOET must create equivalent written test versions that can be used interchangeably. These test versions should maintain the same level of difficulty and closely align in terms of theoretical basis and scoring validity.
The purposes of the study were therefore to estimate the reliability of the contents of the English Level-A written test versions and to propose solutions to any problems identified.
Research questions
The aim of the study was to investigate the reliability of the contents of the English Level-A written test versions issued by CED, MOET and administered at CTFLC, through the following questions:
1. To what extent are the English Level-A written test versions provided by CED, MOET to CTFLC reliable?
2. If not, what suggestions should be given to CED, MOET?
Limitation
The study exclusively examined the reliability of the English Level-A written test contents provided by CED, MOET, without assessing the speaking and listening components or their validity. Additionally, English tests at Levels B and C were not part of this research.
English Level-A Tests
According to MOET (official correspondence reference number 407/TC-BT dated January 12, 1994), an English Level-A course must consist of 400 periods, each lasting 45 minutes. Upon completion, learners are expected to communicate effectively on various topics such as family relationships, seasons, careers, habits, health, sports, entertainment, and food. They should be able to understand questions and respond appropriately, as well as comprehend short statements and identify key information in simple conversations related to these subjects.
Learners must showcase their reading skills by interpreting simple notices, signs, instructions, and short passages related to time, space, and possessions. In writing, they are expected to effectively compose notices, describe events, people, places, and accommodations, as well as create simple letters and postcards.
Upon completing the grammar course, learners will be proficient in using various types of nouns, including countable and uncountable nouns with "some" and "any," as well as compound nouns and complex noun phrases. They will also understand and apply pronouns, determiners, adjectives, adverbs, and modal verbs effectively. Additionally, students will master verb tenses such as the present simple, present continuous, and present perfect simple, along with their uses in narrative and reported speech, and will learn to form future tenses using "going to," the present continuous, the present simple, and "will" or "shall." The course covers essential verb forms, including imperatives, infinitives, gerunds, phrasal verbs, and passive forms, as well as causative structures. Furthermore, learners will explore conditional sentences, simple reported speech, and various prepositions related to time, location, and direction, alongside miscellaneous prepositions and prepositional phrases to enhance their grammatical competence.
The English Level-A test comprises a written component, which accounts for 90% of the overall assessment, and a speaking component, representing 10%. The written test evaluates grammar, vocabulary, reading comprehension, writing composition, and listening comprehension. MOET provides the written test versions and answer keys via postal service the day prior to the test. Meanwhile, the listening section of the written test and the speaking segment are designed and administered by CTFLC.
The written test typically follows a consistent format, comprising three subtests. Paper 1, READING COMPREHENSION, is divided into two sections. Section A features 15 to 20 multiple-choice questions that assess lexis and grammatical rules, where candidates must select the word or phrase that best completes each sentence. Section B contains five items focused on evaluating reading comprehension through a provided passage.
Paper 2 is USE OF ENGLISH, with four sections: A, B, C, and D. In Section A, candidates are required to complete a gapped text by choosing the best word for each blank from a selection of 10 to 15 words presented; there are 10 to 15 blanks in total. Section B requires candidates to transform given words or phrases into 10 complete sentences through necessary changes and additions. In Section C, candidates read a passage and answer 5 questions to demonstrate their reading fluency. Finally, Section D asks candidates to write a brief composition of 80 to 100 words on a specified topic, showcasing their writing skills.
Paper 3 is LISTENING COMPREHENSION, which is entrusted to CTFLC to design. This part usually consists of two or three sections with a total of 20 to 30 test items, including multiple-choice, true-false, and/or gap-filling items.
The written test versions are scored on a scale from 0 to 90, typically distributed as follows: Paper 1, Section A: 15 to 20 marks, Section B: 5 to 10 marks; Paper 2, Section A: 10 to 15 marks, Section B: 10 marks, Section C: 5 marks, and Section D: 10 marks; and Paper 3: 20 to 30 marks.
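For illustration, one hypothetical allocation that is consistent with these ranges and sums to the 90-mark scale can be checked in a few lines of Python; the figures below are assumptions for the sake of the example, not an actual CED, MOET distribution.

```python
# One hypothetical mark allocation consistent with the stated ranges;
# the actual split varies from version to version.
marks = {
    "Paper 1, Section A": 20,   # within the 15-20 range
    "Paper 1, Section B": 5,    # within the 5-10 range
    "Paper 2, Section A": 10,   # within the 10-15 range
    "Paper 2, Section B": 10,   # fixed
    "Paper 2, Section C": 5,    # fixed
    "Paper 2, Section D": 10,   # fixed
    "Paper 3 (Listening)": 30,  # within the 20-30 range
}
total = sum(marks.values())
assert total == 90, f"expected a 90-mark scale, got {total}"
print(total)  # 90
```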
To ensure a smooth testing process, CTFLC meticulously prepares materials and equipment in advance. The editing committee begins their work the evening before the test, typically around 5 or 6 p.m., to verify that the test version is error-free and that there are enough score keys and answer sheets. They photocopy and number the test materials and answer sheets consecutively, packaging them for each candidate in designated rooms, and securely sealing the packages. Additionally, all necessary equipment, including tapes, tape recorders, and loudspeakers, is thoroughly checked to confirm proper functionality.
All members of the editing committee must remain in designated separate rooms at the school until testing begins the following day, under the supervision of two local police officers throughout their stay and work.
Prior to the testing period, all candidates and proctors receive comprehensive instructions covering every aspect of the test, including the use of tape recorders, oral test procedures, the marking scheme, the rating system, and protocols for handling instances of cheating.
Candidates are also given detailed instructions in advance, e.g., test time, candidate index numbers, room numbers, what to bring, and what to do if they forget their identity cards.
The testing rooms at Cantho In-service University (CTIU) are spacious and designed to accommodate the required number of candidates while ensuring ample space between them to prevent cheating Each room is equipped with a loudspeaker system, providing a quiet and conducive environment for examinations.
Candidates must complete the written test in the morning, lasting 120 minutes from 7 a.m. to 9 a.m., followed by the speaking test in the afternoon at 1:30 p.m. It is mandatory for all candidates to arrive 15 minutes early. Each testing room is monitored by two proctors, who verify the identity of candidates upon entry and assign seating based on their index numbers displayed on the desks. Candidates are seated at separate two-seat tables to minimize the risk of cheating. Latecomers are only permitted to enter the room within 10 minutes after the test materials have been distributed.
Proctors must inform candidates ahead of time about their responsibilities, procedures for handling early finishers, and the penalties for cheating during the test. Additionally, candidates receive answer sheets and are instructed on how to fill in the necessary information.
At 7 a.m., proctors distribute test materials to candidates, who must remain seated until testing concludes at 9 a.m. After the time is up, candidates sign the candidates' list as their answer sheets are collected. Proctors then review the answer sheets before submitting them to the administration board for rechecking and sealing.
Significance of the study
The study aimed to enhance the reliability of the English Level-A written tests provided by CED, MOET, ensuring fairness for all candidates seeking Level-A English certificates across various testing occasions.
Organization of the study
The thesis is structured into five chapters, with Chapter 1 outlining the study's rationale, objectives, and limitations. It provides an overview of the English Level-A tests, detailing course objectives, test formats, preparation methods, administration processes, marking criteria, candidate backgrounds, and the role of examiners at CTFLC. The chapter concludes by highlighting the significance and organization of the research.
Chapter 2 presents a literature review that highlights the significance of testing in language teaching and learning. It outlines various types of testing and test techniques, followed by an introduction to the essential qualities of a good test: validity and reliability. Additionally, the chapter addresses factors that can affect reliability, offers potential solutions, and explores the relationship between validity and reliability.
Chapter 3 will demonstrate the methodology rationale, the research methodology, and the methods of data analysis. Then Chapter 4 will present the data analysis and discuss the findings. Finally, Chapter 5 will give a conclusion and offer some suggestions as well as implications for further study.
Literature Review
Language testing
2.1.1 The importance of testing in language teaching and learning
Language tests serve dual purposes: they provide valuable insights for decision-making in educational programs and act as indicators of relevant abilities or attributes in research related to language, language acquisition, and language teaching (Bachman, 1997). Additionally, McNamara emphasizes the significance of these assessments in understanding language-related phenomena.
Language tests significantly impact individuals' lives by serving as crucial gateways in education, employment, and immigration. They assist teachers in making informed decisions about student placements, measuring achievements, assessing performance, and motivating learners. A well-designed test fosters positive attitudes towards learning by providing a sense of accomplishment and aligning evaluations with taught material. Furthermore, effective tests encourage learners to study diligently, focus on course objectives, and identify areas for improvement. Testing and teaching are deeply interconnected, making it challenging to engage in one without considering the other. While tests are not the sole method of evaluating learning, they are essential in the language teaching and learning process, as they facilitate rational educational decisions. Thus, the importance of testing in language education cannot be overstated.
However, not all language tests are of the same kind. They differ with respect to how they are designed and what they are used for. Tests can therefore be classified according to their testing method or their testing purpose.
Language assessment methods generally include traditional paper-and-pencil tests and performance tests (McNamara, 2000). Paper-and-pencil tests focus on evaluating specific language components, such as grammar and vocabulary, or receptive skills like listening and reading comprehension. These tests often utilize fixed response formats, with multiple-choice questions being the most common. The advantages of multiple-choice tests include their efficiency in administration and scoring, as well as the reliability of their results (Hughes, 2003), which is why they are referred to as objective tests. However, they are less effective in assessing productive skills such as speaking and writing, except in an indirect manner.
To assess candidates' productive skills in speaking and writing, performance tests are employed, where a sample of speech or writing is collected and evaluated by trained raters following a standardized rating procedure (McNamara, 2000: 6). These samples typically reflect real-world tasks within realistic contexts. However, a notable drawback of these tests is their potential subjectivity and unreliability, leading to their classification as subjective tests.
Language tests are categorized based on their intended purpose, with the same test format often serving multiple objectives. Typically, these assessments are meticulously crafted to meet specific goals.
According to Hughes (2003) and Davies (2000), the most common types of language tests are proficiency, placement, achievement, progress, and diagnostic tests, each serving a distinct purpose. Brown (1996) categorizes these tests into two main types: norm-referenced tests, which assist administrators and teachers in making program-level decisions regarding general language abilities, and criterion-referenced tests, which aid in classroom-level decisions by evaluating specific language objectives.
Proficiency tests assess an individual's language ability without considering their prior educational experiences, focusing instead on future language use (McNamara, 2000: 7). These assessments are essential for evaluating learners' skills against widely recognized standards, with prominent examples including the TOEFL, IELTS, and UCLES.
Placement tests assist in the placement of new learners by identifying the stage or part of a teaching program most appropriate to their ability. They are "essential in large institutions that frequently receive new students" (Davies, 2000).
Achievement tests evaluate learners' performance throughout a course and assess their success in meeting the course objectives. These tests are closely aligned with the goals of language courses and are typically administered at the conclusion of the program. They are created and overseen by educational ministries, official examination boards, or faculty members from teaching institutions (Hughes, 2003).
Progress achievement tests, also known as short-term achievement tests, are designed to assess learners' understanding and retention of material after each lesson or unit in a language course. According to Davies (2000), these tests evaluate what students have learned at various stages of their education. Hughes (2003) emphasizes that their primary purpose is to measure learners' progress, often concentrating on recently introduced and practiced language concepts.
Diagnostic tests play a crucial role in identifying learners' strengths and weaknesses, as well as pinpointing areas requiring additional support. These assessments enable teachers to tailor their instruction to meet the diverse needs of both the group and individual students, making them particularly beneficial in mixed-level classrooms.
Tests are selected based on the specific goals of educators or assessors, as outlined in sections 1.5.1, 1.5.2, and 1.5.3 of Chapter 1. This study focuses on criterion-referenced tests designed to evaluate clearly defined objectives set by the Ministry of Education and Training (MOET) for candidates seeking certification of their English proficiency level.
Each test version for the English Level-A course, organized by various FLCs and developed by CED, MOET, includes both objective and subjective assessments, such as multiple-choice, gap-filling, writing, and speaking tasks. Learners who successfully complete these tests receive certificates issued by MOET, which are recognized nationwide and often required for recruitment or promotion in many Vietnamese organizations. While CED, MOET provides these tests, several FLCs and schools also have the authority to issue their own elementary tests; however, there is a lack of integration among these institutions in test design, aside from the guidelines set by MOET. It is important to note that a candidate who fails a Level-B test cannot thereby be considered to have achieved Level-A proficiency. Thus, these assessments serve as achievement tests for elementary English proficiency.
A high-quality test must possess both validity and reliability, which are essential qualities that ensure its effectiveness. The following section explores the significance of these two critical aspects.
Test qualities
According to Bachman (1997), key characteristics of effective communicative testing include reliability, validity, authenticity, impact, and practicality. Davies (2000) emphasizes the importance of understanding the concepts of validity and reliability, as well as their interrelationship, for those involved in creating significant assessments.
The interpretation and utilization of language ability measures are crucial for effective testing, as highlighted by Bachman (1997). These essential measurement qualities justify the reliance on numerical test scores for making informed inferences and decisions (Bachman & Palmer, 1996). Consequently, the primary qualities to consider in test development and application are validity, which pertains to the interpretation and use of tests, and reliability, which relates to the test scores themselves.
Validity is the most crucial aspect of test interpretation, defined as the accuracy with which a test measures its intended purpose (Alderson et al., 1995; Heaton, 1989; Hughes, 2003). It pertains to how test scores are utilized and interpreted, making it inherently linked to the test's intended purpose. Therefore, testers must prioritize whether a test effectively measures what it claims to measure; if a test lacks validity for its intended use, the resulting scores may not accurately reflect their intended meaning.
Validity can be established through various methods, prompting writers to discuss different types of validity. These types primarily include internal validity and external validity, which serve as distinct approaches to evaluating the overall validity of a test.
Internal validity refers to the validity of the test itself. According to Alderson et al. (1995), assessing the internal validity of a test can be done through several methods, with three of the most common being face validation, content validation, and construct validation. Face validation involves feedback from non-testers, such as students and administrators, regarding the test's value. Content validation is conducted by testers or experts in the relevant field who evaluate the test's content. Lastly, construct validation assesses how effectively test performance reflects a meaningful measure of a specific characteristic or quality (Gronlund, 1985: 58).
Face validity refers to the extent to which a test seems credible or acceptable to the public, specifically regarding its ability to measure the knowledge or skills it purports to assess, as judged subjectively by an observer (Hughes, 2003).
According to Alderson et al. (1995), a test possesses content validity when it includes a representative sample of the relevant language skills and structures. Additionally, there should be a clear connection between the test items and the course objectives. Therefore, establishing content validity requires a thorough and logical analysis of both the language being assessed and the specific objectives of the course.
Construct validity is essential for language tests as it aligns with specific characteristics based on a theory of language behavior and learning. According to Hughes (2003), these tests aim to measure theoretical constructs such as reading ability, speaking fluency, and grammatical control. As a result, the term 'construct validity' has evolved to encompass the broader concept of validity. Moreover, construct validity significantly impacts content validity, which in turn influences face validity.
External validity, as described by Alderson et al. (1995), pertains to the relationship between a test and other measures, with criterion-related validity being the most common type. This includes both concurrent and predictive validity, highlighting the importance of assessing how well a test correlates with other established measures.
Concurrent validity refers to the evidence of a test's validity obtained by comparing its results with external measures of the same construct. This involves evaluating the test against established criteria, such as existing valid tests or independent assessments, like teacher ratings, conducted simultaneously. Essentially, a valid test should yield results that align closely with other valid measures intended for the same purpose.
Predictive validity refers to a test's ability to forecast future success, primarily seen in proficiency tests. Unlike concurrent validity, which assesses external measures simultaneously with the test, predictive validity evaluates outcomes after a time lapse. This can be quantified by correlating learners' test scores with their later performance on exams, placement success, or assessments from subject or language teachers.
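Both forms of criterion-related validity are typically quantified with a correlation coefficient between test scores and the criterion measure. The sketch below is a minimal illustration using invented scores; neither the data nor the computation is taken from the thesis or from CED, MOET materials.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two paired score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: candidates' test scores paired with an external
# criterion, e.g., independent teacher ratings on the same 0-90 scale.
test_scores = [72, 58, 81, 64, 90, 55, 70, 77]
teacher_ratings = [70, 60, 85, 61, 88, 50, 73, 75]

print(round(pearson(test_scores, teacher_ratings), 2))
```

A coefficient close to 1 would indicate strong agreement between the test and the criterion measure; a low coefficient would cast doubt on the test's criterion-related validity.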
Validity in scoring is crucial for ensuring a test's effectiveness, as both the test items and the scoring methods must be valid (Hughes, 2003). For instance, if an oral test aims to assess a candidate's speaking ability but scores are influenced by the candidate's knowledge of the subject or culture, the assessment lacks validity. This underscores Weir's (2005) perspective that reliability, characterized by stability and consistency, is a key component of validity, which he terms 'scoring validity.'
There is a significant interconnection among the various types of validity in testing. A valid test must possess internal validity, which encompasses construct validity that influences content validity, ultimately affecting face validity. Additionally, external validity, including both concurrent and predictive validity, is essential. Furthermore, a test is deemed valid only when both its items and the scoring methods for responses are valid.
This study does not assess the validity of the Level-A written test contents, but it provides foundational theory to support the analysis and recommendations presented in Chapters 4 and 5.
Another quality of real importance in a language test is reliability: the quality of test scores. It is also the major consideration in this study.
Reliability in measurement is characterized by its consistency, as defined by various authors. Bachman and Palmer (1996) describe it as the "consistency of measurement," while Weir (2005) emphasizes its stability over time and freedom from bias. Hughes (2003) adds that a language test is considered reliable if it yields consistent scores for the same students across different occasions and raters. Despite varying definitions, all authors converge on the idea that reliability fundamentally pertains to the stability and consistency of test scores.
Conclusion
This chapter provides essential theoretical insights into test reliability in language teaching and learning, emphasizing the significance of testing and various test types. It highlights two critical qualities of tests: validity and reliability, and examines the factors that influence test reliability along with potential solutions. Additionally, the chapter explores the relationship between validity and reliability, offering a comprehensive framework that aids researchers in assessing the reliability of the English Level-A written test contents issued by CED, MOET at CTFLC.
To ensure an effective test, it must possess both validity and reliability. Validity refers to the degree to which a test accurately measures its intended purpose, while reliability, or scoring validity, indicates the stability and consistency of test results over time, ensuring they are free from bias. Test scores can be influenced by two primary sources: meaningful variance, which reflects true differences, and error variance, which encompasses inaccuracies in measurement.
Meaningful variance refers to the variance that is directly associated with the objectives of a test. According to Brown (1996: 186), this issue fundamentally relates to test validity. Error variance, by contrast, arises from sources unrelated to those objectives, one of which is errors originating from the test itself, such as flaws in the design of test items.
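In the standard classical test theory formulation underlying Brown's account (the formula itself is not quoted in the thesis), observed-score variance decomposes into meaningful (true-score) variance and error variance, and reliability is the proportion of observed variance that is meaningful:

```latex
% Standard classical test theory decomposition (assumed here;
% not quoted from the thesis or from MOET documents).
\[
  \sigma^2_X = \sigma^2_T + \sigma^2_E,
  \qquad
  r_{xx'} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}
\]
% Here $\sigma^2_X$ is the observed-score variance, $\sigma^2_T$ the
% meaningful (true-score) variance, $\sigma^2_E$ the error variance,
% and $r_{xx'}$ the reliability coefficient.
```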
Measurement errors, which can impact test reliability, stem from four primary sources: administration procedures, scoring procedures, the test itself, and candidate-related factors. These errors can significantly influence the accuracy and consistency of test results.
Effective administration procedures are crucial for ensuring the reliability of tests, as any shortcomings can lead to inconsistencies and unfairness for candidates. At CTFLC, all testing protocols, including instructions for testing rooms, candidates, proctors, equipment, and timing, have been meticulously followed. The center has successfully conducted 48 English Level-A courses and test administrations without any complaints from candidates regarding the procedures. This success is attributed to the proctors' extensive experience in overseeing various important exams. Additionally, each English Level-A test is conducted in accordance with the Education Ministry's regulations and is supervised by police officers and representatives from the City Education Department. Therefore, it can be concluded that the administration procedures for the English Level-A tests at CTFLC are both rigorous and reliable.
Consistency in scoring procedures is crucial for ensuring the reliability and fairness of tests. Objective tests are generally more reliable than subjective ones, as inconsistencies in marking subjective tasks can arise from several factors: inadequate selection and training of raters, lack of defined marking schemes or rating scales, and the potential for raters to be influenced by personal biases.
At CTFLC, the Level-A written tests are evaluated by a marking committee comprised of eight to ten experienced English teachers, including two males and eight females. Each member has a minimum of seven years of teaching experience at CTIU and CTFLC, as well as seven years of experience as raters. All committee members are graduates of Cantho University and reside in Cantho city, and all are aged 29 or above.
The marking committee usually starts working the day after the testing day.
To ensure objective and reliable scoring while minimizing variability among raters, a comprehensive answer key with detailed scoring guidelines and alternative responses is provided for objective test items like multiple choice and gap filling. These marking schemes, validated against the written test versions from CED, MOET, are meticulously checked for accuracy and completeness before distribution to raters. For subjective items such as composition writing, the marking committee collaboratively establishes clear rating criteria prior to scoring. The composition writing section constitutes 14 to 17 percent of the overall test and involves marking a substantial number of scripts, typically ranging from 300 to 500. To maintain objectivity, scripts are coded numerically, and each is evaluated independently by two raters, with the final score being the average of their assessments. If there is a significant discrepancy in ratings, a third rater or the committee chair re-evaluates the script, thereby ensuring inter-rater reliability in the scoring process.
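The two-rater averaging rule and the third-rating fallback described above can be summarized in a short sketch. This is a minimal illustration under stated assumptions: the discrepancy threshold is hypothetical, since the thesis does not specify how large a gap between the two ratings triggers re-evaluation.

```python
# Hypothetical threshold (in marks); the actual trigger for a third
# rating is not stated in the thesis.
DISCREPANCY_THRESHOLD = 2

def composition_score(rater1, rater2, third_rating=None):
    """Average two independent ratings; defer to a third rating
    (by a third rater or the committee chair) when they diverge."""
    if abs(rater1 - rater2) > DISCREPANCY_THRESHOLD:
        if third_rating is None:
            raise ValueError("discrepancy too large: third rating required")
        return third_rating
    return (rater1 + rater2) / 2

print(composition_score(8, 9))                    # 8.5 -- within threshold
print(composition_score(6, 9, third_rating=7))    # 7 -- third rating decides
```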
Intra-rater reliability, alongside inter-rater reliability, plays a crucial role in assessing rater consistency. The English Level-A written test comprises both objective and subjective items, with the majority being objective. Subjective items account for only 14 to 17 percent of the total test, which minimizes the risk of scoring inconsistencies, particularly concerning intra-rater unreliability.
The scoring of objective test items in the English Level-A written tests at CTFLC is conducted using well-established marking schemes, ensuring reliability. Although subjective test items account for only 14 to 17 percent of the total assessment, their scoring demonstrates inter-rater reliability, while intra-rater reliability is less significant. Overall, the scoring procedures adhere to expert recommendations, leading to the conclusion that they are, to a considerable extent, reliable.
The reliability of the English Level-A written tests at CTFLC is influenced by two key factors: potential errors within the test version and mistakes made by candidates.
CTFLC has made significant efforts to minimize measurement errors in the English Level-A tests through careful administration and marking processes, as outlined in sections 1.5.3, 1.5.4, 1.5.5, and 1.5.7. However, candidates, who come from diverse backgrounds in the Mekong Delta, may be influenced by individual characteristics such as cultural background, knowledge, cognitive abilities, gender, age, attitudes, health, and sleep patterns. These factors can lead to misinterpretation of test instructions, forgetting instructions, omitting test sections, or misreading items. Therefore, individual candidate factors are considered outside the scope of this study.
To ensure reliability and fairness in testing, interchangeable test versions of the same level can be utilized. For instance, if a specific test version is unavailable, an alternative version can be used that yields equivalent results for candidates. At CTFLC, the English Level-A written tests are created by CED, MOET, not by CTFLC members. Recently, there has been an increase in complaints regarding the quality of these written tests, with candidates noting that the current version is more challenging, featuring unfamiliar vocabulary and complex structures compared to previous versions. This study aims to investigate potential issues related to the content of the English Level-A written test versions.
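One straightforward way to probe such complaints is to compare descriptive statistics of the scores obtained on different versions. The sketch below uses invented score lists purely for illustration; the study's actual data and analysis appear in Chapter 4, and a fair comparison also assumes the candidate groups at the two sittings are comparable.

```python
import statistics

# Hypothetical score samples from two sittings that used two
# supposedly interchangeable Level-A written test versions.
version_1 = [70, 65, 80, 58, 74, 69, 77, 62]
version_2 = [60, 55, 68, 49, 63, 58, 66, 52]

for name, scores in [("Version 1", version_1), ("Version 2", version_2)]:
    print(name,
          "mean:", round(statistics.mean(scores), 1),
          "sd:", round(statistics.stdev(scores), 1))

# A markedly lower mean on one version, given comparable candidate
# groups, would support the complaint that the versions differ in
# difficulty and are therefore not truly interchangeable.
```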
Estimating test reliability is a complex task that demands scientifically validated evidence. In this study, the researcher, acting as the test administrator, sought to identify objective evidence to determine whether any errors stemmed from the content of the written test versions.
How the research was carried out will be carefully described in the next chapter, Chapter three: Methodology.