A Taxonomy: Evaluating Reading Comprehension in EFL
By Cheryl L. Champeau De Lopez, Giancarla Marchi B., and Maria E. Arreaza-Coyle
About 400 million people in the world today use English as a second or foreign language. Many of them are professionals whose success or failure may well depend on their ability to read the latest scientific and technical publications in English. For this reason, courses whose specific objective is the reading of scientific and technical texts are becoming increasingly common in universities and technical colleges throughout the world.
Venezuela is no exception. At the Simón Bolívar University in Caracas, the first-year English program is composed of three courses designed by language department professors to meet the needs of students who will major in different areas of science and technology. The main objective of these courses is to develop the skill of reading scientific and technical texts in English, since students will be expected to understand English books and journals during their undergraduate studies and research and, later, in their professional activities.
Because reading in English is so important in today’s scientific world, the three courses are compulsory for all first-year students. Each term they are taught to approximately 1,000 students divided into 30 to 36 sections taught by 18 to 20 professors. This situation created the need for standardized criteria to ensure that all students reach a similar level, so it was decided to administer two departmental exams each term. At first, controlled open-ended questions were used. To validate the correction of the exams, groups of exams were exchanged among different professors: each professor corrected one lot and then checked her own group of tests. In spite of the control over the type of questions and the list of correction rules, it was found that professors graded the exams differently. Another problem with this type of exam was the difficulty of separating reading from writing. Students were being penalized for errors in writing, when what we really wanted to test was reading.
In an attempt to solve these problems, in 1978 the language department started to use multiple-choice questions. This type of objective question separates reading from writing skills and offers several advantages over open-ended questions: a) high scorer reliability, b) easy implementation, c) quick and easy collection, and d) easy determination of difficulty and discrimination levels.
It was also decided to use modular type questions, i.e., short independent texts for each question, rather than one or two longer readings, followed by numerous questions. By using 20 to 25 short texts on a variety of technical topics, we hoped to compensate for any advantage previous knowledge of a specific subject might afford a particular student. It should be pointed out that in our first year courses, students who will later major in different areas of pure science or engineering are mixed together in the same classes, so interests and background knowledge are diverse. The use of modular items also reduces the possibility of inter-item dependence, a condition which can reduce the discriminative ability of the items, and, therefore, the reliability of the scores (Haladyna 1994). Finally, the use of 20 to 25 different texts reduces the possibility that students will remember the questions and communicate this information to others.
The efficiency of multiple-choice items depends to a great extent on their design. The options of a good question must be plausible cognitive tasks related to and derived from the content of the text. The syntactic and semantic form of the questions must differ from that of the text, so that to answer correctly students must understand the meaning rather than simply recognize the form. However, since the options are prefabricated answers, they may reduce the interaction between reader and text and inhibit the interpretation process (Widdowson 1978). After weighing the advantages and disadvantages for our particular situation, we decided that multiple-choice items were the most objective and efficient way to measure the reading skill of large groups of students. Because of their limitations, these exams count for only 50% of the final grade, allowing teachers to complete the grade with other types of evaluation.
In recent years more than 1,200 questions have been collected, so it has become necessary to organize these questions and create a computerized program which could store them and prepare an exam by selecting the most appropriate questions to be used in a given evaluation. For this purpose, the following taxonomy was designed.
________________________________________
The taxonomy
A multiple-choice item used to test reading comprehension usually consists of three parts: the reading text, the question or stem, and the options. Although several taxonomies exist, most describe only the type of question or stem, making no reference to the nature of the reading text or the options, and most refer to open-ended questions rather than multiple-choice items.
The first taxonomy, and probably the best known, was published by Bloom et al. in 1956. Its main purpose was to classify educational objectives, but it was later also applied to instruction and evaluation. It is divided into three large areas or domains: (a) the cognitive domain, (b) the affective domain, and (c) the psychomotor domain. The cognitive domain refers to the intellectual activities involved in learning and is composed of a six-level hierarchy: knowledge, comprehension, application, analysis, synthesis, and evaluation. This taxonomy was very influential because it emphasized the complexity of the cognitive activities involved in learning and the fact that all of them must be taught and evaluated. For our purposes, its limitations are that it covers far more than reading comprehension and does not include the specific processes involved in understanding a written text.
In 1978, Herber tried to relate Bloom’s categories to three levels of reading comprehension: (a) literal comprehension, (b) interpretive comprehension, and (c) applied comprehension. Literal questions require the reader to recall or recognize information explicitly presented in the reading material. Interpretive questions ask for a paraphrase, explanation, inference, conclusion, or summary. Applied questions utilize the readers’ background knowledge and lead them to evaluate, elaborate, predict, or solve problems based on implicit information in the text.
Pearson and Johnson (1978) present a taxonomy of word comprehension tasks with nine levels and a taxonomy of propositional comprehension tasks, also with nine categories. Their question taxonomy, however, consists of only three levels: (a) textually explicit questions, (b) textually implicit questions, and (c) “scriptally” implicit questions. These categories correspond roughly to Herber’s and to what Gray (1960) described as the ability to “read the lines, read between the lines, and read beyond the lines.” In contrast to the taxonomy presented by Bloom et al., these two taxonomies refer specifically to reading comprehension and are important because they emphasize the relationship between the question and the source of the answer, thus reflecting the relationship between the text and the reader. For our purposes, however, both are too general.
Barrett’s taxonomy also refers to questions related to reading comprehension and is far more detailed than the ones mentioned above. Barrett proposes four main categories: (a) literal recognition or recall, (b) inferences, (c) evaluation, and (d) appreciation. Each of these levels contains between four and eight categories. As the reader will see, some of these categories are similar to those in our taxonomy. For example, Barrett mentions recognition or recall of sequence (1.3.) and/or of cause and effect relationships (1.5.). Our taxonomy also deals with these and other rhetorical patterns, but only at the level of recognition, since it refers only to multiple-choice items, not open-ended questions. Another difference between our system and Barrett’s is that several of Barrett’s categories refer to the analysis of literary texts (e.g., 4.2. identification with characters and incidents). Since our taxonomy was designed for scientific and technical readings, it contains no such categories.
Elijah and Legenza (1975) present a taxonomy based largely on Barrett’s (1968) and Sanders’s (1966) publications. They also describe four main levels of comprehension (literal, interpretive, reaction, and application), with numerous subcategories. This system mentions several tasks not taken into account by Barrett, such as interpreting unfamiliar words (1.B.1.) and summarizing (1.C.2.). However, it includes numerous activities which could not be tested using multiple-choice items.
Irwin’s taxonomy (1986) best reflects the interactive theory of reading comprehension. Irwin separates questions at the level of micro-information (concerning word meaning or syntactic relationships) from questions at the level of macro-information (main ideas and summaries). Although this taxonomy contains numerous categories which would be useful in classroom discussions, many of them would not be applicable in multiple-choice exams. For example, Irwin includes previous knowledge and metacognitive processes in her system. Questions of this type would certainly be very important in teaching the mental processes needed to understand a reading (“comprehending”), but not in measuring the level of understanding which has taken place (“comprehension”) (Chapman 1976). It should also be pointed out that although our system uses some of the same terms Irwin uses, these terms do not necessarily have the same meaning in both taxonomies.
While these and other taxonomies classify types of questions without mentioning the text and options, Arcay and Cossé (1992) present a system which categorizes certain types of texts. Their system groups both fictional and non-fictional texts according to form, content, and organization. Arcay and Cossé’s taxonomy covers many more areas of form and content than ours, but in a more general way. They make no attempt to classify comprehension questions referring to these texts.
After reviewing these and other taxonomies, it became evident that none satisfied our need for a system to classify multiple-choice items used to test reading comprehension of scientific and technical texts. We therefore decided to design our own taxonomy for this purpose. Since most of the items we use contain three parts (the reading text, the stem, and the options), our system takes these three main areas into consideration. Furthermore, it takes into account the interactive and constructivist reading models on which the first-year program is based (Rumelhart 1977; Stanovich 1980; Flower and Spivey in Cornish 1991; Goodman in Carrell et al. 1988; Widdowson 1984, 1990). These models present reading as a dynamic process in which bottom-up and top-down processes interact to create meaning.
The taxonomy we created to overcome some of the limitations of existing ones (Figure 1) is described below.
________________________________________
The reading text
Four basic criteria are used to classify the reading text: A. Subject, B. Rhetorical Patterns, C. Sources, and D. Form.
The first general category (A. Subject) is further divided into three main groups: (1) Humanities and Social Sciences, (2) Physical Sciences, and (3) Biological Sciences, each of which is broken down into specific disciplines. Due to the growing interdependence of many fields nowadays, which is reflected in many of the reading texts, a reading may be classified as belonging to more than one subject category. For example, a text describing the use of computers in education would be classified under systems engineering (computers) (I.A.2.b.1.g.) as well as education (I.A.1.5.). The category “others” exists throughout the system to include texts or questions which exemplify a complex combination of several categories or which illustrate unique areas not frequent enough to merit a separate category.
The text is also classified according to its predominant rhetorical pattern. The study guide used in the first two trimesters of the reading course is organized around the patterns most commonly found in scientific and technical writing, and the patterns used to classify the reading text closely follow those emphasized in class: definition, static description, classification, comparison-contrast, chronology, process, cause-effect, hypothesis, argumentation, and exemplification.
The final two categories refer to source and form. The source is identified according to the style in which the text is written and the type of information it contains. For example, a textbook would be written in an objective style and contain explanations of basic concepts and well-known, generally accepted information, whereas a journal would describe recent investigations or discoveries in technical language for specialists and researchers. The form refers to the graphic appearance of the text: it may be an extract from an article or book, a list of sentences to be placed in the correct order, a table or graph, a page from a dictionary, an abstract, etc.
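Part I of the classification can be pictured as a record attached to each reading text. The following is a hypothetical Python sketch, not the department’s program: the class and field names are our own, subject codes are kept as plain strings, and the enumeration simply lists the ten rhetorical patterns named above. Storing subjects as a list lets one text belong to several subject categories, as in the computers-in-education example.

```python
from dataclasses import dataclass
from enum import Enum


class RhetoricalPattern(Enum):
    # The ten patterns used to classify the reading text.
    DEFINITION = "definition"
    STATIC_DESCRIPTION = "static description"
    CLASSIFICATION = "classification"
    COMPARISON_CONTRAST = "comparison-contrast"
    CHRONOLOGY = "chronology"
    PROCESS = "process"
    CAUSE_EFFECT = "cause-effect"
    HYPOTHESIS = "hypothesis"
    ARGUMENTATION = "argumentation"
    EXEMPLIFICATION = "exemplification"


@dataclass
class TextClassification:
    # Hypothetical record for Part I: subject(s), pattern, source, and form.
    subjects: list[str]          # one or more subject codes, e.g. "I.A.1.5."
    pattern: RhetoricalPattern   # the predominant rhetorical pattern
    source: str                  # e.g. "textbook", "journal"
    form: str                    # e.g. "extract", "table or graph", "abstract"

    def __post_init__(self) -> None:
        # Every text must belong to at least one subject category.
        if not self.subjects:
            raise ValueError("a text must have at least one subject code")

    def has_subject(self, code: str) -> bool:
        return code in self.subjects


# A text on computers in education, filed under two subject codes.
# The pattern, source, and form values are invented for illustration.
computers_in_education = TextClassification(
    subjects=["I.A.2.b.1.g.", "I.A.1.5."],
    pattern=RhetoricalPattern.EXEMPLIFICATION,
    source="journal",
    form="extract",
)
```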
________________________________________
The question (stem)
The question may take one of several forms. It may be a sentence separate from the text which must be completed with one of the options. It may be in the form of a question to be answered by an option, or it may take the form of instructions such as the following: “Form a coherent paragraph by choosing the correct order of the following sentences.” There may simply be a blank space left in the reading text which must be completed with one of the options.
Regardless of the form the question takes, this part of the classification system attempts to categorize the cognitive process the reader must undergo to reach the correct answer. To decide which category is appropriate, one must often look not only at the question but also at the text and options, for example to see whether the information needed is explicit or implicit. In other words, classifying the question requires considering not only the type of information requested but also the relation between the question and the source of the answer.
The classification of the question has been divided into two general categories: A. Microinformation and B. Macroinformation. Questions in the first category can be answered by understanding or recognizing only specific sentences, phrases, or key words of the text. The reader does not necessarily have to read or understand the entire text but must be able to identify the parts of the reading referred to in the question. For this task the reader depends mainly on his/her linguistic schemata (vocabulary and grammar): s/he must be able to group words together to form meaningful phrases and recognize syntactic relationships. In these tasks, bottom-up processing is very important. To answer a question classified as Macroinformation, the reader must read the entire text and integrate information found in different parts of the reading. To do this, s/he must draw upon his/her formal and content schemata. In these tasks, the importance of top-down processing becomes evident.
A. Microinformation: Within the category of Microinformation, the taxonomy includes thirteen tasks which a reader may be asked to perform. Regarding vocabulary, a reader may be asked to determine the meaning of a word based on the context in which it appears (II.A.1.). In this type of question, the options all contain valid definitions of the word, so the question does not become a simple dictionary exercise. In category II.A.2., the reader is asked to identify the word or phrase which a particular noun or pronoun refers to, thus establishing cohesive relationships of an anaphoric or cataphoric nature.
In order to demonstrate his/her comprehension of the relationship among the different propositions presented by the author, the reader may be asked to select the appropriate connector or the appropriate usage of a given connector (II.A.3.). For example, by choosing the connector “nevertheless” in the following blank, the reader is demonstrating his/her recognition that the relationship between the first and second parts of the sentence is one of contrast:
The results were convincing; _________, further evidence from research was called for.
To determine if the reader has comprehended explicit information which appears in the text, s/he may be asked to select the most appropriate paraphrase for this information or simply to recognize the answer in specific parts of the text (II.A.4.). Category II.A.5. refers to items in which the stem appears in the form of a question, and the reader is asked to demonstrate understanding of explicitly stated facts in the reading.
The next eight categories require the reader to recognize the different rhetorical patterns used by the author. The reader may be asked to identify the words which are defined in the text (II.A.6.); to recognize the elements being compared, the basis for the comparison, or the relationship between two or more elements being compared (similarities or differences) (II.A.7.); to recognize the criteria used by the author to classify specific elements and the relationship between these elements (II.A.8.); or to recognize the sequence (chronology or process) used by the writer, or the sentence which appropriately describes the relationship between steps or stages in the sequence (II.A.9.). The reader may also be required to distinguish between reasons or motives and consequences clearly and explicitly described in the text by identifying the cause and/or effect of a particular action or event (II.A.10.), to identify an idea as having been presented in the original text as either fact or hypothesis (II.A.11.), or to identify what is being described in the reading (II.A.12.). Finally, the reader may be asked to identify the rhetorical function of the text. In these questions the options do not include information specific to the particular text; the reader need only recognize key words indicating specific functions (II.A.13.).
B. Macroinformation. The category of Macroinformation is broken down into Analyze and Interpret. In questions which fall into the first of these categories, Analyze, the reader must examine and relate information which is explicitly present in different sections of the text. In addition to linguistic schemata, the reader must also utilize his/her formal schemata (Carrell et al., 1988) regarding the rhetorical organization of different types of texts. In questions classified in the second group, Interpret, the reader must go beyond the explicit information found in the text. S/he must elaborate, infer, or predict. In order to do this, s/he must rely heavily on content schemata.
1. Analyze. There are eight possible tasks within the category of Analyze. The reader may be asked to place a list of sentences in the correct order to form a coherent paragraph (II.B.1.1.). To do this, s/he must recognize the different indicators of text cohesion and identify propositional relationships between sentences at various levels.
To evaluate whether the reader is able to transcode information from a text to a graph or diagram, s/he may be asked to recognize the most appropriate graphic representation of the information presented verbally in the reading (II.B.1.2.). S/he may also be required to select the best verbal interpretation of information which appears in a table or diagram (II.B.1.3.).
In some cases, the reading material may be composed of two short texts from different sources. In these instances the readings describe two different ideas, theories, or opinions on a given subject. The reader is asked to compare some aspect of the two texts (style, concepts presented, source, author’s purpose, etc.) (II.B.1.4.).
Two types of questions require the reader to recognize the structure or organization of the entire text. In the first, the reader must recognize textual inconsistencies. In these questions s/he is required to identify the sentence or idea which does not fit into an otherwise coherent paragraph, based on inconsistencies of either a linguistic or conceptual nature (II.B.1.5.). In the second, the reader must identify the logical progression of the text; s/he must recognize the manner in which the author presents his/her ideas (for example, inductively or deductively), or the order in which they appear (II.B.1.6.).
The last two categories under Analyze test comprehension of explicit ideas presented in the reading. In II.B.1.7., the reader must integrate information explicitly present in different parts of the text in order to draw a conclusion or make a deduction. In II.B.1.8., the reader is asked to predict what follows the information presented in the text. This may take the form of completing the last sentence of the reading or predicting what the next sentence or paragraph will probably deal with.
2. Interpret. The category of Interpret includes eleven possible tasks. In II.B.2.1., the reader is requested to identify the main idea of the reading, i.e., the message which the author wants to transmit. Regarding this category, we agree with the interpretation of Pearson and Johnson rather than that of Barrett. Barrett specifies two categories: 1.2. recognition or recall of main ideas and 2.2. inferring the main idea. By using only one category for identification of the main idea, our taxonomy reflects the opinion of Pearson and Johnson, who believe that almost all main ideas are inferences, even when they are explicitly stated in the text. The reason for this is that there are generally no grammatical or lexical clues in the text to indicate that a specific sentence reflects the main idea of the reading. The reader must infer which sentence encompasses the ideas presented in all the other sentences.
In II.B.2.2., the reader must identify the author’s objective, goal, or purpose in writing the text. In these questions the purpose must be specific to the particular text; recognizing general function words (see II.A.13.) is not enough.
Category II.B.2.3. requires the reader to select the best title for the text. To do this, s/he must be able to recognize the main idea and/or purpose of the author and express it in a phrase which probably does not appear in the reading.
In the following two categories, the reader should consider the style, language, and format used by the author to identify the probable source of the text (II.B.2.4.) and the readers for whom it was written (II.B.2.5.).
Categories II.B.2.6. and II.B.2.7. refer to the author’s point of view. In the first, the reader should recognize the tone used by the author, e.g., irony, sarcasm, optimism, pessimism, etc. In the second, the reader should recognize the opinion expressed by the author, e.g., whether or not the author recommends a particular book or supports a specific theory. The reader should identify whether the author’s opinion is positive or negative.
Category II.B.2.8. is similar to II.B.1.7., except that now the information on which the reader is asked to base his/her conclusion is implicit rather than explicit. In these questions, the reader may be asked to select the opposite of the information which appears in the text, to generalize from specific examples given in the text, or to choose an appropriate example of a general category described in the reading.
In the following two categories, the reader uses implicit information from the reading as a basis for inferring what might have preceded (II.B.2.9.) or followed (II.B.2.10.) this text. This is similar to Barrett’s category 2.3, inferring sequence.
In the final category the reader is asked to make an analogy between information contained in the passage and a new situation (II.B.2.11). In these questions, the reader must apply the information stated in the text to new examples.
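The codes in Part II follow a fixed pattern: II.A.n for the thirteen Microinformation tasks, II.B.1.n for the eight Analyze tasks, and II.B.2.n for the eleven Interpret tasks. The hypothetical helper below, which is not part of the department’s program, sketches how an item bank might validate and decode such codes; the function and table names are our own.

```python
from typing import Optional

# Branches of Part II (the question) and the number of tasks in each.
BRANCHES = {
    ("A",): ("Microinformation", None, 13),
    ("B", "1"): ("Macroinformation", "Analyze", 8),
    ("B", "2"): ("Macroinformation", "Interpret", 11),
}


def parse_question_code(code: str) -> tuple[str, Optional[str], int]:
    """Decode a Part II code such as "II.A.3." or "II.B.2.11".

    Returns (category, subgroup, task number) and raises ValueError
    for codes that fall outside the taxonomy.
    """
    # Split on periods and drop the empty piece left by a trailing period.
    parts = [p for p in code.strip().split(".") if p]
    if len(parts) < 3 or parts[0] != "II":
        raise ValueError(f"not a Part II code: {code!r}")
    branch, task_str = tuple(parts[1:-1]), parts[-1]
    if branch not in BRANCHES or not task_str.isdigit():
        raise ValueError(f"unknown branch or task in {code!r}")
    category, subgroup, n_tasks = BRANCHES[branch]
    task = int(task_str)
    if not 1 <= task <= n_tasks:
        raise ValueError(f"task {task} is outside 1..{n_tasks} in {code!r}")
    return category, subgroup, task


print(parse_question_code("II.B.2.1."))  # ('Macroinformation', 'Interpret', 1)
```

The task number is only an identifier within its branch; it carries no information about the difficulty of an item.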
Two aspects should be pointed out regarding Part II of the taxonomy. First, the order in which the tasks appear does not necessarily imply the order of difficulty of the items. In this sense, we adhere to the strict definition of the term taxonomy as simply a classification system, not a “hierarchical listing of skills” as described by Elijah and Legenza (1975:28). In multiple-choice items used to test reading comprehension, many factors affect the difficulty level. The form the stem takes, the subject and style of the reading text, and the reader’s previous knowledge of the subject are only a few of the factors that may contribute to the level of difficulty.
Second, we are aware that differences of opinion exist regarding the definitions of inference and implicit information. According to Chikalanga (1992), implicit information is based on two sources: the propositional content of a text (i.e., the explicit information present in the text) and the reader’s previous knowledge. Barrett’s concept of inference is slightly broader than Chikalanga’s description: Barrett describes inferential comprehension as a synthesis of the literal content of a selection combined with the reader’s personal knowledge, intuition, and imagination. Pearson and Johnson (1978), on the other hand, distinguish between questions requiring information which is textually implicit (“answers that are on the page but…not so obvious” p. 157) and “scriptally” implicit (“a reader needs to use his or her script in order to come up with an answer” p. 57).
We agree that to respond to an inference question, the reader must elaborate on information which is explicitly present, i.e., “read between the lines.” To do this, the reader must use all three types of schemata: linguistic, formal, and content. But it is also necessary to keep in mind that the purpose of these questions is to measure comprehension of a written text. We must therefore be careful to ensure that our questions are not independent of the text. At the same time, if information beyond what the text presents is needed to answer the question correctly, that information must be available to all readers. For example, a question which requires the reader to recognize the possible source of a text assumes that all the readers are familiar with the characteristics that distinguish this particular type of reading. Figure 2 shows how this taxonomy would be used to classify a testing item.
________________________________________
The options
Each multiple-choice item in our system has four options. The classification of these options is based on statistical analysis. After each exam is administered, the answer sheets are analyzed using the LERTAP computer program, which determines the difficulty and discrimination levels of each question and the effectiveness of the options. This information becomes part of our computerized item bank and is used in selecting items for future exams. In this way, we are able to produce exams at an appropriate level of difficulty containing items which have proven to distinguish between more efficient and less efficient readers.
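LERTAP computes these statistics for us, and we do not reproduce its exact formulas here. As an illustration of the classical quantities involved, the sketch below computes, for one four-option item, a difficulty index (the proportion of students answering correctly), an upper-lower discrimination index (the proportion correct in the top 27% of total scores minus that in the bottom 27%), and how often each option was chosen in each group. The 27% split, the function name, and the data are assumptions made for the example.

```python
from collections import Counter


def item_analysis(responses, key, totals, fraction=0.27):
    """Classical statistics for one multiple-choice item.

    responses: option chosen by each student, e.g. "a" to "d"
    key:       the correct option
    totals:    each student's total exam score, in the same order
    fraction:  share of students in each of the upper and lower groups
    """
    n = len(responses)
    if n == 0 or n != len(totals):
        raise ValueError("need exactly one total score per response")

    # Difficulty index: proportion of all students who chose the key.
    difficulty = sum(r == key for r in responses) / n

    # Rank students by total score (highest first) and take both extremes.
    order = sorted(range(n), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n * fraction))
    upper, lower = order[:k], order[-k:]

    # Discrimination index: proportion correct in the upper group minus the lower group.
    p_upper = sum(responses[i] == key for i in upper) / k
    p_lower = sum(responses[i] == key for i in lower) / k

    # Option analysis: how many students in each group chose each option.
    choices = {
        "upper": Counter(responses[i] for i in upper),
        "lower": Counter(responses[i] for i in lower),
    }
    return difficulty, p_upper - p_lower, choices


# Ten students (invented data); the key is "a".
answers = ["a", "a", "a", "b", "a", "c", "b", "d", "b", "c"]
scores = [95, 90, 85, 80, 75, 60, 55, 50, 45, 40]
p, d, choices = item_analysis(answers, "a", scores)
print(p, d)  # 0.4 1.0
```

In this invented example the item is fairly hard (40% correct) but separates the top and bottom groups completely, and each distractor was chosen only by lower-group students, which is the pattern expected of an effective distractor.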
It is important to point out that the reliability of the taxonomy was tested by measuring the degree of agreement among different professors who classified the same items. After a short period of training, the classifications reached independently by these professors coincided 90% of the time.
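The article does not say exactly how agreement was computed. Assuming it was a simple pairwise percentage, that is, the share of items to which two professors independently assigned the same classification, the calculation is straightforward. The function name and the data below are invented for illustration; the single disagreement (II.B.1.7. versus II.B.2.8.) mirrors the explicit-versus-implicit distinction between those two categories.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items to which two raters assigned the same code."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must classify the same, non-empty list of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Invented classifications of ten items by two professors.
prof_1 = ["II.A.1.", "II.A.2.", "II.B.2.1.", "II.B.1.7.", "II.A.4.",
          "II.A.5.", "II.B.2.3.", "II.A.9.", "II.B.1.1.", "II.A.13."]
prof_2 = ["II.A.1.", "II.A.2.", "II.B.2.1.", "II.B.2.8.", "II.A.4.",
          "II.A.5.", "II.B.2.3.", "II.A.9.", "II.B.1.1.", "II.A.13."]
print(percent_agreement(prof_1, prof_2))  # 90.0
```

With more than two professors, one option is to average this figure over every pair; the article does not specify which approach was used.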
________________________________________
Conclusion
The taxonomy described here has been used to classify more than 1,200 items which form the basis of a computerized item bank of comprehension questions that are used to prepare valid and reliable exams to measure the ability of university students to read scientific and technical texts in English as a foreign language. Both the taxonomy and the computer program, which was also designed in the language department at the Simón Bolívar University, are sufficiently flexible to permit changes for practical and theoretical reasons. This flexibility was built into the system to accommodate the results of growing research in the area of applied linguistics and reading comprehension.
The program is extremely user-friendly and presents a series of menus with various options designed to carry out exam-related functions and prepare different lists and tables useful for decision making. The user need only specify the requirements for a particular exam regarding text subject, objectives, difficulty levels, etc., and the program will provide a list of acceptable items fitting these characteristics.
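The selection step can be pictured as filtering the item bank on each requirement the user specifies. The sketch below is not the department’s program, whose internals are not described here; the `Item` record, its fields, the `select_items` and `code_matches` functions, and every value in the sample bank are hypothetical. Codes are compared segment by segment so that a request for II.A.1. does not also return II.A.13. items.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical item-bank record; field names are illustrative only.
    item_id: str
    subjects: list[str]     # Part I subject codes
    question_code: str      # Part II code
    difficulty: float       # proportion correct in earlier administrations
    discrimination: float   # discrimination index from earlier administrations


def code_matches(code: str, prefix: str) -> bool:
    # Compare whole segments so that "II.A.1." does not match "II.A.13.".
    code_parts = [p for p in code.split(".") if p]
    prefix_parts = [p for p in prefix.split(".") if p]
    return code_parts[:len(prefix_parts)] == prefix_parts


def select_items(bank, subject=None, question=None,
                 difficulty_range=(0.0, 1.0), min_discrimination=0.0):
    """Return the items in the bank that meet every requirement given."""
    low, high = difficulty_range
    return [
        item for item in bank
        if (subject is None or any(code_matches(s, subject) for s in item.subjects))
        and (question is None or code_matches(item.question_code, question))
        and low <= item.difficulty <= high
        and item.discrimination >= min_discrimination
    ]


# A tiny invented bank.
bank = [
    Item("Q1", ["I.A.1.5."], "II.B.2.1.", 0.55, 0.40),
    Item("Q2", ["I.A.2.b.1.g.", "I.A.1.5."], "II.A.13.", 0.80, 0.25),
    Item("Q3", ["I.A.3.2."], "II.A.1.", 0.45, 0.10),
]
for item in select_items(bank, subject="I.A.1.", difficulty_range=(0.3, 0.7)):
    print(item.item_id)  # Q1
```

Q2 also matches the subject request but is excluded because its difficulty (0.80) falls outside the requested range.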
The program also provides us with access to a data base which serves as a rich source of information for reading researchers. This data base contains a complete corpus of organized information which permits the study and evaluation of results produced by a specific item throughout the years and across groups of subjects.
The taxonomy has also been very useful to new teachers by helping them to focus on specific learner outcomes which they can emphasize in class. It also serves as a guide in the preparation of new items which can be incorporated into future exams.
It should be mentioned that the taxonomy presented here can also be used as a teaching tool. Once students’ reading problems have been detected, they may access other data bases to practice with texts and questions similar to those in the item bank.
It is necessary to point out that we do not claim to have solved all problems related to the evaluation of the reading comprehension process. This system neither includes nor classifies all the cognitive abilities involved in the reading process. We simply hope to have provided one approach to help in evaluating the ability to read scientific and technical texts in English.
________________________________________
References
• Arcay Hands, E. and L. Cossé. 1992. La composición en EFL: Un modelo teórico. Valencia, Venezuela: Universidad de Carabobo.
• Barrett, T. C. 1968. What is reading? Some current concepts. In Innovation and change in reading instruction. The sixteenth handbook of the National Society for the Study of Education. ed. H. M. Robinson. Chicago: The University of Chicago Press.
• ———. 1976. Taxonomy of reading comprehension. In Teaching reading in the middle grades. eds. R. Smith and T. C. Barrett. Reading, MA: Addison-Wesley.
• Bloom, B. S., M. D. Engelhart, E. J. Furst, and D. R. Krathwohl. 1956. Taxonomy of educational objectives: The classification of educational goals. Handbook 1: Cognitive domain. New York: Longmans, Green.
• Carrell, P., J. Devine, and D. Eskey. 1988. Interactive approaches to second language reading. Cambridge: Cambridge University Press.
• Chapman, T. 1976. Comprehending and the teacher of reading. In Promoting reading comprehension. ed. J. Flood. Newark, Del.: International Reading Association.
• Chikalanga, I. 1992. A suggested taxonomy of inferences for the reading teacher. Reading in a Foreign Language, 8 (2), pp. 697–709.
• Cornish, F. (Date). Foreign language reading comprehension as externally guided thinking. Reading in a Foreign Language, 8, 2. p. 721.
• Elijah. D. and A. Leganza. 1975. A comprehension taxonomy for teachers. Reading Improvement, 15, 1. pp. 98–99.
• Flower, __, __ Spivey. 1992. Foreign language reading comprehension as externally guided thinking. ed. F. Cornish, Reading in a Foreign Language, 8, 2. p. 721.
• Goodman, K. 1988. The reading process. In Interactive approaches to second language reading. eds. P. Carrell et al. Cambridge: Cambridge University Press. pp. 11–21.
• Gray, W. S. 1960. The major aspects of reading. In Sequential development of reading abilities. ed. H. M. Robinson. Supplementary Educational Monographs No. 90. Chicago: University of Chicago Press.
• Haladyna, T. 1994. Developing and validating multiple-choice test items. Hillsdale, N.J.: Lawrence Erlbaum Associates.
• Herber, H. 1978. Teaching reading in content areas. 2nd ed. Englewood Cliffs, N.J.: Prentice-Hall.
• Irwin, J. W. 1986. Teaching reading comprehension processes. Englewood Cliffs, N.J.: Prentice-Hall.
• Pearson, P. D., and D. D. Johnson. 1978. Teaching reading comprehension. New York: Holt, Rinehart and Winston.
• Rumelhart, D. E. 1977. Toward an interactive model of reading. In Attention and performance VI. ed. S. Dornic. Hillsdale, N.J.: Erlbaum.
• Sanders, N. M. 1966. Classroom questions. New York: Harper and Row.
• Stanovich, K. E. 1980. Toward an interactive-compensatory model of individual differences in the development of reading fluency. Reading Research Quarterly, 16 (1), pp. 32–71.
• Widdowson, H. G. 1978. Teaching language as communication. London: Oxford University Press.
• ———. 1984. Explorations in applied linguistics. London: Oxford University Press.
• ———. 1990. Aspects of language teaching. London: Oxford University Press.
________________________________________
Example of Item with Classification
At Albert Einstein College of Medicine in New York, Dr. Eli Seifter and co-workers have found that vitamin A and beta carotene, the chemical that gives carrots their color and from which the body makes vitamin A, can prevent or heal ulcers that have been provoked by heavy physical stress in experimental animals. Seifter suggests that vitamin A may shield the stomach and intestinal lining from erosion by gastric juices.
Which of the following is still only a hypothesis?
A. What beta carotene is.
B. That vitamin A prevents ulcers.
C. How vitamin A heals ulcers.*
D. What the body produces vitamin A from.
Text subject: I.A.3.3. Medicine-health-nutrition
Text functions: I.B.7. Cause-effect
I.B.8. Hypothesis
Text source: I.C.1. Magazine-newspaper-pamphlet
Text form: I.D.1.1. Extract from article/book
Stem: II.A.11. Recognize fact-hypothesis
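An item bank classified this way is, in effect, a set of records tagged with hierarchical codes. The Python sketch below is hypothetical: the `Item` class and the `code_matches` and `find_items` helpers are illustrative names, not the authors’ system. It stores the example item with its classification and retrieves items by any code prefix, so that a teacher could, for instance, pull every item whose stem tests a skill under II.A. Codes are compared segment by segment, so I.A.3 matches I.A.3.3 but not a code such as I.A.30.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One multiple-choice item plus its taxonomy codes (hypothetical record layout)."""
    stem: str
    options: dict[str, str]
    key: str  # letter of the correct option
    codes: dict[str, list[str]] = field(default_factory=dict)


def code_matches(code: str, prefix: str) -> bool:
    """True if `code` is `prefix` itself or a subcategory of it (e.g. I.A.3.3 under I.A.3)."""
    code_parts = code.split(".")
    prefix_parts = prefix.split(".")
    return code_parts[: len(prefix_parts)] == prefix_parts


def find_items(bank: list[Item], prefix: str) -> list[Item]:
    """Return every item carrying at least one code under `prefix`, in any category."""
    return [
        item
        for item in bank
        if any(code_matches(code, prefix) for codes in item.codes.values() for code in codes)
    ]


seifter_item = Item(
    stem="Which of the following is still only a hypothesis?",
    options={
        "A": "What beta carotene is.",
        "B": "That vitamin A prevents ulcers.",
        "C": "How vitamin A heals ulcers.",
        "D": "What the body produces vitamin A from.",
    },
    key="C",
    codes={
        "text_subject": ["I.A.3.3"],
        "text_functions": ["I.B.7", "I.B.8"],
        "text_source": ["I.C.1"],
        "text_form": ["I.D.1.1"],
        "stem": ["II.A.11"],
    },
)

bank = [seifter_item]
print(len(find_items(bank, "II.A")))   # items testing any II.A skill → 1
print(len(find_items(bank, "I.B.8")))  # items built on hypothesis texts → 1
print(len(find_items(bank, "II.B")))   # none in this one-item bank → 0
```

In a real bank the same query would return every item sharing a classification, which is one way practice material "similar to those in the item bank" could be assembled.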
Testing Spoken English
As a Second Language
By Shreesh Chaudhary
Teaching and testing Spoken English (SE) have a long history. In the early 1800s, Carey (1906; cited in Sinha 1978:22) advertised that at his school near Calcutta, “particular attention will be paid to the pronunciation” of English.
Until recently, teaching elocution, rhetoric, or SE was an integral part of the school curriculum. But as the demand for English has grown and properly trained people have become scarce, clear goals and models have disappeared, and so has the teaching and testing of SE, except as an extracurricular activity, even in countries like India.
________________________________________
“Speaking,” as Harris (1977:81) observes, “is a complex skill requiring the simultaneous use of different abilities which often develop at different rates….Five components are generally recognized in analyses of the speech process.” Harris lists them as follows:
1. a. Pronunciation including segmental features, vowels and consonants, and the stress and intonation patterns
b. Grammar
c. Vocabulary
d. Fluency
e. Comprehension.
Of these, pronunciation is the most difficult to assess.
“The central reason is the lack of general agreement on what good pronunciation of a second language means: Is comprehensibility to be the sole basis of judgment, or must we demand a high degree of phonetic and allophonic accuracy? And can we be certain that two or more native speakers will find the utterance of a foreign speaker equally comprehensible…?” (Harris 1977:81).
Tonkyn (1992) presents a good overview of this confusion about what may be called the dimensions of oral proficiency. He examines rating scales such as those used by the American Council on the Teaching of Foreign Languages (ACTFL), the Australian Second Language Proficiency Ratings (ASLPR), the British Council/University of Cambridge English Language Testing Service (ELTS), and the British Council’s Mini-Platform Interview (MPI) scale.
After examining these scales, Tonkyn (1992:154–55) observes, “…a workable three-part profile might be produced concentrating on three separately rated overall factors, which I shall call accuracy, range, and strategic competence.…”
Tonkyn defines accuracy in terms of grammatical and pronunciation features, which require two different scales; range in terms of vocabulary and grammatical complexity; and strategic competence in terms of fluency. He admits, however, that “we need to listen to a lot more examples of oral performance to validate this, or any alternative, profile” (p. 155).
There has been little work on testing SE. Brown (1992:15) says, “Since the inception of the journal Language Testing in 1984 only one article has appeared specifically on the topic of pronunciation testing.” The writer of that article, R. Major (1987:155), feels, “The measurement of pronunciation accuracy is in the dark ages when compared to measurement of other areas of competence.” The present paper seeks to fill this gap.
________________________________________
Concept of “good” SE for ESL
In SE, the distinction between ESL and English as a foreign language (EFL) seems significant. As Brown (1992:3) notes, “In ESL situations English has official status, is used widely in government, is the medium of education, and is in widespread use in everyday life of the people. In contrast, (in EFL) English in official situations has low recognition and is used mainly for communication with foreigners….” These differences have implications for teaching and testing.
Many features in (1) or in Tonkyn’s scales may be redundant for English in India, where it is a second rather than a foreign language. In ESL, it is pronunciation that needs attention; an ESL speaker has relatively little difficulty with grammar, vocabulary, and fluency. Regarding Indian English, Bansal (1973:1) says, “…in pronunciation it is very different from either British or American English and even within India there are a large number of regional varieties, each different from the others in certain ways and retaining to some extent the phonetic patterns of the Indian languages spoken in that particular region.”
Wells (1982:624), writing as a listener, observes that there are Indians educated at British public schools whose accent is unquestionably RP, and Indians with a fair knowledge of English whose accent is nevertheless so impenetrable that English people can understand them, if at all, only with the greatest difficulty.
So there is a need for teaching and testing SE, but there is no agreement as to what “good” pronunciation is. In this it is unlike Written English (WE). All teachers agree that in teaching writing they must teach spelling, punctuation, and format such as leaving space before and after every word.
In the absence of a similar agreement on pronunciation, the teaching and testing of SE are less objective. It is sometimes argued that there is no need to teach SE (e.g., Kachru 1988, Nadkarni 1992), or that a standard accent such as Received Pronunciation (RP) is the ideal model (Quirk 1990). But RP itself, as Shibles (1995) shows, is no monolith, and neither is General American (GA) (see Wells 1982). Natural languages do not work in this way.
Fortunately, there have been efforts to define a minimal pronunciation model, labeled variously as “minimum essential,” “minimum adequate” (West 1968), “Rudimentary International Pronunciation” (RIP) (Gimson 1978), and “essential ingredients” (Bradford 1996). The features these proposals identify constitute the relatively unchanging core of English phonology, which spans centuries and countries.
In pronunciation, according to West (1968: 205), “What is of vital importance is rhythm, the strong regular beat of English stresses which makes Welshmen, Scotsmen, and all native English speakers intelligible to each other, in spite of their very different vowel systems….”
Absence of a “strong regular beat of English stress” marks non-standard non-native accents of English, though these varieties also differ among themselves. Most “standard” varieties (Wells 1982:34) differ from other varieties in the following:
2. a. Phrasal pause
b. Word stress
c. Vowel length
d. Some consonantal contrasts.
In “standard” varieties, syllables are gathered into groups of a stressed syllable and some unstressed ones, and each group usually coincides with a phrase. Standard varieties stress over 808 words alike (Sack 1968). This gives them a unique rhythm.
Vowel quality differs with dialect; its quantity rarely does. For instance, dame is pronounced like dime in Australia; bomb like balm, court like caught by many RP speakers; and cheer like chair by some in New Zealand. But they all have a diphthong or long vowel.
Historically, vowel quality has changed more than vowel quantity. In Shakespeare’s day, Londoners pronounced sea like say. In Dr. Johnson’s day, the preferred pronunciation of great rhymed with greet rather than with grate. Standard varieties have eight diphthongs and seven long vowels, the largest number of long vowel sounds in the world.
Likewise, the contrasts between voiced and voiceless consonants, between /l/ and /r/, and among the fricative consonants (of which English has nine) have remained relatively unchanged.
A rapid rate of speech reduces the comprehensibility of ESL speakers. Powers (1985) reports that a tempo of over 275 words per minute can make a speaker unintelligible. Usha (1995) has shown that a tempo of four syllables per second may be ideal for comprehensibility. So in my course (see Chaudhary 1993) I include the following:
3. a. Slow tempo of speech
b. Phrasal pause
c. Word stress
d. Long vowels
e. Fricative consonants
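Because the tempo figures above come in two units (Powers’s words per minute and Usha’s syllables per second), a small conversion helps when a recording is compared with both. This Python sketch is illustrative: the function names are hypothetical, the syllables-per-word ratio must be supplied by the user (the 1.5 below is an assumption, not a figure from Powers or Usha), and Usha’s four syllables per second serves only as a default target.

```python
def syllables_per_second(syllables: int, seconds: float) -> float:
    """Average tempo of a reading: syllables spoken divided by time taken."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return syllables / seconds


def wpm_to_sps(words_per_minute: float, syllables_per_word: float) -> float:
    """Convert a words-per-minute rate to syllables per second.

    syllables_per_word is an ASSUMED average for the text being read;
    it is not a figure reported by Powers (1985) or Usha (1995).
    """
    return words_per_minute * syllables_per_word / 60


def exceeds_target(sps: float, target: float = 4.0) -> bool:
    """Flag tempos faster than the target (default: Usha's four syllables per second)."""
    return sps > target


# Hypothetical reading: 150 syllables in 30 seconds.
tempo = syllables_per_second(150, 30)
print(tempo, exceeds_target(tempo))  # → 5.0 True

# Powers's 275 words per minute, assuming 1.5 syllables per word.
print(wpm_to_sps(275, 1.5))  # → 6.875
```

Since the examiner already notes the time each student takes, the only extra input needed is a syllable count for the passage, which is fixed for a given text.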
I also include pronunciation of the following:
4. a. Numbers
b. Names of days, date, month, etc.
c. Letters of English alphabet
d. Weights and measures, etc.
My overall goal is comprehensibility, for which accuracy in word stress, length of vowels and some consonants seems essential.
________________________________________
Design of test
It is difficult to include the items in (3) and (4) in a test that can be administered and scored easily and objectively. The test must also combine comprehensibility with phonetic accuracy and, as Gimson (1980:327) adds, with “performance in a situation of free discourse.” A good test of SE, it seems, must include the following:
5. a. Free discourse to check intelligibility and acceptability
b. Atomistic test to check
1. word stress, through word lists read aloud
2. phonetic accuracy in some vowels and consonants.
For my course for senior undergraduate students, many of whom go to America as teaching assistants or work for large Indian and foreign companies, I design the test (see Figure 1) on the basis of the principles in (5). The first two questions check the tempo of speech, extemporaneous expression, and the pronunciation of numbers, letters of the English alphabet, and other items that occur frequently in SE. The third question checks phrasal pauses. It has two reading passages (see Figure 1), taken from Gimson (1978), which together contain most of the vowel and consonant sounds of the standard varieties. Examinees read the passages silently before reading one of them aloud for evaluation. This question also serves as an atomistic test of vowels and consonants.
A list of bisyllabic, trisyllabic, and polysyllabic words is used for testing word stress. The words are arranged at random: a word with main stress on the initial syllable may be printed next to one with main stress on the final syllable, which may in turn be followed by one with main stress elsewhere. The list has 20 bisyllabic words, 20 trisyllabic words, 20 words of four syllables, and some longer polysyllabic words. Together they represent the major stress patterns of English (see Chomsky and Halle 1968).
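A minimal sketch of how such a list might be shuffled and scored, assuming the examiner records, for each word, which syllable was heard carrying the main stress. The seven words and their stress positions are illustrative only (not the author’s list), and the helper names are hypothetical. Scoring simply counts words whose heard stress differs from the key; a word the examinee skips counts as misstressed.

```python
import random

# Hypothetical answer key (not the author's list): word -> position of the
# main-stressed syllable, counted from 1.
STRESS_KEY = {
    "window": 1, "begin": 2,
    "family": 1, "important": 2, "understand": 3,
    "photography": 2, "education": 3,
}


def randomized_list(words, seed=None):
    """Return the words in random order so that stress patterns are mixed on the sheet."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def count_misstressed(key, heard):
    """Count words whose main stress, as heard by the examiner, differs from the key.

    `heard` maps each word to the syllable the examiner heard stressed;
    a word missing from `heard` is counted as misstressed.
    """
    return sum(1 for word, stress in key.items() if heard.get(word) != stress)


sheet = randomized_list(STRESS_KEY, seed=7)
print(sheet)

# An examinee who stresses "important" and "education" on the first syllable:
heard = dict(STRESS_KEY, important=1, education=1)
print(count_misstressed(STRESS_KEY, heard))  # → 2
```

Fixing the seed lets the same randomized sheet be reprinted for every examinee in a session, while a new seed gives a fresh order for the next one.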
The last question has minimal or sub-minimal pairs of words to test the production of diphthongs, long vowels and some consonants known to be difficult for many Indian speakers.
This test is easy to administer. As the student reads a text aloud, the examiner can note the time taken, mark pauses and stresses, and make notes on vowel and consonant quality, awarding grades later. Each student’s speech is recorded along with the time he or she takes.
Credit is given for a slow tempo. One way to validate such a test is to see whether the student with the highest score is also the best understood.
________________________________________
Intelligibility study and evaluation of TSE
This test has been used to grade, on a relative scale, the English spoken by ESL learners from the Indian subcontinent, and to assess its comprehensibility.
There were 29 examinees. On a relative scale, their scores ranged from 66% to 99%, with an average of approximately 90%. The examinee who scored 66% did not pause systematically, had erratic and unclear segmental contrasts (at least three pairs of sounds were unclear), and misstressed at least 25 of the 66 words given for the test. The examinee who scored 99% paused regularly, gave all words “correct” stress, and got most sound contrasts right.
Recordings of some students’ SE from an earlier test were played to a randomly selected group of listeners. The students’ relative grades were borne out almost without change (see Figure 2). The student with the highest score, whom we will call G(A), was heard by 129 listeners, who were asked to fill in the 15 blanks in a given text after hearing G(A). One hundred and fifteen listeners filled in 14 blanks correctly; that is, about 90% of the listeners filled in more than 90% of the blanks correctly.
The student with the lowest score, whom we will call L(A), was heard by 62 listeners, who were asked to fill in the 15 blanks in the text after hearing her. Of these, only 35 filled in as many as 10 blanks correctly: only about 56% of the listeners heard two-thirds of the text correctly.
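The percentages in this comparison follow from simple ratios. The Python sketch below reproduces them from the reported counts, assuming that the 35 listeners for L(A) are those who reached 10 of the 15 blanks. The `share_reaching` helper is hypothetical and shows how the same figure could be computed from raw per-listener answer sheets; the counts passed to it here are invented for illustration.

```python
def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage."""
    return 100 * part / whole


def share_reaching(counts: list[int], min_correct: int) -> float:
    """Percentage of listeners who filled in at least `min_correct` blanks correctly."""
    return percent(sum(1 for c in counts if c >= min_correct), len(counts))


# Counts reported for the earlier test (15 blanks per listener).
print(round(percent(115, 129), 1))  # G(A): listeners with 14 blanks right → 89.1
print(round(percent(14, 15), 1))    # share of the blanks that represents → 93.3
print(round(percent(35, 62), 1))    # L(A): listeners reaching 10 blanks → 56.5
print(round(percent(10, 15), 1))    # share of the blanks that represents → 66.7

# Invented per-listener counts, to show the raw-data route:
print(share_reaching([15, 14, 14, 12, 9], 14))  # → 60.0
```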
To test this model of the course and the test further, we played two recordings of the highest achiever to an international group of listeners. The text, given in Figure 2, was taken from O’Connor (1980). The first recording, which we will call G(B), was made on the first day of the course; the second, which we will call G(A), was made on the last day.
For this test there were 18 listeners from Indian Ocean countries, including seven from India itself. Listening to G(B), none of the listeners could fill in more than 14 of the 28 blanks. Only five could fill in over 10 blanks; most filled in only between five and nine blanks correctly. The Indian listeners understood G(B) no better: one of them filled in 14 blanks, and the others only between one and six.
The second recording, G(A), however, was understood much better. Of the 27 blanks on the sheet, no one filled in fewer than two. Six listeners filled in over 21 blanks, and 16 filled in over 15 blanks correctly.
This improvement in intelligibility can be attributed mainly to the change in the tempo of speech. G(B) speaks at an average tempo of over five syllables per second, whereas G(A) has a tempo of less than three syllables per second.
________________________________________
Conclusion
This course and the accompanying test appear to be appropriate for teaching SE in India. The design of the test seems adequate for objective and valid evaluation of tempo of speech and standard word stress. But the teaching and testing of pauses and segments need further refinement.
________________________________________
References
• Bansal, R. K. 1973. The intelligibility of Indian English. Monograph No. 4, Central Institute of English and Foreign Languages, Hyderabad.
• Bradford, B. 1990. The essential ingredients of a pronunciation programme. In Speak Out, No. 6, July, 1990.
• Brown, A. 1992. Twenty questions. In Approaches to pronunciation teaching. ed. A. Brown, London: Macmillan, pp. 1–17.
• Chaudhary, S. 1993. A new course in better spoken English. New Delhi: Sterling Publishers.
• Chomsky, N., and M. Halle. 1968. The sound pattern of English. New York: Harper and Row.
• Gimson, A. C. 1978. Towards an international pronunciation of English. Oxford: Oxford Un