ASSESSING LISTENING AND ASSESSING READING (GROUP 5) ~ LTEClass English Department Baturaja University

ASSESSING LISTENING

In earlier chapters, a number of foundational principles of language assessment were introduced. Concepts like practicality, reliability, validity, authenticity, washback, direct and indirect testing, and formative and summative assessment are by now part of your vocabulary. You have become acquainted with some tools for evaluating a “good” test, examined procedures for designing a classroom test, and explored the complex process of creating different kinds of test items. You have begun to absorb the intricate psychometric, educational, and political issues that intertwine in the world of standardized and standards-based testing.

Now our focus will shift away from the standardized testing juggernaut to the level at which you will usually work: the day-to-day classroom assessment of listening, speaking, writing, and reading. Since this is the level at which you will most frequently have the opportunity to apply principles of assessment, the next four chapters of this book will provide guidelines and hands-on practice in testing within a curriculum of English as a second or foreign language.

But first, two important caveats. The fact that the four language skills are discussed in four separate chapters should in no way predispose you to think that those skills are or should be assessed in isolation. Every TESOL professional will tell you that the integration of skills is of paramount importance in language learning. Likewise, assessment is more authentic and provides more washback when skills are integrated. Nevertheless, the skills are treated independently here in order to identify principles, test types, tasks, and issues associated with each one.

Second, you may already have scanned through this book to look for a chapter on assessing grammar and vocabulary, or something in the way of a focus on form in assessment. The treatment of form-focused assessment is not relegated to a separate chapter here for a very distinct reason: there is no such thing as a test of grammar or vocabulary that does not invoke one or more of the separate skills of listening, speaking, reading, and writing! It’s not uncommon to find little “grammar tests” and “vocabulary tests” in textbooks, and these may be perfectly useful instruments. But responses on these quizzes are usually written, with multiple-choice selection or fill-in-the-blank items. In this book, we treat the various linguistic forms (phonology, morphology, lexicon, grammar, and discourse) within the context of skill areas. That way, we don’t perpetuate the myth that grammar and vocabulary and other linguistic forms can somehow be disassociated from a mode of performance.

OBSERVING THE PERFORMANCE OF THE FOUR SKILLS

Before focusing on listening itself, think about the two interacting concepts of performance and observation. All language users perform the acts of listening, speaking, reading, and writing. They of course rely on their underlying competence in order to accomplish these performances. When you propose to assess someone’s ability in one or a combination of the four skills, you assess that person’s competence, but you observe the person’s performance. Sometimes the performance does not indicate true competence: a bad night’s rest, illness, an emotional distraction, test anxiety, a memory block, or other student-related reliability factors could affect performance, thereby providing an unreliable measure of actual competence.

So, one important principle for assessing a learner’s competence is to consider the fallibility of the results of a single performance, such as that produced in a test. As with any attempt at measurement, it is your obligation as a teacher to triangulate your measurements: consider at least two (or more) performances and/or contexts before drawing a conclusion. That could take the form of one or more of the following designs:

· Several tests that are combined to form an assessment

· A single test with multiple test tasks to account for learning styles and performance variables

· In-class and extra-class graded work

· Alternative forms of assessment (e.g., journal, portfolio, conference, observation, self-assessment, peer-assessment).

Multiple measures will always give you a more reliable and valid assessment than a single measure.

A second principle is one that we teachers often forget. We must rely as much as possible on observable performance in our assessment of students. Observable means being able to see or hear the performance of the learner (the senses of touch, taste, and smell don’t apply very often to language testing!). What, then, is observable among the four skills of listening, speaking, reading, and writing? Table 6.1 offers an answer.

Isn’t it interesting that in the case of the receptive skills, we can observe neither the process of performing nor a product? I can hear your argument already: “But I can see that she’s listening because she’s nodding her head and frowning and smiling and asking relevant questions.” Well, you’re not observing the listening performance; you’re observing the result of the listening. You can no more observe listening (or reading) than you can see the wind blowing. The process of the listening performance itself is the invisible, inaudible process of internalizing meaning from the auditory signals being transmitted to the ear and brain. Or you may argue that the product of listening is a spoken or written response from the student that indicates correct (or incorrect) auditory processing. Again, the product of listening and reading is not the spoken or written response. The product is within the structure of the brain, and until teachers carry with them little portable MRI scanners to detect meaningful intake, it is impossible to observe the product. You observe only the result of the meaningful input in the form of spoken or written output, just as you observe the result of the wind by noticing trees waving back and forth.

The productive skills of speaking and writing allow us to hear and see the process as it is performed. Writing gives a permanent product in the form of a written piece. But unless you have recorded speech, there is no permanent observable product for speaking performance because all those words you just heard have vanished from your perception and (you hope) have been transformed into meaningful intake somewhere in your brain.

Receptive skills, then, are clearly the more enigmatic of the two modes of performance. You cannot observe the actual act of listening or reading, nor can you see or hear an actual product! You can observe learners only while they are listening or reading. The upshot is that all assessment of listening and reading must be made on the basis of observing the test-taker’s speaking or writing (or nonverbal response), and not on the listening or reading itself. So, all assessment of receptive performance must be made by inference!

How discouraging, right? Well, not necessarily. We have developed reasonably good assessment tasks to make the necessary jump, through the process of inference, from unobservable reception to a conclusion about comprehension competence. And all this is a good reminder of the importance not just of triangulation but of the potential fragility of the assessment of comprehension ability. The actual performance is made “behind the scenes,” and those of us who propose to make reliable assessments of receptive performance need to be on our guard.

THE IMPORTANCE OF LISTENING

Listening has often played second fiddle to its counterpart, speaking. In the standardized testing industry, a number of separate oral production tests are available (Test of Spoken English, Oral Proficiency Inventory, and PhonePass®, to name several that are described in Chapter 7 of this book), but it is rare to find just a listening test. One reason for this emphasis is that listening is often implied as a component of speaking. How could you speak a language without also listening? In addition, the overtly observable nature of speaking renders it more empirically measurable than listening. But perhaps a deeper cause lies in universal biases toward speaking. A good speaker is often (unwisely) valued more highly than a good listener. To determine if someone is a proficient user of a language, people customarily ask, “Do you speak Spanish?”

Every teacher of language knows that one’s oral production ability (other than monologues, speeches, reading aloud, and the like) is only as good as one’s listening comprehension ability. But of even further impact is the likelihood that input in the aural-oral mode accounts for a large proportion of successful language acquisition. In a typical day, we do measurably more listening than speaking (with the exception of one or two of your friends who may be nonstop chatterboxes!). Whether in the workplace, educational, or home contexts, aural comprehension far outstrips oral production in quantifiable terms of time, number of words, effort, and attention.

We therefore need to pay close attention to listening as a mode of performance for assessment in the classroom. In this chapter, we will begin with basic principles and types of listening, then move to a survey of tasks that can be used to assess listening. (For a review of issues in listening, you may want to read Chapter 16 of TBP.)

BASIC TYPES OF LISTENING

As with all effective tests, designing appropriate assessment tasks in listening begins with the specifications of objectives or criteria. Those objectives may be classified in terms of several types of listening performance. Think about what you do when you listen. Literally in nanoseconds, the following processes flash through your brain:

1. You recognize speech sounds and hold a temporary “imprint” of them in short-term memory.

2. You simultaneously determine the type of speech event (monologue, interpersonal dialogue, transactional dialogue) that is being processed and attend to its context (who the speaker is, location, purpose) and the content of the message.

3. You use (bottom-up) linguistic decoding skills and/or (top-down) background schemata to bring a plausible interpretation to the message, and assign a literal and intended meaning to the utterance.

4. In most cases (except for repetition tasks, which involve short-term memory only), you delete the exact linguistic form in which the message was originally received in favor of conceptually retaining important or relevant information in long-term memory.

Each of these stages represents a potential assessment objective:

· Comprehending surface structure elements such as phonemes, words, intonation, or a grammatical category

· Understanding pragmatic context

· Determining the meaning of auditory input

· Developing the gist, a global or comprehensive understanding

From these stages we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.

1. Intensive. Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.

2. Responsive. Listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.

3. Selective. Processing stretches of discourse such as short monologues for several minutes in order to “scan” for certain information. The purpose of such performance is not necessarily to look for global or general meanings, but to be able to comprehend designated information in a context of longer stretches of spoken language (such as classroom directions from a teacher, TV or radio news items, or stories). Assessment tasks in selective listening could ask students, for example, to listen for names, numbers, a grammatical category, directions (in a map exercise), or certain facts and events.

4. Extensive. Listening to develop a top-down, global understanding of spoken language. Extensive performance ranges from listening to lengthy lectures to listening to a conversation and deriving a comprehensive message or purpose. Listening for the gist, for the main idea, and making inferences are all part of extensive listening.

For full comprehension, test-takers may at the extensive level need to invoke interactive skills (perhaps note-taking, questioning, discussion): listening that includes all four of the above types as test-takers actively participate in discussions, debates, conversations, role plays, and pair and group work. Their listening performance must be intricately integrated with speaking (and perhaps other skills) in the authentic give-and-take of communicative interchange.

MICRO AND MACRO SKILLS OF LISTENING

A useful way of synthesizing the above two lists is to consider a finite number of micro- and macroskills implied in the performance of listening comprehension. Richards’ (1983) list of microskills has proven useful in the domain of specifying objectives for learning and may be even more useful in forcing test makers to carefully identify specific assessment objectives. In the following box, the skills are subdivided into what I prefer to think of as microskills (attending to the smaller bits and chunks of language, in more of a bottom-up process) and macroskills (focusing on the larger elements involved in a top-down approach to a listening task). The micro- and macroskills provide 17 different objectives to assess in listening.

Micro- and macroskills of listening (adapted from Richards, 1983)

Microskills:

1. Discriminate among the distinctive sounds of English.

2. Retain chunks of language of different lengths in short-term memory.

3. Recognize English stress patterns, words in stressed and unstressed positions, rhythmic structure, intonation contours, and their role in signaling information.

4. Recognize reduced forms of words.

5. Distinguish word boundaries, recognize a core of words, and interpret word order patterns and their significance.

6. Process speech at different rates of delivery.

7. Process speech containing pauses, errors, corrections, and other performance variables.

8. Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), patterns, rules, and elliptical forms.

9. Detect sentence constituents and distinguish between major and minor constituents.

10. Recognize that a particular meaning may be expressed in different grammatical forms.

11. Recognize cohesive devices in spoken discourse.

Macroskills:

12. Recognize the communicative functions of utterances, according to situations, participants, goals.

13. Infer situations, participants, goals using real-world knowledge.

14. From events, ideas, and so on, described, predict outcomes, infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.

15. Distinguish between literal and implied meanings.

16. Use facial, kinesic, body language, and other nonverbal clues to decipher meaning.

17. Develop and use a battery of listening strategies, such as detecting key words, guessing the meaning of words from context, appealing for help, and signaling comprehension or lack thereof.

Implied in the taxonomy above is a notion of what makes many aspects of listening difficult, or why listening is not simply a linear process of recording strings of language as they are transmitted into our brains. Developing a sense of which aspects of listening performance are predictably difficult will help you to challenge your students appropriately and to assign weights to items. Consider the following list of what makes listening difficult.

1. Clustering: attending to appropriate “chunks” of language: phrases, clauses, constituents.

2. Redundancy: recognizing the kinds of repetitions, rephrasing, elaborations, and insertions that unrehearsed spoken language often contains, and benefiting from that recognition.

3. Reduced forms: understanding the reduced forms that may not have been a part of an English learner’s past learning experiences in classes where only formal “textbook” language has been presented.

4. Performance variables: being able to “weed out” hesitations, false starts, pauses, and corrections in natural speech.

5. Colloquial language: comprehending idioms, slang, reduced forms, shared cultural knowledge.

6. Rate of delivery: keeping up with the speed of delivery, processing automatically as the speaker continues.

7. Stress, rhythm, and intonation: correctly understanding prosodic elements of spoken language, which is almost always much more difficult than understanding the smaller phonological bits and pieces.

8. Interaction: managing the interactive flow of language from listening to speaking to listening, etc.

DESIGNING ASSESSMENT TASKS: INTENSIVE LISTENING

Once you have determined objectives, your next step is to design the tasks, including making decisions about how you will elicit performance and how you will expect the test-taker to respond. We will look at tasks that range from intensive listening performance, such as minimal phonemic pair recognition, to extensive comprehension of language in communicative contexts. The focus in this section is on the microskills of intensive listening.

Recognizing Phonological and Morphological Elements

A typical form of intensive listening at this level is the assessment of recognition of phonological and morphological elements of language. A classic test task gives a spoken stimulus and asks test-takers to identify the stimulus from two or more choices, as in the following two examples:

Phonemic pair, consonants

Test-takers hear: He’s from California.

Test-takers read: (a) He’s from California.

(b) She’s from California.

Phonemic pair, vowels

Test-takers hear: Is he living?

Test-takers read: (a) Is he leaving?

(b) Is he living?

In both cases above, minimal phonemic distinctions are the target. If you are testing recognition of morphology, you can use the same format:

Morphological pair, -ed ending

Test-takers hear: I missed you very much.

Test-takers read: (a) I missed you very much.

(b) I miss you very much.

Hearing the past tense morpheme in this sentence challenges even advanced learners, especially if no context is provided. Stressed and unstressed words may also be tested with the same rubric. In the following example, the reduced form (contraction) of cannot is tested:

Stress pattern in can’t

Test-takers hear: My girlfriend can’t go to the party.

Test-takers read: (a) My girlfriend can’t go to the party.

(b) My girlfriend can go to the party.

Because they are decontextualized, these kinds of tasks leave something to be desired in their authenticity. But they are a step better than items that simply provide a one-word stimulus:

One-word stimulus

Test-takers hear: vine

Test-takers read: (a) vine

(b) wine

Paraphrase Recognition

The next step up on the scale of listening comprehension microskills is words, phrases, and sentences, which are frequently assessed by providing a stimulus sentence and asking the test-taker to choose the correct paraphrase from a number of choices.

Sentence paraphrase

Test-takers hear: Hello, my name’s Keiko. I come from Japan.

Test-takers read: (a) Keiko is comfortable in Japan.

(b) Keiko wants to come to Japan.

(c) Keiko is Japanese.

(d) Keiko likes Japan.

In the above item, the idiomatic come from is the phrase being tested. To add a little context, a conversation can be the stimulus task to which test-takers must respond with the correct paraphrase:

Dialogue paraphrase

Test-takers hear: Man: Hi, Maria, my name’s George.

Woman: Nice to meet you, George. Are you American?

Man: No, I’m Canadian.

Test-takers read: (a) George lives in the United States.

(b) George is American.

(c) George comes from Canada.

(d) George is Canadian.

Here, the criterion is recognition of the adjective form used to indicate country of origin: Canadian, American, Brazilian, Italian, etc.

DESIGNING ASSESSMENT TASKS: RESPONSIVE LISTENING

A question-and-answer format can provide some interactivity in these lower-end listening tasks. The test-taker’s response is the appropriate answer to a question.

Appropriate response to a question

Test-takers hear: How much time did you take to do your homework?

Test-takers read: (a) In about an hour.

(b) About an hour.

(c) About $10.

(d) Yes, I did.

The objective of this item is recognition of the wh-question how much and its appropriate response. Distractors are chosen to represent common learner errors: (a) responding to how much vs. how much longer; (c) confusing how much in reference to time vs. the more frequent reference to money; (d) confusing a wh-question with a yes/no question.

None of the tasks so far discussed have to be framed in a multiple-choice format. They can be offered in a more open-ended framework in which test-takers write or speak the response. The above item would then look like this:

Open-ended response to a question

Test-takers hear: How much time did you take to do your homework?

Test-takers write or speak:

If open-ended response formats gain a small amount of authenticity and creativity, they of course suffer some in their practicality, as teachers must then read students’ responses and judge their appropriateness, which takes time.

DESIGNING ASSESSMENT TASKS: SELECTIVE LISTENING

A third type of listening performance is selective listening, in which the test-taker listens to a limited quantity of aural input and must discern within it some specific information. A number of techniques have been used that require selective listening.

Listening Cloze

Listening cloze tasks (sometimes called cloze dictations or partial dictations) require the test-taker to listen to a story, monologue, or conversation and simultaneously read the written text in which selected words or phrases have been deleted. Cloze procedure is most commonly associated with reading only. In its generic form, the test consists of a passage in which every nth word (typically every seventh word) is deleted and the test-taker is asked to supply an appropriate word. In a listening cloze task, test-takers see a transcript of the passage that they are listening to and fill in the blanks with the words or phrases that they hear.
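The generic every-nth-word deletion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name make_cloze, the blank marker, and the sample passage are assumptions made for the example, and punctuation attached to a deleted word is simply deleted along with it.

```python
def make_cloze(passage, n=7, blank="_____"):
    """Delete every nth word of a passage.

    Returns the gapped text and the list of deleted words (the answer key).
    Words are split on whitespace, so attached punctuation travels with the word.
    """
    gapped = []
    key = []
    for position, word in enumerate(passage.split(), start=1):
        if position % n == 0:
            key.append(word)      # remember the deleted word for scoring
            gapped.append(blank)  # show a blank to the test-taker
        else:
            gapped.append(word)
    return " ".join(gapped), key


# Hypothetical sample passage for illustration.
passage = (
    "The state of California has many geographical areas. "
    "On the western side is the Pacific Ocean with its beaches and sea life."
)
text, key = make_cloze(passage, n=7)
print(text)
print(key)
```

With n=7 the sketch removes the 7th, 14th, and 21st words of the sample passage, regardless of whether those words matter for any listening objective.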

One potential weakness of listening cloze techniques is that they may simply become reading comprehension tasks. Test-takers who are asked to listen to a story with periodic deletions in the written version may not need to listen at all, yet may still be able to respond with the appropriate word or phrase. You can guard against this eventuality if the blanks are items with a high information load that cannot easily be predicted simply by reading the passage. In the example below (adapted from Bailey, 1998, p. 16), such a shortcoming was avoided by focusing only on the criterion of numbers. Test-takers hear an announcement from an airline agent and see the transcript with the underlined words deleted:

Listening cloze

Test-takers hear:

Ladies and gentlemen. I now have some connecting gate information for those of you making connections to other flights out of San Francisco.

Flight seven-oh-six to Portland will depart from gate seventy-three at nine-thirty P.M.

Flight ten-forty-five to Reno will depart at nine-fifty P.M. from gate seventeen.

Flight four-forty to Monterey will depart at nine-thirty P.M. from gate sixty.

And flight sixteen-oh-three to Sacramento will depart from gate nineteen at ten-fifteen P.M.

Test-takers write the missing words or phrases in the blanks.

Other listening cloze tasks may focus on a grammatical category such as tenses, articles, two-word verbs, prepositions, or transition words/phrases. Notice two important structural differences between listening cloze tasks and standard reading cloze. In a listening cloze, deletions are governed by the objective of the test, not by mathematical deletion of every nth word; and more than one word may be deleted, as in the above example.
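To make the contrast with every-nth-word deletion concrete, here is a minimal Python sketch of objective-governed deletion, in which the test designer supplies the words or phrases to blank out. The function name make_targeted_cloze and the choice of target phrases are assumptions made for illustration; the sample sentence is taken from the announcement above.

```python
import re


def make_targeted_cloze(passage, targets, blank="_____"):
    """Blank out only the designer-chosen target words or phrases.

    Returns the gapped text and the deleted items in order of appearance.
    """
    if not targets:
        return passage, []
    # Try longer targets first so a short target never pre-empts a longer one.
    ordered = sorted(targets, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    key = []

    def blank_out(match):
        key.append(match.group(0))  # record what was deleted
        return blank

    return pattern.sub(blank_out, passage), key


sentence = (
    "Flight seven-oh-six to Portland will depart "
    "from gate seventy-three at nine-thirty P.M."
)
# Assumed targets: the number expressions, matching the "numbers" criterion.
numbers = ["seven-oh-six", "seventy-three", "nine-thirty"]
gapped, key = make_targeted_cloze(sentence, numbers)
print(gapped)
print(key)
```

Because the targets are chosen by the designer, a deletion may span several words and every blank corresponds to the stated objective.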

Listening cloze tasks should normally use an exact word method of scoring in which you accept as a correct response only the actual word or phrase that was spoken and consider other appropriate words as incorrect. Such stringency is warranted; your objective is, after all, to test listening comprehension, not grammatical or lexical expectancies.

Selective listening can also be assessed through an information transfer technique in which aurally processed information must be transferred to a visual representation, such as labeling a diagram, identifying an element in a picture, completing a form, or showing routes on a map.
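Exact word scoring is simple enough to sketch in Python. The following is a rough illustration, not a prescribed procedure: the function name score_exact is hypothetical, the answer key assumes the three number expressions of one flight announcement were the deleted items, and ignoring capitalization and surrounding spaces is an added assumption that the text itself does not address.

```python
def score_exact(responses, key):
    """Count responses that match the spoken word or phrase exactly.

    Assumption: differences in capitalization and outer whitespace are ignored;
    any other difference (including a plausible synonym) counts as incorrect.
    """
    def normalize(item):
        return item.strip().lower()

    return sum(
        1 for response, answer in zip(responses, key)
        if normalize(response) == normalize(answer)
    )


# Assumed answer key and a hypothetical test-taker's responses.
key = ["seven-oh-six", "seventy-three", "nine-thirty"]
responses = ["seven-oh-six", "seventy-tree", "Nine-Thirty"]
score = score_exact(responses, key)
print(f"{score}/{len(key)}")
```

In this sketch the second response is marked wrong even though it is close to the spoken phrase, which reflects the stringency the exact word method calls for.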

At the lower end of the scale of linguistic complexity, simple picture-cued items are sometimes efficient rubrics for assessing certain selected information.

Consider the following item:

Information transfer: multiple-picture-cued selection

Test-takers hear:

Choose the correct picture. In my back yard I have a bird feeder. Yesterday, there were two birds and a squirrel fighting for the last few seeds in the bird feeder. The squirrel was on top of the bird feeder while the larger bird sat at the bottom of the feeder screeching at the squirrel. The smaller bird was flying around the squirrel, trying to scare it away.

The preceding example illustrates the need for test-takers to focus on just the relevant information. The objective of this task is to test prepositions and prepositional phrases of location (at the bottom, on top of, around, along with larger, smaller), so other words and phrases such as back yard, yesterday, last few seeds, and scare away are supplied only as context and need not be tested. (The task also presupposes, of course, that test-takers are able to identify the difference between a bird and a squirrel.)

In another genre of picture-cued tasks, a number of people and/or actions are presented in one picture, such as a group of people at a party. Assuming that all the items, people, and actions are clearly depicted and understood by the test-taker, assessment may take the form of

· Questions: “Is the tall man near the door talking to a short woman?”

· True/false: “The woman wearing a red skirt is watching TV.”

· Identification: “Point to the person to the left of the couch.”

In a third picture-cued option used by the Test of English for International Communication (TOEIC®), one single photograph is presented to the test-taker, who then hears four different statements and must choose one of the four to describe the photograph. Here is an example.

Information transfer: single-picture-cued verbal multiple-choice

Test-takers see: a photograph of a woman in a laboratory setting, with no glasses on, squinting through a microscope with her right eye, and with her left eye closed.

Test-takers hear: (a) She’s speaking into a microphone. (b) She’s putting on her glasses. (c) She has both eyes open. (d) She’s using a microscope.

Information transfer tasks may reflect greater authenticity by using charts, maps, grids, and other artifacts of daily life. In the example below, test-takers hear a student’s daily schedule, and the task is to fill in the partially completed weekly calendar.

Information transfer: chart-filling

Test-takers hear:

Now you will hear information about Lucy’s daily schedule. The information will be given twice. The first time just listen carefully. The second time, there will be a pause after each sentence. Fill in Lucy’s blank daily schedule with the correct information. The example has already been filled in.

You will hear: Lucy gets up at eight o’clock every morning except on weekends.

You will fill in the schedule to provide the information.

Now listen to the information about Lucy’s schedule. Remember, you will first hear all the sentences; then you will hear each sentence separately with time to fill in your chart.

Lucy gets up at 8:00 every morning except on weekends. She has English on Monday, Wednesday, and Friday at ten o’clock. She has History on Tuesday and Thursday at two o’clock. She takes Chemistry on Monday from two o’clock to six o’clock. She plays tennis on weekends at four o’clock. She eats lunch at twelve o’clock every day except Saturday and Sunday.

Now listen a second time. There will be a pause after each sentence to give you time to fill in the chart. (Lucy’s schedule is repeated with a pause after each sentence).

Test-takers see the following weekly calendar grid:

	Monday	Tuesday	Wednesday	Thursday	Friday	Weekends
8:00	get up	get up	get up	get up	get up
10:00
12:00
2:00
4:00
6:00

Such chart-filling tasks are good examples of aural scanning strategies. A listener must discern from a number of pieces of information which pieces are relevant. In the above example, virtually all of the stimuli are relevant, and very few words can be ignored. In other tasks, however, much more information might be presented than is needed (as in the bird feeder item), forcing the test-taker to select the correct bits and pieces necessary to complete a task.

Chart-filling tasks increase in difficulty as the linguistic stimulus material becomes more complex. In one task described by Ur (1984, pp. 108-122), test-takers listen to a very long description of animals in various cages in a zoo. While they listen, they can look at a map of the layout of the zoo with unlabeled cages. Their task is to fill in the correct animal in each cage, but the complexity of the language makes the task challenging. Similarly, Hughes (1989, p. 138) described a map-marking task in which test-takers must process around 250 words of colloquial language in order to complete the tasks of identifying names, positions, and directions in a car accident scenario on a city street.

Sentence Repetition

The task of simply repeating a sentence or a partial sentence, or sentence repetition, is also used as an assessment of listening comprehension. As in a dictation (discussed below), the test-taker must retain a stretch of language long enough to reproduce it, and then must respond with an oral repetition of that stimulus. Incorrect listening comprehension, whether at the phonemic or discourse level, may be manifested in the correctness of the repetition. A miscue in repetition is scored as a miscue in listening. In the case of somewhat longer sentences, one could argue that the ability to recognize and retain chunks of language as well as threads of meaning might be assessed through repetition. In Chapter 7, we will look closely at PhonePass, a commercially produced test that relies largely on sentence repetition to assess both oral production and listening comprehension.

Sentence repetition is far from a flawless listening assessment task. Buck (2001, p. 79) noted that such tasks “are not just tests of listening, but tests of general oral skill.” Further, this task may test only recognition of sounds, and it can easily be contaminated by lack of short-term memory ability, thus invalidating it as an assessment of comprehension alone. And the teacher may never be able to distinguish a listening comprehension error from an oral production error. Therefore, sentence repetition tasks should be used with caution.

DESIGNING ASSESSMENT TASKS: EXTENSIVE LISTENING

Drawing a clear distinction between any two of the categories of listening referred to here is problematic, but perhaps the fuzziest division is between selective and extensive listening. As we gradually move along the continuum from smaller to larger stretches of language, and from micro- to macroskills of listening, the probability of using more extensive listening tasks increases. Some important questions about designing assessments at this level emerge.

1. Can listening performance be distinguished from cognitive processing factors such as memory, associations, storage, and recall?

2. As assessment procedures become more communicative, does the task take into account test-takers’ ability to use grammatical expectancies, lexical collocations, semantic interpretations, and pragmatic competence?

3. Are test tasks themselves correspondingly content valid and authentic, that is, do they mirror real-world language and context?

4. As assessment tasks become more and more open-ended, they more closely resemble pedagogical tasks, which leads one to ask what the difference is between assessment tasks and teaching tasks. The former imply specified scoring procedures, while the latter do not.

We will try to address these questions as we look at a number of extensive or quasi-extensive listening comprehension tasks.

Dictation

Dictation is a widely researched genre of assessing comprehension. In a dictation, test-takers hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed; then, with long pauses between phrases or natural word groups, during which time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread. Here is a sample dictation at the intermediate level of English.

First reading (natural speed, no pauses, test-takers listen for gist):

The state of California has many geographical areas. On the western side is the Pacific Ocean with its beaches and sea life. The central part of the state is a large fertile valley. The southeast has a hot desert, and north and west have beautiful mountains and forests. Southern California is a large urban area populated by millions of people.

Second reading (slowed speed, pause at each // break, test-takers write):

The state of California // has many geographical areas. // On the western side // is the Pacific Ocean // with its beaches and sea life. // The central part of the state // is a large fertile valley. // The southeast has a hot desert, // and north and west // have beautiful mountains and forests. // Southern California // is a large urban area // populated by millions of people.

Third reading (natural speed, test-takers check their work).

Dictation

Dictations have been used as assessment tools for decades. Some readers still cringe at the thought of having to render a correctly spelled, verbatim version of a paragraph or story recited by the teacher. Until research on integrative testing was published (see Oller, 1971), dictations were thought to be not much more than glorified spelling tests. However, the required integration of listening and writing in a dictation, along with its presupposed knowledge of grammatical and discourse expectancies, brought this technique back into vogue. Hughes (1989), Cohen (1994), Bailey (1998), and Buck (2001) all defend the plausibility of dictation as an integrative test that requires some sophistication in the language in order to process and write down all segments correctly. Thus, I include dictation here under the rubric of extensive tasks, although I am more comfortable with labeling it quasi-extensive.

The difficulty of a dictation task can be easily manipulated by the length of the word groups (or bursts, as they are technically called), the length of the pauses, the speed at which the text is read, and the complexity of the discourse, grammar, and vocabulary used in the passage.

Scoring is another matter. Depending on your context and purpose in administering a dictation, you will need to decide on scoring criteria for several possible kinds of errors:

· Spelling error only, but the word appears to have been heard correctly

· Spelling and/or obvious misrepresentation of a word, illegible word

· Grammatical error (for example, test-taker hears I can’t do it, writes I can do it)

· Skipped word or phrase

· Permutation of words

· Additional words not in the original

· Replacement of a word with an appropriate synonym

Determining the weight of each of these errors is a highly idiosyncratic choice; specialists disagree almost more than they agree on the importance of the above categories. They do agree (Buck, 2001) that a dictation is not a spelling test, and that the first item in the list above should not be considered an error. They also suggest that point systems be kept simple (for maintaining practicality and reliability) and that a deductible scoring method, in which points are subtracted from a hypothetical total, is usually effective.
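A deductible scoring method of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the one-point weights, and the 50-point total are hypothetical values chosen for the example, not figures prescribed by Buck (2001) or by this chapter. The single choice taken from the text is that a spelling-only error costs nothing.

```python
# Points deducted per error in each category.
# ASSUMPTION: all weights are hypothetical, except that spelling-only
# errors carry no penalty, as the specialists cited in the text agree.
DEDUCTIONS = {
    "spelling_only": 0,
    "misrepresented_or_illegible": 1,
    "grammatical": 1,
    "skipped": 1,
    "permutation": 1,
    "added": 1,
    "synonym": 1,
}


def score_dictation(error_counts, total=50, deductions=DEDUCTIONS):
    """Subtract weighted error counts from a hypothetical total.

    error_counts maps a category name to the number of errors observed.
    The score never drops below zero.
    """
    unknown = set(error_counts) - set(deductions)
    if unknown:
        raise ValueError(f"unknown error categories: {sorted(unknown)}")
    lost = sum(deductions[cat] * n for cat, n in error_counts.items())
    return max(total - lost, 0)


if __name__ == "__main__":
    # Three spelling-only slips cost nothing; three other errors cost one point each.
    print(score_dictation({"spelling_only": 3, "skipped": 2, "grammatical": 1}))  # prints 47
```

Keeping every weight in one small table makes it easy to adjust the weights for a particular context and purpose while keeping the point system simple.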

Dictation seems to provide a reasonably valid method for integrating listening and writing skills and for tapping into the cohesive elements of language implied in short passages. However, a word of caution lest you assume that dictation provides a quick and easy method of assessing extensive listening comprehension. If the bursts in a dictation are relatively long (more than five-word segments), this method places a certain amount of load on memory and processing of meaning (Buck, 2001, p. 78). But only a moderate degree of cognitive processing is required, and claiming that dictation fully assesses the ability to comprehend pragmatic or illocutionary elements of language, context, inference, or semantics may be going too far. Finally, one can easily question the authenticity of dictation: it is rare in the real world for people to write down more than a few chunks of information (addresses, phone numbers, grocery lists, directions, for example) at a time.
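When preparing a dictation, it can help to check how long each burst is before reading the passage aloud. The sketch below assumes the passage is marked with // at each pause, as in the sample dictation, and uses the five-word threshold mentioned above; the function names are hypothetical, and counting words by splitting on whitespace is a simplification.

```python
def burst_lengths(marked_passage, separator="//"):
    """Return (burst, word_count) pairs for a passage marked with pause breaks."""
    bursts = [b.strip() for b in marked_passage.split(separator)]
    # Words are counted by splitting on whitespace (a simplifying assumption).
    return [(b, len(b.split())) for b in bursts if b]


def long_bursts(marked_passage, max_words=5, separator="//"):
    """List the bursts that exceed max_words and so add to the memory load."""
    return [b for b, n in burst_lengths(marked_passage, separator) if n > max_words]


if __name__ == "__main__":
    sample = (
        "The state of California // has many geographical areas. // "
        "On the western side // is the Pacific Ocean // "
        "with its beaches and sea life."
    )
    for burst, count in burst_lengths(sample):
        print(count, burst)
    print(long_bursts(sample))
```

A burst flagged by this check is not necessarily wrong; it simply signals where the task leans more heavily on memory.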

Despite these disadvantages, the practicality of the administration of dictations, a moderate degree of reliability in a well-established scoring system, and a strong correspondence to other language abilities speak well for the inclusion of dictation among the possibilities for assessing extensive (or quasi-extensive) listening comprehension.

Communicative Stimulus-Response Tasks

Another, and more authentic, example of extensive listening is found in a popular genre of assessment task in which the test-taker is presented with a stimulus monologue or conversation and then is asked to respond to a set of comprehension questions. Such tasks (as you saw in Chapter 4 in the discussion of standardized testing) are commonly used in commercially produced proficiency tests. The monologues, lectures, and brief conversations used in such tasks are sometimes a little contrived, and certainly the subsequent multiple-choice questions don’t mirror communicative, real-life situations. But with some care and creativity, one can create reasonably authentic stimuli, and in some rare cases the response mode (as shown in one example below) actually approaches complete authenticity. Here is a typical example of such a task.

Test-takers hear:

Directions: Now you will hear a conversation between Lynn and her doctor. You will hear the conversation two times. After you hear the conversation the second time, choose the correct answer for questions 11-15 below. Mark your answers on the answer sheet provided.

Doctor : Good morning, Lynn. What’s the problem?

Lynn : Well, you see, I have a terrible headache, my nose is running, and I’m really dizzy.

Doctor : Okay. Anything else?

Lynn : I’ve been coughing, I think I have a fever, and my stomach aches.

Lynn : Well, let’s see, I went to the lake last weekend, and after I returned home I started sneezing.

Doctor : Hmm. You must have the flu. You should get lots of rest, drink hot beverages, and stay warm. Do you follow me?

Lynn : Well, uh, yeah, but . . . shouldn’t I take some medicine?

Test-takers read:

11. What is Lynn’s problem?

a) She feels horrible.

b) She ran too fast at the lake.

c) She’s been drinking too many hot beverages.

12. When did Lynn’s problem start?

a) When she saw her doctor

b) Before she went to the lake

c) After she came home from the lake

13. The doctor said that Lynn

a) Flew to the lake last weekend

b) Must not get the flu

c) Probably has the flu

14. The doctor told Lynn

a) To rest

b) To follow him

c) To take some medicine

15. According to Dr. Brown, sleep and rest are __________ medicine when you have the flu.

a) More effective than

b) As effective as

c) Less effective than

Dialogue and multiple-choice comprehension items

Does this meet the criterion of authenticity? If you want to be painfully fussy, you might object that it is rare in the real world to eavesdrop on someone else’s doctor-patient conversation. Nevertheless, the conversation itself is relatively authentic; we all have doctor-patient exchanges like this. Equally authentic, if you add a grain of salt, are monologues, lecturettes, and news stories, all of which are commonly utilized as listening stimuli to be followed by comprehension questions aimed at assessing certain objectives that are built into the stimulus.

Is the task itself (of responding to multiple-choice questions) authentic? It’s plausible to assert that any task of this kind following a one-way listening to a conversation is artificial: we simply don’t often encounter little quizzes about conversations we’ve heard (unless it’s your parent, spouse, or best friend who wants to get in on the latest gossip!). The questions posed above, with the possible exception of #14, are unlikely to appear in a lifetime of doctor visits. Yet the ability to respond correctly to such items can be construct validated as an appropriate measure of field-independent listening skills: the ability to remember certain details from a conversation. (As an aside here, many highly proficient native speakers of English might miss some of the above questions if they heard the conversation only once and if they had no visual access to the items until after the conversation was done!)

To compensate for the potential inauthenticity of post-stimulus comprehension questions, you might, with a little creativity, be able to find contexts where questions that probe understanding are more appropriate. Consider the following situation:

Dialogue and authentic questions on details

Test-takers hear:

You will hear a conversation between a detective and a man. The tape will play the conversation twice. After you hear the conversation a second time, choose the correct answers on your test sheet.

Detective : Where were you last night at eleven P.M., the time of the murder?

Man : Uh, let’s see, well, I was just starting to see a movie.

Detective : Did you go alone?

Man : No, uh, well, I was with my friend, uh, Bill. Yeah, I was with Bill.

Detective : What did you do after that?

Man : We went out to dinner, then I dropped her off at her place.

Detective : Then you went home?

Man : Yeah.

Detective : When did you get home?

Man : A little before midnight.

Test-takers read:

7. Where was the man at 11:00 P.M.?

a. In a restaurant

b. In a theater

c. At home

8. Was he with someone?

a. He was alone

b. He was with his wife

c. He was with a friend

9. Then what did he do?

a. He ate out

b. He made dinner

c. He went home

10. When did he get home?

a. About 11:00

b. Almost 12:00

c. Right after the movie

11. The man is probably lying because (name two clues):

1. ..........................................................................

2. ..........................................................................

In this case, test-takers are brought into a little scene in a crime story. The questions following are plausible questions that might be asked to review fact and fiction in the conversation. Question #11, of course, provides an extra shot of reality: the test-taker must name the probable lies told by the man (he referred to Bill as “her”; he saw a movie and ate dinner in the space of one hour), which requires the process of inference.

Authentic Listening Tasks

Ideally, the language assessment field would have a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of an integration with speaking. However, the nature of a test as a sample of performance and a set of tasks with limited time frames implies an equally limited capacity to mirror all real-world contexts of listening performance. “There is no such thing as a communicative test,” stated Buck (2001, p. 92). “Every test requires some components of communicative language ability, and no test covers them all. Similarly, with the notion of authenticity, every task shares some characteristics with target-language tasks, and no test is completely authentic.”

Beyond the rubrics of intensive, responsive, selective, and quasi-extensive communicative contexts described above, can we assess aural comprehension in a truly communicative context? Can we, at this end of the range of listening tasks, ascertain from test-takers that they have processed the main idea(s) of a lecture, the gist of a story, the pragmatics of a conversation, or the unspoken inferential data present in most authentic aural input? Can we assess a test-taker’s comprehension of humor, idiom, and metaphor? The answer is a cautious yes, but not without some concessions to practicality. And the answer is a more certain yes if we take the liberty of stretching the concept of assessment to extend beyond tests and into a broader framework of alternatives. Here are some possibilities.

1. Note-taking.

In the academic world, classroom lectures by professors are common features of a non-native English-user’s experience. One form of a midterm (Kahn, 2002) uses a 15-minute lecture as a stimulus. One among several response formats includes note-taking by the test-takers. These notes are evaluated by the teacher on a 30-point system, as follows.

Scoring system from lecture notes

0-15 points

Visual representation: Are your notes clear and easy to read? Can you easily find and retrieve information from them? Do you use the space on the paper to visually represent ideas? Do you use indentation, headers, numbers, etc.?

0-10 points

Accuracy: Do you accurately indicate main ideas from lectures? Do you note important details and supporting information and examples? Do you leave out unimportant information and tangents?

0-5 points

Symbols and abbreviations: Do you use symbols and abbreviations as much as possible to save time? Do you avoid writing out whole words, and do you avoid writing down every single word the lecturer says?

The process of scoring is time consuming (a loss of practicality), and because of the subjectivity of the point system, it lacks some reliability. But the gain is in offering students an authentic task that mirrors exactly what they have been focusing on in the classroom. The notes become an indirect but arguably valid form of assessing global listening comprehension. The task fulfills the criteria of cognitive demand, communicative language, and authenticity.
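The three-part scoring system for lecture notes can be tallied with a small helper such as the one below. It is a sketch only: the point ceilings (15, 10, and 5) come from the rubric above, while the category keys and the function name are hypothetical labels introduced for the example.

```python
# Maximum points per category, taken from the rubric above.
# ASSUMPTION: the dictionary keys are labels invented for this sketch.
MAX_POINTS = {
    "visual_representation": 15,
    "accuracy": 10,
    "symbols_and_abbreviations": 5,
}


def score_notes(ratings):
    """Check a teacher's ratings against the rubric and return (earned, possible)."""
    if set(ratings) != set(MAX_POINTS):
        raise ValueError("ratings must cover exactly the three rubric categories")
    for category, points in ratings.items():
        if not 0 <= points <= MAX_POINTS[category]:
            raise ValueError(f"{category} must be between 0 and {MAX_POINTS[category]}")
    return sum(ratings.values()), sum(MAX_POINTS.values())


if __name__ == "__main__":
    earned, possible = score_notes(
        {"visual_representation": 12, "accuracy": 8, "symbols_and_abbreviations": 4}
    )
    print(f"{earned} / {possible}")  # prints 24 / 30
```

The range check does nothing to reduce the subjectivity of the ratings themselves; it only guards against clerical slips when totals are recorded.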

2. Editing. Another authentic task provides both a written and a spoken stimulus, and requires the test-taker to listen for discrepancies. Scoring achieves relatively high reliability as there are usually a small number of specific differences that must be identified. Here is the way the task proceeds.

Editing a written version of an aural stimulus

Test-takers read: the written stimulus material (a news report, an email from a friend, notes from a lecture, or an editorial in a newspaper).

Test-takers hear: a spoken version of the stimulus that deviates, in a finite number of facts or opinions, from the original written form.

Test-takers mark: the written stimulus by circling any words, phrases, facts, or opinions that show a discrepancy between the two versions.

One potentially interesting set of stimuli for such a task is the description of a political scandal, first from a newspaper with a political bias, and then from a radio broadcast from an “alternative” news station. Test-takers are not only forced to listen carefully to differences but are subtly informed about biases in the news.
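Because the editing task has a finite set of planted differences, scoring can be reduced to comparing two sets. The sketch below assumes each discrepancy is identified by a short label chosen by the teacher; the labels, the function name, and the decision to report wrongly circled items separately are all assumptions made for illustration, since the task description does not say how extra marks should be treated.

```python
def score_editing(answer_key, circled):
    """Compare the items a test-taker circled with the planted discrepancies."""
    key, marked = set(answer_key), set(circled)
    return {
        "hits": len(key & marked),             # discrepancies correctly identified
        "missed": sorted(key - marked),        # planted differences not circled
        "false_alarms": sorted(marked - key),  # circled items that did not differ
    }


if __name__ == "__main__":
    # Hypothetical labels for differences between the written and spoken versions.
    key = ["date of meeting", "amount of money", "name of senator"]
    circled = ["date of meeting", "name of senator", "city"]
    print(score_editing(key, circled))
```

Whether false alarms should lower the score is a choice for the teacher; listing them separately keeps that decision open.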

3. Interpretive tasks. One of the intensive listening tasks described above was paraphrasing a story or conversation. An interpretive task extends the stimulus material to a longer stretch of discourse and forces the test-taker to infer a response.

Potential stimuli include

· Song lyrics

· [recited] poetry

· Radio/television news reports, and

· An oral account of an experience

Test-takers are then directed to interpret the stimulus by answering a few questions (in open-ended form). Questions might be:

· “Why was the singer feeling sad?”

· “What events might have led up to the reciting of this poem?”

· “What do you think the political activists might do next, and why?”

· “What do you think the storyteller felt about the mysterious disappearance of her necklace?”

This kind of task moves us away from what might traditionally be considered a test toward an informal assessment, or possibly even a pedagogical technique or activity. But the task conforms to certain time limitations, and the questions can be quite specific, even though they ask the test-taker to use inference. While reliable scoring may be an issue (there may be more than one correct interpretation), the authenticity of the interaction in this task and potential washback to the student surely give it some prominence among communicative assessment procedures.

4. Retelling. In a related task, test-takers listen to a story or news event and simply retell it, or summarize it, either orally (on an audiotape) or in writing. In so doing, test-takers must identify the gist, main idea, purpose, supporting points, and/or conclusion to show full comprehension. Scoring is partially predetermined by specifying a minimum number of elements that must appear in the retelling. Again reliability may suffer, and the time and effort needed to read and evaluate the response lowers practicality. Validity, cognitive processing, communicative ability, and authenticity are all well incorporated into the task.

ASSESSING READING

Even as we are bombarded with an unending supply of visual and auditory media, the written word continues in its function to convey information, to amuse and entertain us, to codify our social, economic, and legal conventions, and to fulfill a host of other functions. In literate societies, most “normal” children learn to read by the age of five or six, and some even earlier. With the exception of a small number of people with learning disabilities, reading is a skill that is taken for granted.

In foreign language learning, reading is likewise a skill that teachers simply expect learners to acquire. Basic, beginning-level textbooks in a foreign language presuppose a student’s reading ability if only because it’s a book that is the medium. Most formal tests use the written word as a stimulus for test-taker response; even oral interviews may require reading performance for certain tasks. Reading, arguably the most essential skill for success in all educational contexts, remains a skill of paramount importance as we create assessments of general language ability.

Is reading so natural and normal that learners should simply be exposed to written texts with no particular instruction? Will they just absorb the skills necessary to convert their perception of a handful of letters into meaningful chunks of information? Not necessarily. For learners of English, two primary hurdles must be cleared in order to become efficient readers. First, they need to be able to master fundamental bottom-up strategies for processing separate letters, words, and phrases, as well as top-down, conceptually driven strategies for comprehension. Second, as part of that top-down approach, second language readers must develop appropriate content and formal schemata (background information and cultural experience) to carry out those interpretations effectively.

The assessment of reading ability does not end with the measurement of comprehension. Strategic pathways to full understanding are often important factors to include in assessing learners, especially in the case of classroom assessments that are formative in nature. An inability to comprehend may thus be traced to a need to enhance a test-taker’s strategies for achieving ultimate comprehension. For example, an academic technical report may be comprehensible to a student at the sentence level, but if the learner has not exercised certain strategies for noting the discourse conventions of the genre, misunderstanding may occur.

As we consider a number of different types or genres of written texts and the components of reading, let’s not forget the unobservable nature of reading. Like listening, one cannot see the process of reading, nor can one observe a specific product of reading. Other than observing a reader’s eye movements and page turning, there is no technology that enables us to “see” sequences of graphic symbols traveling from the pages of a book into compartments of the brain (in a possible bottom-up process). Even more outlandish is the notion that one might be able to watch information from the brain make its way down onto the page (in typical top-down strategies). Further, once something is read (information from the written text is stored), no technology allows us to empirically measure exactly what is lodged in the brain. All assessment of reading must be carried out by inference.

TYPES (GENRES) OF READING

Each type or genre of written text has its own set of governing rules and conventions. A reader must be able to anticipate those conventions in order to process meaning efficiently. With an extraordinary number of genres present in any literate culture, the reader’s ability to process texts must be very sophisticated. Consider the following abridged list of common genres, which ultimately form part of the specifications for assessments of reading ability.

1. Academic reading

General interest articles (in magazines, newspapers, etc.)

Technical reports (e.g., lab reports), professional journal articles

Reference material (dictionaries, etc.)

Textbooks, theses

Essays, papers

Test directions

Editorials and opinion writing

2. Job-related reading

Messages (e.g., phone messages)

Letters/emails

Memos (e.g., interoffice)

Reports (e.g., job evaluations, project reports)

Schedules, labels, signs, announcements

Forms, applications, questionnaires

Financial documents (bills, invoices, etc.)

Directories (telephone, office, etc.)

Manuals, directions

3. Personal reading

Newspapers and magazines

Letters, emails, greeting cards, invitations

Messages, notes, lists

Schedules (train, bus, plane, etc.)

Recipes, menus, maps, calendars

Advertisements (commercials, want ads)

Novels, short stories, medical reports, immigration documents

Comic strips, cartoons

Genres of reading

When we realize that this list is only the beginning, it is easy to see how overwhelming it is to learn to read in a foreign language! The genre of a text enables readers to apply certain schemata that will assist them in extracting appropriate meaning. If, for example, readers know that a text is a recipe, they will expect a certain arrangement of information (ingredients) and will know to search for a sequential order of directions. Efficient readers also have to know what their purpose is in reading a text, the strategies for accomplishing that purpose, and how to retain the information.

The content validity of an assessment procedure is largely established through the genre of a text. For example, for learners in a program of English for tourism, assessments of their ability should include guidebooks, maps, transportation schedules, calendars, and other relevant texts.

MICROSKILLS, MACROSKILLS, AND STRATEGIES FOR READING

Aside from attending to genres of text, the skills and strategies for accomplishing reading emerge as a crucial consideration in the assessment of reading ability. The micro- and macroskills below represent the spectrum of possibilities for objectives in the assessment of reading comprehension.

Micro- and macroskills for reading comprehension

Microskills

1. Discriminate among the distinctive graphemes and orthographic patterns of English.

2. Retain chunks of language of different lengths in short-term memory.

3. Process writing at an efficient rate of speed to suit the purpose.

4. Recognize a core of words, and interpret word order patterns and their significance.

5. Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), patterns, rules, and elliptical forms.

6. Recognize that a particular meaning may be expressed in different grammatical forms.

7. Recognize cohesive devices in written discourse and their role in signaling the relationship between and among clauses.

Macroskills

8. Recognize the rhetorical forms of written discourse and their significance for interpretation.

9. Recognize the communicative functions of written texts, according to form and purpose.

10. Infer context that is not explicit by using background knowledge.

11. From described events, ideas, etc., infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, generalization, and exemplification.

12. Distinguish between literal and implied meanings.

13. Detect culturally specific references and interpret them in a context of the appropriate cultural schemata.

14. Develop and use a battery of reading strategies, such as scanning and skimming, detecting discourse markers, guessing the meaning of words from context, and activating schemata for the interpretation of texts.

The assessment of reading can imply the assessment of a storehouse of reading strategies, as indicated in item #14. Aside from simply testing the ultimate achievement of comprehension of a written text, it may be important in some contexts to assess one or more of a storehouse of classic reading strategies. The brief taxonomy of strategies below is a list of possible assessment criteria.

1. Identify your purpose in reading a text.

2. Apply spelling rules and conventions for bottom-up decoding.

3. Use lexical analysis (prefixes, roots, suffixes, etc.) to determine meaning.

4. Guess at meaning (of words, idioms, etc.) when you aren’t certain.

5. Skim the text for the gist and for main ideas.

6. Scan the text for specific information (names, dates, key words).

7. Use silent reading techniques for rapid processing.

8. Use marginal notes, outlines, charts, or semantic maps for understanding and retaining information.

9. Distinguish between literal and implied meanings.

10. Capitalize on discourse markers to process relationships.

Some principal strategies for reading comprehension

TYPES OF READING

In the previous chapters we saw that both listening and speaking could be subdivided into at least five different types of listening and speaking performance. In the case of reading, variety of performance is derived more from the multiplicity of texts (the genres listed above) than from the variety of overt types of performance. Nevertheless, for considering assessment procedures, several types of reading performance are typically identified, and these will serve as organizers of various assessment tasks.

1. Perceptive. In keeping with the set of categories specified for listening comprehension, similar specifications are offered here, except with some differing terminology to capture the uniqueness of reading. Perceptive reading tasks involve attending to the components of larger stretches of discourse: letters, words, punctuation, and other graphemic symbols. Bottom-up processing is implied.

2. Selective. This category is largely an artifact of assessment formats. In order to ascertain one’s reading recognition of lexical, grammatical, or discourse features of language within a very short stretch of language, certain typical tasks are used: picture-cued tasks, matching, true/false, multiple-choice, etc. Stimuli include sentences, brief paragraphs, and simple charts and graphs. Brief responses are intended as well. A combination of bottom-up and top-down processing may be used.

3. Interactive. Included among interactive reading types are stretches of language of several paragraphs to one page or more in which the reader must, in a psycholinguistic sense, interact with the text. That is, reading is a process of negotiating meaning; the reader brings to the text a set of schemata for understanding it, and intake is the product of that interaction. Typical genres that lend themselves to interactive reading are anecdotes, short narratives and descriptions, excerpts from longer texts, questionnaires, memos, announcements, directions, recipes, and the like. The focus of an interactive task is to identify relevant features (lexical, symbolic, grammatical, and discourse) within texts of moderately short length with the objective of retaining the information that is processed. Top-down processing is typical of such tasks, although some instances of bottom-up performance may be necessary.

4. Extensive. Extensive reading, as discussed in this book, applies to texts of more than a page, up to and including professional articles, essays, technical reports, short stories, and books. (It should be noted that reading research commonly refers to “extensive reading” as longer stretches of discourse, such as long articles and books that are usually read outside a classroom hour. Here that definition is massaged a little in order to encompass any text longer than a page.) The purposes of assessment usually are to tap into a learner’s global understanding of a text, as opposed to asking test-takers to “zoom in” on small details. Top-down processing is assumed for most extensive tasks.

DESIGNING ASSESSMENT TASKS: PERCEPTIVE READING

At the beginning level of reading a second language lies a set of tasks that are fundamental and basic: recognition of alphabetic symbols, capitalized and lowercase letters, punctuation, words, and grapheme-phoneme correspondences. Such tasks of perception are often referred to as literacy tasks, implying that the learner is in the early stages of becoming “literate.” Some learners are already literate in their own native language, but in other cases the second language may be the first language that they have ever learned to read. This latter context poses cognitive and sometimes age-related issues that need to be considered carefully. Assessment of literacy is no easy assignment, and if you are interested in this particularly challenging area, further reading beyond this book is advised (Harp, 1991; Farr & Tone, 1994; Genesee, 1994; Cooper, 1997). Assessment of basic reading skills may be carried out in a number of different ways.

Reading Aloud

The test-taker sees separate letters, words, and/or short sentences and reads them aloud, one by one, in the presence of an administrator. Since the assessment is of reading comprehension, any recognizable oral approximation of the target response is considered correct.

Written Response

The same stimuli are presented, and the test-taker’s task is to reproduce the probe in writing. Because of the transfer across different skills here, evaluation of the test-taker’s response must be carefully treated. If an error occurs, make sure you determine its source; what might be assumed to be a writing error, for example, may actually be a reading error, and vice versa.

Multiple-Choice

Multiple-choice responses are not only a matter of choosing one of four or five possible answers. Other formats, some of which are especially useful at the low levels of reading, include same/different, circle the answer, true/false, choose the letter, and matching. Here are some possibilities.

Test-takers read: Circle “S” for same or “D” for different.

1. led let S D

2. bit bit S D

3. seat set S D

4. too to S D

In the case of very low level learners, the teacher/administrator reads directions.

Minimal pair distinction

Test-takers read: Circle the “odd” item, the one that doesn’t “belong.”

1. piece peace piece

2. book book boot

In the case of very low level learners, the teacher/administrator reads directions.

Grapheme recognition task

Picture-Cued Items

Test-takers are shown a picture, such as the one on the next page, along with a written text and are given one of a number of possible tasks to perform.

Picture-cued word identification (Brown & Sahni, 1994, p. 124)

cat

chair

clock

With the same picture, the test-taker might read sentences and then point to the correct part of the picture :

Picture-cued sentence identification

Test-takers hear: Point to the part of the picture that you read about here.

Test-takers see the picture and read each sentence written on a separate card.

The man is reading a book

Or a true/false procedure might be presented with the same picture cue:

Picture-cued true/false sentence identification

Test-takers read:

1. The pencils are under the table T F

2. The cat is on the table T F

3. The picture is over the couch T F

Matching can be an effective method of assessing reading at this level. With objects labeled A, B, C, D, E in the picture, the test-taker reads words and writes the appropriate letter beside the word:

Picture-cued matching word identification

Test-takers read:

1. Clock

2. Chair

3. Books

4. Cat

5. table

Finally, test-takers might see a word or phrase and then be directed to choose one of four pictures that is being described, thus requiring the test-taker to transfer from a verbal to a nonverbal mode. In the following item, test-takers choose the correct letter:

Multiple-choice picture-cued word identification

Test-takers read: Rectangle

Test-takers see, and choose the correct item:

A B C D

DESIGNING ASSESSMENT TASKS: SELECTIVE READING

Just above the rudimentary skill level of perception of letters and words is a category in which the test designer focuses on formal aspects of language (lexical, grammatical, and a few discourse features). This category includes what many think of as testing “vocabulary and grammar.” How many textbooks provide little tests and quizzes labeled “vocabulary and grammar” and never feature any other skill besides reading? Lexical and grammatical aspects of language are simply the forms we use to perform all four of the skills of listening, speaking, reading, and writing. (Notice that in all of these chapters on the four skills, formal features of language have become a potential focus for assessment.)

Here are some of the possible tasks you can use to assess lexical and grammatical aspects of reading ability.

Multiple-Choice (for Form-Focused Criteria)

By far the most popular method of testing a reading knowledge of vocabulary and grammar is the multiple-choice format, mainly for reasons of practicality: it is easy to administer and can be scored quickly. The most straightforward multiple-choice items may have little context, but might serve as a vocabulary or grammatical check.

1. He’s not married. He’s ..........................

A. Young

B. Single

C. First

D. A husband

2. If there’s no doorbell, please .......................... on the door.

A. Kneel

B. Type

C. Knock

D. Shout

3. The mouse is ................................. the bed.

A. Under

B. Around

C. Between

4. The bank robbery occurred .......................... I was in the restroom.

A. That

B. During

C. While

D. Which

5. Yeast is an organic catalyst ................................... known to prehistoric humanity.

A. Was

B. Which was

C. Which it

D. Which

Multiple-choice vocabulary / grammar tasks

This kind of darting from one context to another to another in a test has become so commonplace that learners almost expect the disjointedness. Some improvement of these items is possible by providing some context within each item:

1. Oscar : Do you like champagne?

Lucy : No, I can’t ..................... it!

A. Stand

B. Prefer

C. Hate

2. Manager : Do you like to work by yourself?

Employee : Yes, I like to work .......................

A. Independently

B. Definitely

C. Impatiently

3. Jack : Do you have a coat like this?

John : Yes, mine is ........................ yours.

A. So same as

B. The same like

C. As same as

D. The same as

4. Boss : Where did I put the Johnson file?

Secretary : I think ....................... is on your desk.

A. You were the file looking at

B. The you were looking at file

C. The file you were looking at

D. You were looking at the file

Contextualized multiple-choice vocabulary/grammar tasks

A better contextualized format is to offer a modified cloze test adjusted to fit the objectives being assessed. In the example below, a few lines of English add to overall context.

I’ve lived in the United States (21) .................. three years. I (22) .................. live in Costa Rica. I (23) .................. speak any English. I used to (24) .................. homesick, but now I enjoy (25) .................. here. I have never (26) .................. back home (27) .................. I came to the United States, but I might (28) .................. to visit my family soon.

21. A. since B. for C. during

22. A. used to B. use to C. was

23. A. couldn’t B. could C. can

24. A. been B. be C. being

25. A. live B. to live C. living

26. A. be B. been C. was

27. A. when B. while C. since

28. A. go B. will go C. going

Multiple-choice cloze vocabulary / grammar task

The context of the story in this example may not specifically help the test-taker to respond to the items more easily, but it allows the learner to attend to one set of related sentences for eight items that assess vocabulary and grammar. Other contexts might involve some content dependencies, such that earlier sentences predict the correct response for a later item.

Matching Tasks

1. Vocabulary matching task

Write in the letter of the definition on the right that matches the word on the left.

.................... 1. Exhausted a. unhappy

.................... 2. Disappointed b. Understanding of others

.................... 3. Enthusiastic c. tired

.................... 4. Empathetic d. excited

1. At the end of the long race, the runners were totally .................................

2. My parents were ....................... with my bad performance on the final exam

3. Everyone in the office was ............................... about the new salary raises

4. The ........................... listening of the counselor made Christina feel well understood.

Choose from among the following :

Disappointed

Empathetic

Exhausted

enthusiastic

Selected response fill-in vocabulary task

Matching tasks have the advantage of offering an alternative to traditional multiple-choice or fill-in-the-blank formats, and they are easier to construct than multiple-choice items. Their disadvantage is that they can become more of a puzzle-solving process than a genuine test of comprehension as test-takers struggle with the search for a match.

Editing Tasks

Editing for grammatical or rhetorical errors is a widely used test method for assessing linguistic competence in reading. It not only focuses on grammar but also introduces a simulation of the authentic task of editing, or discerning errors in written passages.

Picture-Cued Tasks

In the previous section we looked at picture-cued tasks for perceptive recognition of symbols and words. Pictures and photographs may be equally well utilized for examining ability at the selective level. Several types of picture-cued methods are commonly used.

1. Test-takers read a sentence or passage and choose one of four pictures that is being described. The sentence or sentences at this level are more complex. A computer-based example follows:

Multiple-choice picture-cued response

Test-takers read a three-paragraph passage, one sentence of which is :

During at least three quarters of the year, the Arctic is frozen.

Click on the chart that shows the relative amount of time each year that water is available to plants in the Arctic

Test-takers see the following four pictures:

2. Test-takers read a series of sentences or definitions, each describing a labeled part of a picture or diagram. Their task is to identify each labeled item. In the following diagram, test-takers do not necessarily know each term, but by reading the definitions they are able to make an identification. For example:

Diagram-labeling task

Test-takers see:

Test-takers read:

Label the picture with the number of the corresponding item described below.

1. Wire supports extending from the hub of a wheel to its perimeter

2. A long, narrow support pole between the seat and the handlebars

3. A small, geared wheel concentric with the rear wheel

4. A long, linked, flexible metal device that propels the vehicle

5. A small rectangular lever operated by the foot to propel the vehicle

6. A tough but somewhat flexible rubber item that circles each wheel

Gap-Filling Tasks

Many of the multiple-choice tasks described above can be converted into gap-filling, or “fill-in-the-blank,” items in which the test-taker’s response is to write a word or phrase. An extension of simple gap-filling tasks is to create sentence completion items where test-takers read part of a sentence and then complete it by writing a phrase.

Oscar : Doctor, what should I do if I get sick?

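Doctor: It is best to stay home and ...............................

If you have a fever, ...............................

You should drink as much ...............................

The worst thing you can do is ...............................

You should also ...............................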

Sentence completion tasks

The obvious disadvantage of this type of task is its questionable assessment of reading ability. The task requires both reading and writing performance, thereby rendering it of low validity in isolating reading as the sole criterion. Another drawback is scoring the variety of creative responses that are likely to appear. You will have to make a number of judgment calls on what comprises a correct response. In a test of reading comprehension only, you must accept as correct any responses that demonstrate comprehension of the first part of the sentence. This alone indicates that such tasks are better categorized as integrative procedures.

DESIGNING ASSESSMENT TASKS: INTERACTIVE READING

Tasks at this level, like selective tasks, have a combination of form-focused and meaning-focused objectives but with more emphasis on meaning. Interactive tasks may therefore imply a little more focus on top-down processing than on bottom-up. Texts are a little longer, from a paragraph to as much as a page or so in the case of ordinary prose. Charts, graphs, and other graphics may be somewhat complex in their format.

Cloze Tasks

One of the most popular types of reading assessment tasks is the cloze procedure. The word cloze was coined by educational psychologists to capture the Gestalt psychological concept of “closure,” that is, the ability to fill in gaps in an incomplete image (visual, auditory, or cognitive) and supply (from background schemata) omitted details.

In written language, a sentence with a word left out should have enough context that a reader can close that gap with a calculated guess, using linguistic expectancies (formal schemata), background experience (content schemata), and some strategic competence. Based on this assumption, cloze tests were developed for native language readers and defended as an appropriate gauge of reading ability. Some research (Oller, 1973, 1976, 1979) on second language acquisition vigorously defends cloze testing as an integrative measure not only of reading ability but also of other language abilities. It was argued that the ability to make coherent guesses in cloze gaps also taps into the ability to listen, speak, and write. With the decline of zeal for the search for the ideal integrative test in recent years, cloze testing has returned to a more appropriate status as one of a number of assessment procedures available for testing ability.

Cloze tests are usually a minimum of two paragraphs in length in order to account for discourse expectancies. They can be constructed relatively easily as long as the specifications for choosing deletions and for scoring are clearly defined. Typically every seventh word (plus or minus two) is deleted (known as fixed-ratio deletion), but many cloze test designers instead use a rational deletion procedure of choosing deletions according to the grammatical or discourse functions of the words. Rational deletion also allows the designer to avoid deleting words that would be difficult to predict from the context. For example, in the sentence “Everyone in the crowd enjoyed the gorgeous sunset,” the seventh word is gorgeous, but learners could easily substitute other appropriate adjectives. Traditionally, cloze passages have between 30 and 50 blanks to fill, but a passage with as few as half a dozen blanks can legitimately be labeled a cloze test.

Two approaches to the scoring of cloze tests are commonly used. The exact word method gives credit to test-takers only if they insert the exact word that was originally deleted. The second method, appropriate word scoring, credits the test-taker for supplying any word that is grammatically correct and that makes good sense in the context. In the sentence above about the “gorgeous sunset,” the test-taker would get credit for supplying beautiful, amazing, or spectacular. The choice between the two methods of scoring is one of practicality/reliability vs. face validity. In the exact word approach, scoring can be done quickly (especially if the procedure uses a multiple-choice technique) and reliably. The second approach takes more time because the teacher must determine whether each response is indeed appropriate, but students will perceive the test as being fairer: they won’t get “marked off” for appropriate, grammatically correct responses.

The following excerpts from a longer essay illustrate the difference between rational and fixed-ratio deletion, and between exact word and appropriate word scoring.

The recognition that one’s feelings of (1) .......... and unhappiness can coexist much like (2) .......... and hate in a close relationship (3) .......... offer valuable clues on how to (4) .......... a happier life. It suggests, for (5) .........., that changing or avoiding things that (6) .......... you miserable may well make you (7) .......... miserable but probably no happier.

Cloze procedure, fixed ratio deletion (every seventh word)

The recognition that one’s feelings (1) .......... happiness (2) .......... unhappiness can coexist much like love and hate (3) .......... a close relationship may offer valuable clues (4) .......... how to lead a happier life. It suggests, (5) .......... example, that changing (6) .......... avoiding things that make you miserable may well make you less miserable (7) .......... probably no happier.

Cloze procedure, rational deletion (prepositions and conjunctions)

In both versions there are seven deletions, but the second version allows the test designer to tap into prediction of prepositions and conjunctions in particular. The second version also provides more washback as students focus on targeted grammatical features.
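The two deletion procedures can be sketched in a few lines of Python. This is only an illustration, not a tool described in the source: the function names (`make_cloze`, `fixed_ratio`, `rational`), the blank format, and the target word set are all assumptions made for the example. Note that the rational rule here blanks every occurrence of a targeted word, whereas a test designer may choose to spare some occurrences, as the excerpt above does with the "and" in "love and hate".

```python
import string

PASSAGE = (
    "The recognition that one's feelings of happiness and unhappiness can "
    "coexist much like love and hate in a close relationship may offer "
    "valuable clues on how to lead a happier life. It suggests, for example, "
    "that changing or avoiding things that make you miserable may well make "
    "you less miserable but probably no happier."
)

def make_cloze(text, should_delete):
    """Blank out each word for which should_delete(index, word) is true.

    Returns the cloze text and the answer key (the deleted words, in order).
    Punctuation attached to a deleted word is left in place.
    """
    words = text.split()
    key = []
    for i, word in enumerate(words):
        core = word.strip(string.punctuation)
        if core and should_delete(i, core):
            key.append(core)
            words[i] = word.replace(core, f"({len(key)}) ______", 1)
    return " ".join(words), key

def fixed_ratio(n):
    """Deletion rule: every n-th word, counting from the first word."""
    return lambda i, core: (i + 1) % n == 0

def rational(targets):
    """Deletion rule: any word belonging to a designer-chosen set."""
    targets = {t.lower() for t in targets}
    return lambda i, core: core.lower() in targets

fixed_text, fixed_key = make_cloze(PASSAGE, fixed_ratio(7))
rational_text, rational_key = make_cloze(
    PASSAGE, rational({"of", "and", "in", "on", "for", "or", "but"})
)
print(fixed_key)
print(rational_key)
```

Keeping the deletion rule as a separate function makes it easy to compare the two procedures on the same passage, or to write a rule that targets a different grammatical feature.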

Both of the scoring methods named above could present problems, with the first version presenting a little more ambiguity. Possible responses might include:

Fixed-ratio version, blank #3: may, might, could, can

#4: lead, live, have, seek

#5: example, instance

Rational deletion version, blank #4: on, about

#6: or, and

#7: but, and
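The difference between exact word and appropriate word scoring can be shown with a short Python sketch. The function names and the sample student responses are hypothetical; the answer key and the sets of acceptable words are taken from blanks #3 to #5 of the fixed-ratio version above.

```python
def score_exact(responses, key):
    """Exact word scoring: credit only for the originally deleted word."""
    return sum(
        given.strip().lower() == target.lower()
        for given, target in zip(responses, key)
    )

def score_appropriate(responses, acceptable):
    """Appropriate word scoring: credit for any word the scorer judged acceptable."""
    total = 0
    for given, options in zip(responses, acceptable):
        if given.strip().lower() in {option.lower() for option in options}:
            total += 1
    return total

# Fixed-ratio version, blanks #3, #4, and #5.
key = ["may", "lead", "example"]
acceptable = [
    {"may", "might", "could", "can"},
    {"lead", "live", "have", "seek"},
    {"example", "instance"},
]
responses = ["might", "lead", "instance"]  # a hypothetical student's answers

exact_score = score_exact(responses, key)
appropriate_score = score_appropriate(responses, acceptable)
print(exact_score, appropriate_score)
```

The same three answers earn one point under exact word scoring and three under appropriate word scoring. The list of acceptable words still has to be drawn up by the teacher, which is where the extra scoring time goes.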

Arranging a cloze test in a multiple-choice format allows even more rapid scoring: hand scoring with an answer key or hole-punched grid, or computer scoring using scannable answer sheets. Multiple-choice cloze tests must of course adhere to all the other guidelines for effective multiple-choice items that were covered in Chapter 4, especially the choice of appropriate distractors; therefore they can take much longer to construct, possibly too long to pay off in a classroom setting.

Some variations on standard cloze testing have appeared over the years; two of the better known are the C-test and the cloze-elide procedure. In the C-test (Klein-Braley & Raatz, 1984; Klein-Braley, 1985; Dörnyei & Katona, 1992), the second half (according to the number of letters) of every other word is obliterated and the test-taker must restore each word. While Klein-Braley and others vouched for its validity and reliability, many consider this technique to be “even more irritating to complete than cloze tests” (Alderson, 2000, p. 225). Look at the following example and judge for yourself:

The recognition th__ one’s feel____ of happ_____ and unhap______ can coe____ much li__ love a__ hate i_ a cl___ relati______ may of___ valuable cl___ on h__ to le__ a hap____ life. I_ suggests, f__ example, th__ changing o_ avoiding thi___ that ma__ you mise_____ may we__ make y__ less mise_____ but prob____ no hap____.

C-test procedure
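A C-test of this kind can be generated mechanically. The sketch below is one possible reading of the rule, with several assumptions that are not stated in the source: the first half of each affected word is kept, rounding down for words with an odd number of letters; each removed letter is shown as an underscore; one-letter words are left intact; and the `start` parameter, which picks the first word to damage, is hypothetical.

```python
import string

def c_test(text, start=1):
    """Obliterate the second half of every other word, beginning at index `start`."""
    words = text.split()
    for i in range(start, len(words), 2):
        core = words[i].strip(string.punctuation)
        if len(core) < 2:
            continue  # a one-letter word has no second half to remove
        keep = len(core) // 2
        damaged = core[:keep] + "_" * (len(core) - keep)
        words[i] = words[i].replace(core, damaged, 1)
    return " ".join(words)

sample = "It suggests, for example, that changing or avoiding things"
damaged_sample = c_test(sample, start=0)
print(damaged_sample)
```

Showing one underscore per missing letter tells the test-taker how long each word is; a designer who considers that too strong a hint could use a blank of fixed length instead.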

The second variation, the cloze-elide procedure, inserts words into a text that don’t belong. The test-taker’s task is to detect and cross out the “intrusive” words. Look at the same familiar passage:

The recognition that one’s now feelings of happiness and unhappiness can under coexist much like love and hate in a close then relationship may offer valuable clues on how to lead a happier with life. It suggests, for example, that changing or avoiding my things that make you miserable may well make you less miserable ever but probably no happier.

Cloze-elide procedure
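Building a cloze-elide passage amounts to inserting chosen words at chosen positions. The following Python sketch assumes the designer supplies both; the function name `cloze_elide` and the mapping of word positions to intrusive words are hypothetical, and the source gives no rule for choosing either.

```python
def cloze_elide(text, intrusions):
    """Insert intrusive words into a text.

    `intrusions` maps a zero-based word position in the original text to the
    intrusive word that should be placed just before that word.
    """
    result = []
    for i, word in enumerate(text.split()):
        if i in intrusions:
            result.append(intrusions[i])
        result.append(word)
    return " ".join(result)

original = "love and hate in a close relationship"
with_intruder = cloze_elide(original, {6: "then"})
print(with_intruder)
```

Because the insertion points are recorded in the mapping, the same mapping can serve as the answer key when checking which words a test-taker crossed out.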

Critics of this procedure (Davies, 1975) claimed that the cloze-elide procedure is actually a test of reading speed and not of proofreading skill, as its proponents asserted. Two disadvantages are nevertheless immediately apparent: (1) neither the words to insert nor the frequency of insertion appears to have any rationale, and (2) good readers naturally weed out such potential interruptions, so they may not be adept at detecting the intrusive words.

Impromptu Reading Plus Comprehension Questions

If cloze testing is the most-researched procedure for assessing reading, the traditional “Read a passage and answer some questions” technique is undoubtedly the oldest and the most common. Virtually every proficiency test uses the format, and one would rarely consider assessing reading without some component of the assessment involving impromptu reading and responding to questions.

In Chapter 4, in the discussion on proficiency testing, we looked at a typical reading comprehension passage and a set of questions from the TOEFL. Here’s another such passage:

Questions 1-10

The Hollywood sign in the hills that line the northern border of Los Angeles is a famous landmark recognized the world over. The white-painted, 50-foot-high, sheet metal letters can be seen from great distances across the Los Angeles basin.

The sign was not constructed, as one might suppose, by the movie business as a means of celebrating the importance of Hollywood to this industry; instead, it was first constructed in 1923 as a means of advertising homes for sale in a 500-acre housing subdivision in a part of Los Angeles called “Hollywoodland.” The sign that was constructed at the time, of course, said “Hollywoodland.” Over the years, people began referring to the area by the shortened version “Hollywood,” and after the sign and its site were donated to the city in 1945, the last four letters were removed.

The sign suffered from years of disrepair, and in 1973 it needed to be completely replaced, at a cost of $27,700 per letter. Various celebrities were instrumental in helping to raise needed funds. Rock star Alice Cooper, for example, bought an O in memory of Groucho Marx, and Hugh Hefner of Playboy fame held a benefit party to raise the money for the Y. The construction of the new sign was finally completed in 1978.

1. What is the topic of this passage?

(A) A famous sign

(B) A famous city

(D) Hollywood versus Hollywoodland

2. The expression “the world over” in line 2 could best be replaced by

(A) In the northern parts of the world

(B) On top of the world

(D) In the skies

3. It can be inferred from the passage that most people think that the Hollywood sign was first constructed by

(A) an advertising company

(B) the movie industry

(D) the city of Los Angeles

4. The pronoun “it” in line 5 refers to

(A) the sign

(B) the movie business

(D) this industry

5. According to the passage, the Hollywood sign was first built in

(A) 1923

(B) 1949

(D) 1978

6. Which of the following is NOT mentioned about Hollywoodland?

(A) It used to be the name of an area of Los Angeles.

(B) It was formerly the name on the sign in the hills.

(D) It was the most expensive area of Los Angeles.

7. The passage indicates that the sign suffered because

(A) People damaged it

(B) It was not fixed

(D) It was poorly constructed

8. It can be inferred from the passage that the Hollywood sign was how old when it was necessary to replace it completely?

(A) ten years old

(B) twenty-six years old

(D) fifty-five years old

9. The word “replaced” in line 10 is closest in meaning to which of the following?

(A) moved to a new location

(B) destroyed

(D) exchanged for a newer one

10. According to the passage, how did celebrities help with the new sign?

(A) they played instruments

(B) they raised the sign

(D) they took part in work parties to build the sign

Reading comprehension passage (Phillips, 2001, pp. 421-422)

Notice that this set of questions, based on a 250-word passage, covers the comprehension of these features:

· Main idea (topic)

· Expressions/idioms/phrases in context

· Inference (implied detail)

· Grammatical features

· Detail (scanning for a specifically stated detail)

· Excluding facts not written (unstated details)

· Supporting idea(s)

· Vocabulary in context

These specifications, and the questions that exemplify them, are not just a string of “straight” comprehension questions that follow the thread of the passage. The questions represent a sample of the test specifications for TOEFL reading passages, which are derived from research on a variety of abilities good readers exhibit. Notice that many of them correspond to strategies of effective reading: skimming for the main idea, scanning for details, guessing word meanings from context, inferencing, using discourse markers, etc. To construct your own assessments that involve short reading passages followed by questions, you can begin with TOEFL-like specs as a basis. Your focus in your own classroom will determine which of these (and possibly other) specifications you will include in your assessment procedure, how you will frame questions, and how much weight you will give each item in scoring.

The technology of computer-based reading comprehension tests of this kind enables some additional types of items. Items such as the following are typical:

· Click on the word in paragraph 1 that means “subsection work”.

· Look at the word they in paragraph 2. Click on the word that they refers to.

· The following sentence could be added to paragraph 2:

Instead, he used the pseudonym Mrs. Silence Dogood.

Where would it best fit into the paragraph? Click on the square to add the sentence to the paragraph.

· Click on the drawing that most closely resembles the prehistoric coelacanth. [Four drawings are depicted on the screen]

Computer-based TOEFL® reading comprehension items

Short-Answer Tasks

Multiple-choice items are difficult to construct and validate, and classroom teachers rarely have time in their busy schedules to design such a test. A popular alternative

DESIGNING ASSESSMENT TASKS: EXTENSIVE READING

Extensive reading involves somewhat longer texts than we have been dealing with up to this point. Journal articles, technical reports, longer essays, short stories, and books fall into this category. The reason for placing such reading into a separate category is that reading of this type of discourse almost always involves a focus on meaning using mostly top-down processing, with only occasional use of a targeted bottom-up strategy. Also, because of the extent of such reading, formal assessment is unlikely to be contained within the time constraints of a typical formal testing framework, which presents a unique challenge for assessment purposes.

Another complication in assessing extensive reading is that the expected response from the reader is likely to involve as much written (or sometimes oral) performance as reading. For example, in asking test-takers to respond to an article or story, one could argue that a greater emphasis is placed on writing than on reading. This is no reason to sweep extensive reading assessment under the rug; teachers should not shrink from the assessment of this highly sophisticated skill.

Before examining a few tasks that have proved to be useful in assessing extensive reading, it is essential to note that a number of the tasks described in previous categories can apply here. Among them are:

· Impromptu reading plus comprehension questions,

· Short-answer tasks,

· Editing,

· Scanning,

· Ordering,

· Information transfer, and

· Interpretation (discussed under graphics).

In addition to those applications are tasks that are unique to extensive reading: skimming, summarizing, responding to reading, and note-taking.

Skimming Tasks

Skimming is the process of rapid coverage of reading matter to determine its gist or main idea. It is a prediction strategy used to give a reader a sense of the topic and purpose of a text, the organization of the text, the perspective or point of view of the writer, its ease or difficulty, and/or its usefulness to the reader. Of course skimming can apply to texts of less than one page, so it would be wise not to confine this type of task just to extensive texts.

Assessment of skimming strategies is usually straightforward: the test-taker skims a text and answers questions such as the following:

What is the main idea of this text?

What is the author’s purpose in writing the text?

What kind of writing is this [newspaper article, manual, novel, etc.]?

What type of writing is this [expository, technical, narrative, etc.]?

How easy or difficult do you think this text will be?

What do you think you will learn from the text?

How useful will the text be for your [profession, academic needs, interests]?

Skimming tasks

Responses are oral or written, depending on the context. Most assessments in the domain of skimming are informal and formative: they are grist for an imminent discussion, a more careful reading to follow, or an in-class discussion, and therefore their washback potential is good. Insofar as the subject matter and task are useful to a student’s goals, authenticity is preserved. Scoring is less of an issue than providing appropriate feedback to students on their strategies of prediction.

Summarizing and Responding

One of the most common means of assessing extensive reading is to ask the test-taker to write a summary of the text. The task that is given to students can be very simply worded:

Write a summary of the text. Your summary should be about one paragraph in length (100-150 words) and should include your understanding of the main idea and supporting ideas.

Directions for summarizing

Evaluating summaries is difficult: do you give test-takers a certain number of points for targeting the main idea and its supporting ideas?

1. Expresses accurately the main idea and supporting ideas.

2. Is written in the student’s own words; occasional vocabulary from the original text is acceptable.

3. Is logically organized.

4. Displays facility in the use of language to clearly express ideas in the text.

Criteria for assessing a summary (Imao, 2001, p. 184)

As you can readily see, a strict adherence to the criterion of assessing reading, and reading only, implies consideration of only the first factor; the other three pertain to writing performance. The first criterion is nevertheless a crucial factor; otherwise the reader-writer could pass all three of the other criteria with virtually no understanding of the text itself. Evaluation of the reading comprehension criterion will of necessity remain somewhat subjective because the teacher will need to determine degrees of fulfillment of the objective (see below for more about scoring this task).

Of further interest in assessing extensive reading is the technique of asking a student to respond to a text. The two tasks should not be confused with each other: summarizing requires a synopsis or overview of the text, while responding asks the reader to provide his or her own opinion on the text as a whole or on some statement or issue within it. Responding may be prompted by such directions as this:

Directions for responding to reading

One criterion for a good response here is the extent to which the test-taker accurately reflects the content of the article and some of the arguments therein. Scoring is also difficult here because of the subjectivity of determining an accurate reflection of the article itself. For the reading component of this task, as well as the summary task described above, a holistic scoring system may be feasible:

3 Demonstrates clear, unambiguous comprehension of the main and supporting ideas.

2 Demonstrates comprehension of the main idea but lacks comprehension of some supporting ideas.

1 Demonstrates only a partial comprehension of the main and supporting ideas.

0 Demonstrates no comprehension of the main and supporting ideas.

Holistic scoring scale for summarizing and responding to reading

The teacher or test administrator must still determine shades of gray between the point categories, but the descriptions help to bridge the gap between an empirically determined evaluation (which is impossible) and wild, impressionistic guesses.

An attempt has been made here to underscore the reading component of summarizing and responding to reading, but it is crucial to consider the interactive relationship between reading and writing that is highlighted in these two tasks. As you direct students to engage in such integrative performance, it is advisable not to treat them as tasks for assessing reading alone.

Note-Taking and Outlining

Finally, a reader’s comprehension of extensive texts may be assessed through an evaluation of a process of note-taking and/or outlining. Because of the difficulty of controlling the conditions and time frame for both these techniques, they rest firmly in the category of informal assessment. Their utility is in the strategic training that learners gain in retaining information through marginal notes that highlight key information or organizational outlines that put supporting ideas into a visually manageable framework. A teacher, perhaps in one-on-one conferences with students, can use student notes/outlines as indicators of the presence or absence of effective reading strategies, and thereby point the learners in positive directions.

Saturday, November 19, 2016
