ASSESSING LISTENING
In earlier chapters, a number of foundational principles of language assessment
were introduced. Concepts like practicality, reliability, validity,
authenticity, washback, direct and indirect testing, and formative and
summative assessment are by now part of your vocabulary. You have become
acquainted with some tools for evaluating a “good” test, examined procedures for
designing a classroom test, and explored the complex process of creating
different kinds of test items. You have begun to absorb the intricate
psychometric, educational, and political issues that intertwine in the world of
standardized and standards-based testing.
Now our focus will shift away from the standardized testing juggernaut to the level
at which you will usually work: the day-to-day classroom assessment of
listening, speaking, writing, and reading. Since this is the level at which you
will most frequently have the opportunity to apply principles of assessment,
the next four chapters of this book will provide guidelines and hands-on
practice in testing within a curriculum of English as a second or
foreign language.
But first, two important caveats. The fact that the four language skills are
discussed in four separate chapters should in no way predispose you to think
that those skills are or should be assessed in isolation. Every TESOL
professional will tell you that the integration of skills is of paramount
importance in language learning. Likewise, assessment is more authentic and
provides more washback when skills are integrated. Nevertheless, the skills are
treated independently here in order to identify principles, test types, tasks,
and issues associated with each one.
Second, you may already have scanned through this book to look for a chapter on
assessing grammar and vocabulary, or something in the way of a focus on form in
assessment. The treatment of form-focused assessment is not relegated to a
separate chapter here for a very distinct reason: there is no such thing as a
test of grammar or vocabulary that does not invoke one or more of the separate
skills of listening, speaking, reading, and writing! It’s not uncommon to find
little “grammar tests” and “vocabulary tests” in textbooks, and these may be
perfectly useful instruments. But responses on these quizzes are usually
written, with multiple-choice selection or fill-in-the-blank items. In this
book, we treat the various linguistic forms (phonology, morphology, lexicon,
grammar, and discourse) within the context of skill areas. That way, we don’t
perpetuate the myth that grammar and vocabulary and other linguistic forms can
somehow be disassociated from a mode of performance.
OBSERVING THE PERFORMANCE OF THE FOUR SKILLS
Before focusing on listening itself, think about the two interacting concepts of
performance and observation. All language users perform the acts of listening,
speaking, reading, and writing. They of course rely on their underlying
competence in order to accomplish these performances. When you propose to
assess someone’s ability in one or a combination of the four skills, you assess
that person’s competence, but you observe the person’s performance. Sometimes
the performance does not indicate true competence: a bad night’s rest,
illness, an emotional distraction, test anxiety, a memory block, or other
student-related reliability factors could affect performance, thereby
providing an unreliable measure of actual competence.
So, one important principle for assessing a learner’s competence is to consider
the fallibility of the results of a single performance, such as that produced in
a test. As with any attempt at measurement, it is your obligation as a teacher
to triangulate your measurements: consider at least two (or more) performances
and/or contexts before drawing a conclusion. That could take the form of one or
more of the following designs:
· Several tests that are combined to form an assessment
· A single test with multiple test tasks to account for learning styles and
performance variables
· In-class and extra-class graded work
· Alternative forms of assessment (e.g., journal, portfolio, conference,
observation, self-assessment, peer-assessment)
Multiple
measures will always give you a more reliable and valid assessment than a
single measure.
A second principle is one that we teachers often forget. We must rely as much
as possible on observable performance in our assessment of students. Observable
means being able to see or hear the performance of the learner (the senses of
touch, taste, and smell don’t apply very often to language testing!). What,
then, is observable among the four skills of listening, speaking, reading, and
writing? Table 6.1 offers an answer.
Isn’t it interesting that in the case of the receptive skills, we can observe
neither the process of performing nor a product? I can hear your argument
already: “But I can see that she’s listening because she’s nodding her head and
frowning and smiling and asking relevant questions.” Well, you’re not observing
the listening performance; you’re observing the result of the listening. You can
no more observe listening (or reading) than you can see the wind blowing. The
process of the listening performance itself is the invisible, inaudible process
of internalizing meaning from the auditory signals being transmitted to the ear
and brain. Or you may argue that the product of listening is a spoken or
written response from the student that indicates correct (or incorrect)
auditory processing. Again, the product of listening and reading is not the
spoken or written response. The product is within the structure of the brain,
and until teachers carry with them little portable MRI scanners to detect
meaningful intake, it is impossible to observe the product. You observe only
the result of the meaningful input in the form of spoken or written output,
just as you observe the result of the wind by noticing trees waving back and
forth.
The productive skills of speaking and writing allow us to hear and see the
process as it is performed. Writing gives a permanent product in the form of a
written piece. But unless you have recorded speech, there is no permanent
observable product for speaking performance because all those words you just
heard have vanished from your perception and (you hope) have been transformed
into meaningful intake somewhere in your brain.
Receptive skills, then, are clearly the more enigmatic of the two modes of
performance. You cannot observe the actual act of listening or reading, nor can
you see or hear an actual product! You can observe learners only while they are
listening or reading. The upshot is that all assessment of listening and reading
must be made on the basis of observing the test-taker’s speaking or writing (or
nonverbal response), and not on the listening or reading itself. So, all
assessment of receptive performance must be made by inference!
How discouraging, right? Well, not necessarily. We have developed reasonably
good assessment tasks to make the necessary jump, through the process of
inference, from unobservable reception to a conclusion about comprehension
competence. And all this is a good reminder of the importance not just of
triangulation but of the potential fragility of the assessment of comprehension
ability. The actual performance is made “behind the scenes,” and those of us who
propose to make reliable assessments of receptive performance need to be on our
guard.
THE IMPORTANCE
OF LISTENING
Listening has often played second fiddle to its counterpart, speaking. In the
standardized testing industry, a number of separate oral production tests are
available (Test of Spoken English, Oral Proficiency Inventory, and PhonePass®,
to name several that are described in Chapter 7 of this book), but it is rare to
find just a listening test. One reason for this emphasis is that listening is
often implied as a component of speaking. How could you speak a language
without also listening? In addition, the overtly observable nature of speaking
renders it more empirically measurable than listening. But perhaps a deeper
cause lies in universal biases toward speaking. A good speaker is often
(unwisely) valued more highly than a good listener. To determine if someone is
a proficient user of a language, people customarily ask, “Do you speak
Spanish?”
Every teacher of language knows that one’s oral production ability (other than
monologues, speeches, reading aloud, and the like) is only as good as one’s
listening comprehension ability. But of even further impact is the likelihood
that input in the aural-oral mode accounts for a large proportion of
successful language acquisition. In a typical day, we do measurably more
listening than speaking (with the exception of one or two of your friends who
may be nonstop chatterboxes!). Whether in the workplace, educational, or home
contexts, aural comprehension far outstrips oral production in quantifiable
terms of time, number of words, effort, and attention.
We therefore need to pay close attention to listening as a mode of performance
for assessment in the classroom. In this chapter, we will begin with basic
principles and types of listening, then move to a survey of tasks that can be
used to assess listening. (For a review of issues in listening, you may want to
read Chapter 16 of TBP.)
BASIC TYPES OF
LISTENING
As with all
effective tests, designing appropriate assessment tasks in listening begins
with the specifications of objectives or criteria. Those objectives may be
classified in terms of several types of listening performance. Think about what
you do when you listen. Literally in nanoseconds, the following processes flash
through your brain:
1. You recognize speech sounds and hold a temporary “imprint” of them in
short-term memory.
2. You simultaneously determine the type of speech event (monologue,
interpersonal dialogue, transactional dialogue) that is being processed and
attend to its context (who the speaker is, location, purpose) and the content
of the message.
3. You use (bottom-up) linguistic decoding skills and/or (top-down) background
schemata to bring a plausible interpretation to the message, and assign a
literal and intended meaning to the utterance.
4. In most cases (except for repetition tasks, which involve short-term memory
only), you delete the exact linguistic form in which the message was originally
received in favor of conceptually retaining important or relevant information
in long-term memory.
Each of these
stages represents a potential assessment objective:
· Comprehending surface structure elements such as phonemes, words, intonation,
or a grammatical category
· Understanding pragmatic context
· Determining the meaning of auditory input
· Developing the gist, a global or comprehensive understanding
From these
stages we can derive four commonly identified types of listening performance,
each of which comprises a category within which to consider assessment tasks
and procedures.
1. Intensive. Listening for perception of the components (phonemes, words,
intonation, discourse markers, etc.) of a larger stretch of language.
2. Responsive. Listening to a relatively short stretch of language (a greeting,
question, command, comprehension check, etc.) in order to make an equally short
response.
3. Selective. Processing stretches of discourse such as short monologues for
several minutes in order to “scan” for certain information. The purpose of such
performance is not necessarily to look for global or general meanings, but to be
able to comprehend designated information in a context of longer stretches of
spoken language (such as classroom directions from a teacher, TV or radio news
items, or stories). Assessment tasks in selective listening could ask students,
for example, to listen for names, numbers, a grammatical category, directions
(in a map exercise), or certain facts and events.
4. Extensive. Listening to develop a top-down, global understanding of spoken
language. Extensive performance ranges from listening to lengthy lectures to
listening to a conversation and deriving a comprehensive message or purpose.
Listening for the gist, for the main idea, and making inferences are all part of
extensive listening.
For full comprehension, test-takers may at the extensive level need to invoke
interactive skills (perhaps note-taking, questioning, discussion): listening
that includes all four of the above types as test-takers actively participate
in discussions, debates, conversations, role plays, and pair and group work.
Their listening performance must be intricately integrated with speaking (and
perhaps other skills) in the authentic give-and-take of communicative
interchange.
MICRO- AND MACROSKILLS OF LISTENING
A useful way of synthesizing the above two lists is to consider a finite number
of micro- and macroskills implied in the performance of listening
comprehension. Richards’ (1983) list of microskills has proven useful in the
domain of specifying objectives for learning and may be even more useful in
forcing test makers to carefully identify specific assessment objectives. In the
following box, the skills are subdivided into what I prefer to think of as
microskills (attending to the smaller bits and chunks of language, in more of a
bottom-up process) and macroskills (focusing on the larger elements involved in
a top-down approach to a listening task). The micro- and macroskills provide 17
different objectives to assess in listening.
Micro- and macroskills of listening (adapted from Richards, 1983)
Microskills:
1. Discriminate among the distinctive sounds of English.
2. Retain chunks of language of different lengths in short-term memory.
3. Recognize English stress patterns, words in stressed and unstressed
positions, rhythmic structure, intonation contours, and their role in signaling
information.
4. Recognize reduced forms of words.
5. Distinguish word boundaries, recognize a core of words, and interpret word
order patterns and their significance.
6. Process speech at different rates of delivery.
7. Process speech containing pauses, errors, corrections, and other performance
variables.
8. Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g.,
tense, agreement, pluralization), patterns, rules, and elliptical forms.
9. Detect sentence constituents and distinguish between major and minor
constituents.
10. Recognize that a particular meaning may be expressed in different
grammatical forms.
11. Recognize cohesive devices in spoken discourse.
Macroskills:
12. Recognize the communicative functions of utterances, according to
situations, participants, goals.
13. Infer situations, participants, goals using real-world knowledge.
14. From events, ideas, and so on, described, predict outcomes, infer links and
connections between events, deduce causes and effects, and detect such relations
as main idea, supporting idea, new information, given information,
generalization, and exemplification.
15. Distinguish between literal and implied meanings.
16. Use facial, kinesic, body language, and other nonverbal clues to decipher
meaning.
17. Develop and use a battery of listening strategies, such as detecting key
words, guessing the meaning of words from context, appealing for help, and
signaling comprehension or lack thereof.
Implied in the taxonomy above is a notion of what makes many aspects of
listening difficult, or why listening is not simply a linear process of
recording strings of language as they are transmitted into our brains.
Developing a sense of which aspects of listening performance are predictably
difficult will help you to challenge your students appropriately and to assign
weights to items. Consider the following list of what makes listening difficult.
1. Clustering: attending to appropriate “chunks” of language: phrases, clauses,
constituents.
2. Redundancy: recognizing the kinds of repetitions, rephrasing, elaborations,
and insertions that unrehearsed spoken language often contains, and benefiting
from that recognition.
3. Reduced forms: understanding the reduced forms that may not have been a part
of an English learner’s past learning experiences in classes where only formal
“textbook” language has been presented.
4. Performance variables: being able to “weed out” hesitations, false starts,
pauses, and corrections in natural speech.
5. Colloquial language: comprehending idioms, slang, reduced forms, shared
cultural knowledge.
6. Rate of delivery: keeping up with the speed of delivery, processing
automatically as the speaker continues.
7. Stress, rhythm, and intonation: correctly understanding prosodic elements of
spoken language, which is almost always much more difficult than understanding
the smaller phonological bits and pieces.
8. Interaction: managing the interactive flow of language from listening to
speaking to listening, etc.
DESIGNING
ASSESSMENT TASKS: INTENSIVE LISTENING
Once you have
determined objectives, your next step is to design the tasks, including making
decisions about how you will elicit performance and how you will expect the
test-taker to respond. We will look at tasks that range from intensive
listening performance, such as minimal phonemic pair recognition, to extensive
comprehension of language in communicative contexts. The focus in this section
is on the microskills of intensive listening.
Recognizing Phonological and Morphological Elements
A typical form of intensive listening at this level is the assessment of
recognition of phonological and morphological elements of language. A classic
test task gives a spoken stimulus and asks test-takers to identify the stimulus
from two or more choices, as in the following two examples:
Test-takers hear: He’s from California.
Test-takers read: (a) He’s from California.
                  (b) She’s from California.

Test-takers hear: Is he living?
Test-takers read: (a) Is he leaving?
                  (b) Is he living?

In both cases
above, minimal phonemic distinctions are the target. If you are testing
recognition of morphology, you can use the same format:
Morphological pair, -ed ending
Test-takers hear: I missed you very much.
Test-takers read: (a) I missed you very much.
                  (b) I miss you very much.

Hearing the past tense morpheme in this sentence challenges even advanced
learners, especially if no context is provided. Stressed and unstressed words
may also be tested with the same rubric. In the following example, the reduced
form (contraction) of cannot is tested:
Stress pattern in can’t
Test-takers hear: My girlfriend can’t go to the party.
Test-takers read: (a) My girlfriend can’t go to the party.
                  (b) My girlfriend can go to the party.

Because they are decontextualized, these kinds of tasks leave something to be
desired in their authenticity. But they are a step better than items that simply
provide a one-word stimulus:
Test-takers hear: vine
Test-takers read: (a) vine
                  (b) wine

Paraphrase Recognition
The next step up on the scale of listening comprehension microskills is words,
phrases, and sentences, which are frequently assessed by providing a stimulus
sentence and asking the test-taker to choose the correct paraphrase from a
number of choices.
Test-takers hear: Hello, my name’s Keiko. I come from Japan.
Test-takers read: (a) Keiko is comfortable in Japan.
                  (b) Keiko wants to come to Japan.
                  (c) Keiko is Japanese.
                  (d) Keiko likes Japan.

In
the above item, the idiomatic come from is the phrase being tested. To add a
little context, a conversation can be the stimulus task to which test-takers
must respond with the correct paraphrase:
Test-takers hear: Man: Hi, Maria, my name’s George.
                  Woman: Nice to meet you, George. Are you American?
                  Man: No, I’m Canadian.
Test-takers read: (a) George lives in the United States.
                  (b) George is American.
                  (c) George comes from Canada.
                  (d) George is Canadian.

Here, the criterion is recognition of the adjective form used to indicate
country of origin: Canadian, American, Brazilian, Italian, etc.
DESIGNING
ASSESSMENT TASKS: RESPONSIVE LISTENING
A
question-and-answer format can provide some interactivity in these lower-end
listening tasks. The test-taker’s response is the appropriate answer to a
question.
Appropriate response to a question
Test-takers hear: How much time did you take to do your homework?
Test-takers read: (a) In about an hour.
                  (b) About an hour.
                  (c) About $10.
                  (d) Yes, I did.

The objective of this item is recognition of the wh-question how much and its
appropriate response. Distractors are chosen to represent common learner
errors: (a) responding to how much vs. how much longer; (c) confusing how much
in reference to time vs. the more frequent reference to money; (d) confusing a
wh-question with a yes/no question.
None of the tasks so far discussed have
to be framed in a multiple-choice format. They can be offered in a more
open-ended framework in which test-takers write or speak the response. The
above item would then look like this:
Open-ended response to a question
Test-takers hear: How much time did you take to do your homework?
Test-takers write or speak: ______________________

If
open-ended response formats gain a small amount of authenticity and creativity,
they of course suffer some in their practicality, as teachers must then read
students’ responses and judge their appropriateness, which takes time.
DESIGNING
ASSESSMENT TASKS: SELECTIVE LISTENING
A third type of listening performance is selective listening, in which the
test-taker listens to a limited quantity of aural input and must discern within
it some specific information. A number of techniques have been used that require
selective listening.
Listening Cloze
Listening cloze tasks (sometimes called cloze dictations or partial dictations)
require the test-taker to listen to a story, monologue, or conversation and
simultaneously read the written text in which selected words or phrases have
been deleted. Cloze procedure is most commonly associated with reading only. In
its generic form, the test consists of a passage in which every nth word
(typically every seventh word) is deleted and the test-taker is asked to supply
an appropriate word. In a listening cloze task, test-takers see a transcript of
the passage that they are listening to and fill in the blanks with the words or
phrases that they hear.
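The generic fixed-ratio deletion just described can be sketched in a few lines of Python. This is only an illustration, not a standard tool: the function name make_cloze, the blank string, and the choice to leave trailing punctuation in the text are all assumptions made for the example.

```python
def make_cloze(passage, n=7, blank="______"):
    """Delete every nth word of a passage (hypothetical helper).

    Returns the gapped text and the list of deleted words (the answer key).
    """
    gapped, answer_key = [], []
    for position, word in enumerate(passage.split(), start=1):
        if position % n == 0:
            # Keep trailing punctuation in the text so only the word is blanked.
            core = word.rstrip(".,;:!?")
            answer_key.append(core)
            gapped.append(blank + word[len(core):])
        else:
            gapped.append(word)
    return " ".join(gapped), answer_key


text, key = make_cloze(
    "one two three four five six seven eight nine ten "
    "eleven twelve thirteen fourteen."
)
print(text)
print(key)
```

A mechanical deletion of this kind suits a reading cloze; for a listening cloze, the teacher would normally choose the deleted items by hand according to the objective of the test.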
One potential weakness of listening cloze techniques is that they may simply
become reading comprehension tasks. Test-takers who are asked to listen to a
story with periodic deletions in the written version may not need to listen at
all, yet may still be able to respond with the appropriate word or phrase. You
can guard against this eventuality if the blanks are items with a high
information load that cannot easily be predicted simply by reading the passage.
In the example below (adapted from Bailey, 1998, p. 16), such a shortcoming was
avoided by focusing only on the criterion of numbers. Test-takers hear an
announcement from an airline agent and see the transcript with the underlined
words deleted:
Listening cloze
Test-takers hear:
Ladies and gentlemen, I now have some connecting gate information for those of
you making connections to other flights out of San Francisco.
Flight seven-oh-six to Portland will depart from gate seventy-three at
nine-thirty P.M.
Flight ten-forty-five to Reno will depart at nine-fifty P.M. from gate
seventeen.
Flight four-forty to Monterey will depart at nine-thirty P.M. from gate sixty.
And flight sixteen-oh-three to Sacramento will depart from gate nineteen at
ten-fifteen P.M.
Test-takers write the missing words or phrases in the blanks.
Other listening cloze tasks may focus on a grammatical category such as tenses,
articles, two-word verbs, prepositions, or transition words/phrases. Notice two
important structural differences between listening cloze tasks and standard
reading cloze. In a listening cloze, deletions are governed by the objective of
the test, not by mathematical deletion of every nth word; and more than one word
may be deleted, as in the above example.
Listening cloze tasks should normally use an exact word method of scoring, in
which you accept as a correct response only the actual word or phrase that was
spoken and consider other appropriate words as incorrect. Such stringency is
warranted; your objective is, after all, to test listening comprehension, not
grammatical or lexical expectancies.
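Exact word scoring of this kind can be sketched as follows. The helper names and the normalization rules (ignoring case, hyphens, and stray punctuation) are assumptions made for illustration; a teacher might reasonably decide those details differently, for instance by also penalizing spelling.

```python
def normalize(response):
    """Lowercase, treat hyphens as spaces, and trim punctuation (assumed rules)."""
    words = response.casefold().replace("-", " ").split()
    return " ".join(words).strip(".,;:!?")


def score_exact_word(answer_key, responses):
    """Count blanks where the response matches the spoken word or phrase."""
    return sum(
        normalize(spoken) == normalize(written)
        for spoken, written in zip(answer_key, responses)
    )


# Answer key taken from the airline announcement; responses are invented.
answer_key = ["seventy-three", "nine-thirty", "seventeen"]
responses = ["Seventy three", "nine-fifty", "seventeen."]
score = score_exact_word(answer_key, responses)
print(score, "of", len(answer_key))
```

Under this method a plausible but unspoken word, such as "leave" where the speaker said "depart," is counted as incorrect, which is the stringency the exact word method calls for.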
Selective listening can also be assessed through an information transfer
technique in which aurally processed information must be transferred to a visual
representation, such as labeling a diagram, identifying an element in a picture,
completing a form, or showing routes on a map.
At the lower end
of the scale of linguistic complexity, simple picture-cued items are sometimes
efficient rubrics for assessing certain selected information.
Consider the
following item:
Information
transfer: multiple-picture-cued selection
Test-takers hear:
Choose the correct picture. In my back
yard I have a bird feeder. Yesterday, there were two birds and a squirrel
fighting for the last few seeds in the bird feeder. The squirrel was on top of
the bird feeder while the larger bird sat at the bottom of the feeder
screeching at the squirrel. The smaller bird was flying around the squirrel,
trying to scare it away.
The preceding example illustrates the need for test-takers to focus on just the
relevant information. The objective of this task is to test prepositions and
prepositional phrases of location (at the bottom, on top of, around, along with
larger, smaller), so other words and phrases such as back yard, yesterday, last
few seeds, and scare away are supplied only as context and need not be tested.
(The task also presupposes, of course, that test-takers are able to identify
the difference between a bird and a squirrel.)
In another genre of picture-cued tasks, a number of people and/or actions are
presented in one picture, such as a group of people at a party. Assuming that
all the items, people, and actions are clearly depicted and understood by the
test-taker, assessment may take the form of
· Questions: “Is the tall man near the door talking to a short woman?”
· True/false: “The woman wearing a red skirt is watching TV.”
· Identification: “Point to the person to the left of the couch.”
In a third picture-cued option used by the Test of English for International
Communication (TOEIC®), one single photograph is presented to the test-taker,
who then hears four different statements and must choose one of the four to
describe the photograph. Here is an example.
Information transfer: single-picture-cued verbal multiple-choice
Test-takers see: a photograph of a woman in a laboratory setting, with no
glasses on, squinting through a microscope with her right eye, and with her
left eye closed.
Test-takers hear: (a) She’s speaking into a microphone.
                  (b) She’s putting on her glasses.
                  (c) She has both eyes open.
                  (d) She’s using a microscope.

Information
transfer tasks may reflect greater authenticity by using charts, maps, grids,
and other artifacts of daily life. In the example below, test-takers hear a
student’s daily schedule, and the task is to fill in the partially completed
weekly calendar.
Information
transfer: chart-filling
Test-takers
hear:
Now
you will hear information about Lucy’s daily schedule. The information will be
given twice. The first time just listen carefully. The second time, there will
be a pause after each sentence. Fill in Lucy’s blank daily schedule with the
correct information. The example has already been filled in.
You will hear :
Lucy gets up at eight o’clock every morning except on weekends.
You
will fill in the schedule to provide the information.
Now listen to the information about Lucy’s schedule. Remember, you will first
hear all the sentences; then you will hear each sentence separately with time to
fill in your chart.
Lucy
gets up at 8:00 every morning except on weekends. She has English on Monday,
Wednesday, and Friday at ten o’clock. She has History on Tuesday and Thursday
at two o’clock. She takes Chemistry on Monday from two o’clock to six o’clock.
She plays tennis on weekends at four o’clock. She eats lunch at twelve o’clock
every day except Saturday and Sunday.
Now
listen a second time. There will be a pause after each sentence to give you
time to fill in the chart. (Lucy’s schedule is repeated with a pause after each
sentence).
Test-takers see the following weekly calendar grid:
|       | Monday | Tuesday | Wednesday | Thursday | Friday | Weekends |
| 8:00  | get up | get up  | get up    | get up   | get up |          |
| 10:00 |        |         |           |          |        |          |
| 12:00 |        |         |           |          |        |          |
| 2:00  |        |         |           |          |        |          |
| 4:00  |        |         |           |          |        |          |
| 6:00  |        |         |           |          |        |          |
Such chart-filling tasks are good examples of aural scanning strategies. A
listener must discern from a number of pieces of information which pieces are
relevant. In the above example, virtually all of the stimuli are relevant, and
very few words can be ignored. In other tasks, however, much more information
might be presented than is needed (as in the bird feeder item), forcing the
test-taker to select the correct bits and pieces necessary to complete a task.
Chart-filling tasks increase in difficulty as the linguistic stimulus material
becomes more complex. In one task described by Ur (1984, pp. 108-122),
test-takers listen to a very long description of animals in various cages in a
zoo. While they listen, they can look at a map of the layout of the zoo with
unlabeled cages. Their task is to fill in the correct animal in each cage, but
the complexity of the language makes the task challenging. Similarly, Hughes
(1989, p. 138) described a map-marking task in which test-takers must process
around 250 words of colloquial language in order to complete the tasks of
identifying names, positions, and directions in a car accident scenario on a
city street.
Sentence Repetition
The task of simply repeating a sentence or a partial sentence, or sentence
repetition, is also used as an assessment of listening comprehension. As in a
dictation (discussed below), the test-taker must retain a stretch of language
long enough to reproduce it, and then must respond with an oral repetition of
that stimulus. Incorrect listening comprehension, whether at the phonemic or
discourse level, may be manifested in the correctness of the repetition. A
miscue in repetition is scored as a miscue in listening. In the case of somewhat
longer sentences, one could argue that the ability to recognize and retain
chunks of language as well as threads of meaning might be assessed through
repetition. In Chapter 7, we will look closely at PhonePass, a commercially
produced test that relies largely on sentence repetition to assess both oral
production and listening comprehension.
Sentence repetition is far from a flawless listening assessment task. Buck
(2001, p. 79) noted that such tasks “are not just tests of listening, but tests
of general oral skill.” Further, this task may test only recognition of sounds,
and it can easily be contaminated by lack of short-term memory ability, thus
invalidating it as an assessment of comprehension alone. And the teacher may
never be able to distinguish a listening comprehension error from an oral
production error. Therefore, sentence repetition tasks should be used with
caution.
DESIGNING
ASSESSMENT TASKS: EXTENSIVE LISTENING
Drawing a clear distinction between any two of the categories of listening
referred to here is problematic, but perhaps the fuzziest division is between
selective and extensive listening. As we gradually move along the continuum from
smaller to larger stretches of language, and from micro- to macroskills of
listening, the probability of using more extensive listening tasks increases.
Some important questions about designing assessments at this level emerge.
1. Can listening performance be distinguished from cognitive processing factors
such as memory, associations, storage, and recall?
2. As assessment procedures become more communicative, does the task take into
account test-takers’ ability to use grammatical expectancies, lexical
collocations, semantic interpretations, and pragmatic competence?
3. Are test tasks themselves correspondingly content valid and authentic, that
is, do they mirror real-world language and context?
4. As assessment tasks become more and more open-ended, they more closely
resemble pedagogical tasks, which leads one to ask what the difference is between
assessment and teaching tasks. The answer is scoring: the former imply specified
scoring procedures, while the latter do not.
We will try to address these questions as we look at a number of extensive or
quasi-extensive listening comprehension tasks.
Dictation
Dictation is a widely researched genre of assessing listening comprehension. In
a dictation, test-takers hear a passage, typically of 50 to 100 words, recited
three times: first, at normal speed; then, with long pauses between phrases or
natural word groups, during which time test-takers write down what they have
just heard; and finally, at normal speed once more so they can check their work
and proofread. Here is a sample dictation at the intermediate level of English.
First reading (natural speed, no pauses, test-takers listen for gist):
The state of California has many geographical areas. On the western side is the
Pacific Ocean with its beaches and sea life. The central part of the state is a
large fertile valley. The southeast has a hot desert, and north and west have
beautiful mountains and forests. Southern California is a large urban area
populated by millions of people.
Second reading (slowed speed, pause at each // break, test-takers write):
The state of California // has many geographical areas. // On the western side
// is the Pacific Ocean // with its beaches and sea life. // The central part of
the state // is a large fertile valley. // The southeast has a hot desert, //
and north and west // have beautiful mountains and forests. // Southern
California // is a large urban area // populated by millions of people.
Third reading (natural speed, test-takers check their work).
Dictations have been used as assessment tools for decades. Some readers still
cringe at the thought of having to render a correctly spelled, verbatim version
of a paragraph or story recited by the teacher. Until research on integrative
testing was published (see Oller, 1971), dictations were thought to be not much
more than glorified spelling tests. However, the required integration of
listening and writing in dictation, along with its presupposed knowledge of
grammatical and discourse expectancies, brought this technique back into vogue.
Hughes (1989), Cohen (1994), Bailey (1998), and Buck (2001) all defend the
plausibility of dictation as an integrative test that requires some
sophistication in the language in order to process and write down all segments
correctly. Thus, I include dictation here under the rubric of extensive tasks,
although I am more comfortable with labeling it quasi-extensive.
The difficulty of a dictation task can be easily manipulated by the length of
the word groups (or bursts, as they are technically called), the length of the
pauses, the speed at which the text is read, and the complexity of the
discourse, grammar, and vocabulary used in the passage.
Scoring is
another matter. Depending on your context and purpose in administering a
dictation, you will need to decide on scoring criteria for several possible
kinds of errors:
· Spelling error only, but the word appears to have been heard correctly
· Spelling and/or obvious misrepresentation of a word, illegible word
· Grammatical error (for example, test-taker hears I can’t do it, writes I can do it)
· Skipped word or phrase
· Permutation of words
· Additional words not in the original
· Replacement of a word with an appropriate synonym
Determining the weight of each of these errors is a highly idiosyncratic choice;
specialists disagree almost more than they agree on the importance of the above
categories. They do agree (Buck, 2001) that a dictation is not a spelling test,
and that the first item in the list above should not be considered an error.
They also suggest that point systems be kept simple (for maintaining
practicality and reliability) and that a deductible scoring method, in which
points are subtracted from a hypothetical total, is usually effective.
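To make the idea of deductible scoring concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the category names, the one-point penalty for each error type, and the hypothetical total of 50 are invented for the example and are not prescribed by Buck or by this chapter. The only weight taken from the discussion above is that a spelling-only error costs nothing.

```python
from collections import Counter

# Hypothetical penalty weights (assumptions for illustration only).
# The one value grounded in the text: spelling-only errors are not penalized.
PENALTIES = {
    "spelling_only": 0,
    "misrepresented_word": 1,
    "grammatical": 1,
    "skipped": 1,
    "permutation": 1,
    "addition": 1,
    "synonym": 1,
}


def deductible_score(errors, total=50):
    """Subtract a penalty for each tallied error from a hypothetical total.

    `errors` is a list of category names, one entry per error the rater found.
    The score never drops below zero.
    """
    counts = Counter(errors)
    deduction = sum(PENALTIES[kind] * n for kind, n in counts.items())
    return max(total - deduction, 0)


# One spelling-only error, two skipped words, one grammatical error.
sample_errors = ["spelling_only", "skipped", "skipped", "grammatical"]
print(deductible_score(sample_errors))
```

Keeping every penalty at zero or one point follows the advice to keep point systems simple; a teacher who wanted to treat, say, synonym replacement more leniently would only need to change one entry in the table.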
Dictation seems to provide a reasonably valid method for integrating listening
and writing skills and for tapping into the cohesive elements of language
implied in short passages. However, a word of caution lest you assume that
dictation provides a quick and easy method of assessing extensive listening
comprehension. If the bursts in a dictation are relatively long (more than
five-word segments), this method places a certain amount of load on memory and
processing of meaning (Buck, 2001, p. 78). But only a moderate degree of
cognitive processing is required, and claiming that dictation fully assesses
the ability to comprehend pragmatic or illocutionary elements of language,
context inference, or semantics may be going too far. Finally, one can easily
question the authenticity of dictation: it is rare in the real world for people
to write down more than a few chunks of information (addresses, phone numbers,
grocery lists, directions, for example) at a time.
Despite these disadvantages, the practicality of the administration of
dictations, a moderate degree of reliability in a well-established scoring
system, and a strong correspondence to other language abilities speak well for
the inclusion of dictation among the possibilities for assessing extensive (or
quasi-extensive) listening comprehension.
Communicative Stimulus-Response Tasks
Another, and more authentic, example of extensive listening is found in a
popular genre of assessment task in which the test-taker is presented with a
stimulus monologue or conversation and then is asked to respond to a set of
comprehension questions. Such tasks (as you saw in chapter 4 in the discussion
of standardized testing) are commonly used in commercially produced proficiency
tests. The monologues, lectures, and brief conversations used in such tasks are
sometimes a little contrived, and certainly the subsequent multiple-choice
questions don’t mirror communicative, real-life situations. But with some care
and creativity, one can create reasonably authentic stimuli, and in some rare
cases the response mode (as shown in one example below) actually approaches
complete authenticity. Here is a typical example of such a task.
Test-takers hear:
Directions: Now you will hear a conversation between Lynn and her doctor. You
will hear the conversation two times. After you hear the conversation the second
time, choose the correct answer for questions 11-15 below. Mark your answers on
the answer sheet provided.
Doctor: Good morning, Lynn. What’s the problem?
Lynn: Well, you see, I have a terrible headache, my nose is running, and I’m
really dizzy.
Doctor: Okay. Anything else?
Lynn: I’ve been coughing, I think I have a fever, and my stomach aches.
Lynn: Well, let’s see, I went to the lake last weekend, and after I returned
home I started sneezing.
Doctor: Hmm. You must have the flu. You should get lots of rest, drink hot
beverages, and stay warm. Do you follow me?
Lynn: Well, uh, yeah, but . . . shouldn’t I take some medicine?
Test-takers read:
11. What is Lynn’s problem?
a) She feels horrible.
b) She ran too fast at the lake.
c) She’s been drinking too many hot beverages.
12. When did Lynn’s problem start?
a) When she saw her doctor
b) Before she went to the lake
c) After she came home from the lake
13. The doctor said that Lynn
a) Flew to the lake last weekend
b) Must not get the flu
c) Probably has the flu
14. The doctor told Lynn
a) To rest
b) To follow him
c) To take some medicine
15. According to Dr. Brown, sleep and rest are ______ medicine when you have the
flu.
a) More effective than
b) As effective as
c) Less effective than
Does this meet the criterion of authenticity? If you want to be painfully fussy,
you might object that it is rare in the real world to eavesdrop on someone
else’s doctor-patient conversation. Nevertheless, the conversation itself is
relatively authentic; we all have doctor-patient exchanges like this. Equally
authentic, if you add a grain of salt, are monologues, lecturettes, and news
stories, all of which are commonly utilized as listening stimuli to be followed
by comprehension questions aimed at assessing certain objectives that are built
into the stimulus.
Is the task itself (of responding to multiple-choice questions) authentic? It’s
plausible to assert that any task of this kind following a one-way listening to
a conversation is artificial: we simply don’t often encounter little quizzes
about conversations we’ve heard (unless it’s your parent, spouse, or best friend
who wants to get in on the latest gossip!). The questions posed above, with the
possible exception of #14, are unlikely to appear in a lifetime of doctor
visits. Yet the ability to respond correctly to such items can be construct
validated as an appropriate measure of field-independent listening skills: the
ability to remember certain details from a conversation. (As an aside here, many
highly proficient native speakers of English might miss some of the above
questions if they heard the conversation only once and if they had no visual
access to the items until after the conversation was done!)
To compensate for the potential inauthenticity of post-stimulus comprehension
questions, you might, with a little creativity, be able to find contexts where
questions that probe understanding are more appropriate. Consider the following
situation:
Dialogue and authentic questions on details
Test-takers hear:
You will hear a conversation between a detective and a man. The tape will play
the conversation twice. After you hear the conversation a second time, choose
the correct answers on your test sheet.
Detective: Where were you last night at eleven P.M., the time of the murder?
Man: Uh, let’s see, well, I was just starting to see a movie.
Detective: Did you go alone?
Man: No, uh, well, I was with my friend, uh, Bill. Yeah, I was with Bill.
Detective: What did you do after that?
Man: We went out to dinner, then I dropped her off at her place.
Detective: Then you went home?
Man: Yeah.
Detective: When did you get home?
Man: A little before midnight.
Test-takers read:
7. Where was the man at 11:00 P.M.?
a. In a restaurant
b. In a theater
c. At home
8. Was he with someone?
a. He was alone
b. He was with his wife
c. He was with a friend
9. Then what did he do?
a. He ate out
b. He made dinner
c. He went home
10. When did he get home?
a. About 11:00
b. Almost 12:00
c. Right after the movie
11. The man is probably lying because (name two clues):
1. ..........................................................................
2. ..........................................................................
In this case, test-takers are brought into a little scene in a crime story. The
questions following are plausible questions that might be asked to review fact
and fiction in the conversation. Question #11, of course, provides an extra shot
of reality: the test-taker must name the probable lies told by the man (he
referred to Bill as “her”; he saw a movie and ate dinner in the space of one
hour), which requires the process of inference.
Authentic Listening Tasks
Ideally, the language assessment field would have a stockpile of listening test
types that are cognitively demanding, communicative, and authentic, not to
mention interactive by means of an integration with speaking. However, the
nature of a test as a sample of performance and a set of tasks with limited time
frames implies an equally limited capacity to mirror all real-world contexts of
listening performance. “There is no such thing as a communicative test,” stated
Buck (2001, p. 92). “Every test requires some components of communicative
language ability, and no test covers them all. Similarly, with the notion of
authenticity, every task shares some characteristics with target-language tasks,
and no test is completely authentic.”
Beyond the rubrics of intensive, responsive, selective, and quasi-extensive
communicative contexts described above, can we assess aural comprehension in a
truly communicative context? Can we, at this end of the range of listening
tasks, ascertain from test-takers that they have processed the main idea(s) of a
lecture, the gist of a story, the pragmatics of a conversation, or the unspoken
inferential data present in most authentic aural input? Can we assess a
test-taker’s comprehension of humor, idiom, and metaphor? The answer is a
cautious yes, but not without some concessions to practicality. And the answer
is a more certain yes if we take the liberty of stretching the concept of
assessment to extend beyond tests and into a broader framework of alternatives.
Here are some possibilities.
1. Note-taking. In the academic world, classroom lectures by professors are
common features of a non-native English-user’s experience. One form of a midterm
(Kahn, 2002) uses a 15-minute lecture as a stimulus. One among several response
formats includes note-taking by the test-takers. These notes are evaluated by
the teacher on a 30-point system, as follows.
Scoring system for lecture notes
0-15 points
Visual representation: Are your notes clear and easy to read? Can you easily
find and retrieve information from them? Do you use the space on the paper to
visually represent ideas? Do you use indentation, headers, numbers, etc.?
0-10 points
Accuracy: Do you accurately indicate main ideas from lectures? Do you note
important details and supporting information and examples? Do you leave out
unimportant information and tangents?
0-5 points
Symbols and abbreviations: Do you use symbols and abbreviations as much as
possible to save time? Do you avoid writing out whole words, and do you avoid
writing down every single word the lecturer says?
The process of scoring is time consuming (a loss of practicality), and because
of the subjectivity of the point system, it lacks some reliability. But the gain
is in offering students an authentic task that mirrors exactly what they have
been focusing on in the classroom. The notes become an indirect but arguably
valid form of assessing global listening comprehension. The task fulfills the
criteria of cognitive demand, communicative language, and authenticity.
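For teachers who keep scores electronically, the lecture-notes scoring system above can be recorded in a few lines of Python. This is a minimal sketch, not part of Kahn's procedure: the category keys and the function name are hypothetical, and the only facts taken from the scoring system are the three maximum values (15, 10, and 5), which sum to 30. The ratings themselves remain the teacher's subjective judgment.

```python
# Maximum points per category, taken from the scoring system above.
# The dictionary keys are hypothetical labels chosen for this sketch.
RUBRIC_MAX = {
    "visual_representation": 15,
    "accuracy": 10,
    "symbols_and_abbreviations": 5,
}


def score_notes(ratings):
    """Validate a teacher's ratings against the rubric and return the total."""
    for category, maximum in RUBRIC_MAX.items():
        value = ratings[category]
        if not 0 <= value <= maximum:
            raise ValueError(f"{category} must be between 0 and {maximum}")
    return sum(ratings[category] for category in RUBRIC_MAX)


# Example: one student's notes as rated by the teacher.
student = {
    "visual_representation": 12,
    "accuracy": 8,
    "symbols_and_abbreviations": 4,
}
print(score_notes(student), "out of", sum(RUBRIC_MAX.values()))
```

Checking each rating against its ceiling catches simple entry mistakes, but it does nothing for the reliability problem noted above; that still depends on how consistently the rater applies the questions in each category.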
2. Editing. Another authentic task provides both a written and a spoken
stimulus, and requires the test-taker to listen for discrepancies. Scoring
achieves relatively high reliability as there are usually a small number of
specific differences that must be identified. Here is the way the task proceeds.
Editing a written version of an aural stimulus
Test-takers read: the written stimulus material (a news report, an email from a
friend, notes from a lecture, or an editorial in a newspaper).
Test-takers hear: a spoken version of the stimulus that deviates, in a finite
number of facts or opinions, from the original written form.
Test-takers mark: the written stimulus by circling any words, phrases, facts, or
opinions that show a discrepancy between the two versions.
One potentially interesting set of stimuli for such a task is the description of
a political scandal first from a newspaper with a political bias, and then from
a radio broadcast from an “alternative” news station. Test-takers are not only
forced to listen carefully to differences but are subtly informed about biases
in the news.
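Because the editing task has a small, fixed set of discrepancies, its scoring can be reduced to comparing what the test-taker circled with an answer key. The Python sketch below shows one way this might be done; it is an assumption-laden illustration, since the text does not say how wrongly circled items should be treated. Here they are simply counted separately so the teacher can decide whether to penalize them. The function name and the sample items are hypothetical.

```python
def score_editing(key, marked):
    """Compare circled items with the answer key of planted discrepancies.

    Returns counts of correctly circled items (hits), planted discrepancies
    that were overlooked (misses), and circled items that were not actually
    discrepancies (false alarms).
    """
    key, marked = set(key), set(marked)
    return {
        "hits": len(key & marked),
        "misses": len(key - marked),
        "false_alarms": len(marked - key),
    }


# Hypothetical answer key and one test-taker's circled items.
answer_key = {"Tuesday", "senator", "$2 million"}
circled = {"Tuesday", "senator", "resigned"}
print(score_editing(answer_key, circled))
```

Reporting the three counts separately, rather than a single number, leaves room for the teacher to weight missed discrepancies and false alarms differently depending on the purpose of the assessment.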
3. Interpretive tasks. One of the intensive listening tasks described above was
paraphrasing a story or conversation. An interpretive task extends the stimulus
material to a longer stretch of discourse and forces the test-taker to infer a
response.
Potential stimuli include
· Song lyrics
· [Recited] poetry
· Radio/television news reports, and
· An oral account of an experience
Test-takers are then directed to interpret the stimulus by answering a few
questions (in open-ended form). Questions might be:
· “Why was the singer feeling sad?”
· “What events might have led up to the reciting of this poem?”
· “What do you think the political activists might do next, and why?”
· “What do you think the storyteller felt about the mysterious disappearance of
her necklace?”
This kind of task moves us away from what might traditionally be considered a
test toward an informal assessment, or possibly even a pedagogical technique or
activity. But the task conforms to certain time limitations, and the questions
can be quite specific, even though they ask the test-taker to use inference.
While reliable scoring may be an issue (there may be more than one correct
interpretation), the authenticity of the interaction in this task and potential
washback to the student surely give it some prominence among communicative
assessment procedures.
4. Retelling. In a related task, test-takers listen to a story or news event and
simply retell it, or summarize it, either orally (on an audiotape) or in
writing. In so doing, test-takers must identify the gist, main idea, purposes,
supporting points, and/or conclusion to show full comprehension. Scoring is
partially predetermined by specifying a minimum number of elements that must
appear in the retelling. Again reliability may suffer, and the time and effort
needed to read and evaluate the response lowers practicality. Validity,
cognitive processing, communicative ability, and authenticity are all well
incorporated into the task.
ASSESSING READING
Even as we are bombarded with an unending supply of visual and auditory media,
the written word continues in its function to convey information, to amuse and
entertain us, to codify our social, economic, and legal conventions, and to
fulfill a host of other functions. In literate societies, most “normal” children
learn to read by the age of five or six, and some even earlier. With the
exception of a small number of people with learning disabilities, reading is a
skill that is taken for granted.
In foreign language learning, reading is likewise a skill that teachers simply
expect learners to acquire. Basic, beginning-level textbooks in a foreign
language presuppose a student’s reading ability if only because it’s a book that
is the medium. Most formal tests use the written word as a stimulus for
test-taker response; even oral interviews may require reading performance for
certain tasks. Reading, arguably the most essential skill for success in all
educational contexts, remains a skill of paramount importance as we create
assessments of general language ability.
Is reading so natural and normal that learners should simply be exposed to
written texts with no particular instruction? Will they just absorb the skills
necessary to convert their perception of a handful of letters into meaningful
chunks of information? Not necessarily. For learners of English, two primary
hurdles must be cleared in order to become efficient readers. First, they need
to be able to master fundamental bottom-up strategies for processing separate
letters, words, and phrases, as well as top-down, conceptually driven strategies
for comprehension. Second, as part of that top-down approach, second language
readers must develop appropriate content and formal schemata (background
information and cultural experience) to carry out those interpretations
effectively.
The assessment of reading ability does not end with the measurement of
comprehension. Strategic pathways to full understanding are often important
factors to include in assessing learners, especially in the case of most
classroom assessments that are formative in nature. An inability to comprehend
may thus be traced to a need to enhance a test-taker’s strategies for achieving
ultimate comprehension. For example, an academic technical report may be
comprehensible to a student at the sentence level, but if the learner has not
exercised certain strategies for noting the discourse conventions of the genre,
misunderstanding may occur.
As we consider a number of different types or genres of written texts and the
components of reading, let’s not forget the unobservable nature of reading. Like
listening, one cannot see the process of reading, nor can one observe a specific
product of reading. Other than observing a reader’s eye movements and page
turning, there is no technology that enables us to “see” sequences of graphic
symbols traveling from the pages of a book into compartments of the brain (in a
possible bottom-up process). Even more outlandish is the notion that one might
be able to watch information from the brain make its way down onto the page (in
typical top-down strategies). Further, once something is read (information from
the written text is stored), no technology allows us to empirically measure
exactly what is lodged in the brain. All assessment of reading must be carried
out by inference.
TYPES (GENRES) OF READING
Each type or genre of written text has its own set of governing rules and
conventions. A reader must be able to anticipate those conventions in order to
process meaning efficiently. With an extraordinary number of genres present in
any literate culture, the reader’s ability to process texts must be very
sophisticated. Consider the following abridged list of common genres, which
ultimately form part of the specifications for assessments of reading ability.
1. Academic reading
General interest articles (in magazines, newspapers, etc.)
Technical reports (e.g., lab reports), professional journal articles
Reference material (dictionaries, etc.)
Textbooks, theses
Essays, papers
Test directions
Editorials and opinion writing
2. Job-related reading
Messages (e.g., phone messages)
Letters/emails
Memos (e.g., interoffice)
Reports (e.g., job evaluations, project reports)
Schedules, labels, signs, announcements
Forms, applications, questionnaires
Financial documents (bills, invoices, etc.)
Directories (telephone, office, etc.)
Manuals, directions
3. Personal reading
Newspapers and magazines
Letters, emails, greeting cards, invitations
Messages, notes, lists
Schedules (train, bus, plane, etc.)
Recipes, menus, maps, calendars
Advertisements (commercials, want ads)
Novels, short stories, medical reports, immigration documents
Comic strips, cartoons
When we realize that this list is only the beginning, it is easy to see how
overwhelming it is to learn to read in a foreign language! The genre of a text
enables readers to apply certain schemata that will assist them in extracting
appropriate meaning. If, for example, readers know that a text is a recipe, they
will expect a certain arrangement of information (ingredients) and will know to
search for a sequential order of directions. Efficient readers also have to know
what their purpose is in reading a text, the strategies for accomplishing that
purpose, and how to retain the information.
The content validity of an assessment procedure is largely established through
the genre of a text. For example, for learners in a program of English for
tourism, assessments of their ability should include guidebooks, maps,
transportation schedules, calendars, and other relevant texts.
MICROSKILLS, MACROSKILLS, AND STRATEGIES FOR READING
Aside from attending to genres of text, the skills and strategies for
accomplishing reading emerge as a crucial consideration in the assessment of
reading ability. The micro- and macroskills below represent the spectrum of
possibilities for objectives in the assessment of reading comprehension.
Micro- and macroskills for reading comprehension
Microskills
1. Discriminate among the distinctive graphemes and orthographic patterns of
English.
2. Retain chunks of language of different lengths in short-term memory.
3. Process writing at an efficient rate of speed to suit the purpose.
4. Recognize a core of words, and interpret word order patterns and their
significance.
5. Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g.,
tense, agreement, pluralization), patterns, rules, and elliptical forms.
6. Recognize that a particular meaning may be expressed in different grammatical
forms.
7. Recognize cohesive devices in written discourse and their role in signaling
the relationship between and among clauses.
Macroskills
8. Recognize the rhetorical forms of written discourse and their significance
for interpretation.
9. Recognize the communicative functions of written texts, according to form and
purpose.
10. Infer context that is not explicit by using background knowledge.
11. From described events, ideas, etc., infer links and connections between
events, deduce causes and effects, and detect such relations as main idea,
supporting idea, new information, generalization, and exemplification.
12. Distinguish between literal and implied meanings.
13. Detect culturally specific references and interpret them in a context of the
appropriate cultural schemata.
14. Develop and use a battery of reading strategies, such as scanning and
skimming, detecting discourse markers, guessing the meaning of words from
context, and activating schemata for the interpretation of texts.
The assessment of reading can imply the assessment of a storehouse of reading
strategies, as indicated in item #14. Aside from simply testing the ultimate
achievement of comprehension of a written text, it may be important in some
contexts to assess one or more of a storehouse of classic reading strategies.
The brief taxonomy of strategies below is a list of possible assessment
criteria.
1. Identify your purpose in reading a text.
2. Apply spelling rules and conventions for bottom-up decoding.
3. Use lexical analysis (prefixes, roots, suffixes, etc.) to determine meaning.
4. Guess at meaning (of words, idioms, etc.) when you aren’t certain.
5. Skim the text for the gist and for main ideas.
6. Scan the text for specific information (names, dates, key words).
7. Use silent reading techniques for rapid processing.
8. Use marginal notes, outlines, charts, or semantic maps for understanding and
retaining information.
9. Distinguish between literal and implied meanings.
10. Capitalize on discourse markers to process relationships.
TYPES OF READING
In the previous chapters we saw that both listening and speaking could be
subdivided into at least five different types of listening and speaking
performance. In the case of reading, variety of performance is derived more from
the multiplicity of texts (the genres listed above) than from the variety of
overt types of performance. Nevertheless, for considering assessment procedures,
several types of reading performance are typically identified, and these will
serve as organizers of various assessment tasks.
1. Perceptive. In keeping with the set of categories specified for listening
comprehension, similar specifications are offered here, except with some
differing terminology to capture the uniqueness of reading. Perceptive reading
tasks involve attending to the components of larger stretches of discourse:
letters, words, punctuation, and other graphemic symbols. Bottom-up processing
is implied.
2. Selective. This category is largely an artifact of assessment formats. In
order to ascertain one’s reading recognition of lexical, grammatical, or
discourse features of language within a very short stretch of language, certain
typical tasks are used: picture-cued tasks, matching, true/false,
multiple-choice, etc. Stimuli include sentences, brief paragraphs, and simple
charts and graphs. Brief responses are intended as well. A combination of
bottom-up and top-down processing may be used.
3. Interactive. Included among interactive reading types are stretches of
language of several paragraphs to one page or more in which the reader must, in
a psycholinguistic sense, interact with the text. That is, reading is a process
of negotiating meaning; the reader brings to the text a set of schemata for
understanding it, and intake is the product of that interaction. Typical genres
that lend themselves to interactive reading are anecdotes, short narratives and
descriptions, excerpts from longer texts, questionnaires, memos, announcements,
directions, recipes, and the like. The focus of an interactive task is to
identify relevant features (lexical, symbolic, grammatical, and discourse)
within texts of moderately short length with the objective of retaining the
information that is processed. Top-down processing is typical of such tasks,
although some instances of bottom-up performance may be necessary.
4. Extensive. Extensive reading, as discussed in this book, applies to texts of
more than a page, up to and including professional articles, essays, technical
reports, short stories, and books. (It should be noted that reading research
commonly refers to “extensive reading” as longer stretches of discourse, such as
long articles and books that are usually read outside a classroom hour. Here
that definition is massaged a little in order to encompass any text longer than
a page.) The purposes of assessment usually are to tap into a learner’s global
understanding of a text, as opposed to asking test-takers to “zoom in” on small
details. Top-down processing is assumed for most extensive tasks.
DESIGNING ASSESSMENT TASKS: PERCEPTIVE READING
At the beginning level of reading a second language lies a set of tasks that are
fundamental and basic: recognition of alphabetic symbols, capitalized and
lowercase letters, punctuation, words, and grapheme-phoneme correspondences.
Such tasks of perception are often referred to as literacy tasks, implying that
the learner is in the early stages of becoming “literate.” Some learners are
already literate in their own native language, but in other cases the second
language may be the first language that they have ever learned to read. This
latter context poses cognitive and sometimes age-related issues that need to be
considered carefully. Assessment of literacy is no easy assignment, and if you
are interested in this particularly challenging area, further reading beyond
this book is advised (Harp, 1991; Farr & Tone, 1994; Genesee, 1994; Cooper,
1997). Assessment of basic reading skills may be carried out in a number of
different ways.
Reading Aloud
The test-taker sees separate letters, words, and/or short sentences and reads
them aloud, one by one, in the presence of an administrator. Since the
assessment is of reading comprehension, any recognizable oral approximation of
the target response is considered correct.
Written Response
The same stimuli are presented, and the test-taker’s task is to reproduce the
probe in writing. Because of the transfer across different skills here,
evaluation of the test-taker’s response must be carefully treated. If an error
occurs, make sure you determine its source; what might be assumed to be a
writing error, for example, may actually be a reading error, and vice versa.
Multiple-Choice
Multiple-choice responses are not only a matter of choosing one of four or five
possible answers. Other formats, some of which are especially useful at the low
levels of reading, include same/different, circle the answer, true/false, choose
the letter, and matching. Here are some possibilities.
Test-takers read: Circle “S” for same or “D” for different.
1. Led   let    S  D
2. Bit   bit    S  D
3. Seat  set    S  D
4. Too   to     S  D
In the case of very low level learners, the teacher/administrator reads
directions.
Test-takers read: Circle the “odd” item, the one that doesn’t “belong.”
1. Piece  peace  piece
2. Book   book   boot
In the case of very low level learners, the teacher/administrator reads
directions.
Picture-Cued Items
Test-takers are shown a picture, such as the one on the next page, along with a
written text and are given one of a number of possible tasks to perform.
[Picture cue with word cards: cat, chair, clock]
With the same picture, the test-taker might read sentences and then point to the
correct part of the picture:
Picture-cued sentence identification
Test-takers hear: Point to the part of the picture that you read about here.
Test-takers see the picture and read each sentence written on a separate card.
The man is reading a book.
Or a true/false procedure might be presented with the same picture cue:
Picture-cued true/false sentence identification
Test-takers read:
1. The pencils are under the table.   T  F
2. The cat is on the table.           T  F
3. The picture is over the couch.     T  F
Matching can be an effective method of assessing
reading at this level. With objects labeled A, B, C, D, E in the picture, the
test-taker reads words and writes the appropriate letter beside the word:
Picture-cued
matching word identification
Test-takers read:
1. Clock
2. Chair
3. Books
4. Cat
5. Table
|
Finally,
test-takers might see a word or phrase and then be directed to choose one of
four pictures that is being described, thus requiring the test-taker to transfer
from a verbal to a nonverbal mode. In the following item, test-takers choose
the correct letter:
Multiple-choice picture-cued word
identification
Test-takers read: Rectangle
Test-takers see, and choose the correct item:
A B C D
A B C D
|
DESIGNING ASSESSMENT TASKS: SELECTIVE READING
Just
above the rudimentary skill level of perception of letters and words is a
category in which the test designer focuses on formal aspects of language
(lexical, grammatical, and a few discourse features). This category includes
what many think of as testing “vocabulary and
grammar.” How many textbooks provide little tests and quizzes labeled
“vocabulary and grammar” and never feature any other skill besides reading?
Lexical and grammatical aspects of language are simply the forms we use to
perform all four of the skills of listening, speaking, reading, and writing.
(Notice that in all of these chapters on the four skills, formal features of
language have become a potential focus for assessment.)
Here are some of the possible tasks
you can use to assess lexical and grammatical aspects of reading ability.
MULTIPLE-CHOICE (FOR FORM-FOCUSED CRITERIA)
By far the most popular method of
testing a reading knowledge of vocabulary and grammar is the multiple-choice
format, mainly for reasons of practicality: it is easy to administer and can be
scored quickly. The most straightforward multiple-choice items may have little
context, but might serve as a vocabulary or grammatical check.
1.
He’s not married. He’s
A. Young
B. Single
C.
First
D. A husband
2.
If there’s no doorbell, please ..........................
on the door.
A. Kneel
B. Type
C.
Knock
D. Shout
3.
The mouse is .................................
the bed.
A. Under
B. Around
C.
Between
4.
The bank robbery occurred
.......................... I was in the restroom.
A. That
B. During
C.
While
D. Which
5.
Yeast is an organic catalyst
................................... known to prehistoric humanity.
A. Was
B. Which was
C.
Which it
D. Which
|
This
kind of darting from one context to another to another in a test has become so
commonplace that learners almost expect the disjointedness. Some improvement of
these items is possible by providing some context within each item:
1.
Oscar : Do you like champagne?
Lucy : No, I can’t ..................... it!
A. Stand
B. Prefer
C.
Hate
2.
Manager :
Do you like to work by yourself?
Employee : Yes, I
like to work .......................
A. Independently
B. Definitely
C.
Impatiently
3.
Jack :
Do you have a coat like this?
John : Yes, mine is
........................ yours.
A. So same as
B. The same like
C.
As same as
D. The same as
4.
Boss :
Where did I put the Johnson file?
Secretary : I think ....................... is on your desk.
A. You were the file
looking at
B. The you were looking
at file
C.
The file you were looking at
D. You were looking
at the file
|
A better contextualized format is to offer a
modified cloze test adjusted to fit the objectives being assessed. In the
example below, a few lines of English add to overall context.
I’ve lived in the United States (21) .................. three
years. I (22) ..................... live
in Costa Rica. I (23) .............. speak any English. I used to (24)
................ homesick, but now I enjoy (25) ................ here. I
have never (26) .................. back home (27)
............... I came to the United States, but I might (28) ............... to visit
my family soon.
21. A. since        25. A. live
    B. for              B. to live
    C. during           C. living
22. A. used to      26. A. be
    B. use to           B. been
    C. was              C. was
23. A. couldn’t     27. A. when
    B. could            B. while
    C. can              C. since
24. A. been         28. A. go
    B. be               B. will go
    C. being            C. going
|
The
context of the story in this example may not specifically help the test-taker
to respond to the items more easily, but it allows the learner to attend to one
set of related sentences for eight items that assess vocabulary and grammar.
Other contexts might involve some content dependencies, such that earlier
sentences predict the correct response for a later item.
Matching
Tasks
1. Vocabulary matching task
Write in the letter of the definition on the right that matches the
word on the left.
.................... 1. Exhausted       a. unhappy
.................... 2. Disappointed    b. understanding of others
.................... 3. Enthusiastic    c. tired
.................... 4. Empathetic      d. excited
|
2. Selected response fill-in vocabulary task
1. At the end of the long race, the runners were totally
.................................
2. My parents were ....................... with my
bad performance on the final exam.
3. Everyone in the office was
............................... about the new salary raises.
4. The ........................... listening of the
counselor made Christina feel well understood.
Choose from among the following:
Disappointed
Empathetic
Exhausted
Enthusiastic
|
Matching tasks have the advantage of offering an alternative to
traditional multiple-choice or fill-in-the-blank formats, and they are easier to
construct than multiple-choice items. The disadvantage is that they can become
more of a puzzle-solving process than a genuine test of comprehension as
test-takers struggle with the search for a match.
Editing
Tasks
Editing for
grammatical or rhetorical errors is a widely used test method for assessing
linguistic competence in reading. It not only focuses on grammar but also
introduces a simulation of the authentic task of editing, or discerning errors
in written passages.
Picture-Cued
Tasks
In the previous
section we looked at picture-cued tasks for perceptive recognition of symbols
and words. Pictures and photographs may be equally well utilized for examining
ability at the selective level. Several types of picture-cued methods are
commonly used.
1. Test-takers read
a sentence or passage and choose one of four pictures that is being described.
The sentence or sentences at this level are more complex. A computer-based
example follows:
Multiple-choice
picture-cued response
Test-takers read a three-paragraph passage, one sentence of which
is:
During at least three quarters of the year, the
Arctic is frozen.
Click on the chart that shows the relative amount of time each year
that water is available to plants in the Arctic.
Test-takers see the following four pictures:
|
2. Test-takers
read a series of sentences or definitions, each describing a labeled part of a
picture or diagram. Their task is to
identify each labeled item. In the following diagram, test-takers do not
necessarily know each term, but by reading the definition they are able to make an
identification. For example:
Diagram-labeling task
Test-takers see:
Test-takers read:
Label the picture with the number of the
corresponding item described below.
1.
Wire supports extending from the hub of a wheel
to its perimeter
2.
A long, narrow support pole between the seat and
the handlebars
3.
A small, geared wheel concentric with the rear
wheel
4.
A long, linked, flexible metal device that
propels the vehicle
5.
A small rectangular lever operated by the foot
to propel the vehicle
6.
A tough but somewhat flexible rubber item that
circles each wheel
|
Gap-Filling Tasks
Many of the
multiple-choice tasks described above can be converted into gap-filling, or
“fill-in-the-blank,” items in which the test-taker’s response is to write a
word or phrase. An extension of simple gap-filling tasks is to create sentence
completion items where test-takers read part of a sentence and then complete
it by writing a phrase.
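Oscar: Doctor, what should I do if I get sick?
Doctor: It is best to stay home and ........................ .
If you have a fever, ........................ .
You should drink as much ........................ .
The worst thing you can do is ........................ .
You should also ........................ .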
|
The
obvious disadvantage of this type of task is its questionable assessment of
reading ability. The task requires both reading and writing performance,
thereby rendering it of low validity in isolating reading as the sole criterion.
Another drawback is scoring the variety of creative responses that are likely
to appear. You will have to make a number of judgment calls on what comprises a
correct response. In a test of reading comprehension only, you must accept as
correct any responses that demonstrate comprehension of the first part of the
sentence. This alone indicates that such tasks are better categorized as
integrative procedures.
DESIGNING
ASSESSMENT TASKS: INTERACTIVE READING
Tasks
at this level, like selective tasks, have a combination of form-focused and
meaning-focused objectives but with more emphasis on meaning. Interactive tasks
may therefore imply a little more focus on top-down processing than on bottom-up.
Texts are a little longer, from a paragraph to as much as a page or so in the
case of ordinary prose. Charts, graphs, and other graphics may be somewhat
complex in their format.
Cloze Tasks
One
of the most popular types of reading assessment tasks is the cloze procedure.
The word cloze was coined by educational psychologists to capture the Gestalt
psychological concept of “closure,” that is, the ability to fill in gaps in an
incomplete image (visual, auditory, or cognitive) and supply (from background
schemata) omitted details.
In
written language, a sentence with a word left out should have enough context
that a reader can close that gap with a calculated guess, using linguistic
expectancies (formal schemata), background experience (content schemata), and
some strategic competence. Based on this assumption, cloze tests were developed
for native language readers and defended as an appropriate gauge of reading
ability. Some research (Oller, 1973, 1976, 1979) on second language acquisition
vigorously defends cloze testing as an integrative measure not only of reading
ability but also of other language abilities. It was argued that the ability to
make coherent guesses in cloze gaps also taps into the ability to listen,
speak, and write. With the decline of zeal for the search for the ideal
integrative test in recent years, cloze testing has returned to a more
appropriate status as one of a number of assessment procedures available for
testing reading ability.
Cloze
tests are usually a minimum of two paragraphs in length in order to account for
discourse expectancies. They can be constructed relatively easily as long as
the specifications for choosing deletions and for scoring are clearly defined.
Typically every seventh word (plus or minus two) is deleted (known as
fixed-ratio deletion), but many cloze test designers instead use a rational
deletion procedure of choosing deletions according to the grammatical or
discourse functions of the words. Rational deletion also allows the designer to
avoid deleting words that would be difficult to predict from the context. For
example, in the sentence “Everyone in the crowd enjoyed the gorgeous sunset,”
the seventh word is gorgeous, but learners could easily substitute
other appropriate adjectives. Traditionally, cloze passages have between 30 and
50 blanks to fill, but a passage with as few as half a dozen blanks can
legitimately be labeled a cloze test.
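The fixed-ratio procedure is mechanical enough to sketch in a few lines of Python. This is a minimal illustration, not a standard tool: the function name `fixed_ratio_cloze`, the blank format, and the choice to start counting from the first word of the passage are all assumptions made for the example.

```python
import re


def fixed_ratio_cloze(text, n=7):
    """Replace every n-th word of text with a numbered blank.

    Returns the gapped text and the list of deleted words (the answer key).
    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    answers = []
    for i in range(n - 1, len(tokens), n):
        # Split the token into leading punctuation, the word itself, and
        # trailing punctuation, so that commas and periods stay in the text.
        lead, word, trail = re.match(r"^(\W*)(.*?)(\W*)$", tokens[i]).groups()
        answers.append(word)
        tokens[i] = f"{lead}({len(answers)}) ______{trail}"
    return " ".join(tokens), answers
```

A rational deletion procedure cannot be automated this simply, because the designer chooses each deletion by its grammatical or discourse function rather than by its position.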
Two
approaches to the scoring of cloze tests are commonly used. The exact word
method gives credit to test-takers only if they insert the exact word that was
originally deleted. The second method, appropriate word scoring, credits the
test-takers for supplying any word that is grammatically correct and that makes
good sense in the context. In the sentence above about the “gorgeous sunset,”
the test-takers would get credit for supplying beautiful, amazing, or
spectacular. The choice between the two methods of scoring is one of
practicality/reliability vs. face validity. In the exact word approach, scoring
can be done quickly (especially if the procedure uses a multiple-choice
technique) and reliably. The second approach takes more time because the
teacher must determine whether each response is indeed appropriate, but
students will perceive the test as being fairer: they won’t get “marked off” for
appropriate, grammatically correct responses.
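The two scoring methods differ only in which answers count as correct, and a short Python sketch can make that concrete. The function `score_cloze` and its `acceptable` argument are hypothetical names for this example; the list of acceptable alternatives is assumed to come from the teacher's own judgment, which is exactly the time-consuming step described above.

```python
def score_cloze(responses, exact_key, acceptable=None, method="exact"):
    """Count correct cloze responses.

    responses:  the test-taker's words, one per blank
    exact_key:  the originally deleted words, one per blank
    acceptable: optional list of sets of other words the teacher accepts,
                one set per blank (used only for appropriate word scoring)
    method:     "exact" or "appropriate"
    """
    if len(responses) != len(exact_key):
        raise ValueError("one response is needed for each blank")
    if method not in ("exact", "appropriate"):
        raise ValueError("method must be 'exact' or 'appropriate'")
    score = 0
    for i, (response, key) in enumerate(zip(responses, exact_key)):
        allowed = {key.lower()}
        if method == "appropriate" and acceptable is not None:
            allowed |= {word.lower() for word in acceptable[i]}
        if response.strip().lower() in allowed:
            score += 1
    return score
```

With the key `["gorgeous"]` and the teacher-approved alternatives `{"beautiful", "amazing", "spectacular"}`, the response "beautiful" earns no credit under exact word scoring and one point under appropriate word scoring.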
The following
excerpts from a longer essay illustrate the difference between rational and
fixed-ratio deletion, and between exact word and appropriate word scoring.
Fixed-ratio deletion (every seventh word):
The recognition that one’s feelings of (1) .......... and unhappiness can coexist much like
(2) .......... and hate in a close relationship (3) .......... offer valuable clues on how to
(4) .......... a happier life. It suggests, for (5) .........., that changing or avoiding things
that (6) .......... you miserable may well make you (7) .......... miserable but probably
no happier.
|
Rational deletion (prepositions and conjunctions):
The recognition that one’s feelings (1) .......... happiness (2) .......... unhappiness can coexist
much like love and hate (3) .......... a close relationship may offer valuable
clues (4) .......... how to lead a happier life. It suggests, (5) .......... example, that
changing (6) .......... avoiding things that make you miserable may well make you less
miserable (7) .......... probably no happier.
|
In both
versions there are seven deletions, but the second version allows the test
designer to tap into prediction of prepositions and conjunctions in particular.
And the second version provides more washback as students focus on targeted
grammatical features.
Both of the
scoring methods named above could present problems, with the first version
presenting a little more ambiguity. Possible responses might include:
Fixed-ratio version, blank #3: may, might, could, can
#4: lead, live, have, seek
#5: example, instance
Rational deletion version, blank #4: on, about
#6: or, and
#7: but, and
Arranging a
cloze test in a multiple-choice format allows even more rapid scoring: hand
scoring with an answer key or hole-punched grid, or computer scoring using
scannable answer sheets. Multiple-choice cloze tests must of course adhere to
all the other guidelines for effective multiple-choice items that were
covered in Chapter 4, especially the choice of appropriate distractors;
therefore they can take much longer to construct, possibly too long to pay off
in a classroom setting.
Some variations
on standard cloze testing have appeared over the years; two of the better known
are the C-test and the cloze-elide procedure. In the C-test (Klein-Braley
& Raatz, 1984; Klein-Braley, 1985; Dornyei & Katona, 1992), the second
half (according to the number of letters) of every other word is obliterated
and the test-taker must restore each word. While Klein-Braley and others vouched for its
validity and reliability, many consider this technique to be “even more
irritating to complete than cloze tests” (Anderson, 2000, p. 225). Look at the following
example and judge for yourself:
The recognition th__ one’s feel__ of happ__ and unhap__ can coe__ much li__ love a__ hate
i__ a cl__ relati__ may of__ valuable cl__ on h__ to le__ a hap__ life. I__ suggests,
f__ example, th__ changing o__ avoiding thi__ that ma__ you mise__ may we__ make y__
less mise__ but prob__ no hap__.
|
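The mutilation rule for a C-test (the second half of every other word is obliterated) can be sketched in Python. This is only an approximation under stated assumptions: the function name `c_test` is made up for the example, mutilation starts with the second word, one-letter words are left intact, and for words with an odd number of letters the shorter first part is kept, which matches truncations such as "happ" and "coe" in the sample above.

```python
def c_test(text):
    """Obliterate the second half of every other word in text.

    Hypothetical helper: even-numbered words (second, fourth, ...) keep
    their first len // 2 letters; the rest is replaced by underscores.
    """
    result = []
    for i, token in enumerate(text.split()):
        # Keep sentence punctuation outside the mutilated part.
        core = token.rstrip(".,;:!?")
        trail = token[len(core):]
        if i % 2 == 1 and len(core) > 1:
            keep = len(core) // 2
            core = core[:keep] + "_" * (len(core) - keep)
        result.append(core + trail)
    return " ".join(result)
```

Here each underscore stands for one missing letter, a design choice that gives test-takers a length cue; a version with fixed-length blanks would remove that cue.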
The second
variation, the cloze-elide procedure, inserts words into a text that don’t
belong. The test-taker’s task is to detect and cross out the “intrusive”
words. Look at the same familiar passage:
The recognition that one’s now feelings of happiness and
unhappiness can under coexist much like love and hate in a close then
relationship may offer valuable clues on how to lead a happier with life.
It suggests, for example, that changing or avoiding my things that make you
miserable may well make you less miserable ever but probably no happier.
|
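A cloze-elide item can be built and scored along the following lines. This Python sketch is illustrative only: the names `cloze_elide` and `score_elide` are hypothetical, the test designer is assumed to choose the intrusive words and their positions by hand, and responses are assumed to be recorded as word positions in the doctored text.

```python
def cloze_elide(text, intruders):
    """Insert intrusive words into text.

    intruders maps a 0-based position in the original word list to the
    word inserted just before the original word at that position.
    Returns the doctored text and the positions of the intrusive words in it.
    """
    output = []
    positions = []
    for i, token in enumerate(text.split()):
        if i in intruders:
            positions.append(len(output))
            output.append(intruders[i])
        output.append(token)
    return " ".join(output), positions


def score_elide(crossed_out, positions):
    """Return (hits, false alarms) for the positions a test-taker crossed out."""
    crossed, targets = set(crossed_out), set(positions)
    return len(crossed & targets), len(crossed - targets)
```

Reporting false alarms separately from hits is one way to keep a test-taker from earning full credit by crossing out every word.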
Critics of this
procedure (Davies, 1975) claimed that the cloze-elide procedure is actually a
test of reading speed and not of proofreading skill, as its proponents
asserted. Two disadvantages are nevertheless immediately apparent: (1) neither the
words to insert nor the frequency of insertion appears to have any rationale,
and (2) good readers naturally weed out such potential interruptions.
Impromptu
Reading Plus Comprehension Questions
If cloze
testing is the most-researched procedure for assessing reading, the
traditional “Read a passage and answer some questions” technique is undoubtedly
the oldest and the most common. Virtually every proficiency test uses the
format, and one would rarely consider assessing reading without some component
of the assessment involving impromptu reading and responding to questions.
In Chapter 4,
in the discussion on proficiency testing, we looked at a typical reading
comprehension passage and a set of questions from the TOEFL. Here’s another
such passage:
Questions 1-10
The Hollywood sign in the hills that
line the northern border of Los Angeles is a famous landmark recognized the
world over. The white-painted, 50-foot-high, sheet metal letters can be
seen from great distances across the Los Angeles basin.
The sign was not constructed, as one might suppose, by the movie business as a
means of celebrating the importance of Hollywood to this industry; instead,
it was first constructed in 1923 as a means of advertising homes for sale
in a 500-acre housing subdivision in a part of Los Angeles called
“Hollywoodland.” The sign that was constructed at the time, of course, said
“Hollywoodland.” Over the years, people began referring to the area by the
shortened version “Hollywood,” and after the sign and its site were donated
to the city in 1945, the last four letters were removed.
The sign suffered from years of disrepair, and in 1973 it
needed to be completely replaced, at a cost of $27,700 per letter. Various
celebrities were instrumental in helping to raise needed funds. Rock star
Alice Cooper, for example, bought an O in memory of Groucho Marx, and Hugh
Hefner of Playboy fame held a benefit party to raise the money for the Y.
The construction of the new sign was finally completed in 1978.
1. What is the topic of this passage?
(A) A famous sign
(B) A famous city
(C) World landmarks
(D) Hollywood versus Hollywoodland
2. The expression “the world over” in line 2 could best be replaced by
(A) In the northern parts of the world
(B) On top of the world
(C) In the entire world
(D) In the skies
|
3. It can be inferred from the passage that most people think that the Hollywood
sign was first constructed by
(A) An advertising company
(B) The movie industry
(C) A construction company
(D) The city of Los Angeles
4. The pronoun “it” in line 5 refers to
(A) The sign
(B) The movie business
(C) The importance of Hollywood
(D) This industry
5. According to the passage, the Hollywood sign was first built in
(A) 1923
(B) 1949
(C) 1973
(D) 1978
6. Which of the following is NOT mentioned about Hollywoodland?
(A) It used to be the name of an area of Los Angeles.
(B) It was formerly the name on the sign in the hills.
(C) There were houses for sale there.
(D) It was the most expensive area of Los Angeles.
7. The passage indicates that the sign suffered because
(A) People damaged it
(B) It was not fixed
(C) The weather was bad
(D) It was poorly constructed
8. It can be inferred from the passage that the Hollywood sign was how old
when it was necessary to replace it completely?
(A) Ten years old
(B) Twenty-six years old
(C) Fifty years old
(D) Fifty-five years old
9. The word “replaced” in line 10 is closest in meaning to which of the following?
(A) Moved to a new location
(B) Destroyed
(C) Found again
(D) Exchanged for a newer one
10. According to the passage, how did celebrities help with the new sign?
(A) They played instruments
(B) They raised the sign
(C) They helped get the money
(D) They took part in work parties to build the sign
|
Notice that
this set of questions, based on a 250-word passage, covers the comprehension of
these features:
· Main idea (topic)
· Expressions/idioms/phrases in context
· Inference (implied detail)
· Grammatical features
· Detail (scanning for a specifically stated detail)
· Excluding facts not written (unstated details)
· Supporting idea(s)
· Vocabulary in context
These
specifications, and the questions that exemplify them, are not just a string of
“straight” comprehension questions that follow the thread of the passage. The
questions represent a sample of the test specifications for TOEFL reading
passages, which are derived from research on a variety of abilities good
readers exhibit. Notice that these include identifying the main idea, scanning
for details, guessing word meanings from context, inferencing, using discourse
markers, etc. To construct your own
assessments that involve short reading passages followed by questions, you can
begin with TOEFL-like specs as a basis. Your focus in your own classroom will
determine which of these, and possibly other, specifications you will include
in your assessment procedure, how you will frame questions, and how much weight
you will give each item in scoring.
The technology
of computer-based reading comprehension tests of this kind enables some
additional types of items. Items such as the following are typical:
· Click on the word in paragraph 1 that means “subsection work.”
· Look at the word they in paragraph 2. Click on the word that
they refers to.
· The following sentence could be added to paragraph 2:
Instead, he used the pseudonym Mrs. Silence Dogood.
Where would it best fit into the paragraph? Click on the
square to add the sentence to the paragraph.
· Click on the drawing that most closely resembles the
prehistoric coelacanth. [Four drawings are depicted on the screen.]
|
Short-Answer Tasks
Multiple-choice items are difficult to construct and validate, and classroom teachers
rarely have time in their busy schedules to design such a test. A popular
alternative
DESIGNING
ASSESSMENT TASKS: EXTENSIVE READING
Extensive reading involves somewhat
longer texts than we have been dealing with up to this point. Journal articles,
technical reports, longer essays, short stories, and books fall into this
category. The reason for placing such reading into a separate category is that
reading of this type of discourse almost always involves a focus on
meaning using mostly top-down processing, with only occasional use of a
targeted bottom-up strategy. Also, because of the extent of such reading,
formal assessment is unlikely to be contained within the time limits of a
typical test, which presents a unique challenge for assessment purposes.
Another complication in assessing extensive
reading is that the expected response from the reader is likely to involve as
much written (or sometimes oral) performance as reading. For example, in
asking test-takers to respond to an article or story, one could argue that a
greater emphasis is placed on writing than on reading. This is no reason to
sweep extensive reading assessment under the rug; teachers should not shrink
from the assessment of this highly sophisticated skill.
Before examining a few tasks that
have proved to be useful in assessing extensive reading, it is essential to
note that a number of the tasks described in previous categories can apply here.
Among them are:
· Impromptu reading plus comprehension questions,
· Short-answer tasks,
· Editing,
· Scanning,
· Ordering,
· Information transfer, and
· Interpretation (discussed under graphics).
In
addition to those applications are tasks that are unique to extensive reading:
skimming, summarizing, responding to reading, and note-taking.
Skimming
Tasks
Skimming is the process of rapid coverage
of reading matter to determine its gist or main idea. It is a prediction
strategy used to give a reader a sense of the topic and purpose of a text, the
organization of the text, the perspective or point of view of the writer, its
ease or difficulty, and/or its usefulness to the reader. Of course skimming can
apply to texts of less than one page, so it would be wise not to confine
this type of task just to extensive texts.
Assessment
of skimming strategies is usually straightforward: the test-taker skims a text
and answers questions such as the following:
What is the main idea of this text?
What is the author’s purpose in writing the text?
What kind of writing is this [newspaper article, manual, novel, etc.]?
What type of writing is this [expository, technical, narrative, etc.]?
How easy or difficult do you think this text will be?
What do you think you will learn from the text?
How useful will the text be for your [profession, academic needs, interests]?
|
Responses
are oral or written, depending on the context. Most assessments in the domain
of skimming are informal and formative: they are grist for an imminent
discussion, a more careful reading to follow, or an in-class discussion, and
therefore their washback potential is good. Insofar as the subject matter and
task are useful to a student’s goals, authenticity is preserved. Scoring is less
of an issue than providing appropriate feedback to students on their strategies
of prediction.
Summarizing
and Responding
One of the most common means of
assessing extensive reading is to ask the test-taker to write a summary of the
text. The task that is given to students can be very simply worded:
Write
a summary of the text. Your summary should be about one paragraph in
length (100-150 words) and should include your understanding of the main
idea and supporting ideas.
|
Evaluating summaries is difficult:
do you give test-takers a certain number of points for targeting the main idea and its
supporting ideas? One possible set of criteria for evaluating a summary:
1. Expresses accurately the main idea and supporting ideas.
2. Is written in the student’s own words; occasional vocabulary from the original
text is acceptable.
3. Is logically organized.
4. Displays facility in the use of language to clearly express ideas in the text.
|
As you can readily see, a strict
adherence to the criterion of assessing reading, and reading only, implies
consideration of only the first factor; the other three pertain to writing
performance. The first criterion is nevertheless a crucial factor; otherwise
the reader-writer could pass all three of the other criteria with virtually no
understanding of the text itself. Evaluation of the reading comprehension
criterion will of necessity remain somewhat subjective because the teacher will
need to determine degrees of fulfillment of the objective (see below for more
about scoring this task).
Of
further interest in assessing extensive reading is the technique of asking a
student to respond to a text. The two tasks should not be confused with each
other: summarizing requires a synopsis or overview of the text, while
responding asks the reader to provide his or her own opinion on the text as a
whole or on some statement or issue within it. Responding may be prompted by
such directions as this:
Directions
for responding to reading
One criterion for a good response
here is the extent to which the test-taker accurately reflects the content of
the article and some of the arguments therein. Scoring is also difficult here
because of the subjectivity of determining an accurate reflection of the
article itself. For the reading component of this task, as well as the summary
task described above, a holistic scoring system may be feasible:
3 Demonstrates clear, unambiguous comprehension of the main and supporting ideas.
2 Demonstrates comprehension of the main idea but lacks comprehension of some
supporting ideas.
1 Demonstrates only a partial comprehension of the main and supporting ideas.
0 Demonstrates no comprehension of the main and supporting ideas.
|
The teacher or
test administrator must still determine shades of gray between the point
categories, but the descriptions help to bridge the gap between an empirically
determined evaluation (which is impossible) and wild, impressionistic guesses.
An
attempt has been made here to underscore the reading component of summarizing
and responding to reading, but it is crucial to consider the interactive
relationship between reading and writing that is highlighted in these two
tasks. As you direct students to engage in such integrative performance, it is
advisable not to treat them as tasks for assessing reading alone.
Note-Taking and Outlining
Finally, a reader’s comprehension
of extensive texts may be assessed through an evaluation of a process of
note-taking and/or outlining. Because of the difficulty of controlling the
conditions and time frame for both these techniques, they rest firmly in the
category of informal assessment. Their utility is in the strategic training
that learners gain in retaining information through marginal notes that
highlight key information or organizational outlines that put supporting ideas
into a visually manageable framework. A teacher, perhaps in one-on-one
conferences with students, can use student notes/outlines as indicators of the
presence or absence of effective reading strategies, and thereby point the
learners in positive directions.