ASSESSING SPEAKING AND WRITING
Written by: GROUP 6
Wahyu Ningrum 1423 022
Rany Pangesti 1423 024
Puji Sundari 1423 023
Fitri Dewi Sartika 1423 008
Reza Junaidi 1423 050
ASSESSING SPEAKING
A. BASIC TYPES OF SPEAKING
We cited four categories of listening performance assessment tasks in the previous materials; a similar taxonomy emerges for oral production.
1. Imitative. At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely phonetic level of oral production, a number of prosodic, lexical, and grammatical properties of language may be included in the criterion performance. We are interested only in what is traditionally labeled "pronunciation"; no inferences are made about the test-taker's ability to understand or convey meaning or to participate in an interactive conversation. The only role of listening here is in the short-term storage of a prompt, just long enough to allow the speaker to retain the short stretch of language that must be imitated.
2. Intensive. A second type of speaking frequently employed in assessment contexts is the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements: intonation, stress, rhythm, juncture). The speaker must be aware of semantic properties in order to be able to respond, but interaction with an interlocutor or test administrator is minimal at best. Examples of intensive assessment tasks include directed response tasks, reading aloud, sentence and dialogue completion; limited picture-cued tasks including simple sequences; and translation up to the simple sentence level.
3. Responsive. Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. The stimulus is almost always a spoken prompt (in order to preserve authenticity), with perhaps only one or two follow-up questions or retorts:
A.
Mary: Excuse me, do you have the time?
Doug: Yeah. Nine-fifteen.
B.
T: What is the most urgent environmental problem today?
S: I would say massive deforestation.
C.
Jeff: Hey, Stef, how's it going?
Stef: Not bad, and yourself?
Jeff: I'm good.
Stef: Cool. Okay, gotta go.
4. Interactive. The difference between responsive and interactive speaking is the length and complexity of the interaction, which sometimes includes multiple exchanges and/or multiple participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships. (In the three dialogues cited above, A and B were transactional, and C was interpersonal.) In interpersonal exchanges, oral production can become pragmatically complex with the need to speak in a casual register and use colloquial language, ellipsis, slang, humor, and other sociolinguistic conventions.
5. Extensive (monologue). Extensive oral production tasks include speeches, oral presentations, and storytelling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether. Language style is frequently more deliberative (planning is involved) and formal for extensive tasks, but we cannot rule out certain informal monologues such as casually delivered speech (for example, my vacation in the mountains, a recipe for outstanding pasta primavera, or recounting the plot of a novel or movie).
B. MICRO- AND MACROSKILLS OF SPEAKING
A list of listening micro- and macroskills enumerated the various components of listening that make up criteria for assessment. A similar list of speaking skills can be drawn up for the same purpose: to serve as a taxonomy of skills from which you will select one or several that will become the objective(s) of an assessment task. The microskills refer to producing the smaller chunks of language such as phonemes, morphemes, words, collocations, and phrasal units. The macroskills imply the speaker's focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options. The micro- and macroskills total roughly 16 different objectives to assess in speaking.
Micro- and macroskills of oral production

Microskills
1. Produce differences among English phonemes and allophonic variants.
2. Produce chunks of language of different lengths.
3. Produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.
4. Produce reduced forms of words and phrases.
5. Use an adequate number of lexical units (words) to accomplish pragmatic purposes.
6. Produce fluent speech at different rates of delivery.
7. Monitor one's own oral production and use various strategic devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message.
8. Use grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), word order, patterns, rules, and elliptical forms.
9. Produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents.
10. Express a particular meaning in different grammatical forms.
11. Use cohesive devices in spoken discourse.

Macroskills
12. Appropriately accomplish communicative functions according to situations, participants, and goals.
13. Use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.
14. Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, and generalization and exemplification.
15. Convey facial features, kinesics, body language, and other nonverbal cues along with verbal language.
16. Develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you.
As you consider designing tasks for assessing spoken language, these skills can act as a checklist of objectives. While the macroskills have the appearance of being more complex than the microskills, both contain ingredients of difficulty, depending on the stage and context of the test-taker.
There is such an array of oral production tasks that a complete treatment is almost impossible within the confines of one chapter in this book. Below is a consideration of the most common techniques, with brief allusions to related tasks. As already noted in the introduction to this chapter, consider three important issues as you set out to design tasks:
1. No speaking task is capable of isolating the single skill of oral production. Concurrent involvement of the additional performance of aural comprehension, and possibly reading, is usually necessary.
2. Eliciting the specific criterion you have designated for a task can be tricky because, beyond the word level, spoken language offers a number of productive options to test-takers. Make sure your elicitation prompt achieves its aims as closely as possible.
3. Because of the above two characteristics of oral production assessment, it is important to carefully specify scoring procedures for a response so that ultimately you achieve as high a reliability index as possible; a brief sketch of one way to check reliability follows.
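One practical check on the reliability of an open-ended speaking task is to have two raters score the same set of responses and correlate their scores. Below is a minimal sketch of that check in Python; the scores are invented for illustration, and in practice you would use two raters' actual scores for the same test-takers.

    from math import sqrt

    def pearson_r(xs, ys):
        """Pearson correlation between two raters' score lists."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    rater_a = [2, 1, 2, 0, 1, 2, 1, 2]   # hypothetical scores on a 0-2 scale
    rater_b = [2, 1, 1, 0, 1, 2, 2, 2]
    print(f"Inter-rater reliability: r = {pearson_r(rater_a, rater_b):.2f}")

A correlation close to 1.0 suggests the scoring procedure is being applied consistently; a low value signals that the scoring criteria need to be specified more carefully.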
C. DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING
You may be surprised to see the inclusion of simple phonological imitation in a consideration of assessment of oral production. After all, endless repeating of words, phrases, and sentences was the province of the long-since-discarded Audiolingual Method, and in an era of communicative language teaching, many believe that nonmeaningful imitation of sounds is fruitless. Such opinions have faded in recent years as we discovered that an overemphasis on fluency can sometimes lead to the decline of accuracy in speech. And so we have been paying more attention to pronunciation, especially suprasegmentals, in an attempt to help learners be more comprehensible.
An occasional phonologically focused repetition task is warranted as long as repetition tasks are not allowed to occupy a dominant role in an overall oral production assessment, and as long as you artfully avoid a negative washback effect. Such tasks range from word level to sentence level, usually with each item focusing on a specific phonological criterion. In a simple repetition task, test-takers repeat the stimulus, whether it is a pair of words, a sentence, or perhaps a question (to test for intonation production).
Test-takers hear: Repeat after me:
beat [pause] bit [pause]
bat [pause] vat [pause] etc.
I bought a boat yesterday.
The glow of the candle is growing. etc.
When did they go on vacation?
Do you like coffee? etc.
Test-takers repeat the stimulus.
Word repetition task
A variation on such a task prompts test-takers with a brief written stimulus which they are to read aloud. (In the section below on intensive speaking, some tasks are described in which test-takers read aloud longer texts.) Scoring specifications must be clear in order to avoid reliability breakdowns. A common form of scoring simply indicates a two- or three-point system for each response.
2  Acceptable pronunciation
1  Comprehensible, partially correct pronunciation
0  Silence, seriously incorrect pronunciation
Scoring scale for repetition tasks
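Such a point system is simple enough to apply in a small script when totaling scores across items. The following is a minimal sketch, assuming the 0-2 scale above; the category labels and the sample judgments are illustrative only.

    # Map each judged category to its point value on the 0-2 scale above.
    REPETITION_SCALE = {
        "acceptable": 2,             # acceptable pronunciation
        "partially_correct": 1,      # comprehensible, partially correct
        "incorrect_or_silent": 0,    # silence, seriously incorrect
    }

    def total_score(judgments):
        """Total one test-taker's points across judged repetition items."""
        return sum(REPETITION_SCALE[j] for j in judgments)

    # One student's judged responses across five repetition items:
    print(total_score(["acceptable", "partially_correct", "acceptable",
                       "incorrect_or_silent", "acceptable"]))  # -> 7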
The longer the stretch of language, the more possibility for error and therefore the more difficult it becomes to assign a point system to the text. In such a case, it may be imperative to score only the criterion of the task. For example, in the sentence "When did they go on vacation?", since the criterion is falling intonation for wh-questions, points should be awarded regardless of any mispronunciation.
D. PHONEPASS® TEST
An example of a popular test that uses imitative (as well as intensive) production tasks is PhonePass, a widely used, commercially available speaking test in many countries. Among a number of speaking tasks on the test, repetition of sentences (of 8 to 12 words) occupies a prominent role. It is remarkable that research on the PhonePass test has supported the construct validity of its repetition tasks not just for a test-taker's phonological ability but also for discourse and overall oral production ability (Townshend et al., 1998; Bernstein et al., 2000; Cascallar & Bernstein, 2000).
The PhonePass test elicits computer-assisted oral production over a telephone. Test-takers read aloud, repeat sentences, say words, and answer questions. With a downloadable test sheet as a reference, test-takers are directed to telephone a designated number and listen for directions. The test has five sections.
Part A: Test-takers read aloud selected sentences from among those printed on the test sheet. Examples:
1. Traffic is a huge problem in Southern California.
2. The endless city has no coherent mass transit system.
3. Sharing rides was going to be the solution to rush-hour traffic.
4. Most people still want to drive their own cars, though.

Part B: Test-takers repeat sentences dictated over the phone. Example: "Leave town on the next train."

Part C: Test-takers answer questions with a single word or a short phrase of two or three words. Example: "Would you get water from a bottle or a newspaper?"

Part D: Test-takers hear three word groups in random order and must link them in a correctly ordered sentence. Example: was reading / my mother / a magazine.

Part E: Test-takers have 30 seconds to talk about their opinion on some topic that is dictated over the phone. Topics center on family, preferences, and choices.

PhonePass® test specifications
Scores for the PhonePass test are calculated by a computerized scoring template and reported back to the test-taker within minutes. Six scores are given: an overall score between 20 and 80 and five subscores on the same scale, including one for pronunciation.
The tasks on Part A and Part B of the PhonePass test do not extend beyond the level of oral reading and imitation. Parts C and D represent intensive speaking (see the next section in this chapter). Section E is used only for experimental data-gathering and does not figure into the scoring. The scoring procedure has been validated against human scoring with extraordinarily high reliabilities and correlation statistics (.94 overall). Further, this ten-minute test correlates with the elaborate Oral Proficiency Interview (OPI, described later in this chapter) at .75, indicating a very high degree of correspondence between the machine-scored PhonePass and the human-scored OPI (Bernstein et al., 2000).
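To make the score-reporting format concrete, here is a minimal sketch of a report structure on a 20-80 scale like the one described above. The subscore labels are placeholders of my own choosing, not the test publisher's, and the values are invented.

    from dataclasses import dataclass

    @dataclass
    class SpeakingScoreReport:
        overall: int     # overall score on the 20-80 scale
        subscores: dict  # five subscores on the same scale

        def __post_init__(self):
            # Reject any value outside the 20-80 reporting scale.
            for label, value in {"overall": self.overall, **self.subscores}.items():
                if not 20 <= value <= 80:
                    raise ValueError(f"{label} is outside the 20-80 scale")

    report = SpeakingScoreReport(
        overall=62,
        subscores={"pronunciation": 58, "subscore_2": 65, "subscore_3": 60,
                   "subscore_4": 63, "subscore_5": 64})
    print(report.overall)  # -> 62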
The PhonePass findings could signal an increase in the future use of repetition and read-aloud procedures for the assessment of oral production. Because a test-taker's output is completely controlled, scoring using speech-recognition technology becomes achievable and practical. As researchers uncover the constructs underlying both repetition/read-aloud tasks and oral production in all its complexities, we will have access to more comprehensive explanations of why such simple tasks appear to be reliable and valid indicators of very complex oral production proficiency. Here are some details on the PhonePass test.
Producer: Ordinate Corporation, Menlo Park, CA
Objective: To test oral production skills of non-native English speakers
Primary market: Worldwide, primarily in workplace settings where employees require a comprehensible command of spoken English; secondarily in academic settings for placement and evaluation of students
Type: Computer-assisted, telephone-operated, with a test sheet
Response modes: Oral, mostly repetition tasks
Specifications: (see above)
Time allocation: Ten minutes
Internet access: www.ordinate.com

PhonePass® test
E. DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING
At the intensive level, test-takers are prompted to produce short stretches of discourse (no more than a sentence) through which they demonstrate linguistic ability at a specified level of language. Many tasks are "cued" tasks in that they lead the test-taker into a narrow band of possibilities.
Parts C and D of the PhonePass test fulfill the criteria of intensive tasks, as they elicit certain expected forms of language. Antonyms like high and low, happy and sad are prompted so that the automated scoring mechanism anticipates only one word. The either/or task of Part D fulfills the same criterion. Intensive tasks may also be described as limited response tasks (Madsen, 1983), or mechanical tasks (Underhill, 1987), or what classroom pedagogy would label as controlled responses.
1. Directed Response Tasks
In this type of task, the test administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative, but they do require minimal processing of meaning in order to produce the correct grammatical output.
Test-takers hear:
Tell me he went home.
Tell me that you like rock music.
Tell me that you aren't interested in tennis.
Tell him to come to my office at noon.
Remind him what time it is.

Directed response
2. Read-Aloud Tasks
Intensive read-aloud tasks include reading beyond the sentence level, up to a paragraph or two. This technique is easily administered by selecting a passage that incorporates test specs and by recording the test-taker's output; the scoring is relatively easy because all of the test-taker's oral production is controlled. Because of the results of research on the PhonePass test, reading aloud may actually be a surprisingly strong indicator of overall oral production ability.
For many decades, foreign language programs have used reading passages to analyze oral production. Prator's (1972) Manual of American English Pronunciation included a "diagnostic passage" of about 150 words that students could read aloud into a tape recorder. Teachers listening to the recording would then rate students on a number of phonological factors (vowels, diphthongs, consonants, consonant clusters, stress, and intonation) by completing a two-page diagnostic checklist on which all errors or questionable items were noted. These checklists ostensibly offered direction to the teacher for emphases in the course to come.
An earlier form of the Test of Spoken English (TSE®, see below) incorporated one read-aloud passage of about 120 to 130 words with a rating scale for pronunciation and fluency. The following passage is typical:
Despite the decrease in size (and, some would say, quality) of our cultural world, there still remain strong differences between the usual British and American writing styles. The question is, how do you get your message across? English prose conveys its most novel ideas as if they were timeless truths, while American writing exaggerates; if you believe half of what is said, that's enough. The former uses understatement; the latter, overstatement. There are also disadvantages to each characteristic approach. Readers who are used to being screamed at may not listen when someone chooses to whisper politely. At the same time, the individual who is used to a quiet manner may reject a series of loud imperatives.

Read-aloud stimulus, paragraph length
The scoring scale for this passage provided a four-point scale for pronunciation and for fluency, as shown in the box below.
Pronunciation:
Points:
0.0-0.4  Frequent phonemic errors and foreign stress and intonation patterns that cause the speaker to be unintelligible.
0.5-1.4  Frequent phonemic errors and foreign stress and intonation patterns that cause the speaker to be occasionally unintelligible.
1.5-2.4  Some consistent phonemic errors and foreign stress and intonation patterns, but the speaker is intelligible.
2.5-3.0  Occasional non-native pronunciation errors, but the speaker is always intelligible.

Fluency:
Points:
0.0-0.4  Speech is so halting and fragmentary or has such a non-native flow that intelligibility is virtually impossible.
0.5-1.4  Numerous non-native pauses and/or a non-native flow that interfere with intelligibility.
1.5-2.4  Some non-native pauses, but with a more nearly native flow, so that the pauses do not interfere with intelligibility.
2.5-3.0  Speech is smooth and effortless, closely approximating that of a native speaker.

Test of Spoken English scoring scale (1987, p. 10)
Such a rating list does not indicate how to gauge intelligibility, which is mentioned in both lists. Such slippery terms remind us that oral production scoring, even with the controls that reading aloud offers, is still an inexact science.
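One small way to keep raters consistent with a band scale like the one above is to treat it as a lookup table from numeric ratings to descriptors. Below is a minimal sketch for the fluency scale; the band boundaries mirror the quoted scale, while the function itself is illustrative and not part of the published test.

    # Band boundaries and descriptors from the fluency scale quoted above.
    FLUENCY_BANDS = [
        (0.0, 0.4, "halting/fragmentary; intelligibility virtually impossible"),
        (0.5, 1.4, "numerous non-native pauses interfering with intelligibility"),
        (1.5, 2.4, "some non-native pauses, but they do not interfere"),
        (2.5, 3.0, "smooth and effortless, approximating a native speaker"),
    ]

    def fluency_descriptor(points: float) -> str:
        """Return the band descriptor for a rating given in tenths."""
        for low, high, descriptor in FLUENCY_BANDS:
            if low <= points <= high:
                return descriptor
        raise ValueError("points must fall within one of the 0.0-3.0 bands")

    print(fluency_descriptor(1.8))  # -> "some non-native pauses, ..."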
Underhill (1987, pp. 77-78) suggested some variations on the task of simply reading a short passage:
· Reading a scripted dialogue, with someone else reading the other part
· Reading sentences containing minimal pairs, for example:
Try not to heat/hit the pan too much.
The doctor gave me a bill/pill.
· Reading information from a table or chart
If reading aloud shows certain practical advantages (predictable output, practicality, reliability in scoring), there are several drawbacks to using this technique for assessing oral production. Reading aloud is somewhat inauthentic in that we seldom read anything aloud to someone else in the real world, with the exception of a parent reading to a child, occasionally sharing a written story with someone, or giving a scripted oral presentation. Also, reading aloud calls on certain specialized oral abilities that may not indicate one's pragmatic ability to communicate orally in face-to-face contexts. You should therefore employ this technique with some caution, and certainly supplement it as an assessment task with other, more communicative procedures.
3. Sentence/Dialogue Completion Tasks and Oral Questionnaires
Another technique for targeting intensive aspects of language requires test-takers to read a dialogue in which one speaker's lines have been omitted. Test-takers are first given time to read through the dialogue to get its gist and to think about appropriate lines to fill in. Then, as the tape, teacher, or test administrator produces one part orally, the test-taker responds.
An advantage of this technique lies in its moderate control of the output of the test-taker. While individual variations in responses are accepted, the technique taps into a learner's ability to discern expectancies in a conversation and to produce sociolinguistically correct language. One disadvantage of this technique is its reliance on literacy and an ability to transfer easily from written to spoken English. Another disadvantage is the contrived, inauthentic nature of this task: Couldn't the same criterion performance be elicited in a live interview in which an impromptu role-play technique is used?
Perhaps more useful is a whole host of shorter dialogues of two or three lines, each of which aims to elicit a specified target. In the following examples, somewhat unrelated items attempt to elicit the past tense, the future tense, yes/no question formation, and asking for the time. Again, test-takers see the stimulus in written form.
Test-takers see:
Interviewer: What did you do last weekend?
Test-taker: __________
Interviewer: What will you do after you graduate from this program?
Test-taker: __________
Test-taker: __________?
Interviewer: I was in Japan for two weeks.
Test-taker: __________?
Interviewer: It's ten-thirty.
Test-takers respond with appropriate lines.

Dialogue completion tasks
One could contend that performance on these items is responsive rather than intensive. True, the discourse involves responses, but there is a degree of control here that predisposes the test-taker to respond with certain expected forms. Such arguments underscore the fine lines of distinction between and among the five categories.
It could also be argued that such techniques are nothing more than a written form of questions that might otherwise (and more appropriately) be part of a standard oral interview. True, but the advantage that the written form offers is to provide a little more time for the test-taker to anticipate an answer, and it begins to remove the potential ambiguity created by aural misunderstanding. It helps to unlock the almost ubiquitous link between listening and speaking performance.
Underhill (1987) describes yet another technique that is useful for controlling the test-taker's output: form-filling, or what I might rename "oral questionnaire." Here the test-taker sees a questionnaire that asks for certain categories of information (personal data, academic information, job experience, etc.) and supplies the information orally.
4. Picture-Cued Tasks
One of the most popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and "busy"; or composed of a series that tells a story or incident. Here is an example of a picture-cued elicitation of the production of a simple minimal pair. Grammatical categories may be cued by pictures. In the following sequences, comparatives are elicited.
Notice that a little sense of humor is injected here: the family, bundled up in their winter coats, is looking forward to leaving the wintry scene behind them! A touch of authenticity is added in that almost everyone can identify with looking forward to a vacation on a tropical island. Assessment of oral production may be stimulated through a more elaborate picture such as the one on the next page, a party scene. Moving into more open-ended performance, the following picture asks test-takers not only to identify certain specific information but also to elaborate with their own opinion, to accomplish a persuasive function, and to describe preferences in paintings. Maps are another visual stimulus that can be used to assess the language forms needed to give directions and specify locations. In the following example, the test-taker must provide directions to different locations.
5. Translation (of Limited Stretches of Discourse)
Translation is a part of our tradition in language teaching that we tend to discount or disdain, if only because our current pedagogical stance plays down its importance. Translation methods of teaching are certainly passé in an era of direct approaches to creating communicative classrooms. But we should remember that in countries where English is not the native or prevailing language, translation is a meaningful communicative device in contexts where the English user is called on to be an interpreter. Also, translation is a well-proven communication strategy for learners of a second language.
Under certain constraints, then, it is not far-fetched to suggest translation as a device to check oral production. Instead of offering pictures or written stimuli, the test-taker is given a native language word, phrase, or sentence and is asked to translate it. Conditions may vary from expecting an instant translation of an orally elicited linguistic target to allowing more thinking time before producing a translation of somewhat longer texts, which may optionally be offered to the test-taker in written form. (Translation of extensive texts is discussed at the end of this chapter.) As an assessment procedure, the advantages of translation lie in its control of the output of the test-taker, which of course means that scoring is more easily specified.
Chapter 2
A. DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING
Assessment of responsive tasks involves brief interactions with an interlocutor, differing from intensive tasks in the increased creativity given to the test-taker and from interactive tasks by the somewhat limited length of utterances.
1. Question and Answer
Question-and-answer tasks can consist of one or two questions from an interviewer, or they can make up a portion of a whole battery of questions and prompts in an oral interview. They can vary from simple questions like "What is this called in English?" to complex questions like "What are the steps governments should take, if any, to stem the rate of deforestation in tropical countries?" The first question is intensive in its purpose; it is a display question intended to elicit a predetermined correct response. We have already looked at some of these types of questions in the previous section. Questions at the responsive level tend to be genuine referential questions in which the test-taker is given more opportunity to produce meaningful language in response.
In designing such questions for test-takers, it's important to make sure that you know why you are asking the question. Are you simply trying to elicit strings of language output to gain a general sense of the test-taker's discourse competence? Are you combining discourse and grammatical competence in the same question? Is each question just one in a whole set of related questions? Responsive questions may take the following forms:
Questions eliciting open-ended responses
Test-takers hear:
1. What do you think about the weather today?
2. What do you like about the English language?
3. Why did you choose your academic major?
4. What kind of strategies have you used to help you learn English?
5. a. Have you ever been to the United States before?
   b. What other countries have you visited?
   c. Why did you go there? What did you like best about it?
   d. If you could go back, what would you like to do or see?
   e. What country would you like to visit next, and why?
Notice that question #5 has five situationally linked questions that may vary slightly depending on the test-taker's response to a previous question.
Oral interaction with a test administrator often involves the latter forming all the questions. The flip side of this usual concept of question-and-answer tasks is to elicit questions from the test-taker.
A potentially tricky form of oral production assessment involves more than one test-taker with an interviewer, which is discussed later in this chapter. With two students in an interview context, both test-takers can ask questions of each other.
2. Giving Instructions and Directions
We are all called on in our daily routines to read instructions on how to operate an appliance, how to put a bookshelf together, or how to create a delicious clam chowder. Somewhat less frequent is the mandate to provide such instructions orally, but this speech act is still relatively common. Using such a stimulus in an assessment context provides an opportunity for the test-taker to engage in a relatively extended stretch of discourse, to be very clear and specific, and to use appropriate discourse markers and connectors. The technique is simple: the administrator poses the problem, and the test-taker responds. Scoring is based primarily on comprehensibility and secondarily on other specified grammatical or discourse categories. Here are some possibilities.
Eliciting instructions or directions
Test-takers hear:
- Describe how to make a typical dish from your country.
- What's a good recipe for making ________?
- How do you access email on a PC computer?
- How would I make a typical costume for a ________ celebration in your country?
- How do you program telephone numbers into a cell (mobile) phone?
- How do I get from ________ to ________ in your city?
Test-takers respond with appropriate instructions/directions.
Some pointers for creating such tasks: The test administrator needs to guard against test-takers knowing and preparing for such items in advance, lest they simply parrot back a memorized set of sentences. An impromptu delivery of instructions is warranted here, or at most a minute or so of preparation time. Also, the choice of topics needs to be familiar enough so that you are testing not general knowledge but linguistic competence; therefore, topics beyond the content schemata of the test-taker are inadvisable. Finally, the task should require the test-taker to produce at least five or six sentences (of connected discourse) to adequately fulfill the objective.
This task can be designed to be more complex, thus placing it in the category of extensive speaking. If your objective is to keep the response short and simple, then make sure your directive does not take the test-taker down a path of complexity that he or she is not ready to face.
3. Paraphrasing
Another type of assessment task that can be categorized as responsive asks the test-taker to read or hear a limited number of sentences (perhaps two to five) and produce a paraphrase. For example:
Paraphrasing a story
Test-takers hear: Paraphrase the following little story in your own words.
My weekend in the mountains was fabulous. The first day we backpacked into the mountains and climbed about 2,000 feet. The hike was strenuous but exhilarating. By sunset we found these beautiful alpine lakes and made camp there. The sunset was amazingly beautiful. The next two days we just kicked back and did little day hikes, some rock climbing, bird watching, swimming, and fishing. The hike out on the next day was really easy, all downhill, and the scenery was incredible.
Test-takers respond with two or three sentences.
A more authentic context for paraphrase is aurally receiving and orally relaying a message. In the example below, the test-taker must relay information from a telephone call to an office colleague named Jeff.
Paraphrasing a phone message
Test-takers hear:
Please tell Jeff that I'm tied up in traffic so I'm going to be about a half hour late for the nine o'clock meeting. And ask him to bring up our question about the employee benefits plan. If he wants to check in with me on my cell phone, have him call 415-338-3095. Thanks.
Test-takers respond with two or three sentences.
The advantages of such tasks are that they elicit short stretches of output and perhaps tap into test-takers' ability to practice the conversational art of conciseness by reducing the output/input ratio. Yet you have to question the criterion being assessed. Is it a listening task more than production? Does it test short-term memory rather than linguistic ability? And how does the teacher determine the scoring of responses? If you use short paraphrasing tasks as an assessment procedure, it's important to pinpoint the objective of the task clearly. In this case, the integration of listening and speaking is probably more at stake than simple oral production alone.
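The output/input ratio mentioned above can be made concrete with a quick word-count comparison: a concise paraphrase should come in well under the length of the stimulus. Here is a minimal sketch; the stimulus reuses the phone message above, and the paraphrase is an invented student response.

    def output_input_ratio(stimulus: str, paraphrase: str) -> float:
        """Words in the paraphrase divided by words in the stimulus."""
        return len(paraphrase.split()) / len(stimulus.split())

    stimulus = ("Please tell Jeff that I'm tied up in traffic so I'm going "
                "to be about a half hour late for the nine o'clock meeting.")
    paraphrase = "The caller is stuck in traffic and will be 30 minutes late."
    print(f"{output_input_ratio(stimulus, paraphrase):.2f}")  # well below 1.0

A ratio near or above 1.0 suggests the test-taker is repeating rather than condensing, which may itself be worth noting in scoring.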
B. TEST OF SPOKEN ENGLISH (TSE®)
Somewhere straddling responsive, interactive, and extensive speaking tasks lies another popular commercial oral production assessment, the Test of Spoken English (TSE). The TSE is a 20-minute audiotaped test of oral language ability within an academic or professional environment. TSE scores are used by many North American institutions of higher education to select international teaching assistants. The scores are also used for selecting and certifying health professionals such as physicians, nurses, pharmacists, physical therapists, and veterinarians.
The tasks on the TSE are designed to elicit oral production in various discourse categories rather than in selected phonological, grammatical, or lexical targets. The following content specifications for the TSE represent the discourse and pragmatic contexts assessed in each administration:
- Describe something physical.
- Narrate from presented material.
- Summarize information of the speaker's own choice.
- Give directions based on visual materials.
- Give instructions.
- Give an opinion.
- Support an opinion.
- Compare/contrast.
- Hypothesize.
- Function "interactively."
- Define.
Using these specifications, Lazaraton and Wagner (1996) examined 15 different specific tasks in collecting background data from native and non-native speakers of English:
1. Giving a personal description
2. Describing a daily routine
3. Suggesting a gift and supporting one's choice
4. Recommending a place to visit and supporting one's choice
5. Giving directions
6. Describing a favorite movie and supporting one's choice
7. Telling a story from pictures
8. Hypothesizing about future action
9. Hypothesizing about a preventative action
10. Making a telephone call to the dry cleaner
11. Describing an important news event
12. Giving an opinion about animals in the zoo
13. Defining a technical term
14. Describing information in a graph and speculating about its implications
15. Giving details about a trip schedule
From their findings, the researchers were able to report on the validity of the tasks, especially the match between the intended task functions and the actual output of both native and non-native speakers.
Following is a set of sample items as they appear in the TSE Manual, which is downloadable from the TOEFL® website (see reference on page 167).
Test of Spoken English sample items

Part A.
Test-takers see: A map of a town.
Test-takers hear: Imagine that we are colleagues. The map below is of a neighboring town that you have suggested I visit. You have 30 seconds to study the map. Then I'll ask you some questions about it.
- Choose one place on the map that you think I should visit and give me some reasons why you recommend this place. (30 seconds)
- I'd like to see a movie. Please give me directions from the bus station to the movie theater. (30 seconds)
- One of your favorite movies is playing at the theater. Please tell me about the movie and why you like it. (60 seconds)

Part B.
Test-takers see: A series of six pictures depicting a sequence of events. In this series, painters have just painted a park bench. Their WET PAINT sign blows away. A man approaches the bench, sits on it, and starts reading a newspaper. He quickly discovers his suit has just gotten wet paint on it and then rushes to the dry cleaner.
Test-takers hear: Now please look at the six pictures below. I'd like you to tell me the story that the pictures show, starting with picture number 1 and going through picture number 6. Please take one minute to look at the pictures and think about the story. Do not begin the story until I tell you to do so.
- Tell me the story that the pictures show. (60 seconds)
- What could the painters have done to prevent this? (30 seconds)
- Imagine that this happens to you. After you have taken the suit to the dry cleaner, you find out that you need to wear the suit the next morning. The dry cleaning service usually takes two days. Call the dry cleaner and try to persuade them to have the suit ready later today. (45 seconds)
- The man in the pictures is reading a newspaper. Both newspapers and television news programs can be good sources of information about current events. What do you think are the advantages and disadvantages of each of these sources? (60 seconds)

Part C.
Test-takers hear: Now I'd like to hear your ideas about a variety of topics. Be sure to say as much as you can in responding to each question. After I ask each question, you may take a few seconds to prepare your answer, and then begin speaking when you're ready.
- Many people enjoy visiting zoos and seeing the animals. Other people believe that animals should not be taken from their natural surroundings and put in zoos. I'd like to know what you think about this issue. (60 seconds)
- I'm not familiar with your field of study. Select a term used frequently in your field and define it for me. (60 seconds)

Part D.
Test-takers see: A graph showing an increase in world population over a half-century of time.
Test-takers hear:
- This graph presents the actual and projected percentage of the world population living in cities from 1950 to 2010. Describe to me the information given in the graph. (60 seconds)
- Now discuss what this information might mean for the future. (45 seconds)

Part E.
Test-takers see: A printed itinerary for a one-day bus tour of Washington, D.C., on which four relatively simple pieces of information (date, departure time, etc.) have been crossed out by hand and new handwritten information added.
Test-takers hear:
- Now please look at the information below about a trip to Washington, D.C., that has been organized for the members of the Forest City Historical Society. Imagine that you are the president of this organization. At the last meeting, you gave out a schedule for the trip, but there have been some changes. You must remind the members about the details of the trip and tell them about the changes indicated on the schedule. In your presentation, do not just read the information printed, but present it as if you were talking to a group of people. You will have one minute to plan your presentation and will be told when to begin speaking. (90 seconds)
Holistic scoring taxonomies such as these imply a number of abilities that comprise "effective" communication and "competent" performance of the task. The original version of the TSE (1987) specified three contributing factors to a final score on "overall comprehensibility": pronunciation, grammar, and fluency. The current scoring scale of 20 to 60 incorporates task performance, function, appropriateness, and coherence as well as the form-focused factors. From reported scores, institutions are left to determine their own threshold levels of acceptability, but because scoring is holistic, they will not receive an analytic score of how each factor breaks down (see Douglas & Smith, 1997, for further information). Classroom teachers who propose to model oral production assessments after the tasks on the TSE must, in order to provide some washback effect, be more explicit in analyzing the various components of test-takers' output. Such scoring rubrics are presented in the next section.
Following is a summary of information on the TSE:
Test of Spoken English (TSE®)
Producer: Educational Testing Service, Princeton, NJ
Objective: To test oral production skills of non-native English speakers
Primary market: Primarily used for screening international teaching assistants in universities in the United States; a growing secondary market is certifying health professionals in the United States
Type: Audiotaped, with written, graphic, and spoken stimuli
Response modes: Oral tasks, connected discourse
Specifications: (see sample items above)
Time allocation: 20 minutes
Internet access: http://www.toefl.org/tse/tseindx.html
C. DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING
The final categories of oral production assessment (interactive and extensive speaking) include tasks that involve relatively long stretches of interactive discourse (interviews, role plays, discussions) and tasks of comparable length with little or no interaction (speeches, telling longer stories, and extended explanations and translations). The obvious difference between the two sets of tasks is the degree of interaction with an interlocutor. Also, interactive tasks are what some would describe as interpersonal, while the final category includes more transactional speech events.
1. Interview
When "oral production assessment" is mentioned, the first thing that comes to mind is an oral interview: a test administrator and a test-taker sit down in a direct face-to-face exchange and proceed through a protocol of questions and directives. The interview, which may be tape-recorded for re-listening, is then scored on one or more parameters such as accuracy in pronunciation and/or grammar, vocabulary usage, fluency, sociolinguistic/pragmatic appropriateness, task accomplishment, and even comprehension.
Interviews can vary in length from perhaps five to forty-five minutes, depending on their purpose and context. Placement interviews, designed to get a quick spoken sample from a student in order to verify placement into a course, may need only five minutes if the interviewer is trained to evaluate the output accurately. Longer comprehensive interviews such as the OPI (see the next section) are designed to cover predetermined oral production contexts and may require the better part of an hour.
Every effective interview contains a number of mandatory stages. Two decades ago, Michael Canale (1984) proposed a framework for oral proficiency testing that has withstood the test of time. He suggested that test-takers will perform at their best if they are led through four stages:
- Warm-up. In a minute or so of preliminary small talk, the interviewer directs mutual introductions, helps the test-taker become comfortable with the situation, apprises the test-taker of the format, and allays anxieties. No scoring of this phase takes place.
- Level check. Through a series of preplanned questions, the interviewer stimulates the test-taker to respond using expected or predicted forms and functions. If, for example, from previous test information, grades, or other data, the test-taker has been judged to be a "Level 2" (see below) speaker, the interviewer's prompts will attempt to confirm this assumption. The responses may take very simple or very complex form, depending on the entry level of the learner. Questions are usually designed to elicit grammatical categories (such as past tense or subject-verb agreement), discourse structure (a sequence of events), vocabulary usage, and/or sociolinguistic factors (politeness conventions, formal/informal language). This stage could also give the interviewer a picture of the test-taker's extroversion, readiness to speak, and confidence, all of which may be of significant consequence in the interview's results. Linguistic target criteria are scored in this phase. If this stage is lengthy, a tape recording of the interview is important.
- Probe. Probe questions and prompts challenge test-takers to go to the heights of their ability through increasingly difficult questions. Probe questions may be complex in their framing and/or complex in their cognitive and linguistic demand. Through probe items, the interviewer discovers the ceiling or limitation of the test-taker's proficiency. This need not be a separate stage entirely, but might be a set of questions that are interspersed into the previous stage. At the lower levels of proficiency, probe items may simply demand a higher range of vocabulary or grammar from the test-taker than predicted. At the higher levels, probe items will typically ask the test-taker to give an opinion or a value judgment, to discuss his or her field of specialization, to recount a narrative, or to respond to questions that are worded in complex form. Responses to probe questions may be scored, or they may be ignored if the test-taker displays an inability to handle such complexity.
- Wind-down. This final phase of the interview is simply a short period of time during which the interviewer encourages the test-taker to relax with some easy questions, sets the test-taker's mind at ease, and provides information about when and where to obtain the results of the interview. This part is not scored.
The suggested set of content specifications for an oral interview (below) may serve as sample questions that can be adapted to individual situations.
Oral interview content specifications

Warm-up:
- Small talk

Level check:
The test-taker . . .
- Answers wh-questions.
- Produces a narrative without interruptions.
- Reads a passage aloud.
- Tells how to make something or do something.
- Engages in a brief, controlled, guided role play.

Probe:
The test-taker . . .
- Responds to the interviewer's questions about something the test-taker doesn't know and is planning to include in an article or paper.
- Talks about his or her own field of study or profession.
- Engages in a longer, more open-ended role play (for example, simulates a difficult or embarrassing circumstance) with the interviewer.
- Gives an impromptu presentation on some aspect of the test-taker's field.

Wind-down:
- Feelings about the interview, information on results, further questions.
Here are some possible questions, probes, and comments that fit those specifications.

Sample questions for the four stages of an oral interview

1. Warm-up:
How are you?
What's your name?
What country are you from? What [city, town]?
Let me tell you about this interview.

2. Level check:
How long have you been in this [country, city]?
Tell me about your family.
What is your [academic major, professional interest, job]?
How long have you been working at your [degree, job]?
Describe your home [city, town] to me.
How do you like your home [city, town]?
What are your hobbies or interests? (What do you do in your spare time?)
Why do you like your [hobby, interest]?
Have you traveled to another country besides this one and your home country?
Tell me about that country.
Compare your home [city, town] to another [city, town].
What is your favorite food?
Tell me how to [make, do] something you know well.
What will you be doing ten years from now?
I'd like you to ask me some questions.
Tell me about an exciting or interesting experience you've had.
Read the following paragraph, please. [test-taker reads aloud]
Pretend that you are ________ and I am a ________. [guided role play follows]

3. Probe:
What are your goals for learning English in this program?
Describe your [academic field, job] to me. What do you like and dislike about it?
What is your opinion of [a recent headline news event]?
Describe someone you greatly respect, and tell me why you respect that person.
If you could redo your education all over again, what would you do differently?
How do eating habits and customs reflect the culture of the people of a country?
If you were [president, prime minister] of your country, what would you like to change about your country?
What career advice would you give to your younger friends?
Imagine you are writing an article on a topic you don't know very much about. Ask me some questions about that topic.
You are in a shop that sells expensive glassware. Accidentally you knock over an expensive vase, and it breaks. What will you say to the store owner? [Interviewer role-plays the store owner]

4. Wind-down:
Did you feel okay about this interview?
What are your plans for [the weekend, the rest of today, the future]?
You'll get your results from this interview [tomorrow, next week].
Do you have any questions you want to ask me?
It was interesting to talk with you. Best wishes.
The success of an oral interview will depend on
· Clearly specifying administrative procedures of the assessment (practicality),
· Focusing the questions and probes on the purpose of the assessment (validity),
· Appropriately eliciting an optimal amount and quality of oral production from the test-taker (biased for best performance), and
· Creating a consistent, workable scoring system (reliability).
The last issue is the thorniest. In oral production tasks that are open-ended and that involve a significant level of interaction, the interviewer is forced to make judgments that are susceptible to some unreliability. Through experience, training, and careful attention to the linguistic criteria being assessed, the ability to make such judgments accurately will be acquired. In Table 7.2, a set of descriptions is given for scoring open-ended oral interviews. These descriptions come from an earlier version of the Oral Proficiency Interview and are useful for classroom purposes.
The test administrator's challenge is to assign a score, ranging from 1 to 5, for each of the six categories indicated above. It may look easy to do, but in reality the line of distinction between levels is quite difficult to pinpoint. Some training, or at least a good deal of interviewing experience, is required to make accurate assessments of oral production in the six categories. Usually the six scores are then amalgamated into one holistic score, a process that should not be relegated to a simple mathematical average if you wish to put more weight on some categories than you do on others; a weighted-average sketch follows.
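A weighted average is one straightforward way to amalgamate the six category scores while privileging some categories over others. The sketch below is illustrative: the category names follow the FSI-style rubric discussed here, but the weights are invented and would need to reflect your own priorities.

    # Illustrative weights (they sum to 1.0); adjust to your own priorities.
    WEIGHTS = {
        "grammar": 0.20, "vocabulary": 0.20, "comprehension": 0.15,
        "fluency": 0.15, "pronunciation": 0.15, "task": 0.15,
    }

    def holistic_score(category_scores: dict) -> float:
        """Weighted average of 1-5 category scores."""
        return sum(WEIGHTS[c] * s for c, s in category_scores.items())

    print(holistic_score({"grammar": 3, "vocabulary": 4, "comprehension": 3,
                          "fluency": 2, "pronunciation": 3, "task": 4}))  # -> 3.2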
This five-point scale, once known as the "FSI levels" (because they were first advocated by the Foreign Service Institute in Washington, D.C.), is still in popular use among U.S. government foreign service staff for designating proficiency in a foreign language. To complicate the scoring somewhat, the five-point holistic scoring categories have historically been subdivided into "pluses" and "minuses," as indicated in Table 7.3. To this day, even though the official nomenclature has now changed (see the OPI description below), in-group conversations refer to colleagues and co-workers by their FSI level: "Oh, Bob, yeah, he's a good 3+ in Turkish; he can easily handle that assignment."
A variation on the usual one-on-one format with one interviewer and one test-taker is to place two test-takers at a time with the interviewer. An advantage of a two-on-one interview is the practicality of scheduling twice as many candidates in the same time frame, but more significant is the opportunity for student-student interaction. By deftly posing questions, problems, and role plays, the interviewer can maximize the output of the test-takers while lessening the need for his or her own output. A further benefit is the probable increase in authenticity when two test-takers can actually converse with each other. Disadvantages are equalizing the output between the two test-takers, discerning the interaction effect of unequal comprehension and production abilities, and scoring two people simultaneously.
2. Role Play
Role playing is a popular pedagogical activity in communicative language-teaching classes. As an assessment device, role play opens some windows of opportunity for test-takers to use discourse that might otherwise be difficult to elicit. How is this done? The test administrator must determine the assessment objectives of the role play, then devise a scoring technique that appropriately pinpoints those objectives.
3. Discussions and Conversations
As formal assessment devices, discussions and conversations are difficult to specify and score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessments may not provide. Discussions may be especially appropriate tasks through which to elicit and observe such abilities as:
- Topic nomination, maintenance, and termination;
- Attention getting, interrupting, floor holding, control;
- Clarifying, questioning, paraphrasing;
- Comprehension signals (nodding, "uh-huh," "hmm," etc.);
- Negotiating meaning;
- Intonation patterns for pragmatic effect;
- Kinesics, eye contact, proxemics, body language; and
- Politeness, formality, and other sociolinguistic factors.
Assessing the performance of participants through scores or checklists should be carefully designed to suit the objectives of the observed discussion. Of course, discussion is an integrative task, so it is also advisable to give some cognizance to comprehension performance in evaluating learners.
4.Games
Among informal assesment devices are
a variety of games that directly involve language production. Consider the
following types:
- Tinkertoy
game
- Crossword
puzzles
- Information
gap grids
- City maps.
As assessments, the key is to specify a set of criteria and a reasonably practical and reliable scoring method. The benefit of such an informal assessment may not be as much in a summative evaluation as in its formative nature, with washback for the students.
D. ORAL PROFICIENCY INTERVIEW (OPI)
The best-known oral interview format is one that has gone through a considerable metamorphosis over the last half-century: the Oral Proficiency Interview. Originally known as the Foreign Service Institute (FSI) test, the OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on the Teaching of Foreign Languages (ACTFL). The OPI is widely used across dozens of languages around the world.
In a series of structured tasks, the OPI is carefully designed to elicit pronunciation, fluency and integrative abilities, sociolinguistic and cultural knowledge, grammar, and vocabulary. Performance is judged by the examiner to be at one of ten possible levels on the ACTFL-designated proficiency guidelines for speaking: Superior; Advanced-high, mid, low; Intermediate-high, mid, low; Novice-high, mid, low.
Bachman (1988, p. 149) pointed out that the validity of the OPI simply cannot be demonstrated "because it confounds abilities with elicitation procedures in its design, and it provides only a single rating, which has no basis in either theory or research."
Meanwhile, a great deal of experimentation continues to be conducted to design better oral proficiency testing methods (Bailey, 1998; Young & He, 1998). With ongoing critical attention to issues of language assessment in the years to come, we may be able to solve some of the thorny problems of how best to elicit oral production in authentic contexts and to create valid and reliable scoring methods.
E. DESIGNING ASSESSMENT TASKS: EXTENSIVE SPEAKING
1. Oral Presentations
For oral presentations, a checklist or grid is a common means of scoring or evaluation. Holistic scores are tempting to use for their apparent practicality, but they may obscure the variability of performance across several subcategories, especially the two major components of content and delivery.
A checklist is reasonably practical. Its reliability can vary if clear standards for scoring are not maintained. Its authenticity can be supported in that all of the items on the list contribute to an effective presentation. The washback of a checklist will be enhanced by written comments from the teacher, a conference with the teacher, peer evaluations using the same form, and self-assessment.
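A checklist of this kind reduces naturally to a list of observable items plus a tally. The sketch below is a minimal illustration; the items are invented placeholders, and a real checklist would mirror your own objectives for content and delivery.

    # Hypothetical presentation checklist; replace items with your objectives.
    CHECKLIST = [
        "Stated the purpose of the talk clearly",
        "Organized points logically",
        "Used visuals or examples effectively",
        "Spoke with comprehensible pronunciation",
        "Maintained eye contact and appropriate pace",
    ]

    def checklist_report(checked: set) -> str:
        """Render a checked/unchecked report with a total at the end."""
        lines = [f"[{'x' if item in checked else ' '}] {item}" for item in CHECKLIST]
        return "\n".join(lines + [f"Total: {len(checked)}/{len(CHECKLIST)}"])

    print(checklist_report({"Organized points logically",
                            "Spoke with comprehensible pronunciation"}))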
2. Picture-Cued Story-Telling
One of the most common techniques for eliciting oral production is through visual stimuli: pictures, photographs, diagrams, and charts. At this level we consider a picture or a series of pictures as a stimulus for a longer story or description.
It's always tempting to throw any picture sequence at test-takers and have them talk for a minute or more about the pictures. But as is true of every assessment of speaking ability, the objective of eliciting narrative discourse needs to be clear. Your criteria for scoring also need to be clear about what you are hoping to assess. Refer back to some of the guidelines suggested under the section on oral interviews or the OPI for some general suggestions on scoring.
3. Retelling a Story or News Event
In this type of task, test-takers hear or read a story or news event that they are asked to retell. This differs from the paraphrasing task in that it involves a longer stretch of discourse and a different genre. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequence and relationships of events, stress and emphasis patterns, "expression" in the case of a dramatic story), fluency, and interaction with the hearer. Scoring should of course meet the intended criteria.
4.Translation (of Extended Prose)
The advantage of translating longer texts is in the control of the content, vocabulary, and, to some extent, the grammatical and discourse features. The disadvantage is that translation of longer texts is a highly specialized skill for which some individuals obtain post-baccalaureate degrees. To judge a nonspecialist’s oral language ability on such a skill may be completely invalid, especially if the test-takers have not engaged in translation at this level. Criteria for scoring should therefore take into account not only the purpose in stimulating a translation but also the possibility of errors that are unrelated to oral production ability.
Chapter 3
ASSESSING WRITING
Not many centuries ago, writing was a skill that was the exclusive domain of scribes and scholars in educational or religious institutions. Almost every aspect of everyday life for “common” people was carried out orally. Business transactions, records, legal documents, political and military agreements – all were written by specialists whose vocation it was to render language into the written word. Today, the ability to write has become an indispensable skill in our global literate community. Writing skill, at least at rudimentary levels, is a necessary condition for achieving employment in many walks of life and is simply taken for granted in literate cultures.
In the field of second language teaching, only a half-century ago experts were saying that writing was primarily a convention for recording speech and for reinforcing grammatical and lexical features of language. Now we understand the uniqueness of writing as a skill with its own features and conventions. We also fully understand the difficulty of learning to write “well” in any language, even in our own native language. Every educated child in developed countries learns the rudiments of writing in his or her native language, but very few learn to express themselves clearly with logical, well-developed organization that accomplishes an intended purpose. And yet we expect second language learners to write coherent essays with artfully chosen rhetorical and discourse devices!
With such a monumental goal, the job of teaching writing has occupied the attention of papers, articles, dissertations, books, and even separate professional journals exclusively devoted to writing in a second language. It follows logically that the assessment of writing is no simple task. As you consider assessing students’ writing ability, as usual you need to be clear about your objective or criterion. What is it you want to test: handwriting ability? Correct spelling? Writing sentences that are grammatically correct? Paragraph construction? Logical development of a main idea? All of these, and more, are possible objectives. And each objective can be assessed through a variety of tasks, which we will examine in this chapter.
Before looking at specific tasks, we must scrutinize the different genres of written language (so that context and purpose are clear), types of writing (so that stages of the development of writing ability are accounted for), and micro- and macroskills of writing (so that objectives can be pinpointed precisely).
A.GENRES OF WRITTEN LANGUAGE
The earlier discussion of assessing reading listed more than 50 written language genres. The same classification scheme is reformulated here to include the most common genres that a second language writer might produce, within and beyond the requirements of a curriculum. Even though this list is slightly shorter, you should be aware of the surprising multiplicity of options of written genres that second language learners need to acquire.
1. Academic writing
- Papers and general subject reports
- Essays, compositions
- Academically focused journals
- Short-answer test responses
- Technical reports (e.g., lab reports)
- Theses, dissertations
2. Job-related writing
- Messages (e.g., phone messages)
- Letters/emails
- Memos (e.g., interoffice)
- Reports (e.g., job evaluations, project reports)
- Schedules, labels, signs
- Advertisements, announcements
- Manuals
3. Personal writing
- Letters, emails, greeting cards, invitations
- Messages, notes
- Calendar entries, shopping lists, reminders
- Financial documents (e.g., checks, tax forms, loan applications)
- Forms, questionnaires, medical reports, immigration documents
- Diaries, personal journals
- Fiction (e.g., short stories, poetry)
Genres of writing
B.TYPES OF WRITING PERFORMANCE
Four categories of written performance that capture the range of written production are considered here. Each category resembles the categories defined for the other three skills, but these categories, as always, reflect the uniqueness of the skill area.
- Imitative. To produce written language, the learner must attain skills in the fundamental, basic tasks of writing letters, words, punctuation, and very brief sentences. This category includes the ability to spell correctly and to perceive phoneme-grapheme correspondences in the English spelling system. It is a level at which learners are trying to master the mechanics of writing. At this stage, form is the primary if not exclusive focus, while context and meaning are of secondary concern.
- Intensive (controlled). Beyond the fundamentals of imitative writing are skills in producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to the length of a sentence. Meaning and context are of some importance in determining correctness and appropriateness, but most assessment tasks are more concerned with a focus on form and are rather strictly controlled by the test design.
- Responsive. Here, assessment tasks require learners to perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs. Tasks respond to pedagogical directives, lists of criteria, outlines, and other guidelines. Genres of writing include brief narratives and descriptions, short reports, lab reports, summaries, brief responses to reading, and interpretations of charts and graphs. Under specified conditions, the writer begins to exercise some freedom of choice among alternative forms of expression of ideas. The writer who has mastered the fundamentals of sentence-level grammar is more focused on the discourse conventions that will achieve the objectives of the written text. Form-focused attention is mostly at the discourse level, with a strong emphasis on context and meaning.
- Extensive. Extensive writing implies successful management of all the processes and strategies of writing for all purposes, up to the length of an essay, a term paper, a major research project report, or even a thesis. Writers focus on achieving a purpose, organizing and developing ideas logically, using details to support or illustrate ideas, demonstrating syntactic and lexical variety, and in many cases, engaging in the process of multiple drafts to achieve a final product. Focus on grammatical form is limited to occasional editing or proofreading of a draft.
C.MICRO- AND MACROSKILLS OF WRITING
We turn once again to a taxonomy of micro- and macroskills that will assist you in defining the ultimate criterion of an assessment procedure. The earlier microskills apply more appropriately to imitative and intensive types of writing tasks, while the macroskills are essential for the successful mastery of responsive and extensive writing.
Microskills
1. Produce graphemes and orthographic patterns of English.
2. Produce writing at an efficient rate of speed to suit the purpose.
3. Produce an acceptable core of words and use appropriate word order patterns.
4. Use acceptable grammatical systems (e.g., tense, agreement, pluralization), patterns, and rules.
5. Express a particular meaning in different grammatical forms.
6. Use cohesive devices in written discourse.
Macroskills
7. Use the rhetorical forms and conventions of written discourse.
8. Appropriately accomplish the communicative functions of written texts according to form and purpose.
9. Convey links and connections between events, and communicate such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
10. Distinguish between literal and implied meanings when writing.
11. Correctly convey culturally specific references in the context of the written text.
12. Develop and use a battery of writing strategies, such as accurately assessing the audience’s interpretation, using prewriting devices, writing with fluency in the first drafts, using paraphrases and synonyms, soliciting peer and instructor feedback, and using feedback for revising and editing.
Micro- and macroskills of writing
D.DESIGNING ASSESSMENT TASKS: IMITATIVE WRITING
With the recent worldwide emphasis on teaching English at young ages, it is tempting to assume that every English learner knows how to handwrite the Roman alphabet. Such is not the case. Many beginning-level English learners, from young children to older adults, need basic training in and assessment of imitative writing: the rudiments of forming letters, words, and simple sentences. We examine this level of writing first.
1.Tasks in [Hand] Writing Letters, Words, and Punctuation
First, a comment should be made on the increasing use of personal and laptop computers and handheld instruments for creating written symbols. Handwriting has the potential of becoming a lost art, as even very young children are more and more likely to use a keyboard to produce writing. Making the shapes of letters and other symbols is now more a question of learning typing skills than of training the muscles of the hands to use a pen or pencil. Nevertheless, for all practical purposes, handwriting remains a skill of paramount importance within the larger domain of language assessment.
A limited variety of task types are commonly used to assess a person’s ability to produce written letters and symbols. A few of the more common types are described here.
- Copying. There is nothing innovative or modern about directing a test-taker to copy letters or words. The test-taker will see something like the following:
- Listening cloze selection tasks. These tasks combine dictation with a written script that has a relatively frequent deletion ratio (every fourth or fifth word, perhaps). The test sheet provides a list of missing words from which the test-taker must select. The purpose at this stage is not to test spelling but to give practice in writing. To increase the difficulty, the list of words can be deleted, but then spelling might become an obstacle. Probes look like this:
- Picture-cued tasks. Familiar pictures are displayed, and test-takers are told to write the word that the picture represents. Assuming no ambiguity in identifying the picture (cat, hat, chair, table, etc.), no reliance is made on aural comprehension for successful completion of the task.
- Form completion tasks. A variation on pictures is the use of a simple form (registration, application, etc.) that asks for name, address, phone number, and other data. Assuming, of course, that prior classroom instruction has focused on filling out such forms, this task becomes an appropriate assessment of simple tasks such as writing one’s name and address.
- Converting numbers and abbreviations to words. Some tests have a section on which numbers are written – for example, hours of the day, dates, or schedules – and test-takers are directed to write out the numbers. This task can serve as a reasonably reliable method to stimulate handwritten English. It lacks authenticity, however, in that people rarely write out such numbers (except in writing checks), and it is more of a reading task (recognizing numbers) than a writing task. If you plan to use such a method, be sure to specify exactly what the criterion is, and then proceed with some caution. Converting abbreviations to words is more authentic: we actually do have occasions to write out days of the week, months, and words like street, boulevard, telephone, and April (months, of course, are often abbreviated with numbers). Test tasks may take this form:
2.Spelling Tasks and Detecting Phoneme-Grapheme Correspondences
A number of task types are in popular use to assess the ability to spell words correctly and to process phoneme-grapheme correspondences.
- Spelling tests. In a traditional, old-fashioned spelling test, the teacher dictates a simple list of words, one word at a time, followed by the word in a sentence, repeated again, with a pause for test-takers to write the word. Scoring emphasizes correct spelling. You can help to control for listening errors by choosing words that the students have encountered before – words that they have spoken or heard in their class.
- Picture-cued tasks. Pictures are displayed with the objective of focusing on familiar words whose spelling may be unpredictable. Items are chosen according to the objectives of the assessment, but this format is an opportunity to present some challenging words and word pairs: boot/book, read/reed, bit/bite, etc.
- Multiple-choice techniques. Presenting words and phrases in the form of a multiple-choice task risks crossing over into the domain of assessing reading, but if the items have a follow-up writing component, they can serve as formative reinforcement of spelling conventions. They might be more challenging with the addition of homonyms.
- Matching phonetic symbols. If students have become familiar with the phonetic alphabet, they could be shown phonetic symbols and asked to write the correctly spelled word alphabetically. This works best with letters that do not have a one-to-one correspondence with the phonetic symbol. In the sample below, the answers, which of course do not appear on the test sheet, are included in brackets for your reference.
Such a task risks confusing students who don’t recognize the phonetic alphabet or use it in their daily routine. Opinion is mixed on the value of using phonetic symbols at the literacy level. Some claim it helps students to perceive the relationship between phonemes and graphemes. Others caution against using yet another system of symbols when the alphabet already poses a challenge, especially for adults for whom English is the only language they have learned to read or write.
E.DESIGNING ASSESSMENT TASKS: INTENSIVE (CONTROLLED) WRITING
This next level of writing is what second language teacher training manuals have for decades called controlled writing. It may also be thought of as form-focused writing, grammar writing, or simply guided writing. A good deal of writing at this level is display writing as opposed to real writing: students produce language to display their competence in grammar, vocabulary, or sentence formation, and not necessarily to convey meaning for an authentic purpose. The traditional grammar/vocabulary test has plenty of display writing in it, since the response mode demonstrates only the test-taker’s ability to combine or use words correctly. No new information is passed on from one person to the other.
1.Dictation and Dicto-Comp
Dictation was described as an assessment of the integration of listening and writing, but it was clear that the primary skill being assessed is listening. Because of its response mode, however, it deserves a second mention in this chapter. Dictation is simply the rendition in writing of what one hears aurally, so it could be classified as an imitative type of writing, especially since a proportion of the test-taker’s performance centers on correct spelling. Also, because the test-taker must listen to stretches of discourse and in the process insert punctuation, dictation of a paragraph or more can arguably be classified as a controlled or intensive form of writing. A form of controlled writing related to dictation is the dicto-comp. Here, a paragraph is read at normal speed, usually two or three times; then the teacher asks students to rewrite the paragraph from the best of their recollection. In one of several variations of the dicto-comp technique, the teacher, after reading the passage, distributes a handout with key words from the paragraph, in sequence, as cues for the students. In either case, the dicto-comp is genuinely classified as an intensive, if not a responsive, writing task. Test-takers must internalize the content of the passage, remember a few phrases and lexical items as key words, then recreate the story in their own words.
2.Grammatical Transformation Tasks
In the days of structural paradigms of language teaching, with slot-filler techniques and slot substitution drills, the practice of making grammatical transformations – orally or in writing – was very popular. To this day, language teachers have used this technique as an assessment task, ostensibly to measure grammatical competence. Numerous versions of the task are possible:
· Change the tenses in a paragraph.
· Change full forms of verbs to reduced forms (contractions).
· Change statements to yes/no or wh-questions.
· Change questions into statements.
· Combine two sentences into one using a relative pronoun.
· Change direct speech to indirect speech.
· Change from active to passive voice.
The list of possibilities is almost endless. The tasks are virtually devoid of any meaningful value. Sometimes test designers attempt to add authenticity by providing a context (“Today Doug is doing all these things. Tomorrow he will do the same things again. Write about what Doug will do tomorrow by using the future tense”), but this is just a backdrop for a written substitution task. On the positive side, grammatical transformation tasks are easy to administer and are therefore practical, quite high in scorer reliability, and arguably tap into a knowledge of grammatical forms that will be performed through writing. If you are interested only in a person’s ability to produce the forms, then such tasks may prove to be justifiable.
3.Picture-Cued Tasks
A variety of picture-cued controlled tasks have been used in English classrooms around the world. The main advantage of this technique is in detaching the almost ubiquitous reading-and-writing connection and offering instead a nonverbal means to stimulate written responses.
- Short sentences. A drawing of some simple action is shown; the test-taker writes a brief sentence.
- Picture description. A somewhat more complex picture may be presented showing, say, a person reading on a couch, a cat under a table, books and pencils on the table, chairs around the table, a lamp next to the couch, and a picture on the wall over the couch. Test-takers are asked to describe the picture using four of the following prepositions: on, over, under, next to, around. As long as the prepositions are used appropriately, the criterion is considered to be met.
- Picture sequence description. A sequence of three to six pictures depicting a story line can provide a suitable stimulus for written production. The pictures must be simple and unambiguous because an open-ended task at the selective level would give test-takers too many options. If writing the correct grammatical form of a verb is the only criterion, then some test items might include the simple form of the verb below the picture. The time sequence in the following task is intended to give the writer some cues.
When these kinds of tasks are designed to be controlled, even at this very simple level, a few different correct responses can be made for each item in the sequence. If your criteria in this task are both lexical and grammatical choice, then you need to design a rating scale to account for variations between completely right and completely wrong in both categories.
Scoring scale for controlled writing
2 = grammatically and lexically correct
1 = either grammar or vocabulary is incorrect, but not both
0 = both grammar and vocabulary are incorrect
4.Vocabulary Assessment Tasks
Most vocabulary study is carried out through reading. A number of assessments of reading recognition of vocabulary were discussed in the previous chapter: multiple-choice techniques, matching, picture-cued identification, cloze techniques, guessing the meaning of a word in context, etc. The major techniques used to assess vocabulary are (a) defining and (b) using a word in a sentence. The latter is the more authentic, but even that task is constrained by a contrived situation in which the test-taker, usually in a matter of seconds, has to come up with an appropriate sentence, which may or may not indicate that the test-taker “knows” the word.
Read (2000) suggested several types of items for assessment of basic knowledge of the meaning of a word, collocational possibilities, and derived morphological forms. His example centered on the word interpret, as follows:
Test-takers read:
1. Write two sentences, A and B. In each sentence, use the two words given.
A. interpret, experiment
B. interpret, language
2. Write three words that can fit in the blank.
To interpret a(n) i. ______ ii. ______ iii. ______
3. Write the correct ending for the word in each of the following sentences:
Someone who interprets is an interpret______.
Something that can be interpreted is interpret______.
Someone who interprets gives an interpret______.
Vocabulary writing tasks (Read, 2000, p. 179)
Vocabulary assessment is clearly form-focused in the above tasks, but the procedures are creatively linked by means of the target word, its collocations, and its morphological variants. At the responsive and extensive levels, where learners are called upon to create coherent paragraphs, performance obviously becomes more authentic, and lexical choice is one of several possible components of the evaluation of extensive writing.
5.Ordering Tasks
One task at the sentence level may appeal to those who are fond of word games and puzzles: ordering (or reordering) a scrambled set of words into a correct sentence. Here is the way the item format appears.
Test-takers read:
Put the words below into the correct order to make a sentence:
1. cold / winter / is / weather / the / in / the
2. studying / what / you / are
3. next / clock / the / the / is / picture / to
Test-takers write:
1. The weather is cold in the winter.
2. What are you studying?
3. The clock is next to the picture.
Reordering words in a sentence
While this somewhat inauthentic task generates writing performance and may be said to tap into grammatical word-ordering rules, it presents a challenge to test-takers whose learning styles do not dispose them to logical-mathematical problem solving. If sentences are kept very simple (such as #2), with perhaps no more than four or five words, if only one possible sentence can emerge, and if students have practiced the technique in class, then some justification emerges. But once again, as with many writing techniques, this task involves as much, if not more, reading performance as writing.
6.Short-Answer and Sentence Completion Tasks
Some types of short-answer tasks were discussed in Chapter 8 because of the heavy participation of reading performance in their completion. Such items range from very simple and predictable to somewhat more elaborate responses. Look at the range of possibilities.
Test-takers see:
1. Alicia: Who’s that?
Tony: Gina.
Alicia: Where’s she from?
Tony: Italy.
2. Jennifer: ______?
Kathy: I’m studying English.
3. Restate the following sentences in your own words, using the underlined word. You may need to change the meaning of the sentence a little.
3a. I never miss a day of school. (always)
3b. I’m pretty healthy most of the time. (seldom)
3c. I play tennis twice a week. (sometimes)
4. You are in the kitchen helping your roommate cook. You need to ask questions about quantities. Ask a question using how much (#4a) and a question using how many (#4b), using nouns like sugar, pounds, flour, onions, eggs, cups.
4a. ______
4b. ______
5. Look at the schedule of Roberto’s week. Write two sentences describing what Roberto does, using the words before (#5a) and after (#5b).
5a. ______
5b. ______
6. Write three sentences describing your preferences: #6a: a big, expensive car or a small, cheap car; #6b: a house in the country or an apartment in the city; #6c: money or good health.
6a. ______
6b. ______
6c. ______
Limited-response writing tasks
The reading-writing connection is apparent in the first three item types but has less of an effect in the last three, where reading is necessary in order to understand the directions but is not crucial in creating sentences. Scoring on a 2-1-0 scale (as described above) may be the most appropriate way to avoid debates about the appropriateness of a response.
F.ISSUES IN ASSESSING RESPONSIVE AND EXTENSIVE WRITING
Responsive writing creates the opportunity for test-takers to offer an array of possible creative responses within a pedagogical or assessment framework: test-takers are “responding” to a prompt or assignment. Freed from the strict control of intensive writing, learners can exercise a number of options in choosing vocabulary, grammar, and discourse, but with some constraints and conditions. Criteria now begin to include the discourse and rhetorical conventions of paragraph structure and of connecting two or three such paragraphs in texts of limited length. The learner is responsible for accomplishing a purpose in writing, for developing a sequence of connected ideas, and for empathizing with an audience.
The genres of text that are typically addressed here are
- Short reports (with structured formats and conventions);
- Responses to the reading of an article or story;
- Summaries of articles or stories;
- Brief narratives or descriptions; and
- Interpretations of graphs, tables, and charts.
It is here that writers become involved in the art (and science) of composing, or real writing, as opposed to display writing.
Extensive, or “free,” writing, which is amalgamated into our discussion here, takes all the principles and guidelines of responsive writing and puts them into practice in longer texts such as full-length essays, term papers, project reports, and theses and dissertations. In extensive writing, however, the writer is given even more freedom to choose: topics, length, style, and perhaps even conventions of formatting are less constrained than in the typical responsive writing exercise. At this stage, all the rules of effective writing come into play, and the second language writer is expected to meet all the standards applied to native language writers.
Both responsive and extensive writing tasks are the subject of some classic, widely debated assessment issues that take on a distinctly different flavor from those at the lower-end production of writing.
- Authenticity. Authenticity is a trait that is given special attention: if test-takers are being asked to perform a task, its face and content validity need to be assured in order to bring out the best in the writer. A good deal of writing performance in academic contexts is constrained by the pedagogical necessities of establishing the basic building blocks of writing; we have looked at assessment techniques that address those foundations. But once those fundamentals are in place, the would-be writer is ready to fly out of the protective nest of the writing classroom and assume his or her own voice. Offering that freedom to learners requires the setting of authentic real-world contexts in which to write. The teacher becomes less of an instructor and more of a coach or facilitator. Assessment therefore is typically formative, not summative, and positive washback is more important than practicality and reliability.
- Scoring. Scoring is the thorniest issue at these final two stages of writing. With so many options available to learners, each evaluation by a test administrator needs to be finely attuned not just to how the writer strings words together (the form) but also to what the writer is saying (the function of the text). The quality of writing (its impact and effectiveness) becomes as important as, if not more important than, all the nuts and bolts that hold it together. How are you to score such creative production, some of which is more artistic than scientific? A discussion of different scoring options continues below, followed by a reminder that responding and editing are nonscoring options that yield washback to the writer.
- Time. Yet another assessment issue surrounds the unique nature of writing: it is the only skill in which the language producer is not necessarily constrained by time, which implies the freedom to process multiple drafts before the text becomes a finished product. Like a sculptor creating an image, the writer can take an initial rough conception of a text and continue to refine it until it is deemed presentable to the public eye. Virtually all real writing of prose texts presupposes an extended time period for it to reach its final form, and therefore the revising and editing processes are implied. Responsive writing, along with the next category of extensive writing, often relies on this essential drafting process for its ultimate success.
How do you assess writing ability within the confines of traditional, formal assessment procedures that are almost always, by logistical necessity, timed? We have a whole testing industry that has based large-scale assessment of writing on the premise that the timed impromptu format is a valid method of assessing writing ability. Is this an authentic format? Can a language learner – or a native speaker, for that matter – adequately perform writing tasks within the confines of a brief timed period of composition? Is that hastily written product an appropriate reflection of what that same test-taker might produce after several drafts of the same work? Does this format favor fast writers at the expense of slower but possibly equally good or better writers? Alderson (2000) and Weigle (2002) both cited this as one of the most pressing unresolved issues in the assessment of writing today. We will return to this question below.
Because of the complexity of assessing responsive and extensive writing, the discussion that ensues will have a different look from the one used in the previous three chapters. Four major topics will be addressed: (1) a few fundamental task types at the lower (responsive) end of the continuum of writing at this level; (2) a description and analysis of the Test of Written English® as a typical timed impromptu test of writing; (3) a survey of methods of scoring and evaluating writing production; and (4) a discussion of the assessment qualities of editing and responding to a series of writing drafts.
G.DESIGNING ASSESSMENT TASKS: RESPONSIVE AND EXTENSIVE WRITING
In this section we consider both responsive and extensive writing tasks. They will be regarded here as a continuum of possibilities ranging from lower-end tasks whose complexity exceeds those in the previous category of intensive or controlled writing, through more open-ended tasks such as writing short reports, essays, summaries, and responses, up to texts of several pages or more.
1.Paraphrasing
One of the more difficult concepts for second language learners to grasp is paraphrasing. The initial step in teaching paraphrasing is to ensure that learners understand its purposes: to say something in one’s own words, to avoid plagiarizing, to offer some variety of expression. With those possible motivations and purposes in mind, the test designer needs to elicit a paraphrase of a sentence or paragraph, usually not more.
Scoring of the test-taker’s response is a judgment call in which the criterion of conveying the same or a similar message is primary, with secondary evaluations of discourse, grammar, and vocabulary. Other components of analytic or holistic scales might be considered as criteria for an evaluation. Paraphrasing is more often a part of informal and formative assessment than of formal, summative assessment, and therefore student responses should be viewed as opportunities for teachers and students to gain positive washback on the art of paraphrasing.
2.Guided Question and Answer
Another lower-order task in this type of writing, which has the pedagogical benefit of guiding a learner without dictating the form of the output, is a guided question-and-answer format in which the test administrator poses a series of questions that essentially serve as an outline of the emergent written text. In the writing of a narrative that the teacher has already covered in a class discussion, the following kinds of questions might be posed to stimulate a sequence of sentences.
Guided writing stimuli
1. Where did this story take place? [setting]
2. Who were the people in the story? [characters]
3. What happened first? And then? [sequence of events]
4. Why did ______ do ______? [reasons, causes]
5. What did ______ think about ______? [opinion]
6. What happened at the end? [climax]
7. What is the moral of this story? [evaluation]
Guided writing texts, which may be as long as two or three paragraphs, may be scored on either an analytic or a holistic scale (discussed below). Guided writing prompts like these are less likely to appear on a formal test and more likely to serve as a way to prompt initial drafts of writing. This first draft can then undergo the editing and revising stages discussed in the next section of this chapter. A variation on using guided questions is to prompt the test-taker to write from an outline. The outline may be self-created from earlier reading and/or discussion, or, which is less desirable, be provided by the teacher or test administrator. The outline helps to guide the learner through a presumably logical development of ideas that have been given some forethought. Assessment of the resulting text follows the same criteria listed below (#3 in the next section, paragraph construction tasks).
3.Paragraph Construction Tasks
The participation of reading performance is inevitable in writing effective paragraphs. To a great extent, writing is the art of emulating what one reads. You read an effective paragraph; you analyze the ingredients of its success; you emulate it. Assessment of paragraph development takes on a number of different forms:
- Topic sentence writing. There is no cardinal rule that says every paragraph must have a topic sentence, but the stating of a topic through the lead sentence (or a subsequent one) has remained a tried-and-true technique for teaching the concept of a paragraph. Assessment thereof consists of
· specifying the writing of a topic sentence,
· scoring points for its presence or absence, and
· scoring and/or commenting on its effectiveness in stating the topic.
- Topic development within a paragraph. Because paragraphs are intended to provide a reader with “clusters” of meaningful, connected thoughts or ideas, another stage of assessment is the development of an idea within a paragraph. Four criteria are commonly applied to assess the quality of a paragraph:
· the clarity of expression of ideas,
· the logic of the sequence and connections,
· the cohesiveness or unity of the paragraph, and
· the overall effectiveness or impact of the paragraph as a whole.
- Development of main and supporting ideas across paragraphs. As writers string two or more paragraphs together in a longer text (and as we move up the continuum from responsive to extensive writing), the writer attempts to articulate a thesis or main idea with clearly stated supporting ideas. These elements can be considered in evaluating a multi-paragraph essay:
· addressing the topic, main idea, or principal purpose,
· organizing and developing supporting ideas,
· using appropriate details to undergird supporting ideas,
· showing facility and fluency in the use of language, and
· demonstrating syntactic variety.
4.Strategic Options
Developing main and supporting ideas is the goal for the writer attempting to create an effective text, whether a short one- to two-paragraph piece or an extensive one of several pages. A number of strategies are commonly taught to second language writers to accomplish their purposes. Aside from strategies of freewriting, outlining, drafting, and revising, writers need to be aware of the task that has been demanded and to focus on the genre of writing and the expectations of that genre.
1. Attending to task. In responsive writing, the context is seldom completely open-ended: a task has been defined by the teacher or test administrator, and the writer must fulfill the criterion of the task. Even in intensive writing of longer texts, a set of directives has been stated by the teacher or is implied by the conventions of the genre. Four types of tasks are commonly addressed in academic writing courses: compare/contrast, problem/solution, pros/cons, and cause/effect. Depending on the genre of the text, one or more of these task types will be needed to achieve the writer’s purpose. If students are asked, for example, to “agree or disagree with the author’s statement,” a likely strategy would be to cite pros and cons and then take a stand. A task that asks students to argue for one among several political candidates in an election might be an ideal compare-and-contrast context, with an appeal to problems present in the constituency and the relative value of the candidates’ solutions. Assessment of the fulfillment of such tasks could be formative and informal (comments in marginal notes, feedback in a conference in an editing/revising stage), but the product might also be assigned a holistic or analytic score.
2. Attending to genre. The genres of writing that were listed at the beginning of this chapter provide some sense of the many varieties of text that may be produced by a second language learner in a writing curriculum. Another way of looking at the strategic options open to a writer is the extent to which both the constraints and the opportunities of the genre are exploited. Assessment of any writing necessitates attention to the conventions of the genre in question. Assessment of the more common genres may include the following criteria, along with chosen factors from the list in item #3 (main and supporting ideas) above; a simple checklist-scoring sketch follows these lists:
Reports (Lab Reports, Project Summaries, Article/Book Reports, etc.)
· Conform to a conventional format (for this case, field)
· Convey the purpose, goal, or main idea
· Organize details logically and sequentially
· State conclusions or findings
· Use appropriate vocabulary and jargon for the specific case
Summaries of Readings/Lectures/Videos
· Effectively capture the main and supporting ideas of the original
· Maintain objectivity in reporting
· Use the writer’s own words for the most part
· Use quotations effectively when appropriate
· Omit irrelevant or marginal details
· Conform to an expected length
Responses to Readings/Lectures/Videos
· Accurately reflect the message or meaning of the original
· Appropriately select supporting ideas to respond to
· Express the writer’s own opinion
· Defend or support that opinion effectively
· Conform to an expected length
Narration, Description, Persuasion/Argument, and Exposition
· Follow expected conventions for each type of writing
· Convey purpose, goal, or main idea
· Use effective writing strategies
· Demonstrate syntactic variety and rhetorical fluency
Interpreting Statistical, Graphic, or Tabular Data
· Provide an effective global, overall description of the data
· Organize the details in clear, logical language
· Accurately convey details
· Appropriately articulate relationships among elements of the data
· Convey specialized or complex data comprehensibly to a lay reader
· Interpret beyond the data when appropriate
Library Research Paper
· State the purpose or goal of the research
· Include appropriate citations and references in correct format
· Accurately represent others’ research findings
· Inject the writer’s own interpretation, when appropriate, and justify it
· Include suggestions for further research
· Sum up findings in a conclusion
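Criteria lists like the ones above lend themselves to simple checklist scoring. Here is a minimal sketch; the criterion strings and function name are invented for illustration, and a real rubric would of course define each criterion operationally:

```python
# A sketch of checklist scoring against genre criteria (all names invented).

SUMMARY_CRITERIA = [
    "captures main and supporting ideas",
    "maintains objectivity",
    "uses writer's own words",
    "uses quotations appropriately",
    "omits irrelevant details",
    "conforms to expected length",
]

def checklist_score(criteria_met: set) -> str:
    """Report how many of the listed summary criteria were judged met."""
    met = criteria_met & set(SUMMARY_CRITERIA)
    return f"{len(met)}/{len(SUMMARY_CRITERIA)} criteria met"

print(checklist_score({"maintains objectivity", "omits irrelevant details"}))
# -> "2/6 criteria met"
```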
Chapter 4
A.TEST OF WRITTEN ENGLISH
One of a number of internationally available standardized tests of writing ability is the Test of Written English (TWE). Established in 1986, the TWE has gained a reputation as a well-respected measure of written English, and a number of research articles support its validity (Frase et al., 1999; Hale et al., 1996; Longford, 1996; Myford et al., 1996). In 1998, a computer-delivered version of the TWE was incorporated into the standard computer-based TOEFL and simply labeled as the “writing” section of the TOEFL. The TWE is still offered as a separate test, especially where only the paper-based TOEFL is available. Correlations between TWE and TOEFL scores (before the TWE became a standard part of the TOEFL) were consistently high, ranging from .57 to .69 over 10 test administrations from 1993 to 1995. Data on the TWE are provided at the end of this section.
The TWE is in the category of a timed impromptu test in that test-takers are under a 30-minute time limit and are not able to prepare ahead of time for the topic that will appear. Topics are prepared by a panel of experts following specifications for topics that represent commonly used discourse and thought patterns at the university level. Here are some sample topics published on the TWE website.
1. Some people say that the best preparation for life is learning to work with others and be cooperative. Others take the opposite view and say that learning to be competitive is the best preparation. Discuss these positions, using concrete examples of both. Tell which one you agree with and explain why.
2. Some people believe that automobiles are useful and necessary. Others believe that automobiles cause problems that affect our health and well-being. Which position do you support? Give specific reasons for your answer.
3. Do you agree or disagree with the following statement? Teachers should make learning enjoyable and fun for their students. Use reasons and specific examples to support your opinion.
Sample TWE® topics
Test preparation manuals such as Deborah Phillips’s Longman Introductory Course for the TOEFL Test (2001) advise TWE test-takers to follow six steps to maximize success on the test:
1. Carefully identify the topic.
2. Plan your supporting ideas.
3. In the introductory paragraph, restate the topic and state the organizational plan of the essay.
4. Write effective supporting paragraphs (show transitions, include a topic sentence, specify details).
5. Restate your position and summarize in the concluding paragraph.
6. Edit sentence structure and rhetorical expression.
The scoring guide for the TWE follows a widely accepted set of specifications for a holistic evaluation of an essay (see below for more discussion of holistic scoring). Each point on the scoring system is defined by a set of statements that address topic, organization and development, supporting ideas, facility (fluency, naturalness, appropriateness) in writing, and grammatical and lexical correctness and choice. Each essay is scored by two trained readers working independently. The final score assigned is the mean of the two independent ratings. The test-taker can achieve a score ranging from 1 to 6, with possible half-points (e.g., 4.5, 5.5) in between. In the case of a discrepancy of more than one point, a third reader resolves the difference. Discrepancy rates are extremely low, usually ranging from 1 to 2 percent per reading.
It is important to put tests like the TWE in perspective. Timed impromptu tests have obvious limitations if you are looking for an authentic sample of performance in a real-world context. How many times in real-world situations (other than in academic writing classes!) will you be asked to write an essay in 30 minutes? Probably never, but the TWE and other standardized timed tests are not intended to mirror the real world. Instead, they are intended to elicit a sample of writing performance that will be indicative of a person’s writing ability in the real world. TWE designers sought to validate a feasible timed task that would be manageable within their constraints and at the same time offer useful information about the test-taker.
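As a concrete illustration of the two-reader procedure described above, here is a minimal sketch in Python. The function name is hypothetical, and the adjudication step is an assumption: the description says only that a third reader resolves discrepancies of more than one point, without specifying how that resolution is computed.

```python
from typing import Optional

# A sketch of the TWE two-reader scoring procedure (names are hypothetical;
# the third-reader resolution rule below is an assumption, not ETS policy).

def twe_final_score(reader1: float, reader2: float,
                    third: Optional[float] = None) -> float:
    """Average two independent 1-6 ratings; flag discrepancies > 1 point."""
    if abs(reader1 - reader2) > 1.0:
        if third is None:
            raise ValueError("discrepancy over one point: third reading needed")
        # Assumed resolution: average the third rating with the closer
        # of the two original ratings.
        closer = min((reader1, reader2), key=lambda r: abs(r - third))
        return (closer + third) / 2
    return (reader1 + reader2) / 2

print(twe_final_score(4.0, 5.0))  # 4.5 -- half-point scores are possible
```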
How does the Educational Testing Service justify the TWE as such an indicator? Research by Hale et al. (1996) showed that the prompts used in the TWE approximate writing tasks assigned in 162 graduate and undergraduate courses across several disciplines in eight universities. Another study (Golub-Smith et al., 1993) ascertained the reliabilities across several types of prompts (e.g., compare/contrast vs. chart-graph interpretation). Both Myford et al. (1996) and Longford (1996) studied the reliabilities of judges’ ratings. The question of whether a mere 30-minute time period is sufficient to elicit an adequate sample of a test-taker’s writing was addressed by Hale (1992). Henning and Cascallar (1992) conducted a large-scale study to assess the extent to which TWE performance taps into the communicative competence of the test-taker. The upshot of this research – which is updated regularly – is that the TWE (which adheres to a high standard of excellence in standardized testing) is, within acceptable standard error ranges, a remarkably accurate indicator of writing ability.
The flip side of this controversial coin reminds us that standardized tests are indicators, not fail-safe, infallible measures of competence. Even though we might need TWE scores for the administrative purposes of admissions or placement, we should not rely on such tests for instructional purposes. No one would suggest that such 30-minute writing tests offer constructive feedback to the student, nor do they provide the kind of formative assessment that a process approach to writing brings. Tests like the TWE are administrative necessities in a world where hundreds or thousands of applicants must be evaluated by some means short of calculating their performance across years of instruction in academic writing. The convenience of the TWE should not lull administrators into believing that TWEs and TOEFLs and the like are the only measures that should be applied to students. It behooves admissions and placement officers worldwide to offer secondary measures of writing ability to those test-takers who
a. are on the threshold of a minimum score,
b. may be disabled by highly time-constrained or anxiety-producing situations,
c. could be culturally disadvantaged by a topic or situation, and/or
d. (in the case of computer-based writing) have had few opportunities to compose on a computer.
While timed impromptu tests suffer from a lack of authenticity and put test-takers into an artificially time-constrained context, they nevertheless offer interesting, relevant information for an important but narrow range of administrative purposes. The classroom offers a much wider set of options for creating real-world writing purposes and contexts. The classroom becomes the locus of extended hard work and effort for building the skills necessary to create written production. The classroom provides a setting for writers, in a process of multiple drafts and revisions, to create a final, publicly acceptable product. And the classroom is a place where learners can take all the small steps, at their own pace, toward becoming proficient writers. For your reference, following is some information on the TWE:
Producer: Educational Testing Service (ETS), Princeton, NJ
Objective: To test written expression
Primary market: Almost exclusively U.S. universities and colleges for admission purposes
Type: Computer-based, with the TOEFL. A traditional paper-based (PB) version is also available separately.
Response modes: Written essay
Specifications: (see above, in this section)
Time allocation: 30 minutes
Internet access: http://www.toefl.org/educator/edabttwe.html
Test of Written English (TWE®)
B.SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING
At the responsive and extensive levels of writing, three major approaches to scoring writing performance are commonly used by test designers: holistic, primary trait, and analytic. In the first method, a single score is assigned to an essay, which represents a reader’s general overall assessment. Primary trait scoring is a variation of the holistic method in that the achievement of the primary purpose, or trait, of an essay is the only factor rated. Analytic scoring breaks a test-taker’s written text down into a number of subcategories (organization, grammar, etc.) and gives a separate rating for each.
1.Holistic Scoring
The TWE scoring scale above is a prime example of holistic scoring. In Chapter 7, a rubric for scoring oral production holistically was presented. Each point on a holistic scale is given a systematic set of descriptors to arrive at a score. Descriptors usually (but not always) follow a prescribed pattern. For example, the first descriptor across all score categories may address the quality of task achievement, the second may deal with organization, the third with grammatical or rhetorical considerations, and so on. Scoring, however, is truly holistic in that those subsets are not quantitatively added up to yield a score.
Advantages of holistic scoring include
· fast evaluation,
· relatively high inter-rater reliability,
· the fact that scores represent “standards” that are easily interpreted by lay persons,
· the fact that scores tend to emphasize the writer’s strengths (Cohen, 1994, p. 135), and
· applicability to writing across many different disciplines.
Its disadvantages must also be weighed in a decision on whether to use holistic scoring:
· One score masks differences across the subskills within each score.
· No diagnostic information is available (no washback potential).
· The scale may not apply equally well to all genres of writing.
· Raters need to be extensively trained to use the scale accurately.
In general, teachers and test designers lean toward holistic scoring only when it is expedient for administrative purposes. As long as trained evaluators are in place, differentiation across six levels may be quite adequate for admission into an institution or placement into courses. For classroom instructional purposes, holistic scores provide very little information. In most classroom settings where a teacher wishes to adapt a curriculum to the needs of a particular group of students, much more differentiated information across subskills is desirable than is provided by holistic scoring.
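Because inter-rater reliability figures so prominently in the case for and against holistic scoring, it may help to see how such reliability is often estimated in practice. Below is a minimal sketch with invented rater data and hypothetical names; it computes exact- and adjacent-agreement rates between two raters on a six-point holistic scale, one simple way (among several) of quantifying rater consistency:

```python
# A sketch of checking inter-rater agreement on holistic scores.
# The rater data below are invented for illustration.

def agreement_rates(rater_a, rater_b):
    """Return (exact, adjacent) agreement proportions for paired scores."""
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent

rater_a = [4, 5, 3, 6, 4, 2]
rater_b = [4, 4, 3, 5, 5, 2]
print(agreement_rates(rater_a, rater_b))  # (0.5, 1.0)
```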
2.Primary Trait Scoring
A second method of scoring, primary trait, focuses on “how well students can write within a narrowly defined range of discourse” (Weigle, 2002, p. 110). This type of scoring emphasizes the task at hand and assigns a score based on the effectiveness of the text in achieving that one goal. For example, if the purpose or function of an essay is to persuade the reader to do something, the score for the writing would rise or fall on the accomplishment of that function. If a learner is asked to exploit the imaginative function of language by expressing personal feelings, then the response would be evaluated on that feature alone. For rating the primary trait of the text, Lloyd-Jones (1977) suggested a four-point scale ranging from zero (no response or fragmented response) to 4 (the purpose is unequivocally accomplished in a convincing fashion). It almost goes without saying that organization, supporting details, fluency, syntactic variety, and other features will implicitly be evaluated in the process of offering a primary trait score. But the advantage of this method is that it allows both writer and evaluator to focus on function. In summary, a primary trait score would assess
· the accuracy of the account of the original (summary),
· the clarity of the steps of the procedure and the final result (lab report),
· the description of the main features of the graph (graph description), and
· the expression of the writer’s opinion (response to an article).
3.Analytic Scoring
For classroom instruction, holistic scoring provides little washback into the writer’s further stages of learning. Primary trait scoring focuses on the principal function of the text and therefore offers some feedback potential, but no washback for any of the aspects of the written production that enhance the ultimate accomplishment of the purpose. Classroom evaluation of learning is best served through analytic scoring, in which as many as six major elements of writing are scored, thus enabling learners to home in on weaknesses and to capitalize on strengths.
Analytic scoring may be more appropriately called analytic assessment in order to capture its closer association with classroom language instruction than with formal testing. Brown and Bailey (1984) designed an analytic scoring scale that specified five major categories and a description of five different levels in each category, ranging from “unacceptable” to “excellent.” At first glance, Brown and Bailey’s scale may look similar to the TWE® holistic scale discussed earlier: for each scoring category there is a description that encompasses several subsets. A closer inspection, however, reveals much more detail in the analytic method. Instead of just six descriptions, there are 25, each subdivided into a number of contributing factors. The order in which the five categories (organization, logical development of ideas, grammar, punctuation/spelling/mechanics, and style and quality of expression) are listed may bias the evaluator toward the greater importance of organization and logical development as opposed to punctuation and style. But the mathematical assignment of the 100-point scale gives equal weight (a maximum of 20 points) to each of the five major categories. Not all writing and assessment specialists agree.
Table 9.2. Analytic scale for rating composition tasks (Brown & Bailey, 1984, pp. 39-41)
Score bands: 20-18 Excellent to Good; 17-15 Good to Adequate; 14-12 Adequate to Fair; 11-6 Unacceptable; 5-1 Not college-level work
I. Organization: introduction, body, and conclusion
· 20-18: Appropriate title, effective introductory paragraph, topic is stated, leads to body; transitional expressions used; arrangement of material shows plan (could be outlined by reader); supporting evidence given for generalizations; conclusion logical and complete
· 17-15: Adequate title, introduction, and conclusion; body of essay is acceptable, but some evidence may be lacking, some ideas aren’t fully developed; sequence is logical but transitional expressions may be absent or misused
· 14-12: Mediocre or scant introduction or conclusion; problems with the order of ideas in body; the generalizations may not be fully supported by the evidence given; problems of organization interfere
· 11-6: Shaky or minimally recognizable introduction; organization can barely be seen; severe problems with ordering of ideas; lack of supporting evidence; conclusion weak or illogical; inadequate effort at organization
· 5-1: Absence of introduction or conclusion; no apparent organization of body; severe lack of supporting evidence; writer has not made any effort to organize the composition (could not be outlined by reader)
II. Logical development of ideas: content
· 20-18: Essay addresses the assigned topic; the ideas are concrete and thoroughly developed; no extraneous material; essay reflects thought
· 17-15: Essay addresses the issues but misses some points; ideas could be more fully developed; some extraneous material is present
· 14-12: Development of ideas not complete or essay is somewhat off the topic; paragraphs aren’t divided exactly right
· 11-6: Ideas incomplete; essay does not reflect careful thinking or was hurriedly written; inadequate effort in area of content
· 5-1: Essay is completely inadequate and does not reflect college-level work; no apparent effort to consider the topic carefully
III. Grammar
· 20-18: Native-like fluency in English grammar; correct use of relative clauses, prepositions, modals, articles, verb forms, and tense sequencing; no fragments or run-on sentences
· 17-15: Advanced proficiency in English grammar; some grammar problems don’t influence communication, although the reader is aware of them; no fragments or run-on sentences
· 14-12: Ideas are getting through to the reader, but grammar problems are apparent and have a negative effect on communication; run-on sentences or fragments present
· 11-6: Numerous serious grammar problems interfere with communication of the writer’s ideas; grammar review of some areas clearly needed; difficult to read sentences
· 5-1: Severe grammar problems interfere greatly with the message; reader can’t understand what the writer was trying to say; unintelligible sentence structure
IV. Punctuation, spelling, and mechanics
· 20-18: Correct use of English writing conventions: left and right margins, all needed capitals, paragraphs indented, punctuation and spelling; very neat
· 17-15: Some problems with writing conventions or punctuation; occasional spelling errors; left margin correct; paper is neat and legible
· 14-12: Uses general writing conventions but has errors; spelling problems distract reader; punctuation errors interfere with ideas
· 11-6: Serious problems with format of paper; parts of essay not legible; errors in sentence punctuation and final punctuation; unacceptable to educated readers
· 5-1: Complete disregard for English writing conventions; paper illegible; obvious capitals missing, no margins, severe spelling problems
V. Style and quality of expression
· 20-18: Precise vocabulary usage; use of parallel structures; concise; register good
· 17-15: Attempts variety; good vocabulary; not wordy; register OK; style fairly concise
· 14-12: Some vocabulary misused; lacks awareness of register; may be too wordy
· 11-6: Poor expression of ideas; problems in vocabulary; lacks variety of structure
· 5-1: Inappropriate use of vocabulary; no concept of register or sentence variety
One alternative weighting scheme, for example, distributes the 100 points
unequally across five categories:
Content 30
Organization 20
Vocabulary 20
Syntax 25
Mechanics 5
Total 100
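To make the arithmetic of the two weighting schemes concrete, here is a
minimal Python sketch of analytic composite scoring; the category ratings in
the example are hypothetical, and the function simply sums rater-assigned
points after checking them against each category's ceiling.

# A minimal sketch of analytic composite scoring (illustrative only).
def total_score(scores, max_points):
    # scores: category -> points awarded by the rater
    # max_points: category -> maximum points for that category
    for category, points in scores.items():
        assert 0 <= points <= max_points[category], f"{category} out of range"
    return sum(scores.values())

# Equal weighting, as in Table 9.2 (five categories, 20 points each):
table_9_2_weights = {
    "organization": 20,
    "logical development of ideas": 20,
    "grammar": 20,
    "punctuation/spelling/mechanics": 20,
    "style and quality of expression": 20,
}

# The unequal alternative listed above:
alternative_weights = {
    "content": 30,
    "organization": 20,
    "vocabulary": 20,
    "syntax": 25,
    "mechanics": 5,
}

# Hypothetical ratings for one essay under the Table 9.2 scheme:
ratings = {
    "organization": 17,                  # Good to Adequate band
    "logical development of ideas": 18,  # Excellent to Good band
    "grammar": 15,
    "punctuation/spelling/mechanics": 16,
    "style and quality of expression": 14,
}
print(total_score(ratings, table_9_2_weights))  # 80 out of 100

Note the practical difference: under Table 9.2 every category can move the
total by at most 20 points, whereas the alternative scheme lets content sway
the total six times as much as mechanics.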
As your curricular goals and students' needs vary, your own
analytic scoring of essays may be appropriately tailored. Level of
proficiency can make a significant difference in emphasis: at the intermediate
level, for example, you might give more weight to syntax and mechanics, while
advanced levels of writing may call for a strong push toward organization and
development. Genre can also dictate variations in scoring. Would a summary of
an article require the same relative emphases as a narrative essay? Most likely
not. Certain types of writing, such as lab reports or interpretations of
statistical data, may even need additional (or at least redefined) categories
in order to capture the essential components of good writing within those
genres. Analytic scoring of compositions offers writers a little more washback
than a single holistic or primary trait score. Scores in five or six major
elements will help to call the writer's attention to areas of needed
improvement. Practicality is lowered in that more time is required for teachers
to attend to details within each of the categories in order to render a final
score or grade, but ultimately students receive more information about their
writing. Numerical scores alone, however, are still not sufficient for enabling
students to become proficient writers, as we shall see in the next section.
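If you wanted to make such tailoring explicit, the weights themselves can be
treated as swappable profiles, as in the hypothetical Python sketch below; the
two profiles and every number in them are invented for illustration only.

# Hypothetical weight profiles tailored to proficiency level, following
# the suggestion above; all numbers are invented for illustration, and
# each profile still distributes exactly 100 points.
WEIGHT_PROFILES = {
    "intermediate": {  # heavier emphasis on syntax and mechanics
        "organization": 15,
        "content": 15,
        "syntax": 30,
        "vocabulary": 15,
        "mechanics": 25,
    },
    "advanced": {  # heavier emphasis on organization and development
        "organization": 30,
        "content": 30,
        "syntax": 15,
        "vocabulary": 15,
        "mechanics": 10,
    },
}

# Sanity check: each profile should still distribute exactly 100 points.
assert all(sum(profile.values()) == 100 for profile in WEIGHT_PROFILES.values())

def weights_for(level):
    """Return the (hypothetical) analytic weights for a proficiency level."""
    return WEIGHT_PROFILES[level]

A genre-specific profile (say, one with a separate data-interpretation
category for lab reports) would slot in the same way.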
C. BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING
Formal testing carries with it the
burden of designing a practical and reliable instrument that assesses its
intended criterion accurately. To accomplish that mission, designers of writing
tests are charged with the task of providing as “objective” a scoring procedure
as possible, and one that in many cases can be easily interpreted by agents
beyond the learner. Holistic, primary trait, and analytic scoring all satisfy
those ends. Yet beyond mathematically calculated scores lies a rich domain of
assessment in which a developing writer is coached from stage to stage in a
process of building a storehouse of writing skills. Here in the classroom, in
the tutored relationship of teacher and student, and in the community of peer
learners, most of the hard work of assessing writing is carried out. Such
assessment is informal, formative, and replete with washback.
Most writing specialists agree that
the best way to teach writing is a hands-on approach that stimulates student
output and then generates a series of self-assessments, peer editing and
revision, and teacher response and conferencing (Raimes, 1991, 1998; Reid, 1993;
Seow, 2002). It is not an approach that relies on a massive dose of lecturing
about good writing, nor on memorizing a bunch of rules about rhetorical
organization, nor on sending students home with an assignment to turn in a
paper the next day. People become good writers by writing and by seeking the
facilitative input of others to refine their skills. Assessment takes on a
crucial role in such an approach. Learning how to become a good writer places
the student in an almost constant state of assessment. To give the student the
maximum benefit of assessment, it is important to consider (a) earlier stages
(from freewriting to the first draft or two) and (b) later stages (revising and
finalizing) of producing a written text. A further factor in assessing writing
is the involvement of self, peers, and teacher at appropriate steps in the
process. (For further guidelines on the process of teaching writing, see TBP,
Chapter 19.)
1. Assessing Initial Stages of the Process of Composing
Following are some guidelines for assessing the initial stages (the first
draft or two) of a written composition. These guidelines are generic for self,
peer, and teacher responding. Each assessor will need to modify the list
according to the level of the learner, the context, and the purpose in
responding.
Assessment of initial stages in composing
1. Focus your efforts primarily on meaning, main idea, and organization.
2. Comment on the introductory paragraph.
3. Make general comments about the clarity of the main idea and the logic
or appropriateness of the organization.
4. As a rule of thumb, ignore minor (local) grammatical and lexical errors.
5. Indicate what appear to be major (global) errors (e.g., by underlining
the text in question), but allow the writer to make corrections.
6. Do not rewrite questionable, ungrammatical, or awkward sentences;
rather, probe with a question about meaning.
7. Comment on features that appear to be irrelevant to the topic.
The
teacher-assessor's role is that of a guide, a facilitator, and an ally;
therefore, assessment at this stage of writing needs to be as positive as
possible to encourage the writer. An early focus on overall structure and
meaning will enable writers to clarify their purpose and plan and will set a
framework for their later refinement of the lexical and grammatical issues.
2. Assessing Later Stages of the Process of Composing
Once the writer
has determined and clarified his or her purpose and plan, and has completed at
least one or perhaps two drafts, the focus shifts toward “fine-tuning” the
expression with a view toward a final revision. Editing and responding assume
an appropriately different character now, with these guidelines:

Assessment of later stages in composing
1. Comment on the specific clarity and strength of all main ideas and
supporting ideas, and on argument and logic.
2. Call attention to minor (“local”) grammatical and mechanical (spelling,
punctuation) errors, but direct the writer to self-correct them.
3. Comment on word choices and expressions that may not be wrong but are
not as clear or direct as they could be.
4. Point out any problems with cohesive devices within and across
paragraphs.
5. If appropriate, comment on documentation, citation of sources, evidence,
and other support.
6. Comment on the adequacy and strength of the conclusion.
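The two checklists differ mainly in where they direct the responder's
attention; as a rough illustration, the hypothetical Python sketch below
encodes them as data so that a self- or peer-review aid could surface
stage-appropriate prompts. The wording is abridged from the guidelines above,
and the function name is invented.

# The two checklists above, abridged and keyed by drafting stage
# (illustrative only; not part of any published instrument).
FEEDBACK_PROMPTS = {
    "initial": [
        "Focus on meaning, main idea, and organization",
        "Comment on the introductory paragraph",
        "Ignore minor (local) grammatical and lexical errors",
        "Mark major (global) errors, but let the writer correct them",
        "Probe unclear sentences with questions rather than rewriting them",
    ],
    "later": [
        "Comment on clarity and strength of main and supporting ideas",
        "Note minor (local) errors; direct the writer to self-correct",
        "Check cohesive devices within and across paragraphs",
        "Comment on documentation and citation of sources where relevant",
        "Comment on the adequacy and strength of the conclusion",
    ],
}

def prompts_for(stage):
    """Return review prompts for an 'initial' or 'later' draft."""
    return FEEDBACK_PROMPTS[stage]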
Through all
these stages it is assumed that peers and teacher are both responding to the
writer through conferencing in person, electronic communication, or, at the
very least, an exchange of papers. The impromptu timed tests and the methods of
scoring discussed earlier may appear to be only distantly related to such an
individualized process of creating a written text, but are they really?
All those developmental stages may be the preparation that learners need both
to function in creative real-world writing tasks and to successfully demonstrate
their competence on a timed impromptu test. And those holistic scores are,
after all, generalizations of the various components of effective writing. If
the hard work of successfully progressing through a semester or two of a
challenging course in academic writing ultimately means that writers are ready
to function in their real-world contexts, and to get a 5 or 6 on the TWE, then
all the effort was worthwhile.
REFERENCES
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom
Practices. White Plains, NY: Pearson Education.