Saturday, November 26, 2016

Assessing Speaking and Assessing Writing (GROUP TASK)



ASSESSING SPEAKING AND WRITING
Written by: GROUP 6
Wahyu Ningrum        1423 022
Rany Pangesti           1423 024
Puji Sundari               1423 023
Fitri Dewi Sartika     1423 008
Reza Junaidi             1423 050



ASSESSING SPEAKING
A. BASIC TYPES OF SPEAKING
We cited four categories of listening performance assessment tasks in the previous material; a similar taxonomy emerges for oral production.
1.      Imitative. At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely phonetic level of oral production, a number of prosodic, lexical, and grammatical properties of language may be included in the criterion performance. We are interested only in what is traditionally labeled “pronunciation”; no inferences are made about the test-taker’s ability to understand or convey meaning or to participate in an interactive conversation. The only role of listening here is in the short-term storage of a prompt, just long enough to allow the speaker to retain the short stretch of language that must be imitated.
2.      Intensive. A second type of speaking frequently employed in assessment contexts is the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements: intonation, stress, rhythm, juncture). The speaker must be aware of semantic properties in order to be able to respond, but interaction with an interlocutor or test administrator is minimal at best. Examples of intensive assessment tasks include directed response tasks; reading aloud; sentence and dialogue completion; limited picture-cued tasks, including simple sequences; and translation up to the simple sentence level.
3.      Responsive. Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. The stimulus is almost always a spoken prompt (in order to preserve authenticity), with perhaps only one or two follow-up questions or retorts:
A.    Mary: Excuse me, do you have the time?
Doug: Yeah. Nine-fifteen.

B.     T: What is the most urgent environmental problem today?
S: I would say massive deforestation.

C.     Jeff: Hey, Stef, how’s it going?
Stef: Not bad, and yourself?
Jeff: I’m good.
Stef: Cool. Okay, gotta go.
4.      Interactive. The difference between responsive and interactive speaking is the length and complexity of the interaction, which sometimes includes multiple exchanges and/or multiple participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships. (In the three dialogues cited above, A and B were transactional, and C was interpersonal.) In interpersonal exchanges, oral production can become pragmatically complex with the need to speak in a casual register and use colloquial language, ellipsis, slang, humor, and other sociolinguistic conventions.
5.      Extensive (monologue). Extensive oral production tasks include speeches, oral presentations, and story-telling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether. Language style is frequently more deliberative (planning is involved) and formal for extensive tasks, but we cannot rule out certain informal monologues such as casually delivered speech (for example, my vacation in the mountains, a recipe for outstanding pasta primavera, recounting the plot of a novel or movie).
B. MICRO- AND MACROSKILLS OF SPEAKING
A list of listening micro- and macroskills enumerated the various components of listening that make up criteria for assessment. A similar list of speaking skills can be drawn up for the same purpose: to serve as a taxonomy of skills from which you will select one or several that will become the objective(s) of an assessment task. The microskills refer to producing the smaller chunks of language such as phonemes, morphemes, words, collocations, and phrasal units. The macroskills imply the speaker’s focus on the larger elements: fluency, discourse, function, style, cohesion, nonverbal communication, and strategic options. The micro- and macroskills total roughly 16 different objectives to assess in speaking.
Micro- and macroskills of oral production


Microskills
1.      Produce differences among English phonemes and allophonic variants.
2.      Produce chunks of language of different lengths.
3.      Produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.
4.      Produce reduced forms of words and phrases.
5.      Use an adequate number of lexical units (words) to accomplish pragmatic purposes.
6.      Produce fluent speech at different rates of delivery.
7.      Monitor one’s own oral production and use various strategic devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message.
8.      Use grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), word order, patterns, rules, and elliptical forms.
9.      Produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents.
10.  Express a particular meaning in different grammatical forms.
11.  Use cohesive devices in spoken discourse.
Macroskills
12.  Appropriately accomplish communicative functions according to situations, participants, and goals.
13.  Use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.
14.  Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, generalization and exemplification.
15.  Convey facial features, kinesics, body language, and other nonverbal cues along with verbal language.
16.  Develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you. 
As you consider designing tasks for assessing spoken language, these skills can act as a checklist of objectives. While the macroskills have the appearance of being more complex than the microskills, both contain ingredients of difficulty, depending on the stage and context of the test-taker.
There is such an array of oral production tasks that a complete treatment is almost impossible within the confines of one chapter in this book. Below is a consideration of the most common techniques, with brief allusions to related tasks. As already noted in the introduction to this chapter, consider three important issues as you set out to design tasks:
1.      No speaking task is capable of isolating the single skill of oral production. Concurrent involvement of the additional performance of aural comprehension, and possibly reading, is usually necessary.
2.      Eliciting the specific criterion you have designated for a task can be tricky because, beyond the word level, spoken language offers a number of productive options to test-takers. Make sure your elicitation prompt achieves its aims as closely as possible.
3.      Because of the above two characteristics of oral production assessment, it is important to carefully specify scoring procedures for a response so that ultimately you achieve as high a reliability index as possible.

C. DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING
You may be surprised to see the inclusion of simple phonological imitation in a consideration of assessment of oral production. After all, endless repeating of words, phrases, and sentences was the province of the long-since-discarded Audiolingual Method, and in an era of communicative language teaching, many believe that nonmeaningful imitation of sounds is fruitless. Such opinions have faded in recent years as we have discovered that an overemphasis on fluency can sometimes lead to the decline of accuracy in speech. And so we have been paying more attention to pronunciation, especially suprasegmentals, in an attempt to help learners be more comprehensible.
An occasional phonologically focused repetition task is warranted as long as repetition tasks are not allowed to occupy a dominant role in an overall oral production assessment, and as long as you artfully avoid a negative washback effect. Such tasks range from word level to sentence level, usually with each item focusing on a specific phonological criterion. In a simple repetition task, test-takers repeat the stimulus, whether it is a pair of words, a sentence, or perhaps a question (to test for intonation production).
Test-takers hear:        Repeat after me:
beat [pause] bit [pause]
bat [pause] vat [pause]                         etc.
I bought a boat yesterday.
The glow of the candle is growing.      etc.
When did they go on vacation?
Do you like coffee?                             etc.
Test-takers repeat the stimulus.
Word repetition task
A variation on such a task prompts test-takers with a brief written stimulus which they are to read aloud. (In the section below on intensive speaking, some tasks are described in which test-takers read aloud longer texts.) Scoring specifications must be clear in order to avoid reliability breakdowns. A common form of scoring simply indicates a two- or three-point system for each response.
2          acceptable pronunciation
1          comprehensible, partially correct pronunciation
0          silence, seriously incorrect pronunciation
 
Scoring scale for repetition tasks.


The longer the stretch of language, the more possibility for error and therefore the more difficult it becomes to assign a point system to the text. In such a case, it may be imperative to score only the criterion of the task. For example, if in the sentence “When did they go on vacation?” the criterion is falling intonation for wh-questions, points should be awarded regardless of any mispronunciation.
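To make this concrete, here is a minimal sketch, in Python, of how such a point system might be tallied. The items and ratings below are invented for illustration; each rating is assumed to reflect only the designated criterion of its item (for instance, falling intonation on a wh-question), as just described.

# Minimal sketch of the two- or three-point repetition-task scale.
# The ratings are hypothetical examples; in practice each one comes
# from a rater judging only the item's designated criterion.

SCALE = {
    2: "acceptable pronunciation",
    1: "comprehensible, partially correct pronunciation",
    0: "silence, seriously incorrect pronunciation",
}

def total_score(ratings):
    """Validate per-item ratings against the scale and sum them."""
    for r in ratings:
        if r not in SCALE:
            raise ValueError(f"rating {r} is not on the 0-2 scale")
    return sum(ratings)

ratings = [2, 1, 2, 0]  # four items, e.g., two word pairs and two sentences
print(total_score(ratings), "out of", 2 * len(ratings))  # -> 5 out of 8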
D. PHONEPASS® TEST
An example of a popular test that uses imitative (as well as intensive) production tasks is PhonePass, a widely used, commercially available speaking test in many countries. Among a number of speaking tasks on the test, repetition of sentences (of 8 to 12 words) occupies a prominent role. It is remarkable that research on the PhonePass test has supported the construct validity of its repetition tasks not just for a test-taker’s phonological ability but also for discourse and overall oral production ability (Townshend et al., 1998; Bernstein et al., 2000; Cascallar & Bernstein, 2000).
The PhonePass test elicits computer-assisted oral production over a telephone. Test-takers read aloud, repeat sentences, say words, and answer questions. With a downloadable test sheet as a reference, test-takers are directed to telephone a designated number and listen for directions. The test has five sections.
Part A:
Test-takers read aloud selected sentences from among those printed on the test sheet. Examples:
1.      Traffic is a huge problem in Southern California.
2.      The endless city has no coherent mass transit system.
3.      Sharing rides was going to be the solution to rush-hour traffic.
4.      Most people still want to drive their own cars, though.
Part B:
Test-takers repeat sentences dictated over the phone. Example: “Leave town on the next train.”
Part C:
Test-takers answer questions with a single word or a short phrase of two or three words. Example: “Would you get water from a bottle or a newspaper?”
Part D:
Test-takers hear three word groups in random order and must link them in a correctly ordered sentence. Example: was reading / my mother / a magazine.
Part E:
Test-takers have 30 seconds to talk about their opinion of some topic that is dictated over the phone. Topics center on family, preferences, and choices.

PhonePass® test specifications

Scores for the PhonePass test are calculated by a computerized scoring template and reported back to the test-taker within minutes. Six scores are given: an overall score between 20 and 80 and five subscores on the same scale that rate pronunciation.
The tasks on Part A and Part B of the PhonePass test do not extend beyond the level of oral reading and imitation. Parts C and D represent intensive speaking (see the next section in this chapter). Section E is used only for experimental data-gathering and does not figure into the scoring. The scoring procedure has been validated against human scoring with extraordinarily high reliabilities and correlation statistics (.94 overall). Further, this ten-minute test correlates with the elaborate Oral Proficiency Interview (OPI, described later in this chapter) at .75, indicating a very high degree of correspondence between the machine-scored PhonePass and the human-scored OPI (Bernstein et al., 2000).
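As a rough illustration of what such validation involves, the following minimal Python sketch computes a Pearson correlation between machine-assigned and human-assigned scores. The two score lists are invented for illustration; they are not PhonePass or OPI data.

# Minimal sketch: Pearson correlation between machine and human scores.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

machine = [64, 58, 71, 45, 80, 52]  # hypothetical machine scores (20-80 scale)
human   = [62, 60, 69, 48, 78, 55]  # hypothetical human ratings of the same people

print(round(pearson(machine, human), 2))  # values near 1.0 mean close agreement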
The PhonePass findings could signal an increase in the future use of repetition and read-aloud procedures for the assessment of oral production. Because a test-taker’s output is completely controlled, scoring using speech-recognition technology becomes achievable and practical. As researchers uncover the constructs underlying both repetition/read-aloud tasks and oral production in all its complexities, we will have access to more comprehensive explanations of why such simple tasks appear to be reliable and valid indicators of very complex oral production proficiency. Here are some details on the PhonePass test.
Producer:                   Ordinate Corporation, Menlo Park, CA
Objective:                  To test oral production skills of non-native English speakers
Primary market:             Worldwide, primarily in workplace settings where employees require a comprehensible command of spoken English; secondarily in academic settings for placement and evaluation of students
Type:                       Computer-assisted, telephone-operated, with a test sheet
Response modes:             Oral, mostly repetition tasks
Specifications:             (see above)
Time allocation:            Ten minutes
Internet access:            www.ordinate.com

 
PhonePass®  test
E. DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING
At the intensive level, test-takers are prompted to produce short stretches of discourse (no more than a sentence) through which they demonstrate linguistic ability at a specified level of language. Many tasks are “cued” tasks in that they lead the test-taker into a narrow band of possibilities.
Parts C and D of the PhonePass test fulfill the criteria of intensive tasks, as they elicit certain expected forms of language. Antonyms like high and low, happy and sad are prompted so that the automated scoring mechanism anticipates only one word. The either/or task of Part D fulfills the same criterion. Intensive tasks may also be described as limited response tasks (Madsen, 1983), or mechanical tasks (Underhill, 1987), or what classroom pedagogy would label controlled responses.
1. Directed Response Tasks
In this type of task, the test administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative, but they do require minimal processing of meaning in order to produce the correct grammatical output.
Test-takers hear:      Tell me he went home.
Tell me that you like rock music.
Tell me that you aren’t interested in tennis.
Tell him to come to my office at noon.
Remind him what time it is.
 
Directed response



2. Read-Aloud Tasks
Intensive reading-aloud tasks include reading beyond the sentence level, up to a paragraph or two. This technique is easily administered by selecting a passage that incorporates test specs and by recording the test-taker’s output; the scoring is relatively easy because all of the test-taker’s oral production is controlled. Because of the results of research on the PhonePass test, reading aloud may actually be a surprisingly strong indicator of overall oral production ability.
For many decades, foreign language programs have used reading passages to analyze oral production. Prator’s (1972) Manual of American English Pronunciation included a “diagnostic passage” of about 150 words that students could read aloud into a tape recorder. Teachers listening to the recording would then rate students on a number of phonological factors (vowels, diphthongs, consonants, consonant clusters, stress, and intonation) by completing a two-page diagnostic checklist on which all errors or questionable items were noted. These checklists ostensibly offered direction to the teacher for emphases in the course to come.
An earlier form of the Test of Spoken English (TSE®, see below) incorporated one read-aloud passage of about 120 to 130 words with a rating scale for pronunciation and fluency. The following passage is typical:
Despite the decrease in size (and, some would say, quality) of our cultural world, there still remain strong differences between the usual British and American writing styles. The question is, how do you get your message across? English prose conveys its most novel ideas as if they were timeless truths, while American writing exaggerates; if you believe half of what is said, that’s enough. The former uses understatement; the latter, overstatement. There are also disadvantages to each characteristic approach. Readers who are used to being screamed at may not listen when someone chooses to whisper politely. At the same time, the individual who is used to a quiet manner may reject a series of loud imperatives.
 
Read-aloud stimulus, paragraph length
The scoring scale for this passage provided a four-point scale for pronunciation and for fluency, as shown in the box below.
Pronunciation:
Points:
0.0-0.4                         Frequent phonemic errors and foreign stress and intonation patterns that cause the speaker to be unintelligible.
0.5-1.4                         Frequent phonemic errors and foreign stress and intonation patterns that cause the speaker to be occasionally unintelligible.
1.5-2.4                         Some consistent phonemic errors and foreign stress and intonation patterns, but the speaker is intelligible.
2.5-3.0                         Occasional non-native pronunciation errors, but the speaker is always intelligible.
Fluency:
Points:
0.0-0.4                         Speech is so halting and fragmentary or has such a non-native flow that intelligibility is virtually impossible.
0.5-1.4                         Numerous non-native pauses and/or a non-native flow that interfere with intelligibility.
1.5-2.4                         Some non-native pauses but with a more nearly native flow so that the pauses do not interfere with intelligibility.
2.5-3.0                         Speech is smooth and effortless, closely approximating that of a native speaker.

Test of Spoken English scoring scale (1987, p. 10)

Such a rating list does not indicate how to gauge intelligibility, which is mentioned in both lists. Such slippery terms remind us that oral production scoring, even with the controls that reading aloud offers, is still an inexact science.
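One way to make the band boundaries of such a scale explicit is to encode them, as in this minimal Python sketch; the function and its condensed descriptors are our own illustration, not part of the TSE materials, and they do not solve the underlying problem of judging intelligibility itself.

# Minimal sketch: look up the fluency band for a 0.0-3.0 rating.
def fluency_band(score: float) -> str:
    """Return a condensed descriptor for a fluency rating."""
    if not 0.0 <= score <= 3.0:
        raise ValueError("score must be between 0.0 and 3.0")
    if score <= 0.4:
        return "halting/fragmentary; intelligibility virtually impossible"
    if score <= 1.4:
        return "numerous non-native pauses interfering with intelligibility"
    if score <= 2.4:
        return "some non-native pauses that do not interfere"
    return "smooth and effortless, closely approximating a native speaker"

print(fluency_band(1.7))  # -> some non-native pauses that do not interfere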
Underhill (1987, pp. 77-78) suggested some variations on the task of simply reading a short passage:
·         Reading a scripted dialogue, with someone else reading the other part
·         Reading sentences containing minimal pairs, for example:
Try not to heat/hit the pan too much.
The doctor gave me a bill/pill.
·         Reading information from a table or chart
Although reading aloud shows certain practical advantages (predictable output, practicality, reliability in scoring), there are several drawbacks to using this technique for assessing oral production. Reading aloud is somewhat inauthentic in that we seldom read anything aloud to someone else in the real world, with the exception of a parent reading to a child, occasionally sharing a written story with someone, or giving a scripted oral presentation. Also, reading aloud calls on certain specialized oral abilities that may not indicate one’s pragmatic ability to communicate orally in face-to-face contexts. You should therefore employ this technique with some caution, and certainly supplement it as an assessment task with other, more communicative procedures.
3. Sentence/Dialogue Completion Tasks and Oral Questionnaires
Another technique for targeting intensive aspects of language requires test-takers to read a dialogue in which one speaker’s lines have been omitted. Test-takers are first given time to read through the dialogue to get its gist and to think about appropriate lines to fill in. Then, as the tape, teacher, or test administrator produces one part orally, the test-taker responds.
An advantage of this technique lies in its moderate control of the output of the test-taker. While individual variations in responses are accepted, the technique taps into a learner’s ability to discern expectancies in a conversation and to produce sociolinguistically correct language. One disadvantage of this technique is its reliance on literacy and an ability to transfer easily from written to spoken English. Another disadvantage is the contrived, inauthentic nature of this task: Couldn’t the same criterion performance be elicited in a live interview in which an impromptu role-play technique is used?
Perhaps more useful is a whole host of shorter dialogues of two or three lines, each of which aims to elicit a specified target. In the following examples, somewhat unrelated items attempt to elicit the past tense, future tense, yes/no question formation, and asking for the time. Again, test-takers see the stimulus in written form.
Test-takers see:
Interviewer:     What did you do last weekend?
Test-taker:      ____________________________________

Interviewer:     What will you do after you graduate from this program?
Test-taker:      ____________________________________

Test-taker:      ____________________________________?
Interviewer:     I was in Japan for two weeks.

Test-taker:      ____________________________________?
Interviewer:     It’s ten-thirty.

Test-takers respond with appropriate lines.

Dialogue completion tasks
One could contend that performance on these items is responsive, rather than intensive. True, the discourse involves responses, but there is a degree of control here that predisposes the test-taker to respond with certain expected forms. Such arguments underscore the fine lines of distinction between and among the five categories.
It could also be argued that such techniques are nothing more than a written form of questions that might otherwise (and more appropriately) be part of a standard oral interview. True, but the advantage that the written form offers is to provide a little more time for the test-taker to anticipate an answer, and it begins to remove the potential ambiguity created by aural misunderstanding. It helps to unlock the almost ubiquitous link between listening and speaking performance.
Underhill (1987) describes yet another technique that is useful for controlling the test-taker’s output: form-filling, or what I might rename “oral questionnaire.” Here the test-taker sees a questionnaire that asks for certain categories of information (personal data, academic information, job experience, etc.) and supplies the information orally.
4. Picture-Cued Tasks
One of the most popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and “busy”; or composed of a series that tells a story or incident. For example, a pair of pictures can cue the production of a simple minimal pair, and grammatical categories may be cued by picture sequences; one such sequence elicits comparatives.
A little sense of humor can even be injected: in one sequence, a family bundled up in their winter coats is looking forward to leaving the wintry scene behind them, and a touch of authenticity is added in that almost everyone can identify with looking forward to a vacation on a tropical island. Assessment of oral production may also be stimulated through a more elaborate picture, such as a busy party scene. Moving into more open-ended performance, a picture can ask test-takers not only to identify certain specific information but also to elaborate with their own opinion, to accomplish a persuasive function, and to describe preferences in paintings. Maps are another visual stimulus that can be used to assess the language forms needed to give directions and specify locations; for example, the test-taker must provide directions to different locations.
5. Translation (of Limited Stretches of Discourse)
            Translation is a part of our tradition in language teaching that we tend to discount or disdain, if only because our current pedagogical stance plays down its importance. Translation methods of teaching are certainly passé in an era of direct approaches to creating communicative classrooms. But we should remember that in countries where English is not the native or prevailing language, translation is a meaningful communicative device in contexts where the English user is called on to be an interpreter. Also, translation is a well-proven communication strategy for learners of a second language.
            Under certain constraints, then, it is not far-fetched to suggest translation as a device to check oral production. Instead of offering pictures or written stimuli, the test-taker is given a native-language word, phrase, or sentence and is asked to translate it. Conditions may vary from expecting an instant translation of an orally elicited linguistic target to allowing more thinking time before producing a translation of a somewhat longer text, which may optionally be offered to the test-taker in written form. (Translation of extensive texts is discussed at the end of this chapter.) As an assessment procedure, the advantages of translation lie in its control of the output of the test-taker, which of course means that scoring is more easily specified.
Chapter 2
A. DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING
Assessment of responsive tasks involves brief interactions with an interlocutor, differing from intensive tasks in the increased creativity given to the test-taker and from interactive tasks by the somewhat limited length of utterances.
1. Question and Answer
Question-and-answer tasks can consist of one or two questions from an interviewer, or they can make up a portion of a whole battery of questions and prompts in an oral interview. They can vary from simple questions like “What is this called in English?” to complex questions like “What are the steps governments should take, if any, to stem the rate of deforestation in tropical countries?” The first question is intensive in its purpose; it is a display question intended to elicit a predetermined correct response. We have already looked at some of these types of questions in the previous section. Questions at the responsive level tend to be genuine referential questions in which the test-taker is given more opportunity to produce meaningful language in response.
            In designing such questions for test-takers, it’s important to make sure that you know why you are asking the question. Are you simply trying to elicit strings of language output to gain a general sense of the test-taker’s discourse competence? Are you combining discourse and grammatical competence in the same question? Is each question just one in a whole set of related questions? Responsive questions may take the following forms:
Questions eliciting open-ended responses
Test-takers hear:
  1. What do you think about the weather today?
  2. What do you like about the English language?
  3. Why did you choose your academic major?
  4. What kind of strategies have you used to help you learn English?
  5. a. Have you ever been to the United States before?
b. What other countries have you visited?
c. Why did you go there? What did you like best about it?
d. If you could go back, what would you like to do or see?
e. What country would you like to visit next, and why?
 
Notice that question #5 has five situationally linked questions that may vary slightly depending on the test-taker’s response to a previous question.
            Oral interaction with a test administrator often involves the latter forming all the questions. The flip side of this usual concept of question-and-answer tasks is to elicit questions from the test-taker.
            A potentially tricky form of oral production assessment involves more than one test-taker with an interviewer, which is discussed later in this chapter. With two students in an interview context, both test-takers can ask questions of each other.
2. Giving Instructions and Directions
We are all called on in our daily routines to read instructions on how to operate an appliance, how to put a bookshelf together, or how to create a delicious clam chowder. Somewhat less frequent is the mandate to provide such instructions orally, but this speech act is still relatively common. Using such a stimulus in an assessment context provides an opportunity for the test-taker to engage in a relatively extended stretch of discourse, to be very clear and specific, and to use appropriate discourse markers and connectors. The technique is simple: the administrator poses the problem, and the test-taker responds. Scoring is based primarily on comprehensibility and secondarily on other specified grammatical or discourse categories. Here are some possibilities.
Eliciting instruction or directions
Test-takers hear:
  • Describe how to make a typical dish from your country.
  • What’s a good recipe for making________?
  • How do you access email on a PC computer?
  • How would I make a typical costume for a _____ celebration in your country?
  • How do you program telephone numbers into a cell (mobile) phone?
  • How do I get from _________to________ in your city?
Test-takers respond with appropriate instruction/directions.
            Some pointers for creating such tasks: The test administrator needs to guard against test-takers knowing and preparing for such items in advance lest they simply parrot back a memorized set of sentences. An impromptu delivery of instructions is warranted here, or at most a minute or so of preparation time. Also, the choice of topics needs to be familiar enough so that you are testing not general knowledge but linguistic competence; therefore, topics beyond the content schemata of the test-taker are inadvisable. Finally, the task should require the test-taker to produce at least five or six sentences (of connected discourse) to adequately fulfill the objective.
            This task can be designed to be more complex, thus placing it in the category of extensive speaking. If your objective is to keep the response short and simple, then make sure your directive does not take the test-taker down a path of complexity that he or she is not ready to face.
3. Paraphrasing
Another type of assessment task that can be categorized as responsive asks the test-taker to read or hear a limited number of sentences (perhaps two to five) and produce a paraphrase of the sentence. For example:
Paraphrasing a story
Test-takers hear: Paraphrase the following little story in your own words.
My weekend in the mountains was fabulous. The first day we backpacked into the mountains and climbed about 2,000 feet. The hike was strenuous but exhilarating. By sunset we found these beautiful alpine lakes and made camp there. The sunset was amazingly beautiful. The next two days we just kicked back and did little day hikes, some rock climbing, bird watching, swimming, and fishing. The hike out on the next day was really easy (all downhill), and the scenery was incredible.
Test-takers respond with two or three sentences.
            A more authentic context for paraphrase is aurally receiving and orally relaying a message. In the example below, the test-taker must relay information from a telephone call to an office colleague named Jeff.
Paraphrasing a phone message
Test-takers hear:
Please tell Jeff that I’m tied up in traffic so I’m going to be about a half hour late for the nine o’clock meeting. And ask him to bring up our question about the employee benefits plan. If he wants to check in with me on my cell phone, have him call 415-338-3095. Thanks.
Test-takers respond with two or three sentences.
The advantages of such tasks are that they elicit short stretches of output and perhaps tap into test-takers’ ability to practice the conversational art of conciseness by reducing the output/input ratio. Yet you have to question the criterion being assessed. Is it a listening task more than production? Does it test short-term memory rather than linguistic ability? And how does the teacher determine scoring of responses? If you use short paraphrasing tasks as an assessment procedure, it’s important to pinpoint the objective of the task clearly. In this case, the integration of listening and speaking is probably more at stake than simple oral production alone.
B. TEST OF SPOKEN ENGLISH (TSE®)
Somewhere straddling responsive, interactive, and extensive speaking tasks lies another popular commercial oral production assessment, the Test of Spoken English (TSE). The TSE is a 20-minute audiotaped test of oral language ability within an academic or professional environment. TSE scores are used by many North American institutions of higher education to select international teaching assistants. The scores are also used for selecting and certifying health professionals such as physicians, nurses, pharmacists, physical therapists, and veterinarians.
            The tasks on the TSE are designed to elicit oral production in various discourse categories rather than in selected phonological, grammatical, or lexical targets. The following content specifications for the TSE represent the discourse and pragmatic context assessed in each administration:
  1. Describe something physical.
  2. Narrate from presented material.
  3. Summarize information of the speaker’s own choice.
  4. Give directions based on visual materials.
  5. Give instructions.
  6. Give an opinion.
  7. Support an opinion.
  8. Compare/contrast.
  9. Hypothesize.
  10. Function “interactively”.
  11. Define.
Using these specifications, Lazaraton and Wagner (1996) examined 15 different specific tasks in collecting background data from native and non-native speakers of English:
1.      Giving a personal description
2.      Describing a daily routine
3.      Suggesting a gift and supporting one’s choice
4.      Recommending a place to visit and supporting one’s choice
5.      Giving directions
6.      Describing a favorite movie and supporting one’s choice
7.      Telling a story from pictures
8.      Hypothesizing about future action
9.      Hypothesizing about a preventative action
10.  Making a telephone call to the dry cleaner
11.  Describing an important news event
12.  Giving an opinion about animals in the zoo
13.  Defining a technical term
14.   Describing information in a graph and speculating about its implications
15.  Giving details about a trip schedule
From their findings, the researchers were able to report on the validity of the tasks, especially the match between the intended task functions and the actual output of both native and non-native speakers.
            Following is a set of sample items as they appear in the TSE Manual, which is downloadable from the TOEFL® website (see the summary box at the end of this section).
Test of Spoken English sample items
Part A.
Test-takers see: A map of a town
Test-takers hear: Imagine that we are colleagues. The map below is of a neighboring town that you have suggested I visit. You have 30 seconds to study the map. Then I’ll ask you some questions about it.
  1. Choose one place on the map that you think I should visit and give me some reasons why you recommend this place. (30 seconds)
  2. I’d like to see a movie. Please give me directions from the bus station to the movie theater. (30 seconds)
  3. One of your favorite movies is playing at the theater. Please tell me about the movie and why you like it. (60 seconds)
Part B.
Test-takers see:
A series of six pictures depicts a sequence of events. In this series, painters have just painted a park bench. Their WET PAINT sign blows away. A man approaches the bench, sits on it, and starts reading a newspaper. He quickly discovers his suit has just gotten wet paint on it and then rushes to the dry cleaner.
Test-takers hear:
Now please look at the six pictures below. I’d like you to tell me the story that the pictures show, starting with picture number 1 and going through picture number 6. Please take one minute to look at the pictures and think about the story. Do not begin the story until I tell you to do so.
  1. Tell me the story that the pictures show. (60 seconds)
  2. What could the painters have done to prevent this? (30 seconds)
  3. Imagine that this happens to you. After you have taken the suit to the dry cleaners, you find out that you need to wear the suit the next morning. The dry cleaning service usually takes two days. Call the dry cleaner and try to persuade them to have the suit ready later today. (45 seconds)
  4. The man in the pictures is reading a newspaper. Both newspaper and television news programs can be good sources of information about current events. What do you think are the advantages and disadvantages of each of these sources? (60 seconds)
Part C.
Test-takers hear:
Now I’d like to hear your ideas about a variety of topics. Be sure to say as much as you can in responding to each question. After I ask each question, you may take a few seconds to prepare your answer, and then begin speaking when you’re ready.
  1. Many people enjoy visiting zoos and seeing the animals. Other people believe that animals should not be taken from their natural surroundings and put in zoos. I’d like to know what you think about this issue. (60 seconds)
  2. I’m not familiar with your field of study. Select a term used frequently in your field and define it for me. (60 seconds)
Part D.
Test-takers see:
A graph showing an increase in world population over a half-century of time.
Test-takers hear:
  1.  This graph presents the actual and projected percentage of the world population living in cities from 1950 to 2010. Describe to me the information given in the graph. (60 seconds)
  2.  Now discuss what this information might mean for the future. (45 seconds)
Part E.
Test-takers see:
A printed itinerary for a one-day bus tour of Washington, D.C., on which four relatively simple pieces of information (date, departure time, etc.) have been crossed out by hand and new handwritten information added.
Test-takers hear:
  1.   Now please look at the information below about a trip to Washington, D.C., that has been organized for the members of the Forest City Historical Society. Imagine that you are the president of this organization. At the last meeting, you gave out a schedule for the trip, but there have been some changes. You must remind the members about the details of the trip and tell them about changes indicated on the schedule. In your presentation, do not just read the information printed, but present it as if you were talking to a group of people. You will have one minute to plan your presentation and will be told when to begin speaking. (90 seconds)

            Holistic scoring taxonomies such as these imply a number of abilities that comprise “effective” communication and “competent” performance of the task. The original version of the TSE (1987) specified three contributing factors to a final score on “overall comprehensibility”: pronunciation, grammar, and fluency. The current scoring scale of 20 to 60 incorporates task performance, function, appropriateness, and coherence as well as the form-focused factors. From reported scores, institutions are left to determine their own threshold levels of acceptability, but because scoring is holistic, they will not receive an analytic score of how each factor breaks down (see Douglas & Smith, 1997, for further information). Classroom teachers who propose to model oral production assessments after the tasks on the TSE must, in order to provide some washback effect, be more explicit in analyzing the various components of test-takers’ output. Such scoring rubrics are presented in the next section.
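For classroom use, a teacher might keep such factor scores separate rather than reporting only a holistic number, as in this minimal Python sketch. The factor list follows the factors named above, but the 1-5 ratings, the unweighted mean, and the report format are hypothetical choices of ours, not part of TSE scoring.

# Minimal sketch: an analytic breakdown for classroom washback.
FACTORS = ["task performance", "function", "appropriateness", "coherence",
           "pronunciation", "grammar", "fluency"]

def analytic_report(ratings):
    """Report each factor's 1-5 rating, then a simple unweighted mean."""
    lines = [f"{factor}: {ratings[factor]}/5" for factor in FACTORS]
    mean = sum(ratings[f] for f in FACTORS) / len(FACTORS)
    lines.append(f"overall (unweighted mean): {mean:.1f}/5")
    return "\n".join(lines)

sample = {"task performance": 4, "function": 4, "appropriateness": 3,
          "coherence": 3, "pronunciation": 4, "grammar": 3, "fluency": 4}
print(analytic_report(sample))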
            Following is a summary of information on the TSE:
Test of Spoken English (TSE®)

Producer:                        Educational Testing Service, Princeton, NJ
Objectives:                      To test oral production skills of non-native English speakers
Primary market:           Primarily used for screening international teaching assistants in universities in the United States; a growing secondary market is certifying health professionals in the United States
Type:                               Audiotaped with written, graphic, and spoken stimuli
Response modes:           Oral tasks, connected discourse
Specification:                  (see sample items above)
Time allocation:              20 minutes
Internet access:               http://www.toefl.org/tse/tseindx.html
     
C. DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING
The final categories of oral production assessment, interactive and extensive speaking, include tasks that involve relatively long stretches of discourse: interviews, role plays, speeches, telling longer stories, and extended explanations and translations. The obvious difference between the two sets of tasks is the degree of interaction with an interlocutor. Also, interactive tasks are what some would describe as interpersonal, while the final category includes more transactional speech events.
1. Interview
When “oral production assessment” is mentioned, the first thing that comes to mind is an oral interview: a test administrator and a test-taker sit down in a direct face-to-face exchange and proceed through a protocol of questions and directives. The interview, which may be tape-recorded for re-listening, is then scored on one or more parameters such as accuracy in pronunciation and/or grammar, vocabulary usage, fluency, sociolinguistic/pragmatic appropriateness, task accomplishment, and even comprehension.
            Interviews can vary in length from perhaps five to forty-five minutes, depending on their purpose and context. Placement interviews, designed to get a quick spoken sample from a student in order to verify placement into a course, may need only five minutes if the interviewer is trained to evaluate the output accurately. Longer comprehensive interviews such as the OPI (see the next section) are designed to cover predetermined oral production contexts and may require the better part of an hour.
            Every effective interview contains a number of mandatory stages. Two decades ago, Michael Canale (1984) proposed a framework for oral proficiency testing that has withstood the test of time. He suggested that test-takers will perform at their best if they are led through four stages:
  1. Warm-up. In a minute or so of preliminary small talk, the interviewer directs mutual introductions, helps the test-taker become comfortable with the situation, apprises the test-taker of the format, and allays anxieties. No scoring of this phase takes place.
  2. Level check. Through a series of preplanned questions, the interviewer stimulates the test-taker to respond using expected or predicted forms and functions. If, for example, from previous test information, grades, or other data, the test-taker has been judged to be a “Level 2” (see below) speaker, the interviewer’s prompts will attempt to confirm this assumption. The response may take very simple or very complex form, depending on the entry level of the learner. Questions are usually designed to elicit grammatical categories (such as past tense or subject-verb agreement), discourse structure (a sequence of events), vocabulary usage, and/or sociolinguistic factors (politeness conventions, formal/informal language). This stage could also give the interviewer a picture of the test-taker’s extroversion, readiness to speak, and confidence, all of which may be of significant consequence in the interview’s result. Linguistic target criteria are scored in this phase. If this stage is lengthy, a tape-recording of the interview is important.
  3. Probe. Probe questions and prompts challenge test-takers to go to the heights of their ability through increasingly difficult questions. Probe questions may be complex in their framing and/or complex in their cognitive and linguistic demand. Through probe items, the interviewer discovers the ceiling or limitation of the test-taker’s proficiency. This need not be a separate stage entirely, but might be a set of questions that are interspersed into the previous stage. At the lower levels of proficiency, probe items may simply demand a higher range of vocabulary or grammar from the test-taker than predicted. At the higher levels, probe items will typically ask the test-taker to give an opinion or a value judgment, to discuss his or her field of specialization, to recount a narrative, or to respond to questions that are worded in complex form. Responses to probe questions may be scored, or they may be ignored if the test-taker displays an inability to handle such complexity.
  4. Wind-down. This final phase of the interview is simply a short period of time during which the interviewer encourages the test-taker to relax with some easy questions, sets the test-taker’s mind at ease, and provides information about when and where to obtain the results of the interview. This part is not scored.
The suggested set of content specifications for an oral interview (below) may serve as sample questions that can be adapted to individual situations.
Oral interview content specifications
Warm-up:
  1. Small talk
Level check:
The test-taker . . .
  1. Answers wh-questions.
  2. Produces a narrative without interruptions.
  3. Reads a passage aloud.
  4. Tells how to make something or do something.
  5. Engages in a brief, controlled, guided role play.
Probe:
The test-taker . . .
  1. Responds to interviewer’s questions about something the test-taker doesn’t know and is planning to include in an article or paper.
  2. Talks about his or her own field of study or profession.
  3. Engages in a longer, more open-ended role play (for example, simulates a difficult or embarrassing circumstance) with the interviewer.
  4. Gives an impromptu presentation on some aspect of the test-taker’s field.
Wind-down:
  1.   Feelings about the interview, information on results, further questions.
            Here are some possible questions, probes, and comments that fit those specifications.

Sample questions for the four stages of an oral interview
1.      Warm-up:
How are you?
What’s your name?
What country are you from? What [city town]?
Let me tell you about this interview.
2.      Level check:
How long have you been in this [country, city]?
Tell me about your family.
What is your [academic major, professional interest, job]?
How long have you been working at your [degree, job]?
Describe your home [city, town] to me.
How do you like your home [city, town]?
What are your hobbies or interests? (What do you do in your spare time?)
Why do you like your [hobby, interest]?
Have you traveled to another country besides this one and your home country?
Tell me about that country.
Compare your home [city, town] to another [city, town].
What is your favorite food?
Tell me how to [make, do] something you know well.
What will you be doing ten years from now?
I’d like you to ask me some questions.
Tell me about an exciting or interesting experience you’ve had.
Read the following paragraph, please. [test-taker reads aloud]
Pretend that you are ______and I am a______. [guided role play follows]
3.      Probe:
What are your goals for learning English in this program?
Describe your [academic field, job] to me. What do you like and dislike about it?
What is your opinion of [a recent headline news event]?
Describe someone you greatly respect, and tell me why you respect that person.
If you could redo your education all over again, what would you do differently?
How do eating habits and customs reflect the culture of the people of a country?
If you were [president, prime minister] of your country, what would you like to change about your country?
What career advice would you give to your younger friends?
Imagine you are writing an article on a topic you don’t know very much about. Ask me some questions about that topic.
You are in a shop that sells expensive glassware. Accidentally you knock over an expensive vase, and it breaks. What will you say to the store owner? [Interviewer role-plays the store owner]
4.      Wind-down:
Did you feel okay about this interview?
What are your plans for [the weekend, the rest of today, the future]?
You’ll get your results from this interview [tomorrow, next week].
Do you have any questions you want to ask me?
It was interesting to talk with you. Best wishes.

The success of an oral interview will depend on
·         Clearly specifying administrative procedures of the assessment (practicality),
·         Focusing the questions and probes on the purpose of the assessment (validity),
·         Appropriately eliciting an optimal amount and quality of oral production from the test-taker (biased for best performance), and
·         Creating a consistent, workable scoring system (reliability).
The last issue is the thorniest. In oral production tasks that are open-ended and that involve a significant level of interaction, the interviewer is forced to make judgments that are susceptible to some unreliability. Through experience, training, and careful attention to the linguistic criteria being assessed, the ability to make such judgments accurately will be acquired. In Table 7.2, a set of descriptions is given for scoring open-ended oral interviews. These descriptions come from an earlier version of the Oral Proficiency Interview and are useful for classroom purposes.
            The test administrator’s challenge is to assign a score, ranging from 1 to 5, for each of the six categories indicated above. It may look easy to do, but in reality the line of distinction between levels is quite difficult to pinpoint. Some training or at least a good deal of interviewing experience is required to make accurate assessments of oral production in the six categories. Usually the six scores are then amalgamated into one holistic score, a process that might not be a simple mathematical average if you wish to put more weight on some categories than on others.
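As an illustration of such a weighted amalgamation, here is a minimal Python sketch. The six category labels follow the FSI-style table referred to above; the particular weights are hypothetical and would depend on the purpose of the interview.

# Minimal sketch: weighted amalgamation of six 1-5 category scores.
def holistic_score(scores, weights):
    """Weighted average of the category scores, still on the 1-5 scale."""
    total_weight = sum(weights.values())
    return sum(scores[c] * weights[c] for c in scores) / total_weight

scores  = {"grammar": 3, "vocabulary": 4, "comprehension": 4,
           "fluency": 3, "pronunciation": 3, "task": 4}
weights = {"grammar": 1, "vocabulary": 1, "comprehension": 2,
           "fluency": 2, "pronunciation": 1, "task": 1}  # weight what matters most

print(round(holistic_score(scores, weights), 1))  # -> 3.5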
            This five-point scale, once known as “FSI levels” (because they were first advocated by the Foreign Service Institute in Washington, D.C.), is still in popular use among U.S. government foreign service staff for designating proficiency in a foreign language. To complicate the scoring somewhat, the five-point holistic scoring categories have historically been subdivided into “pluses” and “minuses,” as indicated in Table 7.3. To this day, even though the official nomenclature has now changed (see the OPI description below), in-group conversations refer to colleagues and co-workers by their FSI level: “Oh, Bob, yeah, he’s a good 3+ in Turkish; he can easily handle that assignment.”
            A variation on the usual one-on-one format with one interviewer and one test-taker is to place two test-takers at a time with the interviewer. An advantage of a two-on-one interview is the practicality of scheduling twice as many candidates in the same time frame, but more significant is the opportunity for student-student interaction. By deftly posing questions, problems, and role plays, the interviewer can maximize the output of the test-takers while lessening the need for his or her own output. A further benefit is the probable increase in authenticity when two test-takers can actually converse with each other. Disadvantages are equalizing the output between the two test-takers, discerning the interaction effect of unequal comprehension and production abilities, and scoring two people simultaneously.
2. Role Play
            Role playing is a popular pedagogical activity in communicative language-teaching classes. As an assessment device, role play opens some windows of opportunity for test-takers to use discourse that might otherwise be difficult to elicit. The test administrator must determine the assessment objectives of the role play, then devise a scoring technique that appropriately pinpoints those objectives.
3. Discussions and Conversations
            As formal assessment devices, discussions and conversations are difficult to specify and score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessments may not provide. Discussions may be especially appropriate tasks through which to elicit and observe such abilities as:
  • Topic nomination, maintenance, and termination;
  • Attention getting, interrupting, floor holding, control;
  • Clarifying, questioning, paraphrasing;
  • Comprehension signals (nodding, “uh-huh,” “hmm,” etc.);
  • Negotiating meaning;
  • Intonation patterns for pragmatic effect;
  • Kinesics, eye contact, proxemics, body language; and
  • Politeness, formality, and other sociolinguistic factors.
Assessing the performance of participants through scores or checklists should be carefully designed to suit the objectives of the observed discussion. Of course, discussion is an integrative task, so it is also advisable to give some cognizance to comprehension performance in evaluating learners.
4. Games
            Among informal assessment devices are a variety of games that directly involve language production. Consider the following types:
  1. Tinkertoy game
  2. Crossword puzzles
  3. Information gap grids
  4. City maps.
As assessments, the key is to specify a set of criteria and a reasonably practical and reliable scoring method. The benefit of such an informal assessment may not be as much in a summative evaluation as in its formative nature, with washback for the students.
D. ORAL PROFICIENCY INTERVIEW (OPI)
            The best-known oral interview format is one that has gone through a considerable metamorphosis over the last half-century, the Oral Proficiency Interview (OPI). Originally known as the Foreign Service Institute (FSI) test, the OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on the Teaching of Foreign Languages (ACTFL). The OPI is widely used across dozens of languages around the world.
            In a series of structured tasks, the OPI is carefully designed to elicit pronunciation, fluency and integrative abilities, sociolinguistic and cultural knowledge, grammar, and vocabulary. Performance is judged by the examiner to be at one of ten possible levels on the ACTFL-designated proficiency guidelines for speaking: Superior; Advanced-high, mid, low; Intermediate-high, mid, low; Novice-high, mid, low.
            Bachman (1988, p. 149) pointed out that the validity of the OPI simply cannot be demonstrated “because it confounds abilities with elicitation procedures in its design, and it provides only a single rating, which has no basis in either theory or research.”
            Meanwhile, a great deal of experimentation continues to be conducted to design better oral proficiency testing methods (Bailey, 1998; Young & He, 1998). With ongoing critical attention to issues of language assessment in the years to come, we may be able to solve some of the thorny problems of how best to elicit oral production in authentic contexts and to create valid and reliable scoring methods.
C.DESIGNING ASSESSMENTS: EXTENSIVE SPEAKING
1.Oral Presentations
            For oral presentations, a checklist or grid is a common means of scoring or evaluation. Holistic scores are tempting to use for their apparent practicality, but they may obscure the variability of performance across several subcategories, especially the two major components of content and delivery.
            A checklist is reasonably practical. Its reliability can vary if clear standards for scoring are not maintained. Its authenticity can be supported in that all of the items on the list contribute to an effective presentation. The washback of a checklist will be enhanced by written comments from the teacher, a conference with the teacher, peer evaluations using the same form, and self-assessment.
2.Picture-Cued Story-Telling
            One of the most common techniques for eliciting oral production is through visual stimuli: pictures, photographs, diagrams, and charts. At this level we consider a picture or a series of pictures as a stimulus for a longer story or description.
            It is always tempting to throw any picture sequence at test-takers and have them talk for a minute or more about the pictures. But as is true of every assessment of speaking ability, the objective of eliciting narrative discourse needs to be clear. Your criteria for scoring also need to be clear about what you are hoping to assess. Refer back to some of the guidelines suggested under the section on oral interviews or the OPI for some general suggestions on scoring.
3.Retelling a Story or News Event
            In this type of task, test-takers hear or read a story or news event that they are then asked to retell. This differs from the paraphrasing task in that it involves a longer stretch of discourse and a different genre. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequence and relationships of events, stress and emphasis patterns, “expression” in the case of a dramatic story), fluency, and interaction with the hearer. Scoring should meet the intended criteria, of course.
4.Translation (of Extended Prose)
            The advantage of translating a longer text is in the control of the content, vocabulary, and, to some extent, the grammatical and discourse features. The disadvantage is that translation of longer texts is a highly specialized skill for which some individuals obtain post-baccalaureate degrees. To judge a nonspecialist’s oral language ability on such a skill may be completely invalid, especially if the test-takers have not engaged in translation at this level. Criteria for scoring should therefore take into account not only the purpose in stimulating a translation but the possibility of errors that are unrelated to oral production ability.
Chapter 3
ASSESSING WRITING
Not many centuries ago, writing was a skill that was the exclusive domain of scribes and scholars in educational or religious institutions. Almost every aspect of everyday life for “common” people was carried out orally. Business transactions, records, legal documents, political and military agreements – all were written by specialists whose vocation it was to render language into the written word. Today, the ability to write has become an indispensable skill in our global literate community. Writing skill, at least at rudimentary levels, is a necessary condition for achieving employment in many walks of life and is simply taken for granted in literate cultures.
In the field of second language teaching, only a half-century ago experts were saying that writing was primarily a convention for recording speech and for reinforcing grammatical and lexical features of language. Now we understand the uniqueness of writing as a skill with its own features and conventions. We also fully understand the difficulty of learning to write “well” in any language, even our own native language. Every educated child in developed countries learns the rudiments of writing in his or her native language, but very few learn to express themselves clearly with logical, well-developed organization that accomplishes an intended purpose. And yet we expect second language learners to write coherent essays with artfully chosen rhetorical and discourse devices!
With such a monumental goal, the job of teaching writing has occupied the attention of papers, articles, dissertations, books, and even separate professional journals exclusively devoted to writing in a second language. It follows logically that the assessment of writing is no simple task. As you consider assessing students’ writing ability, as usual you need to be clear about your objective or criterion. What is it you want to test: handwriting ability? Correct spelling? Writing sentences that are grammatically correct? Paragraph construction? Logical development of a main idea? All of these, and more, are possible objectives, and each objective can be assessed through a variety of tasks, which we will examine in this chapter.
Before looking at specific tasks, we must scrutinize the different genres of written language (so that context and purpose are clear), types of writing (so that stages of the development of writing ability are accounted for), and micro- and macroskills of writing (so that objectives can be pinpointed precisely).
A.GENRES OF WRITTEN LANGUAGE
Our earlier discussion of assessing reading listed more than 50 written language genres. The same classification scheme is reformulated here to include the most common genres that a second language writer might produce, within and beyond the requirements of a curriculum. Even though this list is slightly shorter, you should be aware of the surprising multiplicity of options of written genres that second language learners need to acquire.
1.      Academic Writing
Papers and general subject reports; essays, compositions; academically focused journals; short-answer test responses; technical reports (e.g., lab reports); theses, dissertations
2.      Job-related writing
Messages (e.g., phone messages); letters/emails; memos (e.g., interoffice); reports (e.g., job evaluations, project reports); schedules, labels, signs; advertisements, announcements; manuals
3.      Personal writing
Letters, emails, greeting cards, invitations; messages, notes; calendar entries, shopping lists, reminders; financial documents (e.g., checks, tax forms, loan applications); forms, questionnaires, medical reports, immigration documents; diaries, personal journals; fiction (e.g., short stories, poetry)
 
Genres of writing
B.TYPES OF WRITING PERFORMANCE
Four categories of written performance that capture the range of written production are considered here. Each category resembles the categories defined for the other three skills, but these categories, as always, reflect the uniqueness of the skill area.
  1. Imitative. To produce written language, the learner must attain skills in the fundamental, basic tasks of writing letters, words, punctuation, and very brief sentences. This category includes the ability to spell correctly and to perceive phoneme-grapheme correspondences in the English spelling system. It is a level at which learners are trying to master the mechanics of writing. At this stage, form is the primary if not exclusive focus, while context and meaning are of secondary concern.
  2. Intensive (controlled). Beyond the fundamentals of imitative writing are skills in producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to the length of a sentence. Meaning and context are of some importance in determining correctness and appropriateness, but most assessment tasks are more concerned with a focus on form and are rather strictly controlled by the test design.
  3. Responsive. Here, assessment tasks require learners to perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs. Tasks respond to pedagogical directives, lists of criteria, outlines, and other guidelines. Genres of writing include brief narratives and descriptions, short reports, lab reports, summaries, brief responses to reading, and interpretations of charts and graphs. Under specified conditions, the writer begins to exercise some freedom of choice among alternative forms of expression of ideas. The writer has mastered the fundamentals of sentence-level grammar and is more focused on the discourse conventions that will achieve the objectives of the written text. Form-focused attention is mostly at the discourse level, with a strong emphasis on context and meaning.
  4. Extensive. Extensive writing implies successful management of all the processes and strategies of writing for all purposes, up to the length of an essay, a term paper, a major research project report, or even a thesis. Writers focus on achieving a purpose, organizing and developing ideas logically, using details to support or illustrate ideas, demonstrating syntactic and lexical variety, and, in many cases, engaging in the process of multiple drafts to achieve a final product. Focus on grammatical form is limited to occasional editing or proofreading of a draft.
C.MICRO- AND MACROSKILLS OF WRITING
We turn once again to a taxonomy of micro- and macroskills that will assist you in defining the ultimate criterion of an assessment procedure. The earlier microskills apply more appropriately to imitative and intensive types of writing tasks, while the macroskills are essential for the successful mastery of responsive and extensive writing.
Microskills
1.      Produce graphemes and orthographic patterns of English.
2.      Produce writing at an efficient rate of speed to suit the purpose.
3.      Produce an acceptable core of words and use appropriate word order patterns.
4.      Use acceptable grammatical systems (e.g., tense, agreement, pluralization), patterns and rules.
5.      Express a particular meaning in different grammatical forms.
6.      Use cohesive devices in written discourse.
Macroskills
7.      Use the rhetorical forms and conventions of written discourse.
8.      Appropriately accomplish the communicative functions of written texts according to form and purpose.
9.      Convey links and connections between events, and communicate such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
 
10. Distinguish between literal and implied meanings when writing.
11. Correctly convey culturally specific references in the context of the written text.
12. Develop and use a battery of writing strategies, such as accurately assessing the audience’s interpretation, using prewriting devices, writing with fluency in the first drafts, using paraphrases and synonyms, soliciting peer and instructor feedback, and using feedback for revising and editing.

Micro- and macroskills of writing
D.DESIGNING ASSESSMENT TASKS: IMITATIVE WRITING
With the recent worldwide emphasis on teaching English at young ages, it is tempting to assume that every English learner knows how to handwrite the Roman alphabet. Such is not the case. Many beginning-level English learners, from young children to older adults, need basic training in and assessment of imitative writing: the rudiments of forming letters, words, and simple sentences. We examine this level of writing first.
1.Tasks in [Hand] Writing Letters, Words, and Punctuation
First, a comment should be made on the increasing use of personal and laptop computers and handheld instruments for creating written symbols. Handwriting has the potential of becoming a lost art, as even very young children are more and more likely to use a keyboard to produce writing. Making the shapes of letters and other symbols is now more a question of learning typing skills than of training the muscles of the hands to use a pen or pencil. Nevertheless, for all practical purposes, handwriting remains a skill of paramount importance within the larger domain of language assessment.
A limited variety of task types are commonly used to assess a person’s ability to produce written letters and symbols. A few of the more common types are described here.
  1. Copying. There is nothing innovative or modern about directing a test-taker to copy letters or words. The test-taker is simply shown a set of letters or words and asked to reproduce them.
  2. Listening cloze selection tasks. These tasks combine dictation with a written script that has a relatively frequent deletion ratio (every fourth or fifth word, perhaps). The test sheet provides a list of missing words from which the test-taker must select. The purpose at this stage is not to test spelling but to give practice in writing. To increase the difficulty, the list of words can be deleted, but then spelling might become an obstacle. (A sketch of how such a task might be generated appears after this list.)
  3. Picture-cued tasks. Familiar pictures are displayed, and test-takers are told to write the word that the picture represents. Assuming no ambiguity in identifying the picture (cat, hat, chair, table, etc.), no reliance is placed on aural comprehension for successful completion of the task.
  4. Form completion tasks. A variation on pictures is the use of a simple form (registration, application, etc.) that asks for name, address, phone number, and other data. Assuming, of course, that prior classroom instruction has focused on filling out such forms, this task becomes an appropriate assessment of simple tasks such as writing one’s name and address.
  5. Converting numbers and abbreviations to words. Some tests have a section on which numbers are written – for example, hours of the day, dates, or schedules – and test-takers are directed to write out the numbers. This task can serve as a reasonably reliable method to stimulate handwritten English. It lacks authenticity, however, in that people rarely write out such numbers (except in writing checks), and it is more of a reading task (recognizing numbers) than a writing task. If you plan to use such a method, be sure to specify exactly what the criterion is, and then proceed with some caution. Converting abbreviations to words is more authentic: we actually do have occasions to write out days of the week, months, and words like street, boulevard, telephone, and April (months, of course, are often abbreviated with numbers).
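Since the deletion procedure in item 2 is mechanical, it is easy to automate. The sketch below is a minimal illustration under stated assumptions: the function name make_listening_cloze is invented, a fixed every-fifth-word deletion ratio is used, and the word bank is shuffled so that its order gives no clue.

```python
import random

def make_listening_cloze(script, ratio=5, seed=0):
    """Blank out every `ratio`-th word of the script and return the
    gapped test sheet plus a shuffled word bank to select from."""
    words = script.split()
    bank = []
    for i in range(ratio - 1, len(words), ratio):
        bank.append(words[i])
        words[i] = "______"
    random.Random(seed).shuffle(bank)
    return " ".join(words), bank

gapped, bank = make_listening_cloze(
    "The museum opens at nine and visitors may join a guided tour at ten")
print(gapped)  # The museum opens at ______ and visitors may join ______ guided tour at ten
print(bank)    # ['a', 'nine'] (order depends on the seed)
```

The teacher would read the full script aloud while test-takers fill each blank from the word bank, keeping the focus on writing rather than on spelling.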
2.Spelling Tasks and Detecting Phoneme-Grapheme Correspondences
A number of task types are in popular use to assess the ability to spell words correctly and to process phoneme-grapheme correspondences.
  1. Spelling tests. In a traditional, old-fashioned spelling test, the teacher dictates a simple list of words, one word at a time, followed by the word in a sentence, repeated again, with a pause for test-takers to write the word. Scoring emphasizes correct spelling. You can help to control for listening errors by choosing words that the students have encountered before – words that they have spoken or heard in their class.
  2. Picture-cued tasks. Pictures are displayed with the objective of focusing on familiar words whose spelling may be unpredictable. Items are chosen according to the objectives of the assessment, but this format is an opportunity to present some challenging words and word pairs: boot/book, read/reed, bit/bite, etc.
  3. Multiple-choice techniques. Presenting words and phrases in the form of a multiple-choice task risks crossing over into the domain of assessing reading, but if the items have a follow-up writing component, they can serve as formative reinforcement of spelling conventions. They might be more challenging with the addition of homonyms.
  4. Matching phonetic symbols. If students have become familiar with the phonetic alphabet, they can be shown phonetic symbols and asked to write the correctly spelled word alphabetically. This works best with letters that do not have a one-to-one correspondence with the phonetic symbol. On the answer key for such a task, the answers, which of course do not appear on the test sheet, can be included in brackets for the teacher’s reference.
Such a task risks confusing students who don’t recognize the phonetic alphabet or use it in their daily routine. Opinion is mixed on the value of using phonetic symbols at the literacy level. Some claim it helps students to perceive the relationship between phonemes and graphemes. Others caution against using yet another system of symbols when the alphabet already poses a challenge, especially for adults for whom English is the only language they have learned to read or write.
E.DESIGNING ASSESSMENT TASKS: INTENSIVE (CONTROLLED) WRITING
This next level of writing is what second language teacher training manuals have for decades called controlled writing. It may also be thought of as form-focused writing, grammar writing, or simply guided writing. A good deal of writing at this level is display writing as opposed to real writing: students produce language to display their competence in grammar, vocabulary, or sentence formation, and not necessarily to convey meaning for an authentic purpose. The traditional grammar/vocabulary test has plenty of display writing in it, since the response mode demonstrates only the test-taker’s ability to combine or use words correctly. No new information is passed on from one person to the other.
1.Dictation and Dicto-Comp
Dictation was described earlier as an assessment of the integration of listening and writing, but it was clear that the primary skill being assessed is listening. Because of its response mode, however, it deserves a second mention in this chapter. Dictation is simply the rendition in writing of what one hears aurally, so it could be classified as an imitative type of writing, especially since a proportion of the test-taker’s performance centers on correct spelling. Also, because the test-taker must listen to stretches of discourse and in the process insert punctuation, dictation of a paragraph or more can arguably be classified as a controlled or intensive form of writing. A form of controlled writing related to dictation is the dicto-comp. Here, a paragraph is read at normal speed, usually two or three times; then the teacher asks students to rewrite the paragraph from the best of their recollection. In one of several variations of the dicto-comp technique, the teacher, after reading the passage, distributes a handout with key words from the paragraph, in sequence, as cues for the students. In either case, the dicto-comp is genuinely classified as an intensive, if not a responsive, writing task. Test-takers must internalize the content of the passage, remember a few phrases and lexical items as key words, then recreate the story in their own words.
2.Grammatical Transformation Tasks
In the days of structural paradigms of language teaching, with slot-filler techniques and slot substitution drills, the practice of making grammatical transformations, orally or in writing, was very popular. To this day, language teachers have used this technique as an assessment task, ostensibly to measure grammatical competence. Numerous versions of the task are possible:
·         Change the tenses in a paragraph.
·         Change full forms of verbs to reduced forms (contractions).
·         Change statements to yes/no or wh-questions.
·         Change questions into statements.
·         Combine two sentences into one using a relative pronoun.
·         Change direct speech to indirect speech.
·         Change from active to passive voice.
The list of possibilities is almost endless. The tasks are virtually devoid of any meaningful value. Sometimes test designers attempt to add authenticity by providing a context (“Today Doug is doing all these things. Tomorrow he will do the same things again. Write about what Doug will do tomorrow by using the future tense”), but this is just a backdrop for a written substitution task. On the positive side, grammatical transformation tasks are easy to administer and are therefore practical, quite high in scorer reliability, and arguably tap into a knowledge of grammatical forms that will be performed through writing. If you are interested only in a person’s ability to produce the forms, then such tasks may prove to be justifiable.
3.Picture-Cued Tasks
A variety of picture-cued controlled tasks have been used in English classrooms around the world. The main advantage of this technique is in detaching the almost ubiquitous reading and writing connection and offering instead a nonverbal means to stimulate written responses.
  1. Short sentences. A drawing of some simple action is shown; the test-taker writes a brief sentence.
  2. Picture description. A somewhat more complex picture may be presented showing, say, a person reading on a couch, a cat under a table, books and pencils on the table, chairs around the table, a lamp next to the couch, and a picture on the wall over the couch. Test-takers are asked to describe the picture using four of the following prepositions: on, over, under, next to, around. As long as the prepositions are used appropriately, the criterion is considered to be met.
  3. Picture sequence description. A sequence of three to six pictures depicting a story line can provide a suitable stimulus for written production. The pictures must be simple and unambiguous, because an open-ended task at the selective level would give test-takers too many options. If writing the correct grammatical form of a verb is the only criterion, then some test items might include the simple form of the verb below the picture. A time sequence in the task can give the writer some cues.
Even when these kinds of tasks are designed to be controlled at this very simple level, a few different correct responses can be made for each item in the sequence. If your criteria in this task are both lexical and grammatical choice, then you need to design a rating scale to account for variations between completely right and completely wrong in both categories, such as the following.
Scoring scale for controlled writing
2 – grammatically and lexically correct
1 – either grammar or vocabulary is incorrect, but not both
0 – both grammar and vocabulary are incorrect
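This scale reduces naturally to a tiny scoring function. The sketch below is a minimal illustration; the function name and the representation of the rater’s two judgments as booleans are assumptions made for the example.

```python
def controlled_writing_score(grammar_ok, vocabulary_ok):
    """Apply the 2-1-0 scale: 2 if both grammar and vocabulary are
    correct, 1 if exactly one is incorrect, 0 if both are incorrect."""
    return int(grammar_ok) + int(vocabulary_ok)

# A correct verb form with a wrong lexical choice earns 1 point.
assert controlled_writing_score(grammar_ok=True, vocabulary_ok=False) == 1
assert controlled_writing_score(grammar_ok=True, vocabulary_ok=True) == 2
```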
4.Vocabulary Assessment Tasks
Most vocabulary study is carried out through reading. A number of assessments of reading recognition of vocabulary were discussed in the previous chapter: multiple-choice techniques, matching, picture-cued identification, cloze techniques, guessing the meaning of a word in context, etc. The major techniques used to assess vocabulary in writing are (a) defining and (b) using a word in a sentence. The latter is the more authentic, but even that task is constrained by a contrived situation in which the test-taker, usually in a matter of seconds, has to come up with an appropriate sentence, which may or may not indicate that the test-taker “knows” the word.
Read (2000) suggested several types of items for assessment of basic knowledge of the meaning of a word, collocational possibilities, and derived morphological forms. His example centered on the word interpret, as follows:
Test-takers read:
1.      Write two sentences, A and B. In each sentence, use the two words given.
A.     interpret, experiment
B.     interpret, language
2.      Write three words that can fit in the blank: to interpret a(n) ________.
i.   ________
ii.  ________
iii. ________
3.      Write the correct ending for the word in each of the following sentences:
Someone who interprets is an interpret________.
Something that can be interpreted is interpret________.
Someone who interprets gives an interpret________.
 
Vocabulary writing tasks (Read, 2000, p. 179)
Vocabulary assessment is clearly form-focused in the above tasks, but the procedures are creatively linked by means of the target word, its collocations, and its morphological variants. At the responsive and extensive levels, where learners are called upon to create coherent paragraphs, performance obviously becomes more authentic, and lexical choice is one of several possible components of the evaluation of extensive writing.
5.Ordering Tasks
One task at the sentence level may appeal to those who are fond of word games and puzzles: ordering (or reordering) a scrambled set of words into a correct sentence. Here is the way the item format appears:
Test-takers read:
Put the words below into the correct order to make a sentence:
1.      cold / winter / is / weather / the / in / the
2.      studying / what / you / are
3.      next / clock / the / the / is / picture / to
Test-takers write:
1.   The weather is cold in the winter.
2.   What are you studying?
3.   The clock is next to the picture.
 
Reordering words in a sentence
While this somewhat inauthentic task generates writing performance and may be said to tap into grammatical word-ordering rules, it presents a challenge to test-takers whose learning styles do not dispose them to logical-mathematical problem solving. If sentences are kept very simple (such as #2), with perhaps no more than four or five words, if only one possible sentence can emerge, and if students have practiced the technique in class, then some justification emerges. But once again, as with many writing techniques, this task involves as much, if not more, reading performance as writing.
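Items of this kind can be produced mechanically from a list of answer sentences. The sketch below is a minimal, hypothetical generator: the function name scramble is invented, and rejecting the unscrambled order is the only safeguard applied.

```python
import random

def scramble(sentence, seed=1):
    """Turn an answer sentence into a slash-separated ordering prompt,
    making sure the scrambled order differs from the answer itself."""
    words = sentence.rstrip(".?!").lower().split()
    rng = random.Random(seed)
    shuffled = words[:]
    while shuffled == words:
        rng.shuffle(shuffled)
    return " / ".join(shuffled)

print(scramble("What are you studying?"))  # e.g., studying / you / what / are
```

Keeping the answer sentences to four or five words, as suggested above, limits the number of alternative orderings an item could accidentally accept.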
6.Short-Answer and Sentence Completion Tasks
Some types of short-answer tasks were discussed in Chapter 8 because of the heavy participation of reading performance in their completion. Such items range from very simple and predictable to somewhat more elaborate responses. Look at the range of possibilities.
Test-takers see:
1.      Alicia: Who’s that?
Tony: ________ Gina.
Alicia: Where’s she from?
Tony: ________ Italy.
2.      Jennifer: ________?
Kathy: I’m studying English.
3.      Restate the following sentences in your own words, using the cue word given. You may need to change the meaning of the sentence a little.
3a. I never miss a day of school. (always)
3b. I’m pretty healthy most of the time. (seldom)
3c. I play tennis twice a week. (sometimes)
4.      You are in the kitchen helping your roommate cook. You need to ask questions about quantities. Ask a question using how much (#4a) and a question using how many (#4b), using nouns like sugar, pounds, flour, onions, eggs, cups.
4a. ________
4b. ________
5.      Look at the schedule of Roberto’s week. Write two sentences describing what Roberto does, using the words before (#5a) and after (#5b).
5a. ________
5b. ________
6.      Write three sentences describing your preferences: #6a: a big, expensive car or a small, cheap car; #6b: a house in the country or an apartment in the city; #6c: money or good health.
6a. ________
6b. ________
6c. ________
Limited response writing tasks
The reading-writing connection is apparent in the first three item types but has less of an effect in the last three, where reading is necessary in order to understand the directions but is not crucial in creating sentences. Scoring on a 2-1-0 scale (as described above) may be the most appropriate way to avoid agonizing over the appropriateness of a response.
F.ISSUES IN ASSESSING RESPONSIVE AND EXTENSIVE WRITING
Responsive writing creates the opportunity for test-takers to offer an array of possible creative responses within a pedagogical or assessment framework: test-takers are “responding” to a prompt or assignment. Freed from the strict control of intensive writing, learners can exercise a number of options in choosing vocabulary, grammar, and discourse, but with some constraints and conditions. Criteria now begin to include the discourse and rhetorical conventions of paragraph structure and of connecting two or three such paragraphs in texts of limited length. The learner is responsible for accomplishing a purpose in writing, for developing a sequence of connected ideas, and for empathizing with an audience.
The genres of text that are typically addressed here are
  • Short reports (with structured formats and conventions);
  • Responses to the reading of an article or story;
  • Summaries of articles or stories;
  • Brief narratives or descriptions; and
  • Interpretations of graphs, tables, and charts.
It is here that writers become involved in the art (and science) of composing, or real writing, as opposed to display writing.
Extensive, or “free,” writing, which is amalgamated into our discussion here, takes all the principles and guidelines of responsive writing and puts them into practice in longer texts such as full-length essays, term papers, project reports, and theses and dissertations. In extensive writing, however, the writer is given even more freedom to choose: topics, length, style, and perhaps even conventions of formatting are less constrained than in the typical responsive writing exercise. At this stage, all the rules of effective writing come into play, and the second language writer is expected to meet all the standards applied to native language writers.
Both responsive and extensive writing tasks are the subject of some classic, widely debated assessment issues that take on a distinctly different flavor from those at the lower-end production of writing.
  1. Authenticity. Authenticity is a trait that is given special attention: if test-takers are being asked to perform a task, its face and content validity need to be assured in order to bring out the best in the writer. A good deal of writing performance in academic contexts is constrained by the pedagogical necessities of establishing the basic building blocks of writing; we have looked at assessment techniques that address those foundations. But once those fundamentals are in place, the would-be writer is ready to fly out of the protective nest of the writing classroom and assume his or her own voice. Offering that freedom to learners requires the setting of authentic real-world contexts in which to write. The teacher becomes less of an instructor and more of a coach or facilitator. Assessment therefore is typically formative, not summative, and positive washback is more important than practicality and reliability.
  2. Scoring. Scoring is the thorniest issue at these final two stages of writing. With so many options available to the learner, each evaluation by a test administrator needs to be finely attuned not just to how the writer strings words together (the form) but also to what the writer is saying (the function of the text). The quality of writing (its impact and effectiveness) becomes as important as, if not more important than, all the nuts and bolts that hold it together. How are you to score such creative production, some of which is more artistic than scientific? A discussion of different scoring options continues below, followed by a reminder that responding and editing are nonscoring options that yield washback to the writer.
  3. Time. Yet another assessment issue surrounds the unique nature of writing: it is the only skill in which the language producer is not necessarily constrained by time, which implies the freedom to process multiple drafts before the text becomes a finished product. Like a sculptor creating an image, the writer can take an initial rough conception of a text and continue to refine it until it is deemed presentable to the public eye. Virtually all real writing of prose texts presupposes an extended time period for it to reach its final form, and therefore the revising and editing processes are implied. Responsive writing, along with the next category of extensive writing, often relies on this essential drafting process for its ultimate success.
How do you assess writing ability within the confines of traditional, formal assessment procedures that are almost always, by logistical necessity, timed? We have a whole testing industry that has based large-scale assessment of writing on the premise that the timed impromptu format is a valid method of assessing writing ability. Is this an authentic format? Can a language learner – or a native speaker, for that matter – adequately perform writing tasks within the confines of a brief timed period of composition? Is that hastily written product an appropriate reflection of what that same test-taker might produce after several drafts of the same work? Does this format favor fast writers at the expense of slower but possibly equally good or better writers? Alderson (2000) and Weigle (2002) both cited this as one of the most pressing unresolved issues in the assessment of writing today. We will return to this question below.
Because of the complexity of assessing responsive and extensive writing, the discussion that ensues will now have a different look from the one used in the previous three chapters. Four major topics will be addressed: (1) a few fundamental task types at the lower (responsive) end of the continuum of writing at this level; (2) a description and analysis of the Test of Written English® as a typical timed impromptu test of writing; (3) a survey of methods of scoring and evaluating writing production; and (4) a discussion of the assessment qualities of editing and responding to a series of writing drafts.
G.DESIGNING ASSESSMENT TASKS: RESPONSIVE AND EXTENSIVE WRITING
In this section we consider both responsive and extensive writing tasks. They will be regarded here as a continuum of possibilities ranging from lower-end tasks whose complexity exceeds those in the previous category of intensive or controlled writing, through more open-ended tasks such as writing short reports, essays, summaries, and responses, up to texts of several pages or more.
1.Paraphrasing
One of the more difficult concepts for second language learners to grasp is paraphrasing. The initial step in teaching paraphrasing is to ensure that learners understand its purposes: to say something in one’s own words, to avoid plagiarizing, to offer some variety of expression. With those possible motivations and purposes in mind, the test designer needs to elicit a paraphrase of a sentence or paragraph, usually not more.
Scoring of the test-taker’s response is a judgment call in which the criterion of conveying the same or a similar message is primary, with secondary evaluations of discourse, grammar, and vocabulary. Other components of analytic or holistic scales might be considered as criteria for an evaluation. Paraphrasing is more often a part of informal and formative assessment than of formal, summative assessment, and therefore student responses should be viewed as opportunities for teachers and students to gain positive washback on the art of paraphrasing.
2.Guided Question and Answer
Another lower-order task in this type of writing, which has the pedagogical benefit of guiding a learner without dictating the form of the output, is a guided question-and-answer format in which the test administrator poses a series of questions that essentially serve as an outline of the emergent written text. In the writing of a narrative that the teacher has already covered in a class discussion, the following kinds of questions might be posed to stimulate a sequence of sentences.
Guided writing stimuli
1.      Where did this story take place? [setting]
2.      Who were the people in the story? [characters]
3.      What happened first? And then? [sequence of events]
4.      Why did ________ do ________? [reasons, causes]
5.      What did ________ think about ________? [opinion]
6.      What happened at the end? [climax]
7.      What is the moral of this story? [evaluation]
Guided writing texts, which may be as long as two or three paragraphs, may be scored on either an analytic or a holistic scale (discussed below). Guided writing prompts like these are less likely to appear on a formal test and more likely to serve as a way to prompt initial drafts of writing. The first draft can then undergo the editing and revising stages discussed in the next section of this chapter. A variation on using guided questions is to prompt the test-taker to write from an outline. The outline may be self-created from earlier reading and/or discussion or, less desirably, be provided by the teacher or test administrator. The outline helps to guide the learner through a presumably logical development of ideas that have been given some forethought. Assessment of the resulting text follows the same criteria listed below (#3 in the next section, paragraph construction tasks).
3.Paragraph Construction Tasks
The participation of reading performance is inevitable in writing effective paragraphs. To a great extent, writing is the art of emulating what one reads. You read an effective paragraph; you analyze the ingredients of its success; you emulate it. Assessment of paragraph development takes on a number of different forms:
  1. Topic sentence writing. There is no cardinal rule that says every paragraph must have a topic sentence, but stating the topic through the lead sentence (or a subsequent one) has remained a tried-and-true technique for teaching the concept of a paragraph. Assessment thereof consists of
·         Specifying the writing of a topic sentence,
·         Scoring points for its presence or absence, and
·         Scoring and/or commenting on its effectiveness in stating the topic.
  2. Topic development within a paragraph. Because paragraphs are intended to provide a reader with “clusters” of meaningful, connected thoughts or ideas, another stage of assessment is the development of an idea within a paragraph. Four criteria are commonly applied to assess the quality of a paragraph:
·         The clarity of expression of ideas
·         The logic of the sequence and connections
·         The cohesiveness or unity of the paragraph
·         The overall effectiveness or impact of the paragraph as a whole
  3. Development of main and supporting ideas across paragraphs. As writers string two or more paragraphs together in a longer text (and as we move up the continuum from responsive to extensive writing), the writer attempts to articulate a thesis or main idea with clearly stated supporting ideas. These elements can be considered in evaluating a multi-paragraph essay:
·         Addressing the topic, main idea, or principal purpose
·         Organizing and developing supporting ideas
·         Using appropriate details to undergird supporting ideas
·         Showing facility and fluency in the use of language
·         Demonstrating syntactic variety
4.Strategic Options
Developing main and supporting ideas is the goal for the writer attempting to create an effective text, whether a short one- or two-paragraph piece or an extensive one of several pages. A number of strategies are commonly taught to second language writers to accomplish their purposes. Aside from strategies of freewriting, outlining, drafting, and revising, writers need to be aware of the task that has been demanded and to focus on the genre of writing and the expectations of that genre.
1.      Attending to task. In responsive writing, the context is seldom completely open-ended: a task has been defined by the teacher or test administrator, and the writer must fulfill the criterion of the task. Even in extensive writing of longer texts, a set of directives has been stated by the teacher or is implied by the conventions of the genre. Four types of tasks are commonly addressed in academic writing courses: compare/contrast, problem/solution, pros/cons, and cause/effect. Depending on the genre of the text, one or more of these task types will be needed to achieve the writer’s purpose. If students are asked, for example, to “agree or disagree with the author’s statement,” a likely strategy would be to cite pros and cons and then take a stand. A task that asks students to argue for one among several political candidates in an election might be an ideal compare-and-contrast context, with an appeal to problems present in the constituency and the relative value of the candidates’ solutions. Assessment of the fulfillment of such tasks could be formative and informal (comments in marginal notes, feedback in a conference in an editing/revising stage), but the product might also be assigned a holistic or analytic score.
2.      Attending to genre. The genres of writing that were listed at the beginning of this chapter provide some sense of the many varieties of text that may be produced by a second language learner in a writing curriculum. Another way of looking at the strategic options open to a writer is the extent to which both the constraints and the opportunities of the genre are exploited. Assessment of any writing necessitates attention to the conventions of the genre in question. Assessment of the more common genres may include the following criteria, along with chosen factors from the list in item #3 (main and supporting ideas) above:
Reports (Lab Reports, Project Summaries, Article/Book Reports, etc.)
·         Conform to a conventional format (for the field in question)
·         Convey the purpose, goal, or main idea
·         Organize details logically and sequentially
·         State conclusions or findings
·         Use appropriate vocabulary and jargon for the specific case
Summaries of Readings/Lectures/Videos
·         Effectively capture the main and supporting ideas of the original
·          Maintain objectivity in reporting
·         Use writer’s own words for the most part
·         Use quotations effectively when appropriate
·         Omit irrelevant or marginal details
·         Conform to an expected length
Responses to Readings/Lectures/Videos
·         Accurately reflect the message or meaning of the original
·         Appropriately select supporting ideas to respond to
·         Express the writer’s own opinion
·         Defend or support that opinion effectively
·         Conform to an expected length
Narration, Description, Persuasion/Argument, and Exposition
·         Follow expected conventions for each type of writing
·         Convey purpose, goal, or main idea
·         Use effective writing strategies
·         Demonstrate syntactic variety and rhetorical fluency
Interpreting Statistical, Graphic, or Tabular Data
·         Provides an effective global, overall description of the data
·         Organizes the details in clear, logical language
·         Accurately conveys details
·         Appropriately articulates relationships among elements of the data
·         Conveys specialized or complex data comprehensibly to a lay reader
·         Interprets beyond the data when appropriate
Library Research Paper
·         States purpose or goal of the research
·         Includes appropriate citations and references in correct format
·         Accurately represents others’ research findings
·         Injects writer’s own interpretation, when appropriate, and justifies it
·         Includes suggestions for further research
·         Sums up findings in a conclusion
Chapter 4
A.TEST OF WRITTEN ENGLISH
One of a number of internationally available standardized tests of writing ability is the Test of Written English (TWE). Established in 1986, the TWE has gained a reputation as a well-respected measure of written English, and a number of research articles support its validity (Frase et al., 1999; Hale et al., 1996; Longford, 1996; Myford et al., 1996). In 1998, a computer-delivered version of the TWE was incorporated into the standard computer-based TOEFL and simply labeled as the “writing” section of the TOEFL. The TWE is still offered as a separate test, especially where only the paper-based TOEFL is available. Correlations between TWE and TOEFL scores (before the TWE became a standard part of the TOEFL) were consistently high, ranging from .57 to .69 over 10 test administrations from 1993 to 1995. Data on the TWE are provided at the end of this section.
The TWE is in the category of a timed impromptu test in that test-takers are under a 30-minute time limit and are not able to prepare ahead of time for the topic that will appear. Topics are prepared by a panel of experts following specifications for topics that represent commonly used discourse and thought patterns at the university level. Here are some sample topics published on the TWE website:
1.      Some people say that the best preparation for life is learning to work with others and be cooperative. Others take the opposite view and say that learning to be competitive is the best preparation. Discuss these positions, using concrete examples of both. Tell which one you agree with and explain why.
2.      Some people believe that automobiles are useful and necessary. Others believe that automobiles cause problems that affect our health and well-being. Which position do you support? Give specific reasons for your answer.
3.      Do you agree or disagree with the following statement?
Teachers should make learning enjoyable and fun for their students.
Use reasons and specific examples to support your opinion.
 
Sample TWE® topics
Test preparation manuals such as Deborah Phillips’s Longman Introductory Course for the TOEFL Test (2001) advise TWE test-takers to follow six steps to maximize success on the test:
1.      Carefully identify the topic.
2.      Plan your supporting ideas.
3.      In the introductory paragraph, restate the topic and state the organizational plan of the essay.
4.      Write effective supporting paragraphs (show transitions, include a topic sentence, specify details).
5.      Restate your position and summarize in the concluding paragraph.
6.      Edit sentence structure and rhetorical expression.
The scoring guide for the TWE follows a widely accepted set of specifications for a holistic evaluation of an essay (see below for more discussion of holistic scoring). Each point on the scoring system is defined by a set of statements that address topic, organization and development, supporting ideas, facility (fluency, naturalness, appropriateness) in writing, and grammatical and lexical correctness and choice. Each essay is scored by two trained readers working independently. The final score assigned is the mean of the two independent ratings. The test-taker can achieve a score ranging from 1 to 6, with possible half-points (e.g., 4.5, 5.5) in between. In the case of a discrepancy of more than one point, a third reader resolves the difference. Discrepancy rates are extremely low, usually ranging from 1 to 2 percent per reading.

It is important to put tests like the TWE in perspective. Timed impromptu tests have obvious limitations if you are looking for an authentic sample of performance in a real-world context. How many times in real-world situations (other than in academic writing classes!) will you be asked to write an essay in 30 minutes? Probably never, but the TWE and other standardized timed tests are not intended to mirror the real world. Instead, they are intended to elicit a sample of writing performance that will be indicative of a person’s writing ability in the real world. TWE designers sought to validate a feasible timed task that would be manageable within their constraints and at the same time offer useful information about the test-taker.
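The arithmetic of the score-combination procedure just described can be sketched as follows. This is a minimal illustration, not ETS’s actual implementation: the function name is invented, and because the text does not specify how the third reading resolves a discrepancy, the sketch simply averages the two ratings that lie closest to the third reader’s score.

```python
def twe_final_score(r1, r2, third_reader=None):
    """Average two independent 1-6 ratings; if they differ by more than
    one point, call in a third reader and average the two ratings that
    lie closest to that third rating (an assumed adjudication rule)."""
    if abs(r1 - r2) <= 1:
        return (r1 + r2) / 2          # yields whole or half points, e.g. 4.5
    r3 = third_reader()               # discrepancy: third rating required
    closest_two = sorted([r1, r2, r3], key=lambda r: abs(r - r3))[:2]
    return sum(closest_two) / 2

print(twe_final_score(4, 5))                          # 4.5
print(twe_final_score(3, 6, third_reader=lambda: 5))  # 5.5
```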
How does the Educational Testing Service justify the TWE as such an indicator? Research by Hale et al. (1996) showed that the prompts used in the TWE approximate writing tasks assigned in 162 graduate and undergraduate courses across several disciplines in eight universities. Another study (Golub-Smith et al., 1993) ascertained the reliabilities across several types of prompts (e.g., compare/contrast vs. chart-graph interpretation). Both Myford et al. (1996) and Longford (1996) studied the reliabilities of judges’ ratings. The question of whether a mere 30-minute time period is sufficient to elicit an adequate sample of a test-taker’s writing was addressed by Hale (1992). Henning and Cascallar (1992) conducted a large-scale study to assess the extent to which TWE performance taps into the communicative competence of the test-taker. The upshot of this research, which is updated regularly, is that the TWE (which adheres to a high standard of excellence in standardized testing) is, within acceptable standard error ranges, a remarkably accurate indicator of writing ability.
The flip side of this controversial coin reminds us that standardized tests are indicators, not fail-safe, infallible measures of competence. Even though we might need TWE scores for the administrative purposes of admissions or placement, we should not rely on such tests for instructional purposes. No one would suggest that such 30-minute writing tests offer constructive feedback to the student, nor do they provide the kind of formative assessment that a process approach to writing brings. Tests like the TWE are administrative necessities in a world where hundreds or thousands of applicants must be evaluated by some means short of calculating their performance across years of instruction in academic writing. The convenience of the TWE should not lull administrators into believing that TWEs and TOEFLs and the like are the only measures that should be applied to students. It behooves admissions and placement officers worldwide to offer secondary measures of writing ability to those test-takers who
a.       Are on the threshold of a minimum score,
b.      May be disabled by highly time-constrained or anxiety-producing situations,
c.       Could be culturally disadvantaged by a topic or situation, and/or
d.      (in the case of computer- based writing) have had few opportunities to compose on a computer.
While timed impromptu tests suffer from a lack of authenticity and put test-takers into an artificially time-constrained context, they nevertheless offer interesting, relevant information for an important but narrow range of administrative purposes. The classroom offers a much wider set of options for creating real-world writing purposes and contexts. The classroom becomes the locus of extended hard work and effort for building the skills necessary to create written production. The classroom provides a setting for writers, in a process of multiple drafts and revisions, to create a final, publicly acceptable product. And the classroom is a place where learners can take all the small steps, at their own pace, toward becoming proficient writers. For your reference, following is some information on the TWE:
Producer:                    Educational Testing Service (ETS), Princeton, NJ
Objective:                   To test written expression
Primary market:        Almost exclusively U.S. universities and colleges for admission purposes
Type:                          Computer-based, with the TOEFL. A traditional paper-based (PB) version is also available separately.
Response modes:        Written essay
Specifications:             (see above, in this section)
Time allocation:         30 minutes
Internet access:          http://www.toefl.org/educator/edabttwe.html          
 
Test of Written English (TWE®)
B.SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING
At responsive and extensive levels of writing, three major approaches to scoring writing performance are commonly used by test designers: holistic, primary trait, and analytic. In the first method, a single score is assigned to an essay, which represents a reader’s general overall assessment. Primary trait scoring is a variation of the holistic method in that the achievement of the primary purpose, or trait, of an essay is the only factor rated. Analytic scoring breaks a test-taker’s written text down into a number of subcategories (organization, grammar, etc.) and gives a separate rating for each.
1.Holistic Scoring
The TWE scoring scale above is a prime example of holistic scoring. In Chapter 7, a rubric for scoring oral production holistically was presented. Each point on a holistic scale is given a systematic set of descriptors to arrive at a score. Descriptors usually (but not always) follow a prescribed pattern. For example, the first descriptor across all score categories may address the quality of task achievement, the second may deal with organization, the third with grammatical or rhetorical considerations, and so on. Scoring, however, is truly holistic in that those subsets are not quantitatively added up to yield a score.
Advantages of holistic scoring include
·         Fast evaluation,
·         Relatively high inter-rater reliability (a quick check is sketched after these lists),
·         The fact that scores represent “standards” that are easily interpreted by lay persons,
·         The fact that scores tend to emphasize the writer’s strengths (Cohen, 1994, p.135), and
·         Applicability to writing across many different disciplines.
Its disadvantages must also be weighed in a decision on whether to use holistic scoring:
·         One score masks differences across the subskills within each score.
·         No diagnostic information is available (no washback potential).
·         The scale may not apply equally well to all genres of writing.
·         Raters need to be extensively trained to use the scale accurately.
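One way to substantiate the inter-rater reliability advantage noted above is to correlate two raters’ scores over a batch of essays. The sketch below is a minimal check with invented scores; statistics.correlation requires Python 3.10 or later.

```python
from statistics import correlation  # Pearson r; available from Python 3.10

rater_a = [4.0, 5.5, 3.0, 6.0, 4.5, 2.5]   # invented holistic scores
rater_b = [4.5, 5.0, 3.0, 5.5, 4.0, 3.0]

print(f"inter-rater r = {correlation(rater_a, rater_b):.2f}")  # 0.95
```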
In general, teachers and test designers lean toward holistic scoring only when it is expedient for administrative purposes. As long as trained evaluators are in place, differentiation across six levels may be quite adequate for admission into an institution or placement into courses. For classroom instructional purposes, however, holistic scores provide very little information. In most classroom settings where a teacher wishes to adapt a curriculum to the needs of a particular group of students, much more differentiated information across subskills is desirable than is provided by holistic scoring.
2.Primary Trait Scoring
A second method of scoring, primary trait, focuses on “how well students can write within a narrowly defined range of discourse” (Weigle, 2002, p. 110). This type of scoring emphasizes the task at hand and assigns a score based on the effectiveness of the text in achieving that one goal. For example, if the purpose or function of an essay is to persuade the reader to do something, the score for the writing would rise or fall on the accomplishment of that function. If a learner is asked to exploit the imaginative function of language by expressing personal feelings, then the response would be evaluated on that feature alone. For rating the primary trait of the text, Lloyd-Jones (1977) suggested a scale ranging from 0 (no response or fragmented response) to 4 (the purpose is unequivocally accomplished in a convincing fashion). It almost goes without saying that organization, supporting details, fluency, syntactic variety, and other features will implicitly be evaluated in the process of offering a primary trait score. But the advantage of this method is that it allows both writer and evaluator to focus on function. In summary, a primary trait score would assess
·         The accuracy of the account of the original (summary),
·         The clarity of the steps of the procedure and the final result (lab report),
·         The description of the main features of the graph (graph description), and
·         The expression of the writer’s opinion (response to an article).
3.Analytic Scoring
For classroom instruction, holistic scoring provides little washback into the writer’s further stages of learning. Primary trait scoring focuses on the principal function of the text and therefore offers some feedback potential, but no washback for any of the aspects of the written production that enhance the ultimate accomplishment of the purpose. Classroom evaluation of learning is best served through analytic scoring, in which as many as six major elements of writing are scored, thus enabling learners to home in on weaknesses and to capitalize on strengths.
Analytic scoring may be more appropriately called analytic assessment in order to capture its closer association with classroom language instruction than with formal testing. Brown and Bailey (1984) designed an analytic scoring scale that specified five major categories and a description of five different levels in each category, ranging from "unacceptable" to "excellent." At first glance, Brown and Bailey's scale may look similar to the TWE® holistic scale discussed earlier: for each scoring category there is a description that encompasses several subsets. A closer inspection, however, reveals much more detail in the analytic method. Instead of just six descriptions, there are 25, each subdivided into a number of contributing factors. The order in which the five categories (organization, logical development of ideas, grammar, punctuation/spelling/mechanics, and style and quality of expression) are listed may bias the evaluator toward the greater importance of organization and logical development as opposed to punctuation and style. But the mathematical assignment of the 100-point scale gives equal weight (a maximum of 20 points) to each of the five major categories. Not all writing assessment specialists agree with this equal weighting, however.
Table 9.2. Analytic scale for rating composition tasks (Brown & Bailey, 1984, pp. 39-41)

Score bands: 20-18 = Excellent to Good; 17-15 = Good to Adequate; 14-12 = Adequate to Fair; 11-6 = Unacceptable; 5-1 = Not college-level work

I. Organization: introduction, body, and conclusion
20-18: Appropriate title, effective introductory paragraph, topic is stated and leads to body; transitional expressions used; arrangement of material shows plan (could be outlined by reader); supporting evidence given for generalizations; conclusion logical and complete.
17-15: Adequate title, introduction, and conclusion; body of essay is acceptable, but some evidence may be lacking and some ideas aren't fully developed; sequence is logical, but transitional expressions may be absent or misused.
14-12: Mediocre or scant introduction or conclusion; problems with the order of ideas in body; the generalizations may not be fully supported by the evidence given; problems of organization interfere.
11-6: Shaky or minimally recognizable introduction; organization can barely be seen; severe problems with ordering of ideas; lack of supporting evidence; conclusion weak or illogical; inadequate effort at organization.
5-1: Absence of introduction or conclusion; no apparent organization of body; severe lack of supporting evidence; writer has not made any effort to organize the composition (could not be outlined by reader).

II. Logical development of ideas: content
20-18: Essay addresses the assigned topic; the ideas are concrete and thoroughly developed; no extraneous material; essay reflects thought.
17-15: Essay addresses the issues but misses some points; ideas could be more fully developed; some extraneous material is present.
14-12: Development of ideas not complete, or essay is somewhat off the topic; paragraphs aren't divided exactly right.
11-6: Ideas incomplete; essay does not reflect careful thinking or was hurriedly written; inadequate effort in area of content.
5-1: Essay is completely inadequate and does not reflect college-level work; no apparent effort to consider the topic carefully.

III. Grammar
20-18: Native-like fluency in English grammar; correct use of relative clauses, prepositions, modals, articles, verb forms, and tense sequencing; no fragments or run-on sentences.
17-15: Advanced proficiency in English grammar; some grammar problems don't influence communication, although the reader is aware of them; no fragments or run-on sentences.
14-12: Ideas are getting through to the reader, but grammar problems are apparent and have a negative effect on communication; run-on sentences or fragments present.
11-6: Numerous serious grammar problems interfere with communication of the writer's ideas; grammar review of some areas clearly needed; difficult-to-read sentences.
5-1: Severe grammar problems interfere greatly with the message; reader can't understand what the writer was trying to say; unintelligible sentence structure.

IV. Punctuation, spelling, and mechanics
20-18: Correct use of English writing conventions: left and right margins, all needed capitals, paragraphs indented; punctuation and spelling very neat.
17-15: Some problems with writing conventions or punctuation; occasional spelling errors; left margin correct; paper is neat and legible.
14-12: Uses general writing conventions but has errors; spelling problems distract reader; punctuation errors interfere with ideas.
11-6: Serious problems with format of paper; parts of essay not legible; errors in sentence punctuation and final punctuation; unacceptable to educated readers.
5-1: Complete disregard for English writing conventions; paper illegible; obvious capitals missing, no margins, severe spelling problems.

V. Style and quality of expression
20-18: Precise vocabulary usage; use of parallel structures; concise; register good.
17-15: Attempts variety; good vocabulary; not wordy; register OK; style fairly concise.
14-12: Some vocabulary misused; lacks awareness of register; may be too wordy.
11-6: Poor expression of ideas; problems in vocabulary; lacks variety of structure.
5-1: Inappropriate use of vocabulary; no concept of register or sentence variety.
One alternative scheme, for example, distributes the 100 points unequally across five somewhat different categories (a small scoring sketch in code follows):
Content                      30
Organization              20
Vocabulary                 20
Syntax                         25
Mechanics                   5
Total                          100
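Because an analytic score is simply a weighted sum over categories, the arithmetic is easy to sketch. The Python fragment below is a simplification for illustration only: it assumes each category is rated on a 0.0-1.0 scale and then scaled to that category's maximum points. It contrasts Brown and Bailey's equal 20-point weighting with the unequal scheme listed above; and because the weights are just a parameter, the same code accommodates the tailoring by proficiency level and genre discussed next.

```python
# Analytic scoring as a weighted sum. Ratings are assumed to be 0.0-1.0
# per category (an illustrative convention, not part of either scheme).

EQUAL_WEIGHTS = {  # Brown & Bailey (1984): 20 points per category
    "organization": 20,
    "logical development of ideas": 20,
    "grammar": 20,
    "punctuation/spelling/mechanics": 20,
    "style and quality of expression": 20,
}

ALTERNATIVE_WEIGHTS = {  # the unequal scheme listed above
    "content": 30,
    "organization": 20,
    "vocabulary": 20,
    "syntax": 25,
    "mechanics": 5,
}

def analytic_score(ratings: dict[str, float], weights: dict[str, int]) -> float:
    """Total out of 100: each 0.0-1.0 rating scaled by its category's
    maximum points, then summed across categories."""
    return sum(ratings[cat] * max_pts for cat, max_pts in weights.items())

# A writer strong on content but weak on mechanics, under the unequal scheme:
ratings = {"content": 0.9, "organization": 0.7, "vocabulary": 0.8,
           "syntax": 0.6, "mechanics": 0.4}
print(analytic_score(ratings, ALTERNATIVE_WEIGHTS))  # 27+14+16+15+2 = 74.0

# The same idea with Brown & Bailey's equal weights:
ratings_bb = {"organization": 0.7, "logical development of ideas": 0.9,
              "grammar": 0.6, "punctuation/spelling/mechanics": 0.4,
              "style and quality of expression": 0.8}
print(analytic_score(ratings_bb, EQUAL_WEIGHTS))  # 14+18+12+8+16 = 68.0
```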
As your curricular goals and students' needs vary, your own analytic scoring of essays may be appropriately tailored. Level of proficiency can make a significant difference in emphasis: at the intermediate level, for example, you might give more weight to syntax and mechanics, while advanced levels of writing may call for a strong push toward organization and development. Genre can also dictate variations in scoring. Would a summary of an article require the same relative emphases as a narrative essay? Most likely not. Certain types of writing, such as lab reports or interpretations of statistical data, may even need additional, or at least redefined, categories in order to capture the essential components of good writing within those genres. Analytic scoring of compositions offers writers a little more washback than a single holistic or primary trait score. Scores in five or six major elements will help to call the writer's attention to areas of needed improvement. Practicality is lowered in that more time is required for teachers to attend to details within each of the categories in order to render a final score or grade, but ultimately students receive more information about their writing. Numerical scores alone, however, are still not sufficient for enabling students to become proficient writers, as we shall see in the next section.

C.BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING
Formal testing carries with it the burden of designing a practical and reliable instrument that assesses its intended criterion accurately. To accomplish that mission, designers of writing tests are charged with the task of providing as "objective" a scoring procedure as possible, and one that in many cases can be easily interpreted by agents beyond the learner. Holistic, primary trait, and analytic scoring all satisfy those ends. Yet beyond mathematically calculated scores lies a rich domain of assessment in which a developing writer is coached from stage to stage in a process of building a storehouse of writing skills. Here in the classroom, in the tutored relationships of teacher and student, and in the community of peer learners, most of the hard work of assessing writing is carried out. Such assessment is informal, formative, and replete with washback.
Most writing specialists agree that the best way to teach writing is a hands-on approach that stimulates student output and then generates a series of self-assessments, peer editing and revision, and teacher response and conferencing (Raimes, 1991, 1998; Reid, 1993; Seow, 2002). It is not an approach that relies on a massive dose of lecturing about good writing, nor on memorizing a bunch of rules about rhetorical organization, nor on sending students home with an assignment to turn in a paper the next day. People become good writers by writing and by seeking the facilitative input of others to refine their skills. Assessment therefore takes on a crucial role in such an approach. Learning how to become a good writer places the student in an almost constant state of assessment. To give the student the maximum benefit of assessment, it is important to consider (a) earlier stages (from freewriting to the first draft or two) and (b) later stages (revising and finalizing) of producing a written text. A further factor in assessing writing is the involvement of self, peers, and teacher at appropriate steps in the process. (For further guidelines on the process of teaching writing, see TBP, Chapter 19.)
1.Assessing Initial Stages of the Process of Composing
Following are some guidelines for assessing the initial stages (the first draft or two) of a written composition. These guidelines are generic for self, peer, and teacher responding. Each assessor will need to modify the list according to the level of the learner, the context, and the purpose in responding.
Assessment of initial stages in composing
1.      Focus your efforts primarily on meaning, main idea, and organization.
2.      Comment on the introductory paragraph.
3.      Make general comments about the clarity of the main idea and the logic or appropriateness of the organization.
4.      As a rule of thumb, ignore minor (local) grammatical and lexical errors.
5.      Indicate what appear to be major (global) errors (e.g., by underlining the text in question), but allow the writer to make corrections.
6.      Do not rewrite questionable, ungrammatical, or awkward sentences; rather, probe with a question about meaning.
7.      Comment on features that appear to be irrelevant to the topic.
The teacher-assessor's role is as a guide, a facilitator, and an ally; therefore, assessment at this stage of writing needs to be as positive as possible to encourage the writer. An early focus on overall structure and meaning will enable writers to clarify their purpose and plan, and it will set a framework for the writers' later refinement of the lexical and grammatical issues.
2.Assessing Later Stages of the Process of Composing
Once the writer has determined and clarified his or her purpose and plan, and has completed at least one or perhaps two drafts, the focus shifts toward "fine-tuning" the expression with a view toward a final revision. Editing and responding assume an appropriately different character now, with these guidelines:

Assessment of later stages in composing
1.      Comment on the specific clarity and strength of all main ideas and supporting ideas, and on argument and logic.
2.      Call attention to minor ("local") grammatical and mechanical (spelling, punctuation) errors, but direct the writer to self-correct.
3.      Comment on any further word choices and expressions that may not be awkward but are not as clear or direct as they could be.
4.      Point out any problems with cohesive devices within and across paragraphs.
5.      If appropriate, comment on documentation, citation of sources, evidence, and other support.
6.      Comment on the adequacy and strength of the conclusion.
Through all these stages it is assumed that peers and teacher are both responding to the writer through conferencing in person, electronic communication, or, at the very least, an exchange of papers. The impromptu timed tests and the methods of scoring discussed earlier may appear to be only distantly related to such an individualized process of creating a written text, but are they, in reality? All those developmental stages may be the preparation that learners need both to function in creative real-world writing tasks and to successfully demonstrate their competence on a timed impromptu test. And those holistic scores are, after all, generalizations of the various components of effective writing. If the hard work of successfully progressing through a semester or two of a challenging course in academic writing ultimately means that writers are ready to function in their real-world contexts, and to get a 5 or 6 on the TWE, then all the effort was worthwhile.





REFERENCES
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. White Plains, NY: Pearson Education.