Selecting appropriate assessment techniques II: types of teacher-made assessments

One of the challenges for beginning teachers is to select and use appropriate assessment techniques. In this section we summarize the wide variety of types of assessments that classroom teachers use. First we discuss the informal techniques teachers use during instruction that typically require instantaneous decisions. Then we consider formal assessment techniques that teachers plan before instruction and allow for reflective decisions.

Teachers’ observation, questioning, and record keeping

During teaching, teachers not only have to communicate the information they planned but also continuously monitor students’ learning and motivation in order to determine whether modifications have to be made (Airasian, 2005). Beginning teachers find this more difficult than experienced teachers because of the complex cognitive skills required to improvise and be responsive to students’ needs while simultaneously keeping in mind the goals and plans of the lesson (Borko & Livingston, 1989). The informal assessment strategies teachers most often use during instruction are observation and questioning.

Observation

Effective teachers observe their students from the time they enter the classroom. Some teachers greet their students at the door not only to welcome them but also to observe their mood and motivation. Are Hannah and Naomi still not talking to each other? Does Ethan have his materials with him? Gaining information on such questions can help the teacher foster student learning more effectively (e.g. suggesting that Ethan go back to his locker to get his materials before the bell rings, or avoiding assigning Hannah and Naomi to the same group).

During instruction, teachers observe students’ behavior to gain information about students’ level of interest and understanding of the material or activity. Observation includes looking at non-verbal behaviors as well as listening to what the students are saying. For example, a teacher may observe that a number of students are looking out of the window rather than watching the science demonstration, or a teacher may hear students making comments in their group indicating they do not understand what they are supposed to be doing. Observations also help teachers decide which student to call on next, whether to speed up or slow down the pace of the lesson, when more examples are needed, whether to begin or end an activity, how well students are performing a physical activity, and if there are potential behavior problems (Airasian, 2005). Many teachers find that moving around the classroom helps them observe more effectively because they can see more students from a variety of perspectives. However, the fast pace and complexity of most classrooms makes it difficult for teachers to gain as much information as they want.

Questioning

Teachers ask questions for many instructional reasons, including keeping students’ attention on the lesson, highlighting important points and ideas, promoting critical thinking, allowing students to learn from each other’s answers, and providing information about students’ learning. Devising appropriate questions and using students’ responses to make effective instantaneous instructional decisions is very difficult. Some strategies to improve questioning include planning and writing down the instructional questions that will be asked, allowing sufficient wait time for students to respond, listening carefully to what students say rather than listening for what is expected, varying the types of questions asked, making sure some of the questions are higher level, and asking follow-up questions.

While informal assessment based on spontaneous observation and questioning is essential for teaching, there are inherent problems with the validity and reliability of this information, as well as with bias (Airasian, 2005; Stiggins, 2005). We summarize these issues and some ways to reduce the problems in Table 1.

Table 1: Validity and reliability of observation and questioning
Problem: Teachers’ lack of objectivity about overall class involvement and understanding
Strategies: Try to make sure you are not only seeing what you want to see. Teachers typically want to feel good about their instruction, so it is easy to look for positive student interactions. Occasionally, teachers want to see negative student reactions to confirm their beliefs about an individual student or class.

Problem: Tendency to focus on process rather than learning
Strategies: Remember to concentrate on student learning, not just involvement. Most of teachers’ observations focus on process—student attention, facial expressions, posture—rather than pupil learning. Students can be active and engaged but not developing new skills.

Problem: Limited information and selective sampling
Strategies:

  • Make sure you observe a variety of students—not just those who are typically very good or very bad.
  • Walk around the room to observe more students “up close” and view the room from multiple perspectives.
  • Call on a wide variety of students—not just those with their hands up, those who are skilled at the subject, or those who sit in a particular place in the room.
  • Keep records.

Problem: Fast pace of classrooms inhibits corroborative evidence
Strategies: If you want to know whether you are missing important information, ask a peer to visit your classroom and observe the students’ behaviors. Classrooms are complex and fast paced, and one teacher cannot see much of what is going on while also trying to teach.

Problem: Cultural and individual differences in the meaning of verbal and non-verbal behaviors
Strategies: Be cautious in the conclusions that you draw from your observations and questions. Remember that the meaning and expectations of certain types of questions, wait time, social distance, and the role of “small talk” vary across cultures. Some students are quiet because of their personalities, not because they are uninvolved, failing to keep up with the lesson, depressed, or tired.

Record keeping

Keeping records of observations improves reliability and can be used to enhance understanding of one student, a group, or the whole class’s interactions. Sometimes this requires help from other teachers. For example, Alexis, a beginning science teacher, is aware of the research documenting that longer wait time enhances students’ learning (e.g. Rowe, 2003) but is unsure of her own behavior, so she asks a colleague to observe and record her wait times during one class period. Alexis learns her wait times are very short for all students, so she starts silently counting to five whenever she asks students a question.

Teachers can keep anecdotal records about students without help from peers. These records contain descriptions of incidents of a student’s behavior, the time and place the incident takes place, and a tentative interpretation of the incident. For example, the description of the incident might involve Joseph, a second grade student, who fell asleep during the mathematics class on a Monday morning. A tentative interpretation could be the student did not get enough sleep over the weekend, but alternative explanations could be the student is sick or is on medications that make him drowsy. Obviously additional information is needed and the teacher could ask Joseph why he is so sleepy and also observe him to see if he looks tired and sleepy over the next couple of weeks.

Anecdotal records often provide important information and are better than relying on one’s memory but they take time to maintain and it is difficult for teachers to be objective. For example, after seeing Joseph fall asleep the teacher may now look for any signs of Joseph’s sleepiness—ignoring the days he is not sleepy. Also, it is hard for teachers to sample a wide enough range of data for their observations to be highly reliable.

Teachers also conduct more formal observations, especially for students with special needs who have Individualized Education Programs (IEPs). An example of the importance of informal and formal observations in a preschool follows:

The class of preschoolers in a suburban neighborhood of a large city has eight special needs students and four students—the peer models—who have been selected because of their well-developed language and social skills. Some of the special needs students have been diagnosed with delayed language, some with behavior disorders, and several with autism.

The students are sitting on the mat with the teacher who has a box with sets of three “cool” things of varying size (e.g. toy pandas) and the students are asked to put the things in order by size, big, medium and small. Students who are able are also requested to point to each item in turn and say “This is the big one,” “This is the medium one,” and “This is the little one.” For some students, only two choices (big and little) are offered because that is appropriate for their developmental level.

The teacher informally observes that one of the boys is having trouble keeping his legs still, so she quietly asks the aide for a weighted pad that she places on the boy’s legs to help him keep them still. The activity continues and the aide carefully observes students’ behaviors and records on IEP progress cards whether a child meets specific objectives such as: “When given two picture or object choices, Mark will point to the appropriate object in 80 percent of the opportunities.” The teacher and aides keep records of the relevant behavior of the special needs students during the half day they are in preschool. The daily records are summarized weekly. If not enough observations have been recorded for a specific objective, the teacher and aide focus their observations more on that child and, if necessary, try to create specific situations that relate to that objective. At the end of each month the teacher calculates whether the special needs children are meeting their IEP objectives.
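The monthly calculation described above is simple arithmetic. A minimal sketch (the record format and function name are our own, not part of any IEP system) shows how a teacher might summarize recorded observations against an objective such as "80 percent of the opportunities":

```python
# A sketch of the monthly IEP progress check described above.
# records: one boolean per observed opportunity (True = objective behavior shown).

def objective_met(records, target=0.80, min_observations=10):
    """Return (met, success_rate, enough_data) for one objective."""
    n = len(records)
    if n < min_observations:
        # Too few observations recorded: the teacher and aide should
        # focus their observations on this child before deciding.
        return (False, None, False)
    rate = sum(records) / n
    return (rate >= target, rate, True)

# Example: Mark pointed to the appropriate object in 17 of 20 opportunities.
mark = [True] * 17 + [False] * 3
print(objective_met(mark))  # (True, 0.85, True)
```

The `min_observations` threshold mirrors the text's point that if too few observations exist for an objective, the answer is to gather more data, not to declare the objective unmet.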

Selected response items

Common formal assessment formats used by teachers are multiple choice, matching, and true/false items. In selected response items students have to select a response provided by the teacher or test developer rather than constructing a response in their own words or actions. Selected response items do not require that students recall the information but rather recognize the correct answer. Tests with these items are called objective because the results are not influenced by scorers’ judgments or interpretations and so are often machine scored. Eliminating potential errors in scoring increases the reliability of tests but teachers who only use objective tests are liable to reduce the validity of their assessment because objective tests are not appropriate for all learning goals (Linn & Miller, 2005). Effective assessment for learning as well as assessment of learning must be based on aligning the assessment technique to the learning goals and outcomes.

For example, if the goal is for students to conduct an experiment, then they should be asked to do that rather than being asked about conducting an experiment.

Common problems

Selected response items are easy to score but are hard to devise. Teachers often do not spend enough time constructing items and common problems include:

  1. Unclear wording in the items
    1. True or False: Although George Washington was born into a wealthy family, his father died when he was only 11, he worked as a youth as a surveyor of rural lands, and later stood on the balcony of Federal Hall in New York when he took his oath of office in 1789.
  2. Cues that are not related to the content being examined.
    1. A common clue is that all the true statements on a true/false test, or the correct alternatives on a multiple choice test, are longer than the false statements or the incorrect alternatives.
  3. Using negatives (or double negatives) in the items.
    1. A poor item: “True or False: None of the steps made by the student was unnecessary.”
    2. A better item: True or False: “All of the steps were necessary.”
    3. Students often do not notice the negative terms or find them confusing so avoiding them is generally recommended (Linn & Miller 2005). However, since standardized tests often use negative items, teachers sometimes deliberately include some negative items to give students practice in responding to that format.
  4. Taking sentences directly from textbook or lecture notes. Removing the words from their context often makes them ambiguous or can change the meaning. For example, a statement from Chapter 3 taken out of context suggests all children are clumsy. “Similarly with jumping, throwing and catching: the large majority of children can do these things, though often a bit clumsily.” A fuller quotation makes it clearer that this sentence refers to 5-year-olds: “For some fives, running still looks a bit like a hurried walk, but usually it becomes more coordinated within a year or two. Similarly with jumping, throwing and catching: the large majority of children can do these things, though often a bit clumsily, by the time they start school, and most improve their skills noticeably during the early elementary years.” If the abbreviated form was used as the stem in a true/false item it would obviously be misleading.
  5. Trivial questions, e.g. “Jean Piaget was born in what year?”
    1. While it is important to know approximately when Piaget made his seminal contributions to the understanding of child development, the exact year of his birth (1896) is not important.

Strengths and weaknesses

All types of selected response items have a number of strengths and weaknesses. True/False items are appropriate for measuring factual knowledge such as vocabulary, formulae, dates, proper names, and technical terms. They are very efficient as they use a simple structure that students can easily understand, and take little time to complete. They are also easier to construct than multiple choice and matching items. However, students have a 50 per cent probability of getting the answer correct through guessing so it can be difficult to interpret how much students know from their test scores. Examples of common problems that arise when devising true/false items are in Table 2.

Table 2: Common errors in selected response items
Type of item Common errors Example
True/False The statement is not absolutely true—typically because it contains a broad generalization. T/F: The President of the United States is elected to that office. This is usually true, but the US Vice President can succeed the President.
True/False The item is opinion, not fact. T/F: Education for K-12 students is improved through policies that support charter schools. Some people believe this, some do not.
True/False Two ideas are included in the item. T/F: George H. W. Bush, the 40th president of the US, was defeated by William Jefferson Clinton in 1992. The first idea is false; the second is true, making it difficult for students to decide whether to circle T or F.
True/False Irrelevant cues. T/F: The President of the United States is usually elected to that office. True items often contain qualifiers such as usually or generally, whereas false items contain absolutes such as always, all, or never.
Matching Columns do not contain homogeneous information

Directions: On the line next to each US Civil War battle in Column A, write the matching year or Confederate general from Column B.

Column A

  • Ft Sumter
  • 2nd Battle of Bull Run
  • Ft Henry

Column B

  • General Stonewall Jackson
  • General Johnston
  • 1861
  • 1862

Column B is a mixture of generals and dates.

Matching Too many items in each list Lists should be relatively short (4–7 items) in each column; more than 10 is too confusing.
Matching Responses are not in logical order In the example with Spanish and English words (Exhibit 1), the responses should be in a logical order (there they are alphabetical). If the order is not logical, students spend too much time searching for the correct answer.
Multiple Choice Problem (i.e. the stem) is not clearly stated. Example: New Zealand

  1. Is the world’s smallest continent
  2. Is home to the kangaroo
  3. Was settled mainly by colonists from Great Britain
  4. Is a dictatorship

This is really a series of true/false items. Because the correct answer is 3, a better version, with the problem in the stem, is:

Much of New Zealand was settled by colonists from

  1. Great Britain
  2. Spain
  3. France
  4. Holland
Multiple Choice Some of the alternatives are not plausible Who is best known for their work on the development of the morality of justice?

  1. Gerald Ford
  2. Vygotsky
  3. Maslow
  4. Kohlberg

Obviously Gerald Ford is not a plausible alternative. 

Multiple Choice Irrelevant cues
  • The correct alternative is longer.
  • The incorrect alternatives are not grammatically consistent with the stem.
  • Too many correct alternatives are in position “b” or “c”, making it easier for students to guess. All the positions (e.g. a, b, c, d) should be used with approximately equal frequency (not exactly equal, as that also provides clues).
Multiple Choice Use of “all of the above” If “all of the above” is used, then the other alternatives must all be correct. This means that a student may read the first alternative, mark it correct, and move on. Alternatively, a student may read the first two alternatives and, seeing they are both true, not need to read the others to know to circle “all of the above.” The teacher probably does not want either of these outcomes.

In matching items, two parallel columns containing terms, phrases, symbols, or numbers are presented and the student is asked to match the items in the first column with those in the second column. Typically there are more items in the second column, both to make the task more difficult and to ensure that a student who makes one error is not automatically forced into a second. Matching items are most often used to measure lower level knowledge such as persons and their achievements, dates and historical events, terms and definitions, symbols and concepts, or plants and animals and their classifications (Linn & Miller, 2005). An example with Spanish language words and their English equivalents is in Exhibit 1.

Exhibit 1: Spanish and English translation

Directions: On the line to the right of the Spanish word in Column A, write the letter of the English word in Column B that has the same meaning.

Column A

  1. Casa ___
  2. Bebé ___
  3. Gata ___
  4. Perro ___
  5. Hermano___

Column B

  a. Aunt
  b. Baby
  c. Brother
  d. Cat
  e. Dog
  f. Father
  g. House

While matching items may seem easy to devise, it is hard to create homogeneous lists. Other problems with matching items and suggested remedies are in Table 2.

Multiple choice items are the most commonly used type of objective test item because they have a number of advantages over other objective test items. Most importantly, they can be adapted to assess higher level thinking, such as application, as well as lower level factual knowledge. The first example in Exhibit 2 assesses knowledge of a specific fact whereas the second example assesses application of knowledge.

Exhibit 2: Multiple-Choice Examples

Who is best known for their work on the development of the morality of justice?

  1. Erikson
  2. Vygotsky
  3. Maslow
  4. Kohlberg

Which one of the following best illustrates the law of diminishing returns?

  1. A factory doubled its labor force and increased production by 50 per cent
  2. The demand for an electronic product increased faster than the supply of the product
  3. The population of a country increased faster than agricultural self sufficiency
  4. A machine decreased in efficacy as its parts became worn out

(Adapted from Linn & Miller, 2005, p. 193)

There are several other advantages of multiple choice items. Students have to recognize the correct answer among plausible alternatives, not merely judge a single statement true or false as they do in true/false items. Also, the opportunity for guessing is reduced because four or five alternatives are usually provided, whereas in true/false items students only have to choose between two options. In addition, multiple choice items do not need homogeneous material as matching items do. However, creating good multiple choice test items is difficult, and students (maybe including you) often become frustrated when taking a test with poor multiple choice items. Three steps have to be considered when constructing a multiple choice item: formulating a clearly stated problem, identifying plausible alternatives, and removing irrelevant clues to the answer. Common problems in each of these steps are summarized in Table 2 (above).
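The guessing odds mentioned above are easy to quantify: on a test of n items with k answer choices each, blind guessing yields an expected score of n/k. A small sketch (the function name is our own) compares true/false and four-option multiple choice formats:

```python
# Expected score from blind guessing on a test of n_items items,
# each with n_choices answer options.

def expected_guess_score(n_items, n_choices):
    return n_items * (1 / n_choices)

# On a 20-item test:
print(expected_guess_score(20, 2))  # true/false: 10.0 items by chance alone
print(expected_guess_score(20, 4))  # 4-option multiple choice: 5.0 items
```

This is why a true/false score is harder to interpret: half the items can be answered correctly with no knowledge at all, whereas four plausible alternatives cut the chance score in half.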

Constructed response items

Formal assessment also includes constructed response items in which students are asked to recall information and create an answer—not just recognize if the answer is correct—so guessing is reduced. Constructed response items can be used to assess a wide variety of kinds of knowledge and two major kinds are discussed: completion or short answer (also called short response) and extended response.

Completion and short answer

Completion and short answer items can be answered in a word, phrase, number, or symbol. These types of items are essentially the same only varying in whether the problem is presented as a statement or a question (Linn & Miller 2005). Look at Exhibit 3 for examples:

Exhibit 3: Completion and short answer questions

Completion: The first traffic light in the US was invented by ________.

Short Answer: Who invented the first traffic light in the US?

These items are often used in mathematics tests, for example:

3 + 10 = ____?

If x = 6, what does x(x − 1) = ________

Draw the line of symmetry on the following shape: [a simple D-shape]

A major advantage of these items is that they are easy to construct. However, apart from their use in mathematics, they are unsuitable for measuring complex learning outcomes and are often difficult to score. Completion and short answer tests are sometimes called objective tests, as the intent is that there is only one correct answer and so no variability in scoring, but unless the question is phrased very carefully there are frequently a variety of correct answers. For example, consider the short answer question “Where was President Lincoln born?”

The teacher may expect the answer “in a log cabin” but other correct answers are also “on Sinking Spring Farm,” “in Hardin County,” or “in Kentucky.” Common errors in these items are summarized in Table 3.

Table 3: Common errors in constructed response items
Type of item Common errors Examples
Completion and short answer There is more than one possible answer. Where was US President Lincoln born? The answer could be in a log cabin, in Kentucky, etc. 
Completion and short answer Too many blanks are in the completion item so it is too difficult or doesn’t make sense. In ________ theory, the first stage, ________ is when infants process through their ________ and________ ________.
Completion and short answer Clues are given by length of blanks in completion items. Three states are contiguous to New Hampshire: ________ is to the West, ________ is to the East, and ________ is to the South.
Extended response Ambiguous questions Was the US Civil War avoidable? Students could interpret this question in a wide variety of ways, perhaps even answering only “yes” or “no.” One student may discuss only political causes, another moral, political, and economic causes. There is no guidance in the question for students.
Extended response Poor reliability in grading The teacher does not use a scoring rubric and so is inconsistent in how he scores answers especially unexpected responses, irrelevant information, and grammatical errors.
Extended response Perception of student influences grading By spring semester the teacher has developed expectations of each student’s performance, and this influences the grading (numbers can be used instead of names). The test consists of three constructed responses, and the teacher grades the three answers on each student’s paper before moving to the next paper. This means that the grading of questions 2 and 3 is influenced by the answers to question 1 (teachers should grade all answers to the first question, then all answers to the second, etc.).
Extended response Choices are given on the test and some answers are easier than others Testing experts recommend not giving choices in tests because then students are not really taking the same test, creating equity problems.

Extended response

Extended response items are used in many content areas and answers may vary in length from a paragraph to several pages. Questions that require longer responses are often called essay questions. Extended response items have several advantages, and the most important is their adaptability for measuring complex learning outcomes—particularly integration and application. These items also require that students write, and therefore provide teachers a way to assess writing skills. A commonly cited advantage of these items is their ease of construction; however, carefully worded items that are related to learning outcomes and assess complex learning are hard to devise (Linn & Miller, 2005). Well-constructed items phrase the question so the task of the student is clear. Often this involves providing hints or planning notes. In the first example below the actual question is clear not only because of the wording but because of the format (i.e. it is placed in a box). In the second and third examples planning notes are provided:

Example 1: Third grade mathematics

The owner of a bookstore gave 14 books to the school. The principal will give an equal number of books to each of three classrooms and the remaining books to the school library. How many books could the principal give to each classroom and to the school library?

Show all your work on the space below and on the next page. Explain in words how you found the answer. Tell why you took the steps you did to solve the problem.

(From Illinois Standards Achievement Test, 2006; http://www.isbe.state.il.us/assessment/isat.htm)

Example 2: Fifth grade science: The grass is always greener

Jose and Maria noticed three different types of soil, black soil, sand, and clay, were found in their neighborhood. They decided to investigate the question, “How does the type of soil (black soil, sand, and clay) under grass sod affect the height of grass?”

Plan an investigation that could answer their new question. In your plan, be sure to include:

  • Prediction of the outcome of the investigation
  • Materials needed to do the investigation
  • Procedure that includes:
    • logical steps to do the investigation
    • one variable kept the same (controlled)
    • one variable changed (manipulated)
    • any variables being measured and recorded
    • how often measurements are taken and recorded

(From Washington State 2004 assessment of student learning; http://www.k12.wa.us/assessment/WASL/default.aspx)

Example 3: Grades 9–11 English

Writing prompt

Some people think that schools should teach students how to cook. Other people think that cooking is something that ought to be taught in the home. What do you think? Explain why you think as you do.

Planning notes

Choose One:

  • I think schools should teach students how to cook
  • I think cooking should be taught in the home

I think cooking should be taught in ____________ because ________________________.

(From Illinois Measure of Annual Growth in English; http://www.isbe.state.il.us/assessment/image.htm)

A major disadvantage of extended response items is the difficulty of reliable scoring. Not only do various teachers score the same response differently, but the same teacher may score the identical response differently on various occasions (Linn & Miller, 2005). A variety of steps can be taken to improve the reliability and validity of scoring. First, teachers should begin by writing an outline of a model answer. This helps make clear what students are expected to include. Second, a sample of the answers should be read. This assists in determining what the students can do and whether there are any common misconceptions arising from the question. Third, teachers have to decide what to do about irrelevant information that is included (e.g. is it ignored or are students penalized?) and how to evaluate mechanical errors such as grammar and spelling. Finally, a point scoring system or a scoring rubric should be used.

In point scoring, components of the answer are assigned points. For example, suppose students were asked: What are the nature, symptoms, and risk factors of hyperthermia?

Point Scoring Guide:

  • Definition (nature) 2 pts
  • Symptoms (1 pt for each) 5 pts
  • Risk Factors (1 point for each) 5 pts
  • Writing 3 pts

This provides some guidance for evaluation and helps consistency, but point scoring systems often lead the teacher to focus on facts (e.g. naming risk factors) rather than higher level thinking, which may undermine the validity of the assessment if the teacher’s purposes include higher level thinking. A better approach is to use a scoring rubric that describes the quality of the answer or performance at each level.
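Applied to one student's answer, the point scoring guide above is straightforward bookkeeping. The sketch below is illustrative only: the component scores for the hypothetical student are invented, while the maxima follow the guide (definition 2, symptoms 5, risk factors 5, writing 3, for 15 points total):

```python
# Maximum points per component, taken from the point scoring guide above.
MAXIMA = {"definition": 2, "symptoms": 5, "risk_factors": 5, "writing": 3}

def total_score(awarded):
    # Cap each component at its maximum so an arithmetic slip cannot
    # push a score past the guide.
    return sum(min(points, MAXIMA[part]) for part, points in awarded.items())

# Hypothetical student: full definition, 4 symptoms, 3 risk factors, fair writing.
student = {"definition": 2, "symptoms": 4, "risk_factors": 3, "writing": 2}
print(total_score(student), "out of", sum(MAXIMA.values()))  # 11 out of 15
```

Note what the arithmetic cannot capture: as the text argues, tallying named facts says nothing about the quality of the reasoning, which is why a scoring rubric is the better tool when higher level thinking is a goal.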

Scoring rubrics

Scoring rubrics can be holistic or analytical. In holistic scoring rubrics, general descriptions of performance are made and a single overall score is obtained. An example from grade 2 language arts in the Los Angeles Unified School District, which classifies responses into four levels (not proficient, partially proficient, proficient, and advanced), is in Exhibit 4.

Exhibit 4: Example of holistic scoring rubric: English language arts grade 2

Assignment. Write about an interesting, fun, or exciting story you have read in class this year. Some of the things you could write about are:

  • What happened in the story (the plot or events)
  • Where the events took place (the setting)
  • People, animals, or things in the story (the characters)

In your writing make sure you use facts and details from the story to describe everything clearly. After you write about the story, explain what makes the story interesting, fun or exciting.

Scoring Rubric
Level Points Criteria
Advanced Score 4
  • The response demonstrates well-developed reading comprehension skills.
  • Major story elements (plot, setting, or characters) are clearly and accurately described.
  • Statements about the plot, setting, or characters are arranged in a manner that makes sense.
  • Ideas or judgments (why the story is interesting, fun, or exciting) are clearly supported or explained with facts and details from the story.
Proficient Score 3
  • The response demonstrates solid reading comprehension skills.
  • Most statements about the plot, setting, or characters are clearly described.
  • Most statements about the plot, setting, or characters are arranged in a manner that makes sense.
  • Ideas or judgments are supported with facts and details from the story.
Partially Proficient Score 2
  • The response demonstrates some reading comprehension skills
  • There is an attempt to describe the plot, setting, or characters
  • Some statements about the plot, setting, or characters are arranged in a manner that makes sense.
  • Ideas or judgments may be supported with some facts and details from the story.
Not Proficient Score 1
  • The response demonstrates little or no skill in reading comprehension.
  • The plot, setting, or characters are not described, or the description is unclear.
  • Statements about the plot, setting, or characters are not arranged in a manner that makes sense.
  • Ideas or judgments are not stated, and facts and details from the text are not used.
Source: Adapted from English Language Arts Grade 2 Los Angeles Unified School District, 2001 (http://www.cse.ucla.edu/resources/justforteachers_set.htm)

Analytical rubrics provide descriptions of levels of student performance on a variety of characteristics. For example, six characteristics used for assessing writing developed by the Northwest Regional Educational Laboratory (NWREL) are:

  • ideas and content
  • organization
  • voice
  • word choice
  • sentence fluency
  • conventions

Descriptions of high, medium, and low responses for each characteristic are available from Education Northwest.

Holistic rubrics have the advantage that they can be developed more quickly than analytical rubrics. They are also faster to use because there is only one dimension to examine. However, they do not give students feedback about which aspects of a response are strong and which need improvement (Linn & Miller, 2005), which makes them less useful for assessment for learning. Rubrics can also serve as teaching tools: providing them to students before the assessment makes clear what knowledge and skills are expected.

Teachers can use scoring rubrics as part of instruction by giving students the rubric during instruction, providing several responses, and analyzing these responses in terms of the rubric. For example, use of accurate terminology is one dimension of the science rubric in Table 4. An elementary science teacher could discuss why it is important for scientists to use accurate terminology, give examples of inaccurate and accurate terminology, provide that component of the scoring rubric to students, distribute some examples of student responses (perhaps from former students), and then discuss how these responses would be classified according to the rubric. This strategy of assessment for learning should be more effective if the teacher (a) emphasizes to students why using accurate terminology is important when learning science, rather than how to get a good grade on the test (we provide more details about this in the section on motivation later in this chapter); (b) provides an exemplary response so students can see a model; and (c) emphasizes that the goal is student improvement on this skill, not ranking students.
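The mechanics of applying an analytic rubric can be sketched in code. This is an illustrative sketch only: the dimension names below loosely mirror the science rubric in Table 4, but the data structure, function names, and weighting are hypothetical and not part of any published rubric.

```python
# Illustrative sketch: representing an analytic rubric's dimensions and
# totaling a student's per-dimension levels. Dimension names are loosely
# based on Table 4; the scoring scheme itself is hypothetical.

RUBRIC_DIMENSIONS = [
    "understanding",
    "terminology",
    "supporting_details",
    "synthesis",
    "application",
]

def total_score(scores: dict, max_level: int = 4) -> int:
    """Sum per-dimension levels (each 0..max_level), validating the input."""
    for dim, level in scores.items():
        if dim not in RUBRIC_DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
        if not 0 <= level <= max_level:
            raise ValueError(f"level out of range for {dim}: {level}")
    return sum(scores.values())

# A hypothetical student response scored on all five dimensions:
sample = {"understanding": 3, "terminology": 4, "supporting_details": 3,
          "synthesis": 2, "application": 3}
print(total_score(sample))  # 15
```

A holistic rubric, by contrast, would collapse these dimensions into a single overall level, which is why it is faster to apply but gives less diagnostic feedback.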

Table 4: Example of a scoring rubric, Science
Level of understanding Use of accurate scientific terminology Use of supporting details Synthesis of information Application of information[1]
4 There is evidence in the response that the student has a full and complete understanding. The use of accurate scientific terminology enhances the response. Pertinent and complete supporting details demonstrate an integration of ideas. The response reflects a complete synthesis of information. An effective application of the concept to a practical problem or real-world situation reveals an insight into scientific principles.
3 There is evidence in the response that the student has a good understanding. The use of accurate scientific terminology strengthens the response. The supporting details are generally complete. The response reflects some synthesis of information. The concept has been applied to a practical problem or real-world situation.
2 There is evidence in the response that the student has a basic understanding. The use of accurate scientific terminology may be present in the response. The supporting details are adequate. The response provides little or no synthesis of information. The application of the concept to a practical problem or real-world situation is inadequate.
1 There is evidence in the response that the student has some understanding. The use of accurate scientific terminology is not present in the response. The supporting details are only minimally effective. The response addresses the question. The application, if attempted, is irrelevant.
0 The student has no understanding of the question or problem. The response is completely incorrect or irrelevant.

Performance assessments

Typically in performance assessments students complete a specific task while teachers observe the process or procedure (e.g. data collection in an experiment) as well as the product (e.g. completed report) (Popham, 2005; Stiggins, 2005). The tasks that students complete in performance assessments, in contrast to selected response items, are not simple and include the following:

  • playing a musical instrument
  • athletic skills
  • artistic creation
  • conversing in a foreign language
  • engaging in a debate about political issues
  • conducting an experiment in science
  • repairing a machine
  • writing a term paper
  • using interaction skills to play together

These examples all involve complex skills but illustrate that the term performance assessment is used in a variety of ways. For example, the teacher may not observe all of the process (e.g. she sees a draft paper but the final product is written during out-of-school hours) and essay tests are typically classified as performance assessments (Airasian, 2000). In addition, in some performance assessments there may be no clear product (e.g. the performance may be group interaction skills).

Two related terms, alternative assessment and authentic assessment, are sometimes used instead of performance assessment, but they have different meanings (Linn & Miller, 2005). Alternative assessment refers to tasks that are not pencil-and-paper, and while many performance assessments are not pencil-and-paper tasks, some are (e.g. writing a term paper, essay tests). Authentic assessment describes tasks that students do that are similar to those in the “real world.” Classroom tasks vary in level of authenticity (Popham, 2005). For example, for a Japanese language class taught in a high school in Chicago, conversing in Japanese in Tokyo is highly authentic, but only possible in a study abroad program or a trip to Japan. Conversing in Japanese with native Japanese speakers in Chicago is also highly authentic, and conversing with the teacher in Japanese during class is moderately authentic. Much less authentic is a matching test on English and Japanese words. In a language arts class, writing a letter to an editor or a memo to the principal is highly authentic, as letters and memos are common work products. Writing a five-paragraph paper is less authentic, as such papers are not used in the world of work; however, a five-paragraph paper is a complex task and would typically be classified as a performance assessment.

Advantages and disadvantages

There are several advantages of performance assessments (Linn & Miller, 2005). First, the focus is on complex learning outcomes that often cannot be measured by other methods. Second, performance assessments typically assess the process or procedure as well as the product. For example, the teacher can observe whether students are repairing the machine with the appropriate tools and procedures as well as whether the machine functions properly after the repairs. Third, well designed performance assessments communicate instructional goals and meaningful learning clearly to students. For example, if the topic in a fifth grade art class is one-point perspective, the performance assessment could be drawing a city scene that illustrates one-point perspective (http://www.sanford-artedventures.com). This assessment is meaningful and clearly communicates the learning goal. Such a performance assessment is also a good instructional activity and has good content validity, as is common with well designed performance assessments (Linn & Miller, 2005).

One major disadvantage of performance assessments is that they are typically very time consuming for students and teachers. This means that fewer assessments can be gathered, so if they are not carefully devised, fewer learning goals will be assessed, which can reduce content validity. State curriculum guidelines can be helpful in determining what should be included in a performance assessment. For example, Eric, a dance teacher in a high school in Tennessee, learns that the state standards indicate that dance students at the highest level should be able to demonstrate consistency and clarity in performing technical skills by:

  • performing complex movement combinations to music in a variety of meters and styles
  • performing combinations and variations in a broad dynamic range
  • demonstrating improvement in performing movement combinations through self-evaluation
  • critiquing a live or taped dance production based on given criteria

(http://www.tennessee.gov/education/ci/standards/music/dance912.shtml)

Eric devises the following performance task for his eleventh grade modern dance class:

In groups of 4–6, students will perform a dance at least 5 minutes in length. The dance selected should be multifaceted so that all the dancers can demonstrate technical skills, complex movements, and a dynamic range (Items 1–2). Students will videotape their rehearsals and document how they improved through self-evaluation (Item 3). Each group will view and critique the final performance of one other group in class (Item 4).

Eric would need to scaffold most steps in this performance assessment. The groups would probably need guidance in selecting a dance that allows all the dancers to demonstrate the appropriate skills; critiquing their own performances constructively; working effectively as a team; and applying criteria to evaluate a dance.

Another disadvantage of performance assessments is that they are hard to score reliably, which can lead to inaccurate and unfair evaluation. As with any constructed response assessment, scoring rubrics are very important. Examples of holistic and analytic scoring rubrics designed to assess a completed product appear in Exhibit 4 and Table 4. A rubric designed to assess the process of group interaction is in Table 5.

Table 5: Example of group interaction rubric
Score Time management Participation and performance in roles Shared involvement
0 Group did not stay on task and so task was not completed. Group did not assign or share roles. Single individual did the task.
1 Group was off-task the majority of the time but the task was completed. Group assigned roles but members did not use these roles. Group totally disregarded comments and ideas from some members.
2 Group stayed on task most of the time. Group accepted and used some but not all roles. Group accepted some ideas but did not give others adequate consideration.
3 Group stayed on task throughout the activity and managed time well. Group accepted and used roles and actively participated. Group gave equal consideration to all ideas.
4 Group defined their own approach in a way that more effectively managed the activity. Group defined and used roles not mentioned to them; role changes took place that maximized individuals’ expertise. Group made specific efforts to involve all group members, including the reticent members.
Source: Adapted from Group Interaction (GI) SEPUP (2003). Issues, Evidence and You. Ronkonkoma, NY: Lab-Aids (http://cse.edc.org/products/assessment/middleschool/scorerub.asp).

This rubric was devised for middle grade science but could be used in other subject areas when assessing group process. In some performance assessments, several scoring rubrics should be used. In the dance performance example above, Eric should have scoring rubrics for the performance skills, the improvement based on self-evaluation, the teamwork, and the critique of the other group. Obviously, devising a good performance assessment is complex, and Linn and Miller (2005) recommend that teachers:

  • Create performance assessments that require students to use complex cognitive skills. Sometimes teachers devise assessments that are interesting and that the students enjoy but do not require students to use higher level cognitive skills that lead to significant learning. Focusing on high level skills and learning outcomes is particularly important because performance assessments are typically so time consuming.
  • Ensure that the task is clear to the students. Performance assessments typically require multiple steps so students need to have the necessary prerequisite skills and knowledge as well as clear directions. Careful scaffolding is important for successful performance assessments.
  • Specify expectations of the performance clearly by providing students scoring rubrics during the instruction. This not only helps students understand what is expected but also guarantees that teachers are clear about what they expect. Thinking this through while planning the performance assessment can be difficult for teachers but is crucial, as it typically leads to revisions of the actual assessment and the directions provided to students.
  • Reduce the importance of unessential skills in completing the task. What skills are essential depends on the purpose of the task. For example, for a science report, is the use of publishing software essential? If the purpose of the assessment is for students to demonstrate the process of the scientific method including writing a report, then the format of the report may not be significant. However, if the purpose includes integrating two subject areas, science and technology, then the use of publishing software is important. Because performance assessments take time it is tempting to include multiple skills without carefully considering if all the skills are essential to the learning goals.

Portfolios

“A portfolio is a meaningful collection of student work that tells the story of student achievement or growth” (Arter, Spandel, & Culham, 1995, p. 2). Portfolios are a purposeful collection of student work, not just folders of all the work a student does. Portfolios are used for a variety of purposes, and developing a portfolio system can be confusing and stressful unless teachers are clear on their purpose. The varied purposes can be illustrated as four dimensions (Linn & Miller, 2005):

Assessment for Learning ↔ Assessment of learning

Current Accomplishments ↔ Progress

Best Work Showcase ↔ Documentation

Finished ↔ Working

When the primary purpose is assessment for learning, the emphasis is on student self-reflection and responsibility for learning. Students not only select samples of their work they wish to include, but also reflect on and interpret their own work. Portfolios containing this information can be used to aid communication, as students can present and explain their work to their teachers and parents (Stiggins, 2005). Portfolios focusing on assessment of learning contain students’ work samples that certify accomplishments for a classroom grade, graduation, state requirements, etc. Typically, students have less choice in the work contained in such portfolios, as some consistency is needed for this type of assessment. For example, the writing portfolios that fourth and seventh graders are required to submit in Kentucky must contain a self-reflective statement and examples of three types of writing (reflective; personal experience or literary; and transactive). Students do choose which of their pieces of writing of each type to include in the portfolio (Kentucky Student Performance Standards).

Portfolios can be designed to focus on student progress or current accomplishments. For example, audio tapes of English language learners speaking could be collected over one year to demonstrate growth in learning. Student progress portfolios may also contain multiple versions of a single piece of work. For example, a writing project may contain notes on the original idea, outline, first draft, comments on the first draft by peers or teacher, second draft, and the final finished product (Linn & Miller 2005). If the focus is on current accomplishments, only recent completed work samples are included.

Portfolios can focus on documenting student activities or highlighting important accomplishments. Documentation portfolios are inclusive, containing all the work samples rather than focusing on one special strength, best work, or progress. In contrast, showcase portfolios focus on best work. The best work is typically identified by students. One aim of such portfolios is that students learn how to identify products that demonstrate what they know and can do. Students are not expected to identify their best work in isolation but to use feedback from their teachers and peers as well.

A final distinction can be made between a finished portfolio, perhaps used for a job application, and a working portfolio that typically includes day-to-day work samples. Working portfolios evolve over time and are not intended to be used for assessment of learning. The focus in a working portfolio is on developing ideas and skills, so students should be allowed to make mistakes, freely comment on their own work, and respond to teacher feedback (Linn & Miller, 2005). Finished portfolios are designed for use with a particular audience, and the products selected may be drawn from a working portfolio. For example, in a teacher education program, the working portfolio may contain work samples from all the courses taken. A student may develop one finished portfolio to demonstrate she has mastered the required competencies in the teacher education program and a second finished portfolio for her job application.

Advantages and disadvantages

Portfolios used well in classrooms have several advantages. They document and evaluate growth in a much more nuanced way than selected response tests can. Also, portfolios can be integrated easily into instruction, i.e. used for assessment for learning. Portfolios also encourage student self-evaluation and reflection, as well as ownership of learning (Popham, 2005). Using classroom assessment to promote student motivation is an important component of assessment for learning, which is considered in the next section.

However, there are some major disadvantages of portfolio use. First, good portfolio assessment takes an enormous amount of teacher time and organization. Time is needed to help students understand the purpose and structure of the portfolio, decide which work samples to collect, and self-reflect; some of this work needs to take place in one-to-one conferences. Reviewing and evaluating portfolios outside of class time is also enormously time consuming. Teachers have to weigh whether the time spent is worth the benefits of portfolio use.

Second, evaluating portfolios reliably and eliminating bias can be even more difficult than for a constructed response assessment because the products are more varied. The experience of the state-wide use of portfolios for assessment in writing and mathematics for fourth and eighth graders in Vermont is sobering. Teachers used the same analytic scoring rubric when evaluating the portfolios. In the first two years of implementation, samples from schools were collected and scored by an external panel of teachers. In the first year the agreement among raters (i.e. inter-rater reliability) was poor for both mathematics and writing; in the second year the agreement among raters improved for mathematics but not for writing. However, even with the improvement in mathematics, the reliability was too low to use the portfolios for individual student accountability (Koretz, Stecher, Klein, & McCaffrey, 1994). When reliability is low, validity is also compromised because unstable results cannot be interpreted meaningfully.
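Inter-rater reliability can be quantified in several ways. The sketch below, offered only as an illustration, computes two common statistics: raw percent agreement, and Cohen's kappa, which corrects percent agreement for the agreement expected by chance. The rating data are invented and are not from the Vermont study.

```python
# Hedged sketch: two simple inter-rater agreement statistics for a pair of
# raters scoring the same set of portfolios on an ordinal rubric scale.
# The ratings below are made up for illustration.
from collections import Counter

def percent_agreement(ratings1, ratings2):
    """Proportion of items on which the two raters gave identical scores."""
    matches = sum(a == b for a, b in zip(ratings1, ratings2))
    return matches / len(ratings1)

def cohens_kappa(ratings1, ratings2):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings1)
    p_o = percent_agreement(ratings1, ratings2)
    c1, c2 = Counter(ratings1), Counter(ratings2)
    # Chance agreement: product of each rater's marginal proportions.
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(ratings1) | set(ratings2))
    if p_e == 1.0:  # both raters used a single category throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)

rater_a = [4, 3, 3, 2, 4, 1, 3, 2]  # invented scores on 8 portfolios
rater_b = [4, 3, 2, 2, 4, 2, 3, 1]
print(percent_agreement(rater_a, rater_b))        # 0.625
print(round(cohens_kappa(rater_a, rater_b), 2))   # 0.49
```

Note how kappa (about 0.49 here) is well below the raw agreement (0.625): some of the raw agreement is attributable to chance, which is one reason raw percent agreement alone can overstate reliability.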

If teachers do use portfolios in their classroom, the series of steps needed for implementation is outlined in Table 6. If the school or district has an existing portfolio system, these steps may have to be modified.

Table 6: Steps in implementing a classroom portfolio program
1. Make sure students own their portfolios. Talk to your students about your ideas of the portfolio, the different purposes, and the variety of work samples. If possible, have them help make decisions about the kind of portfolio you implement.
2. Decide on the purpose. Will the focus be on growth or current accomplishments? Best work showcase or documentation? Good portfolios can have multiple purposes but the teacher and students need to be clear about the purpose.
3. Decide what work samples to collect. For example, in writing, is every writing assignment included? Are early drafts as well as final products included?
4. Collect and store work samples. Decide where the work sample will be stored. For example, will each student have a file folder in a file cabinet, or a small plastic tub on a shelf in the classroom?
5. Select criteria to evaluate samples. If possible, work with students to develop scoring rubrics. This may take considerable time as different rubrics may be needed for the variety of work samples. If you are using existing scoring rubrics, discuss with students possible modifications after the rubrics have been used at least once.
6. Teach and require students to conduct self-evaluations of their own work. Help students learn to evaluate their own work using agreed upon criteria. For younger students, the self-evaluations may be simple (strengths, weaknesses, and ways to improve); for older students, a more analytic approach is desirable, including using the same scoring rubrics that the teachers will use.
7. Schedule and conduct portfolio conferences. Teacher-student conferences are time consuming but conferences are essential for the portfolio process to significantly enhance learning. These conferences should aid students’ self evaluation and should take place frequently.
8. Involve parents. Parents need to understand the portfolio process. Encourage parents to review the work samples. You may wish to schedule parent-teacher-student conferences in which students talk about their work samples.
Source: Adapted from Popham (2005)

Assessment that enhances motivation and student confidence

Studies on testing and learning conducted more than 20 years ago demonstrated that tests promote learning and that more frequent tests are more effective than less frequent ones (Dempster & Perkins, 1993). Frequent smaller tests encourage continuous effort rather than last minute cramming and may also reduce test anxiety because the consequences of errors are reduced. College students report preferring more frequent testing to infrequent testing (Bangert-Drowns, Kulik, & Kulik, 1991). More recent research indicates that teachers’ assessment purposes and beliefs, the type of assessment selected, and the feedback given all contribute to the assessment climate in the classroom, which in turn influences students’ confidence and motivation. The use of self-assessment is also important in establishing a positive assessment climate.

References

Airasian, P. W. (2000). Classroom Assessment: A concise approach (2nd ed.). Boston: McGraw Hill.

Airasian, P. W. (2005). Classroom Assessment: Concepts and Applications (3rd ed.). Boston: McGraw Hill.

Bangert-Drowns, R. L., Kulik, J. A., & Kulik, C-L. C. (1991). Effects of frequent classroom testing. Journal of Educational Research, 85(2), 89–99.

Borko, H., & Livingston, C. (1989). Cognition and improvisation: Differences in mathematics instruction by expert and novice teachers. American Educational Research Journal, 26, 473–498.

Dempster, F. N., & Perkins, P. G. (1993). Revitalizing classroom assessment: Using tests to promote learning. Journal of Instructional Psychology, 20(3), 197–203.

Koretz, D., Stecher, B., Klein, S., & McCaffrey, D. (1994). The evolution of a portfolio program: The impact and quality of the Vermont program in its second year (1992–93) (CSE Technical Report 385). Los Angeles: University of California, Center for Research on Evaluation, Standards, and Student Testing. Accessed January 25, 2006 from http://www.csr.ucla.edu.

Linn, R. L., & Miller, M. D. (2005). Measurement and Assessment in Teaching (9th ed.). Upper Saddle River, NJ: Pearson.

Popham, W. J. (2005). Classroom Assessment: What teachers need to know. Boston, MA: Pearson.

Rowe, M. B. (2003). Wait-time and rewards as instructional variables, their influence on language, logic and fate control: Part one, wait-time. Journal of Research in Science Teaching, 40(Supplement), S19–S32.

Stiggins, R. J. (2002). Assessment crisis: The absence of assessment FOR learning. Phi Delta Kappan, 83(10), 758–765.


  1. On the High School Assessment, the application of a concept to a practical problem or real-world situation will be scored when it is required in the response and requested in the item stem.