Basic concepts

Standardized tests are created by a team (usually test experts from a commercial testing company who consult classroom teachers and university faculty) and are administered in standardized ways. Students not only respond to the same questions but also receive the same directions and have the same time limits. Explicit scoring criteria are used. Standardized tests are designed to be taken by many students within a state, province, or nation, and sometimes across nations. Teachers help administer some standardized tests, and test manuals are provided that contain explicit details about the administration and scoring. For example, teachers may have to remove all the posters and charts from the classroom walls, read directions out loud to students using a script, and respond to student questions in a specific manner.

Criterion referenced standardized tests measure student performance against a specific standard or criterion. For example, newly hired firefighters in the Commonwealth of Massachusetts in the United States have to meet physical fitness standards by successfully completing a standardized physical fitness test that includes stair climbing, using a ladder, advancing a hose, and simulating a rescue through a doorway (Human Resources Division, n.d.). Criterion referenced tests currently used in US schools are often tied to state content standards and provide information about what students can and cannot do. For example, one of the content standards for fourth grade reading in Kentucky is “Students will identify and describe the characteristics of fiction, nonfiction, poetry or plays” (Combined Curriculum Document Reading 4.1, 2006), and so a report on an individual student would indicate whether the child can accomplish this skill. The report may state the number or percentage of items that were successfully completed (e.g. 15 out of 20, i.e. 75 per cent) or include descriptions such as basic, proficient, or advanced, which are based on decisions made about the percentage of mastery necessary to be classified into these categories.
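The arithmetic behind such a report can be sketched in a few lines of Python. This is only an illustration: the function name and the cut scores (50 and 80 per cent) are assumptions invented for the example, not values used by any actual state test.

```python
def criterion_report(items_correct, items_total, cut_scores=None):
    """Return (percent correct, performance label) for one content standard."""
    if items_total <= 0:
        raise ValueError("items_total must be positive")
    # Hypothetical cut scores: the minimum percent correct for each label.
    if cut_scores is None:
        cut_scores = [(80.0, "advanced"), (50.0, "proficient")]
    percent = 100.0 * items_correct / items_total
    # Check the highest cut score first; anything below all cuts is "basic".
    for minimum, label in sorted(cut_scores, reverse=True):
        if percent >= minimum:
            return percent, label
    return percent, "basic"


percent, label = criterion_report(15, 20)
print(f"{percent:.0f} per cent correct: {label}")
```

With these assumed cut scores, the student who completed 15 of 20 items (75 per cent) would be labeled proficient. In practice the cut scores are set by the people responsible for the test, so the same percentage could fall into a different category on a different test.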

Norm referenced standardized tests report students’ performance relative to others. For example, if a student scores at the seventy-second percentile in reading, it means she outperforms 72 per cent of the students who were included in the test’s norm group. A norm group is a representative sample of students who completed the standardized test while it was being developed. For state tests the norm group is drawn from the state, whereas for national tests the sample is drawn from the nation. Information about the norm groups is provided in a technical test manual that is not typically supplied to teachers but should be available from the person in charge of testing in the school district.
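As a rough sketch of how a percentile rank relates to a norm group, the following Python function counts the share of norm-group scores that fall below a student's score. The function name and the raw scores are invented for illustration, and the definition used here (scores strictly below) is an assumption: test publishers may handle tied scores differently.

```python
def percentile_rank(score, norm_group_scores):
    """Percentage of norm-group scores strictly below `score`."""
    if not norm_group_scores:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)


# Invented raw scores for a tiny norm group; real norm groups are far larger.
norm_group = [12, 15, 18, 20, 21, 23, 25, 27, 30, 34]
print(percentile_rank(26, norm_group))
```

Here a raw score of 26 is higher than 7 of the 10 norm-group scores, so the student would be reported at the seventieth percentile. The result says nothing about how many items were answered correctly in absolute terms, only how the score compares with the norm group.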

Reports from criterion referenced and norm referenced tests provide different information. Imagine a national mathematics test designed to test basic skills in second grade. If this test is norm referenced, and Alisha receives a report indicating that she scored in the eighty-fifth percentile, this indicates that she scored better than 85 per cent of the students in the norm group who took the test previously. If this test is criterion referenced, Alisha’s report may state that she mastered 65 per cent of the problems designed for her grade level. The relative percentage reported from the norm referenced test provides information about Alisha’s performance compared to other students, whereas the criterion referenced test attempts to describe what Alisha or any student can or cannot do with respect to whatever the test is designed to measure. When planning instruction, classroom teachers need to know what students can and cannot do, so criterion referenced tests are typically more useful (Popham, 2004). The current standards-based accountability and NCLB rely predominantly on criterion referenced tests to assess attainment of content-based standards. Consequently, the use of standardized norm referenced tests in schools has diminished and is largely limited to diagnosis and placement of children with specific cognitive disabilities or exceptional abilities (Haertel & Herman, 2005).

Some recent standardized tests incorporate both criterion referenced and norm referenced elements into the same test (Linn & Miller, 2005). That is, the test results provide information not only on mastery of a content standard but also on the percentage of students who attained that level of mastery.

Standardized tests can be high stakes, i.e. performance on the test has important consequences. These consequences can be for students, e.g. passing a high school graduation test is required in order to obtain a diploma, or passing PRAXIS II is a prerequisite for gaining a teaching license. These consequences can also be for schools, e.g. under NCLB an increasing percentage of students in every school must reach proficiency in math and reading each year. Consequences for schools that fail to achieve these gains include reduced funding and restructuring of the school building. Under NCLB, the consequences are designed to be for the schools, not individual students (Popham, 2005), so students’ test results may not accurately reflect what they know, because students may not try hard when the tests have low stakes for them (Wise & DeMars, 2005).

Uses of standardized tests

Standardized tests are used for a variety of reasons and the same test is sometimes used for multiple purposes.

Assessing students’ progress in a wider context

Well-designed teacher assessments provide crucial information about each student’s achievement in the classroom. However, teachers vary in the types of assessment they use so teacher assessments do not usually provide information on how students’ achievement compares to externally established criteria. Consider two eighth grade students, Brian and Joshua, who received As in their middle school math classes. However, on the standardized norm referenced math test Brian scored in the fiftieth percentile whereas Joshua scored in the ninetieth percentile. This information is important to Brian and Joshua, their parents, and the school personnel. Likewise, two third grade students could both receive Cs on their report card in reading but one may pass 25 per cent and the other 65 per cent of the items on the Criterion Referenced State Test.

There are many reasons that students’ performance on teacher assessments and standardized assessments may differ. Students may perform lower on the standardized assessment because their teachers have easy grading criteria, or there is poor alignment between the content they were taught and that on the standardized test, or they are unfamiliar with the type of items on the standardized tests, or they have test anxiety, or they were sick on the day of the test. Students may perform higher on the standardized test than on classroom assessments because their teachers have hard grading criteria, or the student does not work consistently in class (e.g. does not turn in homework) but will focus on a standardized test, or the student is adept at the multiple choice items on the standardized tests but not at the variety of constructed response and performance items the teacher uses. We should always be very cautious about drawing inferences from one kind of assessment.

In some states, standardized achievement tests are required for home-schooled students in order to provide parents and state officials information about the students’ achievement in a wider context. For example, in New York home-schooled students must take an approved standardized test every other year in grades four through eight and every year in grades nine through twelve. These tests must be administered in a standardized manner and the results filed with the Superintendent of the local school district. If a student does not take the tests or scores below the thirty-third percentile the home schooling program may be placed on probation (New York State Education Department, 2005).

Diagnosing students’ strengths and weaknesses

Standardized tests, along with interviews, classroom observations, medical examinations, and school records are used to help diagnose students’ strengths and weaknesses. Often the standardized tests used for this purpose are administered individually to determine if the child has a disability. For example, if a kindergarten child is having trouble with oral communication, a standardized language development test could be administered to determine if there are difficulties with understanding the meaning of words or sentence structures, noticing sound differences in similar words, or articulating words correctly (Peirangelo & Guiliani, 2002). It would also be important to determine if the child was a recent immigrant, had a hearing impairment or mental retardation. The diagnosis of learning disabilities typically involves the administration of at least two types of standardized tests—an aptitude test to assess general cognitive functioning and an achievement test to assess knowledge of specific content areas (Peirangelo & Guiliani, 2006). We discuss the difference between aptitude and achievement tests later in this chapter.

Selecting students for specific programs

Standardized tests are often used to select students for specific programs. For example, the SAT (Scholastic Assessment Test) and ACT (American College Test) are norm referenced tests used to help determine if high school students are admitted to selective colleges. Norm referenced standardized tests are also used, among other criteria, to determine if students are eligible for special education or gifted and talented programs. Criterion referenced tests are used to determine which students are eligible for promotion to the next grade or graduation from high school. Schools that place students in ability groups, including high school college preparation, academic, or vocational programs, may also use norm referenced or criterion referenced standardized tests. When standardized tests are used as an essential criterion for placement they are obviously high stakes for students.

Assisting teachers’ planning

Norm referenced and criterion referenced standardized tests, among other sources of information about students, can help teachers make decisions about their instruction. For example, if a social studies teacher learns that most of the students did very well on a norm referenced reading test administered early in the school year he may adapt his instruction and use additional primary sources. A reading teacher after reviewing the poor end-of-the-year criterion referenced standardized reading test results may decide that next year she will modify the techniques she uses. A biology teacher may decide that she needs to spend more time on genetics as her students scored poorly on that section of the standardized criterion referenced science test. These are examples of assessment for learning which involves data-based decision making. It can be difficult for beginning teachers to learn to use standardized test information appropriately, understanding that test scores are important information but also remembering that there are multiple reasons for students’ performance on a test.

Accountability

Standardized test results are increasingly used to hold teachers and administrators accountable for students’ learning. Prior to 2002, many states required public dissemination of students’ progress, but under NCLB school districts in all states are required to send report cards to parents and the public that include results of standardized tests for each school. Providing information about students’ standardized test results is not new, as newspapers began printing summaries of students’ test results within school districts in the 1970s and 1980s (Popham, 2005). However, public accountability of schools and teachers has been increasing in the US and many other countries, and this increased accountability affects the public perception and work of all teachers, including those teaching subjects or grade levels that are not tested.

For example, Erin, a middle school social studies teacher, said:

As a teacher in a “non-testing” subject area, I spend substantial instructional time supporting the standardized testing requirements. For example, our school has instituted “word of the day,” which encourages teachers to use, define, and incorporate terminology often used in the tests (e.g. “compare,” “oxymoron,” etc.). I use the terms in my class as often as possible and incorporate them into written assignments. I also often use test questions of similar formats to the standardized tests in my own subject assessments (e.g. multiple choice questions with double negatives, short answer and extended response questions) as I believe that practice in the test question formats will help students be more successful in those subjects that are being assessed.

Accountability and standardized testing are two components of Standards Based Reform in Education, which was initiated in the USA in the 1980s. The two other components are academic content standards and teacher quality.

Types of standardized tests

Achievement tests

Summarizing the past: K-12 achievement tests are designed to assess what students have learned in a specific content area. These tests include those specifically designed by states to assess mastery of state academic content standards as well as general tests such as the California Achievement Tests, the Comprehensive Tests of Basic Skills, the Iowa Tests of Basic Skills, the Metropolitan Achievement Tests, and the Stanford Achievement Tests. These general tests are designed to be used across the nation and so will not be as closely aligned with state content standards as specifically designed tests. Some states and Canadian provinces use specifically designed tests to assess attainment of content standards and also a general achievement test to provide normative information.

Standardized achievement tests are designed to be used for students in kindergarten through high school. For young children, questions are presented orally, students may respond by pointing to pictures, and the subtests are often not timed. For example, on the Iowa Tests of Basic Skills (www.riverpub.com), designed for students as young as kindergarten, the vocabulary test assesses listening vocabulary. The teacher reads a word and may also read a sentence containing the word. Students are then asked to choose one of three pictorial response options.

Achievement tests are used as one criterion for obtaining a license in a variety of professions including nursing, physical therapy, social work, accounting, and law. Their use in teacher education is recent and is part of the increased accountability of public education, and most states require that teacher education students take achievement tests in order to obtain a teaching license. For those seeking middle school and high school licensure, these tests are in the content area of the major or minor (e.g. mathematics, social studies); for those seeking licenses in early childhood and elementary education, the tests focus on knowledge needed to teach students of specific grade levels. The most commonly used tests, the PRAXIS series, tests I and II, developed by Educational Testing Service, include three types of tests (www.ets.org):

Subject Assessments test general and subject-specific teaching skills and knowledge. They include both multiple-choice and constructed-response test items.
Principles of Learning and Teaching (PLT) Tests assess general pedagogical knowledge at four grade levels: Early Childhood, K–6, 5–9, and 7–12. These tests are based on case studies and include constructed-response and multiple-choice items. Much of the content in this textbook is relevant to the PLT tests.
Teaching Foundations Tests assess pedagogy in five areas: multi-subject (elementary), English Language Arts, Mathematics, Science, and Social Science.

These tests include constructed-response and multiple-choice items. The scores needed in order to pass each test vary and are determined by each state.

Diagnostic tests

Profiling skills and abilities: Some standardized tests are designed to diagnose strengths and weaknesses in skills, typically reading or mathematics skills. For example, an elementary school child may have difficulty in reading, and one or more diagnostic tests would provide detailed information about three components: (1) word recognition, which includes phonological awareness (pronunciation), decoding, and spelling; (2) comprehension, which includes vocabulary as well as reading and listening comprehension; and (3) fluency (Joshi, 2003). Diagnostic tests are often administered individually by school psychologists, following standardized procedures. The examiner typically records not only the results on each question but also observations of the child’s behavior such as distractibility or frustration. The results from the diagnostic standardized tests are used in conjunction with classroom observations, school and medical records, as well as interviews with teachers, parents and students to produce a profile of the student’s skills and abilities and, where appropriate, to diagnose a learning disability.

Aptitude tests

Predicting the future: Aptitude tests, like achievement tests, measure what students have learned, but rather than focusing on specific subject matter learned in school (e.g. math, science, English or social studies), the test items focus on verbal, quantitative, and problem solving abilities that are learned in school or in the general culture (Linn & Miller, 2005). These tests are typically shorter than achievement tests and can be useful in predicting general school achievement. If the purpose of using a test is to predict success in a specific subject (e.g. language arts), the best predictor is past achievement in language arts, and so scores on a language arts achievement test would be useful. However, when the predictions are more general (e.g. success in college), aptitude tests are often used. According to the test developers, both the ACT and SAT Reasoning tests, used to predict success in college, assess general educational development, reasoning, analysis, and problem solving, and include questions on mathematics, reading, and writing (www.collegeboard.com; www.act.org). The SAT Subject Tests, which focus on mastery of specific subjects like English, history, mathematics, science, and language, are used by some colleges as entrance criteria and are more appropriately classified as achievement tests than aptitude tests even though they are used to predict the future.

Tests designed to assess general learning ability have traditionally been called intelligence tests but are now often called learning ability tests, cognitive ability tests, scholastic aptitude tests, or school ability tests. The shift in terminology reflects the extensive controversy over the meaning of the term intelligence and the fact that its traditional use was associated with inherited capacity (Linn & Miller, 2005). The more current terms emphasize that the tests measure developed ability in learning, not innate capacity. The Cognitive Abilities Test assesses K-12 students’ abilities to reason with words, quantitative concepts, and nonverbal (spatial) pictures. The Woodcock Johnson III contains cognitive abilities tests as well as achievement tests for ages 2 to 90 years (www.riverpub.com).

References

Combined Curriculum Document Reading 4.1 (2006). Accessed November 19, 2006 from http://www.education.ky.gov/KDE/Instructional+Resources/Curriculum+Documents+and+Resources/Teaching+Tools/Combined+Curriculum+Documents/default.htm

Haertel, E. & Herman, J. (2005). A historical perspective on validity arguments for accountability testing. In J. L. Herman & E. H. Haertel (Eds.), Uses and misuses of data for educational accountability and improvement. 104th Yearbook of the National Society for the Study of Education. Malden, MA: Blackwell.

Human Resources Division (n.d.). Firefighter Commonwealth of Massachusetts Physical Abilities Test (PAT). Accessed November 19, 2006 from http://www.mass.gov/?pageID=hrdtopic&L=2&L0=Home&L1=Civil+Service&sid=Ehrd

Joshi, R. M. (2003). Misconceptions about the assessment and diagnosis of reading disability. Reading Psychology, 24, 247–266.

Linn, R. L., & Miller, M. D. (2005). Measurement and Assessment in Teaching (9th ed.). Upper Saddle River, NJ: Pearson.

New York State Education Department (2005). Home Instruction in New York State. Accessed November 19, 2006 from http://www.emsc.nysed.gov/nonpub/part10010.htm

Popham, W. J. (2004). America’s “failing” schools: How parents and teachers can cope with No Child Left Behind. New York: Routledge Falmer.

Popham, W. J. (2005). Classroom Assessment: What teachers need to know. Boston, MA: Pearson.

Wise, S. L. & DeMars, C. W. (2005). Low examinee effort in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10(1), 1–17.

Chapter 12: Standardized and Other Formal Assessments