Measuring Intelligence

History of Intelligence Testing

Intelligence testing has evolved over time as researchers continually seek the best method for measuring intelligence.

Learning Objectives

Trace the history of intelligence testing

Key Takeaways

Key Points

  • The Wechsler scales were the first intelligence scales to base scores on a standardized normal distribution.
  • The Stanford-Binet Intelligence Scale formed the basis for one of the modern intelligence tests that remains in common use.
  • Critics claim that environmental factors, such as quality of education and school systems, lead to cultural discrepancies in test scores.
  • Two supposedly culture-fair intelligence tests are Cattell’s Culture-Fair Intelligence Test and Raven’s Progressive Matrices. These tests focus on measuring “g,” or general intelligence, rather than specific skill sets.

Key Terms

  • intelligence quotient: A score derived from one of several different standardized tests attempting to measure intelligence.
  • g: Short for general intelligence; a construct developed in psychometric investigations of cognitive abilities that summarizes positive correlations among different cognitive tasks.
  • psychometrician: A person who designs, administers, and scores tests.

Our concept of intelligence has evolved over time, and intelligence tests have evolved along with it. Researchers continually seek ways to measure intelligence more accurately.

History of Intelligence Testing

The abbreviation “IQ” comes from the term intelligence quotient (from the German Intelligenz-Quotient), coined by the German psychologist William Stern in 1912. The first modern intelligence test, the Binet-Simon intelligence scale, had been published a few years earlier, in 1905, by Alfred Binet and Theodore Simon. Because it was easy to administer, the Binet-Simon scale was adopted for use in many other countries.

These practices eventually made their way to the United States, where psychologist Lewis Terman of Stanford University adapted them for American use. He created and published the first IQ test in the United States, the Stanford-Binet IQ test. In it, an individual’s intelligence level was expressed as a quotient (hence the term “intelligence quotient”) of their estimated mental age divided by their chronological age. A child’s “mental age” was the age of the group whose mean score matched the child’s score. So if a five-year-old child performed at the same level as an average eight-year-old, he or she would have a mental age of eight. The original formula for the quotient was (Mental Age / Chronological Age) × 100. Thus, a five-year-old child who performed at the same level as his or her five-year-old peers would score 100. The score of 100 became the average score, and it is still used as the average today.
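The quotient formula above can be written as a small function. This is only a minimal sketch: the function name `ratio_iq` is chosen here for illustration and does not come from any test's official materials.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as described above: (mental age / chronological age) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    # Multiply before dividing so simple inputs give exact results.
    return mental_age * 100 / chronological_age


# A five-year-old who performs like an average eight-year-old.
print(ratio_iq(8, 5))  # 160.0
# A five-year-old who performs like an average five-year-old.
print(ratio_iq(5, 5))  # 100.0
```

A child whose mental age lags behind his or her chronological age scores below 100 under the same formula.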

Wechsler Adult Intelligence Scale

In 1939, David Wechsler published the first intelligence test explicitly designed for an adult population, the Wechsler-Bellevue Intelligence Scale, which was later revised and renamed the Wechsler Adult Intelligence Scale, or WAIS. Wechsler also extended his scale for younger people, creating the Wechsler Intelligence Scale for Children, or WISC. The Wechsler scales contained separate subscores for verbal IQ and performance IQ, and were thus less dependent on overall verbal ability than early versions of the Stanford-Binet scale. The Wechsler scales were the first intelligence scales to base scores on a standardized bell curve (a type of graph in which there are an equal number of scores on either side of the average, where most scores are around the average and very few scores are far away from the average).

Modern IQ tests report a score based on a bell curve, with most people scoring near the average and correspondingly smaller numbers of people scoring at points higher or lower than the average. Approximately 95% of the population scores between 70 and 130 points. However, the relationship between IQ score and mental ability is not linear: a person with a score of 50 does not have half the mental ability of a person with a score of 100.


IQ Curve: The bell shaped curve for IQ scores has an average value of 100.

General Intelligence Factor

Charles Spearman pioneered the theory that a single general intelligence factor, which he called g, underlies disparate cognitive tasks. In the normal population, g and IQ have a correlation of roughly 0.9. This strong correlation means that if you know someone’s IQ score, you can predict their g with a high level of accuracy, and vice versa. As a result, the two terms are often used interchangeably.

Culture-Fair Tests

In order to develop an IQ test that separated environmental from genetic factors, Raymond B. Cattell created the Culture-Fair Intelligence Test. Cattell argued that general intelligence g exists and that it consists of two parts: fluid intelligence (the capacity to think logically and solve problems in novel situations) and crystallized intelligence (the ability to use skills, knowledge, and experience). He further argued that g should be free of cultural bias such as differences in language and education type. This idea, however, is still controversial.

Another supposedly culture-fair test is Raven’s Progressive Matrices, developed by John C. Raven in 1936. This test is a nonverbal group test typically used in educational settings, designed to measure the reasoning ability associated with g.

The Flynn Effect

Over the decades in which IQ tests have been in use, the average score on the tests rose in many parts of the world. This increase is now called the “Flynn effect,” named after James Flynn, who did much of the work to document and promote awareness of this phenomenon and its implications. Because of the Flynn effect, IQ tests are recalibrated periodically to keep the average score at 100; as a result, someone who scored 100 in 1950 would receive a lower score on today’s test.
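The effect of recalibration can be illustrated with a short sketch. It assumes the common convention of scaling scores to a mean of 100 and a standard deviation of 15. The function name `deviation_iq` and all raw-score values below are invented for illustration and do not come from any real test.

```python
def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Scale a raw score so that the norming group has mean 100 and SD 15."""
    return 100 + 15 * (raw_score - norm_mean) / norm_sd


# Hypothetical norms: the average raw score rises between two norming samples.
OLD_NORM_MEAN = 40.0  # invented mean raw score of an earlier norming sample
NEW_NORM_MEAN = 46.0  # invented mean raw score of a later norming sample
NORM_SD = 10.0        # invented spread, assumed unchanged between samples

raw = 40.0  # the same performance, scored against each set of norms
old_iq = deviation_iq(raw, OLD_NORM_MEAN, NORM_SD)
new_iq = deviation_iq(raw, NEW_NORM_MEAN, NORM_SD)
print(old_iq, new_iq)  # 100.0 91.0
```

In this toy example, the same raw performance that was exactly average under the earlier norms falls below average once the norms are updated.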

IQ Tests

IQ tests produce an intelligence quotient, a score that compares a person’s performance against an age-based average score.

Learning Objectives

Explain how IQ scores are measured on a normal curve

Key Takeaways

Key Points

  • IQ tests calculate a person’s intelligence quotient score, which is based on a relative scale, measured against an age-based average score.
  • Because IQ tests are often used to predict either positive or negative events in a person’s life span, they can be misinterpreted as proving that intelligence causes certain outcomes. It is more likely, however, that environmental factors contribute to both IQ scores and to outcomes in life.
  • Current IQ tests measure personal scores based on standard deviations from a well-established average, and are thought to be relatively stable over time.
  • IQ tests are psychometric and person-centric tests that are statistically reliable and valid, but do not necessarily represent the same type of intelligence across cultures.
  • IQ tests are often criticized for being biased, and for only measuring one aspect of intelligence.

Key Terms

  • reliability: A measure of whether the results of a test are consistent and repeatable.
  • validity: An assessment of whether a test measures what it claims to measure.
  • matrix: A rectangular arrangement of numbers or terms having various uses, such as transforming coordinates in geometry, solving systems of linear equations in linear algebra, and representing graphs.
  • bell curve: A set of data in which the majority of scores are clustered around the mean, and there are fewer scores the farther they are from the mean.
  • standard deviation: A statistical measure of how spread out scores are around the mean (average); the distance of a given score from the mean can be expressed in standard deviations.

IQ tests attempt to measure and provide an intelligence quotient, which is a score derived from a standardized test designed to assess human intelligence. There are now several variations of these tests that have built upon and expanded the original test, which was designed to identify children in need of remedial education. Currently, IQ tests are used to study distributions in scores among specific populations. Over time, these scores have come to be associated with differences in other variables such as behavior, performance, and well-being; these vary based on cultural norms.

Measuring IQ Scores

After decades of revision, modern IQ tests produce a mathematical score based on standard deviation, or difference from the average score. Scores on IQ tests tend to form a bell curve with a normal distribution. In a normal distribution, 50% of the scores will be below the average (or mean) score and 50% of the scores will be above it.


IQ Bell Curve: In a normally distributed bell curve, half the scores are above the mean and half are below. The farther from the mean, the less frequent a given score is.

Normal distributions are special because their data follow a specific, reliable pattern. The standard deviation measures how spread out scores are around the mean, so a score’s distance from the mean can be expressed in standard deviations; in any normal distribution, you can tell what percentage of a population will fall within a certain score range by looking at standard deviations. Under a normal curve, about 68% of scores lie between -1 and +1 standard deviation, about 95% of scores lie between -2 and +2 standard deviations, and more than 99% of scores fall between -3 and +3 standard deviations.
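The percentages quoted above can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is an illustrative choice, not established terminology.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
# within 1 SD of the mean: 68.3%
# within 2 SD of the mean: 95.4%
# within 3 SD of the mean: 99.7%
```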

The scores of an IQ test are normally distributed so that one standard deviation is equal to 15 points; that is to say, when you go one standard deviation above the mean of 100, you get a score of 115. When you go one standard deviation below the mean, you get a score of 85. Two standard deviations are 30 points above or below the mean, three are 45 points, and so on. So by current measurement standards, 68% of people score between 85 and 115, 95% of the population score between 70 and 130 points, and over 99% of the population score between 55 and 145.
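The conversion between standard deviations and IQ points can be sketched the same way. The example below assumes an idealized normal curve with the mean of 100 and standard deviation of 15 given above; the helper names are illustrative, and real test scores only approximate this curve.

```python
from statistics import NormalDist

# Idealized IQ scale: mean 100, standard deviation 15 (values from the text).
iq_scale = NormalDist(mu=100, sigma=15)


def score_at(sd_units: float) -> float:
    """IQ score lying `sd_units` standard deviations from the mean."""
    return iq_scale.mean + sd_units * iq_scale.stdev


def share_between(low: float, high: float) -> float:
    """Fraction of the idealized population scoring between `low` and `high`."""
    return iq_scale.cdf(high) - iq_scale.cdf(low)


print(score_at(1), score_at(-1))  # 115.0 85.0
print(score_at(2), score_at(-2))  # 130.0 70.0
print(f"{share_between(85, 115):.1%}")  # 68.3%
print(f"{share_between(70, 130):.1%}")  # 95.4%
```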

It should be noted that this standard of measure does not imply a linear relationship between IQ and mental ability: a person with a score of 50 does not have half the mental ability of a person with a score of 100.


Standard Deviations of IQ Scores: IQ test scores tend to form a bell curve, with approximately 95% of the population scoring within two standard deviations of the mean score of 100.

IQ tests are a type of psychometric (person-centric) testing thought to have very high statistical reliability. This means that while a person’s scores may vary slightly with age and environmental conditions, they are repeatable and will generally agree with one another over time. IQ tests are also thought to have high statistical validity, which means that they measure what they claim to measure: intelligence. As a result, many people trust them for use in other applications, such as clinical or educational settings.

Types of IQ Tests and Tasks

There is a wide variety of IQ tests that use slightly different tasks and measures to calculate an overall IQ score. The most commonly used test series is the Wechsler Adult Intelligence Scale (WAIS) and its counterpart, the Wechsler Intelligence Scale for Children (WISC). Other commonly used tests include the original and updated versions of the Stanford-Binet, the Woodcock-Johnson Tests of Cognitive Abilities, the Kaufman Assessment Battery for Children, the Cognitive Assessment System, and the Differential Ability Scales. While all of these tests measure intelligence, not all of them label their standard scores as IQ scores.


WAIS Test Components: The WAIS uses a variety of components to determine a person’s IQ score, including verbal, memory, perceptual, and processing skills.

Currently, most tests tend to measure both verbal and performance IQ. Verbal IQ is measured through both comprehension and working (short-term) memory skills, such as vocabulary and arithmetic. Performance IQ is measured through perception and processing skills, such as matrix completion and symbol coding. All of these measures and tasks are used to calculate a person’s IQ.


Sample IQ Test Question: This is a sample IQ test question modeled after a person’s ability to identify and continue patterns in progressive matrices.

Standardized Tests

Standardized tests are identical exams always administered in the same way so as to be able to compare outcomes across all test-takers.

Learning Objectives

Describe the strengths and limitations of standardized tests

Key Takeaways

Key Points

  • A standardized test is any exam that is always administered the same way and that is scored consistently according to a set of standards.
  • Types of standardized test include achievement tests, diagnostic tests, and aptitude tests.
  • Standardized tests evaluate performance either against a particular criterion or against the performance of others.
  • Standardized tests are often used to select students for programs or school admission.

Key Terms

  • standardized test: An exam that is always administered the same way and that is scored consistently according to a set of standards.

Standardized tests are assessments that are always administered in the same way so as to be able to compare scores across all test-takers. Students respond to the same questions, receive the same directions, and have the same time limits, and the tests are scored according to explicit, standard criteria. Standardized tests are usually created by a team of test experts from a commercial testing company in consultation with classroom teachers and university faculty.

Standardized tests are designed to be taken by many students within a state, province, or nation (and sometimes across nations). Standardized tests are perceived as being “fairer” than non-standardized tests and more conducive to comparison of outcomes across all test-takers. That said, several widely used standardized tests have also come under heavy criticism for potentially failing to evaluate the skills they claim to test.

Types of standardized tests include:

  • Achievement tests, which are designed to assess what students have learned in a specific content area or at a specific grade level.
  • Diagnostic tests, which are used to profile skills and abilities, strengths and weaknesses.
  • Aptitude tests, which, like achievement tests, measure what students have learned; however, rather than focusing on specific subject matter learned in school, the test items focus on verbal, quantitative, and problem-solving abilities that are learned in school or in the general culture. According to test developers, both the ACT and SAT assess general educational development, reasoning, analysis, and problem solving, and also predict success in college.

Scoring Standardized Tests

Standardized test scores are evaluated in two ways: relative to a specific scale or criterion (“criterion-referenced”) or relative to the rest of the test-takers (“norm-referenced”). Some recent standardized tests incorporate both criterion-referenced and norm-referenced elements into the same test.
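The difference between the two approaches can be made concrete with a short sketch. The scores, the cutoff, and the function names below are all invented for illustration; real tests define their own criteria and norming procedures.

```python
def criterion_referenced(score: float, cutoff: float) -> bool:
    """True if the score meets a fixed standard, regardless of other test-takers."""
    return score >= cutoff


def percentile_rank(score: float, all_scores: list[float]) -> float:
    """Percentage of test-takers who scored strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Hypothetical results for ten test-takers and a hypothetical passing standard.
scores = [52, 61, 67, 70, 74, 78, 83, 88, 91, 95]
CUTOFF = 70

print(criterion_referenced(74, CUTOFF))  # True
print(percentile_rank(74, scores))       # 40.0
```

The same score of 74 passes when judged against the fixed criterion, yet it sits below the middle of this particular group when judged against the other test-takers.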


Scantron scoring: Many standardized tests are capable of testing students on only multiple-choice questions because they are scored by machine.

Standardized Tests and Education

Standardized tests are often used to select students for specific programs. For example, the SAT (originally the Scholastic Aptitude Test) and ACT (American College Testing) are norm-referenced tests used to help admissions officers decide whether to admit students to their college or university. Norm-referenced standardized tests are also one of the factors in deciding whether students are eligible for special-education or gifted-and-talented programs. Criterion-referenced tests are often used to determine which students are eligible for promotion to the next grade or graduation from high school.

Standardized Tests and Intelligence

Some standardized tests are designed specifically to assess human intelligence. For example, the commonly used Stanford-Binet IQ test, the Wechsler Adult Intelligence Scale (WAIS), and the Wechsler Intelligence Scale for Children (WISC) are all standardized tests designed to test intelligence. However, these tests differ in how they define intelligence and what they claim to measure. The Stanford-Binet test aims to measure g-factor, or “general intelligence.” David Wechsler, the creator of the Wechsler intelligence scales, thought intelligence measurements needed to address more than just one factor and also that they needed to take into account “non-intellective factors” such as fear of failure or lack of confidence.

It is important to understand what a given standardized test is designed to measure (as well as what it actually measures, which may or may not be the same). For example, many people mistakenly believe that the SAT is a test designed to measure intelligence. However, while SAT scores and g-factor are related, the SAT is in fact designed to measure literacy, writing, and problem-solving skills needed to succeed in college and is not necessarily a reflection of intelligence.

Controversies in Intelligence and Standardized Testing

Intelligence tests and standardized tests face criticism for their uses and applications in society.

Learning Objectives

Discuss the major controversies surrounding intelligence testing

Key Takeaways

Key Points

  • Intelligence tests and standardized tests are widely used throughout many different fields (psychology, education, business, etc.) because of their ability to assess and predict performance, but their use is controversial.
  • One criticism lies in the use of intelligence and standardized tests as predictive measures for social outcomes; simply because test scores and outcomes are correlated does not mean one causes or predicts the other.
  • Another criticism occurs when scores of standardized tests are misused as measures of intelligence.
  • Standardized tests cannot adequately account for gender and culture differences, and critics argue that test outcomes are influenced by a number of unacknowledged factors, including genetics, environment, and culture.

Key Terms

  • psychometrics: The field of study concerned with the theory and technique of psychological measurement.
  • correlation: One of the several measures of the linear statistical relationship between two random variables, indicating the strength of the relationship but not necessarily the causation.
  • aptitude: The natural ability to acquire knowledge or skill.

Intelligence tests and standardized tests are widely used throughout many different fields (psychology, education, business, etc.) because of their ability to assess and predict performance. However, their uses and applications in society are often criticized.

The Issue of Validity

Intelligence tests (such as IQ tests) have always been controversial; critics cast doubt on their validity, claiming that they measure factors other than intelligence. Some argue that environmental factors, such as quality of education and school systems, can cause discrepancies in test scores that are not based on intelligence. Others argue that an individual’s test-taking skills are being evaluated rather than their intelligence.

The field of psychometrics is devoted to the objective measurement of psychological phenomena, such as intelligence. Psychometricians have sought to make intelligence tests more culture-fair and valid over the years, and to make sure that they measure g, or the “general intelligence factor” thought to underlie all intelligence.

Prediction of Social Outcomes

Another criticism lies in the use of intelligence and standardized tests as predictive measures for social outcomes. Researchers have learned that IQ and general intelligence (g) correlate with some social outcomes, such as lower IQs being linked to incarceration and higher IQs being linked to job success and wealth. However, it is important to note that correlational studies only show a relationship between two factors: they give no indication about causation. As a result, critics of intelligence testing argue that intelligence cannot be used to predict such outcomes, and that environmental factors are more likely to contribute to both IQ test results and later outcomes in life.
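The critics' point about a shared environmental factor can be illustrated with a toy simulation. Everything in the sketch below is invented for illustration: a hidden "environment" value feeds into both a test score and a life outcome, and neither of those two influences the other. It is not a model of real data.

```python
import random


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden factor shared by both measures.
environment = [rng.gauss(0, 1) for _ in range(5000)]
# Each measure is the shared factor plus its own independent noise.
test_score = [e + rng.gauss(0, 1) for e in environment]
life_outcome = [e + rng.gauss(0, 1) for e in environment]

r = pearson_r(test_score, life_outcome)
print(f"correlation between test score and outcome: {r:.2f}")
```

The two simulated measures come out clearly correlated (in theory, about 0.5 for this setup) even though, by construction, the test score has no effect on the outcome.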

The controversy surrounding using intelligence and standardized tests as predictive measures for social outcomes is, at its core, an ethical one. Consider the implications if employers decided to use intelligence tests as a way to screen prospective employees in order to predict which individuals will be successful in a job. This misapplication of intelligence testing is considered unethical, because it provides a measure for discriminating against fully qualified individuals. Again, even if intelligence scores correlate with job success, this does not mean that people with high intelligence will always be successful at work.

Standardized Test Scores and Intelligence

Another criticism points out that standardized tests that actually measure specific skills are misinterpreted as measures of intelligence. Researchers examined the correlation between the SAT exam and two other tests of intelligence and found a strong relationship between the results. They concluded that the SAT is primarily a test of g or general intelligence. However, correlational studies provide information about a relationship, not about causation. Using a standardized test like the SAT, which is designed to measure scholastic aptitude, as a measure of intelligence is outside the scope of the test’s intended usage, even if the two do correlate.

Critics of standardized tests also point to problems associated with using the SAT and ACT exams to predict college success. According to recent research, the SAT and ACT have been found to be poor predictors of college success. Standardized tests don’t measure factors like motivational issues or study skills, which are also important for success in school. Predicting college success is most reliable when a combination of factors is considered, rather than a single standardized test score.


Student taking test: Students take the SAT and/or ACT exam in order to gain admittance to college.


A similar controversy concerns whether intelligence tests are biased such that certain groups have an advantage over other groups. Questions of bias resemble the questions about whether intelligence tests should be used to predict social outcomes. For example, the relationship between wealth and IQ is well-documented. Could this mean that IQ tests are biased toward wealthy individuals? Or does the relationship go the other way? If there are statistically significant group differences in IQ, whether based on race, gender, socioeconomic status, age, or any other division, it is important to examine the intelligence test in question to make sure that no differences in testing method give one group an advantage over others along any dimension other than intelligence.

Additionally, IQ cannot be said to describe or measure all possible cultural representations of intelligence. Various cultures value different types of mental abilities based on their cultural history, and the IQ test is a highly westernized construct. As such, IQ tests are also criticized for assessing only those particular areas emphasized in the western conceptualization of intelligence, such as problem-solving, and failing to account for other areas such as creativity or emotional intelligence.

IQ tests are often criticized for being culturally biased. A 2005 study stated that IQ tests may contain cultural influences that reduce their validity as a measure of cognitive ability for Mexican-American students, indicating a weaker positive correlation relative to sampled white American students. Other recent studies have questioned the culture-fairness of IQ tests when used in South Africa. Standard intelligence tests, such as the Stanford-Binet, are often inappropriate for children with autism, and may have resulted in incorrect claims that a majority of children with autism are intellectually disabled.