7. Observational Strategies

          Definition:  Assessment methods involving the direct observation of behavior.

          Description.  Behavioral assessment is one of the predominant approaches to observation in contemporary clinical settings.  The processes, assumptions, and procedures of behavioral assessment differ from those of traditional measurement.  Hartmann et al. (2004) emphasized that behavioral assessment is direct, repeated, and idiographic.  Assessment is direct in that the assessor measures observable behavior; any observed behavior is considered a sample of potential behavior, rather than a sign of an underlying, latent trait.  Behavior is measured repeatedly to demonstrate relative stability before intervention and change after intervention, supporting the inference that the intervention caused the behavioral change.  Assessment may consist of continuous recording of behavior (when only a few behaviors occur) or some type of strategic sampling (when many behaviors must be tracked).
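
To make the distinction between continuous recording and strategic sampling concrete, here is a minimal Python sketch; the event times, session length, and interval size are hypothetical and purely illustrative.

def continuous_frequency(event_times_s):
    # Continuous recording: count every occurrence of the target behavior.
    return len(event_times_s)

def partial_interval_proportion(event_times_s, session_length_s, interval_s=10.0):
    # Strategic (partial-interval) sampling: score an interval as positive if the
    # behavior occurred at any point within it, then report the proportion of
    # positive intervals.
    n_intervals = int(session_length_s // interval_s)
    scored = [
        any(i * interval_s <= t < (i + 1) * interval_s for t in event_times_s)
        for i in range(n_intervals)
    ]
    return sum(scored) / n_intervals

# Example: a 60-second observation with the behavior noted at four time points.
events = [3.2, 14.8, 15.5, 52.0]
print(continuous_frequency(events))                 # 4 occurrences
print(partial_interval_proportion(events, 60.0))    # 0.5 (behavior in 3 of 6 intervals)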

Behavioral psychologists assess idiographic variables, that is, variables unique to the individual in question.  Cone (1988) argued that nomothetic, trait-based measurements produce data remote from single cases and suggested that idiographic instruments would be more sensitive to individual behavior change.  In this context, idiographic measures are criterion-referenced (i.e., scores are compared to some absolute standard of behavior), while nomothetic measures are norm-referenced (i.e., scores are compared among individuals).  Norm-referenced tests are constructed to maximize variability among individuals (Swezey, 1981).  As a result, items that measure behaviors infrequently performed by the population are unlikely to be included in norm-referenced tests.  Jackson (1970), for example, suggested that items checked by fewer than 20% of the test development sample be dropped because they contribute little to total score variability.  Yet those infrequent, idiographically relevant items may be the very ones of interest to therapists and theorists.
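
The contrast between norm-referenced item screening and criterion-referenced scoring can be sketched in a few lines of Python; the item names, endorsement rates, and treatment goal below are invented solely for illustration.

# Proportion of a hypothetical test-development sample endorsing each item.
endorsement = {
    "argues with peers": 0.46,
    "completes homework late": 0.31,
    "talks to self aloud": 0.08,    # infrequent but clinically relevant
    "hits another person": 0.03,    # infrequent but clinically relevant
}

# Norm-referenced construction (Jackson's rule): drop items endorsed by fewer
# than 20% of the sample, since they add little between-person variability.
retained = {item for item, p in endorsement.items() if p >= 0.20}
print(sorted(retained))   # the two infrequent, idiographically relevant items are lost

# Criterion-referenced scoring: compare one client's observed behavior to an
# absolute standard (here, a treatment goal), not to other people.
observed_hits_per_day, goal_hits_per_day = 2, 0
print("criterion met" if observed_hits_per_day <= goal_hits_per_day else "criterion not met")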


Although developed for use in inpatient mental health settings, the approach described by Paul and colleagues generalizes to a wide range of settings.  Paul, Mariotto, and Redfield (1986) suggested that the units of observation be established before the observation period so that observers can focus on the important elements.  Such units should be discrete samples of behavior, as opposed to global signs, because ratings that require greater interpretation are more likely to reflect characteristics of the observer rather than of the person observed.  In a clinical setting, examples of discrete (and inappropriate) behaviors include talking to oneself and hitting another person.  Error arising from such factors as rater carelessness or fatigue is minimized when measurement data are aggregated across multiple occasions.  Paul et al. (1986) concluded that the accuracy and relevance of observations can be maximized by using multiple, discrete, scheduled observations made by trained observers as soon as possible after a behavioral event.
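
A short, hypothetical Python sketch of the aggregation point: the occasion counts are invented, and the single noisy occasion stands in for rater carelessness or fatigue.

from statistics import mean

# Counts of a discrete target behavior recorded on scheduled observation occasions.
occasion_counts = [3, 2, 4, 9, 3, 2]    # the 9 reflects one careless or fatigued rating

single_occasion_estimate = occasion_counts[3]    # any single occasion can mislead
aggregated_estimate = mean(occasion_counts)      # aggregation dampens occasion-specific error

print(single_occasion_estimate)          # 9
print(round(aggregated_estimate, 1))     # 3.8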

Paul et al. (1986) noted two important sources of error that should be monitored when using observers:  (a) decay, random changes in the observer’s reliability or consistency of observation, and (b) drift, systematic changes in the observer’s definition or interpretation of the coding categories.  A rater evidencing decay might pay close attention at the start of an observation period but tire over the course of several hours.  A drifting rater might gradually depart from the initial rules for what constitutes “shouting,” for example, and begin to count in that category any time a person simply raises his or her voice.  Paul et al. (1986) maintained that such errors can be minimized by obtaining converging data from different assessment procedures, conditions, and operations.  Such observer biases have been linked to fatigue, knowledge of the study’s hypotheses, and observers’ expectancies (Hoshmand, 1994).
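
One common way to monitor decay and drift is to check each observer’s session-by-session agreement against a criterion (calibration) coder; the sketch below uses invented interval codes and simple percent agreement, not any specific procedure described by Paul et al.

def percent_agreement(observer_codes, criterion_codes):
    # Proportion of intervals coded identically by the observer and the criterion coder.
    matches = sum(o == c for o, c in zip(observer_codes, criterion_codes))
    return matches / len(criterion_codes)

# Interval-by-interval codes for one session ("S" = shouting, "-" = no shouting).
criterion = ["-", "S", "-", "-", "S", "-", "-", "-", "S", "-"]
observer  = ["-", "S", "S", "-", "S", "-", "S", "-", "S", "-"]

print(round(percent_agreement(observer, criterion), 2))   # 0.8

# Tracked across sessions, decay would appear as agreement that becomes erratic or
# falls late in a long shift; drift would appear as a systematic pattern, such as the
# observer increasingly coding ordinary loud speech as "S" relative to the criterion.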

