Practical Implications of Data Reliability and Treatment Integrity Monitoring

This document discusses the importance of data reliability and treatment integrity monitoring in clinical practice. It provides reasons why these measures are paramount, as inaccurate or missing data could lead to ineffective interventions being continued or more intensive interventions being pursued unnecessarily. Examples from medical contexts help illustrate how failures to properly monitor patient progress could wrongly suggest that treatments are or are not working. The document concludes by advocating for the routine collection of reliability and integrity data in clinical work to support appropriate decision making.
Practical Implications of Data Reliability and Treatment Integrity Monitoring

Timothy R. Vollmer, Ph.D., BCBA and Kimberly N. Sloman, Ph.D., BCBA, University of Florida
Claire St. Peter Pipkin, Ph.D., BCBA, West Virginia University

ABSTRACT
Data reliability and treatment integrity have important implications for clinical practice because
they can affect clinicians’ abilities to accurately judge the efficacy of behavioral interventions.
Reliability and integrity data also allow clinicians to provide feedback to caregivers and to adjust
interventions as needed. We present reasons why reliability and integrity measures are paramount
in clinical work, discuss events that may result in decreased reliability or integrity, and provide
several efficient means for collecting data and calculating reliability and integrity measures.
Descriptors: Data analysis, integrity, reliability

It is standard practice to record data reliability (i.e., interobserver agreement) when conducting applied behavioral experiments (Hartmann, 1977). It is not standard practice to record treatment integrity in applied behavioral experiments, but there have been strong calls to do so, along with some recent evidence that the practice is increasing in frequency (McIntyre, Gresham, DiGennaro, & Reid, 2007). However, there has been little discussion about the importance of these measures in everyday practice of behavior analysis. The purpose of this paper is to provide a brief background on types of reliability and integrity measures, a rationale for the use of these measures in clinical settings, and some possible methods to collect reliability and integrity data.

Various kinds of reliability measures can be taken, but in this paper we are referring specifically to the extent to which two observers agree on the occurrence or nonoccurrence of events. For example, if person A records an instance of aggression between 2:30 p.m. and 2:35 p.m., does person B also record an instance of aggression during that time frame? Do the observers agree that the episode did or did not occur? By treatment integrity, we mean the extent to which behavioral procedures are conducted according to a behavior change plan (Gresham, Gansle, Noell, Cohen, & Rosenblum, 1993). For example, if the behavior plan states that a reinforcer should be delivered after some specified instance of vocal communication, is the reinforcer actually delivered?

In the course of our behavior analytic practice, we frequently evaluate the implementation of behavioral procedures in service settings, schools, homes, and other settings. Often, when we begin to detail the process of collecting reliability data and treatment integrity data, we hear something akin to the following complaint: “We are not conducting research here. I know you are researchers and for you that kind of thing is important, but we are running a treatment center, not conducting an experiment.” Such comments have come from a range of people including teachers, behavior analysts, sophisticated parents, and others. In other words, many skilled people are conducting practice without data reliability and treatment integrity monitoring. We view this as a potentially dangerous practice.

A failure to collect data reliability and treatment integrity measures is potentially dangerous because life-changing decisions are made based on the assumption that the data reported are reasonably accurate and based on the assumption that the prescribed procedures were conducted as specified. Some life-changing decisions that arise from these assumptions include residential placement, the use of restrictive behavioral procedures, changes or lack thereof in psychotropic medication, use of restrictive or labor intensive staffing, and so on. It seems clear that few would question the appropriateness of data reliability and treatment integrity if the problem was medical rather than behavioral. Consider two medical analogies:

• Patient A has severe seizures and is therefore prescribed medication Z as treatment. Patient A’s parents are asked to record all instances of seizures before the introduction and after the introduction of medication Z. Suppose the parents are reasonably diligent and accurately recording seizures prior to the medication, but slack off a bit and forget to record many of the seizures after the introduction of medication Z. At the next medical appointment, based on the parents’ data, Patient A’s medical team concludes that medication Z was effective and the patient shall remain on the medication. In truth, data reliability checks would have shown that the recording of seizures had slacked off and there was no real change in frequency. Medication Z was ineffective but the data suggested otherwise. Patient A receives an ineffective medication and a potentially effective medication (say, medication Y) is left on the shelf.

• Patient B has severe seizures and is therefore prescribed medication X as treatment. Patient B’s nurse is asked to administer the medication X twice

PRACTICAL IMPLICATIONS OF DATA RELIABILITY. Behavior Analysis in Practice, 1(2), 4-11.



daily. Suppose the nurse frequently forgets to give the medication but her data records of seizure episode frequency are reasonably accurate (showing no change, because the medication is given no chance to work). At the next medical appointment, Patient B’s medical team concludes that Medication X was ineffective and is moved to prescribe Medication W as an alternative, and Medication W is known to have serious side effects. In truth, the medication may have been effective if administered as prescribed and now Patient B is receiving a more dangerous medication.

Such examples are relatively straightforward because it is not difficult to understand the need for (a) accurately monitoring a medical condition that is being treated via medication, and (b) accurately administering medication. It is simple enough to insert behavior and behavioral procedures into parallel examples, as follows:

• Person C displays severe self-injury and therefore receives a thorough behavioral assessment by a qualified team. A procedure based on differential reinforcement is prescribed as a result of the assessment outcome. Person C’s parents are given instructions to conduct the procedure and to record data on self-injury prior to and following the implementation of treatment. The parents are diligent and reasonably accurate with data collection prior to treatment, but they slack off a bit following the initiation of treatment and forget to record many instances of self-injury. At the next interdisciplinary professional team meeting, the team concludes that the behavioral treatment was effective based on the parents’ data, and no changes are made. In truth, data reliability checks would have shown that the behavioral treatment was ineffective. The differential reinforcement procedure was not working and other potentially effective behavioral procedures (such as other variations on differential reinforcement) are left on the “shelf.”

• Person D displays severe self-injury and therefore receives a thorough behavioral assessment by a qualified team. A procedure based on differential reinforcement is prescribed as a result of the assessment outcome. Person D’s teacher is asked to implement the procedure but frequently forgets to do so, and in the process reinforces self-injury and places alternative behavior on extinction. At the next interdisciplinary professional team meeting, the team concludes that the behavioral treatment was ineffective and they prescribe a potentially dangerous psychotropic medication, contingent physical restraint, and extra staffing. In truth, treatment integrity checks would have shown that the procedure was not implemented correctly and it may well have been effective if conducted with good integrity. The person now receives dangerous, intrusive, and costly interventions.

In these examples, data reliability errors resulted in “false positive” treatment outcomes (falsely showing a good treatment effect) and treatment integrity errors resulted in “false negative” outcomes (falsely showing no treatment effect). These examples were meant to highlight some of the implications of data reliability and treatment integrity monitoring. Our general thesis is that measuring reliability and integrity is inherently important. In addition, there are several advantages to such an approach that may have practical utility on a day-to-day basis. Next, we will present some practical usages of data reliability and treatment integrity monitoring.

Practical Usage

One practical usage of data reliability and treatment integrity monitoring is to provide immediate feedback to the data collector and implementer of the procedure. The feedback should take two forms: (a) positive feedback for correct data recording and/or procedural implementation, and (b) corrective feedback for incorrect data recording and/or procedural implementation (DiGennaro, Martens, & Kleinmann, 2007; Sulzer-Azaroff & Mayer, 1991). Of course the incorrect data or procedural implementation may not be the “fault” of the data collector or treatment implementer, such as when there are poor behavioral definitions. In such cases, the data collector/procedure implementer should not receive positive or corrective feedback but should be invited to help review definitions and other sources of error. When feedback is given, we recommend that any opportunity for positive feedback should be seized upon. For example, the person monitoring the data should avoid statements such as “Well, that was a waste of time, the behavior did not even occur so we could not compare our data.” Rather, if both observers did not record an instance of behavior, the monitor can say, “Great, we both recorded that the behavior did not occur. That goes down as an agreement and we were successful today.” Similarly, there is almost always some client/student appropriate behavior to reinforce, so the presence or absence of the targeted problem behavior should not be the only recorded behavior for which the monitor provides feedback. We have found that if the corrective feedback occurs with great frequency in relation to the positive feedback, the monitor can become a conditioned aversive stimulus. That is, data collectors and treatment implementers may begin to escape or avoid monitoring sessions. On the other hand, when the monitor frequently points out correct data recording and procedural implementation, the sessions should be favorable for the primary data collector/procedure implementer. It may be important to schedule observations during periods when the behavior is most likely to occur, in order to provide more opportunities for comparison and feedback. For example, if the target behavior is maintained by escape from instructions, the observation should be scheduled for instructional sessions. Strategic scheduling of observations may be especially important for low rate behavior in order to increase observation opportunities.

A second practical usage is to provide delayed and cumulative performance feedback to data collectors/procedure implementers (Noell et al., 2000). This

function is similar to the immediate feedback discussed above, but relies on the added feature of long-term performance trends. With the same caveats as discussed above (such as other reasons for poor reliability and integrity including poorly phrased definitions), delayed feedback could take two general forms: (a) positive feedback in the form of recognition, promotion, and praise, or (b) corrective feedback in the form of additional training or further detailing of procedures or supervisor meetings (Noell et al.). Some excellent uses of delayed positive feedback for cumulative performance include public recognition at a staff or parent meeting (e.g., “Mrs. Smith has been providing care for a child with very dangerous behavior; I am happy to report that her data reliability scores have exceeded 90% for the past three months and her treatment implementation scores were 100% for last month!”); public recognition via awards; acknowledgement on a website or in a newsletter or newspaper; and so on, including private recognition in a written or oral performance evaluation.

A third practical usage relates to clinical decision-making. Changes in behavioral procedures should be informed by reliability and treatment integrity data, as exemplified by the hypothetical cases presented earlier. For example, if there is an increase in problem behavior rates simultaneous with improved data reliability scores, it is possible that data collectors are simply getting better at data collection and, therefore, increased problem behavior rates may not present a need for changed procedures. In the case of treatment integrity, it is possible that poor treatment effects are not due to a poor treatment per se, but due to a treatment that is not being implemented sufficiently. Figure 1 shows a hypothetical example of using integrity measures to determine a need for booster training (for an actual example, see Vollmer, Marcus, & LeBlanc, 1994). Thus, a behavior analyst should be equipped with both reliability and treatment integrity data whenever critical clinical decisions are being made. If reliability and integrity measures are solid, then good clinical decisions can be made based on a proper evaluation of treatment effects or lack thereof.

Figure 1. Hypothetical data showing the interaction between the percentage of correct steps completed (treatment integrity, shown in the filled circles) and child problem behavior (shown in the open circles). Child problem behavior increases as treatment integrity decreases; a booster training (shown by the arrow) results in increased treatment integrity and regained treatment effects.

Some Caveats about Reliability and Procedural Integrity

It is important to note that a high reliability score does not necessarily equate to high accuracy. Clearly, two observers could be wrong about the same thing (Hawkins & Dotson, 1975). Also, because some reliability measures tend to be either more conservative or more liberal than others, there is no “magic” score that would indicate good reliability. Because of these caveats related to data reliability, we recommend using the measures to indicate when something is clearly wrong. In other words, one should not necessarily be comforted by a high percentage of agreement, but one should certainly be concerned by a low percentage of agreement.

An important caveat about treatment integrity is that different procedures require different levels of correct implementation. For example, an occasional error on an extinction procedure equates to an intermittent schedule of reinforcement. Suppose a parent correctly implements extinction during 95% of episodes of child night time disruptive behavior. This means that the behavior is reinforced on a variable ratio (VR) 20 schedule, which could maintain the problem behavior. Thus, an integrity score that looks and sounds “high” may be very bad, depending on the procedure. Alternatively, some procedures may not require high levels of integrity to be successful. For example, an occasional error on a differential reinforcement of alternative (DRA) behavior schedule might not be damaging if the alternative (desirable) behavior receives more reinforcement than the problem behavior. Suppose a parent reinforces tantrums on a VR 4 schedule (75% integrity if the prescribed intervention is no reinforcement following tantrums) but reinforces appropriate requests for attention on a VR 2 schedule (50% integrity if the prescribed intervention is reinforcement following all appropriate requests for attention). Because the schedule is much richer for appropriate behavior, we can predict based on decades of research on choice behavior that the child would

allocate almost all behavior in the direction of appropriate behavior. Thus, what might look and sound like “low” integrity may be very good, depending on the procedure. Our examples using extinction and differential reinforcement are intended to be illustrative rather than comprehensive, as of course behavioral procedures have many complexities that can be relatively sensitive or insensitive to integrity problems (such as different prompting methods, and so on).

A rule of thumb might be to conclude that data reliability and treatment integrity scores should be considered carefully in the context from which these data are collected. How conservative or liberal is the reliability measure? How important is it to record every instance of behavior? What treatment procedure is being used? What is the likely effect of a treatment integrity error given the procedure used?

With one or two exceptions, we have written so far on the assumption that a reliability or integrity error is committed by the primary observer/implementer. This may be true, but it may not be the “fault” of the observer/implementer per se. In the sections that follow we will discuss some common types of errors and then some common reasons for (or sources of) those errors.

Common Reliability and Integrity Errors

There are several possible errors that may contribute to low reliability or integrity scores. The two most basic reliability and integrity errors may be described as errors of omission and commission. Errors of omission occur when observers or personnel implementing behavioral programs do not provide the appropriate response when a specific event occurs. For data reliability, errors of omission may include failing to document a response or environmental event. For treatment integrity, errors of omission may include failing to deliver a reinforcer for an appropriate alternative response in a DRA procedure.

Errors of commission occur when observers or personnel implementing behavioral programs provide a response at an inappropriate time. For data reliability, errors of commission may include recording an event when it did not occur, or recording one event in place of a different event. For example, an observer may record that a child engaged in self-injury when instead he engaged in aggression. For treatment integrity, errors of commission may include delivering some antecedent or consequence at an inappropriate time. For example, a therapist may accidentally deliver a reinforcer after problem behavior in a DRA treatment session.

Some reliability and integrity errors may be subtler than those described above. For example, two observers may record the same response but at slightly different times. To illustrate, suppose two observers are recording instances of self-injury and reliability is assessed on a minute-by-minute basis. If observer A records an instance of self-injury at the end of minute 5 and observer B records an instance of self-injury at the beginning of minute 6, there would be a lack of agreement within those respective intervals. If this discrepancy occurs frequently throughout the data collection, these errors could result in both poor reliability scores and dissimilar data outcomes. Similarly, integrity errors can be said to occur any time there are discrepancies between the prescribed protocol and the actual implementation of events (Peterson et al., 1982). That is, integrity errors may include inappropriate reinforcer delivery as well as slight changes in the protocol. For example, the errors may include delivery of reinforcers after a delay and the presentation of social cues such as nods or smiles from the therapist.

Sources of Reliability and Integrity Errors

Several possible causes for reliability and integrity errors have been outlined in the literature (e.g., Allen & Warzak, 2000; Kazdin, 1977; Peterson et al.). One main factor influencing these errors may simply be inadequate or incomplete training of the protocols. More specifically, observers may not know how to fill out the data collection forms or use data collection devices, and may also commit errors because they are not aware of the correct definitions of behavior and environmental events. Likewise, persons implementing behavioral programs may not have sufficient information to conduct the protocol.

A second factor influencing integrity and reliability is the complexity of the protocol. For example, if a protocol requires an observer to collect data on numerous responses and environmental events, he or she may be more likely to commit reliability errors. Similarly, integrity errors may be more likely in a case where the person implementing the program has to complete several different steps (e.g., a detailed prompting sequence) across a variety of responses (e.g., both appropriate and inappropriate behavior) or with numerous clients or students. Thus, it is important that both observers and therapists are given clear, detailed, and manageable instructions on the protocol and behavioral definitions. In addition, individuals should be provided with ample time to practice performing the required tasks, and should be provided immediate positive and corrective feedback about their performance during training procedures (i.e., competency based training).

A third factor is the failure to generalize from the training setting. Namely, individuals may be able to perform the skills (data collection or treatment implementation) accurately in the training sessions, but fail to do so in the actual environment. Generalization of the skills may be facilitated by training several different exemplars (e.g., instances of the behavior) and conducting training in several different environments (Stokes & Baer, 1977).

A fourth possible factor influencing reliability and integrity errors has been referred to as a “drift” in performance (e.g., Kazdin, 1977). That is, individuals initially perform the skills as prescribed but then drift or alter their behavior from the original protocol. Careful monitoring of observers and those

individuals implementing programs combined with periodic booster training sessions may help to prevent drift from occurring.

A fifth possible factor influencing reliability and integrity errors may be competing environmental contingencies. More specifically, there may be reinforcers for departures from the protocol, punishers in place for adherence with the protocol, or both. For example, a study by O’Leary, Kent, and Kanowitz (1975) showed that observers who had received specific information about the session (e.g., behavior should decrease in the treatment phase) and feedback (e.g., praise for scoring low rates of behavior and reprimands for scoring higher rates of behavior) were biased in their data collection. Likewise, inaccurate reports of low rates of problem behavior by caregivers may be accidentally reinforced by praise and encouragement from a behavior analyst, especially if the behavior analyst is not present when the data collection is taking place. Conversely, reports might be more accurate only when the caregiver is aware that a behavior analyst was currently collecting reliability data (e.g., Brackett, Reid, & Green, 2007). Thus, it would be important in these circumstances to emphasize and praise the accuracy of data collection and refrain from mentioning specific changes in behavior.

Integrity errors may also occur due to competing schedules of reinforcement. For example, a behavior analyst may recommend that parental attention be delivered for appropriate behavior, and not for tantrums. However, the parent may be in a setting (e.g., a grocery store) in which adherence to the program is not reinforced and may actually be punished (e.g., other shoppers giving dirty looks). Therefore, the parent’s delivery of attention for tantrums is negatively reinforced and future integrity errors become more likely. Emphasizing accuracy, providing consistent feedback about integrity level, and providing reinforcement for high levels of integrity may be necessary to maintain high levels of integrity (DiGennaro, Martens, & Kleinmann, 2007).

Some Suggestions for Reliability Measures

As mentioned earlier, different methods of calculating reliability may yield more conservative or liberal estimates. In addition, measures vary in their ease of calculation. Thus, practitioners may choose measures based on whether stringent criteria or ease of calculation are desirable, as well as on the type of data that are available.

Reliability measures vary in at least two ways: the size of the time window and the type of data. Larger time windows may make calculations easier than smaller ones. One of the simplest ways of calculating reliability is to count the total number of responses scored (or the total number of intervals containing responses, depending on the data collection system) by each observer throughout the observation period, to divide the smaller number by the larger number, and to then multiply by 100. This yields an overall percentage of agreement for that observation. Whole-session measures are simple to understand and to calculate, but they provide only a liberal estimate of the reliability of the data collection. For an extreme example, one observer could score 10 instances of the target response, then become distracted or fall asleep. The second observer may miss those initial 10 responses, but later record 10 other responses (while the first observer sleeps). A whole-session reliability measure for these two data sets would be 100% because both observers scored 10 responses, but those responses would have occurred at entirely different times.

Using shorter intervals within a longer observation period makes reliability calculations more stringent and improves the confidence that both observers were recording the same instance of behavior. The use of shorter, within-session intervals is sometimes called the proportional method. To calculate proportional agreement, the total observation time is broken into discrete units (intervals). For instance, a 10-min observation might be broken into 60 10-s intervals. The records for the two observers are then compared within each 10-s interval. For example, if one observer recorded two instances of behavior in the first 10-s interval and a second observer recorded three instances of behavior in the first interval, the reliability for that interval would be 66.7% (two instances divided by three instances and multiplied by 100). Once reliability has been calculated for all intervals in the observation, the scores are averaged to obtain the mean reliability for the entire observation. Although 10-s intervals are common in research, larger intervals such as 1-min or 5-min may be more practical in everyday application.

Proportional reliability has several possible advantages over whole-session reliability. First, proportional measures are more stringent than whole-session measures. By breaking the session into smaller units, interval-by-interval calculations reduce the likelihood of obtaining good reliability when two observers record entirely different responses (as in the example given for whole-session reliability above).

Another method is the exact agreement method, for which the observational intervals are scored as an “agreement” if both observers counted exactly the same number of behavior instances. If they do not agree exactly, the interval is scored as a “disagreement.” The number of agreements is then divided by the total number of intervals and converted to a percentage. This method is even more conservative than the proportional method, but it can sometimes be overly conservative. For example, when the observers are just slightly off in their timing, behavior scored in one interval for one observer and in another interval for a second observer produces two disagreement intervals even though both observers were scoring the same behavioral event.

Another method for reliability is used when partial interval or whole interval recording is in place. Partial interval refers to scoring the interval if the behavior occurs at any point in that interval. Whole interval recording refers to scoring the interval if the behavior occurs throughout the interval. Thus,

there is no “count” of behavior; the interval is simply scored as “occurrence” or “nonoccurrence.” In the case of interval recording, reliability can be calculated by denoting each interval as either an agreement (both observers recorded behavior or did not record behavior) or a disagreement (one observer recorded behavior while the other did not). The total number of agreements for the session is then divided by agreements plus disagreements and multiplied by 100 to yield the mean reliability for the entire observation.

Unfortunately, interval-by-interval calculations are sometimes impractical or impossible. This is the case if the data collection system does not permit breaking the records into smaller units. For example, assume a teacher collects data on the number of times a student raises his hand throughout a class by making tally marks on a piece of paper. For some classes, a second observer (for instance, a behavioral consultant) also records instances of hand raising using tallies. In this case, interval-by-interval reliability would be difficult to calculate because the records cannot be easily broken into smaller units; it is impossible to tell when the teacher recorded the first instance of hand raising and compare that to the consultant’s data.

Also, interval-by-interval reliability methods sometimes inflate or deflate agreement based on whether behavior occurs at a high or low rate. Going back to the extreme example of an observer falling asleep, high agreement scores might occur simply because not much behavior occurred. Similarly, with high-rate behavior, one observer could essentially stop watching but continue to score lots of behavior and obtain a high score. To address these possibilities, it is possible to score agreement on the occurrence-only intervals (i.e., evaluating only those intervals in which one observer or the other scored the occurrence of behavior) and to score agreement on the nonoccurrence intervals (i.e., evaluating only those intervals in which one observer or the other scored the nonoccurrence of behavior). By evaluating occurrence and nonoccurrence agreement, the reliability scores become less sensitive to rate fluctuations.

Some Suggestions for Integrity Measures

Integrity is typically calculated by examining the total percentage of opportunities for which the procedure was implemented correctly. For example, the overall integrity would be 80% if a parent correctly applied extinction to three of five undesirable responses and correctly reinforced five of five appropriate responses (eight correct responses out of a total of 10 opportunities). This method is simple to explain and calculate. Unfortunately, it also lumps together different types of integrity (in this example, reinforcing appropriate behavior and failing to reinforce problem behavior); these different types of integrity may differentially affect the intervention outcome. For example, reinforcing problem behavior may be more detrimental than failing to reinforce appropriate behavior (St. Peter Pipkin, 2006). Examining integrity on individual components of an intervention may be as important as overall integrity because interventions may withstand “low” levels of integrity if the contingencies favor appropriate behavior over problem behavior. Calculating integrity measures for individual components may also allow practitioners to provide more focused feedback to caregivers. For example, if a parent consistently reinforces appropriate behavior but also periodically reinforces problem behavior, individually calculated integrity measures permit practitioners to provide specific positive and corrective feedback (respectively). This specific information is only available at a quantitative level if integrity measures are separated for each component of the intervention.

One means of calculating integrity on individual components of an intervention is to use integrity-monitoring sheets in which each component is scored individually. For example, practitioners using DRA procedures could record each instance the caregiver reinforced appropriate behavior separately from instances in which the caregiver appropriately applied extinction to problem behavior. We discuss possible data sheets for calculating integrity in this way in the next section, and provide examples in the appendices.

Suggestions for monitoring and data sheets

Monitoring reliability and integrity can be difficult for practitioners, particularly when consulting on multiple cases. To increase ease and efficiency, monitoring sessions could be relatively brief and occur on an intermittent basis. For example, practitioners could conduct a 10-min monitoring session once per week with each of their clients (e.g., Noell & Witt, 1998). There sometimes seems to be a false belief that reliability and integrity monitoring should be continuous or nearly continuous; if that were the case, the monitor himself or herself could simply conduct the procedures! Sampling is far more efficient.

During monitoring sessions, practitioners collect reliability and integrity data using data collection sheets tailored to the particular client’s intervention. For example, assume a practitioner developed an intervention that involved delivering attention within 10 s of hand raising and not attending (i.e., ignoring) within 30 s of shouting. The data collection sheet for this intervention would have four sections: one each for occurrences of hand raising and shouting, one for attending following hand raising, and one for not attending following shouting. In the occurrences section, the practitioner would record the number of opportunities to implement the intervention (in this case, a count of hand raising and shouting). The number of correct caregiver responses would be recorded in the other two sections. Sample data sheets (blank and completed) are shown in Figures 2, 3, and 4. Figure 2 shows a blank data collection sheet, whereas the sheets in Figures 3 and 4 show hypothetical uses of the data sheet. In Figure 3, treatment integrity is calculated by dividing the number of correct teacher responses (delivering and withholding attention



Figure 2. Sample blank data sheet for monitoring treatment integrity; the intervention involves delivering attention within 10 s of hand raising and not attending (i.e., ignoring) within 30 s of shouting.

Figure 3. Sample data sheet for monitoring treatment integrity that shows hypothetical data; treatment integrity is calculated by dividing the number of correct teacher responses (delivering and withholding attention following hand raising and screaming, respectively) by the number of student responses, and multiplying by 100; overall integrity is obtained by averaging the integrity across the 1-min intervals.
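
The calculation described in the Figure 3 caption can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the example counts, and the decision to skip intervals in which the student made no response (where a percentage is undefined) are assumptions made here, not values or rules taken from the data sheets.

```python
from statistics import mean


def interval_integrity(correct, opportunities):
    """Percentage of opportunities handled correctly in one interval.

    Returns None when there were no opportunities, because integrity is
    undefined for that interval (an assumption of this sketch).
    """
    if opportunities == 0:
        return None
    return 100.0 * correct / opportunities


def overall_integrity(intervals):
    """Average the per-interval percentages, skipping undefined intervals."""
    scores = [interval_integrity(c, n) for c, n in intervals]
    scored = [s for s in scores if s is not None]
    return mean(scored) if scored else None


# Hypothetical counts for five 1-min intervals, each written as
# (correct teacher responses, student responses).
hand_raising = [(1, 2), (2, 2), (0, 1), (0, 0), (1, 2)]  # attention delivered
shouting = [(0, 1), (1, 4), (0, 0), (1, 2), (0, 3)]      # attention withheld

print(overall_integrity(hand_raising))  # integrity for delivering attention
print(overall_integrity(shouting))      # integrity for withholding attention
```

Keeping the two components in separate lists reflects the point that component-level integrity can support more specific feedback than a single pooled percentage.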

following hand raising and screaming, respectively) by the number of student responses, and multiplying by 100. Integrity varies throughout the session, as shown by the integrity numbers in each block. Overall integrity is obtained by averaging the integrity across the 1-min intervals, and it is relatively low overall (60% for omission integrity and 27% for commission integrity).

Data from brief monitoring sessions could also be used to check the reliability of caregiver data collection. Comparison of the practitioner’s record with the caregiver’s record could permit immediate feedback to the caregiver regarding the reliability of data collection and intervention integrity. In the example described above, the recorded “opportunities” are also the counts of hand raising and shouting. The practitioner could use a whole-session reliability measure (as described in the section on reliability measurement) by dividing the smaller number of recorded responses by the larger and multiplying by 100, or use a proportional agreement method. In Figure 4, reliability is calculated using a proportional agreement method. Agreement using this method averages between 78% and 85%. If a less stringent method of reliability calculation were more appropriate, a whole-session measure could be used, which would yield average agreement scores between 88% and 93%.

Using data sheets like these may be useful because caregivers could be immediately alerted if reliability is low. Thus, brief monitoring sessions could be conducted using relatively simple materials. Despite the simplistic data collection, these measures provide opportunities for calculating both reliability and integrity, and for providing immediate feedback to caregivers about ongoing recording of behavior and implementation of behavior-change procedures.

Conclusions

Data reliability and treatment integrity should be measured in the everyday practice of behavior analysis. Failing to do so could be dangerous, and it is nearly impossible to judge the efficacy of behavioral procedures without such data. In addition, the ability to provide feedback to data collectors and procedure implementers is paramount. Data reliability errors and treatment integrity errors can be avoided through good training, solid descriptions of definitions and procedures, generalization and maintenance training, and by making the procedures as simple and parsimonious as possible. Monitoring should also be simple and parsimonious, using efficient methods such as intermittent sampling rather than continuous monitoring.

An issue that is likely to arise for practicing behavior analysts relates to the problem of reimbursement. In short, is the practice of monitoring for reliability



and integrity reimbursable? The solution to the reimbursement issue is likely to vary from state to state or country to country, or from one insurance company to another. In our view, however, the practice is as important as any other assessment or therapy component of applied behavior analysis. Hence, behavior analysts are justified in billing for their services even when, if not especially when, they are taking measures to ensure good reliability and integrity.

Figure 4. Sample data sheet for monitoring reliability that shows hypothetical data collected by a secondary observer; reliability is calculated using a proportional agreement method.

References

Allen, K. D., & Warzak, W. J. (2000). The problem of parental nonadherence in clinical behavior analysis: Effective treatment is not enough. Journal of Applied Behavior Analysis, 33, 373-391.

Brackett, L., Reid, D. H., & Green, C. W. (2007). Effects of reactivity to observations on staff performance. Journal of Applied Behavior Analysis, 40, 191-195.

DiGennaro, F. D., Martens, B. K., & Kleinmann, A. E. (2007). A comparison of performance feedback procedures on teachers’ treatment implementation integrity and students’ inappropriate behavior in special education classrooms. Journal of Applied Behavior Analysis, 40, 447-461.

Gresham, F. M., Gansle, K. A., Noell, G. H., Cohen, S., & Rosenblum, S. (1993). Treatment integrity of school-based behavioral intervention studies: 1980-1990. School Psychology Review, 22, 254-272.

Hartmann, D. P. (1977). Considerations in the choice of interobserver reliability estimates. Journal of Applied Behavior Analysis, 10, 103-116.

Hawkins, R. P., & Dotson, V. A. (1975). Reliability scores that delude: An Alice in Wonderland trip through the misleading characteristics of interobserver agreement scores in interval recording. In E. Ramp & G. Semb (Eds.), Behavior analysis: Areas of research and application. Englewood Cliffs, New Jersey: Prentice-Hall.

Kazdin, A. E. (1977). Artifact, bias, and complexity of assessment: The ABCs of reliability. Journal of Applied Behavior Analysis, 10, 141-150.

McIntyre, L. L., Gresham, F. M., DiGennaro, F. D., & Reed, D. D. (2007). Treatment integrity of school-based interventions with children in Journal of Applied Behavior Analysis studies from 1991 to 2005. Journal of Applied Behavior Analysis, 40, 659-672.

Noell, G. H., & Witt, J. C. (1998). Toward a behavior analytic approach to consultation. In T. S. Watson & F. M. Gresham (Eds.), Handbook of child behavior therapy. New York, NY: Plenum Press.

Noell, G. H., Witt, J. C., LaFleur, L. H., Mortenson, B. P., Ranier, D. D., & LeVelle, J. (2000). Increasing intervention implementation in general education following consultation: A comparison of two follow-up strategies. Journal of Applied Behavior Analysis, 33, 271-284.

O’Leary, K. D., Kent, R. N., & Kanowitz, J. (1975). Shaping data collection congruent with experimental hypotheses. Journal of Applied Behavior Analysis, 8, 43-51.

Peterson, L., Homer, A. L., & Wonderlich, S. A. (1982). The integrity of independent variables in behavior analysis. Journal of Applied Behavior Analysis, 15, 477-492.

St. Peter Pipkin, C. C. (2006). A laboratory investigation of the effects of treatment integrity failures on differential reinforcement procedures. Unpublished doctoral dissertation, University of Florida, Gainesville.

Stokes, T. F., & Baer, D. M. (1977). An implicit technology of generalization. Journal of Applied Behavior Analysis, 10, 349-367.

Sulzer-Azaroff, B., & Mayer, G. R. (1991). Behavior analysis for lasting change. Fort Worth: Holt, Rinehart, & Winston.

Vollmer, T. R., Marcus, B. A., & LeBlanc, L. (1994). Treatment of self-injury and hand mouthing following inconclusive functional analysis. Journal of Applied Behavior Analysis, 27, 331-344.

Author Note

Address correspondence to Timothy R. Vollmer, Psychology Department, University of Florida, 32611 (email: [email protected]).
