ATAVT en-US
Manual
Adaptive Tachistoscopic Traffic Perception Test
Test label ATAVT
Version 52 – Revision 4
CONTENTS
1 SUMMARY
4 EVALUATION
4.1 Objectivity
4.2 Reliability
4.3 Validity
4.3.1 Content validity
4.3.2 Construct validity
4.3.3 Criterion validity
4.4 Scaling
4.5 Economy
4.6 Reasonableness
4.7 Resistance to falsification
4.8 Fairness
5 NORMING
6 TEST PROCEDURE
6.1 Decision-oriented testing
6.2 Instruction and practice phase
6.3 Test phase
8 REFERENCES
1 SUMMARY
Author
G. Schuhfried
Application
Assesses obtaining an overview as part of attention performance.
Main areas of application: traffic psychology, human resources, manufacturing & safety
Theoretical background
The ATAVT measures obtaining an overview by briefly showing pictures that show traffic
situations. The items are constructed using an explicit, theory-led rationale which is based on
detailed quantitative analyses of the processes involved in the test performance. The
construction rationale is based on the design principles of the proven Tachistoscopic Traffic
Test (TAVTMB) (Biehl, 1996) and expands them by including current research findings on
cognitive skills (Cattell-Horn-Carroll model) (Schneider & McGrew, 2018). 84 items were drawn
up and evaluated using the 1PL Rasch model. The resulting item pool allows for adaptive test
presentation with all the associated advantages that this brings.
Test structure
Pictures of traffic situations appear briefly on the screen. After seeing each picture, the
respondent is asked to state what was in it. The respondent can choose from five answers that
are given. In two test forms items are presented adaptively (S1 and S2) - that is, after an initial
phase the respondent is presented with items whose difficulty is increasingly tailored to match
his/her ability. The other test forms are presented according to a set order.
Test forms
There are six test forms: S1 is designed for use in countries in which traffic drives on the right
and S2 for countries that drive on the left. In addition, there are four linear test forms, S3 to S6.
These forms are especially selective in the lower ability range.
Scoring
The person parameter Obtaining an overview is displayed, together with the corresponding
norm-referenced score including the confidence interval. The main variable Obtaining an
overview can be interpreted as a measure for obtaining an overview as part of focused
attention.
Reliability
Reliability is set to a standard error of measurement of 0.49 for the adaptive test forms S1 and
S2. This corresponds to a reliability of r = 0.80. The pre-defined precision of measurement
applies to all respondents at all performance ranges. Estimates of internal consistency,
measured by Cronbach's alpha, are available for the test scores of test forms S3 to S6. The
estimates are 0.726 for S3 and S4 and 0.729 for S5 and S6.
Validity
Due to the test design, ATAVT conforms precisely to the requirements for measuring obtaining
an overview and therefore has content validity. Correlation analyses with construct-related and
non-construct-related tests as well as a confirmatory factor analysis prove the construct validity
of the test. Furthermore, there is a large number of studies in the areas of traffic psychology,
aviation psychology, and safety assessment (e.g. Sommer, Arendasy, Schuhfried &
Litzenberger, 2005; Vetter, Schünemann, Brieber, Debelak, Gatscha & Grünsteidel, 2018)
which prove the criterion validity of the ATAVT in different application contexts.
Norms
A representative norm sample for age and gender consisting of N=831 individuals from
German-speaking countries is available. Representative norm samples are also available for
Brazil (N=436), Italy (N=343) and Slovakia (N=331) as well as numerous convenience
samples.
Disclaimer
The results obtained in these tests are computer-generated. They are based on the
candidate’s response behavior and the technical infrastructure and depend on the test
presentation conditions as well as other factors including the candidate’s experience,
motivation, interest, self-awareness, and mental and physical state. In accordance
with the APA Standards for Educational and Psychological Testing (2014), page 13
and Standard 9.0, it is established that the test user bears ultimate responsibility for
evaluating whether the use of a test in a specific setting is justified. The test user must
take cultural and legal factors into account when making this decision. SCHUHFRIED
therefore cannot accept responsibility for decisions and interpretations made on the
basis of the test results and cannot be held liable for the consequences of using them,
regardless of how such consequences arise. The test results are confidential and
must be stored in accordance with relevant international and national data protection
guidelines and legislation.
Regardless of the respective issue, the user needs to have relevant qualifications in the area
of suitability assessment or assessment in the clinical and health context for applying, scoring,
and interpreting the questionnaire. This includes knowledge about statistical fundamentals,
test theory, quality criteria and design principles of psychometric tests. Norms such as
DIN33430 or comparable international norms provide detailed information on the necessary
qualifications. Furthermore, international standards such as the Performance Requirements,
Context Definitions and Knowledge & Skill specifications for the three EFPA levels of
qualifications in psychological assessment (European Federation of Psychologists'
Associations, 2013), APA Standards for Educational and Psychological Testing (2014) or the
International Guidelines for Test Use (ITC, 2001) can be used.
Since the items are easy to understand, the ATAVT can be used for persons of different
educational levels and in a broad age range (see chapter 5). The administration time is
standardized and is between approx. 8 and 14 minutes depending on the test form. Testing
can take place both in a one-on-one setting and in a group. For group testing, however, the
respondents should use USB headsets. Due to the automatic test presentation and test
scoring, the supervisor only needs to plan the time for entering personal data in addition
to the test duration (approx. 6 to 15 minutes depending on the test form).
At the top of the hierarchy is the g(eneral) factor; the second level comprises the secondary
factors and the third, lowest level is made up of primary factors. The g-factor is the broadest
construct and is often understood as general intelligence. In the latest version of the CHC
model (Schneider & McGrew, 2018), a total of 17 secondary factors are listed. Auditory (Ga)
and visual (Gv) processing, olfactory (Go), gustatory (Gg), kinesthetic (Gk), tactile (Gh) and
psychomotor (Gp) skills are attributed to the group of sensory, domain-specific skills.
Furthermore, a differentiation is made between different types of acquired knowledge:
quantitative knowledge (Gq), writing and reading (Grw), comprehension knowledge (Gc) and
domain-specific knowledge (Gkn). The area of general speed includes the secondary factors
processing speed (Gs), psychomotor speed (Gps), and reaction and decision speed (Gt). In the
area of domain-neutral general functions, a differentiation is made between fluid reasoning
(Gf), working memory (Gwm) as well as long-term storage and recall (Gl & Gr). Figure 1 shows
the relevant section of the CHC model for the theoretical basis of the skill measured by the
ATAVT, which is described further in the following.
The Adaptive Tachistoscopic Traffic Perception Test (ATAVT) can be attributed to the
secondary factor of working memory (Gwm), more precisely the primary factor attentional
control. This is understood as the ability to focus attention on relevant stimuli and to ignore
irrelevant stimuli. The distinction from processing (Gs) and reaction speed (Gt) results from the
fact that for both an immediate, automated reaction is required and relevant information does
not have to be stored short-term.
The secondary factor visual processing (Gv) describes the ability of perception, discrimination,
manipulation, and recalling visual information. This includes the ability to visualize pictures in
one’s mind, to change them mentally (recall, rotate, etc.) and to apply them to problems, if
required. Working memory is differentiated from the secondary factor visual processing, and
specifically from the primary factor visual memory, by the fact that working memory tasks are
often simple, numerous or sequential. A further major difference between working memory (Gwm)
and visual memory (a primary factor under Gv) is that working memory involves not just storing
and recalling a mental representation (Carroll, 1993) but also selecting and manipulating the
relevant visual information.
At the level of the primary factors, a distinction can be made for working memory (Gwm)
between visuo-spatial short-term memory and attentional control. Visuo-spatial short-term
memory describes the ability to encode visual information in working memory and retain it
short-term. If the item material requires discriminating between relevant and irrelevant stimuli,
this is rather a task that relates to attentional control.
Tests for measuring attentional control usually use relatively simple item material. Such simple
item material raises questions about the test’s ecological validity and the extent to which the
results can be generalized to cover more complex everyday situations. In view of these issues,
attempts were made even in the 1950s to operationalize these ability dimensions in ways that
were closer to real life. In traffic psychology assessments, Hessler and Grossjohann (1953)
used a tachistoscopic test method that enabled pictures of urban traffic situations to be
displayed for a period of one second. Development of such tests progressed in the years that
followed, particularly in the field of traffic psychological assessment. In 1960 TÜV-Rheinland
produced a tachistoscope test made up of 22 slides of traffic situations. Fischer and Kowar
(1977) also used a tachistoscope to display pictures of real-life traffic situations for exactly one
second. As with the other tachistoscope tests, the respondent’s task was to identify certain
pre-defined objects in the pictures displayed. After each picture had been briefly displayed, the
respondent was shown a list of objects and was asked to mark only those items that were
actually visible in the picture he/she had just seen. Compared with traditional
operationalization, these tachistoscopic tests have greater ecological validity and closer
resemblance to everyday life, and also have greater item complexity (Sommer, Herle, Häusler,
& Arendasy, 2008). Following the tradition of this field of research, the task of the ATAVT
consists of focusing attention on a traffic situation and quickly identifying relevant information
in this situation.
Each of these options represents a different class of objects that might be found in a traffic-
related context.
Incidentals, on the other hand, describe "superficial characteristics" that are interchangeable
among themselves. They have no influence on cognitive processes relevant to the solution
and therefore cannot have a significant effect on the item parameters. In the ATAVT the
individual concrete objects (e.g. vehicle type) in a category represent the incidentals.
On the basis of this construction rationale, requirement grids for the tasks of the ATAVT were
defined.
The item response theory (IRT; Fischer & Molenaar, 1995; van der Linden & Hambleton, 1997)
offers different measurement models, enabling the dimensionality of psychological tests to be
investigated in a theoretically well-founded manner.
The unidimensionality of dichotomous test items (e.g. right vs. wrong) can be tested using
Rasch’s logistic model (1980). This is a unidimensional latent trait model in which each
respondent v is described by a unidimensional person parameter θv and each test item i has
an item difficulty parameter εi. In the context of ability assessment, the position of θv describes
the respondent’s level of ability on the latent dimension measured. The probability that person
v will solve item i is specified by the model equation of the Rasch model in the form of a logistic
function:
$$P(+ \mid \theta_v, \varepsilon_i) = \frac{e^{\theta_v - \varepsilon_i}}{1 + e^{\theta_v - \varepsilon_i}}$$
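As a minimal sketch, the logistic function of the Rasch model can be evaluated as follows. The function name rasch_probability and the example values are illustrative assumptions and are not taken from the test software.

```python
import math


def rasch_probability(theta: float, epsilon: float) -> float:
    """Probability that a person with ability theta solves an item of difficulty epsilon.

    Uses 1 / (1 + exp(-(theta - epsilon))), which is algebraically identical
    to exp(theta - epsilon) / (1 + exp(theta - epsilon)).
    """
    return 1.0 / (1.0 + math.exp(-(theta - epsilon)))


if __name__ == "__main__":
    # Ability equal to item difficulty gives a solution probability of 0.5.
    print(rasch_probability(0.0, 0.0))
    # A person one logit above the item difficulty solves it more often than not.
    print(rasch_probability(1.0, 0.0))
```

When ability and difficulty coincide, the solution probability is exactly 0.5; it rises towards 1 as θv exceeds εi and falls towards 0 in the opposite case.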
If the Rasch model fits for a defined item set and for particular subject groups, the test (or test
score) is described as “conforms with Rasch model”. A test that conforms with the Rasch model
has some useful psychometric properties:
(1) The test measures the same latent ability dimension for all test subjects. The ability under
consideration and the item difficulties can be characterized by a single parameter, respectively.
The raw score thus represents a sufficient statistic for the extent of the attribute in that
particular person (Fischer, 1974).
(2) The raw score obtained describes the respondent unambiguously with regard to his/her
position on the latent ability dimension θv that is to be measured.
(3) Any shift in the item parameters (e.g. εi → εi + c) leads to a corresponding shift of the person
parameters (e.g. θv → θv + c). Hence the estimation of item parameters does not depend on
the test sample used. This means that individuals or tasks can be compared with one another,
irrespective of which items the respondents have worked or which individuals from a particular
population have worked the tasks. Rasch (1980) refers in this context to “specific objectivity”.
This property of the Rasch model not only makes computerized adaptive testing possible but
also forms the basis for the generalizability of diagnostic statements about a respondent’s trait
strength over the sample of items used. For this reason, it is of key importance for the practical
application of psychological assessment methods.
(4) A further advantage of the validity of the Rasch model for the scales of a questionnaire is
that it demonstrates the fairness of the scoring method for different subject groups that are
held to be relevant. Only if the Rasch model fits the data can one be certain that none of the
respondent groups tested (e.g. women vs. men) is systematically advantaged or
disadvantaged by particular items.
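The shift property in (3) and the person-independent comparison of items can be illustrated with a short sketch. All names and numerical values below are illustrative assumptions; the code only demonstrates properties of the model equation, not the estimation procedure used for the ATAVT.

```python
import math


def rasch_probability(theta: float, epsilon: float) -> float:
    """Rasch model: solution probability for ability theta and difficulty epsilon."""
    return 1.0 / (1.0 + math.exp(-(theta - epsilon)))


def log_odds(p: float) -> float:
    """Logit of a probability."""
    return math.log(p / (1.0 - p))


def item_contrast(theta: float, eps_a: float, eps_b: float) -> float:
    """Difference in log-odds of solving item a versus item b for one person."""
    return log_odds(rasch_probability(theta, eps_a)) - log_odds(rasch_probability(theta, eps_b))


if __name__ == "__main__":
    # Shifting person and item parameters by the same constant c changes nothing.
    c = 0.7
    print(rasch_probability(0.4, -0.2), rasch_probability(0.4 + c, -0.2 + c))
    # The comparison of two items does not depend on which person is used.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, item_contrast(theta, -0.5, 1.0))
```

Because the probability depends only on the difference θv − εi, the contrast between two items equals the difference of their difficulties for every person, which is the core of what Rasch calls specific objectivity.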
One method of testing the Rasch model consists of checking the assumption of person
homogeneity (Rost, 2004) by means of likelihood ratio tests according to Andersen (1973).
This test relates the likelihood of the data for the item parameters estimated in the total sample
to the likelihoods of the data for the item parameters estimated in the sub-samples. The test
statistic has an asymptotic χ² distribution with (p-1)×(k-1) degrees of freedom, where k
represents the number of tasks and p is the number of sub-samples. A non-significant result
in an Andersen likelihood ratio test means that the item parameter estimates in the
sub-samples do not deviate significantly from the item parameter estimates of the total sample.
The item parameters can therefore be generalized over the sub-samples tested. The model
assumption of person homogeneity (Rost, 2004) can therefore be maintained with regard to
the splitting criteria investigated.
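The computation of the Andersen test statistic and its degrees of freedom can be sketched as follows. The sketch assumes that the conditional log-likelihoods of the total sample and of each sub-sample have already been obtained from a Rasch estimation routine; the function name and the numbers in the usage example are invented for illustration and are not results from the ATAVT calibration.

```python
def andersen_lr(loglik_total: float, logliks_subsamples: list, n_items: int):
    """Return (test statistic, degrees of freedom) of the Andersen likelihood ratio test.

    loglik_total: conditional log-likelihood with item parameters estimated
        in the total sample.
    logliks_subsamples: conditional log-likelihoods with item parameters
        estimated separately in each of the p sub-samples.
    n_items: number of items k.
    """
    # Z = 2 * (sum of sub-sample log-likelihoods - total log-likelihood)
    statistic = 2.0 * (sum(logliks_subsamples) - loglik_total)
    # df = (p - 1) * (k - 1)
    df = (len(logliks_subsamples) - 1) * (n_items - 1)
    return statistic, df


if __name__ == "__main__":
    # Hypothetical values for a split into two sub-samples on a 20-item set.
    print(andersen_lr(-1250.0, [-610.0, -630.5], 20))
```

The statistic is then compared with the critical value of the χ² distribution for the given degrees of freedom at the chosen significance level.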
Table 2: Item set, splitting criteria, sample size and goodness of fit statistics
As Table 2 shows, the model test for age in the fifth item set is the only one whose p-value is equal to
or less than the critical significance level of α=0.01. However, use of the Rasch model for the
incomplete full data set showed that none of the splitting criteria were conspicuous; in
consequence, no individual items were eliminated from the pool. Overall, therefore, the
assumption of unidimensionality for this item pool can be retained. This justifies the assumption
that the parameter estimates can be generalized over the subsamples studied for the entire
item pool. The assumption of person homogeneity (Rost, 2004) can therefore be regarded as
met. With regard to the assumption of item homogeneity (Rost, 2004) the Martin-Löf test did
not reach the critical significance level (see Table 2). This means that the newly designed items
of the ATAVT and the link items from the TAVTMB measure the same latent ability dimensions.
Findings on the criterion validity of the TAVTMB can therefore be transferred to the ATAVT.
Table 3: Results of model tests based on Ponocny (2001) for ATAVT/S3 and S4
Overall, the present results show a good fit for the Rasch model for ATAVT/S3 and S4.
Table 4: Results of model tests based on Ponocny (2001) for ATAVT/S5 and S6
Overall, the present results show a good fit for the Rasch model for ATAVT/S5 and S6.
In summary, these analyses show that both the adaptive test forms S1/S2 and also the linear
test forms S3-S6 based on this test are Rasch-homogeneous and can therefore measure
unidimensionally.
Dimensionality
In order to examine whether the items in both tests measure the same latent ability dimensions,
the TAVTMB items were included with the newly created ATAVT items in the investigation of
the unidimensionality of the ATAVT item pool. If the Rasch model (Rasch, 1980) applies for
the joint item pool for the TAVTMB and the ATAVT, then it can be assumed that both tests
assess the same latent ability dimension. Sommer et al. (2008) tested this hypothesis
empirically using a link design (see chapter 3.2.4 for details). The authors were able to
demonstrate that the combined item pool is Rasch-homogeneous. This proves that the two
tests measure exactly the same ability dimension.
By using a link design, the difficulty parameters of the ATAVT tasks could be adjusted to the
item difficulty parameters of the TAVTMB (cf. Holland & Dorans, 2006; Kolen & Brennan,
2004). This brings with it the advantage that the person parameters of the ATAVT can be
transferred directly to the TAVTMB via a linear transformation, which means that the test
results of the ATAVT can be transferred to the TAVTMB and vice versa. The two tests are thus
directly comparable and interchangeable with regard to both the ability dimension that is
measured and the performance achieved by respondents in each.
0.031). Both tests differ, however, in the range of difficulties that can be covered with both item
pools. The distribution characteristics of both item pools are summarized in Table 5.
Despite nearly the same mean value and distribution, the ATAVT covers a wider range of
difficulty. This means that sufficient items are available in lower and upper ability ranges of the
ATAVT in order to provide reliable measurement of the respondents’ latent ability.
Table 6: Descriptive statistics for the ATAVT and TAVTMB person parameter
Test     Mean     Median   SD      Min     Max    Skew     Kurtosis
ATAVT    0.198    0.284    0.683   -2.70   2.78   -0.512   1.234
TAVTMB   0.4196   0.2396   1.369   -3.14   4.00   0.117    0.035
An inspection of the descriptive statistics shows that the range of the person parameters for
the TAVTMB is generally greater than for the ATAVT. This results in a correspondingly
higher dispersion of the person parameter for the TAVTMB. This result is interesting insofar
as the TAVTMB item pool covers a narrower difficulty range than the ATAVT item pool.
indicators correspond to the relevant indicator for the distribution of the estimated person
parameter in the norm sample for the ATAVT. In the simulated testing with the TAVTMB, all
20 items were predetermined and then the person parameter estimated using the joint
maximum likelihood method. For the ATAVT, typical adaptive testing was simulated and the
person parameter similarly estimated using the joint maximum likelihood method. Unlike the
real test applications of the ATAVT, no item exposure control was used for these simulated
studies. The person parameter estimated this way was compared in a next step with the real
person parameter that was used as a basis for the simulation. To do so, a Pearson correlation
was calculated between the real person parameters and the person parameters estimated as
part of the simulation. In the ideal situation, these correlations should be close to 1.0. For
ATAVT, there was a correlation of 0.915, while with the TAVTMB a correlation of 0.867 was
calculated. Figure 3 shows the scatter plot for the relationship between the real person
parameter and the person parameter estimated for ATAVT (left) and TAVTMB (right) on the
basis of the test performance.
Figure 3: Scatter plot for the relationship between the real person parameter and the person
parameter estimated for ATAVT/S1 (left) and TAVTMB/S1 (right) by the test.
As Figure 3 shows, the precision of the person parameter estimate in ATAVT is noticeably
higher. The difference in the precision of the person parameter estimate between ATAVT and
TAVTMB is explained by the previously reported empirical results from the comparison of the
person parameter estimates from ATAVT and TAVTMB. From a content perspective, these
results mean that a person's true ability can be measured more precisely with the ATAVT.
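The logic of such a recovery study can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a reproduction of the reported simulation: item difficulties are treated as known and evenly spaced between -2 and 2, abilities are drawn from a standard normal distribution, the test is a fixed 20-item form, and person parameters are estimated by plain maximum likelihood (bounded to ±4) instead of the joint maximum likelihood method mentioned above. All function names are hypothetical.

```python
import math
import random


def rasch_probability(theta, epsilon):
    """Rasch model solution probability."""
    return 1.0 / (1.0 + math.exp(-(theta - epsilon)))


def ml_estimate(responses, difficulties, bound=4.0, iterations=50):
    """Newton-Raphson ML estimate of theta, clamped to [-bound, bound]."""
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, e) for e in difficulties]
        score = sum(x - p for x, p in zip(responses, probs))
        info = sum(p * (1.0 - p) for p in probs)
        step = max(-1.0, min(1.0, score / info))  # damp large steps
        theta = max(-bound, min(bound, theta + step))
    return theta


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def recovery_correlation(n_persons=500, n_items=20, seed=1):
    """Correlate simulated true abilities with their ML estimates."""
    rng = random.Random(seed)
    difficulties = [-2.0 + 4.0 * i / (n_items - 1) for i in range(n_items)]
    true_thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
    estimates = []
    for theta in true_thetas:
        responses = [1 if rng.random() < rasch_probability(theta, e) else 0
                     for e in difficulties]
        estimates.append(ml_estimate(responses, difficulties))
    return pearson(true_thetas, estimates)


if __name__ == "__main__":
    print(round(recovery_correlation(), 3))
```

The closer the resulting correlation is to 1.0, the more precisely the simulated test recovers the true person parameters.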
Since adaptive tests are constructed on a probabilistic test model such as the Rasch model
(1980), at any point in testing the amount of information already gathered is known. The
adaptive test can therefore be continued for each respondent until a pre-set target reliability
has been reached. The adaptive test therefore measures all respondents with equal reliability.
Traditional tests, by contrast, achieve a certain reliability on average, but this reliability can be
significantly reduced in individual cases, particularly for respondents whose performance is
either very strong or very weak.
INITIAL PHASE: The first item presented is normally one of medium difficulty, since at this
stage information about the respondent’s ability is normally not available. An item that is
generally of medium difficulty usually represents the best challenge for the respondent.
Depending on the response, a more difficult or easier task is presented. The response given
is then recorded and the preliminary person parameter is estimated by way of the weighted
maximum likelihood method (Warm, 1989).
In the highly unlikely event of the respondent having given only incorrect or only correct
answers after ten items, the test is terminated and the last estimated person parameter is
output; otherwise, the test branches to the main phase.
MAIN PHASE: After each further item the individual’s ability θ is estimated by the maximum
likelihood method (ML) from all the k answers given up to that point. This is done by maximizing
the likelihood function for θ (see Warm, 1989).
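$$\sum_{i=1}^{k} \bigl(x_i - P(+ \mid \theta, \varepsilon_i)\bigr) + \frac{J(\theta)}{2\,I(\theta)} = 0$$
where x_i is the scored response to item i (1 = solved, 0 = not solved), I(θ) is the test
information and J(θ) is the correction term defined by Warm (1989).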
The technical details on this maximization are described in Baker and Kim (2004) and
elsewhere. The standard error of measurement (SEM) is also calculated for this ML estimate
of the person parameter θ.
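The following sketch shows how a person parameter and its standard error of measurement can be obtained for a given response pattern. It is an assumption-laden illustration: it uses a plain (unweighted) maximum likelihood estimate found by Newton-Raphson iteration and computes the SEM as the inverse square root of the test information. The function names are hypothetical and the code is not the algorithm implemented in the test software.

```python
import math


def rasch_probability(theta, epsilon):
    """Rasch model solution probability."""
    return 1.0 / (1.0 + math.exp(-(theta - epsilon)))


def ml_estimate_with_sem(responses, difficulties, tol=1e-8, max_iter=100):
    """Return (theta_hat, SEM) for scored responses (1/0) to items of known difficulty."""
    if sum(responses) in (0, len(responses)):
        # The unweighted ML estimate diverges for all-correct or all-incorrect patterns.
        raise ValueError("ML estimate is not finite for this response pattern")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, e) for e in difficulties]
        score = sum(x - p for x, p in zip(responses, probs))  # first derivative of log-likelihood
        info = sum(p * (1.0 - p) for p in probs)              # test information
        step = max(-1.0, min(1.0, score / info))              # damped Newton step
        theta += step
        if abs(step) < tol:
            break
    probs = [rasch_probability(theta, e) for e in difficulties]
    info = sum(p * (1.0 - p) for p in probs)
    return theta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    theta_hat, sem = ml_estimate_with_sem([1, 1, 0, 1, 0], [-1.0, -0.5, 0.0, 0.5, 1.0])
    print(round(theta_hat, 3), round(sem, 3))
```

The SEM shrinks as more items are answered, because each additional item adds to the test information.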
The STOP RULES described below are then checked. If none of these rules apply, the next
item is selected and presented to the respondent.
The (k+1)-th task presented to the respondent is the task, from the pool of tasks not yet
administered, whose difficulty parameter βk+1 has the smallest absolute distance |βk+1 - θk| to the
person parameter θk estimated after k responses. After k+1 worked items, the person
parameter θk+1 is estimated or updated according to the ML method and, when necessary, this
procedure is repeated until a stop criterion is fulfilled. Respondents therefore work on different
tasks, and on different numbers of tasks, adapted to their level.
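The selection rule described above can be sketched as follows. The function name and the small item pool in the usage example are illustrative assumptions.

```python
def select_next_item(theta_k, difficulties, administered):
    """Return the index of the unused item whose difficulty is closest to theta_k.

    theta_k: current person parameter estimate after k responses.
    difficulties: list of item difficulty parameters for the whole pool.
    administered: set of item indices already presented to this respondent.
    """
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    return min(candidates, key=lambda i: abs(difficulties[i] - theta_k))


if __name__ == "__main__":
    pool = [-1.0, 0.0, 0.5, 2.0]  # hypothetical difficulties
    print(select_next_item(0.3, pool, administered=set()))  # closest item overall
    print(select_next_item(0.3, pool, administered={2}))    # closest item still unused
```

Under the Rasch model, the item whose difficulty is closest to the current estimate is also the item with maximum information at that estimate.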
Figure 4: Test algorithm for the ATAVT based on the Rasch Model
STOP RULES: The test is terminated by whichever of the following criteria applies first
(see Table 7):
Tasks used once are registered and not used again for the respective respondent.
Table 8: Start parameters for different sociodemographic groups. The start items are selected to have a solution
probability of 75% for the average individual in the relevant group.
Because the ATAVT is sufficiently long, the differential start to the test, while it improves both
the test’s fairness and its reasonableness, has no effect on the person parameter that is to be
estimated.
The following case study illustrates the process of adaptive testing. It involves a fictitious
subject from a simulation study. From this simulation the true ability parameter (ξ=1.0) is
known. The trend of the ability parameter estimates shown in Figure 5 clearly shows that they
converge to a limiting value. The outlying black lines indicate the confidence interval on either
side of the performance parameter. As each item is completed the confidence interval
becomes narrower as the estimate becomes more accurate.
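The narrowing of the confidence interval can be made concrete with a small calculation. The sketch assumes the idealized case in which every item is perfectly matched to the respondent (solution probability 0.5, so each item contributes the maximum Rasch information of 0.25) and a normal approximation with z = 1.96 for a 5% error probability. The target SEM of 0.49 is the value stated for test forms S1 and S2 in the summary; the function names are hypothetical.

```python
import math


def sem_for_targeted_items(k: int) -> float:
    """SEM after k items that each contribute the maximum information of 0.25."""
    return 1.0 / math.sqrt(0.25 * k)


def confidence_interval(theta_hat: float, sem: float, z: float = 1.96):
    """Two-sided confidence interval around the person parameter estimate."""
    return theta_hat - z * sem, theta_hat + z * sem


def min_items_for_sem(target_sem: float) -> int:
    """Smallest number of perfectly targeted items needed to reach target_sem."""
    return math.ceil(1.0 / (0.25 * target_sem ** 2))


if __name__ == "__main__":
    for k in (4, 8, 16):
        low, high = confidence_interval(1.0, sem_for_targeted_items(k))
        print(k, round(low, 2), round(high, 2))
    print(min_items_for_sem(0.49))
```

Under these idealized assumptions at least 17 items are needed to reach an SEM of 0.49; real item pools are never perfectly targeted, so the actual number will be somewhat higher.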
Figure 5: Log of the adaptive test of a simulation study. The broken lines mark the boundaries of the
confidence interval (5% error probability) of the person parameter estimate.
It is clear from simulation studies of the ATAVT that differing numbers of items are required to
achieve a particular reliability at different scale points (i.e. for different ability parameters) (see
Figure 6).
Figure 6 shows the reliability and the associated number of items required for the ATAVT at
different scale points (i.e. for different ability parameter estimates). The mean reliability is also
given. The test parameters were measured in a simulation study (n=5000) that assumed a
normal distribution of ability.
Figure 6: Test reliability (y-axis, left-hand graph) for fixed test length (k=20)
and test length (y-axis of right-hand graph) for pre-set target reliability (α=0.80)
in the ATAVT over different ability levels (x-axis).
This yielded a reliability of α=0.80 for a fixed number of items of k=20. At the particularly high
end of the ability range (from PR=98) a fall in the reliability obtained can be observed (see
Figure 6 left). For use in traffic psychological assessment, however, this particularly high
performance range is of little relevance. The right-hand graph in Figure 6 shows that a larger
number of items is required to maintain reliability at either end of the ability spectrum. In
summary these results show that the test has adequate reliability and that this reliability is
maintained over a broad ability spectrum.
This means that some items will be presented much more frequently than others. This concerns
in particular those tasks in the medium difficulty range, which are of crucial importance for the
item pool. The items in the starting phase, especially, can therefore become publicly known
disproportionately fast.
In order to prevent possible overexposure of certain items at the start of the test, a method
from Kingsbury and Zara (1989) is used. In this method, the person
parameter is estimated first and then the five tasks with maximum item information are identified, that is,
the items with the smallest distance between the estimated person parameter and the
difficulty of the task. Out of these five items, one is selected at random and presented
to the respondent. The course of the test, which would otherwise be strictly determined by the
respondent's correct and incorrect responses, thus becomes probabilistic. This reduces the risk of a
respondent practicing a route through the entire test that leads to a more advantageous result and
learning it by heart, as in coached faking.
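A minimal sketch of this exposure control step is given below. It assumes that item information is ranked by the absolute distance between item difficulty and the current estimate, as described above; the function name, the group_size parameter and the example pool are illustrative and not taken from the test software.

```python
import random


def select_randomesque(theta_hat, difficulties, administered, group_size=5, rng=None):
    """Pick one item at random from the group_size most informative unused items.

    theta_hat: current person parameter estimate.
    difficulties: item difficulty parameters of the whole pool.
    administered: set of item indices already presented.
    rng: optional random.Random instance (useful for reproducible runs).
    """
    rng = rng or random.Random()
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    # Smallest distance to theta_hat corresponds to maximum Rasch item information.
    candidates.sort(key=lambda i: abs(difficulties[i] - theta_hat))
    return rng.choice(candidates[:group_size])


if __name__ == "__main__":
    pool = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]  # hypothetical difficulties
    print(select_randomesque(0.0, pool, administered=set(), rng=random.Random(42)))
```

Two respondents with the same estimate therefore do not necessarily receive the same next item, which spreads exposure across the most informative items.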
In addition, there are four linear test forms (S3, S4, S5 & S6). Test forms S3 and S5 are
designed for countries with right-hand traffic. Test forms S4 and S6 are designed in the same
way as S3 and S5 but are intended for countries that drive on the left. The linear test forms
S3, S4, S5 and S6 were developed especially for the lower ability range. Due to the linear
presentation, they have lower test security but they also have a shorter test duration.
The test duration of test forms S1 and S2 is approx. 14 minutes, the test duration of S3, S4,
S5 and S6 is approx. 8 minutes.
Main variable
Obtaining an overview
This consists of an ability parameter estimated according to the Rasch model for the ability to
obtain an overview as part of attentional control.
The ability parameter is calculated from the item difficulties and whether the items presented
are solved or answered incorrectly. An item is classed as solved only if the answer given
includes all the required object classes.
Additional information
Working time
This variable indicates the working time of the subtest in minutes and seconds.
4 EVALUATION
4.1 Objectivity
A test is objective if the test results are independent of the investigator. A distinction is made
between administration objectivity, scoring objectivity, and interpretation objectivity (Ziegler &
Bühner, 2012).
Administration objectivity
Administration objectivity is given if the result is independent of the test administrator and the
testing situation. Since administration of the ATAVT is computerized, all respondents receive
the same information, presented in the same way, about the test. These instructions are
independent of the test supervisor. Similarly, the test presentation is identical for all
respondents. This ensures administration objectivity.
Scoring objectivity
Scoring objectivity is given if the result is independent of the person evaluating the test. Since
the computer records answers, calculates variables, and compares norm-referenced scores,
computational errors are excluded and scoring objectivity is given.
Interpretation objectivity
Interpretation objectivity exists if the same conclusion is drawn from particular test results even
when they are interpreted by different people (Kubinger, 2009). If the test in question has been
normed, it is generally unambiguous in its interpretation: the norm-referenced score and the
confidence interval unequivocally determine the respondent’s “position” within the reference
population with regard to the measured trait. According to this definition the ATAVT has
interpretation objectivity.
4.2 Reliability
The reliability of a test describes the degree of precision with which the aptitude is
measured (Ziegler & Bühner, 2012). Reliability coefficients can take values between 0 and 1:
a higher value reflects greater precision. According to the test appraisal guidelines of the
European Federation of Psychologists' Associations (EFPA, 2013) from July 2013, reliability
values over 0.7 are considered adequate, over 0.8 are considered good, and over 0.9 are
excellent.
In order to test the retest reliability and stability of the main variable Obtaining an overview, a
linear precursor version of the ATAVT was administered to a sample of 82 individuals (48%
men, 52% women) in the age range 17 – 78 (M=44; SD=17). They were retested immediately after
the first test and again three months later. The data were collected in 2006 in SCHUHFRIED’s
Test and Research Center. All persons were tested under supervision of a test administrator and
participated in the study voluntarily; no consequences were attached to their test results. All
participants were compensated. This yielded a retest reliability
(retesting immediately after the first test) of r=0.76 and a stability (retesting after 3 months) of
r=0.70. Both values are in the range of the reliability of the linear form that was used on that
occasion (r=0.77) and therefore indicate moderate reliability of the test forms.
Reliability
The reliability of a test is composed of the length of the point range or the maximum possible
scale combined with the width of the confidence interval (Sijtsma & Emons, 2011). The relative
confidence interval calculated as a result indicates how well a differentiation can be made with
the test. The lower the relative confidence interval, the higher the reliability.
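As a rough illustration of this idea, the sketch below assumes that the relative confidence interval is the width of the confidence interval divided by the length of the point range. The manual does not give the formula explicitly, so this ratio, the function name, and the example values are all assumptions.

```python
def relative_confidence_interval(ci_lower, ci_upper, range_min, range_max):
    """Width of the confidence interval as a share of the point range (assumed definition)."""
    point_range = range_max - range_min
    if point_range <= 0:
        raise ValueError("point range must be positive")
    return (ci_upper - ci_lower) / point_range

# Hypothetical values: person parameters ranging from -3 to +3, interval of +/- 0.5
share = relative_confidence_interval(-0.5, 0.5, -3.0, 3.0)
print(f"relative confidence interval = {share:.1%}")
```

Under this reading, a narrower interval relative to the same point range means that the test separates respondents more finely.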
Due to the adaptive test presentation for test forms S1 and S2, the confidence interval can be
calculated consistently during testing while considering the standard error of measurement of
the current person parameter estimator and diminishes with increasing reliability. For this
reason, the relative confidence interval per test run may change slightly due to the marginal
differences of the final standard error of measurement. Providing a generally relative
confidence interval for adaptive testing therefore is not appropriate. The relative confidence
interval of the ATAVT for the linear test forms (S3-S6) can be seen in Table 9 whereby in the
“point range” case the range of the person parameters corresponds to the norm sample.
4.3 Validity
A test is valid if it actually measures the characteristics that it claims to measure (Ziegler &
Bühner, 2012).
Construct validity exists when it can be demonstrated that a test implements particular theory-
based approaches. With regard to construct validity, Embretson (1983) distinguishes between
construct representation and a nomological network.
The seven theoretically postulated relevant item characteristics (difficulty parameters) thus
explain 81.2% of the variance in the 1PL item difficulty parameters. Table 11 shows the
regression weights for the individual difficulty factors of the construction rationale.
Predictor                                    B       S.E.    β       t        p
Number of visible object classes             1.061   0.078   0.769   13.640   <0.001
Proportion of necessary peripheral objects   1.366   0.340   0.206    4.020   <0.001
Proportion of necessary small objects        1.089   0.345   0.177    3.152   0.002
Confusion as a result of experience          0.880   0.274   0.164    3.213   0.002
Poor contour contrast                        0.721   0.277   0.139    2.604   0.011
Proportion of necessary concealed objects    1.984   0.655   0.169    3.029   0.003
Proportion of typical objects                0.769   0.360   0.132    2.137   0.036
As Table 11 makes clear, all the theoretically postulated relevant item characteristics (difficulty
factors) make a significant contribution to prediction of the 1PL item difficulty parameters.
However, inspection of the β weights shows that the “Number of visible object classes” makes
the largest contribution to prediction of the item difficulty parameters.
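As a plausibility check on the regression table, each t value should equal the unstandardized weight B divided by its standard error. The short sketch below recomputes this ratio from the rounded table entries; small deviations from the reported t values are to be expected because B and S.E. are rounded to three decimals.

```python
# (predictor, B, S.E., reported t) as listed in Table 11
TABLE_11 = [
    ("Number of visible object classes", 1.061, 0.078, 13.640),
    ("Proportion of necessary peripheral objects", 1.366, 0.340, 4.020),
    ("Proportion of necessary small objects", 1.089, 0.345, 3.152),
    ("Confusion as a result of experience", 0.880, 0.274, 3.213),
    ("Poor contour contrast", 0.721, 0.277, 2.604),
    ("Proportion of necessary concealed objects", 1.984, 0.655, 3.029),
    ("Proportion of typical objects", 0.769, 0.360, 2.137),
]

def t_statistic(b, standard_error):
    """t value of a regression weight: estimate divided by its standard error."""
    return b / standard_error

for name, b, se, reported_t in TABLE_11:
    print(f"{name}: recomputed t = {t_statistic(b, se):.3f}, reported t = {reported_t:.3f}")
```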
Figure 7: Nomological network of the test battery according to the CHC model
In the selection of the tests for the battery used, several points were taken into consideration.
Firstly, only tests with sufficient reliability (reliability of the main variable ≥ 0.7) were
used, to prevent artificial attenuation of the correlation between the
investigated test (ATAVT) and non-construct-related tests or construct-related tests due to low
reliability. Secondly, the tests were selected at a similar level of generality to ensure sufficient
symmetry. Limited symmetry would be expected when the correlated tests measure
traits of different breadth (Bühner, 2011), for example when one test measures only a
subarea of a skill while another test attempts to measure the entire ability with all of its
subareas. For this reason, tests were selected which are at the same level of generality in the
CHC model, that is that of the primary factors (see section 3.1.2). Thirdly, only ability tests
were selected to avoid validity restrictions due to different measurement methods. When tests
with different measurement methods are used (e.g. self-assessment, third-party assessment,
ability tests, etc.), this may result in impaired validity estimates due to method factors (Bühner,
2011). Fourthly, a sample representative in terms of age and gender was collected for the
examination in order to avoid range restriction. Range restriction, and therefore a limited validity
estimate, would exist if, for example, respondents were preselected based on the test
scores achieved (e.g. all persons must achieve a percentile rank over 16 in
the ATAVT). After considering these points, the following tests were selected for the
battery: the Cognitrone S11 (G. Schuhfried, 2019a), Determination Test S1 (G. Schuhfried,
2019b), Visual Pursuit Test S3 (Wagner, Karner & Biehl, 2018) and Reaction Test S3 (G.
Schuhfried, 2019).
The adaptive test form S1 (Cronbach’s α = 0.80) of the ATAVT was used. Cognitrone (COG)
is a concentration test in the sense of an image comparison test. In test form S11 used for this
study (Cronbach’s α = 0.95) an abstract target figure is shown to the respondent for which the
respondent must decide whether or not it is included in a row of four other abstract figures.
The respondent controls the processing time himself/herself because the next
abstract figure is presented only after a reaction has been made. The Determination Test (DT) is a choice
reaction test during which the respondent is presented with optical and acoustic stimuli sequentially.
In test form S1 (Cronbach’s α = 0.95) the optical stimuli are circles in blue, yellow, green, red
and white as well as two rectangles on the left or right side of the screen while the acoustic
stimuli are a high-pitched and low-pitched sound. The task is to operate the buttons assigned
to the stimuli on the SCHUHFRIED response panel or foot-operated keys as quickly as
possible when the corresponding stimulus is presented on the screen. In test form S1, the
presentation time of the stimuli is adapted to the ability of the respondent, but it is not
determined directly by the respondent. The Reaction Test (RT) is a test for measuring the
respondent’s ability to react. In test form S3 (Cronbach’s α = 0.93) used in this study, simple
optical and acoustic stimuli are presented for this purpose. Yellow and red circles are used as
optical stimuli while a high-pitched sound is used as the acoustic stimulus. The respondent’s
reaction is only required for a clearly defined stimulus combination. The sequence is not
determined by the respondent and the stimuli follow a defined interval. The reaction time is
registered via the SCHUHFRIED response panel. The Visual Pursuit Test (LVT) is a classic
line-labyrinth test for measuring visual orientation performance for which the respondent has
to recognize the end of a specified line as quickly as possible from a tangle of lines (Cronbach’s
α = 0.92). The picture with the lines that the respondent has to track is only shown when two
keys of the SCHUHFRIED response panel are pressed at the same time and disappears
immediately when the buttons are released. The respondent can freely choose the working
speed, that is the duration of viewing the picture and subsequent specifying of the correct end
point. In addition, the CORSI Block-Tapping Test (Schellig, 2018) and the Visual Memory Test
(Etzel & Hornke, 2018) were added to check construct validity as part of the CHC model. The
CORSI Block-Tapping Test (CORSI) measures the storage capacity of spatial working
memory. In test form S1 used for this study, the respondents are shown three irregularly
arranged blocks (Cronbach’s α = 0.81). A cursor touches a certain number of blocks in a row.
After this animation is shown, the respondent is asked to repeat the order that was shown. The
Visual Memory Test (VISGED) measures short-term visual memory performance. In the traffic
psychological short form (S11), the respondent is shown a town map. Various symbols, such
as a cross for a hospital and a book for a library, are marked on the map (Cronbach’s α = 0.73).
The respondent’s task is to remember the type and position of the symbols. He/she is then
asked about the position of particular symbols. The CORSI test was used to test convergent
validity. The test loads on the same secondary factor (working memory) of the CHC model as
the ATAVT. The LVT and VISGED tests were used to assess discriminant
validity.
The sample was gathered in January and February 2019 in the Test & Research Center of
SCHUHFRIED GmbH, using a stratified quota sample plan according to gender, age, and
education. Overall, a sample of N=129 persons (78 women, 51 men) aged 20 to 87 (M = 52.4,
SD = 16.6) worked the tests. The expected age and gender distribution was obtained from
demographic data for Germany, Austria, and Switzerland for 2016 (Europäische Kommission,
2019). A comparison of the two distributions using a statistical chi² test (chi²[29]=25.98, p=0.63)
shows that they do not differ significantly from each other. The sample can therefore be
regarded as representative of the population of Austria, Germany, and Switzerland in terms of
gender and age.
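The representativeness check above is a chi² goodness-of-fit test. The sketch below shows how the test statistic and its p value can be computed with the Python standard library only. The function names are hypothetical, and the closed-form tail probability used here is valid for integer degrees of freedom. With 29 degrees of freedom, which would correspond to 30 age-by-gender cells (an inference from the reported df, not a statement in the manual), the reported statistic of 25.98 should give a p value close to the reported 0.63.

```python
import math

def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic for observed vs. expected cell counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum
        total = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * total
    # Odd df: normal tail term plus a finite sum with half-integer gamma values
    total = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# Reported test statistic for the comparison of sample and population distributions
p_value = chi_square_sf(25.98, 29)
print(f"p = {p_value:.2f}")
```

A p value well above 0.05 means that the observed cell frequencies do not deviate significantly from the expected ones, which is the basis for regarding the sample as representative.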
In an initial step, the discriminant and convergent validity is assessed using intercorrelations
of the main variables. The results can be seen in Table 12.
Convergent validity is tested based on the correlation of the ATAVT with the CORSI test. As
can be seen in Table 12, the ATAVT has the highest correlation with CORSI as can be
expected from the nomological network or the CHC model in which both tests are attributed to
the secondary factor working memory (Gwm).
The correlations with the COG and DT can be traced back to the shared visuo-spatial item
material used in all tests. In addition, working memory also plays a role in working these two
additional tests, which can be traced back to the strong loading of working memory on the
g factor; this loading can be drawn both from the literature (Schneider & McGrew, 2018) and from
the structural equation model described below.
Discriminant validity was tested based on the correlations of the ATAVT with the LVT and
VISGED tests. As can be seen in Table 12, the ATAVT shows low correlations with these two
tests, as was assumed.
In a further step, the fit of the tests used in this study to the theoretical assumptions of the CHC
model or the nomological network was investigated using structural equation models. The ML
estimator from the R package “lavaan” (R Core Team, 2016; Rosseel, 2012) was used to estimate the
models (Reinecke, 2014).
The estimate was carried out based on the sample collected between January and February
2019 in the Test and Research Center of SCHUHFRIED GmbH, using a quota sample plan
stratified according to gender, age, and education. For the model, all persons were eliminated
who had incomplete data in one or multiple included procedures, which resulted in a sample
of N=119 persons (71 women, 48 men) aged 20 to 87 (M = 52.1; SD = 16.6). The expected
age and gender distribution was obtained from demographic data for Germany, Austria, and
Switzerland for 2016 (Europäische Kommission, 2019). A comparison of the two distributions
using a statistical chi² test (chi²[29]=26.61, p=0.59) shows that they do not differ significantly
from each other. The sample can therefore be regarded as representative of the population of
Austria, Germany, and Switzerland in terms of gender and age.
Figure 8: Structural equation model according to the CHC model and the nomological network; the negative
loading of the secondary factor Reaction & decision speed on the g factor is based on the mean times used for
the calculation, which are scored in the opposite direction to the other two secondary factors
The assessment of model fit is carried out using the indices CFI (Comparative Fit Index),
SRMR (Standardized Root Mean Square Residual) and RMSEA (Root Mean Square Error of
Approximation). Sufficient agreement between the empirical covariance matrix and the
covariance matrix expected under the theoretically postulated factor model is indicated
by the following values: CFI ≥ 0.90, SRMR ≤ 0.08 and RMSEA < 0.08 (Hu & Bentler, 1999; Yu,
2002). The key statistics of the structural equation model [χ²(11) = 9.223; CFI = 1.0; RMSEA
= 0.01 (95% CI: 0.00; 0.084); SRMR = 0.030] indicate a very good fit of the data to the
theoretically postulated model. In addition, because of the intercorrelations between the tests,
the fit of a single-factor model in which all tests used load directly on a g factor was estimated
[χ²(14) = 22.107, CFI = 0.968, RMSEA = 0.070 (95% CI: 0.00; 0.123), SRMR = 0.047]. The
comparison of both models shows a significantly better fit of the model created based on the
CHC model compared to a single-factor model, as can be seen in Table 13.
Table 13: Model comparison of the model based on the CHC model and a single-factor model.
                 df   χ²       χ² diff   df diff   p
FEV model        11    9.223
1-factor model   14   22.107   13.208    3         0.004
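The model comparison in Table 13 is a chi² difference test for nested models. The sketch below recomputes the p value from the reported difference statistic, using the closed-form upper tail of the chi² distribution with 3 degrees of freedom. The function name is hypothetical. The plain difference of the two rounded χ² values is 12.884, slightly below the reported 13.208, and the manual does not say how the reported value was derived. Both values are therefore evaluated.

```python
import math

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with exactly 3 df."""
    half = x / 2.0
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half)

df_diff = 14 - 11                 # df of 1-factor model minus df of the less restricted model
simple_diff = 22.107 - 9.223      # plain difference of the reported chi-square values
reported_diff = 13.208            # difference statistic as given in Table 13

p_reported = chi_square_sf_df3(reported_diff)
p_simple = chi_square_sf_df3(simple_diff)
print(f"df diff = {df_diff}")
print(f"p (reported difference) = {p_reported:.4f}")
print(f"p (plain difference)    = {p_simple:.4f}")
```

Either way the difference is significant at the 1% level, so the conclusion that the more differentiated model fits better does not depend on which of the two values is used.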
In summary, both the intercorrelations found between the presented tests and the results of
the structural equation model can be interpreted as proof of construct validity and particularly
the convergent and discriminant validity of the ATAVT.
Table 14: Incremental validity and relative relevance of the individual tests
The ATAVT has a relative relevance of nearly 8%. These results therefore provide evidence
of its criterion validity.
A further study by Risser et al. (2008) investigated the criterion validity of two test batteries for
measuring driving-specific ability from the Expert System Traffic (Schuhfried, 2005), which also
included a precursor version of the ATAVT (TAVTMB; Biehl, 1996). The only difference
between the Plus and Standard test batteries is that the Plus Test Battery includes the
Peripheral Perception Test (PP) as a means of measuring the visual field and assessing
divided attention. The global assessment of driving behavior in a standardized driving test was
used as the criterion variable. The sample investigated by the authors consisted of 164 persons
(74 men; aged 19 to 91; M=59 years; SD=18 years). Some of the respondents were drivers
who had already committed traffic offenses.
A total of 39 people (18%) had completed compulsory schooling or an intermediate secondary
school but without completing vocational training (EU educational level 2), 96 people (43%)
had completed vocational training or a course at a technical college (EU educational level 3),
35 people (16%) had a school-leaving qualification at university entrance level or a qualification
from a technical university (EU educational level 4) and 52 people (23%) had a university
degree (EU educational level 5).
The correlations between the individual tests and the external criterion are shown in Table 15.
Table 15: Correlations between the tests and the external criterion
Building on the findings of previous studies, an artificial neural network was used to calculate
the criterion validity of the test battery. The results demonstrated that the Standard test battery
correctly classified 80.2% of the total sample. The classification rate for the Plus test battery
was 86.5%. This corresponds to a validity coefficient of R=0.68 for the Standard test battery
and R=0.78 for the Plus test battery. In the jackknife validation a classification rate of 80.2%
and a validity coefficient of R=0.67 were obtained for the Standard test battery. For the Plus
test battery the jackknife validation yielded a classification rate of 83.8% and a validity
coefficient of R=0.77. The intervals for the classification rate and the validity coefficient in the
bootstrap validation were [74.2%; 85.2%] and [0.54; 0.73] for the Standard test battery and
[77.5%; 87.2%] and [0.61; 0.79] for the Plus test battery. The findings were replicated by
Sommer et al. (2008) using an additional independent sample.
Table 16 shows the incremental validity and relative relevance of the individual tests of the two
test batteries. The relative relevance indicates the amount contributed by the test in question
to the validity of the predictive model.
Table 16: Incremental validity and relative relevance of the individual tests
                                  STANDARD                  PLUS
Predictor variables               Incremental   Relative    Incremental   Relative
                                  validity      relevance   validity      relevance
AMT: General intelligence         0.128         18.7%       0.078         11.5%
DT: Reactive stress tolerance     0.124         18.2%       0.071         10.5%
RT: Reaction speed                0.036          5.7%       0.057          8.6%
RT: Motor speed                   0.132         19.2%       0.118         16.9%
PP: Field of vision               --            --          0.090         13.2%
PP: Tracking deviation            --            --          0.139         19.5%
ATAVT: Obtaining an overview      0.120         17.6%       0.038          5.8%
COG: Ability to concentrate       0.141         20.3%       0.093         13.6%
With a relative relevance of 17.6% (Standard test battery) and 5.8% (Plus test battery) the
results thus support the criterion validity of the ATAVT.
Evidence of the incremental criterion validity of the previous version of the ATAVT test
(TAVTMB; Biehl, 1996) was also provided by the results of a study by Sommer et al. (2008).
The authors investigated the criterion validity of a test battery for measuring driving-related
ability and aptitude for adapting to traffic conditions from the Expert System Traffic (Schuhfried,
2005), which also included a previous version of the ATAVT. The global assessment of driving
behavior in a standardized driving test was used as the criterion variable. Sommer et al. studied
a sample of 127 (79.9%) men and 32 (20.1%) women in the age range 19 – 73 (M=39.41;
SD=11.92). A total of 29 people (18.2%) had completed compulsory schooling or an
intermediate secondary school but without completing vocational training (EU educational level
2), 96 people (43%) had completed vocational training or a course at a technical college (EU
educational level 3), 35 people (16%) had a school-leaving qualification at university entrance
level or a qualification from a technical university (EU educational level 4) and 52 people (23%)
had a university degree (EU educational level 5). 109 individuals (68.6%) were classified as
safe drivers on the basis of their performance in the standardized driving test (Vienna Driving
Test).
The correlations between the individual tests and the external criterion are shown in Table 17.
Table 17: Correlations between the tests and the external criterion
AMT      DT        RT cog. speed   RT motor speed   ATAVT      COG
-0.120   -0.390*   0.184*          0.201*           -0.393**   0.212*
** p<0.05; * p<0.01
Here again artificial neural networks were used to calculate the criterion validity for this
comprehensive test battery. The results showed that the Standard test battery correctly
classified 83.1% of the total sample. This corresponds to a validity coefficient of R=0.65. In the
jackknife validation a classification rate of 83.1% and a validity coefficient of R=0.65 were
obtained. The intervals for the classification rate and the validity coefficient in the bootstrap
validation were [77.4%; 89.1%] and [0.55; 0.76].
Table 18 shows the incremental validity and relative relevance of the individual tests. The
relative relevance indicates the amount contributed by the test in question to the validity of the
predictive model.
Table 18: Incremental validity and relative relevance of the individual tests
The ATAVT has a relative relevance of 18.4%; the results therefore show that it has criterion
validity.
Depending on the degree of severity and location of lesion, strokes and cranio-cerebral trauma
result in different cognitive impairments. Deficits in ability to react or attention are very frequent.
This is why the issue of fitness to drive plays a major role after an injury to the brain. Patients'
fitness to drive after a stroke or cranio-cerebral traumas was examined by Sommer, Heidinger,
Arendasy, Schauer, Schmitz-Gielsdorf, and Häusler (2010). The patients were split into two
groups based on a standardized driving test (N = 178, 81.5% men; age: M = 45.37, SD =
11.92): Driving test passed/not passed. Cognitive ability was measured with a traffic
psychological test battery which included the TAVT. Obtaining an overview correlated
significantly (r = -0.281) with the assessment of the driving test. Logistic models and models
based on an artificial neural network found a validity of the entire test battery of R = 0.53-0.81
with a correct classification rate of 79.2 – 89.9%.
In addition, a study conducted by Eder, Brieber, Uhl, Vetter, and Häcker (2018) showed that
performance in the ATAVT by healthy older persons correlates significantly with
the evaluation in a standardized driving test. As part of the study, 109 healthy older persons
over 65 underwent comprehensive psychometric testing as well as a standardized
driving test. All persons were tested under supervision of a test administrator and participated
in the study voluntarily; no consequences were attached to their test results. All participants
were compensated. Of the original 109
participants, 98 persons could be considered for the ultimate data assessment due to drop-
outs. The sample was collected from 2018 to 2019 and consists of 51 (52%) men and 47 (48%)
women aged between 65 and 88 (M = 73.13, SD = 5.73). The Kuratorium für
Verkehrssicherheit (Austrian Road Safety Board) conducted the driving tests for the evaluation
of driving competence. In the framework of the study, a standardized driving test (approx. 45
minutes) was planned and implemented within the Vienna city limits together with the KFV.
The driving tests were carried out per person in the presence of a driving instructor as well as
a trained driving behavior observer. A car with two sets of pedals which can also be controlled
by the driving instructor in an emergency was used. During the drive, driving competence was
measured by the driving behavior observer by way of a standardized record sheet. The
participants were rated regarding specific driving behavior dimensions and evaluated based
on an overall score across the entire driving test which was assigned at the end of the driving
test by the driving observer (grade 1 = very good to 5 = fail). A correlation analysis between
the test results in the ATAVT and the grade awarded by the driving observer yielded
a small correlation for the variable Obtaining an overview (r = 0.391) (Cohen, 1988).
1 Due to the limited reliability of the overall score in the driving test, a correction for attenuation according to
Schmidt and Hunter (1998) was applied. The uncorrected correlation coefficient between the overall grade in
the driving test and Obtaining an overview in the ATAVT was r = 0.28.
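The correction for attenuation mentioned in the footnote can be sketched as follows. The sketch assumes the classical formula, in which the observed correlation is divided by the square root of the product of the reliabilities. The reliability of the driving test score is not reported in the manual, so the value used here is back-calculated from the two reported correlations (0.28 and 0.391) and is only an implied figure. The function names are hypothetical.

```python
import math

def correct_for_attenuation(r_observed, criterion_reliability, predictor_reliability=1.0):
    """Classical correction: observed r divided by the root of the reliabilities."""
    return r_observed / math.sqrt(criterion_reliability * predictor_reliability)

def implied_criterion_reliability(r_observed, r_corrected):
    """Criterion reliability implied if only the criterion was corrected (assumption)."""
    return (r_observed / r_corrected) ** 2

implied = implied_criterion_reliability(0.28, 0.391)
print(f"implied reliability of the driving test grade = {implied:.3f}")
print(f"corrected r = {correct_for_attenuation(0.28, implied):.3f}")
```

Leaving the predictor reliability at 1.0 corrects only for unreliability of the criterion, which matches the footnote's reference to the limited reliability of the driving test score.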
Taken together these studies of the criterion validity of the ATAVT demonstrate the relevance
of the test for traffic psychological assessment and thus meet the requirement for validation of
the test from the point of view of traffic safety.
In addition to studies of the criterion validity of the ATAVT in the field of traffic psychological
assessment, initial findings on criterion validity in the field of aviation psychological assessment
are also available. Sommer, Häusler, Koning and Arendasy (2006) investigated the criterion
validity of a comprehensive ability test battery, which included both selected subtests of the
INSBAT and a precursor version of the ATAVT (TAVTMB; Biehl, 1996). The data relates to 99
military pilot trainees (98 men) aged between 16 and 25 (M=18.84, SD=2.04). The global
assessment of performance in a standardized flight simulator served as the criterion variable.
This test battery enabled success in the flight simulator to be correctly predicted for 92.9% of
the respondents. This corresponds to a validity coefficient of 0.84. In the jackknife validation a
classification rate of 92.9% and a validity coefficient of 0.83 were obtained. The intervals for
the classification rate and the validity coefficient in the bootstrap validation were [88.2%;
97.7%] and [0.74; 0.94]. Table 19 shows the incremental validity and relative relevance of the
individual tests. The relative relevance indicates the amount contributed by the test in question
to the validity of the predictive model.
Table 19: Incremental validity and relative relevance of the individual tests
The results show that the ATAVT, with a relative relevance of 10.2%, has criterion validity in
the field of pilot selection.
Proof of the ATAVT’s validity has also been obtained in the field of rail transport (RSSB, 2010).
In a validation study, 85 train conductors and trainees worked on the precursor version of the
ATAVT (TAVTMB; Biehl, 1996) together with several other traffic safety-related ability tests.
As a criterion variable, different objective performance measurements and evaluations by
superiors were gathered. The empirically determined results of this analysis are summarized
in Table 20.
Table 20: Bivariate correlation between the main variable Obtaining an overview with different indices of traffic
safety and training success for train conductors and trainees.
The results prove the criterion validity of the ATAVT with regard to the criterion measures
gathered here and therefore support the criterion validity of the ATAVT in the area of safety
assessments.
Vetter et al. (2018) conducted a study with professional bus drivers of an Austrian bus
company (N = 126; age: M = 42.12 years, SD = 10.34 years). The level of education of the
study participants was distributed as follows: 1.6% no school-leaving qualification, 14.3%
compulsory schooling without completing vocational training, 74.6% compulsory schooling
with vocational training, 8.7% school-leaving qualification at university entrance level, 0.8%
university degree. The average period of employment with the company was 7.1 years (SD =
6.4 years). The bus drivers first completed comprehensive psychological testing with the
SAROAD test set (SCHUHFRIED GmbH, 2012), which includes the ATAVT (test form
S1). Then, three driving tests were completed with a bus and rated by the assessors based on
the school grade system (1 = "very good" to 5 = "failed"). The tests comprised a half-hour drive
in real traffic in the immediate vicinity along a set route, a driving exercise on a
maneuvering course, and a driving test on a standardized hazard course that simulates
critical traffic situations. Depending on which driving test was predicted based
on the performance in the entire test set, the validation coefficients ranged from R = 0.376 to
R = 0.616. The analysis of the bivariate correlations between individual tests and driving tests
showed that performance in the ATAVT correlated with the performance on the maneuvering
course (r = 0.240) and the hazard course (r = 0.253).
4.4 Scaling
The quality criterion of scaling is met when the empirical behavioral relationships are
represented exactly by the test scores (Kubinger, 2003).
For the ATAVT this means that the ability parameter adequately reflects all the relevant information
about the working of the test. Because of the validity of the Rasch model, the quality criterion
of scaling is met.
4.5 Economy
Economy describes how conservatively resources are used in relation to the amount of
information gained (Kubinger, 2009). Since they are computerized, the tests of the Vienna Test
System are very economical to administer and score. The administrator’s time is saved
because the instructions at the beginning of the test are standardized, relieving him/her of the need
to provide time-consuming verbal explanations. Because the test results are calculated
automatically, the time needed for manual calculation of raw and norm-referenced scores is
also saved. In addition, testing time is shortened if the option of decision-oriented
testing is used in the adaptive test forms S1 or S2 (see chapter 6.1).
4.6 Reasonableness
Reasonableness refers to the degree to which a test is constructed so that no undue mental
and physical demands are made of the respondent (Ziegler & Bühner, 2012). Since the ATAVT
requires little of the respondent emotionally, physically or in terms of time the test fulfills the
criterion of reasonableness.
4.8 Fairness
If tests are to meet the quality criterion of fairness, they must not systematically discriminate
against particular groups of respondents on the grounds of their sociocultural background
(Kubinger, 2003).
On account of the results referred to in chapter 3.2, the ATAVT can be assumed to have scaling
fairness. The statistical agreement of the item difficulty parameter estimates in the different
subsamples shows that the groups studied are not systematically advantaged or
disadvantaged by individual items. In addition, neither driving experience nor knowledge of
rules appear to give respondents an advantage (Biehl, 1996). In summary, therefore, the
quality criterion of fairness can be regarded as met.
5 NORMING
The norm-referenced scores were obtained by calculating the mean percentile rank PR(x) for
each raw score X according to the formula (Lienert & Raatz, 1994):
\[ PR(x) = 100 \cdot \frac{\mathrm{cum}\, f_x - \frac{f_x}{2}}{N} \]
Here cum f_x corresponds to the number of respondents who obtained the raw score X or a lower
score, f_x is the number of respondents with the raw score X, and N is the sample size.
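The percentile rank formula can be sketched in a few lines of code. The function name and the example scores are hypothetical; the computation follows the formula as stated, with the cumulative frequency reduced by half the frequency of the score in question.

```python
from collections import Counter

def mean_percentile_ranks(raw_scores):
    """Mean percentile rank for each distinct raw score.

    PR(x) = 100 * (cum f_x - f_x / 2) / N
    """
    n = len(raw_scores)
    frequencies = Counter(raw_scores)
    ranks = {}
    cumulative = 0
    for score in sorted(frequencies):
        cumulative += frequencies[score]   # respondents with this score or lower
        ranks[score] = 100.0 * (cumulative - frequencies[score] / 2.0) / n
    return ranks

# Hypothetical raw scores of four respondents
print(mean_percentile_ranks([1, 2, 2, 3]))
```

Subtracting half the frequency places each raw score at the midpoint of the respondents who share it, so tied scores receive the same percentile rank.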
The test results of the newly collected data were then compared with the data of the existing
representative norm sample and checked for statistical agreement. The two samples were
matched on age, gender, and educational level for this purpose. A mean comparison
was then performed for independent samples. No statistically significant differences (Cohen’s
d<0.3) were found in the test performances of the newly collected data and the existing
representative norm sample (t =1.060; p=0.290). The existing representative norm sample was
therefore expanded with the newly collected data.
The expected age and gender distribution was obtained from demographic data for Germany,
Austria, and Switzerland for 2012 (Europäische Kommission, 2019). The norm sample consists
of 431 (52%) men and 400 (48%) women aged between 15 and 94 (M=46.31, SD=17.556).
Table 21 shows the distribution of age and gender in the norm sample and the corresponding
distribution in the population of Austria and Germany.
A comparison of the two distributions using a statistical chi² test (chi²[29]=41.98, p=0.06) shows
that they do not differ significantly from each other.
The norm sample can therefore be regarded as representative of the population of Austria,
Switzerland, and Germany in terms of gender and age.
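A comparison of this kind can be sketched as a chi² goodness-of-fit calculation. The sketch below is an illustration under stated assumptions, not the procedure actually used for the norm sample: the function name, the three cells and the population shares are hypothetical. With 15 age groups and 2 genders there are 30 cells, which would give the 29 degrees of freedom reported above.

```python
def chi_square_gof(observed, population_shares):
    """Return the chi-square goodness-of-fit statistic and its degrees of freedom."""
    n = sum(observed)
    total_share = sum(population_shares)
    stat = 0.0
    for obs, share in zip(observed, population_shares):
        expected = n * share / total_share  # expected count in this cell
        stat += (obs - expected) ** 2 / expected
    return stat, len(observed) - 1


# Hypothetical example with three cells (illustrative only).
observed = [30, 50, 20]
shares = [0.25, 0.50, 0.25]
stat, df = chi_square_gof(observed, shares)
print(stat, df)
```

The p-value follows from the chi² distribution with the returned degrees of freedom; a p-value above the chosen significance level means the two distributions do not differ significantly.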
3 individuals (<1%) have no school-leaving qualification (EU educational level 1), 97 individuals
(12%) have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU educational level 2), 364 individuals (44%) have completed
vocational training or a course at a technical college (EU educational level 3), 286 individuals
(34%) have a school-leaving qualification at university entrance level (EU educational level 4)
and 81 individuals (10%) have a university degree (EU educational level 5).
Table 21: Observed frequencies of men and women of different age groups in the norm sample and comparison
with the expected values in a representative sample

Age groups   Actual number   Actual number   Expected values   Expected values
             of men          of women        men               women
15 to 19     28              25              12                12
20 to 24     31              30              12                12
25 to 29     34              32              13                14
30 to 34     32              33              17                17
35 to 39     28              29              18                17
40 to 44     39              38              16                15
45 to 49     46              44              13                13
50 to 54     39              40              13                13
55 to 59     34              33              11                12
60 to 64     33              34              11                12
65 to 69     23              25              8                 9
70 to 74     16              31              7                 9
75 to 79     15              27              5                 10
80 to 84     1               9               2                 5
85+          1               1               2                 5

In addition to the total norm, age-specific norms (up to 36 years, 37-54 years, 55+ years) are
provided.
6 TEST PROCEDURE
The ATAVT consists of a combined instruction and practice phase and the test phase itself. A
total of six different test forms (see chapter 3.4) are available. Respondents should be able to
work through the test undisturbed and should be given sufficient time. Attention should be paid to
adequate lighting and a quiet environment during the test presentation.
Should there be any unexpected disruptions during testing (such as a power failure), the test
administrator has the option to restart the test. In the event of technical disruption,
answers that have already been entered are deleted.
While the test is in progress, the test administrator can terminate it at any time if the
respondent so wishes. A corresponding warning is then issued when the test is scored. By
contrast, a respondent who has not completed the test cannot terminate it without the test
administrator’s consent.
Provided the diagnostician is sufficiently familiar with the manual, the time required to prepare the
test presentation (booting the computer, starting the Vienna Test System, adding a
respondent, starting the test) is limited to a few minutes. Since both administration and
scoring are performed digitally, the time required of the diagnostician is also reduced to a
minimum. There are no restrictions in terms of age.
purposes (PR=16 or parameter -0.631 for Group 1 and PR=33 or parameter -0.161 for Group
2) or until one of the other termination criteria used in the ATAVT applies (see chapter 3.3.2).
Figure 10: Option window for the traffic-psychological cancellation criteria of decision-oriented testing.
The top option is set by default; this does not involve any additional traffic-psychological cancellation
criteria.
Figure 11: Graph of the adaptive process in an ATAVT (S1 or S2) test session. The test terminates
once the overall confidence interval lies above the cut-off score and at least 6 items have been
presented.
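The decision-oriented termination described in the caption of Figure 11 can be sketched as a simple check. The cut-off values are the person parameters quoted above for Group 1 and Group 2; the function name, its arguments and the dictionary are hypothetical, and the sketch covers only the documented case in which the confidence interval lies above the cut-off.

```python
# Person-parameter cut-offs quoted in the text (PR 16 for Group 1, PR 33 for Group 2).
CUTOFFS = {1: -0.631, 2: -0.161}


def early_decision(ci_lower, n_items, group, min_items=6):
    """True if the whole confidence interval lies above the cut-off
    and at least min_items items have been presented."""
    return n_items >= min_items and ci_lower > CUTOFFS[group]


# Lower CI bound of -0.2 after six items: decisive for Group 1, not for Group 2.
print(early_decision(-0.2, 6, group=1))
print(early_decision(-0.2, 6, group=2))
```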
Used in this way, these additional cancellation criteria may – depending on the respondent’s
ability – significantly reduce the test length. Figure 12 shows the average test length needed
to arrive at a conclusion about the ability of a Group 1 driver (driver without increased
responsibility) to obtain an overview.
Figure 12: Expected test length as a function of the respondent’s ability to obtain an overview for an
investigation in Group 1. Above an ability level of roughly PR 50 the decision-oriented procedure
carried out in connection with a traffic-psychology-related investigation leads to a noticeable reduction
in test length. Where the ability level is less than PR 50, a quicker than normal decision (and
associated cancellation of the test) is not possible. The test is continued until the specified target
reliability is achieved. This is usually the case after approximately 21 items.
The test continues in adaptive test forms S1 and S2 until the standard error of measurement
falls below the pre-set level. Occasionally the test may be terminated by the system; this occurs
if 10 successive items are answered either correctly or incorrectly. Another abort criterion is
the number of items (k=30) presented to the respondent.
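The three termination criteria of the adaptive test forms can be summarized in a minimal sketch. The function name and the target value for the standard error of measurement are hypothetical, since the pre-set level is not stated here, and the run criterion is read as 10 successive items that are all answered correctly or all answered incorrectly.

```python
def should_stop(responses, sem, sem_target, max_items=30, run_length=10):
    """Return the reason for stopping, or None if testing should continue.

    responses: list of booleans (True = item solved), in presentation order.
    sem: current standard error of measurement of the person parameter.
    """
    if sem < sem_target:
        return "target precision reached"
    last = responses[-run_length:]
    if len(last) == run_length and len(set(last)) == 1:
        return "uniform response run"
    if len(responses) >= max_items:
        return "maximum number of items"
    return None


# Ten correct answers in a row trigger termination by the system.
print(should_stop([True] * 10, sem=0.8, sem_target=0.5))
```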
In the linear test forms a fixed item set of 13 items (test forms S3 and S4) or 14 items (test
forms S5 and S6) is presented. There is no time limit for working on the test.
From the percentile rank it is possible to determine whether a respondent's score is above
average, average or below average compared to the comparison group in question. Two
different definitions of the average range are used, depending on the area of application. In
the areas of clinical and neuropsychology and traffic psychology, a broader average range of
16 - 84 is used (definition: mean plus/minus one standard deviation). By contrast, in the areas
of human resources and sports psychology, a narrower average range of 25 - 75 is used
(definition: the middle 50% of the data). The PR can therefore be interpreted as follows:
Percentile rank    Application area
                   HR/sport    Neuro/clinical/traffic
Below average      0-24        0-15
Average            25-75       16-84
Above average      76-100      85-100
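The classification table can be expressed as a small lookup. The function name and the area keys below are hypothetical labels chosen for illustration; the ranges are those of the table.

```python
# Average ranges per application area, taken from the table above.
AVERAGE_RANGES = {
    "hr_sport": (25, 75),
    "neuro_clinical_traffic": (16, 84),
}


def classify_pr(pr, area):
    """Classify a percentile rank as below average, average or above average."""
    low, high = AVERAGE_RANGES[area]
    if pr < low:
        return "below average"
    if pr > high:
        return "above average"
    return "average"


# The same PR can fall into different categories depending on the area.
print(classify_pr(80, "hr_sport"))
print(classify_pr(80, "neuro_clinical_traffic"))
```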
In addition to the percentile ranks, other norm-referenced score scales can also be displayed
(e.g. T-scores, Z-scores, IQ scores). In each case, the data are re-calculated so that the mean
and dispersion are defined by a particular value. For example, T-scores have a mean of 50
and a standard deviation of 10.
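The relationship between the score scales can be sketched as follows. The sketch assumes that the characteristic is normally distributed in the comparison group, so that a percentile rank can be mapped to a Z-score through the inverse normal distribution. The function names are hypothetical, and the IQ scale with a mean of 100 and a standard deviation of 15 is the common convention rather than a value stated in this manual.

```python
from statistics import NormalDist


def pr_to_z(pr):
    """Z-score for a percentile rank, assuming a normal distribution.
    Only defined for 0 < pr < 100."""
    return NormalDist().inv_cdf(pr / 100)


def z_to_t(z):
    """T-score: mean 50, standard deviation 10."""
    return 50 + 10 * z


def z_to_iq(z):
    """IQ score: mean 100, standard deviation 15 (assumed convention)."""
    return 100 + 15 * z


# PR 16 lies about one standard deviation below the mean.
print(round(z_to_t(pr_to_z(16)), 1))
```

Under this assumption the broader average range of PR 16 to 84 corresponds approximately to T-scores of 40 to 60, that is, the mean plus or minus one standard deviation.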
With many tests, confidence intervals can also be taken into account. The confidence interval
reflects the test’s reliability; it is a defined area (around the measured test score) within which
the true value of the characteristic lies with a certain probability (e.g. 95%). In practical terms,
it can be stated that the true value almost certainly lies within the given confidence interval.
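As an illustration, a confidence interval of this kind can be computed from a test score and its standard error of measurement. The sketch assumes normally distributed measurement error; the function name and the example values are hypothetical and do not describe how the Vienna Test System computes its intervals.

```python
from statistics import NormalDist


def confidence_interval(score, sem, level=0.95):
    """Symmetric confidence interval around a score, given its standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return score - z * sem, score + z * sem


# Hypothetical person parameter of 0.0 with a standard error of 0.5.
lower, upper = confidence_interval(0.0, 0.5)
print(round(lower, 2), round(upper, 2))
```

A smaller standard error, that is, a more reliable measurement, yields a narrower interval.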
The item analysis protocol provides information on how each item was answered, whether the
item was correctly or incorrectly solved, and how long the respondent took to answer each
item. An item is only considered correctly solved if all the entries in the line corresponding to
that item are positive. This means that the respondent was able to identify all the required
objects in the picture and did not select any that were not actually present. This can be used
to investigate whether a higher than average number of problems arose at any particular point
during the test presentation.
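The scoring rule for a single item amounts to comparing two sets: the objects selected by the respondent and the objects actually present in the picture. The following sketch illustrates this; the function name and the object labels are hypothetical.

```python
def item_solved(selected, present):
    """An item counts as solved only if every object present was selected
    and no absent object was selected."""
    return set(selected) == set(present)


# Hypothetical picture containing a pedestrian and a traffic light.
present = {"pedestrian", "traffic light"}
print(item_solved({"pedestrian", "traffic light"}, present))         # all correct
print(item_solved({"pedestrian"}, present))                          # one object missed
print(item_solved({"pedestrian", "traffic light", "car"}, present))  # one false selection
```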
Test protocol
The test protocol provides detailed information on test processing, e.g. which item was solved
correctly or incorrectly, the item difficulty parameter of the respective test item, and the
current person parameter estimate at each point (incl. confidence interval CI). Furthermore,
the current reliability is calculated after every task, and the solution probability is calculated for
the individual tasks based on the final person parameter estimate.
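How a solution probability follows from a person parameter and an item difficulty parameter can be illustrated with the dichotomous Rasch model. That this model underlies the calculation shown in the protocol is an assumption made for this sketch; the function name is hypothetical.

```python
import math


def solution_probability(theta, beta):
    """Rasch model: probability that a person with ability theta
    solves an item with difficulty beta."""
    return 1 / (1 + math.exp(-(theta - beta)))


# If ability and difficulty are equal, the probability is 0.5.
print(solution_probability(0.0, 0.0))
print(round(solution_probability(1.0, 0.0), 3))
```

In this model, items that are easier than the person's ability level have a solution probability above 0.5 and harder items have one below 0.5.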
For the psychological assessment of his ability, tests for measuring the ability to react,
concentration performance, orientation performance, stress tolerance, reaction behavior, and
attention are used. The tests, test forms and achieved test scores are shown in the following:
The German Guidelines for the Assessment of Respondents’ Fitness to Drive (BGL)
(Bundesanstalt für Straßenwesen, 2018) assume increased responsibility for Group 2 drivers
(truck/bus) and therefore set a higher threshold than for Group 1 drivers (car/motorcycle). Thus,
the BGL specifies that Group 1 drivers must achieve a PR ≥ 16 in all tests carried out for a
specific purpose, while Group 2 drivers must achieve a PR ≥ 33 in the majority of the tests
and may not drop below a PR of 16 in any test (cf. Bundesanstalt für Straßenwesen, 2018, p. 11).
Mr. Jones shows the following results while taking the BGL into consideration:
For the Ability to react (RT), Mr. Jones achieves an average result with a PR of 75. Compared
to the reference population he is therefore able to react to relevant stimuli or stimulus
constellations with average speed. The ability to react is therefore sufficient for Group 1 and
2.
In concentration performance (COG) Mr. Jones also achieves an average result in the variable
Ability to concentrate with a PR of 80. He is therefore able, to an average degree, to maintain the
balance and accuracy required for processing. Concentration performance is therefore
sufficient for both Group 1 and Group 2.
The client also achieves an average score in the area of stress tolerance (DT), with a PR of 35. This
means that Mr. Jones shows an average ability to react quickly and appropriately when under
stress. His ability to react under stress conditions is therefore sufficient for Group 1 and Group
2.
The score on the attention dimension (ATAVT) is also in the average range with a PR of 45.
His ability to obtain an overview as part of attentional control is therefore also sufficient for Group 1
and Group 2.
In orientation performance (LVT) Mr. Jones achieves a PR of 25 and therefore falls below
the required threshold of PR 33 for Group 2. However, Mr. Jones does not fall below PR 16
and overall he achieves a PR of at least 33 in the majority of the tests used. Therefore, the
driving-related ability for Group 2 meets the requirements of the BGL even though the PR 33
threshold is not reached in this one test.
Regarding the required thresholds for Group 1, Mr. Jones achieves the required threshold of
PR 16 in all tested dimensions. Therefore, the driving-related ability for Group 1 corresponds
to the requirements of the BGL without restrictions.
Summary: Regarding the psychological performance ability, the test results achieved therefore
support sufficient ability for Group 1 and 2.
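The decision rules applied in this case can be retraced with a short sketch. The percentile ranks are those reported for Mr. Jones above; the function names are hypothetical, and "majority" is read here as more than half of the tests used.

```python
def meets_group1(prs):
    """Group 1: PR of at least 16 in every test."""
    return all(pr >= 16 for pr in prs.values())


def meets_group2(prs):
    """Group 2: PR of at least 33 in the majority of tests and no PR below 16."""
    values = list(prs.values())
    n_at_33 = sum(pr >= 33 for pr in values)
    return min(values) >= 16 and n_at_33 > len(values) / 2


# Percentile ranks from the case example.
results = {"RT": 75, "COG": 80, "DT": 35, "ATAVT": 45, "LVT": 25}
print(meets_group1(results))  # all five tests reach PR 16
print(meets_group2(results))  # four of five tests reach PR 33, none is below PR 16
```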
8 REFERENCES
American Educational Research Association, American Psychological Association & National
Council on Measurement in Education. (2014). Standards for educational and
psychological testing. Washington, DC: AERA.
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38 (1),
123–140. Springer. doi:10.1007/BF02291180
Arendasy, M. & Sommer, M. (2004). Measuring perceptual speed in complex everyday
situations. Perceptual and Motor Skills, 98 (2), 615–626. doi:10.2466/pms.98.2.615-626
Baddeley, A. D. (2012). Working memory: Theories, models, and controversies. Annual
Review of Psychology, 63, 1–29. doi:10.4324/9781315111261
Baker, F. B. & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd
ed.). Boca Raton, FL: CRC Press.
Betz, N. E. & Weiss, D. J. (1976a). Psychological effects of immediate knowledge of results
and adaptive ability testing. Minneapolis: University of Minnesota.
Betz, N. E. & Weiss, D. J. (1976b). Effects of immediate knowledge of results and adaptive
testing in ability test performance. Minneapolis: University of Minnesota.
Biehl, B. (1996). Manual Tachistoscopic Traffic Perception Test (TAVTMB). Mödling:
Schuhfried.
Binet, A. & Simon, T. (1916). The development of intelligence in children (the Binet-Simon
scale) (Vol. 11). Philadelphia, PA: Williams & Wilkins.
Bühner, M. (2011). Einführung in die Test- und Fragebogenkonstruktion. München: Pearson.
Bundesanstalt für Straßenwesen. (2018). Begutachtungsleitlinien zur Kraftfahreignung. (Stand
24.). Bergisch Gladbach: Bundesanstalt für Straßenwesen.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge:
Cambridge University Press.
Cattell, R. B. (1987). Intelligence: Its structure, growth and action (Advances in psychology,
Vol. 35). Amsterdam: North Holland.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York,
NY: Routledge.
Deubel, H. & Hauske, G. (1988). The programming of visually guided saccades. In H. Marko,
G. Hauske & A. Struppler (Eds.), Processing structures for perception and action (pp.
119–132). Weinheim: Verlag Chemie.
Eder, M., Brieber, D., Uhl, U., Vetter, M. & Häcker, H. (2018). Auswirkungen von
computerisiertem, kognitivem Training auf die Fahreignung gesunder, älterer Kraftfahrer.
Zeitschrift für Neuropsychologie, 28, 198.
Embretson, S. E. (1983). Construct validity: construct representation versus nomothetic span.
Psychological Bulletin, 93 (1), 179–197. doi:10.1037/0033-2909.93.1.179
Embretson, S. E. (1998). A cognitive design system approach to generating valid tests:
Application to abstract reasoning. Psychological Methods, 3 (3), 380–396. American
Psychological Association.
Embretson, S. E. (2002). Generating abstract reasoning items with cognitive theory. In S.H.
Irvine & P.C. Kyllonen (Eds.), Item generation for test development (pp. 219–250).
Mahwah, NJ: Lawrence Erlbaum.
Etzel, S. & Hornke, L. F. (2018). Manual Visueller Gedächtnistest (Version 27 – Revision 6).
Mödling: Schuhfried.
Europäische Kommission. (2019). Bevölkerung am 1. Januar nach Altersgruppe und
Geschlecht (demo_pjangroup) [Data file]. Available at:
http://epp.eurostat.ec.europa.eu/portal/page/portal/population/data/database
European Federation of Psychologists’ Associations. (2013). Performance requirements,
context definitions and knowledge & skill specifications for the three EFPA levels of
qualifications in psychological assessment. Accessed on 28 March 2019. Available at:
www.efpa.eu/download/650d0d4ecd407a51139ca44ee704fda4
Findlay, J. M. (1980). The visual stimulus for saccadic eye movements in human observers.
Perception, 9 (1), 7–21. doi:10.1068/p090007
Fischer, G. H. (1974). Einführung in die Theorie psychologischer Tests: Grundlagen und
Anwendungen. Bern: Huber.
Fischer, G. H. & Kowar, P. (1977). A new traffic perception test on the basis of the logistic test
model. Accident Analysis & Prevention, 9 (3), 203–215. Elsevier. doi:10.1016/0001-
4575(77)90022-7
Fischer, G. H. & Molenaar, I. W. (Eds.). (1995). Rasch models: Foundations, recent
developments, and applications (1st ed.). New York, NY: Springer.
Fischer, G. H. & Ponocny-Seliger, E. (1998). Structural Rasch modeling: Handbook of the
usage of LPCM-WIN 1.0. Groningen: Progamma.
Frey, A., Hartig, J. & Moosbrugger, H. (2009). Effekte des adaptiven Testens auf die Motivation
zur Testbearbeitung am Beispiel des Frankfurter Adaptiven Konzentrationsleistungs-
Tests. Diagnostica, 55 (1), 20–28. doi:10.1026/0012-1924.55.1.20
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and
memory for gist. Journal of Experimental Psychology: General, 108 (3), 316–355.
doi:10.1037/0096-3445.108.3.316
Grossberg, S. (1987a). Cortical dynamics of three-dimensional form, color, and brightness
perception: I. Monocular theory. Perception & Psychophysics, 41 (2), 87–116.
doi:10.3758/BF03204874
Grossberg, S. (1987b). Cortical dynamics of three-dimensional form, color and brightness
perception II: Binocular theory. Perception & Psychophysics, 41 (2), 117–158.
doi:10.3758/BF03204875
Häusler, J. (2006). Effekte von Testwiederholung und deren praktische Bedeutung im Rahmen
der verkehrspsychologischen Begutachtung. Österreichische Tagung für
Verkehrspsychologie. Gmunden.
Hessler, R. & Grossjohann, A. (1953). Untersuchung von Kraftfahrern in dem Medizinisch‐
Psychologischen Institut für Verkehrssicherheit beim Technischen Überwachungs‐Verein
Stuttgart e.V. Zeitschrift des Technischen Überwachungs‐Vereins, 1–12.
Hoffmann, J. & Kämpf, U. (1985). Mechanismen der Objektbenennung-Parallele
Verarbeitungskaskaden. Sprache & Kognition, 4, 217–230.
Hoffmann, J. & Klein, R. (1988). Kontexteffekte bei der Benennung und Entdeckung von
Objekten. Sprache & Kognition, 7, 25–39.
Hoffmann, J. & Zießler, M. (1982). Begriffe und ihre Merkmale. Zeitschrift für Psychologie, 190
(1), 46–77.
Hoffmann, J. & Zießler, M. (1986). The integration of visual and functional classifications in
concept formation. Psychological Research, 48 (2), 69–78.
Holland, P. W. & Dorans, N. J. (2006). Linking and equating. Educational Measurement, 4,
187–220.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal
of Statistics, 6 (2), 65–70. doi:10.2307/4615733
Horn, J. (1988). Thinking about human abilities. In J.R. Nesselroade & R.B. Cattell (Eds.),
Handbook of multivariate experimental psychology (2nd ed., pp. 645–685). New York, NY:
Plenum Press. doi:10.1007/978-1-4613-0893-5_19
Hu, L. & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling: A
Multidisciplinary Journal, 6 (1), 1–55.
Instituto Brasileiro de Geografia e Estatística. (2011). Censo demográfico, 2010. Rio de
Janeiro: Author.
International Test Commission. (2001). International guidelines for test use. International
Journal of Testing, 1 (2), 93–114. doi:10.1207/s15327574ijt0102_1
Irvine, S. H. & Kyllonen, P. C. (2002). Item generation for test development. New York, NY:
Routledge.
Kersting, M. (2018). Zur Information über und Dokumentation von Instrumenten zur Erfassung
menschlichen Erlebens und Verhaltens - Die DIN SCREEN Checkliste 1, Version 3. In
Diagnostik- und Testkuratorium (Ed.), Personalauswahl kompetent gestalten:
Grundlagen und Praxis der Eignungsdiagnostik nach DIN 33430 (pp. 224–244). Berlin:
Springer.
Kingsbury, G. G. & Zara, A. R. (1989). Procedures for selecting items for computerized
adaptive tests. Applied Measurement in Education, 2 (4), 359–375. Taylor & Francis.
doi:10.1207/s15324818ame0204_6
Kolen, M. J. & Brennan, R. L. (2004). Test equating, scaling, and linking (3rd ed.). New York,
NY: Springer. doi:10.1007/978-1-4939-0317-7
Koller, I., Alexandrowicz, R. & Hatzinger, R. (2012). Das Rasch Modell in der Praxis. Eine
Einführung in eRm. Wien: Facultas Verlags- und Buchhandels AG.
Kubinger, K. D. (2003). Gütekriterien. In K.D. Kubinger & R.S. Jäger (Eds.), Schlüsselbegriffe
der psychologischen Diagnostik (pp. 195–204). Weinheim: Beltz.
Kubinger, K. D. (2009). Psychologische Diagnostik. Theorie und Praxis psychologischen
Diagnostizierens. Göttingen: Hogrefe.
Lienert, G. A. & Raatz, U. (1994). Testaufbau und Testanalyse. Weinheim: Beltz.
van der Linden, W. J. & Glas, C. A. W. (Eds.). (2010). Elements of adaptive testing. New York,
NY: Springer.
van der Linden, W. J. & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response
theory. Berlin: Springer.
Loftus, G. R. (1976). A framework for a theory of picture recognition. In R.A. Monty & J.W.
Senders (Eds.), Eye movements and psychological processes (pp. 499–514). Hillsdale,
NJ: Lawrence Erlbaum.
Martin-Löf, P. (1973). Statistiska Modeller. Stockholm: Institutet för säkringsmatematik och
matematisk statistik vid Stockholms universitet.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed
comprehensive Gf-Gc framework. In D.P. Flanagan, J.L. Genshaft & P.L. Harrison
(Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 151–179).
New York, NY: Guilford.
Meyers, L. S. & Rhoades, R. W. (1978). Visual search of common scenes. Quarterly Journal
of Experimental Psychology, 30, 297–310. doi:10.1080/14640747808400677
Müller, H. J. & Krummenacher, J. (2006). Visual search and selective attention. Visual
Cognition, 14 (4–8), 389–410. doi:10.1080/13506280500527676
Neisser, U. (1976). Cognition and reality. San Francisco, CA: Freeman.
Ottes, F. P., Van Gisbergen, J. A. & Eggermont, J. J. (1985). Latency dependence of colour-
based target vs nontarget discrimination by the saccadic system. Vision Research, 25 (6),
849–862. doi:10.1016/0042-6989(85)90193-2
Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika,
66 (3), 437–459. doi:10.1007/BF02294444
R Core Team. (2016). R: A language and environment for statistical computing. R Foundation
for Statistical Computing. Available at: https://www.r-project.org/
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago, IL:
University of Chicago Press.
Reinecke, J. (2014). Strukturgleichungsmodelle in den Sozialwissenschaften (2nd ed.).
München: Oldenbourg.
Rettig, K. & Hornke, L. F. (1990). Adaptives Testen. In W. Sarges (Ed.), Management-
Diagnostik (pp. 444–450). Göttingen: Hogrefe.
Risser, R., Chaloupka, C., Grundler, W., Sommer, M., Häusler, J. & Kaufmann, C. (2008).
Using non-linear methods to investigate the criterion validity of traffic-psychological test
batteries. Accident Analysis and Prevention, 40 (1), 149–157.
doi:10.1016/j.aap.2006.12.018
Rosch, E. (1977). Human categorization. In N. Warren (Ed.), Studies in cross-cultural
psychology (1st ed., pp. 3–49). London: Academic Press.
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of
Statistical Software, 48 (2), 1–36. doi:10.18637/jss.v048.i02
Rost, J. (2004). Lehrbuch Testtheorie - Testkonstruktion. Bern: Huber.
RSSB. (2010). Driver Selection: Development Phase - Updated selection criteria and validation
study (T628 report). London.
Schellig, D. (2018). Manual Block-Tapping-Test vorwärts Block-Tapping-Test rückwärts
Supra-Blockspannen-Test (Version 29). Mödling: Schuhfried.
Schmidt, F. L. & Hunter, J. E. (1998). The validity and utility of selection methods in personnel
psychology: Practical and theoretical implications of 85 years of research findings.
Psychological Bulletin, 124 (2), 262–274. doi:10.1037/0033-2909.124.2.262
Schneider, W. J. & McGrew, K. S. (2018). The Cattell-Horn-Carroll theory of cognitive abilities.
In D.P. Flanagan & E.M. McDonough (Eds.), Contemporary intellectual assessment:
Theories, tests, and issues (4th ed., pp. 73–163). New York, NY: Guilford Press.
Schuhfried, G. (2005). Handanweisung Expertensystem Verkehr (XPSV). Mödling:
Schuhfried.
Schuhfried, G. (2019a). Manual Cognitrone (Version 52 – Revision 2). Mödling: Schuhfried.
Schuhfried, G. (2019b). Manual Determinationstest (Version 45 – Revision 1). Mödling:
Schuhfried.
Schuhfried, G. (2019). Manual Reaktionstest (Version 45 – Revision 1). Mödling: Schuhfried.
SCHUHFRIED GmbH. (2012). Manual Safety Assessment Road (Version 4 - Revision 2).
Mödling: SCHUHFRIED GmbH.
Sijtsma, K. & Emons, W. H. M. (2011). Advice on total-score reliability issues in psychosomatic
measurement. Journal of Psychosomatic Research, 70 (6), 565–572. Elsevier Inc.
doi:10.1016/j.jpsychores.2010.11.002
Sommer, M., Arendasy, M. E., Schützhofer, B. & Knessl, G. (2017). Reduktion von
Testwiederholungseffekten durch adaptives Testen - Experimentelle Überprüfung der
Retest-Effekte anhand des Faktors Wahrnehmungsgeschwindigkeit. Zeitschrift für
Verkehrssicherheit, 63 (2), 102–103.
Sommer, M., Arendasy, M., Schuhfried, G. & Litzenberger, M. (2005). Diagnostische
Unterscheidbarkeit unfallfreier und mehrfach unfallbelasteter Kraftfahrer mit Hilfe nicht-
linearer Auswertemethoden. Zeitschrift für Verkehrssicherheit, 51 (2), 82–86.
Sommer, M., Häusler, J., Koning, A. J. & Arendasy, M. (2006). Validation of the dutch airforce
test battery using artificial neural networks. Proceedings from 48th Annual International
Military Testing Association (pp. 1–10).
Sommer, M., Heidinger, C., Arendasy, M., Schauer, S., Schmitz-Gielsdorf, J. & Häusler, J.
(2010). Cognitive and personality determinants of post-injury driving fitness. Archives of
Clinical Neuropsychology, 25 (2), 99–117. doi:10.1093/arclin/acp109
Sommer, M., Herle, M., Häusler, J. & Arendasy, M. (2008). Von TAVTMB zu ATAVT: Eine
Anwendung der automatisierten Itemgenerierung unter einschränkenden
Rahmenbedingungen. In G. Fleck & G. Ebner (Eds.), Zweites Österreichisches
Symposium für Psychologie im Militär: Schriftenreihe der Landesverteidigungsakademie
(pp. 27–52). Wien: Bundesministerium für Landesverteidigung und Sport.
Sommer, M., Herle, M., Häusler, J., Risser, R., Schützhofer, B. & Chaloupka, C. (2008).
Cognitive and personality determinants of fitness to drive. Transportation Research Part
F: Traffic Psychology and Behaviour, 11 (5), 362–375. doi:10.1016/j.trf.2008.03.001
Testkuratorium. (2010). TBS-TK. Testbeurteilungssystem des Testkuratoriums der Föderation
Deutscher Psychologenvereinigungen. Revidierte Fassung vom 09. September 2009.
Psychologische Rundschau, 61, 52–56.
Tonidandel, S., Quiñones, M. A. & Adams, A. A. (2002). Computer-adaptive testing: The
impact of test characteristics on perceived performance and test takers’ reactions. Journal
of Applied Psychology, 87 (2), 320.
Vetter, M., Schünemann, A. L., Brieber, D., Debelak, R., Gatscha, M., Grünsteidel, F. et al.
(2018). Cognitive and personality determinants of safe driving performance in
professional drivers. Transportation Research Part F: Traffic Psychology and Behaviour,
52, 191–201. doi:10.1016/j.trf.2017.11.008
Wagner, M., Karner, T. & Biehl, B. (2018). Manual Linienverfolgungstest (Version 38 –
Revision 1). Mödling: Schuhfried.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory.
Psychometrika, 54 (3), 427–450. doi:10.1007/BF02294627
Yu, C. Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with
binary and continuous outcomes (Vol. 30). Los Angeles, CA: University of California.
Ziegler, M. & Bühner, M. (2012). Grundlagen der Psychologischen Diagnostik. Wiesbaden:
Springer VS.
Zimmer, H. D. (1984). Blume oder Rose? Unterschiede in der visuellen
Informationsverarbeitung bei Experten und Laien. Archiv für Psychologie, 136, 343–361.
The data relate to 486 (83.2%) men and 98 (16.8%) women aged between 80 and 93
(M=81.58, SD=2.78). 143 individuals (24.5%) have no school-leaving qualification (EU
education level 1), 241 individuals (41.3%) have completed compulsory schooling or an
intermediate secondary school but without completing vocational training (EU education level
2), 94 individuals (16.1%) have completed vocational training or a course at a technical college
(EU education level 3), 74 individuals (12.7%) have a school-leaving qualification at university
entrance level (EU education level 4) and 32 individuals (5.5%) have a university degree (EU
education level 5).
The data relate to 2747 men and 270 women aged between 16 and 83 (M=36.76, SD=12.383).
491 individuals have no school-leaving qualification (EU education level 1), 775 individuals
have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU education level 2), 1049 individuals have completed
vocational training or a course at a technical college (EU education level 3), 389 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 315
individuals have a university degree (EU education level 5).
The data relate to 1417 men and 155 women aged between 18 and 66 (M=34.24, SD=10.73).
14 individuals have no school-leaving qualification (EU education level 1), 73 individuals have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU education level 2), 1012 individuals have completed vocational training
or a course at a technical college (EU education level 3), 6 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 23 individuals have a
university degree (EU education level 5). The remaining people did not provide any information
about their education.
Adults – Romania
The norm sample “Adults – Romania” is based on data collected between 2008
and 2009 as part of norm studies at the University of Bucharest.
The data relate to 226 men and 180 women aged between 17 and 82 (M=29.2, SD=12.306).
24 individuals have no school-leaving qualification (EU education level 1), 39 individuals have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU education level 2), 101 individuals have completed vocational training
or a course at a technical college (EU education level 3), 121 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 121 individuals have a
university degree (EU education level 5).
The expected age and gender distribution was obtained from demographic data for Italy for 2011
(Istituto Nazionale di Statistica, 2011). The norm sample consists of 155 (45%) men and 188
(55%) women aged between 17 and 94 (M=49.44, SD=18.939).
Table 23 shows the distribution of age and gender in the norm sample and the corresponding
distribution in the population of Italy:
Table 23: Observed frequencies of men and women of different age groups and comparison with the expected values in a
representative sample
A comparison of the two distributions using a statistical chi² test (chi²[29]=41.73, p=0.06) shows
that they do not differ significantly from each other.
The norm sample can therefore be regarded as representative of the population of Italy in
terms of gender and age.
completed vocational training or a course at a technical college (EU educational level 3), 81
individuals (24%) have a school-leaving qualification at university entrance level (EU
educational level 4) and 141 individuals (41%) have a university degree (EU educational level
5). For 2 people, the highest education completed is not known.
In addition to the total norm, age-specific norms (up to 30 years, 31-56 years, 57+ years),
education-specific norms (EU education levels 1-3, EU education level 4, EU educational level
5) and gender-specific norms are provided.
The data relate to 113 men and 20 women aged between 18 and 63. 4 individuals have no
school-leaving qualification (EU education level 1), 37 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 52 individuals have completed vocational training or a course at a technical
college (EU education level 3), 29 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 11 individuals have a university degree (EU
education level 5).
The data relate to 127 men and 42 women aged between 18 and 64. 2 individuals have no
school-leaving qualification (EU education level 1), 57 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 71 individuals have completed vocational training or a course at a technical
college (EU education level 3), 32 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 7 individuals have a university degree (EU education
level 5).
The data relate to 154 men and 44 women aged between 17 and 58. 2 individuals have no
school-leaving qualification (EU education level 1), 13 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 67 individuals have completed vocational training or a course at a technical
college (EU education level 3), 69 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 46 individuals have a university degree (EU
education level 5).
The data relate to 337 men and 92 women aged between 17 and 62. 9 individuals have no
school-leaving qualification (EU education level 1), 77 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 176 individuals have completed vocational training or a course at a technical
college (EU education level 3), 119 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 48 individuals have a university degree (EU
education level 5).
The data relates to 1353 men and 216 women aged between 18 and 80. 13 individuals have
no school-leaving qualification (EU education level 1), 315 individuals have completed
compulsory schooling or an intermediate secondary school but without completing vocational
training (EU education level 2), 640 individuals have completed vocational training or a course
at a technical college (EU education level 3), 394 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 206 individuals have a
university degree (EU education level 5).
The data relates to 126 men aged between 22 and 65. 13 individuals have no school-leaving
qualification (EU education level 1), 15 individuals have completed compulsory schooling or
an intermediate secondary school but without completing vocational training (EU education
level 2), 86 individuals have completed vocational training or a course at a technical college
(EU education level 3), 11 individuals have a school-leaving qualification at university entrance
level (EU education level 4) and 1 individual has a university degree (EU education level 5).
9 individuals (3%) have no school-leaving qualification (EU educational level 1), 84 individuals
(25%) have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU educational level 2), 98 individuals (30%) have completed
vocational training or a course at a technical college (EU educational level 3), 107 individuals
(32%) have a school-leaving qualification at university entrance level (EU educational level 4)
and 33 individuals (10%) have a university degree (EU educational level 5).
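The percentages in the paragraph above follow directly from the reported counts. A minimal sketch in Python that recomputes them; the function name percentage_shares is hypothetical and chosen only for this illustration:

```python
def percentage_shares(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


# Counts for EU educational levels 1 to 5 as reported in the paragraph above.
education_counts = [9, 84, 98, 107, 33]
shares = percentage_shares(education_counts)
print(sum(education_counts), shares)
```

Because each share is rounded separately, rounded percentages of this kind do not always add up to exactly 100.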
The data relate to 213 men and 87 women aged between 48 and 88 (M=59.12, SD=8.072). 16
individuals have no school-leaving qualification (EU education level 1), 125 individuals have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU education level 2), 48 individuals have completed vocational training or
a course at a technical college (EU education level 3), 46 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 65 individuals have a
university degree (EU education level 5).
The data relate to 639 men and 9 women aged between 21 and 69 (M=46.78, SD=10.663). 46
individuals have completed compulsory schooling or an intermediate secondary school but
without completing vocational training (EU education level 2), 413 individuals have completed
vocational training or a course at a technical college (EU education level 3), 161 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 25
individuals have a university degree (EU education level 5).
The data relate to 283 men and 58 women aged between 19 and 70 (M=45.48, SD=10.238).
37 individuals have completed compulsory schooling or an intermediate secondary school but
without completing vocational training (EU education level 2), 172 individuals have completed
vocational training or a course at a technical college (EU education level 3), 113 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 19
individuals have a university degree (EU education level 5).
The data relates to 322 men aged between 21 and 71 (M=47.94, SD=9.664). 1 individual
has no school-leaving qualification (EU education level 1), 18 individuals have completed
compulsory schooling or an intermediate secondary school but without completing vocational
training (EU education level 2), 203 individuals have completed vocational training or a course
at a technical college (EU education level 3), 81 individuals have a school-leaving qualification
at university entrance level (EU education level 4) and 19 individuals have a university degree
(EU education level 5).
Table 24 shows the distribution of age and gender in the norm sample and the corresponding
distribution in the population of Austria and Germany:
Table 24: Observed frequencies of men and women of different age groups in the norm sample and comparison with the
expected values in a representative sample
A comparison of the two distributions using a chi² test (chi²[29]=9.34, p=1) shows that they do
not differ significantly from each other. The norm sample can therefore be regarded as
representative of the population of Brazil in terms of gender and age.
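The reported p value can be reproduced from the test statistic and its degrees of freedom. The following is a minimal sketch in Python and not the procedure used for the manual; the function names and the two-cell example counts are hypothetical and serve only as illustration. The p value is obtained from the power series of the regularized lower incomplete gamma function.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom."""
    a, x = df / 2.0, statistic / 2.0
    if x <= 0:
        return 1.0
    # Series: sum over n >= 0 of x^n / ((a + 1)(a + 2)...(a + n)).
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    log_prefactor = a * math.log(x) - x - math.lgamma(a + 1.0)
    lower_tail = math.exp(log_prefactor) * total
    return 1.0 - lower_tail


# Hypothetical two-cell example (not ATAVT data): observed vs. expected counts.
toy_statistic = chi_square_statistic([48, 52], [50, 50])

# Statistic and degrees of freedom as reported in the text above.
p_reported = chi_square_p_value(9.34, 29)
print(round(p_reported, 2))
```

A statistic far below its degrees of freedom, as here, means that the observed frequencies lie very close to the expected ones, which is why the p value rounds to 1.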
26 individuals (6%) have no school-leaving qualification (EU educational level 1), 76 individuals
(17%) have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU educational level 2), 80 individuals (18%) have completed
vocational training or a course at a technical college (EU educational level 3), 135 individuals
(31%) have a school-leaving qualification at university entrance level (EU educational level 4)
and 119 individuals (27%) have a university degree (EU educational level 5).
In addition to the total norm, age-specific norms (up to 28 years, 29-45 years, 46+ years),
education-specific norms (EU education levels 1-3, EU education levels 4-5), and gender-
specific norms are provided.
The data relates to 148 men aged between 26 and 73 (M=49.81, SD=9.724). 38 individuals
(26%) have no school-leaving qualification (EU educational level 1), 94 individuals (64%) have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU educational level 2), 14 individuals (9%) have completed vocational
training or a course at a technical college (EU educational level 3), 1 individual (1%) has a
school-leaving qualification at university entrance level (EU educational level 4) and 1
individual (1%) has a university degree (EU educational level 5).
The data relate to 176 men and 2 women aged between 21 and 71 (M = 46.01; SD = 10.546).
12 individuals have completed compulsory schooling or an intermediate secondary school but
without completing vocational training (EU education level 2), 101 individuals have completed
vocational training or a course at a technical college (EU education level 3), 53 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 12
individuals have a university degree (EU education level 5).
A total of 5 individuals have completed compulsory school or secondary modern school, but
without having completed vocational training (EU education level 2), 54 individuals have
completed vocational training or technical school (EU education level 3), 31 individuals have
completed secondary school with a high school diploma (EU education level 4) and 18
individuals have a university degree (EU education level 5).
For the reliability of test form S5, an estimate of the internal consistency using Cronbach's
alpha is available. For details see Section 4.2.
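For orientation, Cronbach's alpha compares the sum of the item variances with the variance of the total score. A minimal sketch in Python, assuming a complete persons-by-items score matrix; the function name cronbach_alpha and the small data set are hypothetical and are not taken from the ATAVT data:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per person, one column per item)."""
    k = len(scores[0])                       # number of items
    items = list(zip(*scores))               # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of four persons to three items (1 = correct, 0 = incorrect).
example_scores = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
alpha = cronbach_alpha(example_scores)
print(round(alpha, 2))
```

The sketch uses sample variances (denominator n - 1); using population variances throughout gives the same alpha, because the factor cancels in the ratio.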
The Exclusive norm sample "Healthy older drivers - Austria" is not included as standard in the
Vienna Test System and can be purchased separately.
Table A1. A list of the evidence for the requirements of DIN 33430
Column headings (repeated for the left and right halves of the table): DIN no.; optional (O) or mandatory (M); Yes; No; Source
A1 O ☒ ☐ Chapter 2 B21 O ☐ ☒
A2 O ☒ ☐ Chapter 2 B22 M ☒ ☐ Chapter 4.2
A3 O ☐ ☐ Not relevant B23 O ☐ ☒
A4 M ☒ ☐ Chapter 2 B24 O ☒ ☐ Chapter 4.2
A5 O ☒ ☐ Chapter 2 B25 O ☒ ☐ Chapter 4.2
A6 O ☒ ☐ Chapter 2 B26 M ☐ ☐ Not relevant
A7 O ☐ ☐ Not relevant B27 M ☒ ☐ Chapter 4.3
A8 O ☒ ☐ Chapter 2, 3.4 B28 M ☒ ☐ Chapter 4.3
A9 O ☒ ☐ Chapter 2, 3.4 B29 M ☒ ☐ Chapter 4.3
A10 O ☒ ☐ Chapter 2, 3.4 B30 M ☒ ☐ Chapter 4.3
A11 O ☒ ☐ Chapter 2 B31 M ☒ ☐ Chapter 4.3
A12 O ☐ ☐ Not relevant B32 M ☒ ☐ Chapter 4.3
A13 M ☒ ☐ Chapter 6 B33 M ☒ ☐ Chapter 4.3
A14 M ☒ ☐ Chapter 2 B34 O ☒ ☐ Chapter 4.3
A15 M ☒ ☐ Chapter 7 B35 M ☐ ☐ Not relevant
B1 M ☒ ☐ Chapter 3.1 B36 M ☐ ☐ Not relevant
B2 M ☒ ☐ Chapter 3.2 B37 M ☐ ☐ Not relevant
B3 M ☒ ☐ Chapter 4.3 B38 O ☐ ☐ Not relevant
B4 M ☒ ☐ Chapter 4.3 B39 O ☐ ☐ Not relevant
B5 M ☒ ☐ Chapter 4.3 B40 O ☐ ☐ Not relevant
B6 M ☒ ☐ Chapter 4.3 B41 O ☒ ☐ Chapter 4.3, 3.2.5
B7 M ☒ ☐ Chapter 4.3 B42 O ☒ ☐ Chapter 4.3
B8 M ☐ ☒ B43 O ☒ ☐ Chapter 4.3
B9 M ☐ ☒ B44 O ☒ ☐ Chapter 4.3
B10 O ☐ ☒ B45 O ☒ ☐ Chapter 4.3
B11 O ☐ ☒ B46 M ☒ ☐ Chapter 4.3
B12 O ☒ ☐ Chapter 4.3 B47 O ☐ ☒
B13 M ☒ ☐ Chapter 4.3 B48 M ☒ ☐ Chapter 4.3
B14 O ☒ ☐ Chapter 3.3 B49 O ☒ ☐ Chapter 4.3