VIENNA TEST SYSTEM schuhfried.com

Manual
Adaptive Tachistoscopic Traffic Perception Test
Test label ATAVT

Version 52 – Revision 4

Moedling, March 2023

SCHUHFRIED GmbH, Hyrtlstrasse 45, 2340 Moedling, Austria

Tel. +43/2236/42315-0, Fax: +43/2236/46597
[email protected] www.schuhfried.com
Registered offices: Moedling, FN 104661p
Landesgericht Wr. Neustadt, UID No. ATU 19273809
ATAVT

CONTENTS
1 SUMMARY

2 USE OF THE TEST

3 DESCRIPTION OF THE TEST
3.1 Theoretical background
3.1.1 Definition attentional control and obtaining an overview
3.1.2 Attentional control in the context of general cognitive ability
3.1.3 Obtaining an overview as part of attentional control
3.2 Test design
3.2.1 Test presentation conditions
3.2.2 Development of the item pool
3.2.3 Description of the Rasch model
3.2.4 Analysis in accordance with the Rasch model
3.2.5 Equivalence of TAVTMB and ATAVT
3.3 Adaptive testing
3.3.1 Adaptive versus linear presentation
3.3.2 The process of adaptive test presentation
3.3.3 Sociodemographically determined start phase
3.3.4 Maintaining the pre-set reliability for adaptive tests
3.3.5 Item Exposure Control
3.4 Test forms
3.5 Description of variables

4 EVALUATION
4.1 Objectivity
4.2 Reliability
4.3 Validity
4.3.1 Content validity
4.3.2 Construct validity
4.3.3 Criterion validity
4.4 Scaling
4.5 Economy
4.6 Reasonableness
4.7 Resistance to falsification
4.8 Fairness

5 NORMING

6 TEST PROCEDURE
6.1 Decision-oriented testing
6.2 Instruction and practice phase
6.3 Test phase

7 INTERPRETATION OF TEST RESULTS
7.1 General notes on interpretation
7.2 Interpretation of variables
7.3 Additional output of results
7.4 Case study

8 REFERENCES

9 APPENDIX A – NORM SAMPLES

10 APPENDIX B – NOTES ON INTERPRETATION IN TRAFFIC PSYCHOLOGICAL ASSESSMENT

11 APPENDIX C – DIN 33430 CHECKLIST


1 SUMMARY
Author
G. Schuhfried

Application
Assesses obtaining an overview as part of attention performance.
Main areas of application: traffic psychology, human resources, manufacturing & safety

Theoretical background
The ATAVT measures obtaining an overview by briefly presenting pictures of traffic
situations. The items are constructed using an explicit, theory-led rationale which is based on
detailed quantitative analyses of the processes involved in test performance. The
construction rationale is based on the design principles of the proven Tachistoscopic Traffic
Perception Test (TAVTMB; Biehl, 1996) and expands them by including current research findings
on cognitive abilities (Cattell-Horn-Carroll model; Schneider & McGrew, 2018). A total of 84 items
were developed and evaluated using the 1PL Rasch model. The resulting item pool allows for
adaptive test presentation with all the associated advantages.

Test structure
Pictures of traffic situations appear briefly on the screen. After seeing each picture, the
respondent is asked to state what was in it by choosing from five given answer options.
In two test forms (S1 and S2) the items are presented adaptively: after an initial
phase the respondent is presented with items whose difficulty is increasingly tailored to match
his/her ability. The other test forms are presented in a fixed order.

Test forms
There are six test forms: S1 is designed for use in countries in which traffic drives on the right
and S2 for countries that drive on the left. In addition, there are four linear test forms, S3 to S6.
These forms are especially selective in the lower ability range.

Scoring
The person parameter Obtaining an overview is displayed, together with the corresponding
norm-referenced score including the confidence interval. The main variable Obtaining an
overview can be interpreted as a measure for obtaining an overview as part of focused
attention.

Reliability
Reliability is set to a standard error of measurement of 0.49 for the adaptive test forms S1 and
S2. This corresponds to a reliability of r = 0.80. The pre-defined precision of measurement
applies to all respondents at all performance ranges. Estimates of internal consistency,
measured by Cronbach's alpha, are available for the test scores of test forms S3 to S6. The
estimates are 0.726 for S3 and S4 and 0.729 for S5 and S6.

Validity
Due to the test design, the ATAVT conforms precisely to the requirements for measuring obtaining
an overview and therefore has content validity. Correlation analyses with construct-related and
non-construct-related tests as well as a confirmatory factor analysis prove the construct validity
of the test. Furthermore, there is a large number of studies in the areas of traffic psychology,
aviation psychology, and safety assessment (e.g. Sommer, Arendasy, Schuhfried &
Litzenberger, 2005; Vetter, Schünemann, Brieber, Debelak, Gatscha & Grünsteidel, 2018)
which prove the criterion validity of the ATAVT in different application contexts.

Norms
A norm sample of N=831 individuals from German-speaking countries, representative with
respect to age and gender, is available. Representative norm samples are also available for
Brazil (N=436), Italy (N=343) and Slovakia (N=331), as are numerous convenience
samples.

Time required for the test
Between 8 and 14 minutes, depending on the test form.

Disclaimer
The results obtained in these tests are computer-generated. They are based on the
candidate’s response behavior and the technical infrastructure and depend on the test
presentation conditions as well as other factors including the candidate’s experience,
motivation, interest, self-awareness, and mental and physical state. In accordance
with the APA Standards for Educational and Psychological Testing (2014), page 13
and Standard 9.0, it is established that the test user bears ultimate responsibility for
evaluating whether the use of a test in a specific setting is justified. The test user must
take cultural and legal factors into account when making this decision. SCHUHFRIED
therefore cannot accept responsibility for decisions and interpretations made on the
basis of the test results and cannot be held liable for the consequences of using them,
regardless of how such consequences arise. The test results are confidential and
must be stored in accordance with relevant international and national data protection
guidelines and legislation.


2 USE OF THE TEST

The Adaptive Tachistoscopic Traffic Perception Test (ATAVT) is used to measure obtaining
an overview as part of attentional control. Since this is a very basic ability which is required in
many everyday situations, the areas of application of the ATAVT are very broad. The ATAVT
is designed primarily for use in the HR area (e.g. for selecting professional drivers) and the
traffic area (e.g. assessing fitness to drive as required by legal regulations).

Regardless of the respective issue, the user needs to have relevant qualifications in the area
of suitability assessment or assessment in the clinical and health context for administering,
scoring, and interpreting the test. This includes knowledge of statistical fundamentals,
test theory, quality criteria and design principles of psychometric tests. Standards such as
DIN 33430 or comparable international standards provide detailed information on the necessary
qualifications. Furthermore, international standards such as the Performance Requirements,
Context Definitions and Knowledge & Skill specifications for the three EFPA levels of
qualifications in psychological assessment (European Federation of Psychologists’
Associations, 2013), the APA Standards for Educational and Psychological Testing (2014) or the
International Guidelines for Test Use (ITC, 2001) can be used.

Since the items are easy to understand, the ATAVT can be used for persons of different
educational levels and in a broad age range (see chapter 5). The administration time is
standardized and is between approx. 8 and 14 minutes depending on the test form. Testing
can take place both in a one-on-one setting and in a group. For group testing, however, the
respondents should use USB headsets. Because test presentation and scoring are automatic,
the supervisor only needs to plan time for entering personal data in addition to the test
duration (approx. 6 and 15 minutes depending on the test form).

The ATAVT is continuously maintained and updated by the publishers. It is therefore
recommended that the test software is updated at least every five years, to ensure access to
current test norms and research results. In addition, all users are invited to support the
maintenance and development of the ATAVT by reporting their experiences of the use of the
ATAVT to the publisher, and, where appropriate, by making (anonymized) test data available
for further studies.


3 DESCRIPTION OF THE TEST

3.1 Theoretical background
3.1.1 Definition attentional control and obtaining an overview
Attentional control is defined as the ability to focus attention on currently important aspects of
a situation and to ignore irrelevant aspects. This ability is often likened to a “spotlight”
that focuses on relevant information (Schneider & McGrew, 2018). Baddeley’s working memory
model forms the basis for the definition of attentional control given by Schneider and
McGrew (2012). The model postulates, as the central executive element of working memory, a
hypothetical structure which, depending on the situation, focuses, divides and shifts attention
and also serves as an interface to long-term memory.
Obtaining an overview can be considered a special case of attentional control: it describes
the ability to focus attention on an (everyday) scenario and to quickly identify relevant objects
in this scenario.

3.1.2 Attentional control in the context of general cognitive ability

The construct of attentional control can be understood in the context of the CHC model (Cattell-
Horn-Carroll model). The CHC model is a theoretically well established and validated
integrative model of human cognitive abilities formulated by McGrew (1997) as a synthesis of
the three-stratum theory of cognitive abilities of Carroll (1993) and the investment theory (and
its extensions) of Horn (1988) and Cattell (1987). According to the CHC model, cognitive
abilities can be assigned to three different levels which differ regarding their specificity and
content range.

At the top of the hierarchy is the g(eneral) factor; the second level comprises the secondary
factors, and the third, lowest level is made up of primary factors. The g-factor is the broadest
construct and is often understood as general intelligence. In the latest version of the CHC
model (Schneider & McGrew, 2018), a total of 17 secondary factors are listed. The group of
sensory, domain-specific abilities comprises auditory (Ga) and visual (Gv) processing as well as
olfactory (Go), gustatory (Gg), kinesthetic (Gk), tactile (Gh) and psychomotor (Gp) abilities.
Furthermore, a differentiation is made between different types of acquired knowledge:
quantitative knowledge (Gq), reading and writing (Grw), comprehension knowledge (Gc) and
domain-specific knowledge (Gkn). The area of general speed includes the secondary factors
processing speed (Gs), psychomotor speed (Gps), and reaction and decision speed (Gt). In the
area of domain-neutral general functions, a differentiation is made between fluid reasoning
(Gf), working memory (Gwm) and long-term storage and recall (Gl & Gr). Figure 1 shows
the section of the CHC model that is relevant to the theoretical basis of the ability measured
by the ATAVT; it is described further in the following.


Figure 1: Hierarchy levels of the CHC model (excerpt)

The Adaptive Tachistoscopic Traffic Perception Test (ATAVT) can be assigned to the
secondary factor working memory (Gwm), more precisely to the primary factor attentional
control. This is understood as the ability to focus attention on relevant stimuli and to ignore
irrelevant stimuli. Attentional control differs from processing speed (Gs) and reaction speed (Gt)
in that both of the latter require an immediate, automated reaction, and relevant information
does not have to be stored short-term.
The secondary factor visual processing (Gv) describes the ability to perceive, discriminate,
manipulate, and recall visual information. This includes the ability to visualize pictures in
one’s mind, to change them mentally (recall, rotate, etc.) and to apply them to problems, if
required. Working memory is distinguished from visual processing, and specifically from its
primary factor visual memory, by the fact that working memory tasks are often simple,
numerous or sequential. Another major difference between working memory (Gwm) and
visual memory (a primary factor of Gv) is that working memory involves not just storing and
recalling a mental representation (Carroll, 1993) but also selecting and manipulating the
relevant visual information.

At the level of the primary factors, a distinction can be made for working memory (Gwm)
between visuo-spatial short-term memory and attentional control. Visuo-spatial short-term
memory describes the ability to encode visual information in working memory and retain it
short-term. If the item material requires discriminating between relevant and irrelevant stimuli,
the task relates to attentional control instead.

3.1.3 Obtaining an overview as part of attentional control

Obtaining an overview as a special case of attentional control describes the ability to focus
attention on an (everyday) scenario, to quickly identify relevant objects in this scenario and to
store them short-term. In order for objects to be classified as relevant, they have to
be recognized first. This process is called “visual search”. The difficulty of recognizing an object
among multiple objects depends both on the properties of the object (e.g. size, familiarity)
and on its environment (e.g. number of irrelevant objects). Furthermore, focused attention
plays a key role in visual search. Attention must be focused quickly on (possibly) relevant
objects of a scenario so that they can be classified with minimal delay (Müller &
Krummenacher, 2006).


Tests for measuring attentional control usually use relatively simple item material. Such simple
item material raises questions about the test’s ecological validity and the extent to which the
results can be generalized to cover more complex everyday situations. In view of these issues,
attempts were made as early as the 1950s to operationalize these ability dimensions in ways that
were closer to real life. In traffic psychology assessments, Hessler and Grossjohann (1953)
used a tachistoscopic test method that enabled pictures of urban traffic situations to be
displayed for a period of one second. Development of such tests progressed in the years that
followed, particularly in the field of traffic psychological assessment. In 1960 TÜV-Rheinland
produced a tachistoscope test made up of 22 slides of traffic situations. Fischer and Kowar
(1977) also used a tachistoscope to display pictures of real-life traffic situations for exactly one
second. As with the other tachistoscope tests, the respondent’s task was to identify certain
pre-defined objects in the pictures displayed. After each picture had been briefly displayed, the
respondent was shown a list of objects and was asked to mark only those items that were
actually visible in the picture he/she had just seen. Compared with traditional
operationalizations, these tachistoscopic tests have greater ecological validity, a closer
resemblance to everyday life, and greater item complexity (Sommer, Herle, Häusler,
& Arendasy, 2008). Following the tradition of this field of research, the task in the ATAVT
consists of focusing attention on a traffic situation and quickly identifying relevant information
in this situation.

3.2 Test design

When designing the ATAVT one of the aims was to create a link with existing research work
on the Tachistoscopic Traffic Perception Test (TAVTMB; Biehl, 1996). The original design
principles were therefore retained and enhanced through the inclusion of up-to-date findings
from the research literature. In addition, drawing on the work of Fischer and Kowar (1977),
Biehl (1996) and Arendasy and Sommer (2004), scalability under the 1PL Rasch model was
considered important in order to allow for adaptive test versions. The following
sections provide a brief outline of the development of the ATAVT item pool and an introduction
to the Rasch model and its application in testing the dimensionality of the item pool.

3.2.1 Test presentation conditions

After a short signal which serves as a cue, the respondent is shown pictures of traffic situations
(see Figure 2). It is the respondent’s task to carefully remember the pictures and mark the
corresponding fields to indicate what was shown in the picture. For each picture five options
are given:
• Pedestrian(s)
• Motor vehicle
• Bicyclists, motorcycle, moped
• Traffic sign(s)
• Traffic light

Each of these options represents a different class of objects that might be found in a traffic-
related context.


Figure 2: Example of instructions

3.2.2 Development of the item pool

Starting point for development of the test

The development of the ATAVT was based on two objectives: using current picture material of
traffic situations and ensuring enhanced test security. The need for the latter arose from
a study on immediate repetition of the TAVTMB, which indicated reduced test
security. Working with a sample of 39 (47.6%) men and 43 (52.4%) women aged 18-78 (M=44
years; SD=17 years), Häusler (2006) investigated the influence of immediate repetition of the
test (15 minutes after completing the initial assessment) and of delayed retesting (3 months
after the initial assessment) on the person parameter of the main variable of the TAVTMB. He
found significant differences (F[1.924]=12.056, p<0.001, η²=0.130) between the three test
occasions. Since it cannot be assumed that the ability genuinely increases when
the test is immediately repeated, Häusler (2006) interpreted this as a purely test-specific
practice effect. To avoid this problem, the ATAVT was designed from the outset as a
computerized adaptive test (CAT; for details see chapter 3.3). That adaptive test presentation
reduces test repetition effects in the ATAVT has since also been demonstrated
empirically (Sommer, Arendasy, Schützhofer & Knessl, 2017).

Definition of the construction rationale

To enable a sufficiently large number of tasks to be created, the construction of the items was
theory-led. An item can be described in terms of radicals and incidentals (Irvine & Kyllonen,
2002).
Radicals are item characteristics that exert a systematic influence on the item parameters (in
this case the item difficulty parameters) since they affect cognitive processes that are relevant
to the solution. Seven radicals (see Table 1) were defined for the ATAVT that systematically
impact the difficulty of the relevant stimuli and therefore require higher attentional control.


Table 1: Summary of the construction rationale of the ATAVT

Radical | Description | References

Number of visible object classes | Number of object classes depicted | Müller & Krummenacher (2006); Biehl (1996)

Proportion of necessary peripheral objects | Proportion of necessary objects in a class that are more than 4 cm to the left/right of the stimulus center or more than 2.6 cm above/below it. | Findlay (1980); Ottes, van Giesbergen & Eggermont (1985)

Proportion of necessary concealed objects | Proportion of necessary objects in a class for which ≥ 30% of the object surface is concealed by other objects. | Grossberg (1987a, 1987b); Neisser (1976)

Proportion of necessary small objects | Proportion of necessary objects in a class with a diameter ≤ 1 cm². | Findlay (1980)

Poor contour contrast | The majority of the necessary objects in the object class to be detected contrast poorly with the background. | Deubel & Hauske (1988); Grossberg (1987a, 1987b)

Proportion of typical objects | Proportion of necessary objects in a class that are either typical representatives of this class or are depicted in a typical representation. | Hoffmann & Zießler (1982); Hoffmann & Kämpf (1985); Rosch (1977); Zimmer (1984)

Confusion as a result of experience | Irrelevant objects that in the context might be confused with a relevant object that is not being searched for. | Friedman (1979); Hoffmann & Zießler (1986); Hoffmann & Klein (1988); Loftus (1976); Meyers & Rhoades (1978)
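
Three of the radicals in Table 1 are defined by explicit thresholds, so they can be written down as simple predicates. The following Python sketch is illustrative only: the class SceneObject, its field names and the example values are assumptions made here, and the manual does not describe how the measurements were actually taken.

```python
from dataclasses import dataclass

# Thresholds as stated in Table 1. Everything else (names, fields) is hypothetical.
PERIPHERAL_X_CM = 4.0    # more than 4 cm left/right of the stimulus center
PERIPHERAL_Y_CM = 2.6    # more than 2.6 cm above/below the stimulus center
CONCEALED_SHARE = 0.30   # at least 30% of the object surface concealed
SMALL_SIZE = 1.0         # size of at most 1, in the unit given in Table 1


@dataclass(frozen=True)
class SceneObject:
    object_class: str        # e.g. "pedestrian"
    dx_cm: float             # horizontal offset from the stimulus center
    dy_cm: float             # vertical offset from the stimulus center
    concealed_share: float   # share of the surface hidden by other objects (0..1)
    size: float              # object size in the unit used by Table 1


def is_peripheral(obj: SceneObject) -> bool:
    return abs(obj.dx_cm) > PERIPHERAL_X_CM or abs(obj.dy_cm) > PERIPHERAL_Y_CM


def is_concealed(obj: SceneObject) -> bool:
    return obj.concealed_share >= CONCEALED_SHARE


def is_small(obj: SceneObject) -> bool:
    return obj.size <= SMALL_SIZE


def proportion(objects, predicate) -> float:
    """Share of the necessary objects of one class that satisfy a predicate."""
    if not objects:
        return 0.0
    return sum(predicate(o) for o in objects) / len(objects)


# Hypothetical picture with four necessary pedestrians.
pedestrians = [
    SceneObject("pedestrian", 5.0, 0.0, 0.0, 2.0),    # peripheral (horizontal)
    SceneObject("pedestrian", 1.0, 3.0, 0.5, 0.8),    # peripheral, concealed, small
    SceneObject("pedestrian", 0.5, 0.5, 0.3, 1.5),    # concealed (exactly 30%)
    SceneObject("pedestrian", -2.0, 1.0, 0.1, 1.0),   # small (exactly at the limit)
]

peripheral_share = proportion(pedestrians, is_peripheral)
concealed_share = proportion(pedestrians, is_concealed)
small_share = proportion(pedestrians, is_small)
```

In this made-up example each of the three proportions is 0.5. The remaining radicals (contour contrast, typicality, confusion as a result of experience) require judgments that cannot be reduced to a fixed numeric threshold from the information given in the table.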

Incidentals, on the other hand, describe "superficial characteristics" that are interchangeable
among themselves. They have no influence on cognitive processes relevant to the solution
and therefore cannot have a significant effect on the item parameters. In the ATAVT the
individual concrete objects (e.g. vehicle type) in a category represent the incidentals.
On the basis of this construction rationale, requirement grids for the tasks of the ATAVT were
defined.

Design of the initial item pool

In the first development step, 120 pictures of traffic situations were generated. The item pool
to be created was divided into five parts corresponding to the five levels of the characteristic
Number of visible object classes. For the 24 tasks in each of the five parts requirement grids
were drawn up specifying the picture characteristics Proportion of necessary concealed
objects, Proportion of necessary peripheral objects, Proportion of necessary small objects and
Poor contour contrast. Since the pictures of the ATAVT were intended to portray real traffic
situations, only rough construction guidelines could be entered in the grid: for each
picture it was specified only whether each of these four picture characteristics should apply
or not. An attempt was made to distribute the levels of the different
picture characteristics equally. Based on these specifications, professional photographers
took pictures exclusively of real traffic situations. In the next step, the pictures were evaluated
for their photographic quality. As a result of this assessment, 19 pictures were excluded on
account of their poor quality (e.g. over-exposure, backlighting, sub-optimal
exposure times, design specification not met). For the test forms for left-hand traffic (S2, S4,
and S6) the pictures were mirrored and adapted graphically (e.g. writing on the traffic signs).


Design of the calibration study

In order to create a link between the ATAVT and existing research work on the Tachistoscopic
Traffic Perception Test (TAVTMB; Biehl, 1996) the remaining 101 pictures were divided for
purposes of calibration into five approximately equal sets each containing 20 tasks. Because
of the uneven number of tasks to be calibrated, one of the item sets contained k=21 new items.
To enable these new items to be analyzed across item group boundaries and together with the
items of the TAVTMB, each of the five item sets included seven items from the TAVTMB that
were representative of the level of difficulty (link design) and that could serve as link items
between the item sets. This therefore resulted in four item sets containing 27 test items and
one set containing k=28 test items. The link items were always presented in the first seven
positions within each set.
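
The set sizes of the link design follow from simple arithmetic. A minimal Python sketch; the variable names are illustrative, and placing the surplus item in the first set is an assumption made here (the text only says that one set contained 21 new items).

```python
initial_pictures = 120        # pictures generated in the first development step
excluded_for_quality = 19     # pictures dropped after the quality review
new_items = initial_pictures - excluded_for_quality   # pictures left to calibrate

n_sets = 5
link_items_per_set = 7        # TAVTMB items included in every set

# Distribute the new items as evenly as possible over the five sets.
base, extra = divmod(new_items, n_sets)
new_per_set = [base + 1 if i < extra else base for i in range(n_sets)]

# Each set additionally carries the seven link items.
set_sizes = [n + link_items_per_set for n in new_per_set]

print(new_items)     # 101
print(new_per_set)   # [21, 20, 20, 20, 20]
print(set_sizes)     # [28, 27, 27, 27, 27]
```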

Item selection based on classical item indicators

In the next step the classical item difficulties and selectivities were determined separately for
the five item sets. The sample was collected in 2006 and 2007 at the Research
Center of SCHUHFRIED GmbH. It consists of 363 (46.6%) men and 416 (53.4%)
women aged between 15 and 94 (M = 44.69; SD = 17.19). All persons were tested under the
supervision of a test administrator and participated voluntarily; their test results had no
consequences for them. All participants were compensated for taking part.
An item is counted as correct when all characteristics shown were recognized. Based on the
indicators of classical test theory, items with an item difficulty of ≤ 0.01 or ≥ 0.99
or a selectivity of ≤ 0.30 were eliminated. Under these criteria, four items each were
eliminated from Set 2 and Set 5, three items from Set 3, and six items from Set 4; no item
was eliminated from Set 1. For the subsequent analysis using the logistic model of
Rasch (1980), k=28 items therefore remained in Set 1, k=23 items each in Set 2 and Set 5,
k=24 items in Set 3 and k=21 items in Set 4. None of the link items needed to be excluded
from any of the five item sets.
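
The selection rule and the resulting item counts can be checked with a short Python sketch. The function name keep_item is hypothetical. Treating Set 1 as the 28-item set with no eliminations is an inference from the counts reported in the text, not something the manual states separately.

```python
def keep_item(difficulty: float, selectivity: float) -> bool:
    """Apply the classical selection criteria described in the text."""
    if difficulty <= 0.01 or difficulty >= 0.99:   # almost nobody / almost everybody solves it
        return False
    if selectivity <= 0.30:                        # discriminates too weakly
        return False
    return True


LINK_ITEMS_PER_SET = 7

# Items presented per set (inferred: Set 1 is the set with 21 new items).
presented = {1: 28, 2: 27, 3: 27, 4: 27, 5: 27}
# Items eliminated per set, as reported in the text.
eliminated = {1: 0, 2: 4, 3: 3, 4: 6, 5: 4}

retained = {s: presented[s] - eliminated[s] for s in presented}

# No link item was excluded, so every set still contains all seven of them.
new_items_retained = sum(k - LINK_ITEMS_PER_SET for k in retained.values())

print(retained)             # {1: 28, 2: 23, 3: 24, 4: 21, 5: 23}
print(new_items_retained)   # 84
```

The total of 84 retained new items agrees with the number of items reported in the summary.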

Design of the linear test forms S3 to S6

The linear test forms S3 and S4 were each created using 13 items from the complete item pool
of test forms S1 and S2, while the linear test forms S5 and S6 each use 14 items from the
complete pool. Two criteria guided the selection of the items. First, the item parameters
should deliver as much information as possible for people with an ability parameter
corresponding to a norm-referenced score of PR=16. Second, the task structure of the
selected items should be as representative as possible of the overall item pool. The quality
of the linear test forms was checked using the data of two
samples. Details on the results can be found in section 3.2.4 (Testing the validity of the Rasch
model) and in section 4.2 (Information on Reliability).
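
The first selection criterion can be illustrated with the standard item information function of the Rasch model, I(θ) = P(θ)(1 − P(θ)), which is largest when the item difficulty equals the ability level. The Python sketch below is not the procedure used by the authors: the item difficulties and the target ability of −1.0 are hypothetical values, and the second criterion (representativeness of the task structure) is not modeled.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta: float, difficulty: float) -> float:
    """Fisher information of a Rasch item at ability theta: P * (1 - P)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def most_informative(difficulties, theta_target: float, k: int):
    """Return the k item difficulties with the highest information at theta_target."""
    ranked = sorted(
        difficulties,
        key=lambda b: item_information(theta_target, b),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical pool of item difficulties and a hypothetical target ability
# standing in for the person parameter that corresponds to PR=16.
pool = [-2.5, -1.2, -0.9, 0.0, 0.7, 1.8]
theta_target = -1.0

selected = most_informative(pool, theta_target, k=3)
print(selected)   # [-0.9, -1.2, 0.0]
```

Items whose difficulty lies close to the target ability are preferred, which is consistent with the statement that the linear forms are especially selective in the lower ability range.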

3.2.3 Description of the Rasch model

Classical test theory provides a “deterministic” and non-verifiable model of the relationship
between a respondent’s test score and the true extent of his/her ability. Probabilistic test
theory, by contrast, explicitly models the relationship between test score and the true
level of ability. Hence the term “item response theory” (van der Linden & Hambleton,
1997; Rost, 2004) is also used for “probabilistic test theory”.

The item response theory (IRT; Fischer & Molenaar, 1995; van der Linden & Hambleton, 1997)
offers different measurement models, enabling the dimensionality of psychological tests to be
investigated in a theoretically well-founded manner.


The unidimensionality of dichotomous test items (e.g. right vs. wrong) can be tested using
Rasch’s logistic model (1980). This is a unidimensional latent trait model in which each
respondent v is described by a unidimensional person parameter θv and each test item i has
an item difficulty parameter εi. In the context of ability assessment, the position of θv describes
the respondent’s level of ability on the latent dimension measured. The probability that person
v will solve item i is specified by the model equation of the Rasch model in the form of a logistic
function:

$$P(+ \mid \theta_v, \varepsilon_i) = \frac{e^{\theta_v - \varepsilon_i}}{1 + e^{\theta_v - \varepsilon_i}}$$

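The model equation can be evaluated directly. The following minimal sketch assumes only the equation above; the function name `rasch_probability` is chosen here for illustration and is not part of the test software.

```python
import math


def rasch_probability(theta: float, epsilon: float) -> float:
    """Probability that a person with ability theta solves an item of difficulty epsilon.

    Uses the algebraically equivalent form 1 / (1 + exp(-(theta - epsilon))).
    """
    return 1.0 / (1.0 + math.exp(-(theta - epsilon)))


# If ability and difficulty coincide, the solution probability is 0.5.
print(rasch_probability(0.0, 0.0))  # 0.5

# A person one logit above the item difficulty solves the item more often than not.
print(round(rasch_probability(1.0, 0.0), 3))  # 0.731
```
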
If the Rasch model fits for a defined item set and for particular subject groups, the test (or test
score) is described as “conforms with Rasch model”. A test that conforms with the Rasch model
has some useful psychometric properties:
(1) The test measures the same latent ability dimension for all test subjects. The ability under
consideration and the item difficulties can be characterized by a single parameter, respectively.
The raw score thus represents a sufficient statistic for the extent of the attribute in that
particular person (Fischer, 1974).
(2) The raw score obtained describes the respondent unambiguously with regard to his/her
position on the latent ability dimension θv that is to be measured.
(3) Any shift in the item parameters (e.g. εi → εi + c) leads to a corresponding shift of the person
parameters (e.g. θv → θv + c). Hence the estimation of item parameters does not depend on
the test sample used. This means that individuals or tasks can be compared with one another,
irrespective of which items the respondents have worked or which individuals from a particular
population have worked the tasks. Rasch (1980) refers in this context to “specific objectivity”.
This property of the Rasch model not only makes computerized adaptive testing possible but
also forms the basis for the generalizability of diagnostic statements about a respondent’s trait
strength over the sample of items used. For this reason, it is of key importance for the practical
application of psychological assessment methods.
(4) A further advantage of the validity of the Rasch model for the scales of a questionnaire is
that it demonstrates the fairness of the scoring method for different subject groups that are
held to be relevant. Only if the Rasch model fits the data can one be certain that none of the
respondent groups tested (e.g. women vs. men) is systematically advantaged or
disadvantaged by particular items.
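
The shift property named in point (3) follows directly from the model equation, because only the difference between person parameter and item parameter enters the exponent. As a short worked step (c denotes an arbitrary constant):

```latex
P(+ \mid \theta_v + c,\ \varepsilon_i + c)
  = \frac{e^{(\theta_v + c) - (\varepsilon_i + c)}}{1 + e^{(\theta_v + c) - (\varepsilon_i + c)}}
  = \frac{e^{\theta_v - \varepsilon_i}}{1 + e^{\theta_v - \varepsilon_i}}
  = P(+ \mid \theta_v,\ \varepsilon_i)
```

A common shift of all parameters therefore leaves every solution probability unchanged, which is why the zero point of the scale has to be fixed by convention.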

One method of testing the Rasch model consists of checking the assumption of person
homogeneity (Rost, 2004) by means of likelihood quotient tests according to Andersen (1973).
This test relates the likelihood of the data for the item parameters estimated in the total sample
to the likelihoods of the data for the item parameters estimated in the sub-samples. This test
statistic has an asymptotic χ² distribution with (p-1)×(k-1) degrees of freedom, where k
represents the number of tasks and p is the number of sub-samples. A non-significant result
in an Andersen likelihood quotient test means that the item parameter estimates in the sub-
samples do not deviate significantly from the item parameter estimates of the total sample.
The item parameters can therefore be generalized over the sub-samples tested. The model
assumption of person homogeneity (Rost, 2004) can therefore be maintained with regard to
the splitting criteria investigated.
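
The structure of the test statistic can be sketched as follows. This is a minimal illustration under stated assumptions: the conditional log-likelihoods are taken as given (estimating them is outside the scope of the sketch), the function names are hypothetical, and the numerical values are invented purely for demonstration.

```python
def andersen_lr_statistic(loglik_total: float, logliks_subsamples: list[float]) -> float:
    """Likelihood ratio statistic: twice the gain in conditional log-likelihood
    obtained by estimating the item parameters separately in each sub-sample."""
    return 2.0 * (sum(logliks_subsamples) - loglik_total)


def andersen_degrees_of_freedom(n_subsamples: int, n_items: int) -> int:
    """Degrees of freedom (p - 1) x (k - 1) for p sub-samples and k items."""
    return (n_subsamples - 1) * (n_items - 1)


# Hypothetical log-likelihoods for one total sample split into two sub-samples.
statistic = andersen_lr_statistic(-1200.0, [-590.0, -598.5])
df = andersen_degrees_of_freedom(n_subsamples=2, n_items=28)
print(statistic, df)  # 23.0 27
```

The statistic is then compared with the χ² distribution with `df` degrees of freedom; a small value relative to `df` indicates that splitting the sample does not improve the fit appreciably.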


3.2.4 Analysis in accordance with the Rasch model

Rasch model analyses for the complete item pool

The model tests used examine the Rasch model's requirement of subgroup invariance. The
assumption of conformity with the Rasch model was tested by means of the likelihood quotient
tests according to Andersen (1973) using the classical splitting criteria of raw score, gender,
age, and educational level. The type I error level was set a priori to 0.05 and adjusted for
alpha error cumulation across multiple significance tests (Koller, Alexandrowicz & Hatzinger,
2012) using the Bonferroni-Holm correction (Holm, 1979). The calculations were carried out
separately for the five item sets by means of the software LpcmWin 1.0 (Fischer & Ponocny-
Seliger, 1998) and using the conditional maximum likelihood (CML) algorithm. An incomplete
Rasch model for link designs was also calculated. In addition, the authors used the Martin-Löf
statistic (Martin-Löf, 1973) to investigate whether the new items and the link items from the
TAVTMB measure the same latent ability dimension (Rost, 2004). The results are shown
in Table 2.

Table 2: Item set, splitting criteria, sample size and goodness of fit statistics

Item set  Splitting criterion          N    χ²       df   p¹
Set 1     Internal                     170  29.300   27   0.347
          Gender                       170  35.541   27   0.126
          Age                          170  35.694   27   0.126
          Education                    170  34.761   27   0.145
          Martin-Löf (TAVTMB-ATAVT)    170  111.029  146  0.986
Set 2     Internal                     154  13.407   22   0.921
          Gender                       154  37.129   22   0.023
          Age                          154  22.725   22   0.417
          Education                    154  13.526   22   0.917
          Martin-Löf (TAVTMB-ATAVT)    154  95.273   111  0.219
Set 3     Internal                     159  30.970   23   0.124
          Gender                       159  38.907   23   0.020
          Age                          159  41.658   23   0.011
          Education                    159  24.169   23   0.394
          Martin-Löf (TAVTMB-ATAVT)    159  121.379  118  0.397
Set 4     Internal                     150  19.982   20   0.459
          Gender                       150  24.946   20   0.203
          Age                          150  18.585   20   0.549
          Education                    150  12.865   20   0.883
          Martin-Löf (TAVTMB-ATAVT)    150  88.026   97   0.731
Set 5     Internal                     146  25.003   22   0.297
          Gender                       146  33.100   22   0.060
          Age                          146  49.641   22   0.001
          Education                    146  25.127   22   0.291
          Martin-Löf (TAVTMB-ATAVT)    146  95.938   111  0.845
Total     Internal                     779  117.787  90   0.026
          Gender                       779  112.616  90   0.054
          Age                          779  121.082  90   0.016
          Education                    779  96.973   90   0.289

¹ The adjusted critical p scores are: 0.01, 0.0125, 0.0167, 0.025, and 0.05
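
The adjusted critical values in the table footnote correspond to the step-down levels of the Bonferroni-Holm procedure for five tests at α = 0.05. The following sketch reproduces them and applies the procedure to the p values reported for item set 5. It assumes that the five tests within one item set form the family of tests being corrected; the function names are illustrative only.

```python
def holm_critical_levels(alpha: float, n_tests: int) -> list[float]:
    """Step-down critical levels alpha/m, alpha/(m-1), ..., alpha/1."""
    return [alpha / (n_tests - rank) for rank in range(n_tests)]


def holm_rejections(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Labels of the tests that are significant under the Bonferroni-Holm procedure."""
    levels = holm_critical_levels(alpha, len(p_values))
    ordered = sorted(p_values.items(), key=lambda pair: pair[1])
    rejected = []
    for (label, p), level in zip(ordered, levels):
        if p > level:
            break  # step-down rule: stop at the first non-significant test
        rejected.append(label)
    return rejected


# p values for item set 5 as reported in Table 2.
set_5 = {"Internal": 0.297, "Gender": 0.060, "Age": 0.001,
         "Education": 0.291, "Martin-Löf": 0.845}

print([round(level, 4) for level in holm_critical_levels(0.05, 5)])
# [0.01, 0.0125, 0.0167, 0.025, 0.05]
print(holm_rejections(set_5))
# ['Age']
```
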

As Table 2 shows, the model test for age in the fifth item set is the only one to be equal to or less
than the critical significance level of α=0.01. However, use of the Rasch model for the
incomplete full data set showed that none of the splitting criteria were conspicuous; in
consequence, no individual items were eliminated from the pool. Overall, therefore, the
assumption of unidimensionality for this item pool can be retained. This justifies the assumption
that the parameter estimates can be generalized over the subsamples studied for the entire
item pool. The assumption of person homogeneity (Rost, 2004) can therefore be regarded as
met. With regard to the assumption of item homogeneity (Rost, 2004) the Martin-Löf test did
not reach the critical significance level (see Table 2). This means that the newly designed items
of the ATAVT and the link items from the TAVTMB measure the same latent ability dimensions.
Findings on the criterion validity of the TAVTMB can therefore be transferred to the ATAVT.

Analysis for linear forms S3 and S4

The fit of the Rasch model for the two linear test forms S3 and S4 was studied in a small sample
of 135 people. The data were collected in 2015 in SCHUHFRIED’s Test and Research Center.
The sample consists of 70 women and 65 men aged 17 to 80 years (M = 45.3; SD = 16.3). A
total of 14 individuals have completed compulsory schooling or an intermediate secondary
school but without completing vocational training, 65 individuals have completed vocational
training or a course at a technical college, 37 individuals have a school-leaving qualification at
university entrance level and 19 persons have a university degree. Because of the small sample size,
the model tests from Ponocny (2001) were used to assess model fit. The type I error
level was set a priori to 0.05 and adjusted for alpha error cumulation across multiple
significance tests (Koller et al., 2012) using the Bonferroni-Holm correction (Holm, 1979). The
results of this analysis are shown in Table 3.

Table 3: Results of model tests based on Ponocny (2001) for ATAVT/S3 and S4

Model statistics                     Simulated data sets²  Model damage recorded                                    p¹
T11                                  500                   Local dependency on items                                0.192
T10 – Internal splitting criterion   500                   Different item difficulty for high and low performers    0.7
T10 - partial criterion Gender       500                   Different item difficulty across genders                 0.858
T10 - partial criterion Age          500                   Different item difficulty across age groups              0.086
T10 - partial criterion Education    500                   Different item difficulty across educational levels      0.146

¹ The adjusted critical p scores are: 0.01, 0.0125, 0.0167, 0.025, and 0.05.
² Number of simulated datasets on whose basis the test statistics were calculated.

Overall, the present results show a good fit for the Rasch model for ATAVT/S3 and S4.

Analysis for linear forms S5 and S6

The fit of the Rasch model for the two linear test forms S5 and S6 was studied in a small sample
of 127 people. The data were collected in 2015 and 2019 in SCHUHFRIED’s Test and Research
Center. The sample consists of 74 women and 53 men aged between 17 and 83 (M=50.68; SD=19.6). A
total of 13 individuals have completed compulsory education or an intermediate secondary
school but without any subsequent vocational training, 50 individuals have completed
vocational training or upper secondary technical school, 48 individuals have a school-leaving
qualification at university entrance level or a post-secondary technical qualification and 16
have a university degree. Because of the small sample size, the model tests from Ponocny (2001)
were used to assess model fit. The type I error level was set a priori to 0.05 and adjusted
for alpha error cumulation across multiple significance tests (Koller et al., 2012) using
the Bonferroni-Holm correction (Holm, 1979). The results of this analysis are shown in Table
4.


Table 4: Results of model tests based on Ponocny (2001) for ATAVT/S5 and S6

Model statistics                     Simulated data sets²  Model damage recorded                                    p¹
T11                                  500                   Local dependency on items                                0.078
T10 – Internal splitting criterion   500                   Different item difficulty for high and low performers    0.034
T10 - partial criterion Gender       500                   Different item difficulty across genders                 0.97
T10 - partial criterion Age          500                   Different item difficulty across age groups              0.02
T10 - partial criterion Education    500                   Different item difficulty across educational levels      0.44

¹ The adjusted critical p scores are: 0.01, 0.0125, 0.0167, 0.025, and 0.05.
² Number of simulated datasets on whose basis the test statistics were calculated.

Overall, the present results show a good fit for the Rasch model for ATAVT/S5 and S6.

In summary, these analyses show that both the adaptive test forms S1/S2 and also the linear
test forms S3-S6 based on this test are Rasch-homogeneous and can therefore measure
unidimensionally.

3.2.5 Equivalence of TAVTMB and ATAVT

A fundamental objective of the test design for the ATAVT consisted of achieving the highest
possible degree of equivalence between the ATAVT and the TAVTMB. For this reason, a significant
portion of the item development is based on the construction rationale of the TAVTMB. The present
section offers a systematic comparison between the two tests.

Dimensionality
In order to examine whether the items in both tests measure the same latent ability dimensions,
the TAVTMB items were included with the newly created ATAVT items in the investigation of
the unidimensionality of the ATAVT item pool. If the Rasch model (Rasch, 1980) applies for
the joint item pool for the TAVTMB and the ATAVT, then it can be assumed that both tests
assess the same latent ability dimension. Sommer et al. (2008) tested this hypothesis
empirically using a link design (see chapter 3.2.4 for details). The authors were able to
demonstrate that the combined item pool is Rasch-homogeneous. This indicates that the two
tests measure the same ability dimension.
By using a link design, the difficulty parameters of the ATAVT tasks could be adjusted to the
item difficulty parameters of the TAVTMB (cf. Holland & Dorans, 2006; Kolen & Brennan,
2004). This brings with it the advantage that the person parameters of the ATAVT can be
transferred directly to the TAVTMB via a linear transformation, which means that the test
results of the ATAVT can be transferred to the TAVTMB and vice versa. The two tests are thus
directly comparable and interchangeable with regard to both the ability dimension that is
measured and the performance achieved by respondents in each.

Comparing the difficulty of items from the TAVTMB and ATAVT

Based on the validity of the Rasch model (Rasch, 1980) for the items from both tests, relevant
distribution characteristics for the item difficulties for both tests can also be directly compared.
A t-test here, for example, showed that the ATAVT items and the TAVTMB items are equally
difficult on average (Levene’s test: F=0.149, p=0.70; t[102]=0.123, p=0.902, Cohen’s d=-0.031).
Both tests differ, however, in the range of difficulties that can be covered with both item
pools. The distribution characteristics of both item pools are summarized in Table 5.

Table 5: Descriptive statistics of the ATAVT and TAVTMB item pools

Item pool                 Mean    Median  SD     Min     Max    Skew    Kurtosis
ATAVT item pool (k=84)     0.000   0.219  1.891  -4.492  3.634  -0.418  -0.266
TAVTMB item pool (k=20)   -0.057  -1.119  1.717  -2.877  2.657   0.122  -1.644
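
The reported t-test can be reproduced approximately from the means, standard deviations and item counts in Table 5. The sketch below assumes a pooled-variance t-test for independent samples, which is consistent with the reported 102 degrees of freedom; the function name is illustrative. The sign of the effect size depends only on the order in which the two means are subtracted.

```python
import math


def pooled_t_and_d(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> tuple[float, float, int]:
    """Pooled-variance t statistic, Cohen's d and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    t = d / math.sqrt(1.0 / n1 + 1.0 / n2)
    return t, d, df


# Summary statistics of the two item pools from Table 5.
t, d, df = pooled_t_and_d(0.000, 1.891, 84, -0.057, 1.717, 20)
print(round(t, 3), round(d, 3), df)  # 0.123 0.031 102
```
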

Despite nearly the same mean and standard deviation, the ATAVT item pool covers a wider range of
difficulty. This means that sufficient items are available in the lower and upper ability ranges of the
ATAVT to provide reliable measurement of the respondents’ latent ability.

Comparison of person parameters from the ATAVT and TAVTMB

Because the ATAVT and TAVTMB items were calibrated jointly, the person parameters obtained in
the two tests can also be compared directly. The current norm sample of each test was used for
this comparison. However, since the two norm samples differed significantly
in mean age and in the distribution of age (Levene: F=8.639, p=0.003;
T[1187.973]=3.008, p=0.003, Cohen’s d=0.173), a random sample parallelized for age, gender
and education was taken from the two sets of norm data. In the parallelized norm samples,
the variance of the person parameters differed between the two tests
(Levene: F=102.001, p<0.001), whereas the means did not differ significantly
(T[323.904]=1.391, p=0.165). In addition, no difference was found in the probability of reaching
or exceeding the test-specific critical boundary of PR=16 (Chi²[1]=2.377, p=0.123, CC=0.071).
Table 6 shows the descriptive statistics for the person parameter in the norm sample.

Table 6: Descriptive statistics for the ATAVT and TAVTMB person parameter

Test     Mean    Median  SD     Min    Max   Skew    Kurtosis
ATAVT    0.198   0.284   0.683  -2.70  2.78  -0.512  1.234
TAVTMB   0.4196  0.2396  1.369  -3.14  4.00   0.117  0.035

An inspection of the descriptive statistics shows that the range of the person parameters for
the TAVTMB is generally greater than for the ATAVT. This results in a correspondingly
greater dispersion of the person parameter for the TAVTMB. This result is interesting insofar
as the TAVTMB item pool covers a narrower difficulty range than the ATAVT item pool.

Comparison of measurement precision

Due to the above-described results, a comparison of the precision of the person parameter
estimation between the two tests (TAVTMB vs. ATAVT) was carried out. Simulation
studies were performed for this purpose. The advantage of a simulation study is that the true
person parameters are known – in contrast to the empirical comparison studies – so that the
precision of the parameter estimate can be investigated. As a basic principle, a sample of 1000
people was simulated for each test for which person parameters are normally distributed. The
mean value for this distribution is 0.228 with a standard deviation of 1.083. Both of these
indicators correspond to the relevant indicator for the distribution of the estimated person
parameter in the norm sample for the ATAVT. In the simulated testing with the TAVTMB, all
20 items were predetermined and then the person parameter estimated using the joint
maximum likelihood method. For the ATAVT, typical adaptive testing was simulated and the
person parameter similarly estimated using the joint maximum likelihood method. Unlike the
real test applications of the ATAVT, no item exposure control was used for these simulated
studies. The person parameter estimated this way was compared in a next step with the real
person parameter that was used as a basis for the simulation. To do so, a Pearson correlation
was calculated between the real person parameters and the person parameters estimated as
part of the simulation. In the ideal situation, these correlations should be close to 1.0. For
ATAVT, there was a correlation of 0.915, while with the TAVTMB a correlation of 0.867 was
calculated. Figure 3 shows the scatter plot for the relationship between the real person
parameter and the person parameter estimated for ATAVT (left) and TAVTMB (right) on the
basis of the test performance.
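
The logic of such a simulation for a fixed 20-item test can be sketched as follows. This is a simplified illustration, not the simulation actually used: the item difficulties are hypothetical (evenly spaced between -3 and 3), the person parameter is estimated by a plain Newton-Raphson maximum likelihood routine, and estimates are clamped to ±4 so that all-correct and all-incorrect response patterns remain finite. The correlation obtained therefore does not reproduce the values reported above.

```python
import math
import random


def p_solve(theta: float, beta: float) -> float:
    """Rasch probability of solving an item of difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def ml_estimate(responses: list[int], betas: list[float], bound: float = 4.0) -> float:
    """Newton-Raphson ML estimate of theta, clamped to [-bound, bound]."""
    theta = 0.0
    for _ in range(50):
        probs = [p_solve(theta, b) for b in betas]
        gradient = sum(x - p for x, p in zip(responses, probs))
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta = max(-bound, min(bound, theta + step))
        if abs(step) < 1e-6:
            break
    return theta


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
betas = [-3.0 + 6.0 * i / 19 for i in range(20)]  # hypothetical item difficulties
true_thetas = [rng.gauss(0.228, 1.083) for _ in range(1000)]  # distribution from the text

estimates = []
for theta in true_thetas:
    # Simulate one response per item from the Rasch model, then re-estimate theta.
    responses = [1 if rng.random() < p_solve(theta, b) else 0 for b in betas]
    estimates.append(ml_estimate(responses, betas))

r = pearson(true_thetas, estimates)
print(round(r, 3))
```

Because the true person parameters are generated by the program, the correlation `r` directly expresses how well the test recovers them.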

Figure 3: Scatter plot of the relationship between the real person parameter and the person
parameter estimated by the test for ATAVT/S1 (left) and TAVTMB/S1 (right).

As Figure 3 shows, the precision of the person parameter estimate in ATAVT is noticeably
higher. This difference in precision between ATAVT and
TAVTMB is consistent with the previously reported empirical results from the comparison of the
person parameter estimates from ATAVT and TAVTMB. From a content perspective, these
results mean that a person's true ability can be measured more precisely with the ATAVT.

Generalization of the proof of validity across both tests

In addition, an advantage of the fact that ATAVT and TAVTMB both conform to the Rasch
model is that studies of the validity of one of the tests can be transferred directly to the other
test. ATAVT and TAVTMB are thus equivalent not only in terms of content but also
psychometrically. However, the major advantages of the ATAVT over the TAVTMB consist in
(1) the greater precision of measurement at single-case level that arises from the adaptive
mode of presentation and (2) the higher level of test security that can be achieved through
adaptive presentation and item exposure control (see chapter 3.3.5).


3.3 Adaptive testing

3.3.1 Adaptive versus linear presentation
For the ATAVT a selection can be made between linear (S3-S6) and adaptive (S1 & S2) test
forms. For the linear test forms, the items and their order of presentation are fixed; that is, every
respondent receives the same items. The basic principle of adaptive testing is that the test is
adapted to the ability level of the person being tested. Binet and Simon adopted this approach
as far back as 1916 when they designed series of intelligence tests that were graduated
according to age (Binet & Simon, 1916). However, test theory principles for comparing the
performance of two subjects who work different items were not developed until 1960, when
Georg Rasch drew up the Rasch model.

Adaptive presentation requires powerful computers for successful implementation. Computers
can carry out the complex computation involved in estimating the subject’s level of
performance after each item worked and can select as the next item the one most closely
suited to this ability level. The advantages of adaptive testing are:
• The level of challenge that an item will present for a respondent can be estimated
before the item is actually worked. This makes it possible to structure the test in a way
that is optimally challenging for each respondent.
• Each item provides the maximum possible amount of information about the
respondent’s ability. There is therefore an optimally economic relationship between test
length and precision of measurement. More accurate results are obtained with a
smaller number of items than is the case with traditional tests that have a fixed item
sequence.
• Test security is maintained by the adaptive presentation since almost no respondents
receive the items in the same order.

There are two opposing views regarding the motivational impacts of adaptive testing. Some older
studies (Betz & Weiss, 1976a, 1976b) postulate that adaptive testing increases test-taking
motivation for weaker individuals. They explained their findings with the fact that, thanks to a
selection of tasks appropriate to their performance, weaker individuals are not overchallenged
and high-achieving individuals are not underchallenged. However, more recent studies (see e.g.
Frey, Hartig & Moosbrugger, 2009; Tonidandel, Quiñones & Adams, 2002) have shown that
presenting items with a medium individual solution probability, as is usually done in adaptive
tests, does not in fact result in particularly high motivation when working the test. This is
because most respondents find it frustrating to solve only 50% of the tasks on average, provided
they perceive their success or failure during testing at all. Therefore, linear
test forms are also available for the ATAVT.

Since adaptive tests are constructed on a probabilistic test model such as the Rasch model
(1980), at any point in testing the amount of information already gathered is known. The
adaptive test can therefore be continued for each respondent until a pre-set target reliability
has been reached. The adaptive test therefore measures all respondents with equal reliability.
Traditional tests, by contrast, achieve a certain reliability on average, but this reliability can be
significantly reduced in individual cases, particularly for respondents whose performance is
either very strong or very weak.


3.3.2 The process of adaptive test presentation

The construction of the ATAVT and the practical administration of the test are based on the
Rasch model (see chapter 3.2.3). Once the item difficulties have been estimated according to the
Rasch model and are therefore known, the test can be presented adaptively. Many variations of the
design of the test algorithm are possible (van der Linden & Glas, 2010; see e.g. Rettig & Hornke,
1990). The procedure currently implemented in the ATAVT is described here.

INITIAL PHASE: The first item presented is normally one of medium difficulty, since at this
stage information about the respondent’s ability is normally not available. An item that is
generally of medium difficulty usually represents the best challenge for the respondent.
Depending on the response, a more difficult or easier task is presented. The response given
is then recorded and the preliminary person parameter is estimated by way of the weighted
maximum likelihood method (Warm, 1989).

In the highly unlikely event of the respondent having given only incorrect or only correct
answers after ten items, the test is terminated and the last estimated person parameter is
output. Otherwise, the test branches to the main phase.

MAIN PHASE: After each further item the individual’s ability θ is estimated by the maximum
likelihood method (ML) from all the k answers given up to that point. This is done by maximizing
the likelihood function for θ (see Warm, 1989).

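$$\sum_{i=1}^{k} \bigl( x_i - P_i(\theta) \bigr) + \frac{J(\theta)}{2\,I(\theta)} = 0$$

Here x_i is the scored response to the i-th item presented (1 = solved, 0 = not solved), P_i(θ) is
the solution probability of that item according to the Rasch model, I(θ) = Σ P_i(θ)(1 − P_i(θ)) is
the test information, and J(θ) = Σ P_i(θ)(1 − P_i(θ))(1 − 2P_i(θ)); the second term is the
weighting correction described by Warm (1989).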

The technical details on this maximization are described in Baker and Kim (2004) and
elsewhere. The standard error of measurement (SEM) is also calculated for this ML estimate
of the person parameter θ.
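
For orientation, the standard error of an ML estimate under the Rasch model is commonly obtained from the test information. The following relationship is the standard textbook result and is given here as a sketch; it is not taken from the ATAVT implementation. P_i(θ) denotes the Rasch solution probability of the i-th of the k items presented so far.

```latex
I(\theta) = \sum_{i=1}^{k} P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr),
\qquad
\mathrm{SEM}(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}
```

An item contributes the most information, 0.25, when its solution probability is 0.5, that is when its difficulty matches the person's ability.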

The STOP RULES described below are then checked. If none of these rules apply, the next
item is selected and presented to the respondent.

As task k+1, the respondent is presented with the not-yet-used task whose difficulty
parameter βk+1 has the smallest absolute distance |βk+1 - θk| to the
person ability θk estimated after k responses. After k+1 worked items, the person
parameter θk+1 is estimated or updated according to the ML method and, when necessary, this
procedure is repeated until a stop criterion is fulfilled. Respondents therefore work different
tasks, and differing numbers of tasks, adapted to their level.
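
The selection rule described above can be sketched in a few lines. The function name, the representation of the item pool as a list of difficulties, and the example values are assumptions made for illustration.

```python
def select_next_item(theta_hat: float, difficulties: list[float], used: set[int]) -> int:
    """Index of the unused item whose difficulty is closest to the current ability estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    return min(candidates, key=lambda i: abs(difficulties[i] - theta_hat))


# Hypothetical pool of five items; item 2 has already been presented.
pool = [-1.5, -0.5, 0.0, 0.4, 1.2]
print(select_next_item(0.3, pool, used={2}))  # 3
```
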


Figure 4: Test algorithm for the ATAVT based on the Rasch Model

STOP RULES: The test is terminated as soon as the first of the following criteria
applies (see Table 7):

Table 7: Stop criteria

No.  Reason for termination
1    Maximum number of successive correct answers (MAXCORR = 10) achieved
2    Maximum number of successive incorrect answers (MAXFALSE = 10) achieved
3    A maximum number of items (MAXITEMS = 30) has been exceeded (independent of measurement error)
4    There are no additional items available in an acceptable environment of θ (minimum item information 0.018)
5    The precision requirements for the measurement (in accordance with the test form) were met.

Tasks used once are registered and not used again for the respective respondent.
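
The stop criteria in Table 7 can be expressed as a single check that is run after each item. The sketch below makes several assumptions that are not specified in the text: the criteria are checked in the order of the table, "successive" answers are counted as the run at the end of the response sequence, the item limit is treated as reached once 30 items have been presented, and the precision requirement is represented by a hypothetical parameter `critical_sem`.

```python
from typing import Optional

MAXCORR = 10                   # successive correct answers
MAXFALSE = 10                  # successive incorrect answers
MAXITEMS = 30                  # maximum number of items
MIN_ITEM_INFORMATION = 0.018   # minimum information of a remaining item


def trailing_run(responses: list[int], value: int) -> int:
    """Length of the run of `value` at the end of the response sequence."""
    run = 0
    for response in reversed(responses):
        if response != value:
            break
        run += 1
    return run


def stop_reason(responses: list[int], sem: float, critical_sem: float,
                best_remaining_information: Optional[float]) -> Optional[int]:
    """Number of the first applicable stop criterion from Table 7, or None to continue."""
    if trailing_run(responses, 1) >= MAXCORR:
        return 1
    if trailing_run(responses, 0) >= MAXFALSE:
        return 2
    if len(responses) >= MAXITEMS:
        return 3
    if best_remaining_information is None or best_remaining_information < MIN_ITEM_INFORMATION:
        return 4
    if sem <= critical_sem:  # critical_sem is a hypothetical, form-specific threshold
        return 5
    return None


print(stop_reason([1] * 10, sem=0.9, critical_sem=0.5, best_remaining_information=0.2))   # 1
print(stop_reason([1, 0, 1], sem=0.4, critical_sem=0.5, best_remaining_information=0.2))  # 5
print(stop_reason([1, 0, 1], sem=0.9, critical_sem=0.5, best_remaining_information=0.2))  # None
```
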

3.3.3 Sociodemographically determined start phase

The first items with which a respondent is confronted can have a particularly strong impact on
test motivation. In order to maximize the reasonableness of the ATAVT, different start items
are selected for different respondents on the basis of their sociodemographic characteristics.
The start item chosen will be one which is most likely to be suitable for an individual with those
particular sociodemographic traits (see Table 8).


Table 8: Start parameters for different sociodemographic groups. The start items are selected to have a solution
probability of 75% for the average individual in the relevant group.

Educational group  Age                 Item parameters of the start tasks
EU 1 - EU 3        up to 39 years       0.00
EU 1 - EU 3        40 to 59 years      -0.53
EU 1 - EU 3        60 years and older  -1.39
EU 4 - EU 5        up to 39 years       0.36
EU 4 - EU 5        40 to 59 years      -0.16
EU 4 - EU 5        60 years and older  -0.80

Because the ATAVT is sufficiently long, the differential start to the test, while it improves both
the test’s fairness and its reasonableness, has no effect on the person parameter that is to be
estimated.
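
The start parameters in Table 8 amount to a simple lookup by educational group and age. The following sketch assumes that the educational level is available as an integer from 1 to 5 (EU 1 to EU 5); the function names and age-band labels are illustrative. The helper `implied_group_ability` shows which average group ability a 75% solution probability implies under the Rasch model.

```python
import math

# Item difficulties of the start tasks from Table 8.
START_DIFFICULTY = {
    ("EU 1 - EU 3", "up to 39"): 0.00,
    ("EU 1 - EU 3", "40 to 59"): -0.53,
    ("EU 1 - EU 3", "60 and older"): -1.39,
    ("EU 4 - EU 5", "up to 39"): 0.36,
    ("EU 4 - EU 5", "40 to 59"): -0.16,
    ("EU 4 - EU 5", "60 and older"): -0.80,
}


def age_band(age: int) -> str:
    if age <= 39:
        return "up to 39"
    if age <= 59:
        return "40 to 59"
    return "60 and older"


def start_difficulty(education_level: int, age: int) -> float:
    """Difficulty of the start item for a respondent (education_level assumed to be 1..5)."""
    group = "EU 1 - EU 3" if education_level <= 3 else "EU 4 - EU 5"
    return START_DIFFICULTY[(group, age_band(age))]


def implied_group_ability(beta: float, p: float = 0.75) -> float:
    """Ability at which an item of difficulty beta is solved with probability p."""
    return beta + math.log(p / (1.0 - p))


print(start_difficulty(2, 45))                  # -0.53
print(round(implied_group_ability(-0.53), 2))   # 0.57
```
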

3.3.4 Maintaining the pre-set reliability for adaptive tests

An important advantage of adaptive testing is the fact that it enables a desired level of reliability
to be achieved for each subject. The higher the requirements on reliability, the more items are
needed.

The following case study illustrates the process of adaptive testing. It involves a fictitious
subject from a simulation study. From this simulation the true ability parameter (ξ=1.0) is
known. The trend of the ability parameter estimates shown in Figure 5 clearly shows that they
converge to a limiting value. The outlying black lines indicate the confidence interval on either
side of the performance parameter. As each item is completed the confidence interval
becomes narrower as the estimate becomes more accurate.
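
The narrowing of the confidence interval can be illustrated with a simplified calculation. The sketch assumes the idealized case in which every item presented has a solution probability of 0.5 for the respondent, so that each item contributes the maximum Rasch information of 0.25; real tests will yield somewhat wider intervals. The interval uses the normal quantile 1.96 for a 5% error probability, and the function names are illustrative.

```python
import math


def sem_after(n_items: int, p: float = 0.5) -> float:
    """Standard error after n_items items that are each solved with probability p."""
    return 1.0 / math.sqrt(n_items * p * (1.0 - p))


def confidence_interval(theta_hat: float, sem: float, z: float = 1.96) -> tuple[float, float]:
    """Two-sided confidence interval around the ability estimate."""
    return theta_hat - z * sem, theta_hat + z * sem


for n_items in (5, 10, 20):
    lower, upper = confidence_interval(1.0, sem_after(n_items))
    print(n_items, round(lower, 2), round(upper, 2))
```

Under this assumption the standard error equals 2/√k after k items, so quadrupling the number of items halves the width of the interval.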


Figure 5: Log of the adaptive test of a simulation study. The broken lines mark the boundaries of the
confidence interval (5% error probability) of the person parameter estimate.

It is clear from simulation studies of the ATAVT that differing numbers of items are required to
achieve a particular reliability at different scale points (i.e. for different ability parameters) (see
Figure 6).

Figure 6 shows the reliability and the associated number of items required for the ATAVT at
different scale points (i.e. for different ability parameter estimates). The mean reliability is also
given. The test parameters were measured in a simulation study (n=5000) that assumed a
normal distribution of ability.


Figure 6: Test reliability (y-axis, left-hand graph) for fixed test length (k=20)
and test length (y-axis of right-hand graph) for pre-set target reliability (α=0.80)
in the ATAVT over different ability levels (x-axis).

This yielded a reliability of α=0.80 for a fixed number of items of k=20. At the particularly high
end of the ability range (from PR=98) a fall in the reliability obtained can be observed (see
Figure 6 left). For use in traffic psychological assessment, however, this particularly high
performance range is of little relevance. The right-hand graph in Figure 6 shows that a larger
number of items is required to maintain reliability at either end of the ability spectrum. In
summary these results show that the test has adequate reliability and that this reliability is
maintained over a broad ability spectrum.

3.3.5 Item Exposure Control

The start of testing for adaptive tests is completely deterministic. If the test always begins with
the same item, there are only two possible items which can be presented next, four possible
items in the third position, and so on.

This means that some items will be presented much more frequently than others. This concerns
in particular those tasks in the medium difficulty range, which are of crucial importance for the
item pool. The items in the starting phase in particular can therefore become public
disproportionately fast.

In order to prevent possible overexposure of certain items at the start of the test, a method
from Kingsbury and Zara (1989) is used during adaptive testing. In this method, the person
parameter is estimated first and then the five tasks with maximum item information are identified,
that is the items which have the smallest distance between the estimated person parameter and the
difficulty of the tasks. Out of the five resulting items, one is selected at random and presented
to the respondent. This turns the initially strictly deterministic course, which is determined by
the respondent's correct and incorrect responses, into a probabilistic one. This reduces the risk
of a respondent practicing a route through the entire test that leads to a more advantageous
result and learning it by heart, as in coached faking.
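
The random selection among the five best-suited items can be sketched as follows. The function name, the list representation of the item pool and the example difficulties are assumptions for illustration; the random number generator is passed in explicitly so that the behaviour can be reproduced.

```python
import random


def randomesque_next_item(theta_hat: float, difficulties: list[float], used: set[int],
                          rng: random.Random, group_size: int = 5) -> int:
    """Pick at random one of the `group_size` unused items closest to the ability estimate."""
    candidates = [i for i in range(len(difficulties)) if i not in used]
    closest = sorted(candidates, key=lambda i: abs(difficulties[i] - theta_hat))[:group_size]
    return rng.choice(closest)


# Hypothetical pool of ten items; the five closest to theta = 0.0 are items 2 to 6.
pool = [-2.0, -1.5, -1.0, -0.5, -0.1, 0.2, 0.6, 1.1, 1.6, 2.1]
rng = random.Random(1)
picks = {randomesque_next_item(0.0, pool, set(), rng) for _ in range(200)}
print(sorted(picks))
```

Two respondents with the same response history can thus receive different next items, which spreads the exposure over several items of similar difficulty.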


3.4 Test forms

The ATAVT consists of six test forms: Test form S1 is designed for adaptive presentation in
countries with right-hand traffic and test form S2 for adaptive presentation in countries with
left-hand traffic. The greatest advantage of test forms S1 and S2 is the increased test
security due to the adaptive presentation of the items. A further advantage of adaptive
testing is that the ratio between test length and reliability is optimal: more precise results
are achieved with fewer but individually adapted items.

In addition, there are four linear test forms (S3, S4, S5 & S6). Test forms S3 and S5 are
designed for countries with right-hand traffic. Test forms S4 and S6 are designed in the same
way as S3 and S5 but are intended for countries that drive on the left. The linear test forms
S3, S4, S5 and S6 were developed especially for the lower ability range. Due to the linear
presentation, they have lower test security but they also have a shorter test duration.

The test duration of test forms S1 and S2 is approx. 14 minutes, the test duration of S3, S4,
S5 and S6 is approx. 8 minutes.

3.5 Description of variables

The output variables are structured into main variables, subsidiary variables, and additional
information. The main variable describes the operationalization of the theoretical construct
described in chapter 3 and can therefore be interpreted as a measure for obtaining an
overview. The subsidiary variables describe additional information of the operationalized
construct relevant for particular assessment situations and allow for detailed interpretation.
The section additional information includes variables that provide supplementary indicators on
test processing, such as the working time of the test. In contrast to the main and subsidiary
variables, no quality criteria are tested for these variables. The interpretation information on the individual
variables is available in chapter 7.2.

Main variable

Obtaining an overview
This consists of an ability parameter estimated according to the Rasch model for the ability to
obtain an overview as part of attentional control.
The ability parameter is calculated from the item difficulties and whether the items presented
are solved or answered incorrectly. An item is classed as solved only if the answer given
includes all the required object classes.

Additional information

Working time
This variable indicates the working time of the subtest in minutes and seconds.


4 EVALUATION
4.1 Objectivity
A test is objective if the test results are independent of the investigator. A distinction is made
between administration objectivity, scoring objectivity, and interpretation objectivity (Ziegler &
Bühner, 2012).

Administration objectivity
Administration objectivity is given if the result is independent of the test administrator and the
testing situation. Since administration of the ATAVT is computerized, all respondents receive
the same information, presented in the same way, about the test. These instructions are
independent of the test supervisor. Similarly, the test presentation is identical for all
respondents. This ensures administration objectivity.

Scoring objectivity
Scoring objectivity is given if the result is independent of the person evaluating the test. Since
the computer records answers, calculates variables, and compares norm-referenced scores,
computational errors are excluded and scoring objectivity is given.

Interpretation objectivity
Interpretation objectivity exists if the same conclusion is drawn from particular test results even
when they are interpreted by different people (Kubinger, 2009). If the test in question has been
normed, it is generally unambiguous in its interpretation: the norm-referenced score and the
confidence interval unequivocally determine the respondent’s “position” within the reference
population with regard to the measured trait. According to this definition the ATAVT has
interpretation objectivity.

4.2 Reliability
The reliability of a test describes the degree of precision with which the aptitude is
measured (Ziegler & Bühner, 2012). Reliability coefficients can take values between 0 and 1:
a higher value reflects greater precision. According to the test appraisal guidelines of the
European Federation of Psychologists' Associations (EFPA, 2013) from July 2013, reliability
values over 0.7 are considered adequate, over 0.8 are considered good, and over 0.9 are
excellent.

Internal consistency of test forms S1 and S2

Reliability in the sense of internal consistency is determined by the user’s choice of test
form and thus of the critical standard error of measurement. Two adaptive test forms (test
forms S1 and S2) are currently offered as standard, based on a critical standard error of
measurement (SEM) of 0.49 which corresponds to a reliability of 0.80. Adaptive testing
ensures that each respondent is assessed with the same reliability. The findings reported in
chapter 3.3.4 show that this intended reliability is maintained throughout the ability continuum.
A fall in the intended reliability is observable only at a particularly high ability level (from
PR=98).
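The relationship between a critical standard error of measurement and the resulting reliability can be sketched as follows. The sketch assumes one common convention, namely reliability = true variance / (true variance + SEM²) with the person parameters scaled to a true variance of 1; this convention and the function names are assumptions, since the formula used for the ATAVT is not stated here. Under this assumption an SEM of 0.49 gives a reliability of roughly 0.81, and a reliability of 0.80 corresponds to an SEM of 0.50, which is in line with the values reported above.

```python
import math


def reliability_from_sem(sem, true_variance=1.0):
    """Reliability implied by a standard error of measurement (assumed convention)."""
    return true_variance / (true_variance + sem ** 2)


def sem_from_reliability(reliability, true_variance=1.0):
    """Standard error of measurement implied by a target reliability."""
    return math.sqrt(true_variance * (1.0 - reliability) / reliability)


critical_sem = 0.49  # critical SEM of test forms S1 and S2
implied_reliability = reliability_from_sem(critical_sem)
```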

In order to test the retest reliability and stability of the main variable Obtaining an overview, a
linear precursor version of the ATAVT was administered to a sample of 82 individuals (48%
men, 52% women) in the age range 17 – 78 (M=44; SD=17) immediately after testing and
three months later. The data were collected in 2006 in SCHUHFRIED’s Test and Research
Center. All persons were tested under the supervision of a test administrator and participated in
the study voluntarily, without any consequences attached to their test results. All participants
received compensation for taking part. This yielded a retest reliability
(retesting immediately after the first test) of r=0.76 and a stability (retesting after 3 months) of
r=0.70. The two values lie within the reliability of the linear form that was used on that occasion
(r=0.77) and therefore indicate moderate reliability of the test forms.

Internal consistency of test forms S3 and S4

Using a sample of 135 people, internal consistency was estimated for test forms S3
and S4. A description of this evaluation sample can be found in section 3.2.4. The estimate
yielded a value of 0.726 for Cronbach's alpha.

Internal consistency of test forms S5 and S6

Using a sample of 127 people, internal consistency was estimated for test forms S5
and S6. The described evaluation sample was used for the calculation (see section 3.2.4). The
sample was collected at the Test & Research Center of SCHUHFRIED and consists of 74 women
and 53 men aged between 17 and 83 (M = 50.68, SD = 19.57). 13 individuals have completed
compulsory schooling or an intermediate secondary school but without completing vocational
training, 50 individuals have completed vocational training or a course at a technical college,
48 individuals have a school-leaving qualification at university entrance level and 16 individuals
have a university degree.
This estimate resulted in a value of 0.729 for Cronbach’s alpha.

Reliability
The reliability of a test is composed of the length of the point range or the maximum possible
scale combined with the width of the confidence interval (Sijtsma & Emons, 2011). The relative
confidence interval calculated as a result indicates how well a differentiation can be made with
the test. The lower the relative confidence interval, the higher the reliability.
Due to the adaptive test presentation for test forms S1 and S2, the confidence interval can be
calculated consistently during testing while considering the standard error of measurement of
the current person parameter estimator and diminishes with increasing reliability. For this
reason, the relative confidence interval per test run may change slightly due to the marginal
differences in the final standard error of measurement. Providing a single general relative
confidence interval for adaptive testing is therefore not appropriate. The relative confidence
interval of the ATAVT for the linear test forms (S3-S6) can be seen in Table 9, where the
“point range” is the range of the person parameters in the norm sample.

Table 9: Relative confidence intervals of the linear test forms (S3-S6)

Test form              Cronbach’s alpha   Point range    Width of CI   rCI*

Test forms S3 and S4   0.726              -3.09 - 3.11   2.75          0.443

Test forms S5 and S6   0.729              -3.17 - 2.3    2.45          0.448

* Confidence level of 95%
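The values in Table 9 can be retraced with a short calculation. The sketch below assumes that the relative confidence interval is the width of the confidence interval divided by the length of the point range; this ratio reproduces the tabulated values to within rounding, but the exact computation is an assumption, as is the function name.

```python
def relative_confidence_interval(ci_width, range_low, range_high):
    """Width of the confidence interval relative to the length of the point range."""
    return ci_width / (range_high - range_low)


# Values taken from Table 9.
rci_s3_s4 = relative_confidence_interval(2.75, -3.09, 3.11)
rci_s5_s6 = relative_confidence_interval(2.45, -3.17, 2.3)
```

A smaller ratio means that the confidence interval covers a smaller share of the scale, that is, the test differentiates more finely between respondents.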

4.3 Validity
A test is valid if it actually measures the characteristics that it claims to measure (Ziegler &
Bühner, 2012).

4.3.1 Content validity

Content validity assesses the extent to which a test accurately represents the construct it sets
out to measure.
For the ATAVT, the latent ability to be measured is that of obtaining an overview as part of
controlled attention. This is generally defined as the ability to focus attention on currently
important aspects of a situation (everyday scenario) and to ignore irrelevant aspects. This
ability is addressed in the ATAVT with tachistoscopic presentation of traffic scenes. Within
seconds, the respondent must assess which of five relevant object classes are shown in the
image and which are not. Based on these considerations, it can therefore be assumed that the
ATAVT meets the quality criteria for content validity.

4.3.2 Construct validity

Construct validity involves assessing the extent to which the theoretically expected correlations
between items and external indicators of the same construct (e.g. other tests) are supported
by practical evidence.

Construct validity exists when it can be demonstrated that a test implements particular theory-
based approaches. With regard to construct validity, Embretson (1983) distinguishes between
construct representation and a nomological network.

Evidence for the construct representation of the ATAVT

Construct representation determines what the test score means in terms of content; it is
concerned with the respondent’s cognitive processes while working the items and the extent
to which these processes depend on structural features of the individual items. It must
therefore be demonstrated that particular item characteristics do in fact involve the theoretically
postulated cognitive processes and can therefore explain differences in the item difficulty
parameters. The impact of the item characteristics and the associated cognitive processes can
be empirically tested using explanatory IRT models such as the Linear Logistic Test Model
(LLTM, Fischer, 1974) or multiple regression (e.g., Embretson, 1998, 2002).
Evidence for the construct representation of the ATAVT is provided in a study by Sommer, Herle,
Häusler & Arendasy (2008) in which 363 men (46.6%) and 416 women (53.4%) aged between
15 and 94 (M = 44.69, SD = 17.19) participated. Drawing on the work of Embretson (2002), the
authors used an incremental multiple regression in order to predict the 1PL item difficulty
parameter for the k=84 newly constructed items based on key item characteristics that appear
to be relevant. The following item characteristics were included in the following order: (1)
Number of visible object classes, (2) Proportion of necessary peripheral objects, (3) Proportion
of necessary small objects, (4) Confusion as a result of experience, (5) Poor contour contrast,
(6) Proportion of necessary concealed objects and (7) Proportion of typical objects. The model
summary is shown in Table 10.

Table 10: Model summary of the step-wise multiple regression

Model R R² adj. R² Δ R² ΔF df1 df2 p

1 0.801 0.642 0.637 0.642 146.86 1 82 <0.001
2 0.838 0.703 0.696 0.061 16.68 1 81 <0.001
3 0.861 0.742 0.732 0.039 12.14 1 80 0.001
4 0.880 0.775 0.764 0.033 11.61 1 79 0.001
5 0.887 0.787 0.773 0.012 4.35 1 78 0.040
6 0.895 0.801 0.786 0.014 5.49 1 77 0.022
7 0.901 0.812 0.795 0.011 4.57 1 76 0.036
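The adjusted R² column of Table 10 can be retraced from the R² column with the standard adjustment formula, using the k = 84 items as cases and the number of predictors entered up to each step. The sketch below is only an illustration; the function name is an assumption, and small deviations in the third decimal place are to be expected because the tabulated R² values are rounded.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Standard adjusted R² for a regression with an intercept."""
    return 1.0 - (1.0 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


N_ITEMS = 84  # number of newly constructed items

# R² values per model step as reported in Table 10.
r_squared_by_step = [0.642, 0.703, 0.742, 0.775, 0.787, 0.801, 0.812]

adjusted_by_step = [
    adjusted_r_squared(r2, N_ITEMS, step)
    for step, r2 in enumerate(r_squared_by_step, start=1)
]
```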

The seven theoretically postulated relevant item characteristics (difficulty factors) thus
explain 81.2% of the variance in the 1PL item difficulty parameters. Table 11 shows the
regression weights for the individual difficulty factors of the construction rationale.

Table 11: Regression weights in Model 7 of the multiple regression

Predictor B S.E. β t p
Number of visible object classes 1.061 0.078 0.769 13.640 <0.001
Proportion of necessary peripheral objects 1.366 0.340 0.206 4.020 <0.001
Proportion of necessary small objects 1.089 0.345 0.177 3.152 0.002
Confusion as a result of experience 0.880 0.274 0.164 3.213 0.002
Poor contour contrast 0.721 0.277 0.139 2.604 0.011
Proportion of necessary concealed objects 1.984 0.655 0.169 3.029 0.003
Proportion of typical objects 0.769 0.360 0.132 2.137 0.036

As Table 11 makes clear, all the theoretically postulated relevant item characteristics (difficulty
factors) make a significant contribution to prediction of the 1PL item difficulty parameters.
However, inspection of the β weights shows that the “Number of visible object classes” makes
the largest contribution to prediction of the item difficulty parameters.

Convergent and discriminant validity

Convergent and discriminant validity address the extent to which a test
correlates with other tests that aim to cover the same or a similar validity area (convergent)
or a different one (discriminant) (Bühner, 2011). Based on the CHC model (Schneider & McGrew,
2018), the convergent and discriminant validity of the ATAVT was investigated together with a
number of further tests that are frequently used in traffic psychological assessment in
German-speaking countries and internationally. For this purpose, a nomological network
based on the CHC model was defined and tested for the tests used in the study (Figure
7).

Figure 7: Nomological network of the test battery according to the CHC model

Several considerations guided the selection of the tests for the battery.
Firstly, only tests with sufficient reliability were used (reliability of
the main variable ≥ 0.7) to prevent the correlations between the
investigated test (ATAVT) and the construct-related or non-construct-related tests from being
artificially attenuated by low reliability. Secondly, the tests were selected at a similar level
of generality to ensure sufficient symmetry. Limited symmetry would be expected when the
correlated tests measure characteristics of different breadth (Bühner, 2011), for example when
one test measures only a subarea of an ability while another test attempts to measure the entire
ability with all of its subareas. For this reason, tests were selected that lie at the same level
of generality in the CHC model, namely that of the primary factors (see section 3.1.2). Thirdly, only ability tests
were selected to avoid validity restrictions due to different measurement methods. When tests
with different measurement methods are used (e.g. self-assessment, third-party assessment,
ability tests, etc.), this may result in impaired validity estimates due to method factors (Bühner,
2011). Fourthly, a sample representative in terms of age and gender was collected for the study
in order to avoid range restriction. Range restriction, and therefore a limited validity
estimate, would arise if, for example, respondents were preselected on the basis of the test
scores they achieved (e.g. all persons must achieve a percentile rank over 16 in
the ATAVT test). After considering these points, the following tests were selected for the
battery: the Cognitrone S11 (G. Schuhfried, 2019a), Determination Test S1 (G. Schuhfried,
2019b), Visual Pursuit Test S3 (Wagner, Karner & Biehl, 2018) and Reaction Test S3 (G.
Schuhfried, 2019).

The adaptive test form S1 (Cronbach’s α = 0.80) of the ATAVT was used. Cognitrone (COG)
is a concentration test in the sense of an image comparison test. In test form S11 used for this
study (Cronbach’s α = 0.95) an abstract target figure is shown to the respondent for which the
respondent must decide whether or not it is included in a row of four other abstract figures.
The respondent controls the processing time himself/herself, because the next
item is presented only after a response has been made. The Determination Test (DT) is a choice
reaction test during which the respondent is presented with optical and acoustic stimuli sequentially.
In test form S1 (Cronbach’s α = 0.95) the optical stimuli are circles in blue, yellow, green, red
and white as well as two rectangles on the left or right side of the screen while the acoustic
stimuli are a high-pitched and low-pitched sound. The task is to operate the buttons assigned
to the stimuli on the SCHUHFRIED response panel or foot-operated keys as quickly as
possible when the corresponding stimulus is presented on the screen. In test form S1, the
presentation time of the stimuli is adapted to the ability of the respondent, but it is not
determined directly by the respondent. The Reaction Test (RT) is a test for measuring the
respondent’s ability to react. In test form S3 (Cronbach’s α = 0.93) used in this study, simple
optical and acoustic stimuli are presented for this purpose. Yellow and red circles are used as
optical stimuli while a high-pitched sound is used as the acoustic stimulus. The respondent’s
reaction is only required for a clearly defined stimulus combination. The sequence is not
determined by the respondent and the stimuli follow a defined interval. The reaction time is
registered via the SCHUHFRIED response panel. The Visual Pursuit Test (LVT) is a classic
line-labyrinth test for measuring visual orientation performance for which the respondent has
to recognize the end of a specified line as quickly as possible from a tangle of lines (Cronbach’s
α = 0.92). The picture with the lines that the respondent has to track is only shown when two
keys of the SCHUHFRIED response panel are pressed at the same time and disappears
immediately when the keys are released. The respondent can freely choose the working
speed, that is the duration of viewing the picture and subsequent specifying of the correct end
point. In addition, the CORSI Block-Tapping Test (Schellig, 2018) and the Visual Memory Test
(Etzel & Hornke, 2018) were added to check construct validity as part of the CHC model. The
CORSI Block-Tapping Test (CORSI) measures the storage capacity of spatial working
memory. In test form S1 used for this study, the respondents are shown three irregularly
arranged blocks (Cronbach’s α = 0.81). A cursor touches a certain number of blocks in a row.
After this animation is shown, the respondent is asked to repeat the order that was shown. The
Visual Memory Test (VISGED) measures short-term visual memory performance. In the traffic
psychological short form (S11), the respondent is shown a town map. Various symbols, such
as a cross for a hospital and a book for a library, are marked on the map (Cronbach’s α = 0.73).
The respondent’s task is to remember the type and position of the symbols. He/she is then
asked about the position of particular symbols. The CORSI test was used to test convergent
validity. The test loads on the same secondary factor (visual processing) of the CHC model as
the LVT. The DT, RT, COG, ATAVT, and CORSI tests were used to assess discriminant
validity.

The sample was gathered in January and February 2019 in the Test & Research Center of
SCHUHFRIED GmbH, using a stratified quota sample plan according to gender, age, and
education. Overall, a sample of N=129 persons (78 women, 51 men) aged 20 to 87 (M = 52.4,
SD = 16.6) worked the tests. The expected age and gender distribution was obtained from
demographic data for Germany, Austria, and Switzerland for 2016 (Europäische Kommission,
2019). A comparison of the two distributions using a statistical chi² test (chi²[29]=25.98, p=0.63)
shows that they do not differ significantly from each other. The sample can therefore be
regarded as representative of the population of Austria, Germany, and Switzerland in terms of
gender and age.

5 individuals (4%) have completed compulsory schooling or an intermediate secondary school
but without completing vocational training (EU education level 2), 39 individuals (30%) have
completed vocational training or a course at a technical college (EU education level 3), 49
individuals (38%) have a school-leaving qualification at university entrance level (EU education
level 4) and 36 individuals (28%) have a university degree (EU education level 5). All
participating persons have a driver’s license and received compensation for participating in the
study.

In an initial step, the discriminant and convergent validity is assessed using intercorrelations
of the main variables. The results can be seen in Table 12.

Table 12: Intercorrelations of the presented tests

             ATAVT   COG     DT      LVT     RT      CORSI   VISGED

ATAVT (S1)   –       -.382   .305    .190    -.262   .447    .288
COG (S11)    -.382   –       -.580   -.373   .417    -.451   -.400
DT (S1)      .305    -.580   –       .440    -.394   .576    .484
LVT (S3)     .190    -.373   .440    –       -.252   .372    .482
RT (S3)      -.262   .417    -.394   -.252   –       -.361   -.204

Notes: Correlations that indicate convergent validity are printed in bold.

Convergent validity is tested based on the correlation of the ATAVT with the CORSI test. As
can be seen in Table 12, the ATAVT has the highest correlation with CORSI as can be
expected from the nomological network or the CHC model in which both tests are attributed to
the secondary factor working memory (Gwm).

The correlations with the COG and DT can be traced back to the shared visuo-spatial item
material used in all tests. In addition, working memory also plays a role in working both
of these tests, which reflects the strong loading of working memory on the
g factor that is evident both in the literature (Schneider & McGrew, 2018) and in
the structural equation model described below.

Discriminant validity was tested based on the correlations of the ATAVT with the LVT and
VISGED tests. As can be seen in Table 12, the ATAVT shows low correlations with these two
tests, as was expected.

In a further step, the fit of the tests used in this study to the theoretical assumptions of the CHC
model or the nomological network was investigated using structural equation models. The ML
estimator from the R package “lavaan” (R Core Team, 2016; Rosseel, 2012) was used to estimate the
models (Reinecke, 2014).

The estimate was carried out based on the sample collected between January and February
2019 in the Test and Research Center of SCHUHFRIED GmbH, using a quota sample plan
stratified according to gender, age, and education. For the model, all persons were eliminated
who had incomplete data in one or multiple included procedures, which resulted in a sample
of N=119 persons (71 women, 48 men) aged 20 to 87 (M = 52.1; SD = 16.6). The expected
age and gender distribution was obtained from demographic data for Germany, Austria, and
Switzerland for 2016 (Europäische Kommission, 2019). A comparison of the two distributions
using a statistical chi² test (chi²[29]=26.61, p=0.59) shows that they do not differ significantly
from each other. The sample can therefore be regarded as representative of the population of
Austria, Germany, and Switzerland in terms of gender and age.

5 individuals (4%) have completed compulsory schooling or an intermediate secondary school
but without completing vocational training (EU education level 2), 35 individuals (29%) have
completed vocational training or a course at a technical college (EU education level 3), 48
individuals (40%) have a school-leaving qualification at university entrance level (EU education
level 4) and 31 individuals (26%) have a university degree (EU education level 5). All
participating persons have a driver’s license and received compensation for participating in the
study.

Figure 8: Structural equation model according to the CHC model and the nomological network; the negative
loading of the secondary factor Reaction & decision speed on the g factor arises because mean times were used for
the calculation, which are scored in the opposite direction to the other two secondary factors

The assessment of model fit is carried out using the indices CFI (Comparative Fit Index),
SRMR (Standardized Root Mean Square Residual) and RMSEA (Root Mean Square Error of
Approximation). Sufficient agreement between the empirical covariance matrix and the
covariance matrix expected under the theoretically postulated factor model is indicated
by the following values: CFI ≥ 0.90, SRMR ≤ 0.08 and RMSEA < 0.08 (Hu & Bentler, 1999; Yu,
2002). The key statistics of the structural equation model [χ²(11) = 9.223; CFI = 1.0; RMSEA
= 0.01 (95% CI: 0.00; 0.084); SRMR = 0.030] indicate a very good fit of the data to the
theoretically postulated model. In addition, because of the intercorrelations between the tests,
the fit of a single-factor model in which all tests load directly on a g factor was estimated
[χ²(14) = 22.107, CFI = 0.968, RMSEA = 0.070 (95% CI: 0.00; 0.123), SRMR = 0.047]. The
comparison of the two models shows a significantly better fit of the model based on the
CHC model than of the single-factor model, as can be seen in Table 13.

Table 13: Model comparison of the model based on the CHC model and a single-factor model.

Model            df   χ²       Δχ²      Δdf   p
FEV model        11   9.223
1-factor model   14   22.107   13.208   3     0.004
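The RMSEA of the single-factor model and the p value of the model comparison in Table 13 can be retraced with a short calculation. The sketch below assumes the common RMSEA point estimate sqrt(max(χ² - df, 0) / (df · (N - 1))) with N = 119 and uses the closed-form upper-tail probability of a χ² distribution with three degrees of freedom. The function names are illustrative, the formulas used by the estimation software are not stated in this manual, and the χ² difference is taken as reported in Table 13.

```python
import math


def rmsea(chi_square, df, n):
    """RMSEA point estimate using the (N - 1) convention (assumed)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def chi_square_upper_tail_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Single-factor model: chi²(14) = 22.107, N = 119.
rmsea_single_factor = rmsea(22.107, 14, 119)

# Model comparison: reported chi² difference of 13.208 with 3 degrees of freedom.
p_model_comparison = chi_square_upper_tail_df3(13.208)
```

Both results agree with the reported values (RMSEA = 0.070, p = 0.004) after rounding.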

In summary, both the intercorrelations found between the presented tests and the results of
the structural equation model can be interpreted as proof of construct validity and particularly
the convergent and discriminant validity of the ATAVT.

4.3.3 Criterion validity

Evidence of the criterion validity of the ATAVT is provided by a number of studies in the field
of traffic psychological assessment, some of which involved a precursor version of the ATAVT,
the TAVTMB. Due to the equivalence of the ATAVT and TAVTMB tests (see chapter 3.2.5),
evidence of criterion validity of the TAVTMB also applies for the ATAVT.

A study by Sommer, Arendasy, Schuhfried, and Litzenberger (2005) investigated whether
accident-free drivers could be distinguished from those who had been involved in a number of
accidents based on their test scores. The authors investigated a sample of a total of 153
persons (86 men; aged 22 to 45; M = 31.73 years; SD = 6.25 years). The subjects were
volunteers who had been recruited through work with an Austrian insurance company. The
distribution of educational level in this sample was as follows: a total of 7 people (4.60%) had
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU educational level 2), 42 people (27.50%) had completed vocational
training or a course at a technical college (EU educational level 3), 83 people (54.20%) had a
school-leaving qualification at university entrance level or a qualification from a technical
university (EU educational level 4) and 21 people (13.70%) had a university degree (EU
educational level 5). In a structured interview designed to elicit information about traffic
accidents, 75 individuals (49%) reported that they had had two or more accidents in which they
had been at fault, while the remaining 78 individuals (51%) reported no previous accidents. In
order to carry out an extreme group comparison, individuals with only one accident in which
they had been at fault were excluded from the sample, as it cannot be assumed that a single
accident represents a relevant increase in the individual’s accident proneness.
There was no significant difference between the two groups in terms of gender (χ²=2.492, df=1,
p=0.114), educational distribution (χ²=3.114, df=3, p=0.374) or age (Z=1.563, p=0.059). They
did not differ either with respect to their driving environment or in their driving behavior. All the
participants had held a driving license for at least five years and had driven at least 50,000
kilometers. There was no significant difference between the two groups (Z=1.616, p=0.053)
with regard to the total distance they had driven. All the participants came from Vienna and
drove mainly in an urban environment.
The standardized traffic-psychological test battery used in this study included a precursor
version of the ATAVT (TAVTMB; Biehl, 1996). The authors made use of artificial neural
networks to predict the respondents’ accident proneness from their test scores. The results
showed that 92.2% of the sample were classified correctly. This corresponds to a validity
coefficient of R=0.84. In the jackknife validation a classification rate of 88.9% and a validity
coefficient of R=0.84 were obtained. The intervals for the classification rate and the validity
coefficient in the bootstrap validation were [84.6%; 94.5%] and [0.70; 0.89]. Table 14 shows
the incremental validity and relative relevance of the individual tests. The relative relevance
indicates the amount contributed by the test in question to the validity of the predictive model.

Table 14: Incremental validity and relative relevance of the individual tests

Test Incremental validity Relative relevance

AMT: General intelligence 0.011 1.4%
DT: Reactive stress tolerance 0.133 20.9%
SIGNAL: Number of correct and delayed reactions 0.075 9.2%
ATAVT: Obtaining an overview 0.064 7.9%
WRBTV: Subjectively accepted level of risk 0.072 8.9%
IVPE-PS: Mental stability 0.311 32.2%
IVPE-VB: Sense of social responsibility 0.087 10.6%
IVPE-SK: Self-control 0.066 8.2%
IVPE-TA: Adventurousness and need for excitement 0.001 2.0%

The ATAVT has a relative relevance of nearly 8%. These results therefore provide evidence
of its criterion validity.

A further study by Risser et al. (2008) investigated the criterion validity of two test batteries for
measuring driving-specific ability from the Expert System Traffic (Schuhfried, 2005), which also
included a precursor version of the ATAVT (TAVTMB; Biehl, 1996). The only difference
between the Plus and Standard test batteries is that the Plus Test Battery includes the
Peripheral Perception Test (PP) as a means of measuring the visual field and assessing
divided attention. The global assessment of driving behavior in a standardized driving test was
used as the criterion variable. The sample investigated by the authors consisted of 164 persons
(74 men; aged 19 to 91; M=59 years; SD=18 years). Some of the respondents were drivers
who had already committed traffic offenses.
A total of 39 people (18%) had completed compulsory schooling or an intermediate secondary
school but without completing vocational training (EU educational level 2), 96 people (43%)
had completed vocational training or a course at a technical college (EU educational level 3),
35 people (16%) had a school-leaving qualification at university entrance level or a qualification
from a technical university (EU educational level 4) and 52 people (23%) had a university
degree (EU educational level 5).

The correlations between the individual tests and the external criterion are shown in Table 15.

Table 15: Correlations between the tests and the external criterion

RT cog. RT motor PP visual PP Tracking

AMT DT TAVT COG
speed speed field deviation
0.22* 0.33** 0.27** -0.25** 0.07 -0.33* 0.42** 0.19
* p<0.05; ** p<0.01

Building on the findings of previous studies, an artificial neural network was used to calculate
the criterion validity of the test battery. The results demonstrated that the Standard test battery
correctly classified 80.2% of the total sample. The classification rate for the Plus test battery
was 86.5%. This corresponds to a validity coefficient of R=0.68 for the Standard test battery
and R=0.78 for the Plus test battery. In the jackknife validation a classification rate of 80.2%
and a validity coefficient of R=0.67 were obtained for the Standard test battery. For the Plus
test battery the jackknife validation yielded a classification rate of 83.8% and a validity
coefficient of R=0.77. The intervals for the classification rate and the validity coefficient in the
bootstrap validation were [74.2%; 85.2%] and [0.54; 0.73] for the Standard test battery and
[77.5%; 87.2%] and [0.61; 0.79] for the Plus test battery. The findings were replicated by
Sommer et al. (2008) using an additional independent sample.

Table 16 shows the incremental validity and relative relevance of the individual tests of the two
test batteries. The relative relevance indicates the amount contributed by the test in question
to the validity of the predictive model.

Table 16: Incremental validity and relative relevance of the individual tests

                                 STANDARD                  PLUS
Predictor variables              Incremental   Relative    Incremental   Relative
                                 validity      relevance   validity      relevance
AMT: General intelligence        0.128         18.7%       0.078         11.5%
DT: Reactive stress tolerance    0.124         18.2%       0.071         10.5%
RT: Reaction speed               0.036         5.7%        0.057         8.6%
RT: Motor speed                  0.132         19.2%       0.118         16.9%
PP: Field of vision              --            --          0.090         13.2%
PP: Tracking deviation           --            --          0.139         19.5%
ATAVT: Obtaining an overview     0.120         17.6%       0.038         5.8%
COG: Ability to concentrate      0.141         20.3%       0.093         13.6%

With a relative relevance of 17.6% (Standard test battery) and 5.8% (Plus test battery) the
results thus support the criterion validity of the ATAVT.

Evidence of the incremental criterion validity of the previous version of the ATAVT test
(TAVTMB; Biehl, 1996) was also provided by the results of a study by Sommer et al. (2008).
The authors investigated the criterion validity of a test battery for measuring driving-related
ability and aptitude for adapting to traffic conditions from the Expert System Traffic (Schuhfried,
2005), which also included a previous version of the ATAVT. The global assessment of driving
behavior in a standardized driving test was used as the criterion variable. Sommer et al. studied
a sample of 127 (79.9%) men and 32 (20.1%) women in the age range 19 – 73 (M=39.41;
SD=11.92). A total of 29 people (18.2%) had completed compulsory schooling or an
intermediate secondary school but without completing vocational training (EU educational level
2), 96 people (43%) had completed vocational training or a course at a technical college (EU
educational level 3), 35 people (16%) had a school-leaving qualification at university entrance
level or a qualification from a technical university (EU educational level 4) and 52 people (23%)
had a university degree (EU educational level 5). 109 individuals (68.6%) were classified as
safe drivers on the basis of their performance in the standardized driving test (Vienna Driving
Test).

The correlations between the individual tests and the external criterion are shown in Table 17.

Table 17: Correlations between the tests and the external criterion

AMT       DT        RT cog. speed   RT motor speed   ATAVT      COG
-0.120    -0.390*   0.184*          0.201*           -0.393**   0.212*
** p<0.05; * p<0.01

Here again artificial neural networks were used to calculate the criterion validity for this
comprehensive test battery. The results showed that the Standard test battery correctly
classified 83.1% of the total sample. This corresponds to a validity coefficient of R=0.65. In the
jackknife validation a classification rate of 83.1% and a validity coefficient of R=0.65 were
obtained. The intervals for the classification rate and the validity coefficient in the bootstrap
validation were [77.4%; 89.1%] and [0.55; 0.76].

Table 18 shows the incremental validity and relative relevance of the individual tests. The
relative relevance indicates the amount contributed by the test in question to the validity of the
predictive model.


Table 18: Incremental validity and relative relevance of the individual tests

Predictor variables Incremental validity Relative relevance

DT: Reactive stress tolerance 0.057 26.7%
RT: Reaction speed 0.034 16.2%
ATAVT: Obtaining an overview 0.039 18.4%
WRBTV: Subjectively accepted level of risk 0.036 17.1%
IVPE-PS: Mental stability 0.040 18.9%
IVPE-VB: Sense of social responsibility 0.006 2.7%

The ATAVT has a relative relevance of 18.4%; the results therefore show that it has criterion
validity.

Depending on the degree of severity and location of lesion, strokes and cranio-cerebral trauma
result in different cognitive impairments. Deficits in ability to react or attention are very frequent.
This is why the issue of fitness to drive plays a major role after an injury to the brain. Patients'
fitness to drive after a stroke or cranio-cerebral traumas was examined by Sommer, Heidinger,
Arendasy, Schauer, Schmitz-Gielsdorf, and Häusler (2010). The patients were split into two
groups based on a standardized driving test (N = 178, 81.5% men; age: M = 45.37, SD =
11.92): Driving test passed/not passed. Cognitive ability was measured with a traffic
psychological test battery which included the TAVT. Obtaining an overview correlated
significantly (r = -0.281) with the assessment of the driving test. Logistic models and models
based on an artificial neural network found a validity of the entire test battery of R = 0.53-0.81
with a correct classification rate of 79.2 – 89.9%.

In addition, a study conducted by Eder, Brieber, Uhl, Vetter, and Häcker (2018) showed that
ATAVT performance in healthy older persons correlates significantly with the evaluation in a
standardized driving test. As part of the study, 109 healthy persons over 65 underwent
comprehensive psychometric testing as well as a standardized driving test. All persons were
tested under the supervision of a test administrator and participated voluntarily; no
consequences were attached to their test results. Every participant was compensated for
taking part. Of the original 109 participants, 98 were included in the final data analysis; the
remainder dropped out. The sample was collected from 2018 to 2019 and consists of 51 (52%)
men and 47 (48%) women aged between 65 and 88 (M = 73.13, SD = 5.73). The Kuratorium für
Verkehrssicherheit (KFV; Austrian Road Safety Board) conducted the driving tests for the
evaluation of driving competence. For the study, a standardized driving test (approx. 45
minutes) within the Vienna city limits was planned and carried out together with the KFV.
Each participant drove in the presence of a driving instructor and a trained driving behavior
observer. The car used had dual pedals so that the driving instructor could intervene in an
emergency. During the drive, the driving behavior observer recorded driving competence on a
standardized record sheet. The participants were rated on specific driving behavior dimensions
and given an overall grade for the entire driving test, which the observer assigned at the end
of the drive (1 = very good to 5 = fail). A correlation analysis between the ATAVT results and
the grade awarded by the driving observer yielded a small correlation for the variable
Obtaining an overview (r = 0.391)¹ (Cohen, 1988).

1 Due to the limited reliability of the overall score in the driving test, a correction for attenuation according to
Schmidt and Hunter (1998) was applied. The uncorrected correlation coefficient between the overall grade in
the driving test and the ATAVT variable Obtaining an overview was r = 0.28.
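
The relationship between the uncorrected and corrected coefficients in the footnote can be illustrated with a minimal sketch. It assumes the classical correction for criterion unreliability only, r_corrected = r_observed / sqrt(criterion reliability). The manual does not report the reliability value that was used, so the value computed below is back-calculated from the two reported coefficients and is an illustration, not a reported figure. The function names are hypothetical.

```python
import math


def correct_for_attenuation(r_observed: float, criterion_reliability: float) -> float:
    """Classical correction for unreliability in the criterion only (assumed form)."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion reliability must lie in (0, 1]")
    return r_observed / math.sqrt(criterion_reliability)


def implied_criterion_reliability(r_observed: float, r_corrected: float) -> float:
    """Reliability that would turn r_observed into r_corrected under the formula above."""
    return (r_observed / r_corrected) ** 2


# Coefficients reported in the text and the footnote.
r_obs = 0.28
r_corr = 0.391

# Back-calculated value, NOT stated in the manual.
rel = implied_criterion_reliability(r_obs, r_corr)
print(f"implied criterion reliability: {rel:.3f}")
print(f"corrected r: {correct_for_attenuation(r_obs, rel):.3f}")
```

Under this assumption, the two reported coefficients would be consistent with a criterion reliability of roughly 0.51.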


Taken together these studies of the criterion validity of the ATAVT demonstrate the relevance
of the test for traffic psychological assessment and thus meet the requirement for validation of
the test from the point of view of traffic safety.

In addition to studies of the criterion validity of the ATAVT in the field of traffic psychological
assessment, initial findings on criterion validity in the field of aviation psychological assessment
are also available. Sommer, Häusler, Koning and Arendasy (2006) investigated the criterion
validity of a comprehensive ability test battery, which included both selected subtests of the
INSBAT and a precursor version of the ATAVT (TAVTMB; Biehl, 1996). The data relates to 99
military pilot trainees (98 men) aged between 16 and 25 (M=18.84, SD=2.04). The global
assessment of performance in a standardized flight simulator served as the criterion variable.
This test battery enabled success in the flight simulator to be correctly predicted for 92.9% of
the respondents. This corresponds to a validity coefficient of 0.84. In the jackknife validation a
classification rate of 92.9% and a validity coefficient of 0.83 were obtained. The intervals for
the classification rate and the validity coefficient in the bootstrap validation were [88.2%;
97.7%] and [0.74; 0.94]. Table 19 shows the incremental validity and relative relevance of the
individual tests. The relative relevance indicates the amount contributed by the test in question
to the validity of the predictive model.

Table 19: Incremental validity and relative relevance of the individual tests

Test Incremental validity Relative relevance

Numerical inductive reasoning (INSBAT-NID) 0.157 13.4%
Arithmetical Competence (INSBAT-AK) 0.205 16.9%
Inspection Time (INSBAT-BZ) 0.195 16.2%
Decision Quality (INSBAT-EF) 0.167 14.1%
Figural inductive reasoning (INSBAT-FID) 0.324 24.6%
Numerical Flexibility (INSBAT-NF) 0.105 4.6%
Tachistoscopic Traffic Perception Test (TAVTMB) 0.117 10.2%

The results show that the ATAVT, with a relative relevance of 10.2%, has criterion validity in
the field of pilot selection.

Proof of the ATAVT’s validity has also been obtained in the field of rail transport (RSSB, 2010).
In a validation study, 85 train conductors and trainees worked on the precursor version of the
ATAVT (TAVTMB; Biehl, 1996) together with several other traffic safety-related ability tests.
As a criterion variable, different objective performance measurements and evaluations by
superiors were gathered. The empirically determined results of this analysis are summarized
in Table 20.

Table 20: Bivariate correlation between the main variable Obtaining an overview with different indices of traffic
safety and training success for train conductors and trainees.

Criterion variable Validity of main variable

Operating security systems in locomotives 0.21**
Performance in the execution of procedures 0.30**

The results prove the criterion validity of the ATAVT with regard to the criterion measures
gathered here and therefore support the criterion validity of the ATAVT in the area of safety
assessments.

Vetter et al. (2018) conducted a study with professional bus drivers of an Austrian bus
company (N = 126; age: M = 42.12 years, SD = 10.34 years). The level of education of the
study participants was distributed as follows: 1.6% no school-leaving qualification, 14.3%
compulsory schooling without completing vocational training, 74.6% compulsory schooling
with vocational training, 8.7% school-leaving qualification at university entrance level, 0.8%
university degree. The average period of employment with the company was 7.1 years (SD =
6.4 years). The bus drivers first completed comprehensive psychological testing with the
SAROAD test set (SCHUHFRIED GmbH, 2012), which includes the ATAVT (test form S1).
They then completed three driving tests with a bus, which assessors rated on the school
grade system (1 = "very good" to 5 = "failed"): a half-hour drive in real traffic on a set route
in the immediate vicinity, a driving exercise on a maneuvering course, and a driving test on a
standardized hazard course that simulates critical traffic situations. Depending on which
driving test was predicted from performance in the entire test set, the validity coefficients
ranged from R = 0.376 to R = 0.616. The analysis of the bivariate correlations between
individual tests and driving tests showed that performance in the ATAVT correlated with
performance on the maneuvering course (r = 0.240) and the hazard course (r = 0.253).

4.4 Scaling
The quality criterion of scaling is met when the empirical behavioral relationships are
represented exactly by the test scores (Kubinger, 2003).
For the ATAVT this means that the ability parameter adequately captures all the relevant
information about the respondent's performance on the test. Because the Rasch model holds
for the ATAVT, the quality criterion of scaling is met.

4.5 Economy
Economy describes how conservatively resources are used in relation to the amount of
information gained (Kubinger, 2009). Since they are computerized, the tests of the Vienna Test
System are very economical to administer and score. The administrator's time is saved
because the instructions at the beginning of the test are standardized, so that no
time-consuming verbal explanations are needed. Because the test results are calculated
automatically, the time needed for manual calculation of raw and norm-referenced scores is
also saved. In addition, the test is shortened if the option of decision-oriented testing is
used in the adaptive test forms S1 or S2 (see chapter 6.1).

4.6 Reasonableness
Reasonableness refers to the degree to which a test is constructed so that no undue mental
and physical demands are made of the respondent (Ziegler & Bühner, 2012). Since the ATAVT
requires little of the respondent emotionally, physically or in terms of time the test fulfills the
criterion of reasonableness.

4.7 Resistance to falsification

A test that meets the quality criterion of resistance to falsification is one that can prevent a
respondent answering questions in a manner deliberately intended to influence or control his
test score (Ziegler & Bühner, 2012). As with all ability tests, test scores on the ATAVT cannot
be deliberately manipulated by respondents to their advantage. In addition, the adaptive test
presentation of test forms S1 and S2 increases test security (see chapter 3.3.1). In some
circumstances, however, respondents may attempt to portray themselves as worse than they
are.


4.8 Fairness
If tests are to meet the quality criterion of fairness, they must not systematically discriminate
against particular groups of respondents on the grounds of their sociocultural background
(Kubinger, 2003).
On account of the results referred to in chapter 3.2, the ATAVT can be assumed to have scaling
fairness. The statistical agreement of the item difficulty parameter estimates in the different
subsamples shows that the groups studied are not systematically advantaged or
disadvantaged by individual items. In addition, neither driving experience nor knowledge of
rules appears to give respondents an advantage (Biehl, 1996). In summary, therefore, the
quality criterion of fairness can be regarded as met.


5 NORMING
The norm-referenced scores were obtained by calculating the mean percentile rank PR(x) for
each raw score X according to the formula (Lienert & Raatz, 1994):

\[
PR_x = 100 \cdot \frac{\mathrm{cum}\, f_x - \dfrac{f_x}{2}}{N}
\]

cum fx corresponds to the number of respondents who obtained the raw score X or a lower
score, fx is the number of respondents with the raw score X, and N is the sample size.
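
The percentile rank formula above can be sketched in a few lines. This is a minimal illustration, not the scoring code of the Vienna Test System; the function name and the sample data are hypothetical.

```python
from collections import Counter


def percentile_ranks(raw_scores):
    """Mean percentile rank per raw score: PR_x = 100 * (cum f_x - f_x / 2) / N."""
    n = len(raw_scores)
    freq = Counter(raw_scores)
    ranks = {}
    cumulative = 0
    for score in sorted(freq):
        cumulative += freq[score]  # respondents with this score or a lower one
        ranks[score] = 100 * (cumulative - freq[score] / 2) / n
    return ranks


# Hypothetical sample of ten raw scores.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
pr = percentile_ranks(sample)
for score, rank in pr.items():
    print(score, rank)
```

Subtracting half of f_x places each respondent in the middle of the group that shares the same raw score, which is why tied scores receive a common, centered percentile rank.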

Representative norm sample

The data of the representative norm sample of 831 individuals was gathered and expanded in
2007-2015 in the Test & Research Center of SCHUHFRIED GmbH, using a quota sample plan
stratified according to gender, age and education. The data of the latest data collection wave
was collected in 2015. All persons were tested under the supervision of a test administrator
and participated in the study voluntarily; no consequences were attached to their test
results. Every participant was compensated for taking part.

The test results of the newly collected data were then compared with the data of the existing
representative norm sample and checked for statistical agreement. For this purpose, the two
samples were matched on age, gender, and educational level. A comparison of means for
independent samples was then performed. No statistically significant difference was found
between the test performances of the newly collected data and the existing representative
norm sample (t = 1.060; p = 0.290; Cohen's d < 0.3). The existing representative norm sample
was therefore expanded with the newly collected data.

The expected age and gender distribution was obtained from demographic data for Germany,
Austria, and Switzerland for 2012 (Europäische Kommission, 2019). The norm sample consists
of 431 (52%) men and 400 (48%) women aged between 15 and 94 (M=46.31, SD=17.556).
Table 21 shows the distribution of age and gender in the norm sample and the corresponding
distribution in the population of Austria and Germany.
A comparison of the two distributions using a statistical chi² test (chi²[29]=41.98, p=0.06) shows
that they do not differ significantly from each other.
The norm sample can therefore be regarded as representative of the population of Austria,
Switzerland, and Germany in terms of gender and age.
3 individuals (<1%) have no school-leaving qualification (EU educational level 1), 97 individuals
(12%) have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU educational level 2), 364 individuals (44%) have completed
vocational training or a course at a technical college (EU educational level 3), 286 individuals
(34%) have a school-leaving qualification at university entrance level (EU educational level 4)
and 81 individuals (10%) have a university degree (EU educational level 5).


Table 21: Observed frequencies of men and women of different age groups in the norm sample and comparison
with the expected values in a representative sample
Age groups   Actual number   Actual number   Expected values   Expected values
             of men          of women        men               women
15 to 19     28              25              12                12
20 to 24     31              30              12                12
25 to 29     34              32              13                14
30 to 34     32              33              17                17
35 to 39     28              29              18                17
40 to 44     39              38              16                15
45 to 49     46              44              13                13
50 to 54     39              40              13                13
55 to 59     34              33              11                12
60 to 64     33              34              11                12
65 to 69     23              25              8                 9
70 to 74     16              31              7                 9
75 to 79     15              27              5                 10
80 to 84     1               9               2                 5
85+          1               1               2                 5

In addition to the total norm, age-specific norms (up to 36 years, 37-54 years, 55+ years) are
provided.

Additional norm samples

Additional representative norm samples as well as convenience samples can be found in
Appendix A, Section 9.


6 TEST PROCEDURE
The ATAVT consists of a combined instruction and practice phase and the test phase itself. A
total of six different test forms (see chapter 3.4) are available. Respondents should be able to
work the test undisturbed and should be given sufficient time. Attention should be paid to
adequate lighting and a quiet environment during the test presentation.
Should there be any unexpected disruptions during testing (such as a power failure), the test
administrator has the option to restart the test. In the event of technical disruption,
answers that have already been entered are deleted.
While the test is being worked, the test administrator can terminate it at any time if the
respondent so wishes. A corresponding warning is then issued when the test is scored. By
contrast, termination without the test administrator’s consent is not possible if the person being
tested has not completed the test.
Provided the diagnostician is sufficiently familiar with the manual, the time required to
prepare the test presentation (booting the computer, starting the Vienna Test System, adding
a respondent, starting the test) is limited to a few minutes. Since both administration and
scoring are performed digitally, the time required of the diagnostician is also reduced to a
minimum. There are no restrictions in terms of age.

Figure 9: On-screen presentation of the ATAVT

6.1 Decision-oriented testing

In both adaptive test forms, the ATAVT incorporates a planning window that can be used to
make settings that relate to decision-oriented testing and that are especially relevant to
traffic-psychological assessment. In the Options window the user can specify whether the test
should be optimized to apply to a candidate in Group 1 (drivers without increased
responsibility) or Group 2 (drivers with increased responsibility). In both cases the test is
continued until the point at which there is a high statistical certainty (95%) that the latent
dimension (in this case obtaining an overview) lies above the threshold values specified for
traffic-psychological purposes (PR=16 or parameter -0.631 for Group 1 and PR=33 or
parameter -0.161 for Group 2) or until one of the other termination criteria used in the
ATAVT applies (see chapter 3.3.2).

Figure 10: Option window for the traffic-psychological cancellation criteria of decision-oriented testing.
The top option is set by default; this does not involve any additional traffic-psychological cancellation
criteria.

Figure 11: Graph of the adaptive process in an ATAVT (S1 or S2) test session. The test terminates
once the overall confidence interval lies above the cut-off score and at least 6 items have been
presented.
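
The decision-oriented stopping rule described in this chapter can be sketched as follows. The cut-off parameters and the minimum of 6 items are taken from the text and from Figure 11. How the 95% certainty is computed is not specified in this chapter, so the one-sided normal bound used here (estimate minus z times standard error) is an assumption, as are the function name and the example values.

```python
from statistics import NormalDist

# Threshold person parameters from the text: Group 1 (PR 16), Group 2 (PR 33).
CUTOFFS = {1: -0.631, 2: -0.161}
MIN_ITEMS = 6  # minimum number of items according to Figure 11


def ability_above_cutoff(theta, standard_error, group, n_items, certainty=0.95):
    """True if the lower confidence bound of the ability estimate exceeds the cut-off.

    Assumption: a one-sided bound based on the normal distribution.
    """
    if n_items < MIN_ITEMS:
        return False
    z = NormalDist().inv_cdf(certainty)
    lower_bound = theta - z * standard_error
    return lower_bound > CUTOFFS[group]


# Hypothetical interim estimate after 10 items.
print(ability_above_cutoff(0.119, 0.30, group=1, n_items=10))
print(ability_above_cutoff(0.119, 0.30, group=2, n_items=10))
```

With the same interim estimate, the test could stop earlier for a Group 1 question than for a Group 2 question, because the Group 2 cut-off lies closer to the estimate.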


Used in this way, these additional cancellation criteria may – depending on the respondent’s
ability – significantly reduce the test length. Figure 12 shows the average test length needed
to arrive at a conclusion about a Group 1 driver’s ability in obtaining an overview (driver without
increased responsibility).

Figure 12: Expected test length as a function of the respondent’s ability of obtaining an overview for an
investigation in Group 1. Above an ability level of roughly PR 50 the decision-oriented procedure
carried out in connection with a traffic-psychology-related investigation leads to a noticeable reduction
in test length. Where the ability level is less than PR 50, a quicker than normal decision (and
associated cancellation of the test) is not possible. The test is continued until the specified target
reliability is achieved. This is usually the case after approximately 21 items.

6.2 Instruction and practice phase

At the start of the test it is explained that the pictures that follow are depictions of road traffic
situations. The respondent is instructed to study the pictures carefully and remember what they
show. After a short cue sound is played, the first trial picture is shown. Afterwards the
respondent reports what he/she has seen in the picture by selecting the relevant fields. For
each picture there are five answer options, which are always drawn from the same five object
classes:
• Pedestrian(s)
• Motor vehicle
• Bicyclists, motorcycle, moped
• Traffic sign(s)
• Traffic light
Once the item has been answered the trial picture is displayed again so that the respondent
can compare his/her answers with the original scene.


6.3 Test phase

In the test phase the respondent is again shown pictures of road traffic situations. The
respondent must again indicate which object classes were shown in each picture. In contrast
to the instruction phase, the pictures in the test phase are shown only for one second. Once
an answer has been given the next picture follows. Sufficient time is available for the answer.
The choice of the next item is always determined by the current estimate of the respondent’s
ability level.

The test continues in adaptive test forms S1 and S2 until the standard error of measurement
falls below the pre-set level. Occasionally the test may be terminated early by the system;
this occurs if 10 successive items are all answered correctly or all answered incorrectly. A
further termination criterion is the maximum number of items (k = 30) presented to the
respondent.
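
The three termination criteria for the adaptive forms can be combined in a small check that runs after each answer. This is a sketch under stated assumptions: the pre-set standard error is not given in this chapter, so `target_se` is a hypothetical parameter, and the run of 10 items is evaluated on the 10 most recent answers.

```python
def termination_reason(responses, standard_error, target_se, max_items=30, run_length=10):
    """Return the reason for stopping, or None if the test should continue.

    responses: list of booleans (True = item solved) in order of presentation.
    target_se: hypothetical pre-set standard error of measurement.
    """
    if standard_error < target_se:
        return "target precision reached"
    last_run = responses[-run_length:]
    if len(last_run) == run_length and len(set(last_run)) == 1:
        return "uniform run of answers"
    if len(responses) >= max_items:
        return "maximum number of items"
    return None


# Hypothetical sessions.
print(termination_reason([True, False] * 5, standard_error=0.30, target_se=0.40))
print(termination_reason([True] * 10, standard_error=0.80, target_se=0.40))
print(termination_reason([True, False] * 15, standard_error=0.80, target_se=0.40))
print(termination_reason([True, False] * 5, standard_error=0.80, target_se=0.40))
```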

In the linear test forms a fixed item set of 13 items (test forms S3 and S4) or 14 items (test
forms S5 and S6) is presented. There is no time limit for working on the test.


7 INTERPRETATION OF TEST RESULTS

7.1 General notes on interpretation
Test results are usually interpreted in terms of percentile ranks (PR). The percentile rank
indicates what percentage of a particular comparison group obtained the same or a lower score
on the variable in question. The comparison group is the norm sample that has been selected.
This is usually a representative sample of the overall population. For example, if a respondent
obtains a percentile rank of 76, this means that 76% of people in the comparison group have
the same or a lower level of the variable and 24% have the same or a higher level of the
variable. Half of the people in the comparison group with the same result as the respondent
are assigned to the group with a higher level of the variable and half are assigned to the group
with a lower level of the variable.

From the percentile rank it is possible to determine whether a respondent's score is above
average, average or below average compared to the comparison group in question. Two
different definitions of the average range are used, depending on the area of application. In
the areas of clinical and neuropsychology and traffic psychology, a broader average range of
16 - 84 is used (definition: mean plus/minus one standard deviation). By contrast, in the areas
of human resources and sports psychology, a narrower average range of 25 - 75 is used
(definition: the middle 50% of the data). The PR can therefore be interpreted as follows:

Table 22: Average ranges depending on application area

Percentile rank    Application area
                   HR/sport    Neuro/clinical/traffic
Below average      0-24        0-15
Average            25-75       16-84
Above average      76-100      85-100
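
The classification in Table 22 can be expressed as a simple lookup. This is a minimal sketch; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Average ranges from Table 22 (lower and upper bound, both inclusive).
AVERAGE_RANGES = {
    "hr_sport": (25, 75),
    "neuro_clinical_traffic": (16, 84),
}


def classify_percentile_rank(pr, application_area):
    """Classify a percentile rank as below average, average or above average."""
    if not 0 <= pr <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    low, high = AVERAGE_RANGES[application_area]
    if pr < low:
        return "below average"
    if pr > high:
        return "above average"
    return "average"


# The same percentile rank can fall into different bands depending on the area.
print(classify_percentile_rank(80, "hr_sport"))
print(classify_percentile_rank(80, "neuro_clinical_traffic"))
```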

In addition to the percentile ranks, other norm-referenced score scales can also be displayed
(e.g. T-scores, Z-scores, IQ scores). In each case, the data are re-calculated so that the mean
and dispersion are defined by a particular value. For example, T-scores have a mean of 50
and a standard deviation of 10.
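
How a percentile rank relates to Z-scores and T-scores can be sketched as follows. The conversion assumes a normalizing (area) transformation via the standard normal distribution; the manual does not state which transformation the Vienna Test System applies, so this is an illustration only, and the function names are hypothetical.

```python
from statistics import NormalDist


def pr_to_z(pr):
    """Z-score (mean 0, SD 1) whose cumulative normal probability equals pr / 100."""
    if not 0 < pr < 100:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    return NormalDist().inv_cdf(pr / 100)


def z_to_t(z):
    """T-score with mean 50 and standard deviation 10."""
    return 50 + 10 * z


for pr in (16, 50, 84):
    print(pr, round(z_to_t(pr_to_z(pr)), 1))
```

Under this assumption, the broader average range of PR 16 to 84 corresponds to T-scores of roughly 40 to 60, that is, the mean plus or minus one standard deviation.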

With many tests, confidence intervals can also be taken into account. The confidence interval
reflects the test’s reliability; it is a defined area (around the measured test score) within which
the true value of the characteristic lies with a certain probability (e.g. 95%). In practical terms,
it can be stated that the true value almost certainly lies within the given confidence interval.

7.2 Interpretation of variables

Obtaining an overview
The person parameter of the variable Obtaining an overview represents a measure of
performance for obtaining an overview as part of attentional control. The higher the percentile
rank, the higher the ability to control the focus of attention on obtaining an overview.


7.3 Additional output of results

Item analysis protocol

The item analysis protocol provides information on how each item was answered, whether the
item was correctly or incorrectly solved, and how long the respondent took to answer each
item. An item is only considered correctly solved if all the entries in the line corresponding to
that item are positive. This means that the respondent was able to identify all the required
objects in the picture and did not select any that were not actually present. This can be used
to investigate whether a higher than average number of problems arose at any particular point
during the test presentation.
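
The all-or-nothing scoring rule described above can be sketched with sets. The short labels for the five object classes are hypothetical stand-ins for the answer options listed in chapter 6.2.

```python
# Hypothetical short labels for the five answer options.
OBJECT_CLASSES = frozenset(
    {"pedestrian", "motor vehicle", "two-wheeler", "traffic sign", "traffic light"}
)


def item_solved(selected, present):
    """An item counts as solved only if the selection matches the picture exactly."""
    selected, present = set(selected), set(present)
    unknown = (selected | present) - OBJECT_CLASSES
    if unknown:
        raise ValueError(f"unknown object classes: {sorted(unknown)}")
    return selected == present


picture = {"pedestrian", "traffic light"}
print(item_solved({"pedestrian", "traffic light"}, picture))                   # exact match
print(item_solved({"pedestrian"}, picture))                                    # one object missed
print(item_solved({"pedestrian", "traffic light", "motor vehicle"}, picture))  # false alarm
```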


Test protocol

The test protocol provides detailed information on how the test was worked, e.g. which items
were solved correctly or incorrectly, the item difficulty parameter of each item, and the
current person parameter estimate after each item (incl. confidence interval, CI).
Furthermore, the current reliability is calculated after every item, and the solution
probability for each item is calculated from the final person parameter estimate.
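
Since the ATAVT is scaled with the Rasch model (see chapter 4.4), the solution probability of an item can be sketched with the standard Rasch item response function. The item difficulties below are hypothetical; only the person parameter 0.119 is borrowed from the case study in chapter 7.4.

```python
import math


def solution_probability(theta, difficulty):
    """Rasch model: P(solved) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


theta_final = 0.119  # person parameter from the case study
for b in (-1.0, 0.119, 1.0):  # hypothetical item difficulty parameters
    print(b, round(solution_probability(theta_final, b), 3))
```

An item whose difficulty equals the person parameter is solved with a probability of 0.5; easier items have a higher solution probability and harder items a lower one.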

7.4 Case study

Mr. Jones is 45 and came to an accredited assessment center. As ordered by the authorities,
he has to submit to a medical and psychological assessment due to an alcohol offense (0.9
mg/L, initial offense). He has the driving license classes B, C, and C1.

For the psychological assessment of his ability, tests for measuring the ability to react,
concentration performance, orientation performance, stress tolerance, reaction behavior, and
attention are used. The tests, test forms and achieved test scores are shown in the following:


Dimension                   Test/Test form   Variable                    Raw score   PR

Ability to react            RT/S3            Reaction speed              394         75
Concentration performance   COG/S11          Ability to concentrate      2.41        80
Orientation performance     LVT/S3           Score                       4           25
Resilience                  DT/S1            Reactive stress tolerance   219         35
Attention                   ATAVT/S1         Obtaining an overview       0.119       45

The German Guidelines for the Assessment of Respondents' Fitness to Drive (BGL)
(Bundesanstalt für Straßenwesen, 2018) assume increased responsibility for Group 2 drivers
(truck/bus) and therefore set a higher threshold than for Group 1 drivers (car/motorcycle).
The BGL specifies that Group 1 drivers must achieve a PR ≥ 16 in all tests carried out for a
specific purpose, while Group 2 drivers must achieve a PR ≥ 33 in the majority of the tests
and may not drop below a PR of 16 in any test (cf. Bundesanstalt für Straßenwesen, 2018, p. 11).

Taking the BGL into consideration, Mr. Jones's results are as follows:

For the Ability to react (RT), Mr. Jones achieves an average result with a PR of 75. Compared
to the reference population he is therefore able to react to relevant stimuli or stimulus
constellations with average speed. The ability to react is therefore sufficient for Group 1 and
2.

In concentration performance (COG) Mr. Jones also achieves an average result in the variable
Ability to concentrate with a PR of 80. He is therefore in an average position to keep the
balance and accuracy required for processing. Concentration performance therefore is
sufficient for both Group 1 and Group 2.

The client also has an average score with a PR of 35 in the area of stress tolerance (DT).
This means that Mr. Jones has an average ability to react quickly and appropriately when
under stress. His ability to react under stress conditions is therefore sufficient for Group 1
and Group 2.

The score of the attention dimension (ATAVT) is also in the average range with a PR of 45.
His obtaining an overview as part of attentional control therefore is also sufficient for Group 1
and Group 2.

In orientation performance (LVT) Mr. Jones achieves a PR of 25 and therefore drops below
the threshold of PR 33 for Group 2. However, he does not drop below PR 16 in any test, and
he achieves a PR of at least 33 in the majority of the tests used. The BGL requirements for
driving-related ability in Group 2 are therefore met even though one score lies below PR 33.
Regarding the required thresholds for Group 1, Mr. Jones achieves the required threshold of
PR 16 in all tested dimensions. Therefore, the driving-related ability for Group 1 corresponds
to the requirements of the BGL without restrictions.

Summary: Regarding the psychological performance ability, the test results achieved therefore
support sufficient ability for Group 1 and 2.
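
The threshold rules applied in this case study can be sketched as two checks. The rules and the percentile ranks are taken from the text above; the function names are hypothetical, and "majority" is interpreted here as more than half of the tests.

```python
def meets_group1(percentile_ranks):
    """Group 1: PR >= 16 in every test."""
    return all(pr >= 16 for pr in percentile_ranks.values())


def meets_group2(percentile_ranks):
    """Group 2: PR >= 33 in the majority of tests and no PR below 16."""
    n_at_or_above_33 = sum(pr >= 33 for pr in percentile_ranks.values())
    majority = n_at_or_above_33 > len(percentile_ranks) / 2
    return majority and min(percentile_ranks.values()) >= 16


# Percentile ranks of Mr. Jones from the results table.
jones = {"RT": 75, "COG": 80, "LVT": 25, "DT": 35, "ATAVT": 45}
print(meets_group1(jones), meets_group2(jones))
```

Four of the five percentile ranks are at least 33 and none is below 16, which reproduces the conclusion reached in the text for both groups.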


8 REFERENCES
American Educational Research Association, American Psychological Association & National
Council on Measurement in Education. (2014). Standards for educational and
psychological testing. Washington, DC: AERA.
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38 (1),
123–140. Springer. doi:10.1007/BF02291180
Arendasy, M. & Sommer, M. (2004). Measuring perceptual speed in complex everyday
situations. Perceptual and Motor Skills, 98 (2), 615–626. doi:10.2466/pms.98.2.615-626
Baddeley, A. D. (2012). Working memory: Theories, models, and controversies. Annual
Review of Psychology, 63, 1–29. doi:10.4324/9781315111261
Baker, F. B. & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd
ed.). Boca Raton, FL: CRC Press.
Betz, N. E. & Weiss, D. J. (1976a). Psychological effects of immediate knowledge of results
and adaptive ability testing. Minneapolis: University of Minnesota.
Betz, N. E. & Weiss, D. J. (1976b). Effects of immediate knowledge of results and adaptive
testing in ability test performance. Minneapolis: University of Minnesota.
Biehl, B. (1996). Manual Tachistoscopic Traffic Perception Test (TAVTMB). Mödling:
Schuhfried.
Binet, A. & Simon, T. (1916). The development of intelligence in children:(the Binet-Simon
scale) (Vol. 11). Philadelphia, PA: Williams & Wilkins.
Bühner, M. (2011). Einführung in die Test- und Fragebogenkonstruktion. München: Pearson.
Bundesanstalt für Straßenwesen. (2018). Begutachtungsleitlinien zur Kraftfahreignung. (Stand
24.). Bergisch Gladbach: Bundesanstalt für Straßenwesen.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge:
Cambridge University Press.
Cattell, R. B. (1987). Intelligence: Its structure, growth and action (Advances in psychology,
Vol. 35). Amsterdam: North Holland.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York,
NY: Routledge.
Deubel, H. & Hauske, G. (1988). The programming of visually guided saccades. In H. Marko,
G. Hauske & A. Struppler (Eds.), Processing structures for perception and action (pp.
119–132). Weinheim: Verlag Chemie.
Eder, M., Brieber, D., Uhl, U., Vetter, M. & Häcker, H. (2018). Auswirkungen von
computerisiertem, kognitivem Training auf die Fahreignung gesunder, älterer Kraftfahrer.
Zeitschrift für Neuropsychologie, 28, 198.
Embretson, S. E. (1983). Construct validity: construct representation versus nomothetic span.
Psychological Bulletin, 93 (1), 179–197. doi:10.1037/0033-2909.93.1.179
Embretson, S. E. (1998). A cognitive design system approach to generating valid tests:
Application to abstract reasoning. Psychological Methods, 3 (3), 380–396. American
Psychological Association.


Embretson, S. E. (2002). Generating abstract reasoning items with cognitive theory. In S.H.
Irvine & P.C. Kyllonen (Eds.), Item generation for test development (pp. 219–250).
Mahwah, NJ: Lawrence Erlbaum.
Etzel, S. & Hornke, L. F. (2018). Manual Visueller Gedächtnistest (Version 27 – Revision 6).
Mödling: Schuhfried.
Europäische Kommission. (2019). Bevölkerung am 1. Januar nach Altersgruppe und
Geschlecht (demo_pjangroup) [Data file]. Available at:
http://epp.eurostat.ec.europa.eu/portal/page/portal/population/data/database
European Federation of Psychologists' Associations. (2013). Performance requirements,
context definitions and knowledge & skill specifications for the three EFPA levels of
qualifications in psychological assessment. Retrieved 28 March 2019. Available at:
www.efpa.eu/download/650d0d4ecd407a51139ca44ee704fda4
Findlay, J. M. (1980). The visual stimulus for saccadic eye movements in human observers.
Perception, 9 (1), 7–21. doi:10.1068/p090007
Fischer, G. H. (1974). Einführung in die Theorie psychologischer Tests: Grundlagen und
Anwendungen. Bern: Huber.
Fischer, G. H. & Kowar, P. (1977). A new traffic perception test on the basis of the logistic test
model. Accident Analysis & Prevention, 9 (3), 203–215. Elsevier. doi:10.1016/0001-
4575(77)90022-7
Fischer, G. H. & Molenaar, I. W. (Eds.). (1995). Rasch models: Foundations, recent
developments, and applications (1st ed.). New York, NY: Springer.
Fischer, G. H. & Ponocny-Seliger, E. (1998). Structural Rasch modeling: Handbook of the
usage of LPCM-WIN 1.0. Groningen: Progamma.
Frey, A., Hartig, J. & Moosbrugger, H. (2009). Effekte des adaptiven Testens auf die Motivation
zur Testbearbeitung am Beispiel des Frankfurter Adaptiven Konzentrationsleistungs-
Tests. Diagnostica, 55 (1), 20–28. doi:10.1026/0012-1924.55.1.20
Friedman, A. (1979). Framing pictures: The role of knowledge in automatized encoding and
memory for gist. Journal of Experimental Psychology: General, 108 (3), 316–355.
doi:10.1037/0096-3445.108.3.316
Grossberg, S. (1987a). Cortical dynamics of three-dimensional form, color, and brightness
perception: I. Monocular theory. Perception & Psychophysics, 41 (2), 87–116.
doi:10.3758/BF03204874
Grossberg, S. (1987b). Cortical dynamics of three-dimensional form, color and brightness
perception II: Binocular theory. Perception & Psychophysics, 41 (2), 117–158.
doi:10.3758/BF03204875
Häusler, J. (2006). Effekte von Testwiederholung und deren praktische Bedeutung im Rahmen
der verkehrspsychologischen Begutachtung. Österreichische Tagung für
Verkehrspsychologie. Gmunden.
Hessler, R. & Grossjohann, A. (1953). Untersuchung von Kraftfahrern in dem Medizinisch‐
Psychologischen Institut für Verkehrssicherheit beim Technischen Überwachungs‐Verein
Stuttgart e.V. Zeitschrift des Technischen Überwachungs‐Vereins, 1–12.
Hoffmann, J. & Kämpf, U. (1985). Mechanismen der Objektbenennung-Parallele
Verarbeitungskaskaden. Sprache & Kognition, 4, 217–230.
Hoffmann, J. & Klein, R. (1988). Kontexteffekte bei der Benennung und Entdeckung von
Objekten. Sprache & Kognition, 7, 25–39.
Hoffmann, J. & Zießler, M. (1982). Begriffe und ihre Merkmale. Zeitschrift für Psychologie, 190
(1), 46–77.
Hoffmann, J. & Zießler, M. (1986). The integration of visual and functional classifications in
concept formation. Psychological Research, 48 (2), 69–78.
Holland, P. W. & Dorans, N. J. (2006). Linking and equating. Educational Measurement, 4,
187–220.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal
of Statistics, 6 (2), 65–70. doi:10.2307/4615733
Horn, J. (1988). Thinking about human abilities. In J.R. Nesselroade & R.B. Cattell (Eds.),
Handbook of multivariate experimental psychology (2nd ed., pp. 645–685). New York, NY:
Plenum Press. doi:10.1007/978-1-4613-0893-5_19
Hu, L. & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling: A
Multidisciplinary Journal, 6 (1), 1–55.
Instituto Brasileiro de Geografia e Estatística. (2011). Censo demográfico, 2010. Rio de
Janeiro: Author.
International Test Commission. (2001). International guidelines for test use. International
Journal of Testing, 1 (2), 93–114. doi:10.1207/s15327574ijt0102_1
Irvine, S. H. & Kyllonen, P. C. (2002). Item generation for test development. New York, NY:
Routledge.
Kersting, M. (2018). Zur Information über und Dokumentation von Instrumenten zur Erfassung
menschlichen Erlebens und Verhaltens - Die DIN SCREEN Checkliste 1, Version 3. In
Diagnostik- und Testkuratorium (Ed.), Personalauswahl kompetent gestalten:
Grundlagen und Praxis der Eignungsdiagnostik nach DIN 33430 (pp. 224–244). Berlin:
Springer.
Kingsbury, G. G. & Zara, A. R. (1989). Procedures for selecting items for computerized
adaptive tests. Applied Measurement in Education, 2 (4), 359–375. Taylor & Francis.
doi:10.1207/s15324818ame0204_6
Kolen, M. J. & Brennan, R. L. (2004). Test equating, scaling, and linking (3rd ed.). New York,
NY: Springer. doi:10.1007/978-1-4939-0317-7
Koller, I., Alexandrowicz, R. & Hatzinger, R. (2012). Das Rasch Modell in der Praxis. Eine
Einführung in eRm. Wien: Facultas Verlags- und Buchhandels AG.
Kubinger, K. D. (2003). Gütekriterien. In K.D. Kubinger & R.S. Jäger (Eds.), Schlüsselbegriffe
der psychologischen Diagnostik (pp. 195–204). Weinheim: Beltz.
Kubinger, K. D. (2009). Psychologische Diagnostik. Theorie und Praxis psychologischen
Diagnostizierens. Göttingen: Hogrefe.
Lienert, G. A. & Raatz, U. (1994). Testaufbau und Testanalyse. Weinheim: Beltz.
van der Linden, W. J. & Glas, C. A. W. (Eds.). (2010). Elements of adaptive testing. New York,
NY: Springer.
van der Linden, W. J. & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response
theory. Berlin: Springer.
Loftus, G. R. (1976). A framework for a theory of picture recognition. In R.A. Monty & J.W.
Senders (Eds.), Eye movements and psychological processes (pp. 499–514). Hillsdale,
NJ: Lawrence Erlbaum.
Martin-Löf, P. (1973). Statistiska Modeller. Stockholm: Institutet för säkringsmatematik och
matematisk statistik vid Stockholms universitet.
McGrew, K. S. (1997). Analysis of the major intelligence batteries according to a proposed
comprehensive Gf-Gc framework. In D.P. Flanagan, J.L. Genshaft & P.L. Harrison
(Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 151–179).
New York, NY: Guilford.
Meyers, L. S. & Rhoades, R. W. (1978). Visual search of common scenes. Quarterly Journal
of Experimental Psychology, 30, 297–310. doi:10.1080/14640747808400677
Müller, H. J. & Krummenacher, J. (2006). Visual search and selective attention. Visual
Cognition, 14 (4–8), 389–410. doi:10.1080/13506280500527676
Neisser, U. (1976). Cognition and reality. San Francisco, CA: Freeman.
Ottes, F. P., Van Gisbergen, J. A. & Eggermont, J. J. (1985). Latency dependence of colour-
based target vs nontarget discrimination by the saccadic system. Vision Research, 25 (6),
849–862. doi:10.1016/0042-6989(85)90193-2
Ponocny, I. (2001). Nonparametric goodness-of-fit tests for the Rasch model. Psychometrika,
66 (3), 437–459. doi:10.1007/BF02294444
R Core Team. (2016). R: A language and environment for statistical computing. R Foundation
for Statistical Computing. Available at: https://www.r-project.org/
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago, IL:
University of Chicago Press.
Reinecke, J. (2014). Strukturgleichungsmodelle in den Sozialwissenschaften (2nd ed.).
München: Oldenbourg.
Rettig, K. & Hornke, L. F. (1990). Adaptives Testen. In W. Sarges (Ed.), Management-
Diagnostik (pp. 444–450). Göttingen: Hogrefe.
Risser, R., Chaloupka, C., Grundler, W., Sommer, M., Häusler, J. & Kaufmann, C. (2008).
Using non-linear methods to investigate the criterion validity of traffic-psychological test
batteries. Accident Analysis and Prevention, 40 (1), 149–157.
doi:10.1016/j.aap.2006.12.018
Rosch, E. (1977). Human categorization. In N. Warren (Ed.), Studies in cross-cultural
psychology (1st ed., pp. 3–49). London: Academic Press.
Rosseel, Y. (2012). lavaan: An R Package for Structural Equation Modeling. Journal of
Statistical Software, 48 (2), 1–36. doi:10.18637/jss.v048.i02
Rost, J. (2004). Lehrbuch Testtheorie - Testkonstruktion. Bern: Huber.
RSSB. (2010). Driver Selection: Development Phase - Updated selection criteria and validation
study (T628 report). London.
Schellig, D. (2018). Manual Block-Tapping-Test vorwärts Block-Tapping-Test rückwärts
Supra-Blockspannen-Test (Version 29). Mödling: Schuhfried.
Schmidt, F. L. & Hunter, J. E. (1998). The validity and utility of selection methods in personnel
psychology: Practical and theoretical implications of 85 years of research findings.
Psychological Bulletin, 124 (2), 262–274. doi:10.1037/0033-2909.124.2.262
Schneider, W. J. & McGrew, K. S. (2018). The Cattell-Horn-Carroll theory of cognitive abilities.
In D.P. Flanagan & E.M. McDonough (Eds.), Contemporary intellectual assessment:
Theories, tests, and issues (4th ed., pp. 73–163). New York, NY: Guilford Press.
Schuhfried, G. (2005). Handanweisung Expertensystem Verkehr (XPSV). Mödling:
Schuhfried.
Schuhfried, G. (2019a). Manual Cognitrone (Version 52 – Revision 2). Mödling: Schuhfried.
Schuhfried, G. (2019b). Manual Determinationstest (Version 45 – Revision 1). Mödling:
Schuhfried.
Schuhfried, G. (2019). Manual Reaktionstest (Version 45 – Revision 1). Mödling: Schuhfried.
SCHUHFRIED GmbH. (2012). Manual Safety Assessment Road (Version 4 - Revision 2).
Mödling: SCHUHFRIED GmbH.
Sijtsma, K. & Emons, W. H. M. (2011). Advice on total-score reliability issues in psychosomatic
measurement. Journal of Psychosomatic Research, 70 (6), 565–572. Elsevier Inc.
doi:10.1016/j.jpsychores.2010.11.002
Sommer, M., Arendasy, M. E., Schützhofer, B. & Knessl, G. (2017). Reduktion von
Testwiederholungseffekten durch adaptives Testen - Experimentelle Überprüfung der
Retest-Effekte anhand des Faktors Wahrnehmungsgeschwindigkeit. Zeitschrift für
Verkehrssicherheit, 63 (2), 102–103.
Sommer, M., Arendasy, M., Schuhfried, G. & Litzenberger, M. (2005). Diagnostische
Unterscheidbarkeit unfallfreier und mehrfach unfallbelasteter Kraftfahrer mit Hilfe nicht-
linearer Auswertemethoden. Zeitschrift für Verkehrssicherheit, 51 (2), 82–86.
Sommer, M., Häusler, J., Koning, A. J. & Arendasy, M. (2006). Validation of the Dutch airforce
test battery using artificial neural networks. Proceedings from 48th Annual International
Military Testing Association (pp. 1–10).
Sommer, M., Heidinger, C., Arendasy, M., Schauer, S., Schmitz-Gielsdorf, J. & Häusler, J.
(2010). Cognitive and personality determinants of post-injury driving fitness. Archives of
Clinical Neuropsychology, 25 (2), 99–117. doi:10.1093/arclin/acp109
Sommer, M., Herle, M., Häusler, J. & Arendasy, M. (2008). Von TAVTMB zu ATAVT: Eine
Anwendung der automatisierten Itemgenerierung unter einschränkenden
Rahmenbedingungen. In G. Fleck & G. Ebner (Eds.), Zweites Österreichisches
Symposium für Psychologie im Militär: Schriftenreihe der Landesverteidigungsakademie
(pp. 27–52). Wien: Bundesministerium für Landesverteidigung und Sport.
Sommer, M., Herle, M., Häusler, J., Risser, R., Schützhofer, B. & Chaloupka, C. (2008).
Cognitive and personality determinants of fitness to drive. Transportation Research Part
F: Traffic Psychology and Behaviour, 11 (5), 362–375. doi:10.1016/j.trf.2008.03.001
Testkuratorium. (2010). TBS-TK. Testbeurteilungssystem des Testkuratoriums der Föderation
Deutscher Psychologenvereinigungen. Revidierte Fassung vom 09. September 2009.
Psychologische Rundschau, 61, 52–56.
Tonidandel, S., Quiñones, M. A. & Adams, A. A. (2002). Computer-adaptive testing: The
impact of test characteristics on perceived performance and test takers’ reactions. Journal
of Applied Psychology, 87 (2), 320.
Vetter, M., Schünemann, A. L., Brieber, D., Debelak, R., Gatscha, M., Grünsteidel, F. et al.
(2018). Cognitive and personality determinants of safe driving performance in
professional drivers. Transportation Research Part F: Traffic Psychology and Behaviour,
52, 191–201. doi:10.1016/j.trf.2017.11.008
Wagner, M., Karner, T. & Biehl, B. (2018). Manual Linienverfolgungstest (Version 38 –
Revision 1). Mödling: Schuhfried.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory.
Psychometrika, 54 (3), 427–450. doi:10.1007/BF02294627
Yu, C. Y. (2002). Evaluating cutoff criteria of model fit indices for latent variable models with
binary and continuous outcomes (Vol. 30). Los Angeles, CA: University of California.
Ziegler, M. & Bühner, M. (2012). Grundlagen der Psychologischen Diagnostik. Wiesbaden:
Springer VS.
Zimmer, H. D. (1984). Blume oder Rose? Unterschiede in der visuellen
Informationsverarbeitung bei Experten und Laien. Archiv für Psychologie, 136, 343–361.

9 APPENDIX A – NORM SAMPLES

Older drivers - Italy
The data which the norm sample “Older drivers – Italy” is based on was collected between
2010 and 2012 by Dr. Max Dorfer and Dr. Carmen Unterthiner at the forensic medicine service
of the traffic psychology section of the South Tyrolean health service. Test subjects completed
the ATAVT as part of a legally required study of driver fitness.

The data relates to 486 (83.2%) men and 98 (16.8%) women aged between 80 and 93
(M=81.58, SD=2.78). 143 individuals (24.5%) have no school-leaving qualification (EU
education level 1), 241 individuals (41.3%) have completed compulsory schooling or an
intermediate secondary school but without completing vocational training (EU education level
2), 94 individuals (16.1%) have completed vocational training or a course at a technical college
(EU education level 3), 74 individuals (12.7%) have a school-leaving qualification at university
entrance level (EU education level 4) and 32 individuals (5.5%) have a university degree (EU
education level 5).

Traffic psychological clients - France

The data which the norm sample “Traffic psychological clients – France” is based on was
collected in the course of traffic psychological examinations in France between 2010 and 2016.

The data relate to 2747 men and 270 women aged between 16 and 83 (M=36.76, SD=12.383).
491 individuals have no school-leaving qualification (EU education level 1), 775 individuals
have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU education level 2), 1049 individuals have completed
vocational training or a course at a technical college (EU education level 3), 389 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 315
individuals have a university degree (EU education level 5).

Applicants (professional drivers) - Netherlands

The data which the norm sample “Applicants (professional drivers) - Netherlands” is based on
was collected in several Dutch institutions between 2007 and 2013. Testing was carried out as
part of the process of recruiting professional drivers (bus and truck drivers).

The data relate to 1417 men and 155 women aged between 18 and 66 (M=34.24, SD=10.73).
14 individuals have no school-leaving qualification (EU education level 1), 73 individuals have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU education level 2), 1012 individuals have completed vocational training
or a course at a technical college (EU education level 3), 6 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 23 individuals have a
university degree (EU education level 5). The remaining people did not provide any information
about their education.

Adults – Romania
The data which the norm sample “Adults – Romania” is based on was collected between 2008
and 2009 as part of norm studies at the University of Bucharest.

The data relate to 226 men and 180 women aged between 17 and 82 (M=29.2, SD=12.306).
24 individuals have no school-leaving qualification (EU education level 1), 39 individuals have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU education level 2), 101 individuals have completed vocational training
or a course at a technical college (EU education level 3), 121 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 121 individuals have a
university degree (EU education level 5).

Representative norm sample – Italy

The representative norm sample of 343 individuals was gathered in 2014 at several locations
in Italy using a stratified quota sample plan according to gender and age. All persons have
completed the DRIVESC test set.

The expected age and gender distribution was derived from demographic data for Italy for 2011
(Istituto Nazionale di Statistica, 2011). The norm sample consists of 155 (45%) men and 188
(55%) women aged between 17 and 94 (M=49.44, SD=18.939).

Table 23 shows the distribution of age and gender in the norm sample and the corresponding
distribution in the population of Italy:

Table 23: Observed frequencies of men and women of different age groups and comparison with the expected values in a
representative sample

Age groups  Actual number of men  Actual number of women  Expected values men  Expected values women
15 to 19    4                     4                       10                   9
20 to 24    15                    13                      10                   10
25 to 29    15                    14                      11                   11
30 to 34    16                    18                      13                   13
35 to 39    11                    14                      15                   15
40 to 44    8                     16                      16                   16
45 to 49    11                    16                      16                   16
50 to 54    9                     15                      14                   14
55 to 59    13                    12                      12                   13
60 to 64    12                    18                      12                   13
65 to 69    11                    12                      10                   11
70 to 74    17                    13                      10                   11
75 to 79    6                     16                      7                    10
80 to 84    3                     4                       5                    8
85+         4                     3                       3                    8

A comparison of the two distributions using a chi² test (chi²[29]=41.73, p=0.06) shows
that they do not differ significantly from each other.
The norm sample can therefore be regarded as representative of the population of Italy in
terms of gender and age.
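
The comparison described above can be illustrated with a minimal sketch of a chi² goodness-of-fit statistic. It assumes that the 30 gender-by-age cells of Table 23 are treated as one distribution (giving 29 degrees of freedom, as reported) and it uses the rounded expected values printed in the table, so the result (roughly 40.8) is close to, but not identical with, the reported chi²[29]=41.73. The function name chi_square_gof is hypothetical and is not part of the Vienna Test System.

```python
# Observed and expected counts per age group, copied from Table 23
# (order of age groups: 15-19, 20-24, ..., 80-84, 85+).
observed_men = [4, 15, 15, 16, 11, 8, 11, 9, 13, 12, 11, 17, 6, 3, 4]
observed_women = [4, 13, 14, 18, 14, 16, 16, 15, 12, 18, 12, 13, 16, 4, 3]
expected_men = [10, 10, 11, 13, 15, 16, 16, 14, 12, 12, 10, 10, 7, 5, 3]
expected_women = [9, 10, 11, 13, 15, 16, 16, 14, 13, 13, 11, 11, 10, 8, 8]


def chi_square_gof(observed, expected):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit comparison.

    Assumption: all cells form a single distribution, so df = number of cells - 1.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


statistic, df = chi_square_gof(
    observed_men + observed_women,
    expected_men + expected_women,
)
print(f"chi2[{df}] = {statistic:.2f}")
```

The p-value is deliberately left out of the sketch; it would be obtained from the chi² distribution with the given degrees of freedom using a statistics package.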

50 individuals (15%) have no school-leaving qualification (EU educational level 1), 44
individuals (13%) have completed compulsory schooling or an intermediate secondary school
but without completing vocational training (EU educational level 2), 25 individuals (7%) have
completed vocational training or a course at a technical college (EU educational level 3), 81
individuals (24%) have a school-leaving qualification at university entrance level (EU
educational level 4) and 141 individuals (41%) have a university degree (EU educational level
5). For 2 people, the highest education completed is not known.
In addition to the total norm, age-specific norms (up to 30 years, 31-56 years, 57+ years),
education-specific norms (EU education levels 1-3, EU education level 4, EU educational level
5) and gender-specific norms are provided.

Applicants (general) – Australia

The data which the norm sample “Applicants (general)” is based on was collected in
2009-2010 as part of norm studies in Australia.

The data relates to 113 men and 20 women aged between 18 and 63. 4 individuals have no
school-leaving qualification (EU education level 1), 37 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 52 individuals have completed vocational training or a course at a technical
college (EU education level 3), 29 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 11 individuals have a university degree (EU
education level 5).

Applicants (shunter/yardmaster) - Australia

The data which the norm sample “Applicants (shunter/yardmaster)” is based on was collected
in Australia between 2009 and 2010.

The data relates to 127 men and 42 women aged between 18 and 64. 2 individuals have no
school-leaving qualification (EU education level 1), 57 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 71 individuals have completed vocational training or a course at a technical
college (EU education level 3), 32 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 7 individuals have a university degree (EU education
level 5).

Applicants (Train Controller and Network Op.) – Australia

The data which the norm sample “Applicants (train controller and network operator)” is based
on was collected as part of norm studies in Australia between 2009 and 2010.

The data relates to 154 men and 44 women aged between 17 and 58. 2 individuals have no
school-leaving qualification (EU education level 1), 13 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 67 individuals have completed vocational training or a course at a technical
college (EU education level 3), 69 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 46 individuals have a university degree (EU
education level 5).

Applicants (train staff) - Australia

The data which the norm sample “Applicants (train staff)” is based on was collected in 2009
and 2010 as part of norm studies in Australia.

The data relates to 337 men and 92 women aged between 17 and 62. 9 individuals have no
school-leaving qualification (EU education level 1), 77 individuals have completed compulsory
schooling or an intermediate secondary school but without completing vocational training (EU
education level 2), 176 individuals have completed vocational training or a course at a technical
college (EU education level 3), 119 individuals have a school-leaving qualification at university
entrance level (EU education level 4) and 48 individuals have a university degree (EU
education level 5).

Applicants (trainee train drivers) - Australia

The data which the norm sample “Applicants (trainee train drivers)” is based on was collected
in Australia between 2009 and 2011.

The data relates to 1353 men and 216 women aged between 18 and 80. 13 individuals have
no school-leaving qualification (EU education level 1), 315 individuals have completed
compulsory schooling or an intermediate secondary school but without completing vocational
training (EU education level 2), 640 individuals have completed vocational training or a course
at a technical college (EU education level 3), 394 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 206 individuals have a
university degree (EU education level 5).

Bus drivers – Austria

The data which the norm sample “Bus drivers - Austria” is based on was collected in 2014 as
part of norming studies in Austria.

The data relates to 126 men aged between 22 and 65. 13 individuals have no school-leaving
qualification (EU education level 1), 15 individuals have completed compulsory schooling or
an intermediate secondary school but without completing vocational training (EU education
level 2), 86 individuals have completed vocational training or a course at a technical college
(EU education level 3), 11 individuals have a school-leaving qualification at university entrance
level (EU education level 4) and 1 individual has a university degree (EU education level 5).

Representative norm sample – Slovakia

The data of the representative norm sample of 331 individuals was gathered in 2008-2011 at
several locations in Slovakia. The norm sample consists of 170 (51%) men and 161 (49%)
women aged between 15 and 86 (M=40.76, SD=15.805).

9 individuals (3%) have no school-leaving qualification (EU educational level 1), 84 individuals
(25%) have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU educational level 2), 98 individuals (30%) have completed
vocational training or a course at a technical college (EU educational level 3), 107 individuals
(32%) have a school-leaving qualification at university entrance level (EU educational level 4)
and 33 individuals (10%) have a university degree (EU educational level 5).

Healthy older drivers – Germany

The data which the norm sample “Healthy older drivers - Germany” is based on was collected
in the course of traffic psychological examinations in Germany between 2015 and 2016.

The data relate to 213 men and 87 women aged between 48 and 88 (M=59.12, SD=8.072). 16
individuals have no school-leaving qualification (EU education level 1), 125 individuals have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU education level 2), 48 individuals have completed vocational training or
a course at a technical college (EU education level 3), 46 individuals have a school-leaving
qualification at university entrance level (EU education level 4) and 65 individuals have a
university degree (EU education level 5).

Applicants (bus drivers) – Slovakia

The data which the norm sample “Applicants (bus drivers) - Slovakia” is based on was collected
in a Slovakian institution between 2011 and 2014. Testing was carried out as part of the
process of recruiting professional drivers (bus drivers).

The data relate to 639 men and 9 women aged between 21 and 69 (M=46.78, SD=10.663). 46
individuals have completed compulsory schooling or an intermediate secondary school but
without completing vocational training (EU education level 2), 413 individuals have completed
vocational training or a course at a technical college (EU education level 3), 161 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 25
individuals have a university degree (EU education level 5).

Applicants (tram drivers) – Slovakia

The data which the norm sample “Applicants (tram drivers) - Slovakia” is based on was
collected in a Slovakian institution between 2011 and 2014. Testing was carried out as part of
the process of recruiting professional drivers (tram drivers).

The data relate to 283 men and 58 women aged between 19 and 70 (M=45.48, SD=10.238).
37 individuals have completed compulsory schooling or an intermediate secondary school but
without completing vocational training (EU education level 2), 172 individuals have completed
vocational training or a course at a technical college (EU education level 3), 113 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 19
individuals have a university degree (EU education level 5).

Applicants (trolley bus drivers) – Slovakia

The data which the norm sample “Applicants (trolley bus drivers) - Slovakia” is based on was
collected in a Slovakian institution between 2011 and 2014. Testing was carried out as part of
the process of recruiting professional drivers (trolley bus drivers).

The data relates to 322 men aged between 21 and 71 (M=47.94, SD=9.664). 1 individual
has no school-leaving qualification (EU education level 1), 18 individuals have completed
compulsory schooling or an intermediate secondary school but without completing vocational
training (EU education level 2), 203 individuals have completed vocational training or a course
at a technical college (EU education level 3), 81 individuals have a school-leaving qualification
at university entrance level (EU education level 4) and 19 individuals have a university degree
(EU education level 5).

Representative norm sample – Brazil

The representative norm sample for Brazil, comprising 436 individuals, was gathered in 2016 as part of
norm studies using a quota sample plan stratified according to gender, age, and education.
The expected age and gender distribution was obtained from demographic data for Brazil for
2010 (Instituto Brasileiro de Geografia e Estatística, 2011). The norm sample consists of 219
(50%) women and 217 (50%) men aged between 16 and 86 (M=39.18, SD=16.81).

Table 24 shows the distribution of age and gender in the norm sample and the corresponding
distribution in the population of Brazil:

Table 24: Observed frequencies of men and women of different age groups in the norm sample and comparison with the
expected values in a representative sample

Age groups  Actual number of men  Actual number of women  Expected values men  Expected values women
15 to 19    24                    24                      26                   25
20 to 24    22                    26                      26                   26
25 to 29    25                    28                      25                   26
30 to 34    31                    27                      23                   24
35 to 39    22                    22                      20                   21
40 to 44    18                    18                      19                   20
45 to 49    16                    18                      17                   18
50 to 54    14                    14                      15                   16
55 to 59    11                    12                      12                   13
60 to 64    9                     9                       9                    10
65 to 69    10                    7                       7                    8
70 to 74    8                     6                       5                    6
75 to 79    4                     3                       3                    4
80 to 84    2                     3                       2                    3
85+         1                     2                       1                    2

A comparison of the two distributions using a chi² test (chi²[29]=9.34, p=1) shows that they do
not differ significantly from each other. The norm sample can therefore be regarded as
representative of the population of Brazil in terms of gender and age.

26 individuals (6%) have no school-leaving qualification (EU educational level 1), 76 individuals
(17%) have completed compulsory schooling or an intermediate secondary school but without
completing vocational training (EU educational level 2), 80 individuals (18%) have completed
vocational training or a course at a technical college (EU educational level 3), 135 individuals
(31%) have a school-leaving qualification at university entrance level (EU educational level 4)
and 119 individuals (27%) have a university degree (EU educational level 5).

In addition to the total norm, age-specific norms (up to 28 years, 29-45 years, 46+ years),
education-specific norms (EU education levels 1-3, EU education levels 4-5), and gender-
specific norms are provided.

Bus drivers – Chile

The data which the norm sample “Bus drivers - Chile” is based on was collected between 2016
and 2017 as part of norm studies in Chile.

The data relates to 148 men aged between 26 and 73 (M=49.81, SD=9.724). 38 individuals
(26%) have no school-leaving qualification (EU educational level 1), 94 individuals (64%) have
completed compulsory schooling or an intermediate secondary school but without completing
vocational training (EU educational level 2), 14 individuals (9%) have completed vocational
training or a course at a technical college (EU educational level 3), 1 individual (1%) has a
school-leaving qualification at university entrance level (EU educational level 4) and 1
individual (1%) has a university degree (EU educational level 5).

Professional drivers - Czech Republic

The data which the norm sample “Professional drivers - Czech Republic” is based on was
collected between 2018 and 2020. Testing was done as part of legally mandated testing for
professional drivers.

The data relate to 176 men and 2 women aged between 21 and 71 (M = 46.01; SD = 10.546).
12 individuals have completed compulsory schooling or an intermediate secondary school but
without completing vocational training (EU education level 2), 101 individuals have completed
vocational training or a course at a technical college (EU education level 3), 53 individuals
have a school-leaving qualification at university entrance level (EU education level 4) and 12
individuals have a university degree (EU education level 5).

Healthy older drivers – Austria (exclusive norm sample)

The exclusive norm sample "Healthy older drivers - Austria" is a sample of test persons in the
context of driving aptitude diagnostics, collected in the course of a study to test the
effectiveness of cognitive training in the context of traffic psychology. The sample (N = 108)
consists of 54 men and 54 women aged 65 to 88 years from Austria. The data were collected
in the years 2017 and 2018 at the Test and Research Center of SCHUHFRIED GmbH. All
individuals were tested under the supervision of a test leader and participated voluntarily in the
data collection without any consequences. Participation in the study was remunerated.

A total of 5 individuals have completed compulsory school or secondary modern school, but
without having completed vocational training (EU education level 2), 54 individuals have
completed vocational training or technical school (EU education level 3), 31 individuals have
completed secondary school with a high school diploma (EU education level 4) and 18
individuals have a university degree (EU education level 5).

For the reliability of test form S5, an estimate of the internal consistency using Cronbach's
alpha is available. For details see Section 4.2.

The exclusive norm sample "Healthy older drivers - Austria" is not included as standard in the
Vienna Test System and can be purchased separately.

10 APPENDIX B – NOTES ON INTERPRETATION IN TRAFFIC PSYCHOLOGICAL ASSESSMENT
Germany
Notes on the interpretation of percentile ranks in the context of traffic psychological
assessment can be found in the Begutachtungs-Leitlinien (assessment guidelines)
(Bundesanstalt für Straßenwesen, 2018, Section 2.5). Depending on whether the
assessment relates to a driver of Group 1 or Group 2, percentile ranks of 16 (Group 1) and 33
(Group 2) are regarded as critical cut-off scores.
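
The comparison of a percentile rank with these cut-off scores can be sketched as follows. The cut-off values 16 and 33 come from the text above; the function name `meets_cutoff` is hypothetical, and the rule that a percentile rank equal to the cut-off counts as meeting it is an assumption of this sketch. The Begutachtungs-Leitlinien remain the authoritative source for how the cut-offs are applied.

```python
# Critical percentile ranks per driver group, as stated in the text above
CRITICAL_PERCENTILE_RANK = {1: 16, 2: 33}


def meets_cutoff(percentile_rank, group):
    """Return True if the percentile rank reaches the group's cut-off.

    Assumption of this sketch: reaching the cut-off exactly counts as
    meeting it. Hypothetical helper, not part of the Vienna Test System.
    """
    if group not in CRITICAL_PERCENTILE_RANK:
        raise ValueError("group must be 1 or 2")
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    return percentile_rank >= CRITICAL_PERCENTILE_RANK[group]


# The same result can be uncritical for Group 1 and critical for Group 2
result_group_1 = meets_cutoff(25, group=1)
result_group_2 = meets_cutoff(25, group=2)
```

The example illustrates that one and the same percentile rank can lead to different conclusions depending on the driver group to which the assessment relates.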

11 APPENDIX C – DIN 33430 CHECKLIST

DIN 33430 lists 69 requirements for procedure instructions; they are elaborated in Checklist 1
of the DIN SCREEN (Kersting, 2018). The Testkuratorium (test board) of the Federation of
German Psychology Associations officially regards this checklist as the standard for the
information and documentation of tools that measure human experience and behavior. The
requirements are listed by DIN number in columns 1 and 6 of Table A1, but their wording is
not printed here; it can be read in the current version of DIN 33430. The table simplifies
quality checking of the procedure instructions, as required, for example, in the context of the
test assessment system of the assessment and test board (Testkuratorium, 2010).

Table A1. A list of the evidence for the requirements of DIN 33430
M/O: mandatory (M) or optional (O)

DIN no.  M/O  Yes  No  Source            DIN no.  M/O  Yes  No  Source
A1       O    ☒    ☐   Chapter 2         B21      O    ☐    ☒
A2       O    ☒    ☐   Chapter 2         B22      M    ☒    ☐   Chapter 4.2
A3       O    ☐    ☐   Not relevant      B23      O    ☐    ☒
A4       M    ☒    ☐   Chapter 2         B24      O    ☒    ☐   Chapter 4.2
A5       O    ☒    ☐   Chapter 2         B25      O    ☒    ☐   Chapter 4.2
A6       O    ☒    ☐   Chapter 2         B26      M    ☐    ☐   Not relevant
A7       O    ☐    ☐   Not relevant      B27      M    ☒    ☐   Chapter 4.3
A8       O    ☒    ☐   Chapter 2, 3.4    B28      M    ☒    ☐   Chapter 4.3
A9       O    ☒    ☐   Chapter 2, 3.4    B29      M    ☒    ☐   Chapter 4.3
A10      O    ☒    ☐   Chapter 2, 3.4    B30      M    ☒    ☐   Chapter 4.3
A11      O    ☒    ☐   Chapter 2         B31      M    ☒    ☐   Chapter 4.3
A12      O    ☐    ☐   Not relevant      B32      M    ☒    ☐   Chapter 4.3
A13      M    ☒    ☐   Chapter 6         B33      M    ☒    ☐   Chapter 4.3
A14      M    ☒    ☐   Chapter 2         B34      O    ☒    ☐   Chapter 4.3
A15      M    ☒    ☐   Chapter 7         B35      M    ☐    ☐   Not relevant
B1       M    ☒    ☐   Chapter 3.1       B36      M    ☐    ☐   Not relevant
B2       M    ☒    ☐   Chapter 3.2       B37      M    ☐    ☐   Not relevant
B3       M    ☒    ☐   Chapter 4.3       B38      O    ☐    ☐   Not relevant
B4       M    ☒    ☐   Chapter 4.3       B39      O    ☐    ☐   Not relevant
B5       M    ☒    ☐   Chapter 4.3       B40      O    ☐    ☐   Not relevant
B6       M    ☒    ☐   Chapter 4.3       B41      O    ☒    ☐   Chapter 4.3, 3.2.5
B7       M    ☒    ☐   Chapter 4.3       B42      O    ☒    ☐   Chapter 4.3
B8       M    ☐    ☒                     B43      O    ☒    ☐   Chapter 4.3
B9       M    ☐    ☒                     B44      O    ☒    ☐   Chapter 4.3
B10      O    ☐    ☒                     B45      O    ☒    ☐   Chapter 4.3
B11      O    ☐    ☒                     B46      M    ☒    ☐   Chapter 4.3
B12      O    ☒    ☐   Chapter 4.3       B47      O    ☐    ☒
B13      M    ☒    ☐   Chapter 4.3       B48      M    ☒    ☐   Chapter 4.3
B14      O    ☒    ☐   Chapter 3.3       B49      O    ☒    ☐   Chapter 4.3
B15      M    ☐    ☐   Not relevant      B50      O    ☐    ☐   Not relevant
B16      M    ☒    ☐   Chapter 5         B51      O    ☐    ☐   Not relevant
B17      O    ☒    ☐   Chapter 5         B52      O    ☐    ☐   Not relevant
B18      M    ☐    ☐   Not relevant      B53      O    ☐    ☐   Not relevant
B19      M    ☒    ☐   Chapter 5         B54      O    ☐    ☐   Not relevant
B20      O    ☐    ☒

Explanation:
In some cases the statements in the DIN SCREEN checklist (Kersting, 2018) that are to be
assessed are preceded by branching questions to ensure that only those DIN requirements
that are relevant to the particular application purpose are checked. For example, requirements
B49 to B54 relate to proof of content validity. However, according to DIN 33430 this proof only
needs to be produced if the test claims to have content validity. This is checked with the DIN
SCREEN (Kersting, 2018) branching question V16: Is content validity claimed for the test?