Master's Thesis: Behavioral Detection of Cheating in Online Examination
Matus Korman
D Master thesis
Computer and Systems Sciences
Department of Business Administration and Social Sciences
Division of Information Systems Sciences
I would like to thank everyone who contributed to, opposed, assisted with, or
otherwise helped me in carrying out the study, as well as in writing this thesis, a
result of the study.
My thanks go to Dan Harnesk, PhD (supervisor), Sören Samuelsson, PhD, and
John Lindström, PhD, for the valuable advice and research guidance I was given;
to Hugo Quisbert, PhD, Artjom Vassiljev and Viola Veiderpass for constructive
opposition; to Lars Furberg for the ideas that helped me navigate to the chosen
research problem and for the interesting discussions we had; to Neil Costigan, PhD,
for his inspiring work and presentations; to professor Ann Hägerfors for managing
issues related to my study; and to my family for their mental support and advice.
My further thanks go to Amir Molavi, Onur Yirmibesoglu, Marko Niemimaa, Elina
Laaksonen, Nebojsa Mihajlovski, Vladimir Kichatov, Ali Fakhr, Darya Plankina,
Anna Selischeva, Sana Rouis, Svante Edzén, Peter Anttu, and others who con-
tributed to my flow of thought through discussions, or supported me in various
other ways.
Thanks also to the contributions of all of you, the study has been carried out the
way it has, and I feel I have learned valuable knowledge and gained practice that
will be of use in the future.
Abstract
The need for and use of online or computer-based examination seems to be growing,
while this form of examination gives students a broader spectrum of opportunities,
including opportunities for cheating, compared to non-computerized forms of examination.
The times are changing: there are many different reasons for examination dishonesty,
many ways of performing it, and many ways of coping with it. Given an equilibrium
at this level, new ways of violation deserve new ways of prevention, or at least of
detection.
Contents

1 Introduction
  1.1 Topic
  1.2 Research goals and delimitation
  1.3 Significance of the study
  1.4 Document structure
2 Background
  2.1 Examination cheating
    2.1.1 What's wrong with cheating?
    2.1.2 Why do students cheat?
    2.1.3 The mission: preventing cheating
    2.1.4 How do students cheat?
    2.1.5 Detecting cheating as a means of prevention
    2.1.6 Cheating review summary
  2.2 Specifics of distance operation
3 Conceptual framework
  3.1 Cue leakage theory
  3.2 Pattern recognition theory
  3.3 Anomaly detection
  3.4 Behaviometrics
    3.4.1 Biometrics in general
    3.4.2 Specifics of behaviometrics
    3.4.3 Keystroke dynamics
    3.4.4 Mouse dynamics
    3.4.5 Linguistic dynamics
    3.4.6 'Special purpose' behaviometrics
  3.5 Vision of a behavioral cheating detection approach
    3.5.1 The angle of attack
    3.5.2 Behavioral characteristics as the cheating detection unifier
    3.5.3 The detection mechanism
4 Methodology
  4.1 My setting and the research method
  4.2 Validity of a research design
  4.3 Reliability and validity of a measure
  4.4 Research design and research process
    4.4.1 Empirical inputs
    4.4.2 Observations
    4.4.3 Questionnaire
    4.4.4 Analysis
Appendices
List of Tables

3.1 Meta-functions of a computer mediated communication text analysis framework
3.2 Text analysis linguistic features 1
3.3 Text analysis linguistic features 2
Chapter 1
Introduction
a mutually successive manner. Distance examination is used, in different forms, to
validate the level of knowledge, skills or abilities of students/examinees. The most
common distance examination method seems to be online examination, which uses
a network-enabled computer environment (e.g., the Internet) to set up two-way
communication.
Although this depends on the specific environment, the major concerns in distance
education compared to on-site education mostly relate to finding, achieving and
maintaining effective means of teaching/tutoring, learning, student support and
administration (Holmberg, 1995; Keegan, 1996; Bates, 2005; Kim et al., 2008), whereas
the problems of assuring fairness and trust often seem more challenging in online
examination than with traditional/conventional examination means (Rowe, 2004).
Public trust and fairness in education, including examination, are important
attributes (Rumyantseva, 2005; Heyneman, 2002), yet seemingly tricky to achieve and
maintain (Herberling, 2002). The technology that, on the one hand, enables distributed
and asynchronous education opens up, on the other hand, a broad range of cheating
possibilities within an examination process. Controlling, or at least perceiving, largely
unknown and distant examination environments as a way to detect and prevent
examination dishonesty seems non-trivial. Partly as a result, distance education is
often less accepted than conventional (on-site) education (Columbaro & Monaghan,
2009; Bourne et al., 2005). In a more general context, Allen & Seaman (2003) show
that online education was perceived as inferior to conventional education, although
beliefs about the near future (three years later) showed an optimistic turn in the
balance. Around six years later, Columbaro & Monaghan (2009) showed that such
beliefs had been, and might tend to be, too optimistic: according to their study,
more than 95% of employers in several different fields would prefer a traditional
degree to an online one.
Examination cheating, and academic dishonesty in general, seem to have been an
educational problem for a long time (Cizek, 1999). According to UC Berkeley
(2009), cheating can be defined as “fraud, deceit, or dishonesty in an academic
assignment, or using or attempting to use materials, or assisting others in using
materials, that are prohibited or inappropriate in the context of the academic
assignment in question” (no page numbering). Students often tend to shortcut
their way to achieving grades and maintaining a sense of personal integrity, rather
than investing an adequate amount of effort and time (Diekhoff et al., 1996).
Academic cheating is prevalent and, at the same time, seems to have a growing
tendency (Cizek, 1999; Dick et al., 2003; McCabe et al., 2006; Wehman, 2009;
Howell et al., 2009). A study in McCabe et al. (2006) shows that cheating was
reported by 56% of business students and 47% of non-business students. An earlier
study by McCabe (also mentioned in the paper) shows that 66% of all students
reported at least one serious cheating incident in the past year; among engineering
students the number was 72%, and business students led with 84%. According to
a survey carried out in the United States, around 94% of students reported cheating
in some form, around 65% reported test cheating, and more than 50% reported
plagiarism. According to Stumber-McEwen et al. (2009), there is a wealth of studies
on the prevalence of cheating available; however, their quantitative results vary
greatly based on the type of survey and the specific survey conditions. As with
on-site examination, cheating also occurs in distance examination (Underwood,
2006; Wehman, 2009).
Different sources perceive the prevalence of cheating among on-site and online
examined students differently (Stumber-McEwen et al., 2009; Herberling, 2002;
Watson & Sottile, 2010). Assuming that an online examination environment tends
to be less cheat-constraining and less perceivable by examiners than an on-site one,
students may generally tend to cheat more at a distance, as Rowe (2004) also believes.
Following an information security approach (Whitman & Mattord, 2008), the
occurrence of online examination cheating as an undesirable activity is a form of
risk, and the higher the severity and probability of cheating, the greater the importance
of controlling that risk. The ultimate goal of risk control, applied in this context to
the educational field, is to effectively reduce risk related to the educational process.
Effectively reducing the risk of online examination cheating is a problem.
There are multiple approaches to controlling online examination cheating (Olt,
2002), many of them suitable in one way or another. The approach most relevant
to the concerns of this thesis is the ‘police approach’: monitoring for and reacting to
suspicion or detection, along with deterrence-based demotivation of cheating. This
approach is somewhat analogous to a feedback control system (Åström & Murray,
2008, chap. 1): within such a system, one needs first to perceive the examination
environment and detect anomalies in order to be able to take effective control
actions. Perceiving a distant online examination environment and effectively
detecting cheating is a problem.
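To make the control-loop view concrete, the following is a minimal sketch (this author's illustration, not part of the cited works) of flagging sessions whose measured behavior deviates strongly from a baseline; the chosen feature (typing rate), the data and the threshold are purely hypothetical.

```python
# Minimal sketch: flag examination sessions whose typing rate deviates
# strongly from a baseline built on presumed-honest sessions (z-score rule).
from statistics import mean, stdev

def is_anomalous(baseline, observed, threshold=3.0):
    """True if `observed` lies more than `threshold` standard deviations
    from the mean of `baseline`."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold

# Hypothetical keystrokes-per-minute rates from earlier sessions.
normal_rates = [182, 175, 190, 168, 185, 177, 181]
print(is_anomalous(normal_rates, 179))  # -> False (typical rate)
print(is_anomalous(normal_rates, 60))   # -> True (suspicious deviation)
```

A real detector would of course combine many features and a more robust model, but the loop is the same: measure, compare against a learned norm, and act on the deviation.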
1.1 Topic
The topic of this thesis is to explore and verify possibilities of detecting specific types
of online examination cheating based on behavioral measures of human-computer
interaction. More specifically, the focus lies on utilizing behaviometrics (behavioral
biometrics) for the analysis of keystroke, mouse and linguistic dynamics.
The primary motivation for this study is to enable or help faculties to both
(1) fight the prevalent and rather invisible online examination cheating, and to (2)
indirectly increase the acceptance of online grades.
In terms of content, this study focuses on the use of behaviometrics (work with
keystroke, mouse and linguistic dynamics), based on information technology and
machine learning (software, pattern recognition, anomaly detection, visualization),
for detecting examination cheating (an educational concern).
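As a concrete illustration of what keystroke dynamics measures, the sketch below derives two classic timing features, dwell time (how long a key is held down) and flight time (the gap between releasing one key and pressing the next), from a stream of key events. The event format and the function name are this author's illustrative assumptions, not the API of any existing tool.

```python
# Illustrative sketch: extract dwell and flight times from a stream of
# (timestamp_ms, key, event) tuples, where event is "down" or "up".
def keystroke_features(events):
    down_at = {}            # key -> press timestamp
    dwell, flight = [], []
    last_up = None
    for t, key, kind in events:
        if kind == "down":
            if last_up is not None:
                flight.append(t - last_up)       # release-to-press latency
            down_at[key] = t
        elif kind == "up" and key in down_at:
            dwell.append(t - down_at.pop(key))   # press-to-release duration
            last_up = t
    return dwell, flight

events = [
    (0, "h", "down"), (95, "h", "up"),
    (140, "i", "down"), (230, "i", "up"),
]
dwell, flight = keystroke_features(events)
print(dwell)   # -> [95, 90]
print(flight)  # -> [45]
```

Statistics over such features (means, variances, digraph latencies) form the behavioral profile against which later typing can be compared.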
2. Histogrammatically displayed amount of stress
Being able to provide the above about the target population (described below)
effectively and in a highly automated way is the longer-term research vision. The
goal of this study, however, is to approach this vision with a focus on the first and
the third points.
The target population to which the research goal relates is distance students, a
great part of whom might be employed adults (Paulsen & Rekkedal, 2001), mostly
aged between 25 and 40 years. The rest of the target group might be graduate
students, mostly aged between 20 and 30 years. The age ranges used are assumptions,
and they constitute a part of the study's delimitation.
The following are the research questions whose answers I expect to contribute to
achieving the research goal:
RQ-1 What are the behavioral signs of tasks carried out when cheating that
manifest themselves in the keystroke, mouse and linguistic dynamics of the
user's computer interaction during a computer-based examination?
RQ-2 How distinct is normal behavior from cheating behavior, and how distinct
are different types of cheating behavior from each other?
The following are delimitation statements for this study: (1) A small number of
participants in online examination simulations (observations) are selected based on
convenience, instead of careful alignment with the target population. (2) No special
equipment such as skin humidity, body temperature or heartbeat sensors is used
within the study. (3) Automation of the whole cheating detection process, from
gathering inputs to seeing indications of the type and amount of cheating, is not a
part of the study.
1.4 Document structure
After having introduced the topic and stated the research goal, questions and
delimitation statements in the introduction chapter, the document describes the
problem background and parts of the state of the art in the background chapter.
The conceptual framework chapter contains a description of the core theories and
concepts applied in the study. The research method and its details are described
next, in the method chapter. Observations useful to know about before the analysis
are described in the correspondingly named chapter. The findings of the study are
summarized in the results and findings chapter. Finally, the whole research is
summarized and concluded in the conclusion chapter, where different questions are
additionally discussed from the author's points of view.
Chapter 2
Background
This chapter summarizes some cheating-related background and the state of the art
in relation to the research problem and the approaches chosen to solve it.
A behavior may be defined as cheating if [at least] one of the two following
questions can be answered in the affirmative:
• Does the behavior violate the rules that have been set for the assessment
task?
• Does the behavior violate the accepted standard of student behavior at
the institution?
Although the second question asked by Dick et al. uses the term ‘accepted standard
of student behavior’, whose practical interpretation looks rather informal and fuzzy,
the definition seems to reflect the general perception of cheating rather well,
fuzziness included. As the authors note afterwards regarding the above,
in both cases, this assumes that the accepted rules and standards have been
clearly laid out for students. (Dick et al., 2003, p. 172)
In reality, however, this might not be the case in many academic environments.
Another problem with the definition is that technically breaking the rules or such a
standard might also be inadvertent (unintentional), or too trivial, so that it becomes
perceived as poor learning behavior rather than cheating.
Severity is an important parameter of cheating, especially when responding to
cheating or otherwise handling it. Dick et al. (2003) propose a number of factors to
consider regarding the severity (seriousness) of cheating:
• The presence of direct harm to some other person caused by the cheating behavior.
• The value, relative to the course, of the assessment task on which the cheating occurred.
Cheating is an important issue that needs to be considered for two main reasons.
The first reason is that students who cheat are likely to have not achieved
competence in a variety of skills that will be necessary for them to use in their
profession. Graduating incompetent professionals is likely to cause:
• Damage to the reputation of the institution as employers realise that the
graduates from an institution are sometimes or are often incompetent.
• Damage to the reputation of the degree for the same reason.
The second reason that cheating is an important issue for academics is the harm
it causes to individual students. [...]
Besides that, cheating can pose a greater risk to those who cheat and are
detected:
The student learns little when the opportunity to learn is ignored, the gratifica-
tion of creating something that he or she distinctly owns is lost, and if discovered
by others, the career of the student could be ruined depending upon the con-
text and seriousness of the offense (Whitley & Keith-Spiegel, 2001). (Wehman,
2009, p. 12)
Figure 2.1: Model of Ajzen’s (1991) Theory of Planned Behavior extended by Stone et al.
(2009)
to the question is beyond the limits of the thesis focus, this part provides a more
general and somewhat more near-the-surface answer instead.
From a pragmatic perspective, and according to Ajzen's (1991) Theory of Planned
Behavior (TPB) as extended by Stone et al. (2009) (outlined in figure 2.1), people
intend to cheat and perform it according to three components: (1) beliefs about
cheating and its outcomes, (2) perceived normative acceptability of cheating, and
(3) the ability (or difficulty) to cheat and remain undetected (thus unpunished).
Although the theory describes an ‘internal cheating control mechanism’, it does not
explain what the incentives for considering a cheating behavior are at all. For the
needs of this study, let us simply assume the following:
For a deeper insight towards more ‘under the hood’ relations between student goals,
motivation and expectancy, one can refer to Covington (2000) and Eccles & Wigfield
(2002).
From a different perspective, in Lawrence Hinman's words:
People with integrity not only refrain from cheating, but don’t want to cheat.
[...] People with integrity have a sense of wholeness, of who they are, that
eliminates the desire to pretend – through cheating, through plagiarizing, and
the like – that they are someone else. For them, signing their name to something
signifies that it is theirs. They would not want to pass something off as their
own. (Hinman, 1997, no page numbering)

People with integrity also have a clear vision of what is right and what is wrong.
Their world is not the murky world of thoughtless and easygoing relativism, but
a world that is sharply illuminated by the light of their vision of goodness. And
added to this clarity of vision is the strength of will to act on the basis of that
vision. They see what is right, and they stand up for it, even when the personal
cost is high. (Hinman, 1997, no page numbering)

Figure 2.2: Model of student cheating decision based on internal (personal) and external
factors, based on Dick et al. (2003)
Dick et al. (2003) identified four factors on which a student's decision whether to
cheat may be based: sensitivity, the ability to interpret a moral situation; judgement,
the ability to determine whether a certain action is correct; [self-]motivation, the
influence of internal values; and character, the ability to resist pressures to perform
an immoral act.
As an extension to the previous model, Dick et al. provide a model of the student
cheating decision based on internal factors (the ‘personal domain’) and external
ones, as shown in figure 2.2. Technology is in this context seen as the enabler of
different possibilities, cheating among others. Societal context refers to, e.g., the
influence of a student's peer group, family, media, role models, culture, etc.
Situational context may include, e.g., a heavy or irrelevant course load, inadequate
teaching, difficult assignments, lack of environment control by the examiners or
proctors, some sort of dependence on passing the examination, etc. Demographic
factors include age, gender, marital status, socioeconomic status, ethnicity and
religiosity. (Dick et al., 2003)
Diekhoff et al. (1996), O'Leary (1999) and McCabe et al. (2006) discuss relationships
between cheating and cheater properties such as age, gender, and cultural,
educational or professional background. For instance, in environments where
words are perceived as ‘belonging to society’ more than ‘belonging to the individual’,
cheating tends to be perceived as more acceptable and is hence more commonplace
(O'Leary, 1999).
The importance of performing well on examinations, and hence the increased
fear-based pressure to cheat evoked by conditions with a high student population
and grading that strongly affects an individual's future career, also tends to result
in higher cheating rates among students (Howell et al., 2009). In contrast,
dominantly intrinsically motivated students (those with a dominant mastery goal
orientation) show less cheating behavior than their dominantly performance-goal-
oriented or dominantly neutral peers (Rettinger & Kramer, 2008).
According to Whitley & Keith-Spiegel (2002), there are five norms that students
usually do not perceive as academically dishonest: (1) students may study from old
tests without explicit permission (as long as the tests are not stolen); (2) taking
shortcuts such as reading condensed books, listing unread sources in the bibliography,
and faking lab reports is permissible; (3) unauthorized collaboration with others is
fine, especially when helping friends; (4) some forms of plagiarism, such as omitting
sources and using direct quotations without citation, are acceptable; (5) conning
teachers by faking excuses for missing deadlines and the like is permissible. Such
misconceptions make students more inclined towards the respective forms of cheating
without realizing their seriousness.
On top of that, Wehman (2009) has identified that fear of negative teacher
evaluations, and student morals and habits formed years earlier, are topics related
to the cheating problem.
Students often know that they are conducting an immoral activity when cheating.
As summarized by Whitley & Keith-Spiegel (2002), and in correspondence with the
TPB, the theory of cognitive dissonance (Aronson, 1969) and neutralization theory
(Harris & Dumas, 2009), students' justifications for academic dishonesty (seemingly
applicable to any kind of consciously immoral activity in general) can include denial
of injury (‘it doesn't hurt anyone’), denial of personal responsibility (‘I got sick and
couldn't read the stuff’), denial of personal risk (‘they can't punish me anyhow’),
selective morality (‘I only cheat to pass the classes’, or ‘friends come first, they
needed help’), trivializing (minimizing seriousness) (‘the assignment has little
weight in the final grade’), a necessary act (‘if I don't do well, my parents will kill
me’), and dishonesty as a norm (‘everyone does it’).
Another argument placing cheating in a more acceptable light is that cheating, and
more specifically plagiarism, versus the collaborative spreading of knowledge, seem
to be somewhat conflicting notions with fuzzy borders:
There is a certain ambiguity about when ‘collaborating in learning commu-
nity to extend knowledge and understanding’ stops and ‘submitting only your
own work’ starts. (Le Heron, 2001, p. 3?)
Extensively interviewing six first-year master's students from three different
programs at a university, Love & Simmons (1998) identified a set of factors correlated
with plagiarism behavior, divided into several groups based on the character of the
factors: mediation character (inhibiting vs. contributing), factor type (internal vs.
external), and emotional effect (positive vs. negative). These are summarized in
table 2.1. The set of factors is further extended by the theoretical summaries of Olt
(2007) and Megehee & Spake (2008), as summarized in table 2.2 according to this
author's understanding. Although the authors focus on plagiarism behavior, the
results seem to have partial relevance to cheating in general.

Table 2.1: Factors correlated to plagiarism behavior according to Love & Simmons (1998)

Inhibiting factors:
• Internal, positive effect: personal confidence; positive professional ethics;
fairness to authors; desire to work or learn; fairness to others
• Internal, negative effect: fear of detection consequences; guilt
• External: professors' knowledge; probability of being caught; time pressure;
cheating perceived as dangerous; type of work required; need for knowledge in the future
Contributing factors:
• Internal: negative personal attitudes; lack of awareness; lack of competence
• External: {grade, time, task} pressure; professor leniency
As an addition to the tables, Iyer & Eastman (2008) found that students' perceptions
of low social desirability are directly correlated with the amount of their cheating
behavior.
In the form of an extended application of the TPB, figure 2.3 graphically summarizes
the causes of cheating, with the expected benefits as one of the groups of cheating
factors.
Table 2.2: Factors correlated to plagiarism behavior, summarized from Olt (2007) and
Megehee & Spake (2008)

Inhibiting factors:
• Internal: academic achievement; age
Contributing factors:
• Internal: difficulty seeing marks of plagiarism; disorganization; cryptomnesia;
fear of failure; procrastination and laziness; sense of alienation; thrill seeking;
social activities; cheating rationalization; absenteeism
• External: unrealistic assignments; ambivalence of faculty and administration;
benefits outweigh risks; competition (jobs and graduate school); devaluing of the
assignment by the instructor; ethical lapses; information overload; institution's
subscriptions to market ideologies; instructor bad example; prominent bad examples;
opportunity; peer observation; social networking; instructors' failure to keep pace
with technological advances; instructors' failure to rotate curriculum; instructors'
lenience; lack of trust between student and instructor; previous cheating experience
Factors with unclassified mediation:
• Internal: cultural background; gender; marital status; major
• External: student perception of instructor; testing environment
Figure 2.3: Model of cheating causation (inspired by Whitley & Keith-Spiegel, 2002)
Within an analogy between cheaters in the educational field and attackers in
the field of information security: as there are different types of attackers, there
might similarly be different types of cheaters. According to Whitman & Mattord
(2007), attackers have different motivations for intruding, such as personal and social
status, the thrill of doing it, revenge, financial gain, ideology, industrial espionage,
etc. Attempting to draw the analogy, cheaters might also cheat for different reasons,
such as a notion of personal gain (grades or other academic credit, personal or
social status), providing oneself with an additional (although forbidden) layer of
protection against failure, accommodating oneself to a social environment, or simply
out of a habit of cheating.
Although students are mostly believed to cheat for grades (Cizek, 1999), views
and experiences on this may slightly differ; e.g., cheaters may mostly just want to
pass a course or an examination (Le Heron, 2001).
To sum up this section, it seems that there will be no existential emergency for
cheating intentions among students, at least as long as we use the kinds of school
systems we use today. That could mean that, for a very long time into the future,
we will have to keep combating cheating in one way or another. Besides, there are
a number of correlates of cheating, which might make cheating a clue or a signal
pointing toward other educational issues to improve at an institution.
• The police approach seeking to detect and punish cheating in reaction to it.
This approach is based on punishment and deterrence (as described and dis-
cussed by e.g. Carlsmith et al., 2002) – in other words, the ‘big brother’ style.
Inspired by the risk management terminology of Whitman & Mattord (2008), all of
the approaches can be seen as a form of cheating avoidance, the last one perhaps
also being partially mitigative.
Similarly, Olt (2002) has identified four basic strategies for minimizing academic
dishonesty in online assessment. For the sake of clarity, I have assigned names to
these (in italics):
technical and operational means of perceiving and/or controlling the exami-
nation environment.
• Computer-adaptive testing and randomized testing. Instead of having the same
variant of the test for each examinee, the rest of the test varies based on how
one has answered the questions so far.
Following Cizek (1999), Rowe (2004) and Deubel (2003), there are a few more
cheating-fighting ideas, e.g.:
• Planning for unexpected matters, which can occur when using information
technology, or simply in examination operation in general. For instance, a student
computer may crash, or may be taken down intentionally. Similarly, students
may ask to use a bathroom or to have a drink or a snack, either innocently or in
an attempt to realize fraudulent intentions such as cheating.
• Entrapment, such as trying to plant fake tests in locations where curious people
searching for exam questions or answers are likely to find them. This is an analogy
to ‘honeypots’ in network security, as discussed in Whitman & Mattord (2007).
Applied to education, however, this method seems to lie beyond the border of
professional ethics.
In conclusion, there seem to be quite a number of different means to fight
cheating but, as seems generally valid, no silver bullets that simply ‘fix it all alone’.
According to what has been summarized, an educational institution needs to employ
a broad range of approaches and methods to be effective in this process. Omitting
one or more approaches, e.g. by focusing on detection, reaction and deterrence only,
while not cheat-proofing the environment and/or building a culture of integrity,
might not work very well, especially in the long run. Although this study primarily
aims at ‘the police approach’, this section was also meant to note that this approach
needs complementary support, since by itself it is too incomplete to be relied on
alone.
• Using physical resources to cheat. This can occur in the form of reading one's
own or others' crib, desk or hand notes, papers, books, pieces of clothing or tissues,
looking at other students' work, or using steganographic methods (e.g. ultraviolet
light) to extract notes or other data hidden accordingly.
• Using electronic resources to cheat. For example, using resources such as notes,
papers, e-books, web sites, old student work or old answer sheets from a computer
network, computer, telephone or other electronic medium that one is not allowed
to use.
• Impersonation, which means using someone else to take parts of, or the whole
of, an examination instead of the authentic person.
• Plagiarism, which means using parts of someone else's work without giving
adequate credit.
To sum up, a number of different cheating categories have been identified across
the existing literature. Some of the categories cover tens or perhaps hundreds of
specific cheating methods. Information about these, together with fairly advanced
cheating tactics, can be found in Cizek (1999, chap. 3). For the purpose of this
study, however, describing them in detail seems of marginal importance, since
new technologies are being invented, and cheaters keep modifying the existing
ways to cheat and finding new ones all the time.
Methods used to cheat on tests are like snowflakes: There is an infinite number
of possibilities. The possibilities are, however, related to the type of testing
being considered. (Cizek, 1999, p. 37)
Many forms of ‘exam-time’ cheating seem to have a common denominator:
obtaining information from disallowed sources (by reading, hearing, etc.) in order
to give correct answers without having learned the subject matter, or letting someone
else answer instead of the authentic person. The remaining types of cheating seem
to require a longer time, or conditions other than those of the exam, to set up, and
are hence of marginal interest for this study.
• Checking identity, i.e. that it is the authentic person who is being examined.
• Checking for forbidden tools such as crib notes, electronic devices, etc.
• Plagiarism detection systems and Internet searches, which try to detect collusion
between students, cut-and-paste plagiarism, and the usage of paper mills (databases
of old papers) by, e.g., searching in those databases and searching the Internet
for similar texts among everything freely accessible and indexed by search
engines (such as Google).
TermPaperMania”), inconsistently embedded links (URLs) and other forms of direct
and apparent plagiarism evidence (Harris, 2009).
Additionally, University of Alberta Libraries (2009) identifies a clue: if a submitted paper exceeds the student’s research or writing capabilities, has an anomalous tone (too professional, journalistic or scholarly), or simply somehow largely exceeds expectations of the student, it might signal plagiarism or some other form of cheating.
Within cheating detection based on personal vigilance, Dick et al. identify techniques such as careful scrutiny, eye inspection, hand analysis, observation, and pattern spotting. Three comparisons commonly made are
(1) across the students looking for similarities of submissions, (2) within an
individual assessment looking for changes in style or unusual ideas, (3) with
previous work by the same student looking for dramatic changes in quality.
(Dick et al., 2003, p. 181)
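As an illustration of comparison (1), similarity across submissions can be approximated by the set overlap of word n-grams. The Jaccard-based sketch below, including its similarity threshold, is an illustrative assumption of mine, not a method prescribed by Dick et al.:

```python
def shingles(text, k=3):
    """Split text into overlapping k-word shingles (case-insensitive)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard_similarity(a, b, k=3):
    """Jaccard similarity of the shingle sets of two submissions (0..1)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def flag_similar_pairs(submissions, threshold=0.5):
    """Return pairs of student ids whose submissions exceed the threshold."""
    ids = list(submissions)
    return [(ids[i], ids[j])
            for i in range(len(ids))
            for j in range(i + 1, len(ids))
            if jaccard_similarity(submissions[ids[i]], submissions[ids[j]]) >= threshold]
```

A pair flagged this way is only an indicator for human follow-up, in line with the caution about probabilistic evidence discussed below.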
As an important note, also relevant to this study, Cizek (1999) points out the difficulty and pitfalls of taking probabilistic evidence as sufficient to prove cheating. Although the class of statistical cheating detection methods seems to be the most promising regarding power and availability, the methods may function rather as an indicator and deterrent than as a tool providing strong evidence alone. Another fact is that Cizek focused on statistical methods of analyzing examination answers, which do not take any measures during the examination process itself; this underlies such assumptions as e.g. that the methods cannot detect the use of cheat sheets (crib notes), impersonation, electronic communication, etc. In contrast, this study hopes to show the opposite.
• Broadening the perceived goal context by e.g. making students understand why and how it is beneficial for them all to (1) learn the study matter properly, (2) not get caught cheating, because of its probable consequences, and (3) not contribute to the spreading of the cheating culture. This can also be a goal of an academic integrity program.
• Increasing the risk (penalty and probability) of being caught cheating, e.g. by strengthening the consequences of detected cheating and increasing cheating detection capabilities.
Cheating itself can occur in a number of forms. Thanks in part to the generally desired and deeply valued student inventiveness, the forms of cheating effectively change over time, which makes it both costly and inefficient to address the detection and prevention of narrow groups of cheating forms one by one. Moreover, doing so can leave the counter-cheaters at best a couple of steps behind the cheaters. Regarding cheating detection, there are efforts to develop more effective methods capable of detecting a broader and more general range of cheating forms, e.g. through applying automated statistical analysis to different measures of human behavior.
Last but not least, both the detection and the prevention of cheating face hindrances and limits of different kinds – ranging from misalignment between counter-cheating and administrative interests, through fear of reporting cheating, up to the political unsuitability of e.g. certain cheating detection methods.
Figure 2.4: Graphical overview of cheating and counter-cheating relations
Figure 2.5: Overview of a cheating and counter-cheating process
2.2 Specifics of distance operation
There is no doubt about the great accessibility advantages, and the freedom in the choice of study tempo, that the concept of distance work provides. On the other hand, upon some reflection, the distance mode of operation could affect at least the following aspects compared to the conventional one:
• The study/examination environment and the student’s perception of it. A difference between the on-site and distance study/examination environments seems apparent. On-site students can attend school sessions together with peers in an environment with a strongly academic feel, walk or travel to school, attend lectures seeing peers and lecturers, and often feel like part of a student community sharing similar goals together with others who are physically near. One can have lunch and talk to peers, study together and cooperate on assignments face to face, etc. Distance students attend school sessions from behind a computer screen, seeing and hearing peers and lecturers through a videoconferencing tool, reading course materials from a remote learning management system and rather seldom having a computer-mediated peer discussion (Paulsen, 2001), perhaps physically alone for most of the time. Independently of whether one is in some ways superior or inferior to the other, there are certainly many differences between how an on-site student and a distance student can perceive and feel about their studies. A similar difference seems to apply to the examination process. Sitting in a controlled room with adequate surveillance certainly feels different from sitting in one’s office or living room with a microphone and a web camera with a constant and limited angle of sight switched on.
environment and mode of operation (Rowe, 2004). In a conventional examination, an examiner can often see parts of the classroom from different angles and also hear what is happening. Although this could be possible within a distance examination as well, it could require rather special surveillance equipment for students, which comes at a cost to obtain and operate. Yet a different type of problem is the analytical capacity of such detection systems – does the system just record data (e.g. voice, video, keystrokes, etc.) and leave the actual detection among tens or hundreds of students up to a human, or can it operate automatically?
• Indirectly, the extent to which employers accept distance degrees. Public trust in and employer acceptance of distance degrees seem to be smaller compared to conventional degrees (Columbaro & Monaghan, 2009; Bourne et al., 2005; Allen & Seaman, 2003). Although it might be tricky to identify the reasons for this mistrust, some of them could presumably be related to different assumptions about the quality limits of distance education, cheating in distance assessment, or simply doubts about a nonstandard and unconventional way of studying.
The intention with these lines is not to mark one of the two environments as superior or inferior to the other. It is to signify that an environment may have practically beneficial advantages while, at the same time, having practical disadvantages, some of them in the form of threats.
A different and more friendly view of the concept of distance education is that it best suits adults in need of additional or continued education who cannot afford an interruption of their job (Paulsen & Rekkedal, 2001). Moreover, compulsory time-bound sessions have been shown to dramatically reduce applications from this type of student (ibid.).
Regarding statistics and comparisons of cheating between on-site and distance students, there are a couple of studies showing varied results (Stumber-McEwen et al., 2009; Herberling, 2002; Watson & Sottile, 2010). Some of them state that distance students cheat more; some state the opposite. Be that as it may, according to the results presented, distance students cheat just as their on-site counterparts do – and that seems to be a good reason to find ways of reducing the problem.
Chapter 3
Conceptual framework
Figure 3.1: A classification example
keyboard should well have different keystroke dynamics and/or text diction than the same student rewriting text from a book written by someone else with different language habits.
1. Supervised pattern recognition, which operates based on a priori known classification information. Such classifiers can either be designed with a model of the classification problem, or they can be trained with training feature vectors before they classify inputs.

2. Unsupervised pattern recognition, which is simply given input patterns; those are subsequently clustered into groups based on similarities within the set of input patterns.
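To make the distinction concrete, the two modes can be sketched in a few lines of Python. The nearest-centroid classifier and the naive k-means-style loop below are illustrative stand-ins chosen by me, not methods prescribed by the literature cited here:

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train_supervised(labeled):
    """Supervised: learn one centroid per a priori known class label."""
    classes = {}
    for label, vec in labeled:
        classes.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in classes.items()}

def classify(model, vec):
    """Assign the input to the class with the nearest centroid."""
    return min(model, key=lambda label: distance(model[label], vec))

def cluster_unsupervised(vectors, k=2, rounds=10):
    """Unsupervised: k-means-style grouping, no labels given in advance."""
    centers = vectors[:k]                      # naive initialization
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[min(range(k), key=lambda i: distance(centers[i], v))].append(v)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return [min(range(k), key=lambda i: distance(centers[i], v)) for v in vectors]
```

The supervised variant needs labeled training vectors before it can classify; the unsupervised variant only discovers groups, leaving their interpretation to the analyst.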
According to Huang (2006), Thomason (1990) and Jain et al. (2000), there are
five approaches to pattern recognition: (1) template matching (the simplest one),
(2) decision-theoretic (Jain et al., 2000), (3) syntactic-structural (Thomason, 1990),
(4) functional (Huang, 2006), and (5) neural network based.
• Collective anomaly – if a collection of related data instances is anomalous with
respect to the rest of the data in the whole set.
Regarding the techniques of anomaly detection, three modes have been identified (ibid.), partially resembling or inheriting from the classification of machine learning algorithms1. The applicability of those modes increases from the first down to the third one.
According to Chandola et al. (2009), techniques of anomaly detection vary based
on specific application, which is further related to a specific notion of anomaly and
a specific nature of input data. Those techniques can be (1) classification based,
(2) clustering based, (3) nearest neighbor based, (4) statistical, or (5) spectral. More
specifically, those techniques commonly include statistical profiling using histograms, artificial neural networks, support vector machines, rule-based systems, parametric and nonparametric statistical modeling, Bayesian networks, clustering-based techniques, nearest neighbor based techniques, information theoretic techniques, spectral analysis, regression, and mixture models (ibid.).
The output of an anomaly detection technique can either be a label denoting
whether a given data instance is normal or anomalous, or a score providing a finer
resolution of the same (ibid).
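As an illustration of the label-versus-score distinction, a minimal point-anomaly detector can emit both kinds of output. The z-score technique and the threshold of two standard deviations below are illustrative assumptions of mine, not choices taken from Chandola et al.:

```python
def anomaly_scores(data):
    """Finer output: score each value by its distance from the mean,
    measured in standard deviations (a simple z-score)."""
    n = len(data)
    mean = sum(data) / n
    std = (sum((x - mean) ** 2 for x in data) / n) ** 0.5 or 1.0
    return [abs(x - mean) / std for x in data]

def anomaly_labels(data, threshold=2.0):
    """Coarser output: a normal/anomalous label per data instance."""
    return ["anomalous" if s > threshold else "normal" for s in anomaly_scores(data)]
```

A scoring output lets the operator pick the cut-off afterwards, whereas a labeling output bakes the cut-off into the technique itself.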
Within a more specific application of anomaly detection, Stakhanova et al. (2010) describe a framework for intrusion detection using a fusion of specification-based and anomaly-based approaches.
3.4 Behaviometrics
Behavioral biometrics, or perhaps more precisely behaviometrics, refers to biometrics (or rather just metrics) using behavioral traits of subjects, such as handwriting, gait, voice characteristics, keystroke dynamics and mouse dynamics of humans, communication or control behavior of hard systems, and many others (Yampolskiy & Govindaraju, 2008).
1 Machine learning algorithm classification: (1) supervised learning, (2) semi-supervised learning, (3) unsupervised learning, (4) reinforcement learning (learning how to act given an observation), (5) transduction (learning to predict), and (6) learning to learn. According to Wikipedia: http://en.wikipedia.org/wiki/Machine_learning [Accessed: 2010-04-02]
This section will first describe common aspects of biometrics in general, then the
specifics of behaviometrics, and finally continue describing selected behaviometric
methods, which are of interest for this study.
• What a person has such as a key, file, magnetic card, integrated chip card, or
some other authentication token.
• What a person is, which in fact more precisely means what a person seems to
be based on physiological characteristics such as a fingerprint, iris or retinal
pattern, DNA etc.
• What a person produces including how a person produces it (or behaves) such as
voice, signature pattern, gait, keystroke dynamics, and other types of behavior.
The following are the properties desirable for a biometric method working with
a set of personal characteristics, inspired by Jain et al. (1999):
• Uniqueness meaning that no two different persons are equal in terms of the
characteristics.
• Collectability as the quantitative measurability of the characteristics, often
including its cost (not necessarily monetary).
• Acceptability as the extent to which people (including the public) are willing
to accept the use of the method.
Biometric methods and systems usually rely on three types of usage operation
according to Jain et al. (2004):
• Enrollment, which measures a subject for the first time, extracts features from
the measurement, creates a biometric profile containing the measurement-
based features and stores the profile in a database.
Although not found in the literature, biometric methods can also identify patterns within the subject measures, either dependently or independently of a biometric profile. An example of such a special application is automated stress measurement (Vizer et al., 2009).
Within the operations mentioned above, four main groups of errors can occur,
according to Peacock et al. (2004), Gamboa & Fred (2004) and Jain et al. (2004):
• Failure to capture (FTC), also called failure to acquire, when a system fails to take subject measures, e.g. an iris scanner fails to scan a person’s iris well enough.
• False rejection (FR), also called false non match (FNM) or false negative, which is a type 1 error. It is the case when an authentic subject gets rejected (evaluated as a non-authentic subject). In a security application, this does not directly pose a security risk; however, it can do so indirectly. Frequent false rejections are highly annoying, and under such conditions people tend to start ignoring the importance of the respective system alerts, or circumventing such systems.
• False acceptance (FA), also called false match (FM), impostor pass (IP) or false
positive, which is a type 2 error. It is the case when a non-authentic subject
gets accepted (evaluated as an authentic subject). In a security application,
this is what directly poses security risk (compared to type 1 error).
• False rejection rate (FRR), also called false non match rate (FNMR). It is the
statistical probability that a false rejection will occur in a recognition operation
of a biometric system.
• False acceptance rate (FAR), also called false match rate (FMR) or impostor pass rate (IPR). It is the statistical probability that a false acceptance will occur in a recognition operation of a biometric system.
• Equal error rate (EER), sometimes also called crossover rate. It is the error rate at the operating point where the false rejection rate and the false acceptance rate are equal to each other.
• Average error rate (AER), which is not used very commonly, combines FRR and FAR into one scalar value and can even serve as an approximation of the EER.
• Failure to acquire rate (FTA), describing the percentage of cases for which the
system lacks sufficient power or ability to classify a subject.
• Failure to enroll rate (FTR), describing the percentage of users lacking enough
quality in their input samples to enroll in the system.
• Cost to a user to enroll (CUE), which means the number of units to submit
to the system before enrolling as a valid user. The units can be keystrokes
or fingerprint scans or something else, based on the type of biometric system
used.
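These rates can be computed directly from matching scores. The sketch below assumes that a higher score means a better match and approximates the EER by sweeping candidate thresholds; the score values in the test are hypothetical, and real systems evaluate far larger samples:

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Error rates at a given threshold: accept when score >= threshold."""
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return far, frr

def approximate_eer(genuine_scores, impostor_scores):
    """Sweep observed scores as thresholds and return the operating point
    where FAR and FRR are closest, plus the average of the two there
    (an AER that approximates the EER, as noted above)."""
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    best = min(candidates,
               key=lambda t: abs(far_frr(genuine_scores, impostor_scores, t)[0]
                                 - far_frr(genuine_scores, impostor_scores, t)[1]))
    far, frr = far_frr(genuine_scores, impostor_scores, best)
    return best, (far + frr) / 2
```

Sweeping the threshold in this way traces out exactly the FAR/FRR trade-off curve that figure 3.2 depicts.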
Figure 3.2 describes the FRR and FAR parameters and their distributions graphically. In the left diagram, one can see the impostor and genuine subject distributions, and the matching score threshold the matching mechanism uses, on a two-dimensional scale of matching score and probability. Those parameters largely determine the error rates of the system (false rejection and false acceptance rate), the typical relation of
Figure 3.2: Biometric system error rates (inspired by Jain et al., 2004). In terms of Detection
Theory (Abdi, 2007), the impostors are noise, while genuines mean signal.
Figure 3.3: A typical architecture of a biometric system (inspired by Jain et al., 2004)
which (also called the receiver operating characteristic curve – ROC) is drawn in the right diagram. The point where the impostor and genuine subject distribution curves cross over each other signifies the equal error rate (EER).
Simplified, a typical biometric system design has at least the following compo-
nents (inspired by Jain et al., 2004):
• Sensor, which measures the subject.
• Matching module, which matches the input features with the profile in the
database (if any available).
• Decision module, which makes a decision whether or not to accept the sub-
ject (in authentication mode), or the identity of the subject or an error (in
identification mode).
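A hypothetical skeleton of such a design, reduced to enrollment, matching and decision in authentication mode, might look as follows; the similarity function and the threshold value are illustrative choices of mine, not taken from Jain et al.:

```python
class BiometricSystem:
    """Minimal sketch of the sensor -> matcher -> decision pipeline."""

    def __init__(self, threshold=0.8):
        self.database = {}            # subject id -> enrolled profile
        self.threshold = threshold

    def extract_features(self, raw_measurement):
        """Stand-in feature extractor; real ones are sensor-specific."""
        return list(raw_measurement)

    def enroll(self, subject_id, raw_measurement):
        """Enrollment: store a profile built from the first measurement."""
        self.database[subject_id] = self.extract_features(raw_measurement)

    def matching_score(self, features, profile):
        """Similarity in (0, 1]; 1 means identical to the stored profile."""
        dist = sum((a - b) ** 2 for a, b in zip(features, profile)) ** 0.5
        return 1.0 / (1.0 + dist)

    def authenticate(self, subject_id, raw_measurement):
        """Decision module in authentication mode: accept or reject a claim."""
        profile = self.database.get(subject_id)
        if profile is None:
            return False
        score = self.matching_score(self.extract_features(raw_measurement), profile)
        return score >= self.threshold
```

In identification mode, the decision module would instead compare the input against every stored profile and return the best-matching identity or an error.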
Figure 3.4: Fusion of biometric systems: (a) at capture, (b) at feature extraction, (c) at
matching, and (d) at decision. Inspired by Jain et al. (2004)
strong privacy concerns of the public. The concept of privacy is also discussed by
Peacock et al. (2004), Moskovitch et al. (2009) and Yampolskiy & Govindaraju
(2008). From a more technical and cryptography point of view, privacy and secrecy
of biometrics in biometric secrecy systems are discussed by Ignatenko & Willems
(2009).
From a biometrics-wide point of view, Doddington et al. (1998) formulated a
classification of four types of speakers analogized to animals by characteristics of
their recognizability:
• Sheep, who match well against themselves and poorly against others. They
make up most of the population.
Later, Yager & Dunstone (2010) also took a look at user classification with regard to how well biometrics performs for different users, or how well different users can perform on biometrics, and extended the previous classification in the following way (also described in figure 3.5):
• Chameleons, who rarely get false rejections, but are likely to cause false acceptances toward others.
• Doves, who match well against themselves and poorly against others. That makes them the ‘positively ideal’ users of biometrics.
• Worms, who match poorly against themselves, but well against others.
To connect the new classification to the old one, the dove ideal is equal to the sheep one, while both lambs and wolves just have high impostor ranks. It is important to note that impostor rank covers both the likelihood to impersonate and the likelihood to get impersonated. I perceive this concept as important to realize, since a recognition system is usually used to recognize all kinds of subjects having different recognition properties.
Figure 3.5: The biometric menagerie according to Yager & Dunstone (2010); the diagram plots the four user groups (worms, chameleons, phantoms, and doves) against genuine rank (∼ (1 - FRR)).
4. Motor skill based (focusing on muscle usage traits, which rely on the function of the brain, nervous system, skeleton, joints, etc.) such as inputs from keyboard, mouse, etc.
• Login-time recognition, which does its work at the beginning of a usage session,
or perhaps also as an isolated periodic, sporadic, or event-based activity later.
• Continuous recognition, which works all the time during a usage session based
on how one interacts with a system (e.g. a computer).
• Rule obedience, as the amount of socially less acceptable behavior (e.g. per time unit). Examples of such behavior might be examination cheating, abuse of language, or parking a car in an unsuitable spot.
The following are some of the key advantages usually more easily achieved by behaviometrics compared to physiological biometrics, according to Shanmugapriya & Padmavathi (2009), Wood et al. (2008), Jain et al. (2004) and Yampolskiy & Govindaraju (2007, 2008):
• Price for the system and its operation, codetermined by its dependence on
uncommon equipment (e.g. in context of daily computer usage).
From a different angle, some of the major drawbacks of using behaviometrics com-
pared to physiological biometrics follow:
• Time requirements, since behaviometrics incorporate timing and it takes a while before such a system can effectively recognize a subject.
Yampolskiy & Govindaraju (2008) identified five areas, which may benefit from
progress in the field of behaviometrics: (1) opponent modeling in game theory and
related fields (also applicable in the military), (2) user modeling for marketing and
customization or optimization purposes, (3) criminal profiling for investigation pur-
poses, (4) jury profiling for juridical predictions, and (5) plan recognition for under-
standing the goals of an intelligent agent.
within and across longer time spans of interaction. Tappert et al. (2009) present a behaviometric solution based on long-text input keystroke dynamics. As an important compromise to consider in the design of keystroke dynamics recognition systems, Gunetti & Picardi (2005) and Hempstalk (2008) found that, in their current state, those systems either require large quantities of typing before accepting or rejecting a subject, or they are susceptible to small fluctuations in the typing patterns.
Shanmugapriya & Padmavathi (2009) categorized the use of keystroke dynam-
ics in the following ways: (1) Static at login (the case of password hardening), (2)
periodic dynamic, (3) continuous dynamic, (4) keyword-specific, and (5) application-
specific. In the context of keystroke dynamics, the terms ‘static’ and ‘dynamic’ are sometimes replaced by the terms ‘fixed/structured text’ and ‘free text’, since some researchers believe that the former terms may be misleading (Gunetti & Picardi, 2005).
The behaviometric recognition is also realized in different ways across different studies and systems. According to Shanmugapriya & Padmavathi (2009), the most common approaches are either statistical or based on artificial neural networks (ANN). Other methods include hidden Markov models, Bayesian classifiers, Gaussian classifiers, Gaussian mixture modeling, rhythm-based algorithms, k-nearest neighbor algorithms (k-NN), distance-based algorithms (using the Euclidean, Hamming, Manhattan, Chebyshev, or some other distance measure), and support vector machines (SVM) (Giot et al., 2009; Jagadeesan & Hsiao, 2009; Hosseinzadeh & Krishnan, 2008). Toward the ‘more exotic sounding’ ones, Hempstalk (2008) names an application of a modified LZ78 compression algorithm used for input log prediction, and Karnan & Akila (2009) use genetic algorithms (GA) and particle swarm optimization (PSO) in order to gain better recognition accuracy.
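As a minimal illustration of the distance-based family mentioned above, the following sketch identifies a subject by the Manhattan distance between averaged key hold-time vectors. The timing values and the averaged-profile scheme are hypothetical simplifications of mine; published systems use richer features (e.g. digraph latencies) and normalization:

```python
def manhattan(a, b):
    """Manhattan (city block) distance between two timing vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def build_profile(samples):
    """Average several typing samples (e.g. key hold times in ms) into a profile."""
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def identify(profiles, sample):
    """Attribute a new sample to the enrolled subject with the closest profile."""
    return min(profiles, key=lambda subject: manhattan(profiles[subject], sample))
```

Turning this into an authenticator only requires comparing the winning distance against a threshold instead of returning the identity.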
From a development perspective, Hosseinzadeh & Krishnan (2008) proposed a protocol for the development of behaviometric technology, specifically keystroke dynamics, which tries to cover various problematic aspects encountered within previous work in the area. Those aspects include (1) feature design, (2) data collection, (3) error reporting, and (4) data acquisition, all seen as working in a cycle (1, 2, 3, 4).
As stated earlier and shown by the results of Wood et al. (2008), keystroke
dynamics change over time. The results show a progressive decline in both identifi-
cation and authentication using this behaviometric method during a period of four
weeks without updating the reference profile for the users.
Keystroke dynamics are influenced by factors such as stress (Vizer et al., 2009), alertness, fatigue, mood, illness, injury, time of day, activities simultaneous to writing, etc. (Gunetti & Picardi, 2005; Hempstalk, 2008). Moreover, apart from the less deterministic environmental effects, a simple change of keyboard can change the typing dynamics (Gunetti & Picardi, 2005; Villani et al., 2006).
Figure 3.6: An example process of mouse dynamics analysis (inspired by Ahmed & Traoré,
2007). This general model is also applicable to keystroke dynamics and basically any other
behaviometric method or technology. Compared to what is apprehensible from the typical
architecture of a biometric system shown in figure 3.3, this process incorporates usage session
identification (for the subsequent analysis steps to see a broader behavioral context) and
noise reduction (to cut information of lesser significance to the recognition process).
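The two additional steps named in the caption, session identification and noise reduction, can be sketched roughly as follows; the idle-gap heuristic and the moving-average filter are illustrative assumptions of mine, not the specific techniques of Ahmed & Traoré:

```python
def split_sessions(event_times, idle_gap=30.0):
    """Session identification: a pause longer than idle_gap (seconds)
    between consecutive input events starts a new usage session."""
    if not event_times:
        return []
    sessions, current = [], [event_times[0]]
    for t in event_times[1:]:
        if t - current[-1] > idle_gap:
            sessions.append(current)
            current = []
        current.append(t)
    sessions.append(current)
    return sessions

def smooth(values, window=3):
    """Noise reduction: simple moving average over a behavioral measurement
    (e.g. mouse speed samples), damping spurious spikes."""
    half = window // 2
    return [sum(values[max(0, i - half):i + half + 1])
            / len(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]
```

Segmenting by idle gaps gives the later analysis steps a broader behavioral context per session, and smoothing cuts information of lesser significance, mirroring the two boxes added in the figure.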
Finally, the application of keystroke dynamics can effectively improve immunity toward security threats stemming from e.g. (1) shoulder surfing, (2) spyware, (3) social engineering, (4) login guessing, (5) brute force password attacks, or (6) dictionary password attacks, according to Shanmugapriya & Padmavathi (2009). It is potentially usable against many kinds of keyboard-based computer usage impersonation and as a basis for an intrusion detection system (IDS) (Gunetti & Picardi, 2005). Keystroke dynamics and many other behaviometric methods can be used to minimize the risks of the attacks mentioned and, on top of that, of a more serious matter called identity theft (financial, criminal, business/commercial, or identity cloning) (Moskovitch et al., 2009; Jagadeesan & Hsiao, 2009).
According to Ahmed & Traoré (2007), mouse dynamics have mostly been used to aid graphical user interface (GUI) design. Most security-related research in mouse dynamics focuses on continuous authentication and identification, according to Bours & Fullu (2009), who prototyped a mouse dynamics login system. A similar experiment was carried out by Aksarı & Artuner (2009). Beyond the scope of GUI design and information security, Zavadskas et al. (2008) and Kaklauskas et al. (2009) used mouse dynamics for emotional state analysis, and Vizer et al. (2009) used them for stress measurement (both applications are discussed in a later section). Both of those applications are somewhat closer to psychological application.
Meta-function    Information type    Analysis type
Ideational       Topics              Topical analysis
                 Events              Event detection
                 Opinions            Sentiment analysis
                 Emotions            Affect analysis
Textual          Style               Authorship analysis; Deception detection; Power cues
                 Genres              Genre analysis
                 Vernaculars         Semantic networks
Interpersonal    Interaction         Social networks; Conversation streams

Table 3.2: Text analysis linguistic features categorized by Abbasi & Chen (2008).
Type             Feature
Quantity         # of words
                 # of verbs
                 # of modifiers (adjective or adverb)
                 # of function words (prepositions, articles, conjunctions)
                 # of sentences
Complexity       Average sentence length (# of words / # of sentences)
                 Average word length (# of chars / # of words)
                 Pausality (# of punctuation / # of sentences)
                 Passive verb ratio (# of passive verbs / # of verbs)
                 Modal verb ratio (# of modal verbs / # of verbs)
Non-immediacy    You reference ratio
                 Self reference ratio
                 Group reference ratio (# of 1st person plural pronouns / # of words)
                 Other reference ratio (# of 3rd person pronouns / # of words)
Expressiveness   Emotiveness (# of modifiers / (# of nouns + # of verbs))
Diversity        Lexical diversity (# of unique words / # of words)
                 Redundancy (# of function words / # of sentences)
                 Content word diversity (# of unique non-function words / # of non-function words)
Informality      Typo ratio (# of misspelled words / # of words)
Specificity      Affect ratio
                 Sensory ratio
                 Temporal immediate ratio
                 Temporal non-immediate ratio
                 Spatial close ratio
                 Spatial far ratio

Table 3.3: Linguistic features (inspired by Adkins et al. (2004) and Zhou et al. (2003))
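Several of the features in the table can be computed with straightforward text processing. The sketch below covers a handful of them; the regex-based word and sentence splitting is deliberately naive and only illustrative, not the tokenization used by the cited authors:

```python
import re

def linguistic_features(text):
    """Compute a few of the table's features with naive tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    punctuation = re.findall(r"[.,;:!?]", text)
    return {
        "word_count": len(words),
        # Complexity features from the table:
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "pausality": len(punctuation) / len(sentences),
        # Diversity feature from the table:
        "lexical_diversity": len({w.lower() for w in words}) / len(words),
    }
```

Features such as the passive verb ratio or emotiveness would additionally require part-of-speech tagging, which is why purely lexical measures like the ones above are the cheapest to extract.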
unconventional.
Stress measurement
Vizer et al. (2009) carried out an exploratory study in automated stress detection using keystroke and linguistic dynamics. Although the study is directed toward the aging population and the assessment of individuals’ cognitive status, some concepts and findings seem to be of broader applicability.
According to Vizer et al., a solution purely based on the analysis of keystroke
dynamics and linguistic features
(1) unobtrusively gathers data, (2) facilitates the process of gathering baseline
data, (3) allows data to be captured continuously over a length of time, (4)
leverages behaviors in which the individual is already engaged, (5) requires no
extra equipment, (6) can automatically adjust to the unique characteristics of
each individual, and therefore (7) allows for early detection of changes (Vizer
et al., 2009, p. 871).
Moreover, each of the emotions has at least two important attributes: (1) arousal (intensity) and (2) valence (‘direction’, e.g. in terms of being positive or negative) (Zimmermann et al., 2003; Picard, 1997).
The study of Zavadskas et al. (2008) focuses on analyzing the emotional state of computer users with regard to their work performance and productivity. A number of parameters were measured, including mouse pressure (on the buttons and the mouse itself) using force sensors, electrogalvanic skin conductance, palm skin temperature, behaviometric parameters related to mouse movement and clicks, amplitude of hand tremble, idle time, and the use of the scroll wheel.
Kaklauskas et al. (2009) used the same platform for analyzing the emotional state of students during the examination process, and Zimmermann et al. (2003) did an experiment measuring mood using keyboard and mouse dynamics.
Deception detection
The concept of deception detection is largely based on concepts of Interpersonal
Deception Theory (IDT) (Buller & Burgoon, 1996), Cue Leakage Theory (Ekman,
1985; DePaulo et al., 2003), Reality Monitoring (Johnson & Raye, 1981), McCor-
nack’s Information Manipulation Theory (IMT) in (Fuller et al., 2006), Media Rich-
ness Theory and Media Synchronicity Theory (Dennis & Valacich, 1999), and a few
more (Zhou, 2005; Zhou et al., 2004; Fuller et al., 2006). Although e.g. IDT holds that around 90% of deceit cues have a nonverbal character, such as facial, gaze, gesture and other expressions, and most research within the field of deception detection has been directed toward face-to-face (FtF) dynamics, there is also some research on detecting deceit using linguistic features in computer-mediated communication (CMC) (Adkins et al., 2004; Fuller et al., 2006; DePaulo et al., 2003; Zhou, 2005; Zhou et al., 2003, 2004; Lee et al., 2009).
Deceptive communication has long been a problem for military, govern-
ment, and business organizations. The Internet has provided another
way to communicate deceptively; a way that offers greater anonymity
and leaner media for disguising intent. (Adkins et al., 2004, p. 122)
In the context of CMC, deception detection is tightly bound to linguistic analysis as a tool for extracting various cues signaling deception. Since the CMC-specific deception detection concepts are seen as most relevant for this study, concepts specific to FtF or other areas of deception detection are omitted here.
DePaulo et al. (2003) compiled an extensive summary of text-based cues of deception in CMC. Moreover, Zhou (2005) mentioned nonverbal cues for automated CMC deception detection, such as voice-related and keyboard-related behavior, eye movement, facial expressions, body postures, etc. She hypothesized a number of relations between deceit and linguistics within instant messaging, together with listing a number of cues; however, many of them seem to be mostly related to (if not dependent on) interactive communication.
Figure 3.7: Deterrence mechanism of cheating detection linked to Ajzen’s (1991) theory of
planned behavior extended by Stone et al. (2009), and the model of student cheating decision
from Dick et al. (2003). For description of the models, see 2.1.2.
3.5.2 Behavioral characteristics as the cheating detection unifier
First of all, it seems important to specify what behavior means in this context. I see it as a set of actions performed by a system during a non-zero time interval. Following this definition, behavior is not only related to what we deliberately (and consciously) say, how we decide, etc. It is also how we say it, how we write what we write, what our word selection is, etc., part of which always has unconscious and habitual roots. Indeed, the field of behaviometrics (see 3.4) is based on this; were it not so, one would hardly be able to effectively authenticate people based on their behavioral traits, since it would be trivial for anyone to fake.
Following the concept of cue leakage, an activity is, among other things, reflected by perceivable behavioral cues. More specifically, a student cheating on an examination performs a set of activities signifying or being typical for a specific kind of cheating, and those activities get reflected in some of the behavioral cues the student leaves in different kinds of his/her behavior. Considering an online computer-based examination, a student writes his/her exam using at least a keyboard and/or mouse.
Comparing this approach to the approach of using examination proctors to detect when students read from crib notes or other unauthorized resources, or talk to each other, I see the following advantages: it is (1) more automatable, (2) operationally cheaper, and (3) more broadly applicable (both to detect the usage of a full range of cheating methods, and to detect them in an audiovisually unperceived environment). On the other hand, and at the same time, I see it as (1) less definite (i.e. if a proctor sees a student reading from a crib note, it is a very strong cheating indication, while if a student’s behavior merely shows a likelihood of cheating, the indication is much weaker, because there can be a number of other factors affecting it that the detection mechanism ignores), and (2) dependent on information technology.
1. Not only do people have their own habits and dynamics of motor behavior, much of it is also rooted in neuropsychology and, as such, unconsciously influenced (Stelmach & Requin, 1980; Kelso, 1982). These often differ slightly from individual to individual, and hence, what could contextually be considered anomalous behavior for one student might be normal for another, and vice versa.
2. There are many factors influencing behavior (Vizer et al., 2009; Hempstalk, 2008; Gunetti & Picardi, 2005), while most of them remain unknown to an analyst or a cheating detection system. Those unknown factors cause largely unavoidable error in the conclusions of a detection process. This is also a reason why relying on probabilistic cheating detection methods based on statistical analysis (and classification) alone is not perceived as sufficient to trigger actions with personal consequences (Cizek, 1999).
An approach to overcoming the first problem is to profile a student’s behavior for signs of both normal and suspicious behavior before a cheating analysis is performed. This is what is commonly used in behaviometrics, and biometrics in general, for authentication and identification purposes (Jain et al., 1999, 2004). In practice, the second problem seems largely beyond control to me. Perhaps the solution lies in the usage – not relying on such methods alone, and watching out for their indicatory outputs being misinterpreted as proofs by those who use them. To overcome the third problem, I see the following solutions: either (1) limiting the perceived relevance of cheating detection results to a specific range of problems/questions, or (2) extending the cheating detection so that it takes into account both relevant examination information and the specific context in which the examined student operates, in order to increase the overall relevance of the cheating indication.
Figure 3.8 outlines a model of a cheating detection method the study aims for.
While a student is writing an examination, his/her human-computer interaction
behavior is being recorded (measured). Either directly or after the examination,
it can be analyzed. The analysis consists of several steps as follows: (1) feature
extraction based on models of behavior on a molecular level, which also incorporates
noise reduction, (2) anomaly detection, which compares the actual inputs to the a
priori created and known profiles of the student, and (3) classification of the anomaly
trying to indicate whether and how the student is cheating.
The anomaly detection is semi-supervised, since it only learns from profiled normal behavior. The output of the anomaly detection process is the amount of behavioral anomaly relative to the profiled normal behavior, in the form of a multidimensional vector. The type of anomaly is contextual (conditional) according to the classification of Chandola et al. (2009).
The classification process classifies behavioral anomaly according to both built-in
generalized models of behavior and profiled suspicious behavior. Thus, the classifi-
cation is supervised.
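The division of labor described above – semi-supervised anomaly detection against a profiled normal, followed by supervised classification of the anomaly vector – can be sketched as follows. The feature names, the z-score anomaly measure, the nearest-pattern classifier, and the threshold are illustrative assumptions, not the implementation of the study.

```python
import math

def anomaly_vector(features, profile):
    """Semi-supervised anomaly detection: compare an observed feature
    vector to the student's profiled normal behavior (per-feature mean
    and standard deviation) and return a multidimensional anomaly
    vector of z-scores."""
    return {name: (value - profile[name][0]) / profile[name][1]
            for name, value in features.items()}

def classify(anomaly, suspicious_profiles):
    """Supervised classification: label the anomaly vector by its
    nearest profiled suspicious-behavior pattern (Euclidean distance),
    or 'normal' if no pattern is close enough."""
    def dist(pattern):
        return math.sqrt(sum((anomaly[k] - pattern[k]) ** 2 for k in anomaly))
    label, best = min(((name, dist(p)) for name, p in suspicious_profiles.items()),
                      key=lambda x: x[1])
    return label if best < 2.0 else "normal"  # 2.0: illustrative threshold

# Profiled normal behavior: feature -> (mean, std), e.g. from enrollment.
profile = {"key_rate": (5.0, 1.0), "silence_ratio": (0.2, 0.05)}
observed = {"key_rate": 2.8, "silence_ratio": 0.35}
a = anomaly_vector(observed, profile)
patterns = {"copying": {"key_rate": -2.0, "silence_ratio": 3.0}}
print(classify(a, patterns))  # prints "copying"
```

The example deliberately keeps the anomaly output as a vector rather than a single score, matching the multidimensional output described above.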
The precision of a method like the one described here would seemingly diverge over time unless at least the normal behavior profile was updated from time to time (Wood et al., 2008). For that reason too, the system should be able to run at least in the following modes: (1) enrollment of a student, as the process of profiling his/her behavior, (2) recognition of possible cheating based on both profiled behavior and generalized models, and (3) profile adjustment, which can run manually or automatically, e.g. after each recognition, based on segments of near-normal behavior.

Figure 3.8: Model of the cheating detection approach

Discussing the operational perspective in more detail is
beyond the scope of the thesis.
Finally, according to the classification of Yampolskiy & Govindaraju (2009), this approach, as a biometric, would fall into four out of the five categories identified: authorship-based (linguistic dynamics), direct human-computer behavior based (keystroke and mouse dynamics), motor-skill based (keystroke and mouse dynamics), and purely behavioral (linguistic dynamics).
Chapter 4
Methodology
This chapter describes the research process and the methodology to gather and
analyze data within this study.
Given the research goals (see 1.2), this study has a dominantly descriptive char-
acter, trying to characterize/describe a phenomenon (specific meanings in behavioral
dynamics in relation to a specific activity on which the behavior manifests). Ac-
cording to Leedy & Ormrod (2005), descriptive research involves identifying charac-
teristics of the observed phenomenon, or exploring possible correlations among two
or more phenomena, while the situation is examined as it is, without changing or
modifying the situation under investigation. Moreover, descriptive research is not
intended to determine cause-and-effect relationships (ibid).
A real-world phenomenon such as the specifics of a person’s human-computer interaction behavior, and their dependence on specific activities the person performs at the same time, is fairly complex, both within the boundaries of the phenomenon itself and in relation to the surrounding environment. Such behavioral specifics depend on a broad range of factors (situational, personal, technological, societal, etc.), and on top of that, the factors work together and are dynamically interrelated. A way to explore the relations within such a phenomenon is to simulate situations in which the phenomenon is expected to occur. Within such a simulation, however, practical problems arise: how to validly and reliably simulate such situations, gather data (observations, measurements, etc.), and analyze and interpret those in order to meet the research goals?
For the first issue (the simulation), there are two parameters that are to some degree mutually antagonistic: control over the situation, in terms of both influence and measurability/perceivability, and ecological validity, in terms of the genuineness of the situation, its resemblance to reality, or simply its non-artificiality (Clark-Carter, 2009). The problem here is to choose a research method and design which maximizes control and minimizes compromises to ecological validity.
The subsequent issues (the data, analysis, etc.), will be discussed and covered
gradually in this chapter.
Those two approaches are not categorically distinct, since the process of qualitative research involves quantitative methods, and quantitative research always involves some interpretation by the researcher, which has a deeply qualitative character. Since the major research concerns of this study are related to human behavior and its relation to cognition, and hence psychology, the research methods and approaches are mostly discussed from the perspectives of this field. According to Clark-Carter (2009), the quantitative approach is generally related to experimenting, measuring, asking questions, observing, and statistically analyzing. Even if the inputs are textual, they usually need to be assigned numerical values before a statistical analysis. The qualitative approach mostly differs in the analysis process, since the input data are often collected as text, and as such they are also analyzed.
Compared to the quantitative approach, the qualitative one is generally related to exploring, describing and interpreting the experiences of participants (Smith, 2008). On the philosophical plane, the quantitative approach is influenced by positivism, which among other things assumes that a subject can always be objectively described by a system of measurable variables and their deterministic interactions. This applies especially to behaviorism, which adopted a radically positivist view. Cognitivism, as the major replacement of the behaviorist trend, also contains some underlying positivism, according to Ashworth (2008). The qualitative approach leans somewhat more towards humanism as opposed to naturalism, and constructivist, interpretivist, and critical theorist views are more common within it. Constructivism, as the epistemological opposite of positivism, is in short based on the assumption that knowledge is constructed within a mind instead of being observed from reality. Interpretivism further extends this with the assumption that all knowledge is a matter of interpretation as a form of construction (Ashworth, 2008). Critical theory, which also builds on interpretation, is defined as “the examination and critique of society and culture, drawing from knowledge across the social sciences and humanities”1 . Critical theory is based on values and holds that knowledge is “generated through ideological critiques of power, privilege and oppression”2 , as rooted in feminist and advocacy research.
According to the character of the research problem, I have chosen the quantitative approach as the dominant one, yet not the only one. In a measurement- and determinism-based contextual validation of concepts (models) which are products of interpretation and introspection, I accept positivism in a context-aware cognitivist approach in the lowest, quantitative layer of the study, seeing animal and human behavior as a co-product of cognition and mental state. Within the more abstract, qualitative layer, I use the constructivist viewpoint, holding that our knowledge, as a result of individual mental construction on top of individual perception and cognition, is individually possessed. Looking deeper into epistemology for thoughts on the justification of knowledge: knowledge can seem valid in certain contexts and invalid in others, while the resolution of this problem might lie in concepts either not taken into account or not eliminated within a specific line of reasoning. Having rejected the positivist notion of ultimate reality, I see the value of this study’s findings through
1 According to Wikipedia: http://en.wikipedia.org/wiki/Critical_theory [Accessed 2010-04-01]
2 According to anonymous presentation slides: http://www.docstoc.com/docs/8558617/Research-Philosophy/ [Accessed 2010-04-05]
a coherentist viewpoint (Kuukkanen, 2007). In this view, the findings present a tiny drop in the sea of concepts, which, linked to other findings, support and/or oppose some of those, and get supported and/or opposed by them. In the long run, the findings might either help us converge to a more powerful model of reality, or be rejected/corrected should they prove erroneous or otherwise invalid. I feel, however, no ability to judge the external validity in an absolute sense.
Clark-Carter (2009) mentions modeling, artificial intelligence, experiment, inter-
view, questionnaire, observation, content analysis, meta-analysis, and case study as
quantitative methods of psychological research. Among qualitative methods, there are at least phenomenology, interpretative phenomenological analysis, grounded theory study, narrative study, conversation analysis, discourse analysis, focus group study, and cooperative inquiry, all described in Smith (2008).
To achieve the research goals of this study, I have chosen observational study as the dominant research method, since the primary concern is rather covert human behavior and its causal relations to cognition. Covert behavior is behavior which cannot be observed directly, such as physiological responses, as characterized by Clark-Carter (2009). As a classification, Clark-Carter recognizes three types of behavior: (1) overt non-verbal, (2) verbal, and (3) covert. The subject to be observed and the concepts to be described are related to distinguishing characteristics in the dynamics of human-computer interaction behavior (criterion variables) in relation to specific tasks or activities performed simultaneously under specific conditions (predictor variables), where the tasks primarily include writing, reading, listening, and different types of cognition. Because of perceived difficulties in controlling extraneous influences, more than a single observation is used. Three systematic continuous real-time observations are complemented by other means of data collection, such as a questionnaire, which is largely a subject of qualitative interpretation. Using several different methods focusing on the same area of research is referred to as triangulation (ibid), which is also used in this study.
can allow the results of a study to be generalised to other people – whether
they are representative of the group from whom they come, and whether they
are representative of a wider range of people. (Clark-Carter, 2009, p. 40)
As quoted, the concerns are mostly task, setting and time with regard to differing conditions; aspects of the participants; and generalizability to other groups. Clark-Carter also mentions two main ways to improve the external validity of a research design: replication and sampling (the selection of participants).
or synonymically as
dependability, stability, consistency, reproducibility, predictability, and lack of
distortion (Kerlinger & Lee, 2000, p. 642).
Do measure or characterize what the authors claim, and that the inter-
pretations do follow from them. The structure of a piece of research
determines the conclusions that can be drawn from it and, most impor-
tantly, the conclusions that should not be drawn from it. (Sapsford &
Jupp, 1996, p. 1)
If an item is unreliable, then it must also lack validity, but a reliable item is
not necessarily also valid. It could produce the same or similar responses on all
occasions, but not be measuring what it is supposed to measure. (Bell, 2005,
p. 117-118)
The concept of validity can be further divided into several types: (1) face validity, as the perception of validity that the people being measured and the people administering the measures have of the measures; (2) construct validity, as the extent to which some theoretical construct is assessed well; (3) content validity, as the degree to which a measure covers the full range of behavior related to what is being measured; and (4) criterion validity, as the extent to which a measure fulfills certain criteria – mostly in terms of concurrency and predictability (Clark-Carter, 2009; Kerlinger & Lee, 2000).
Figure 4.1: Research process overview
Based on the keyboard input events, and primarily on plain text, it is also possible to extract typed linguistic features, which also fall under the category of automatedly gathered data. The data are recorded using standard computer hardware (keyboard and mouse) and custom software, which records the input events from the hardware together with timestamps, with a nominal precision of tens of milliseconds.
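A recording of the kind described – hardware input events with millisecond timestamps – can be represented minimally as follows. The field names and structure are hypothetical, since the recorder's internals are not specified at this level of detail.

```python
import time
from dataclasses import dataclass, field

@dataclass
class InputEvent:
    """One keyboard or mouse event with a millisecond timestamp."""
    t_ms: int    # timestamp; nominal precision: tens of milliseconds
    device: str  # "keyboard" or "mouse"
    action: str  # e.g. "down", "up", "move"
    detail: str  # key code or "x,y" coordinates

@dataclass
class Recording:
    events: list = field(default_factory=list)

    def record(self, device, action, detail):
        """Append a live event stamped with the current wall-clock time."""
        self.events.append(InputEvent(int(time.time() * 1000), device, action, detail))

    def latencies_ms(self):
        """Inter-event latencies, from which dynamics features derive."""
        ts = [e.t_ms for e in self.events]
        return [b - a for a, b in zip(ts, ts[1:])]

# A short synthetic recording (timestamps in ms):
r = Recording([InputEvent(0, "keyboard", "down", "H"),
               InputEvent(80, "keyboard", "up", "H"),
               InputEvent(210, "keyboard", "down", "i")])
print(r.latencies_ms())  # [80, 130]
```

Keeping raw events with timestamps, rather than derived features, preserves the trivial reconstructibility of the input event flow mentioned in 4.4.1.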
The manually gathered data are not completely specified. At least the following are focused on: (1) observer feelings about the environment, (2) observer notes about the weather, (3) observer notes about the lighting conditions, (4) observer notes about the room temperature, and (5) observer notes about any significant events or anomalies during the observations.
The participants are anonymous in terms of omitting the association of the gath-
ered inputs with the personally identifiable data of the participants, such as name,
nickname, or personal number.
For the manually gathered data, only manual remarks are taken (in the ‘pen and paper’ fashion). To reliably gather the data describing molecular behavior, however, automated recording able to record the data as specified above was chosen. The reasons for the choice were perceived needs for (1) relatively high time accuracy, implying the need for relatively high time resolution, (2) reliable continuous gathering of data without losses in the form of leave-outs, (3) minimal obtrusiveness during the gathering process, (4) high efficiency and automation during the gathering process, (5) efficient storage and transfer, and (6) efficient and trivial reconstructibility of the input event flow. With regard to those needs, a custom-software-based input event recording method proved the most suitable of the recording methods realistic for the study.
4.4.2 Observations
The observations are the only process of obtaining empirical inputs for the study. Despite initially more ambitious plans, I have chosen three single-participant observations instead of one or more multi-participant ones, mainly because of practical limits being faced. This was done at the cost of a reduced chance of locating possible effects of external factors on participant behavior. Each of the observations happens in a different place and at a different time, while the tasks to perform are of the same types and themselves nearly the same.
Figure 4.2: The observation design used in the study
Design
• a complete observer, since the observer only observes the participants, and the observer’s own behavior does not enter into the observations,
• ecological, since the context and setting in which the behavior occurs are of interest, and meanings together with intentions also play a role, and finally
The design of the observation is outlined in figure 4.2. Continuous real-time sampling is seen as the most suitable for the observation, because it enables recording most of the behavior while being both technologically inexpensive and unobtrusive with regard to the data of interest and the recording method used (see 4.4.1).
Each participant within the group has to be observed during each level of the predictor variable, as described in table 4.1. The levels of the predictor variable are simply instructions as to what the participants should perform – delivered in either an automated or a manual manner. With respect to the subject phenomenon of the study, the effects of specific predictor variable levels on the observed behavior are assumed to be contemporaneous. Significant carry-over effects are not expected and therefore no artificial delays are introduced between changes of the predictor variable level. Although order effects are not expected either, there is a countermeasure against them in the form of a varied order of predictor variable levels for each participant. In addition, there is a separate ‘copying’ template (a different text to copy) for each predictor variable level that involves copying within a single observation (those differ from one observation to another). Later within each observation, two levels are repeated in order to observe the behavior with increased familiarity with the text being copied.
Level    Level name                                  Level description
PV:AW    Authentic writing                           Writing a text and drawing a diagram as formulated or constructed by oneself (not reading or hearing it)
PV:VCC   Verbatim copying from computer screen       Rewriting a text and redrawing a diagram 1:1 (without changes) – from the computer screen
PV:VCP   Verbatim copying from paper                 Rewriting a text 1:1 (without changes) – from a physical paper
PV:VCL   Verbatim copying by listening               Listening to a text and rewriting it using a computer (without deliberate reformulation)
PV:RCC   Reformulative copying from computer screen  Rewriting a text with own reformulation – from the computer screen
Process
The process of each of the observations is outlined in figure 4.3 and will be carried out in the following sequence:
2. Letting the participant install the required data-gathering software on his or her computer and thus set up the observational environment.
3. Starting the observation process by starting the automated data collection and
recording.
• PV:AW
• PV:VCC
• PV:VCP
6. Reading a text to all participants, which they have to rewrite using their computers (corresponding to PV:VCL).
7. Letting the participant finish the observation by performing the task under predictor variable level PV:RCC.
Figure 4.3: The observation process (including questionnaire)
Sampling
For the participant selection, a variant of nonprobability sampling between purposive and convenience sampling (Leedy & Ormrod, 2005, p. 206) has been chosen. In purposive sampling, people are chosen for a specific purpose – in this case, being believed to belong to the target group. Convenience sampling takes people as they are readily available (ibid).
The sample consists of three purposively selected participants. Although the whole target population consists of millions of people, with around 3.9 million in the United States alone in 2007 (Allen & Seaman, 2008), the sample size is small because of limiting practical research conditions. As argued by Leedy & Ormrod (2005), a sample size of 400 people would be adequate for a descriptive study. Unfortunately, I perceive even a number close to this as beyond the research possibilities of this study.
Since the behavioral patterns dependent on the tasks one performs simultaneously with interacting with a computer while being examined are expected to be largely general within the study’s target group, there are no special requirements regarding sample variety or size beyond what is mentioned above.
Environment
The observational environment is a room of a flat or a shared corridor, such as a living room. Because of the limited control over the student examination environment in the conditions of the intended (‘live’) use, no special care is taken regarding the room selection except for (1) silence in the room, (2) comfortable lighting conditions and (3) a comfortable temperature.
4.4.3 Questionnaire
Within the observations, each participant has been asked to fill in an electronic questionnaire while already being observed. The questionnaire is further described in appendix C.
Figure 4.4: Data flow and control relations of the data gathering and analysis processes
4.4.4 Analysis
As mentioned earlier, the analysis has two stages – the statistical and the triangulative one. In the former, quantitative data (keystroke and mouse events) taken within the observations are translated into composite constructs having subjectively more directly applicable and meaningful parameters (see appendix A), in order to describe the behavior and its dynamics. In the latter, the outputs of the former stage are compared and analyzed together with the interpretations of qualitative empirical inputs, in order to identify possible relations between actions, subject-related factors and behavior.
Figure 4.4 provides an overview of the data flow and control relations of both the data gathering and analysis processes. First, the data are gathered from the participants, their behavior and the environment. Subsequently, the major part of the data is handled in the automated process branch (in figure 4.4). Before analysis results are produced, the automatedly gathered data are statistically analyzed and visualized, and activity-behavior relationships are identified and/or verified using a triangulative analysis. Both the manual data gathering and the triangulative analysis are carried out manually.
Triangulation in the context of this work is used to identify and/or verify action-behavior relationships. Inputs to the triangulative analysis have both a quantitative and a qualitative character (as described by figure 4.4). The following are the triangulation input categories:
• Keystroke, mouse and linguistic dynamics together with single keyboard and
mouse interaction events of the participant behavior
• Context of the participant tasks including timing
Visualization
Different parameters and their changes over time are visualized by custom software written in the Java (J2SE) environment.
Statistical analysis
Similarly to visualization, the statistical analysis is done using custom software. The complexity of the statistical measures is fairly low, since they only include the mean and standard deviation of behavioral features.
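The per-feature mean and standard deviation described above amount to a few lines; the feature names here are placeholders standing in for the measures of appendix A.

```python
from statistics import mean, stdev

def feature_stats(samples):
    """Per-feature mean and standard deviation over a series of samples.
    Each sample maps feature name -> measured value; feature names are
    illustrative placeholders."""
    features = samples[0].keys()
    return {f: (mean(s[f] for s in samples), stdev(s[f] for s in samples))
            for f in features}

# Two hypothetical sample windows of behavioral features:
windows = [{"key_rate": 4.0, "silence_ratio": 0.1},
           {"key_rate": 6.0, "silence_ratio": 0.3}]
print(feature_stats(windows))
```

Note that `statistics.stdev` computes the sample (n-1) standard deviation; whether sample or population deviation was used in the study's software is not specified.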
Chapter 5
In this chapter, the analysis process, as well as the three observations which were the source of empirical data for the study, are briefly described. The observation descriptions include brief facts about the observation process and conditions, and a brief qualitative representation of the quantitative parameters extracted from the observations.
5.1 Analysis
This section describes the process of analysis, which can be divided into three layers or parts, according to both time sequence and level of abstraction. The first, quantitative molecular level, focuses on the automated extraction of properties of the recorded behavior, such as single key latency and the other ones described by the measures listed in appendix A, and on the selection of time-interval-based sampling parameters for an appropriate visualization. The second, qualitative molecular level, focuses on the transformation of the numeric and plotted (graphically visualized) quantitative measures into a qualitative description, one by one. Finally, the third, qualitative molar level, focuses on identifying possible relations between the manual observations and the results of the previous two levels/parts of the analysis, not only within the analysis of a single observation session, but also across the observation sessions.
The ultimate goal of the analysis was to identify seemingly general or individual behavioral cues appearing in the participants’ behavior when performing specific tasks while being observed. The validity of the statement that a behavior tends to leave specific cues usable for its identification (Ekman, 1985), which can be the subject of computer-based analysis (Zhou et al., 2003, 2004; Lee et al., 2009), was taken for granted within this study.
Abbasi & Chen (2008), Zhou et al. (2003), and Adkins et al. (2004). The analysis of
mouse dynamics used was somewhat inspired by Ahmed & Traoré (2007) in terms of mouse operation units (mouse move, drag & drop, and point & click), as well as their angular measurement. To a large extent, though, it was designed within the study. The keystroke dynamics analysis contained largely custom measurement features designed within the study, besides well-known ones such as the timings of single key and digraph uses (see, e.g., Gunetti & Picardi, 2005).
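A digraph-timing computation of the kind referenced can be sketched as follows, using one common definition of digraph duration (the interval between the key-downs of the digraph's two keys); this is an illustrative sketch, not the study's own feature code.

```python
def digraph_durations(keydown_events):
    """Compute digraph durations from a list of (timestamp_ms, key)
    key-down events. A digraph's duration is taken here as the time
    between the key-down of its first and second key (one common
    definition; see Gunetti & Picardi, 2005)."""
    out = {}
    for (t1, k1), (t2, k2) in zip(keydown_events, keydown_events[1:]):
        out.setdefault(k1 + k2, []).append(t2 - t1)
    return out

# Key-down events for typing "th", "e", then "th" again:
events = [(0, "t"), (95, "h"), (180, "e"), (400, "t"), (510, "h")]
print(digraph_durations(events))
# {'th': [95, 110], 'he': [85], 'et': [220]}
```

Collecting every occurrence per digraph (rather than a single value) is what makes per-digraph means and standard deviations possible later in the analysis.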
When putting the parameters together over time, the sample duration and overlay (the sampling parameters) were chosen for a visually optimal resolution of the time-based changes of the measures. Sample duration means the time interval from the first event within the sample until the last one. Sample overlay means the amount of mutual overlap between two consecutive samples. The following sampling settings were chosen based on the readability of the plots of the parameters measured:

• Keystroke, mouse and silence dynamics analysis. Sample duration: 4000 milliseconds; sample overlay: 0.5.
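The sampling scheme – fixed-duration windows with a fractional overlay – can be sketched as follows, using the settings quoted above (4000 ms duration, 0.5 overlay); the event representation as (timestamp, key) tuples is an assumption for illustration.

```python
def samples(events, duration_ms=4000, overlay=0.5):
    """Split timestamped events into fixed-duration windows; consecutive
    windows overlap by the given fraction (0.5 -> each window starts
    halfway through the previous one). Requires overlay < 1."""
    if not events:
        return []
    step = duration_ms * (1 - overlay)
    t0, t_end = events[0][0], events[-1][0]
    out, start = [], t0
    while start <= t_end:
        out.append([e for e in events if start <= e[0] < start + duration_ms])
        start += step
    return out

evs = [(0, "a"), (1500, "b"), (3000, "c"), (5000, "d")]
for w in samples(evs):
    print([k for _, k in w])
# ['a', 'b', 'c']
# ['c', 'd']
# ['d']
```

An overlay of 0.5 means each event is counted in two windows, which smooths plotted feature curves at the cost of correlated adjacent samples.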
for the occurrence of possible temporary deviations or other specific phenomena, linking those to their possible causes.
5.2 Observations
Within the analysis process, there are around a hundred different parameters of human-computer interaction dynamics (see appendix A), falling into four major groups – keystroke dynamics, mouse dynamics, silence dynamics, and linguistic dynamics. The first three groups are intertwined and mutually time-bound (within the analysis). Because of the huge amount of data representing those different parameters, this chapter only describes a few of them, and does so in a qualitative way.
According to the analysis results, part of the computer interaction dynamics (some of its features) had a similar or the same tendency in all observations. Another part of the features was distinct within one session, as the tasks were performed by a single participant. In the rest of the features, much difference was not visually identified, which might either mean that the difference simply did not exist, or that it was too small for the observer to discern from a visual reading of the graphs.
Since one of the potential threats to examination is impersonation, it seems relevant to note that each of the participants showed visibly different typing dynamics features (e.g. key flight, break consistency, key downtime, key rate, mouse speed and acceleration at different angles and in general, mouse speed center, silence ratio, linguistic writing diction, etc.). This observation allows for the conclusion that the people examined would be identifiable, and impersonation detectable, to some extent at least (see Jain et al., 2004, on biometrics in general).
Among session-specific highlights, dynamics feature designations are used, which
are characterized in appendix A.
5.3 Observation 1
The first observation took place in the evening at the participant’s home. The light in the room was getting shady, the temperature comfortable, although the participant did not sit in a comfortable position, and according to the participant, the keyboard was positioned slightly too high. Because of prior technical difficulties resulting in observation data loss, this observation was repeated with a computer and keyboard different from the participant’s own, all of which might have affected the results. The environment was silent and there were only a few distractions, such as one phone call and the participant reading a message on the phone.
connecting and naming them.
- copying: higher peaks
MMSm, MMSsd (speed mean and standard deviation)
- copying: slightly higher peaks
MMACsd (acceleration standard deviation)
- copying: higher peaks
MMCm (curvature mean)
- making: higher positive peaks
- copying: higher negative peaks
* PLAIN MOVES *
MMLsd (move length standard deviation)
- copying: slightly higher peaks
MMDm, MMDsd (move duration mean and standard deviation)
- making: slightly higher, slightly higher peaks
* DRAG MOVES *
MMSm, MMSsd (speed mean and standard deviation)
- making: slightly higher base, lower peaks
- copying: slightly lower base, higher peaks
MMSCsd (speed center standard deviation)
- making: higher peaks
MMlmSPsd (last max speed position standard deviation)
- making: higher peaks
MMACsd (acceleration standard deviation)
- copying: more regular peaks that are slightly higher
MMCm (curvature mean)
- copying: slightly more negative
* CLICKS *
MCCCsd (click count standard deviation)
- copying: higher peaks
MCCRm (click rate mean)
- copying: higher peaks
MCCRsd (click rate standard deviation)
- copying: higher peaks
* DRAGS *
- not apparent
* SILENCE *
S# (silence count):
- copying: slightly less varied
- hearing: slightly more varied than copying
- reformulating: more varied than hearing - about as
much as formulating
SLm, SLsd (latency mean and standard deviation)
- copying, reformulating: slightly higher and higher
peaks, more dense
SR (silence ratio)
- copying: lower
- hearing: higher peaks, but not as high as formulating
* LINGUISTICS *
LWPSm, LWPSsd (words per sentence mean and standard deviation)
- formulating: slightly higher
LWLm, LWLsd (word length mean and standard deviation)
- copying, reformulating: higher
- formulating, hearing: lower
LAr (article ratio)
- copying: slightly higher
LQR (quantifier ratio)
- formulating: higher
LCWR (capital words ratio)
- formulating: higher
5.4 Observation 2
The second observation also took place in the evening, in a small flat, however not belonging to the participant. The room lighting was artificial and, according to the participant, sufficiently bright. The temperature was slightly above the comfortable level, and the participant expressed slightly negative emotions when realizing the effort required to complete parts of the observation, and felt slightly bored in the beginning. This observation is not complete, since (1) the participant cognitively simplified one task (making up and drawing a diagram), and (2) the last question (rewriting text with reformulation) was omitted due to a misunderstanding.
than formulating on average
- hearing: closer to copying than formulating
WDLATm (word delimiter latency mean)
- copying: higher than formulating
- hearing: much like formulating
WDLATsd (word delimiter latency standard deviation)
- copying: higher, also peaks higher
- hearing: much like formulating
NWLATm (next word latency mean)
- copying: slightly higher than formulating
- hearing: between formulating and copying, somewhat
more like copying
NWLATsd (next word latency standard deviation)
- copying: more dense than formulating (fewer 0-values);
otherwise difficult to say (maybe slightly higher also)
- hearing: much like copying
* SINGLE KEYS *
SKDTm (single key downtime mean)
- formulating: peaky, less uniform than copying
- copying: visibly more uniform and with smaller peaks
SKDTsd (single key downtime standard deviation)
- formulating: medium density, some peaks
- copying: high density
- hearing: low density
SKRm (key rate mean)
- formulating: slightly lower
- copying: slightly higher
- hearing: between
SKRsd (single key rate standard deviation)
- just density (formulating, hearing ~= copying)
* DIGRAPHS *
DDm (digraph duration mean)
- copying, hearing: slightly more uniform
- about the same values
DDsd (digraph duration standard deviation)
- copying, hearing: slightly more uniform
(this all might be the density issue)
DKRm (digraph key rate mean)
- all much like DDm (digraph duration mean)
DKRsd (digraph key rate standard deviation)
- formulating, copying: quite much zero
- hearing: higher than the rest
- copying and hearing more dense than formulating
DKFLm (digraph key flight mean)
- copying more dense than formulating
- copying has peaks more often than both
formulating and hearing
DKFLsd (digraph key flight standard deviation)
- much like DKRsd (digraph key rate standard deviation)
W# (word count), K# (single key count), D# (digraph count)
- formulating: more dispersed
- copying: more fluent
- hearing: between formulating and copying
* CLICK MOVES *
MMDIm (move distance mean)
- making: lower peaks; arc-like distributed
- copying: higher peaks, higher in general;
arc^(-1)-like distributed
MMDIsd (move distance standard deviation)
- much like MMDIm (move distance mean)
MMAm (move angle mean)
- making: varies more uniformly
- copying: more dispersed
MMAsd (move angle standard deviation)
- making: more arc-like
MMLm (move length mean)
- making: more arc-like
MMLsd (move length standard deviation)
- making: more arc-like
MMDm (move duration mean)
- making: less varied (less zero-values), quite
uniform distribution
- copying: more varied (from 0 to peaks, which are
a bit higher than with making), quite
uniformly distributed, too
MMDsd (move duration standard deviation)
- similar to MMDm
MMmSm (move max speed mean)
- making: higher, less varied, arc-like
- copying: shorter in the middle, more varied,
especially in the ends, arc^(-1)-like
MMmSsd (move max speed standard deviation)
- highly varied; more difficult to see differences
except the arc-like distribution with making
MMSm, MMSsd (move speed mean and standard deviation)
- similar to MMmSm and MMmSsd
MMSCm, MMSCsd (speed center mean and standard deviation)
- no visible differences
MMACm, MMACsd (acceleration mean and standard deviation)
- making: much less varied (uniform) than copying; arc-like
- copying: varying from 0-values to peaks
* PLAIN MOVES *
- much like click moves, but a little less apparent
* DRAG MOVES *
MMLm (move length mean)
- making: higher, more peaky
MMLsd (move length standard deviation)
- making: higher
MMmSm, MMmSsd (max speed mean and standard deviation)
- much like MMLm, MMLsd
MMSm, MMSsd (speed mean and standard deviation)
- much like MMLm, MMLsd
MMACm, MMACsd (acceleration mean and standard deviation)
- making: higher
MMaCm (acceleration center mean)
- making: lower
MMaCsd (acceleration center standard deviation)
- making: a little more dense
* CLICKS *
MC# (click count)
- copying: more uniformly distributed
* DRAGS *
MDDm (drag duration mean)
- making: slightly shorter
MDTTm (tailing time mean)
- making: shorter, varying in both negative and
positive directions from zero
- copying: longer, all negative
* SILENCE *
S# (silence count)
- more varied (jumping to 0)
SDsd (duration standard deviation)
- slightly more dense
SR (silence ratio)
- making: slightly more (boosted sampling time to 40
seconds compared to other measures)
* LINGUISTICS *
LPR (paragraph ratio)
- making: slightly more paragraphs
LWLm, LWLsd (word length mean and standard deviation)
- making: slightly shorter
LDiv (lexical diversity)
- making: slightly lower
LAR (article ratio)
- making: slightly lower
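The linguistic measures in the listing above (LWLm/LWLsd, LDiv, LAR) are simple lexical statistics and can be approximated from raw text. The sketch below uses assumed definitions (type/token ratio for lexical diversity, a fixed article set); the thesis' exact formulas may differ.

```python
import re
from statistics import mean, pstdev

ARTICLES = {"a", "an", "the"}  # assumed article set

def linguistic_features(text):
    """Rough analogues of LWLm/LWLsd (word length mean and std. dev.),
    LDiv (lexical diversity as a type/token ratio) and LAR (article
    ratio)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    lengths = [len(w) for w in words]
    return {
        "LWLm": mean(lengths),
        "LWLsd": pstdev(lengths),
        "LDiv": len(set(words)) / len(words),
        "LAR": sum(w in ARTICLES for w in words) / len(words),
    }

feats = linguistic_features("The quick brown fox jumps over the lazy dog.")
```

Because these features depend only on the produced text, not on its timing, they complement the keystroke and mouse dynamics measures above.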
5.5 Observation 3
The third observation was conducted remotely. The light was natural and
bright, the room temperature was perceived as slightly high, and the
participant had been under a significant physical load around two hours prior
to the observation. The participant was connected from a distant location,
which prevented any direct manual observation. As a partial substitute, the
observer asked questions in a telephone call. The copying-by-listening part of
this observation is not available.
WKRm, WKRsd (word key rate mean and standard deviation)
- formulating: higher peaks
- copying: lower peaks, less varied
- reformulating: much like formulating
WDLATm, WDLATsd (word delimiter latency mean and std. dev.)
- formulating: more varied, less dense
- copying: opposite, does not jump much to 0-values
NWLATm, NWLATsd (next word latency mean and std. dev.)
- much like WDLATm, WDLATsd
* SINGLE KEYS *
SKRm, SKRsd (single key rate mean and standard deviation)
- formulating: more 0-jumpy, slightly less dense
* CLICK MOVES *
MMDIm (move distance mean)
- making: slightly more peaky; more arc-like distribution
- copying: rather arc^(-1)-like distribution
MMAm (move angle mean)
- making: more arc-like; less varied
- copying: more varied across all angles
MMLm (move length mean)
- making: increasing; higher than with copying in the end
MMmSm, MMmSsd (move max speed mean and standard deviation)
- making: arc-like
MMlmSPm, MMlmSPsd (last max speed position mean and std. dev.)
- making: lower, but with slightly higher peaks
* PLAIN MOVES *
- in general much less apparent changes compared to click move
- arc-like distribution difference between making and copying lost
MMAsd (move angle standard deviation)
- making: slightly higher
MMSm, MMSsd (move speed mean and standard deviation)
- making: slightly lower
MMACm, MMACsd (acceleration mean and standard deviation)
- making: slightly lower (as Sm, Ssd)
* DRAG MOVES *
MMDIsd (move distance standard deviation)
- making: more varying
MMaCm, MMaCsd (absolute curvature mean and standard deviation)
- making: higher, higher peaks
* CLICKS *
MC# (click count)
- making: slightly less
* DRAGS *
MDDm, MDDsd (drag duration mean and standard deviation)
- making: lower
MDMLm, MDMLsd (drag move latency mean and standard deviation)
- making: lower
* SILENCE *
S# (silence count)
- copying: slightly higher
SDm, SDsd (silence duration mean and standard deviation)
- copying: significantly shorter
SLATm, SLATsd (silence latency mean and standard deviation)
- copying: very slightly higher
SR (silence ratio)
- copying: significantly smaller
* LINGUISTICS *
LPNR (punctuation ratio)
- copying: slightly smaller
Chapter 6
This chapter qualitatively presents the main findings of the study. They are
based on the observations and aim to answer the study's research questions.
than merely indicating that there is an anomaly from authentic writing for a specific
person.
Logically, better resolution (more independent parameters available about the
behavior) gives better possibilities for valid detection, classification, and
thus indication of an anomaly. The analysis results have shown how difficult
and time-consuming it is to personally inspect tens or hundreds of different
interaction dynamics parameters, which casts a shadow over this approach. The
easier way seems to lead through automation. The process can be automated by
means of anomaly detection (Chandola et al., 2009) and pattern recognition
(Theodoridis & Koutroumbas, 2006), both of which offer a range of approaches
of varying suitability.
Since prototyping the whole cheating detection process in an automated way was
beyond the scope of the study, automated anomaly detection and classification
did not take place. Instead, these parts were performed manually by reading
and processing input event maps and histograms of the measured parameters,
which was the point at which quantitative descriptive data were converted
into qualitative ones.
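As an illustration only, not the implementation used in the study, the manual histogram reading described above could be automated with even a very simple anomaly detection rule: flagging analysis windows whose feature value deviates strongly from a personal profile. All numbers and names below are invented for the sketch.

```python
from statistics import mean, pstdev

def flag_anomalies(profile_samples, session_samples, k=3.0):
    """Flag session windows whose feature value lies more than k standard
    deviations from the person's profile, a bare-bones stand-in for the
    anomaly-detection techniques surveyed by Chandola et al. (2009)."""
    mu, sigma = mean(profile_samples), pstdev(profile_samples)
    if sigma == 0:
        return [False] * len(session_samples)
    return [abs(x - mu) / sigma > k for x in session_samples]

# Invented per-window digraph-duration means (seconds) for one person:
profile = [0.21, 0.19, 0.22, 0.20, 0.21, 0.18, 0.20, 0.22, 0.19, 0.21]
session = [0.20, 0.21, 0.08, 0.22]   # third window is unusually fast
flags = flag_anomalies(profile, session)  # flags only the third window
```

A real system would combine many such features and use richer detectors than a single z-score threshold, but the sketch shows how the manual inspection step could, in principle, be mechanized.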
Chapter 7
This chapter presents a brief conclusive summary of the study, followed by a
discussion of aspects related to behaviometrics and technology, cue leakage,
and the cheating-related background. It also discusses future research
directions perceived as beneficial to the topic.
7.1 Conclusion
The study found and described keyboard- and mouse-related behavioral
tendencies under different modes of writing, imposed by tasks that involved
activities alongside the writing itself. It thereby answered the first
research question (see section 1.2). The second research question is answered
by the provided description of the significance of the behavioral measure
differences (see section 5.2).
This work starts from the very fundamentals by rather broadly summarizing the
phenomenon of cheating, including its consequences, forms, causes, and the
means of perceiving and combating it. From this point the work continues
toward concepts of automated behavioral analysis, further to its core – the
research method, observations and findings – and ends with conclusions and
discussion linking the core, conceptually and pragmatically, to its
background. Compared to the research goals, the scope might arguably seem too
broad, although I believe it is important and valuable to connect the core of
the study to as much of its background as possible, to make relevant
connections easier to realize and to inspire thoughts leading to further
validation or exploration of the field.
The results of the study contain descriptions of behavioral changes imposed by
performing a sequence of specifically assigned tasks. The descriptions are
highly qualitative in character, with a resolution seemingly too low to be
directly applicable to developing an effective automated cheating detection
solution. Instead, as the results of an approach prototype, their greater
potential may be to inspire and encourage further research in the area.
7.2 Cheating detection and prevention approach discussion
This section discusses the chosen approach and its relation to both behaviometrics
and the phenomenon of examination cheating.
1. Impersonation, which would enable a person to write an exam for someone
else without this being detected by the analysis.
Property Remarks
Universality The behavior of anyone who uses a keyboard and mouse to write
a computer-based examination can be measured and analyzed. I
have not identified any principal-level exceptions to this. Exceptions
can occur on the technological level, where they may stem from
incompatibility between a specific technological solution and the
platform (operating system environment) a person (student) uses.
Uniqueness In case of person authentication (impersonation detection), this
parameter is inherited from behavioral biometrics in general. In case
of detecting the authentic person's cheating, each specific behavior
appeared to have its own characteristics, distinct from the person's
other behaviors.
Permanence The invariance of specific behavioral characteristics for people
has not been a concern within the study. Supposedly behavior changes
slightly over time; a countermeasure against deviation from the
profile is to update it regularly with examination analyses.
Collectability Collectability requires technology-driven recording of input
events, plus the time needed to write an examination session or a
task (part of a session). Besides, running a technological solution
for automated cheating detection requires some time for administering
the records stored in the system's database, etc.
Performance Implemented in Java 6 runtime environment without specific atten-
tion to computation speed optimizations, quantitative analysis of a
1-hour examination takes around 1-3 seconds on a 2.5 GHz Intel Core
2 Duo CPU (T9300), dependent on sampling settings. Operating
memory requirements ranged from 0.5 GB to 2 GB, dependent
on sampling settings. This indicates that the quantitative part of the
analysis can be performed on a common workstation, taking almost
negligible time.
Acceptability Acceptability has not been a concern within this study. Supposedly
there might be privacy concerns with respect to (1) gained ability
for future identification of the person based on computer interaction
style, as well as (2) potential ability to extract personality features
from computer interaction style. Besides, people might feel uneasy
being aware that their computer inputs are being recorded during an
examination session.
Circumvention Circumvention possibilities have not been a concern within the study.
Table 7.1: Biometric properties of the approach (with regards to Jain et al. (1999))
Figure 7.1: The cheating prevention approach
A potentially helpful part of cheating detection, stress detection, was not taken
into account for this study, because of its perceived difficulty, as understood from
Vizer et al. (2009).
The analysis of linguistic dynamics used in this study was limited to analysis
of lexical units, without reaching further to syntax and semantic relations of the
language (English in the case of this study). Presumably, the resolution
provided by an in-depth linguistic analysis, based on e.g. the theory of
systemic functional linguistics (Eggins, 2004; Fawcett, 2008), would
outperform the one used, although the difficulty of performing such an
analysis did not allow for its application in this study.
figure 3.7, including the extension of Stone et al. (2009). In my opinion,
the studied approach has high potential for the way of cheating prevention
described above, although validating this statement is left to another study.
Unfortunately, one problem related to combating cheating in general remains
untouched by this approach. There are conditions under which faculty can
become liable for student harm, including malicious false accusation, use of
the names of individuals not involved in a given cheating case, or violation
of a student's right to due process by ignoring the institution's procedures
for resolving accusations of academic dishonesty – as mentioned by Whitley &
Keith-Spiegel (2002) in Wehman (2009). This is also a motivational problem
addressing the perceived behavioral control of the theory of planned behavior
(this time related to combating cheating). It is further strengthened by the
misclassification problems relevant to the studied approach: triggering cases
in full reliance on this approach alone might get a faculty into trouble in
certain conceivable situations.
In the end, I believe that the most valuable effect of the approach is the
preemptive one, while the actual strength to prove already committed cheating
is less critical, given that most students would be ashamed and would strive
to avoid even being detected, let alone being punished within an official
cheating case. This approach might well be no exception to what Dick et al.
(2003) stated: “an ounce of prevention is worth a pound of cure” (p. 182).
Handling cheating detection without automation seems obtrusive, since one
needs to search for cheating, which imposes a certain level of suspicion
toward the people examined. Suspicion as an emotion has a negative character,
and spending much time with it might affect the forming of personality on the
individual level as well as the forming of culture on the social level.
Besides, it also seems difficult to judge cheating equally across
examinations from one student to another (Cizek, 1999). Human perception has
limits that vary with different factors, to which technology such as
computers seems immune. Therefore, although the strength of qualitative
analysis tends to be greater in humans, the strength of quantitative
analysis, as well as the stability and consistency of routine tasks, tends to
be higher in machines. In my own experience, leaving quantitatively difficult
routine tasks to humans often leads to low and unstable performance compared
to machines. Applied to cheating detection or pattern recognition, this is
consistent with e.g. the threshold theory of psychophysics (McNicol, 2004,
chap. 7), which states that a stimulus needs to be significant enough in
order to be taken into account within our perception. For humans, this
threshold seems to depend on a range of factors and changes over time.
Therefore, I see a strong need for automating the somewhat trivial, yet
computationally demanding routine tasks, to achieve both better effectiveness
and efficiency.
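The threshold-theory point above has a standard quantitative counterpart in signal detection theory (cf. McNicol, 2004): a detector's ability to separate cheating from authentic behavior can be summarized by the sensitivity index d′. The sketch below uses invented hit and false-alarm rates purely for illustration.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate): how well
    a detector separates 'cheating' from 'authentic' distributions."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Invented rates: the detector flags 84 % of cheating windows but also
# 16 % of authentic ones.
sensitivity = d_prime(0.84, 0.16)  # roughly 2.0
```

A machine detector evaluated this way has a fixed, reproducible d′, whereas a human inspector's effective threshold drifts with fatigue and context, which is exactly the argument for automation made above.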
7.3 Research approach discussion
In this study, a rather limited set of inputs was used – those one could
capture through the keyboard and mouse of a computer. I believe that adding
more, such as voice/speech or video recognition through commonly available
hardware such as microphones and web cameras, would improve the indication
effectiveness, not to mention less commonly accessible measurements such as
the electrocardiogram or electroencephalogram, which would increase the
detection resolution even more. In spite of this, delimiting the study to
keyboard and mouse inputs is perceived as reasonable because of the current
common availability of this equipment compared to the rest of the equipment
mentioned.
The study involved quite extensive development and use of information
technology, because reliable machine-based recording of the user inputs,
together with some automated quantitative analysis support, seemed to be by
far the best approach to performing the study. At the time of writing, the
author is not aware of any more effective and practically applicable
approach.
As a short reflection, the validity of this study's research design and
findings is heavily limited by the population size, the purposiveness of the
sampling, and the time-requirement optimizations of the observation process.
If I were to carry out the study again, I would primarily attempt to lessen
those limits. Table 7.2 briefly discusses four properties of measure validity
as categorized by Clark-Carter (2009) and Kerlinger & Lee (2000).
In the end, it must be admitted that the applicability of this study's
findings to combating cheating is rather indirect. There seems to be much
exploration still to be done, followed by appropriate application of
technology and more, in order to develop a well-functioning behavioral
anomaly indication approach or mechanism. There are no silver bullets, and
this seems especially true in the field of behavioral analysis. Hopefully,
this small set of findings is another drop in the sea, furthering progress
toward the goal mentioned above.
Type of validity Remarks
Face validity No doubts about the validity of the measures were noted
during the observations. Personally, I believe additional
measures can only help – those in which no patterns were
identified were simply ignored for the session instance,
while the others were taken into account. There were
measures that appeared relevant and also measures in which
I did not identify any patterns across the different tasks
performed by the participants. In the context of the
analysis, however, excluding a parameter could limit the
analytic capabilities for other samples, in which the
parameter might distinguish the behavior while performing
some tasks. There was no list of cues to look for – the
cues are encoded in combinations of behavioral dynamics
measures and time, which is why every measure with some
independence from the measures already taken into account
is potentially useful.
Construct validity Since no theory discussing specific behavioral measures was
used in the study or known to me, I’m not able to judge
the construct validity of the measures used.
Content validity The completeness of the measures toward the measured
phenomenon was certainly limited. Although a strong effort
was made to maximize the number of different and
independent measures in order to capture most of the
phenomenon, it was practically impossible to measure the
phenomenon fully.
Criterion validity Concurrency, or how much the measures show the same
results as other measures of the same phenomenon taken at
the same time, is a parameter I cannot judge, since no
alternative measures were taken for validation. I hope and
believe that the concurrency of the measures is high,
close to 100 %. Predictability, or how much the measures
reflect past or future states, is limited by a broad range
of factors and by maturation, a continuous change or drift
of a person's behavioral dynamics. Therefore, it is surely
below 100 %, although hopefully still high enough, since
the successful use of today's behaviometric technology for
authentication and identification implies a rather low
amount of such change (Wood et al., 2008).
Table 7.2: Discussion of measure validity with regards to Clark-Carter (2009) and Kerlinger
& Lee (2000)
human-computer interaction behavior could be studied, which seems to be a
broad and demanding area requiring an experimental approach. Studying those
phenomena comes close to measuring the emotional state itself (Zavadskas et
al., 2008; Kaklauskas et al., 2009). Secondly, (5) possibilities of
intentional circumvention of the behavioral measures could be studied, with
regard to masquerading driven by e.g. impersonation of a person, or sabotage
of the approach's effectiveness. Thirdly, (6) properties of behavior under
typical examination conditions could be studied and described more
thoroughly, e.g. in terms of the general properties of behavior identified by
Yampolskiy & Govindaraju (2008): speed, correctness, redundancy, consistency
and rule obedience. Finally, on the organizational level, (7) the
organizational limitations, appetite for, and scope of applicability of the
behavioral cheating detection approach could be studied. Those might include
requirements for maintenance, privacy aspects, regulations, psychological
hygiene issues, etc. Since the use of the automated cheating detection
approach and technological solutions rests on the humans in the organization,
organizational aspects deserve attention to assure the effectiveness of such
use.
Bibliography
Abbasi, A., Chen, H., 2008. CyberGate: A Design Framework And System For
Text Analysis of Computer-Mediated Communication. MIS Quarterly, 32(4), pp.
811-837.
Abdi, H., 2007. Signal Detection Theory (SDT). In Salkind, N., ed., 2007. Encyclo-
pedia of Measurement and Statistics. Thousand Oaks, Canada: Sage.
Adkins, M., Twitchell, D.P., Burgoon, J.K., Nunamaker, J.F., 2004. Advances in
Automated Deception Detection in Text-Based Computer-Mediated Communica-
tion. Proceedings of SPIE, Bellingham: SPIE, Vol. 5423, pp. 122-129.
Ahmed, A.A.E., Traoré, I., 2007. A New Biometric Technology Based on Mouse
Dynamics. IEEE Transactions on Dependable and Secure Computing, 4(3), pp.
165-179.
Ajufor, N., Amalraj, A., Diaz, R., Islam, M., Lampe, M., 2008. Refinement of a
Mouse Movement Biometric System. In Proceedings of Student-Faculty Research
Day, CSIS, Pace University. New York City, USA, 2 May 2008.
Ajzen, I., 1991. The Theory of Planned Behavior. Organizational Behavior and Hu-
man Decision Processes, 50(2), pp. 179-211.
Allen, E.I., Seaman, J., 2003. Seizing the Opportunity: The Quality and Extent of
Online Education in the United States, 2002 and 2003. Needham: Sloan Consor-
tium.
Allen, E.I., Seaman, J., 2005. Growing By Degrees: Online Education in the United
States, 2005. Needham: Sloan Consortium.
Allen, E.I., Seaman, J., 2007. Online Nation: Five years of growth in online learning.
Needham: Sloan Consortium.
Allen, E.I., Seaman, J., 2008. Staying the Course: Online Education in the United
States, 2008. Needham: Sloan Consortium.
Anderson, R.J., 2008. Security Engineering: A Guide to Building Dependable Dis-
tributed Systems. 2nd edition. Indianapolis: Wiley Publishing, Inc.
Anolli, L., Balconi, M., Ciceri, R., 2001. Deceptive Miscommunication Theory
(DeMiT): A New Model for the Analysis of Deceptive Communication. In Anolli,
L., Ciceri, R., Riva, G., eds., 2001. Say not to Say: New perspectives on miscom-
munication, IOS Press.
Argamon, S., Whitelaw, C., Chase, P., Hota, S.R., Garg, N., Levitan, S., 2007. Stylis-
tic Text Classification using Functional Lexical Features. Journal of the American
Society for Information Science and Technology, 58(6), pp. 802-822.
Åström, K.J., Murray, R.M., 2008. Feedback Systems: An Introduction for Scientists
and Engineers. Princeton: Princeton University Press.
Bandura, A., 2002. Selective Moral Disengagement in the Exercise of Moral Agency.
Journal of Moral Education, 31(2), pp. 101-119.
Bates, A.W.T., 1995. Creating the Future: Developing a vision in open and distance
learning. In F. Lockwood, ed. 1995. Open and Distance Learning Today. London:
Routledge. Ch. 5.
Bates, A., 2005. Technology, E-learning and Distance Education. 2nd ed. London:
Routledge
Bell, J., 2005. Doing Your Research Project: A Guide for First-Time Researchers in
Education, Health and Social Science, 4th ed. Berkshire: McGraw-Hill Education,
2005.
Bourne, J., Harris, D., Mayadas, F., 2005. Online Engineering Education: Learning
Anywhere, Anytime. Journal for Asynchronous Learning Networks, 9(1), pp. 15-
41.
Bours, P., Fullu, C.J., 2009. A Login System Using Mouse Dynamics. In Proceed-
ings of the Fifth International Conference on Intelligent Information Hiding and
Multimedia Signal Processing, Kyoto, Japan, 12-14 September 2009.
Carlsmith, K.M., Darley, J.M., Robinson, P.H., 2002. Why do we punish? Deter-
rence and just deserts as motives for punishment. Journal of personality and social
psychology, 83(2) pp. 284-299.
Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly Detection: A Survey. ACM
Computing Surveys, 41(3), article 15.
Cizek, G.J. 1999. Cheating on Tests: How to Do It, Detect It and, Prevent It.
Mahwah: Lawrence Erlbaum Associates Inc.
Cizek, G.J., 2003. Detecting and Preventing Classroom Cheating: Promoting In-
tegrity in Assessment. Thousand Oaks: Dorwin Press, Inc.
Covington, M.V., 2000. Goal Theory, Motivation and School Achievement: An In-
tegrative Review. Annual Review of Psychology, 51(1), pp. 170-200.
Crossley, S.A., Louwerse, M.M., McCarthy, P.M., McNamara, D.S., 2007. A Linguis-
tic Analysis of Simplified and Authentic Texts. The Modern Language Journal,
91(1), pp. 15-30.
Dennis, A.R., Valacich, J.S., 1999. Rethinking Media Richness: Towards a Theory of
Media Synchronicity. In Proceedings of the 32nd Hawaii International Conference
on System Sciences, Maui, Hawaii, 5-8 January 1999.
DePaulo, B.M., Malone, B.E., Lindsay, J.J., Muhlenbruck, L., Charlton, K., Cooper,
H., 2003. Cues to Deception. Psychological Bulletin, 129(1), pp. 74-118.
Deubel, P., 2003. Learning from Reflections - Issues in Building Quality Online
Courses. Online Journal of Distance Learning Administration, [Online]. 6 (3),
Available at: http://www.westga.edu/~distance/ojdla/browsearticles.php
[Accessed 2010-03-02].
Dick, M., Sheard, J., Bareiss, C., Carter, J., Joyce, D., Harding, T., Laxer, C. 2003.
Addressing student cheating: Definitions and solutions. ACM SIGCSE Bulletin,
35(2), pp. 172-184.
Diekhoff, G.M., LaBeff, E.E., Clark, R.E., Williams, L.E., Francis, B., Haines, V.J.,
1996. College Cheating: Ten Years Later. Research in Higher Education 37(4),
pp. 487-503.
Doddington, G., Liggett, W., Martin, A., Przybocki, M., Reynolds, D., 1998.
SHEEP, GOATS, LAMBS and WOLVES: A Statistical Analysis of Speaker Per-
formance in the NIST 1998 Speaker Recognition Evaluation. Proceedings of In-
ternational Conference on Spoken Language Processing, 1998.
Eccles, J.S., Wigfield, A., 2002. Motivational Beliefs, Values and Goals. Annual
Review of Psychology, 53(1), pp. 109-132.
Ekman, P., 1985. Telling Lies: Cues to Deceit in the Marketplace, Politics, and
Marriage. New York: W. W. Norton & Company Inc.
Faucher, D., Caves, S., 2009. Academic Dishonesty: Innovative cheating techniques
and the detection and prevention of them. Teaching and Learning in Nursing,
4(2), pp. 37-41.
Fawcett, R.P., 2008. Invitation to Systemic Functional Linguistics through the Cardiff
Grammar, 3rd ed. London: Equinox Publishing Ltd.
Fuller, C., Burgoon, J.K., Twitchell, D.P., Biros, D.P., Adkins, M., 2006. An Analy-
sis of Text-Based Deception Detection Tools. In Proceedings of the Twelfth
Americas Conference on Information Systems, Acapulco, Mexico, 4-6 August 2006.
Furnell, S., Evangelatos, K., 2007. Public awareness and perceptions of biometrics.
Computer Fraud and Security, 2007(1), pp. 8-13.
Gamboa, H., Fred, A., 2004. A Behavioural Biometric System Based on Human
Computer Interaction. In Proceedings of SPIE, 2004.
Giot, R., El-Abed, M., Rosenberger, C., 2009. Keystroke Dynamics with Low Con-
straints SVM Based Passphrase Enrollment. In Proceedings of IEEE Third Inter-
national Conference on Biometrics: Theory, Applications and Systems, Washing-
ton, USA, 28-30 September 2009.
Graesser, A.C., McNamara, D.S., Louwerse, M.M., Cai, Z., 2004. Coh-Metrix: Anal-
ysis of text on cohesion and language. Behavior Research Methods, Instruments
& Computers, 36(2), pp. 193-202.
Gunetti, D., Picardi, C., 2005. Keystroke Analysis of Free Text. ACM Transactions
on Information and System Security, 8(3), pp. 312-347.
Harris, R., 2009. Anti-Plagiarism Strategies for Research Papers. [Online] Available
at: http://www.virtualsalt.com/antiplag.htm [Accessed: 2010-03-11].
Hawkridge, D., 1995. The Big Bang Theory in Distance Education. In F. Lockwood,
ed. 1995. Open and Distance Learning Today. London: Routledge. Ch. 1.
Hempstalk, K., 2008. You are what you type? In Proceedings of New Zealand
Computer Science Research Student Conference, Christchurch, New Zealand, 14-
18 April 2008.
Herberling, M., 2002. Maintaining Academic Integrity in On-Line Education. Online
Journal of Distance Learning Administration, [Online]. 5 (1), Available at: http:
//www.westga.edu/~distance/ojdla/browsearticles.php [Accessed 2010-02-
28].
Heyneman, S.P., 2002. Education and Corruption. Annual Meeting of the Associa-
tion for the Study of Higher Education, 20 November 2002, Sacramento, Califor-
nia.
Hinman, L.M., 1997. Cultivating Integrity to Combat Plagiarism. [Online]. San
Diego: San Diego Union-Tribune. Available at: http://ethics.sandiego.edu/
lmh/op-ed/combat-plagiarism/index.asp [Accessed: 2010-03-09].
Holmberg, B., 1995. Theory and practice of distance education. 2nd ed. London:
Routledge
Hosseinzadeh, D., Krishnan, S., 2009. Gaussian Mixture Modeling of Keystroke
Patterns for Biometric Applications. IEEE Transactions on Systems, Man, and
Cybernetics – Part C: Applications and Reviews, 38(6), pp. 816-826.
Howell, S.L., Williams, P.B., Lindsey, N.K., 2003. Thirty-two Trends Affecting Dis-
tance Education: An Informed Foundation for Strategic Planning. Online Jour-
nal of Distance Learning Administration, [Online]. 6 (3), Available at: http:
//www.westga.edu/~distance/ojdla/browsearticles.php [Accessed 2010-02-
28].
Howell, S.L., Sorensen, D., Tippets, H.R., 2009. The New (and Old) News about
Cheating for Distance Educators. Online Journal of Distance Learning Admin-
istration, [Online]. 12 (3), Available at: http://www.westga.edu/~distance/
ojdla/browsearticles.php [Accessed 2010-03-02].
Huang, J.K., 2006. A Functional Approach to Pattern Recognition Theory. In Proceedings of IEEE International Conference on Granular Computing, 10-12 May 2006, IEEE Computer Society, pp. 700-703.
Ignatenko, T., Willems, M.J., 2009. Biometric Systems: Privacy and Secrecy As-
pects. IEEE Transactions on Information Forensics and Security, 4(4), pp. 956-
973.
Ilonen, J., 2003. Keystroke dynamics. [Lecture paper] Lappeenranta: Lappeenranta
University of Technology.
Irele, M.E., 2005. Can Distance Education be Mainstreamed? Online Journal of
Distance Learning Administration, [Online]. 8 (2), Available at: http://www.
westga.edu/~distance/ojdla/browsearticles.php [Accessed 2010-02-26].
Iyer, R., Eastman, J.K., 2008. The Impact of Unethical Reasoning on Academic
Dishonesty: Exploring the Moderating Effect of Social Desirability. Marketing
Education Review, 18(2), pp. 21-33.
Jagadeesan, H., Hsiao, M.S., 2009. A Novel Approach to Design of User Re-
Authentication Systems. In Proceedings of 3rd IEEE International Conference on
Biometrics: Theory, Applications and Systems, Washington, USA, 28-30 Septem-
ber 2009.
Jain, A.K., Bolle, R., Pankanti, S., 1999. Introduction to Biometrics. In Jain, A.K.,
Bolle, R., Pankanti, S., eds., 1999. Biometrics: Personal Identification in Net-
worked Society. Norwell: Kluwer Academic Publishers.
Jain, A.K., Duin R.P.W., Mao, J., 2000. Statistical Pattern Recognition: A Review.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), pp. 4-37.
Jain, A.K., Ross, A., Prabhakar, S., 2004. An Introduction to Biometric Recognition.
IEEE Transactions on Circuits and Systems for Video Technology, Special Issue
on Image- and Video-Based Biometrics, 14(1), pp. ?-?.
Johnson, M.K., Raye, C.L., 1981. Reality Monitoring. Psychological Review, 88(1), pp. 67-85.
Kaklauskas, A., Krutinis, M., Seniut, M., 2009. Biometric Mouse Intelligent System
for Student’s Emotional and Examination Process Analysis. In Proceedings of
Ninth IEEE International Conference on Advanced Learning Technologies, Riga,
Latvia, 15-17 July 2009.
Karnan, M., Akila, M., 2009. Identity Authentication based on Keystroke Dynamics
using Genetic Algorithm and Particle Swarm Optimization. In Proceedings of 2nd
IEEE International Conference on Computer Science and Information Technology, Beijing, China, 8-11 August 2009.
Keegan, D., 1996. Foundations of distance education. 3rd ed. London: Routledge.
Kelso, J.A.S., ed., 1982. Human Motor Behavior: An Introduction. Mahwah, USA:
Lawrence Erlbaum Associates, Inc.
Kerlinger, F.N., Lee, H.B., 2000. Foundations of Behavioral Research, 4th ed. New
York, USA: Thomson Learning.
Kim, N., Smith, M.J., Maeng, K., 2008. Assessment in Online Distance Education: A
Comparison of Three Online Programs at a University. Online Journal of Distance
Learning Administration, [Online]. 11 (1), Available at: http://www.westga.
edu/~distance/ojdla/browsearticles.php [Accessed 2010-03-11].
Koul, B.N., 1995. Trends, Directions and Needs: A view from developing countries.
In F. Lockwood, ed. 1995. Open and Distance Learning Today. London: Routledge.
Ch. 3.
Kuukkanen, J.-M., 2007. Kuhn, the correspondence theory of truth and coherentist
epistemology. Studies in History and Philosophy of Science, 38(1), pp. 555-566.
Le Heron, J., 2001. Plagiarism, learning dishonesty or just plain cheating: The
context and countermeasures in Information Systems teaching. Australian Journal
of Educational Technology, 17(3) pp. 244-264.
Lee, C., Welker, R.B., Odom, M.D., 2009. Features of Computer-Mediated, Text-
Based Messages that Support Automatable, Linguistic-Based Indicators for De-
ception Detection. Journal of Information Systems, 23(1), pp. 5-24.
Leedy, P.D., Ormrod, J.E., 2005. Practical research: planning and design. 8th ed.
Upper Saddle River: Pearson Prentice Hall.
Love, P.G., Simmons, J., 1998. Factors influencing cheating and plagiarism among
graduate students in a college of education. College Student Journal, 35(4), pp.
539-551.
Mason, R., 1995. Using Electronic Networking for Assessment. In F. Lockwood, ed.
1995. Open and Distance Learning Today. London: Routledge. Ch. 20.
McCabe, D.L., Pavela, G., 2004. Ten (Updated) Principles of Academic Integrity.
Change, 36(3), pp. 10-16.
McNicol, D., 2004. A primer of signal detection theory. Mahwah, USA: Lawrence
Erlbaum Associates, Inc.
Megehee, C.M., Spake, D.F., 2008. The Impact of Perceived Peer Behavior, Probable
Detection and Punishment Severity on Student Cheating Behavior. Marketing
Education Review, 18(2), pp. 5-19.
Monrose, F., Rubin, A.D., 2000. Keystroke dynamics as a biometric for authentica-
tion. Future Generation Computer Systems, 16(1), pp. 351-359.
Moore, M., Kearsley, G., 1996. Distance Education: A Systems View. 1st ed. Belmont: Wadsworth Publishing Company.
Moskovitch, R., Feher, C., Messerman, A., Kirschnick, N., Mustafić, T., Camtepe,
A., Löhlein, B., Heister, U., Möller, S., Rokach, L., Elovici, Y., 2009. Identity
Theft, Computers and Behavioral Biometrics. IEEE Intelligence and Security In-
formatics, Richardson, USA, 8-11 June 2009.
O’Leary, D.P., 1999. 12 Professional Ethics. [Online]. College Park: University of
Maryland, Department of Computer Science. Available at: http://www.cs.umd.
edu/%7Eoleary/gradstudy/node13.html [Accessed: 2010-03-09].
Olt, M.R., 2002. Ethics and Distance Education: Strategies for Minimizing Aca-
demic Dishonesty in Online Assessment. Online Journal of Distance Learning Ad-
ministration, [Online]. 5 (3), Available at: http://www.westga.edu/~distance/
ojdla/browsearticles.php [Accessed 2010-02-28].
Parker, A., 2003. Motivation and Incentives for Distance Faculty. Online Journal
of Distance Learning Administration, [Online]. 6 (3), Available at: http://www.
westga.edu/~distance/ojdla/browsearticles.php [Accessed 2010-02-28].
Paulsen, M.F., Rekkedal, T., 2001. Voksne kan og vill lære på Internett. In Paulsen,
M.F., ed., 2001. Nettbasert utdanning: Erfaringer og visjoner. Bekkestua: NKI
Forlaget.
Paulsen, M.F., 2001. Studenters syn på nettbasert utdanning. In Paulsen, M.F., ed.,
2001. Nettbasert utdanning: Erfaringer og visjoner. Bekkestua: NKI Forlaget.
Peacock, A., Ke, X., Wilkerson, M., 2004. Typing Patterns: A Key to User Iden-
tification. IEEE Security & Privacy Magazine, pp. 40-47. September/October,
2004.
Rettinger, D.A., Kramer, Y., 2008. Situational and Personal Causes of Student
Cheating. Research in Higher Education, 50(3), pp. 293-313.
Reushle, S., Dorman, M., Evans, P., Kirkwood, J., McDonald, J., Worden, J., 1999.
Critical Elements: Designing for online teaching. Proceedings of ASCILITE99
Responding to Diversity: 16th Annual Conference, QUT, Brisbane, 5-8 December.
Reushle, S., McDonald, J., 2004. Online learning: Transcending the physical. In
Logan Campus, Griffith University: ETL Conference, 2004. Brisbane, Australia,
04-05 November 2004.
Rybnik, M., Panasiuk, P., Saeed, K., 2009. User Authentication with Keystroke
Dynamics using Fixed Text. International Conference on Biometrics and Kansei
Engineering, Cieszyn, Poland, 25-28 June 2009.
Sapsford, R. and Jupp, V., 1996. Data Collection and Analysis. London: Sage.
Shanmugapriya, D., Padmavathi, G., 2009. A Survey of Biometric Keystroke Dynamics: Approaches, Security and Challenges. International Journal of Computer Science and Information Security, 5(1), pp. 115-119.
Shen, J., Bieber, M., Cheng, K., Hiltz, S.R., 2004. Traditional In-class Examination
vs. Collaborative Online Examination in Asynchronous Learning Networks: Field
Evaluation Results. Proceedings of the Tenth Americas Conference on Information
Systems, New York, August 2004.
Shen, C., Cai, Z., Guan, X., Sha, H., Du, J., 2009. Feature Analysis in Mouse
Dynamic in Identity Authentication and Monitoring. In Proceedings of IEEE In-
ternational Conference on Communications, Dresden, Germany, 14-18 June 2009.
Shon, P.C.H., 2006. How College Students Cheat On In-Class Examinations: Cre-
ativity, Strain, and Techniques of Innovation. Plagiary: Cross-Disciplinary Studies
in Plagiarism, Fabrication, and Falsification, 1(10): pp. 1-20.
Smith, J.A., 2008. Qualitative Psychology: A Practical Guide to Research Methods.
London: SAGE Publications Ltd.
Stakhanova, N., Basu, S., Wong, J., 2010. On the symbiosis of specification-based
and anomaly-based detection. Computers & Security, 29(1), pp. 253-268.
Stelmach, G.E., Requin, J., eds., 1980. Tutorials in Motor Behavior. Amsterdam:
North-Holland Publishing Company.
Stone, T.H., Jawahar, I.M., Kisamore, J.L., 2009. Using the theory of planned be-
havior and cheating justifications to predict academic misconduct. Career Devel-
opment International, 14(3), pp. 221-241.
Stuber-McEwen, D., Wiseley, P., Hoggatt, S., 2009. Point, Click and Cheat:
Frequency and Type of Academic Dishonesty in the Virtual Classroom. On-
line Journal of Distance Learning Administration, [Online]. 12 (3), Available
at: http://www.westga.edu/~distance/ojdla/browsearticles.php [Accessed
2010-03-02].
Tappert, C.C., Villani, M., Cha, S., 2009. Keystroke Biometric Identification and
Authentication on Long-Text Input. In Wang, L., Geng, X., eds. 2009. Behav-
ioral Biometrics for Human Identification: Intelligent Applications, Hershey: IGI
Global, pp. 342-367.
Theodoridis, S., Koutroumbas, K., 2006. Pattern Recognition. 3rd ed. San Diego,
USA: Academic Press, Elsevier.
Thomason, M.G., 1990. Introduction and Overview. In Bunke, H., Sanfeliu, A., eds.,
1990. Syntactic and structural pattern recognition: theory and applications (Series
in computer science; vol. 7). Singapore: World Scientific Publishing Co. Pte. Ltd.
Thorpe, M., 1995. The Challenge Facing Course Design. In F. Lockwood, ed. 1995.
Open and Distance Learning Today. London: Routledge. Ch. 17.
Thorpe, J., Van Oorschot, P.C., Somayaji, A., 2005. Pass-thoughts: Authenticat-
ing with Our Minds. In Proceedings of New Security Paradigms Workshop, Lake
Arrowhead, USA, 20-23 September 2005, pp. 45-56.
Usick, B., 2004. Preventing Plagiarism: A New Three-R Model. Paper presented at the 3rd Annual UTS Teaching and Learning Symposium, Winnipeg, Canada, 06 February 2004.
Villani, M., Tappert, C., Ngo, G., Simone, J., Fort, H.S., Cha, S., 2006. Keystroke
Biometric Recognition Studies on Long-Text Input under Ideal and Application-
Oriented Conditions. In Proceedings of Student/Faculty Research Day, CSIS, Pace
University. New York City, USA, 5 May 2006.
Vizer, L.M., Zhou, L., Sears, A., 2009. Automated stress detection using keystroke
and linguistic features: An exploratory study. International Journal of Human-
Computer Studies, 67(10), pp. 870-886.
Watson, G., Sottile, J., 2010. Cheating in the Digital Age: Do students cheat
more in online courses? Online Journal of Distance Learning Administra-
tion, [Online]. 13 (1), Available at: http://www.westga.edu/~distance/ojdla/
browsearticles.php [Accessed 2010-03-11].
Wehman, P., 2009. Faculty Prescriptions for Academic Integrity: An Urban Campus Perspective. Ph.D. thesis. Pittsburgh: University of Pittsburgh.
Whitley, B.E., Keith-Spiegel, P., 2001. Introduction to the Special Issue. Ethics and
Behavior, 11(3), pp. 217-218.
Whitman, M., Mattord, H., 2007. Guide to Network Defense and Countermeasures.
2nd ed. Boston: Course Technology, Cengage Learning.
Whitman, M., Mattord, H., 2008. Management of Information Security. 2nd ed.
Boston: Course Technology, Cengage Learning.
Wood, E., Zelaya, J., Saari, E., King, K., Gupta, M., Howard, N., Ismat, S., Kane,
M.A., Naumowicz, M., Varela, D., Villani, M., 2008. Longitudinal Keystroke Bio-
metric Studies on Long-Text Input. Proceedings of Student-Faculty Research Day,
CSIS, Pace University, May 2, 2008.
Yager, N., Dunstone, T., 2010. The Biometric Menagerie. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 32(2), pp. 220-230.
Yampolskiy, R.V., Govindaraju, V., 2007. Direct and Indirect Human Computer
Interaction Based Biometrics. Journal of Computers, 2(10), pp. 76-88.
Zavadskas, E., Kaklauskas, A., Seniut, M., Dzemyda, G., Ivanikovas, S., Stankevic,
V., Simkevičius, C., Jaruševičius, A., 2008. Web-Based Biometric Mouse Intelli-
gent System for Analysis of Emotional State and Labour Productivity. In Pro-
ceedings of The 25th International Symposium on Automation and Robotics in
Construction, Vilnius, Lithuania, 26-29 June 2008.
Zimmermann, P., Guttormsen, S., Danuser, B., Gomez, P., 2003. Affective Comput-
ing – A Rationale for Measuring Mood with Mouse and Keyboard. International
Journal of Occupational Safety and Ergonomics, 9(4), pp. 539-551.
Zhou, L., Twitchell, D.P., Qin, T., Burgoon, J.K., Nunamaker, J.F., 2003. An
Exploratory Study into Deception Detection in Text-based Computer-Mediated
Communication. In Proceedings of the 36th Hawaii International Conference on
System Sciences, Waikoloa Village: Island of Hawaii, 6-9 January 2003.
Zhou, L., Burgoon, J.K., Nunamaker, J.F., Twitchell, D., 2004. Automated
Linguistics-Based Cues for Detecting Deception in Text-based Asynchronous
Computer-Mediated Communication. Group Decision and Negotiation, 13, pp. 81-106.
Appendix A
Subjects of automated
observation
The automated path of the empirical inputs is depicted in figure A.2. The inputs are captured by the sensor module on the computer that the user (participant) is using, and are subsequently analyzed and features extracted from them.
Parts of the feature names describe the character of the features. A few terms
used further in this appendix are explained in table A.1.
When working with samples for the graphical representation of the features, two additional parameters play a role:
• Sample duration, which determines how long (in time) each sample used in the analysis of a whole session is.
• Sample overlap, which determines how large a part of two consecutive samples overlaps within the analysis of a whole session.
Adjusting these parameters affects the analysis and visualization of the features across the whole session.
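The effect of these two parameters can be sketched as a sliding-window split of a session's timestamped events. The function name, event format, and values below are illustrative assumptions, not the implementation used in the study:

```python
def sliding_windows(events, sample_duration, sample_overlap):
    """Split timestamped events into (possibly overlapping) samples.

    events: list of (timestamp, payload) tuples, sorted by timestamp.
    sample_duration: window length, in the same time unit as the timestamps.
    sample_overlap: fraction (0 <= overlap < 1) shared by consecutive windows.
    """
    if not events:
        return []
    # The window start advances by the non-overlapping part of the duration.
    step = sample_duration * (1.0 - sample_overlap)
    start, end = events[0][0], events[-1][0]
    windows = []
    while start <= end:
        window = [e for e in events if start <= e[0] < start + sample_duration]
        windows.append(window)
        start += step
    return windows

# Example: 10-second samples with 50% overlap over a 25-second session.
events = [(0, "a"), (4, "b"), (9, "c"), (12, "d"), (18, "e"), (24, "f")]
samples = sliding_windows(events, sample_duration=10, sample_overlap=0.5)
```

With 50% overlap each event typically falls into two consecutive samples, which smooths the feature curves plotted across a session at the cost of correlated samples.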
Term Description
Duration The time during which the composite is being entered.
Latency The time before the occurrence of a specific event or composite.
Rate The frequency of occurrence.
Downtime The time while a key or button is pressed.
Flight The time from releasing former key or button until pressing the
next one. It can have a negative value.
Distance Length of the shortest way from source point to destination point.
Length Length of the trajectory the mouse pointer has gone when going
from source point to destination point.
Center A point in the interval at which the value integrated from 0 up to that point equals half of the value integrated across the whole definition set (the interval from 0 to the end). The definition set is session- or sample-relative time.
Ratio The ratio of occurrences of a specific composite to all occurrences of the same type, or the ratio of the summed duration of a specific composite to the duration of all others of the same type.
Tailing time The time from button release until the end of the mouse move.
Digraph Sequence of two keys within a word.
Multikey Multiple key L2 composite.
Single key Single key L2 composite.
Word Word L2 composite.
Mean The average value across all samples in a specific population of size N: (1/N) Σ_{i=1}^{N} value_i.
Standard deviation The square root of the value variance across all samples in a specific population of size N: sqrt((1/N) Σ_{i=1}^{N} (value_i − mean)²).
Feature designation Feature name (description)
KA Keyboard activity
GDTm Any key downtime mean
GDTsd Any key downtime standard deviation
GKRm Any key key rate mean
GKRsd Any key key rate standard deviation
MK# Multikey count
MKDm Multikey duration mean
MKDsd Multikey duration standard deviation
MKDTm Multikey downtime mean
MKDTsd Multikey downtime standard deviation
MKKFLm Multikey key flight mean
MKKFLsd Multikey key flight standard deviation
MKKRm Multikey key rate mean
MKKRsd Multikey key rate standard deviation
W# Word count
WLˆ Word length maximum
WLm Word length mean
WLsd Word length standard deviation
WDˆ Word duration maximum
WDm Word duration mean
WDsd Word duration standard deviation
WKDm Word key duration mean
WKDsd Word key duration standard deviation
WKFLm Word key flight mean
WKFLsd Word key flight standard deviation
WDLATm Word delimiter latency mean
WDLATsd Word delimiter latency standard deviation
NWLATm Next word latency mean
NWLATsd Next word latency standard deviation
SK# Single key count
SKDTm Single key downtime mean
SKDTsd Single key downtime standard deviation
SKRm Single key rate mean
SKRsd Single key rate standard deviation
D# Digraph count
DDm Digraph duration mean
DDsd Digraph duration standard deviation
DKRm Digraph key rate mean
DKRsd Digraph key rate standard deviation
DKFLm Digraph key flight mean
DKFLsd Digraph key flight standard deviation
DD1m Digraph key 1 duration mean
DD1sd Digraph key 1 duration standard deviation
DD2m Digraph key 2 duration mean
DD2sd Digraph key 2 duration standard deviation
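Several of the keyboard features above are derived from raw key press/release timestamps. As a minimal illustrative sketch (the event format and function name are assumptions, not the thesis implementation), the per-digraph timing quantities can be computed as:

```python
def digraph_features(e1_down, e1_up, e2_down, e2_up):
    """Timing features for a two-key sequence; all times in milliseconds."""
    downtime1 = e1_up - e1_down   # DD1: how long key 1 was held
    downtime2 = e2_up - e2_down   # DD2: how long key 2 was held
    flight = e2_down - e1_up      # DKFL: negative when the presses overlap
    duration = e2_up - e1_down    # DD: total digraph duration
    return {"downtime1": downtime1, "downtime2": downtime2,
            "flight": flight, "duration": duration}

# Overlapping presses (a fast "th"): key 2 goes down before key 1 is released,
# so the flight time is negative, as noted in table A.1.
f = digraph_features(e1_down=0, e1_up=120, e2_down=95, e2_up=210)
```

The table's mean and standard-deviation features (DD1m, DKFLsd, ...) would then be aggregates of these per-digraph values over a sample or session.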
Feature designation Feature name (description)
MM# Mouse move count
MMDIm Mouse move distance mean
MMDIsd Mouse move distance standard deviation
MMAm Mouse move angle mean
MMAsd Mouse move angle standard deviation
MMLm Mouse move length mean
MMLsd Mouse move length standard deviation
MMDm Mouse move duration mean
MMDsd Mouse move duration standard deviation
MMmSm Mouse move maximal speed mean
MMmSsd Mouse move maximal speed standard deviation
MMSm Mouse move speed mean
MMSsd Mouse move speed standard deviation
MMSCm Mouse move speed center mean
MMSCsd Mouse move speed center standard deviation
MMlmSPm Mouse move last maximal speed position mean
MMlmSPsd Mouse move last maximal speed position standard deviation
MMACm Mouse move acceleration mean
MMACsd Mouse move acceleration standard deviation
MMACCm Mouse move acceleration center mean
MMACCsd Mouse move acceleration center standard deviation
MMaCm Mouse move absolute curvature mean
MMaCsd Mouse move absolute curvature standard deviation
MMCm Mouse move curvature mean
MMCsd Mouse move curvature standard deviation
MC# Mouse click count
MCCCm Mouse click click count mean
MCCCsd Mouse click click count standard deviation
MCDTm Mouse click downtime mean
MCDTsd Mouse click downtime standard deviation
MCFLm Mouse click flight time mean
MCFLsd Mouse click flight time standard deviation
MCCRm Mouse click click rate mean
MCCRsd Mouse click click rate standard deviation
MS# Mouse scroll count
MSSCm Mouse scroll scroll count mean
MSSCsd Mouse scroll scroll count standard deviation
MSSRm Mouse scroll scroll rate mean
MSSRsd Mouse scroll scroll rate standard deviation
MD# Mouse drag count
MDDm Mouse drag duration mean
MDDsd Mouse drag duration standard deviation
MDMLATm Mouse drag move latency mean
MDMLATsd Mouse drag move latency standard deviation
MDTTm Mouse drag tailing time mean
MDTTsd Mouse drag tailing time standard deviation
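The distinction between mouse move distance (straight-line) and length (trajectory actually traveled), and the mean speed derived from them, can be sketched as follows. The sampled point format is an assumption for illustration:

```python
import math

def mouse_move_features(points):
    """points: list of (t, x, y) pointer samples for one mouse move."""
    (t0, x0, y0), (tn, xn, yn) = points[0], points[-1]
    # MMDI: straight-line distance from source point to destination point.
    distance = math.hypot(xn - x0, yn - y0)
    # MML: length of the trajectory, summed over consecutive segments.
    length = sum(math.hypot(x2 - x1, y2 - y1)
                 for (_, x1, y1), (_, x2, y2) in zip(points, points[1:]))
    duration = tn - t0                                # MMD
    speed = length / duration if duration else 0.0    # MMS: mean speed
    return distance, length, duration, speed

# An L-shaped move: the trajectory length exceeds the straight-line distance.
pts = [(0.0, 0, 0), (0.5, 3, 0), (1.0, 3, 4)]
d, l, dur, s = mouse_move_features(pts)
```

A length/distance ratio close to 1 indicates a nearly straight pointer movement; larger ratios indicate curved or hesitant trajectories.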
Feature designation Feature name (description)
S# Silence count
SDm Silence duration mean
SDsd Silence duration standard deviation
SLATm Silence latency mean
SLATsd Silence latency standard deviation
STC Silence time center
SR Silence ratio
Appendix B
Appendix C
Each participant of the observation sessions was asked to fill in the questionnaire and to perform the tasks described below in this appendix.
C.1 Questionnaire
Please try to answer my questions in an essay-like text. There are no
specific demands for formulation or diction besides that I’d like you to
avoid writing in bullets. Please try to reflect on what the questions are
asking and formulate the answers into sentences and even paragraphs if
you wish.
Please try to answer as much ’from the heart’ as possible, and use neutral
answers only if you think they truly reflect your feelings. If you don’t
feel comfortable about answering a question, please omit it and skip to
some further question. :)
Did you drink tea or coffee before the meeting? If so, how much and how long
ago?
Did you eat lunch or something smaller before the meeting?
Have you been traveling (walking, riding a bicycle) or physically exercising in the past few minutes?
Have you felt busy or relaxed recently (today)?
Have you experienced anything unusual that could affect your mood in some way
today? If so, can you describe it a little?
Feel free to add any other remarks about how you feel or what has happened to you these days – something quite positive, negative, or both?
How do you feel today (bad, good, happy, sad, sleepy, ...)? Feel free to describe
as much as you wish.
How much light do you have in the room (too little, just right, too much)?
How is the light quality in your room? Do you have sunlight, fluorescent lamps or
good old light bulbs?
How do you feel about the temperature in your room (colder, comfortable, warmer)?
What do you think or feel about this session so far (somewhat long and boring, artificial, indifferent, relaxing, or something else)?
How do you feel about the atmosphere where you are in general? Feel free to specify.
How long have you been using the computer you are using now (approximately)?
Do you like the comfort your keyboard provides you?
How do you feel about the comfort of your mouse?
If you have anything else to highlight about your equipment, situation or feelings,
feel free to share it – indicate, mention or describe. :)
Please answer the questions below using a couple of sentences for each.
You can be more verbose if you like and if reflecting on the questions
makes you feel happy!
If you think about your past study time, what have been your favorite courses
and why? Why did you like what you did about them?
What are your hobbies or simply activities you like to do in your free time? Why
do you like them and what makes them interesting to you?
Try to imagine yourself in a couple of years from now. What would you like to
do or work with? Where would you like to live? What would you like to have? Do
you have any specific ambitions you want to fulfill one day? Feel free to share it.
Figure C.1: Example free diagram
The system delivers functionality and information to clients across the public In-
ternet through one or more Web servers. Larger systems may use multiple Web
servers and multiple application servers to deliver this functionality, all protected
by a demilitarized zone. The application must exchange data with the client. A
percentage of this data will be sensitive in nature.
Now, please open a painting program again, and try to ’copy’ the following diagram (paint it as similarly as possible to the one here):
Now, please pick any book of yours and try to copy some paragraph, or
a few sentences (around 10 or more).
Figure C.2: Diagram to copy (redraw)