
ChatGPT: The End of Online Exam Integrity?

arXiv:2212.09292v1 [cs.AI] 19 Dec 2022

Teo Susnjak
School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand

December 20, 2022

Abstract
This study evaluated the ability of ChatGPT, a recently developed
artificial intelligence (AI) agent, to perform high-level cognitive tasks and
produce text that is indistinguishable from human-generated text. This
capacity raises concerns about the potential use of ChatGPT as a tool for
academic misconduct in online exams. The study found that ChatGPT is
capable of exhibiting critical thinking skills and generating highly realistic
text with minimal input, making it a potential threat to the integrity
of online exams, particularly in tertiary education settings where such
exams are becoming more prevalent. Returning to invigilated and oral exams could form part of the solution; while advanced proctoring techniques and AI-text output detectors may help address this issue, they are not likely to be foolproof solutions. Further research
is needed to fully understand the implications of large language models
like ChatGPT and to devise strategies for combating the risk of cheating
using these tools. It is crucial for educators and institutions to be aware
of the possibility of ChatGPT being used for cheating and to investigate
measures to address it in order to maintain the fairness and validity of
online exams for all students.
Keywords— ChatGPT; online exams; large language models; assessment cheating; academic integrity; invigilated exams; proctoring tools; GPT-3

1 Introduction
Higher education has seen a significant shift towards online learning in recent years,
and this trend has been accelerated by the COVID-19 pandemic [7]. Many Higher
Education Institutions (HEIs) have had to quickly adapt to the challenges posed by the
pandemic by transitioning to online classes and exams [8, 9, 15]. It is unlikely that these trends towards online education will reverse in the near future [12], notwithstanding the challenges encountered, since the benefits of remote learning have come to be appreciated by both HEIs and students alike [7].
As the sector has increasingly moved online, concerns around academic integrity
have also been amplified [7, 25, 14]. The transition to online exams, in particular,

has raised concerns about the potential for cheating and other forms of academic
misconduct [6, 3, 11, 18, 15]. This is due, in part, to the anonymity and lack of direct
supervision that are inherent to online exams, as well as the ease with which students
may be able to access and share resources during the exam.
While concerns around academic integrity in online exams have been raised, there is a lack of definitive research, with no consolidated literature reviews yet conducted that quantify the extent of dishonest practices in online assessments [14]. Indications are that the prevalence is on the rise. In earlier studies, Fask et al. [13], Corrigan-Gibbs et al. [10] and Alessio et al. [4] detected significant rates of cheating in online assessments, while Arnold [6] mentions a general belief among educators that academic misconduct is on the rise and that online assessment is particularly conducive to cheating. More recently, a systematic review by Noorbehbahani et al. [18], covering more than a decade of research on cheating in online exams, found that dishonesty in online exams is more prevalent than in traditional face-to-face exams.
To preserve academic integrity in online exams, HEIs have implemented revised recommendations for formulating assessments [25], various technological strategies such as proctored exams [4], plagiarism detection software and exam security measures, as well as revisions of institutional academic integrity policies, educational campaigns to deter misconduct, and honor codes [10]. While these strategies, individually or in tandem, may be effective in mitigating the risk of academic misconduct, there is currently insufficient evidence regarding their overall effectiveness in preserving academic integrity in online exams. Meanwhile, ethical concerns surrounding the use of proctoring software on personal computers [5], and recent challenges to it on legal grounds [2], have gained momentum.
An additional measure that HEIs have explored is a shift towards using more challenging exam questions [16] that require greater degrees of critical thinking. Whisenhunt et al. [25]1 note that assessments comprising essays and short-answer responses are generally perceived by educators to be more suitable for measuring critical thinking [22] as well as for facilitating deeper learning [24, 26]. The underlying intention is to move away from multiple-choice and simple information-retrieval questions, since these are regarded as more susceptible to cheating involving unauthorized web access [18].
However, a new threat to the academic integrity of online exams, even ones requiring higher-order reasoning, has emerged. With the recent2 public release of ChatGPT by OpenAI [1], the world has seen a significant leap in AI capabilities involving natural language processing and reasoning. This publicly3 available technology is not only able to engage in sophisticated dialogue and provide information on virtually all topics; it is also able to generate compelling and accurate answers to difficult questions requiring an advanced level of analysis, synthesis, and application of information, as will be demonstrated in this study. It can even devise critical questions itself, the very questions that educators in different disciplines would use to evaluate their students' competencies. Assuming that high-stakes exams will continue to be perceived as valuable and will continue to be used in education, this development may spell the end of the academic integrity of online examinations. It is therefore imperative that the capabilities of this AI agent be examined.
1 The authors develop a set of recommendations for conducting multiple-choice exams in online environments. Appendix A demonstrates the ability of ChatGPT to answer such questions.
2 A beta research release of ChatGPT took place on November 30, 2022.
3 The technology was available to the public as of the publication date of this paper.


1.1 ChatGPT
ChatGPT is a large language model. A large language model is a type of AI that uses
deep learning (a form of machine learning) to process and generate natural language
text. These models are trained on massive amounts of text data, allowing them to
learn the nuances and complexities of human language. In the case of ChatGPT,
it was trained on a diverse range of text data which included books, articles, and
online conversations, to enable it to engage in non-trivial dialogue and provide accurate
information on a wide range of topics4. The development of ChatGPT represents a
significant advancement in the field of natural language processing and AI in general,
building upon the initial GPT (Generative Pretrained Transformer[23]) model and
paving the way for further innovations in this area.
One of the key advantages of these large language models is their ability to
understand the context of a given prompt and generate appropriate responses. This
paper focuses on demonstrating this capability. This is a significant improvement
over earlier language models, which were often unable to interpret the meaning and
intent behind a given piece of text. Another important aspect is its ability to generate
high-quality text that is difficult to distinguish from human writing. With its ability to
draw out knowledge and answer difficult academic questions, it is inherently capable of
answering examination questions that would otherwise not easily be answered through
web searches, and of providing accurate and reliable responses.

1.2 Aims and Contribution


The goal of this article is to provide a first analysis of the capabilities of ChatGPT
in demonstrating the complexity of its reasoning and its ability to answer non-trivial
university-level questions across multiple disciplines, and its ability to present its
responses with clarity and coherence.
The contribution of this study is its novelty and the urgency of raising an alarm
about the unprecedented threat that large language models like ChatGPT pose to
academic integrity. The intention is to highlight the fact that current mitigation
strategies, recommendations, and processes for preserving academic integrity are likely
not yet capable of addressing this danger.
While this technology is arguably one of the greatest advancements in AI capabilities since the advent of web search engines, and will offer enormous opportunities for productivity gains, it also brings a fresh challenge to the HEI sector, which is already in a state of disruption and is now compelled to examine this development with urgency.

2 Background
The literature review examines the most recent investigations into the problem of
academic integrity, with a greater focus on the context of online assessments.
Butler-Henderson and Crawford [8] conducted a systematic review that highlighted
the transformation of learning and teaching towards more active learning environments,
particularly in the context of the COVID-19 pandemic. This has led to the adoption of
online examination formats which the authors discuss as being driven by a desire to
increase international enrollments, and the ’massification’ of higher learning, while the
impact of the pandemic has been to accelerate these trends. The authors identify the limitations and challenges of online examinations, including cheating issues, together with access to technology, and the lack of standardized approaches. The study concludes by calling for further research on online examinations and the importance of designing online examinations that are fair, valid, and reliable.
4 The details of the datasets used to train ChatGPT have not been publicly released.
In a comprehensive report, Barber et al. [7] discuss that academic misconduct,
including plagiarism and cheating, is a concern for higher education institutions and
educators in both in-person and digital assessments. Technology has played a role
in helping institutions detect plagiarism, and new developments in technology, such
as biometric authentication, authorship analysis, and proctoring software, are being
used to identify misconduct. However, the use of proctoring software has also raised
concerns about privacy and international students have had issues with the software
due to differences in bandwidth. The shift to digital teaching and learning during the
COVID-19 pandemic has prompted a review of assessment approaches, including the
use of open-book online exams and more authentic, integrated assessments.
Coghlan et al. [9] report that online exam proctoring technologies, which use AI
and machine learning, have gained attention. However, the study notes that these
technologies have faced controversy and ethical concerns, including questions about
student privacy, potential bias, and the validity and reliability of the software. Some
universities have defended their use, while others have retreated from or rejected the
use of these technologies.
A recent systematic literature review by Noorbehbahani et al. [18] on cheating
in online exams covering more than a decade of research found that cheating in
online exams is indeed a significant concern. The study claims that cheating in online
exams is more prevalent than in traditional face-to-face exams. The authors note
that a wide range of technologies and tools can be used to facilitate cheating, such as
remote desktop and screen sharing, searching for solutions on the internet, and using
social networks. Apart from online proctoring, the authors identify a combination of
prevention strategies, such as cheat-resistant questions, and detection methods, such
as plagiarism detection software and machine learning algorithms, as potentially being
effective.
Henderson et al. [15] also found that the prevalence of cheating in online exams is a significant issue, though cheating still remains a problem even in on-campus, paper-based
invigilated exams. The authors’ findings point to previous research which has shown
that cheating persists despite security measures, with conflicting existing evidence
about the impact of invigilation and online security on cheating. Their conclusion
is that while technology-based security measures can impact student experience and
attitudes towards integrity, they do not necessarily reduce cheating.
With respect to proctoring software, Alin et al. [5] also raised the issue of ethi-
cal concerns surrounding the use of this technology on personal computers and the
interpretation of what constitutes suspicious behavior. The authors stress that even
when proctoring systems are permitted for use on online examinations, the exams can
still be vulnerable to cheating. The authors posit that there is currently a lack of
understanding about how cheating may occur in virtual proctored exams and how to
best mitigate it.
Meanwhile, Khan et al. [16] also highlighted that proctoring software that
required students to keep their cameras on during online examinations was considered
stressful and intrusive to privacy by the students, while they also believed cheating
would continue irrespective of the measures. The study also suggested using strategies
like replacing multiple-choice questions with short-answer questions and employing tighter time limits. Koh and Daniel [17] also identified that one of the key strategies used by educators as teaching transitioned to an online mode was to convert multiple-choice questions into written critical-thinking questions, a move which students reported as making the examinations feel harder.

3 Methodology
The methodology for examining the critical and higher-order thinking capabilities of ChatGPT is described here. Three steps were followed, listed below and described in more detail in this section.
1. Firstly, ChatGPT was itself asked to generate examples of difficult critical thinking questions that involve some scenario, and which target undergraduate students from various disciplines.
2. Secondly, ChatGPT was then asked to provide an answer to the generated questions.
3. Lastly, ChatGPT was asked to critically evaluate the answer given to the question.

Figure 1: The publicly accessible online interface to ChatGPT shows the text input prompt at the bottom.

Setup: An account was created through OpenAI [19] granting access to ChatGPT under the beta research release for experimental usage. Figure 1 shows the online portal for accessing ChatGPT.


Selected disciplines: A broad range of discipline areas from the Sciences, Ed-
ucation Studies, Humanities, and Business were selected for demonstrative purposes.
Specifically, ChatGPT was prompted to generate subject-specific questions and re-
sponses with respect to specific disciplines of Machine Learning, Marketing, History,
and Education.

Question generation: To test the ability of ChatGPT to generate critical thinking questions across multiple disciplines, the model was first provided with a prompt indicating that an example of a critical thinking question was sought for each field of study, one that was challenging but appropriate for undergraduate students. This was accomplished by posing the following prompt for each discipline:

"Please generate an example of a difficult critical question in the field of Machine Learning targeting undergraduate students which involves an hypothetical scenario."

Answer generation: ChatGPT was subsequently prompted to answer the questions it devised for each discipline, with specific instructions on the structure and form of the response, as follows:

"Please answer the following question in several paragraphs using 500 words with examples and supporting arguments."

Critical evaluation of responses: To test the ability of ChatGPT to critically evaluate and analyze the responses it has generated, and to provide suggestions for improvement, the model was prompted to consider the original question posed and the answer given with the following additional instruction:

"Can you please critically evaluate the following response to the above question, listing its strengths and weaknesses as well as suggestions for improvement."
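
The three prompts above were issued manually through the web interface shown in Figure 1. As an illustration only, the same three-step protocol could be scripted along the following lines using the openai Python package; this assumes programmatic access to a comparable chat model, and the model name and API calls are illustrative rather than part of the study's setup.

# Illustrative sketch only: the study used the ChatGPT beta web interface
# (Figure 1); this shows how the same three-step protocol could be scripted,
# assuming API access to a comparable chat model. The model name is a
# placeholder, not the model evaluated in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

discipline = "Machine Learning"

# Step 1: generate a critical-thinking question for the discipline.
question = ask(
    "Please generate an example of a difficult critical question in the "
    f"field of {discipline} targeting undergraduate students which involves "
    "an hypothetical scenario."
)

# Step 2: answer the generated question.
answer = ask(
    "Please answer the following question in several paragraphs using "
    f"500 words with examples and supporting arguments.\n\n{question}"
)

# Step 3: critically evaluate the generated answer.
critique = ask(
    "Can you please critically evaluate the following response to the above "
    "question, listing its strengths and weaknesses as well as suggestions "
    f"for improvement.\n\nQuestion: {question}\n\nResponse: {answer}"
)

print(question, answer, critique, sep="\n\n")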

3.1 Evaluation of responses


Universal intellectual standards are standards that are suitable for evaluating the quality of reasoning about any problem, issue, or situation [21]. The level of critical thinking expressed ought to consider the purpose, information, and implications of the arguments presented, and these need to be evaluated for quality using the universal intellectual standards outlined by Paul [20], with persuasiveness and originality also added. To that end, the following criteria, representing dimensions of critical thinking, will be applied to assess the quality of the responses provided by ChatGPT to the prompts outlined above (one possible compact representation of this rubric is sketched after the list).


Relevance: Is the idea being expressed relevant to the topic or question at hand?
Does it address the complexities of the issue?

Clarity: Is the text easy to understand? Is it well-structured and logically organized? Does it use appropriate language and vocabulary for the intended audience?

Accuracy: Is the idea being expressed true or accurate? Can it be verified through
evidence or other means?

Precision: Is the idea being expressed specific and detailed enough? Is it precise
and unambiguous?

Depth: Does the idea being expressed go beyond the surface level and consider the
underlying complexities and nuances of the issue? Does the text provide a thorough
and in-depth analysis of the topic? Does it consider multiple perspectives and present
a balanced view?

Breadth: Does the idea being expressed consider the full range of relevant perspec-
tives and viewpoints on the issue?

Logic: Does the idea being expressed follow logical and consistent reasoning? Are
the conclusions supported by the evidence presented?

Persuasiveness: Does the text effectively persuade the reader to accept its argu-
ments or conclusions? Is the evidence presented strong and convincing?

Originality: Does the text offer new insights or ideas, or does it simply repeat
information that is already widely known?
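
As a purely illustrative aside, the rubric above can be captured as a simple data structure against which a marker records a rating per criterion; the numeric 1-5 scale below is an assumption of this sketch and was not part of the study's qualitative procedure.

# Hypothetical sketch only: the nine criteria as a rubric against which a
# marker could record ratings; the 1-5 scale is an assumption of this sketch.
CRITERIA = {
    "Relevance": "Is the idea relevant to the question and its complexities?",
    "Clarity": "Is the text easy to understand, well structured, and pitched at the audience?",
    "Accuracy": "Is the idea true, and can it be verified through evidence?",
    "Precision": "Is the idea specific, detailed, and unambiguous?",
    "Depth": "Does it consider underlying complexities, nuances, and multiple perspectives?",
    "Breadth": "Does it consider the full range of relevant viewpoints?",
    "Logic": "Is the reasoning consistent and are conclusions supported by the evidence?",
    "Persuasiveness": "Are the arguments and evidence convincing?",
    "Originality": "Does it offer new insight rather than repeat what is widely known?",
}

def score_response(ratings: dict) -> float:
    """Average a marker's 1-5 ratings over the criteria that were rated."""
    unknown = set(ratings) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {unknown}")
    return sum(ratings.values()) / len(ratings)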

4 Results
The responses by ChatGPT to the prompts outlined in the methodology are shown
in four tables, with each one representing a separate discipline, namely Education in
Table 1, Machine Learning in Table 2, History in Table 3 and Marketing in Table 4.

4.1 Response Analysis


There are several key factors to consider when conducting a higher-order critical analysis of the generated responses.

Clarity: Across all responses to the prompts, ChatGPT has demonstrated strong clarity. The language used in the responses is straightforward to understand and follows the structure and conventions of what one would expect from natural language responses. The responses are well-organized and coherent; there is an intentional flow of ideas in longer texts. Clarity is also expressed in the rationale provided for the questions as well as in the critical evaluations. The vocabulary, especially the technical language where necessary, and grammar can be regarded as appropriate for the intended audience.


Table 1: The example of a critical-thinking question generated by ChatGPT and its underlying reasoning for the discipline of Education. The response to the question posed as generated by ChatGPT, together with its own critical evaluation of its answer with suggestions for improvement.

Question: "In a hypothetical scenario where a school district is experiencing high rates of student absenteeism, how might implementing a personalized learning program impact attendance rates and student achievement, and what considerations should be taken into account when designing and implementing such a program?"

Rationale: The above question is suitable because it presents an hypothetical scenario that is relevant to the field of education and challenges undergraduate students to think critically about how to address a real-world problem. The question asks students to consider the potential impacts of a personalized learning program on attendance rates and student achievement, as well as the considerations that should be taken into account when designing and implementing such a program. This type of question encourages students to apply their knowledge and understanding of education theories and practices to a hypothetical situation and consider the potential consequences of different approaches. Overall, the question is suitable because it challenges students to think critically and creatively about a relevant issue in the field of education.

Answer: In a hypothetical scenario where a school district is experiencing high rates of student absenteeism, implementing a personalized learning program may have a positive impact on attendance rates and student achievement. Personalized learning programs are designed to tailor instruction to the individual needs and interests of each student, which can increase motivation and engagement in learning. When students feel that their learning is meaningful and relevant to their own lives, they are more likely to attend school regularly and put forth effort in their studies.
There is evidence to support the idea that personalized learning programs can improve attendance rates and student achievement. For example, a study by the U.S. Department of Education found that students in personalized learning programs had higher attendance rates and made greater gains in academic achievement compared to their peers in traditional programs. Similarly, a review of research on personalized learning by the Clayton Christensen Institute found that personalized learning programs had positive impacts on student engagement, motivation, and achievement.
However, there are also considerations that need to be taken into account when designing and implementing a personalized learning program in a school district experiencing high rates of student absenteeism. One consideration is the need for adequate resources and support. Personalized learning programs often require more individualized attention from teachers and may require additional technology and materials. It is important to ensure that the school has the necessary resources and support in place to effectively implement a personalized learning program. Another consideration is the need for proper training and professional development for teachers. Personalized learning approaches may be different from traditional methods, and teachers may need support and guidance to effectively implement these approaches in their classrooms.
Overall, implementing a personalized learning program in a school district experiencing high rates of student absenteeism may have a positive impact on attendance rates and student achievement. However, it is important to carefully consider the resources and support needed to effectively implement such a program, as well as the training and professional development needs of teachers. By taking these factors into account, a school district can increase the chances of success in improving attendance rates and student achievement through personalized learning.

Critical evaluation of the answer:
Strengths: The response provides evidence to support the idea that personalized learning programs can improve attendance rates and student achievement. The response acknowledges the need for adequate resources and support as well as proper training and professional development for teachers in order to effectively implement a personalized learning program.
Weaknesses: The response does not provide specific examples or details about the studies or research mentioned. It would be helpful to have more information about the context and methods of these studies in order to better understand their findings and how they apply to the hypothetical scenario. The response does not address any potential challenges or drawbacks of implementing a personalized learning program in a school district experiencing high rates of student absenteeism. It would be helpful to consider potential barriers or difficulties that may arise and how they could be addressed.
Suggestions for improvement: Provide specific examples and details about the studies or research mentioned, including information about the context and methods used. This would help to provide more context and credibility to the findings. Consider potential challenges or drawbacks of implementing a personalized learning program in a school district experiencing high rates of student absenteeism and suggest strategies for addressing them. This would provide a more balanced and nuanced perspective on the potential impacts of personalized learning. Consider other factors that may impact attendance rates and student achievement in this hypothetical scenario, such as socio-economic factors, school culture, or community resources. This would provide a more comprehensive analysis of the issue and suggest more holistic approaches to addressing it.


Table 2: The example of a critical-thinking question generated by ChatGPT and its underlying reasoning for the discipline of Machine Learning. The response to the question posed as generated by ChatGPT, together with its own critical evaluation of its answer with suggestions for improvement.

Question: "Imagine you are working on a machine learning project to predict the likelihood of a customer churning from a subscription service. Your model is performing well on the training data, but when you test it on unseen data, the accuracy drops significantly. What could be the possible reasons for this discrepancy and how would you troubleshoot the issue?"

Rationale: The above question is suitable because it presents an hypothetical scenario that is commonly encountered in real-world machine learning projects. It challenges the students to think critically about the various factors that can impact the performance of a machine learning model, and to consider how they might go about troubleshooting and addressing any issues. Additionally, the question touches upon the importance of evaluating a model on unseen data, which is a crucial step in the machine learning process. Overall, the question requires the students to apply their knowledge of machine learning principles and techniques to a realistic scenario, which helps to deepen their understanding of the field.

Answer: One possible reason for the discrepancy in performance between the training data and unseen data could be overfitting. Overfitting occurs when a model is too complex and is able to fit the training data extremely well, but is not able to generalize to new data. This can be caused by having too many parameters in the model, or by having a limited amount of training data. In either case, the model becomes too specialized to the training data and is not able to accurately predict outcomes for new data.
To troubleshoot this issue, one approach would be to try simplifying the model by reducing the number of parameters or increasing the amount of training data. This can help the model to more accurately capture the underlying patterns in the data and improve its generalization performance. Another option would be to use techniques such as regularization, which helps to prevent overfitting by introducing a penalty for using too many parameters in the model.
If overfitting is not the cause of the discrepancy, there could be other factors at play. One possibility is that the training data and unseen data come from different distributions. For example, the training data may be collected from a specific time period or geographic region, while the unseen data may come from a different time period or region. In this case, the model may be performing well on the training data because it has learned patterns that are specific to that data, but those patterns may not generalize to the unseen data.
To address this issue, one solution could be to ensure that the training data is representative of the overall distribution of data that the model will be applied to. This can be achieved through sampling techniques such as stratified sampling, which ensures that the training data contains a balanced representation of different subgroups within the overall data. Another option would be to use domain adaptation techniques, which help the model to learn to adjust to changes in the data distribution.
Overall, there are many potential reasons for a discrepancy in performance between training data and unseen data, and the appropriate approach to troubleshooting will depend on the specific circumstances of the project. By carefully evaluating the model and the data, and applying appropriate techniques to address any issues, it is possible to improve the generalization performance of the model and increase its accuracy on new data.

Critical evaluation of the answer:
Strengths: The response correctly identifies overfitting as a possible cause of the discrepancy in performance between the training data and unseen data. The response provides several suggestions for troubleshooting the issue, including simplifying the model, using regularization techniques, and ensuring that the training data is representative of the overall data distribution.
Weaknesses: The response does not mention other possible causes of the discrepancy beyond overfitting and data distribution, such as bias in the training data or a lack of relevant features in the model. The response does not provide specific examples of techniques that can be used to simplify the model or ensure that the training data is representative of the overall data distribution.
Suggestions for improvement: Mention other potential causes of the discrepancy, such as bias in the training data or a lack of relevant features in the model. Provide specific examples of techniques that can be used to simplify the model or ensure that the training data is representative of the overall data distribution, such as feature selection or sampling techniques. Explain the benefits and drawbacks of each technique suggested, and provide guidance on how to decide which technique is most appropriate for a given situation.

Table 3: The example of a critical-thinking question generated by ChatGPT and its underlying reasoning for the discipline of History. The response to the question posed as generated by ChatGPT, together with its own critical evaluation of its answer with suggestions for improvement.

Question: "Imagine that a new discovery has been made that completely changes our understanding of a major historical event. How would this discovery impact our understanding of the past, and what steps would need to be taken to incorporate it into the historical narrative?"

Rationale: This question requires a deep understanding of the role of historical narratives in shaping our understanding of the past, as well as the ability to think critically about the potential impacts and implications of a major historical discovery. It also requires an awareness of the ways in which different perspectives and interpretations can impact our understanding of history, and the ability to propose potential solutions to these challenges in the context of a hypothetical scenario.

Answer: A new discovery that completely changes our understanding of a major historical event would have a profound impact on our understanding of the past. This discovery would likely challenge long-held beliefs and assumptions about the event in question, and would require historians to reevaluate and revise their interpretations of the past.
One example of this type of discovery might be the uncovering of new evidence about a key event in world history, such as the assassination of President John F. Kennedy. For decades, the assassination has been viewed as the work of a lone gunman, but imagine if new evidence were discovered that indicated a broader conspiracy was at play. This new information would fundamentally change our understanding of the event and its place in history.
To incorporate this discovery into the historical narrative, a number of steps would need to be taken. First, historians would need to carefully review and evaluate the new evidence to determine its authenticity and relevance. This process might involve conducting additional research and analysis, as well as seeking the input of other experts in the field. If the evidence is found to be credible, it would then be necessary to incorporate it into the existing historical narrative.
This process would involve revising existing interpretations of the event and developing new theories to explain the new information. This might involve reexamining other historical events and sources in light of the new discovery, as well as considering how the discovery changes our understanding of the broader historical context. In addition, it would be important to communicate the new discovery to the public and other stakeholders, such as educators and policymakers, to ensure that the revised historical narrative is widely understood and accepted.
Overall, a new discovery that completely changes our understanding of a major historical event would have significant implications for our understanding of the past. By carefully evaluating and incorporating this new information into the historical narrative, we can ensure that our understanding of the past is accurate and complete.

Critical evaluation of the answer:
The response to the question is a well-written and detailed discussion of the potential impact of a new discovery that completely changes our understanding of a major historical event. The response provides a clear explanation of how such a discovery would challenge existing beliefs and interpretations of the past, and offers a thoughtful analysis of the steps that would need to be taken to incorporate the discovery into the historical narrative.
One strength of the response is its use of a specific example to illustrate the potential impact of a new discovery. By discussing the hypothetical discovery of new evidence about the assassination of President John F. Kennedy, the response provides a concrete illustration of how a new discovery might challenge existing historical narratives and require historians to revise their interpretations of the past.
Another strength of the response is its thoroughness. The response covers all of the key steps that would need to be taken to incorporate a new discovery into the historical narrative, from carefully evaluating the evidence to revising existing interpretations and communicating the new information to the broader public.
One potential weakness of the response is that it does not discuss the potential challenges and difficulties that might arise in the process of incorporating a new discovery into the historical narrative. For example, the response does not mention the potential for resistance or skepticism from other historians or members of the public who may be attached to the existing historical narrative. Additionally, the response does not address the potential impact of the new discovery on broader issues, such as the teaching of history or public policy.
To improve the response, one suggestion would be to provide more detail and analysis on the potential challenges and difficulties that might arise in the process of incorporating a new discovery into the historical narrative. This could include discussing the potential for resistance or skepticism from other historians or members of the public, as well as the potential impact of the new discovery on broader issues such as the teaching of history or public policy. Additionally, the response could be strengthened by providing more concrete examples of how a new discovery might challenge existing historical narratives and require historians to revise their interpretations of the past.

Table 4: The example of a critical-thinking question generated by ChatGPT and its underlying reasoning for the discipline of Marketing. The response to the question posed as generated by ChatGPT, together with its own critical evaluation of its answer with suggestions for improvement.

Question: "Imagine you are a marketing manager at a company that sells organic skincare products. A competing company has just released a new line of synthetic skincare products that are significantly cheaper and more widely available in stores. Your target audience is primarily health-conscious millennial women. How would you develop a marketing strategy to differentiate your organic products and convince consumers to choose your brand over the cheaper synthetic options?"

Rationale: This is a suitable question because it presents an actual scenario that marketing managers may face in the real world and requires the students to think critically about how to approach the problem. It also requires the students to consider the target audience and their motivations, as well as the strengths and weaknesses of both the organic and synthetic products. The question also allows for multiple potential solutions, encouraging the students to consider a variety of strategies and consider the potential outcomes of each option. Overall, this question challenges the students to apply their knowledge and skills to a realistic and complex problem in the field of marketing.

Answer: As a marketing manager at a company that sells organic skincare products, facing competition from a cheaper and more widely available synthetic option can be a daunting challenge. However, with a well-planned and targeted marketing strategy, it is possible to differentiate our organic products and convince consumers to choose our brand over the synthetic options.
One potential approach could be to focus on highlighting the benefits of using organic products. Many consumers, particularly millennial women, are increasingly concerned about the impact of their purchasing decisions on the environment and their own health. By emphasizing the sustainability and natural ingredients of our products, we can appeal to this group's values and differentiate ourselves from the synthetic competition. For example, we could use social media campaigns and influencer partnerships to showcase the benefits of using organic products and the negative consequences of synthetic ingredients on the environment and personal health.
In addition, we can leverage our unique selling points (USPs) to further differentiate ourselves from the competition. For example, if our products are made with rare or hard-to-find organic ingredients, or if we offer a wider range of products than the synthetic brand, we can emphasize these points in our marketing efforts. By focusing on what makes our brand stand out, we can convince consumers that our products are worth the extra cost.
Another strategy could be to target a niche market within the larger group of health-conscious millennial women. For example, we could focus on pregnant women or women with sensitive skin, who may be particularly concerned about the ingredients in their skincare products. By tailoring our marketing efforts to a specific subset of the target audience, we can effectively communicate the benefits of our products to a more targeted group of consumers.
Finally, it may also be beneficial to address consumer concerns about the cost of our products head-on. One way to do this could be through the use of loyalty programs or special offers, such as discounts for first-time buyers or bundle deals. By offering value to our customers, we can help offset the perceived higher cost of our products and make them more appealing to price-sensitive consumers.
Overall, there are many potential approaches to differentiating our organic skincare products and convincing consumers to choose our brand over the cheaper synthetic options. By highlighting the benefits of using organic products, leveraging our USPs, targeting a niche market, and addressing cost concerns, we can effectively communicate the value of our products and convince consumers to make the switch.

Critical evaluation of the answer:
The above response to the question presents several potential strategies for differentiating organic skincare products and convincing consumers to choose them over cheaper synthetic options. One strength of the response is that it considers the motivations and values of the target audience, specifically health-conscious millennial women, and suggests targeting marketing efforts towards those values. It also highlights the importance of leveraging unique selling points and addressing cost concerns as ways to differentiate the organic products.
However, there are a few weaknesses in the response that could be addressed for improvement. One weakness is that the response does not consider the potential challenges or drawbacks of implementing the suggested strategies. For example, targeting a niche market may be more effective in the short term, but it may also limit the brand's long-term growth potential. Additionally, offering discounts or loyalty programs may help attract price-sensitive consumers, but it could also erode the perceived value of the products.
Another weakness is that the response does not consider the potential competition from the synthetic brand. For example, the synthetic brand may also target health-conscious consumers or offer loyalty programs of their own. By considering the competitive landscape, the marketing manager can better anticipate the actions of the competition and adjust their strategy accordingly.
Overall, the response presents several potential strategies for differentiating organic skincare products and convincing consumers to choose them over cheaper synthetic options. However, to improve the response, it would be helpful to consider the potential challenges and drawbacks of each strategy and the competitive landscape.

Accuracy: In order to fully consider the accuracy of the responses to the questions, evaluations from subject experts in each of the four disciplines would need to be sought. The author can attest to the accuracy of the question, the response, and the subsequent critique with respect to Machine Learning, where the concept of overfitting is well described and examples of techniques that can be used to address it are accurately provided.
It is beyond the scope of this study to draw in subject experts from Marketing, Education (specializing in the U.S. context), and History to assess these responses for accuracy. Returning to the Machine Learning question, the posed question is nearly identical to a question that the author has used in Data Science courses, albeit with a different scenario. The justification for generating this question is also correct. The critique of the generated response is astute and, if integrated into an actual exam response, would carry full marks, which actual students rarely achieve in the experience of the author.
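
As an illustrative aside (not part of the generated answer or of the study's materials), the overfitting diagnosis described above can be reproduced in a few lines of Python with scikit-learn, showing the gap between training and unseen-data accuracy and how constraining model complexity, a simple form of regularization, narrows it; the dataset and model choices below are arbitrary.

# Illustrative sketch only: reproducing the train/unseen-data accuracy gap
# attributed to overfitting in the generated answer, on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set: training accuracy is
# (near) perfect while accuracy on held-out data is noticeably lower.
overfit = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("unconstrained:", overfit.score(X_train, y_train), overfit.score(X_test, y_test))

# Constraining model complexity (here via max_depth, a simple analogue of the
# regularization suggested in the answer) typically narrows that gap.
constrained = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("constrained: ", constrained.score(X_train, y_train), constrained.score(X_test, y_test))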

Precision: The responses to the questions are specific as well as detailed. In the context of the Machine Learning response, specific examples of techniques that can be used to troubleshoot the issue were provided. The responses also clearly distinguish between different potential causes of the discrepancy. In the context of Education, the U.S. Department of Education was drawn into the response, as was the Clayton Christensen Institute. For History, the example of the assassination of President John F. Kennedy was discussed, while the Marketing response identified specific target groups. The precision in responses is also demonstrated across all critical evaluations, where specific and detailed points are provided.

Relevance: In general, for each set of requests for each discipline, the responses
provided to the requests to generate an initial exam question, followed by an answer and
subsequently a critical evaluation of the answer, are all demonstrably relevant to the
prompts. All responses were on-topic and relevant to both the subject matter concerning
each discipline and to the intent of the requests, which required the generation of a
difficult question involving a hypothetical scenario, followed by an actual answer, and
then a critical analysis of the answer.

Depth: ChatGPT has demonstrated a noteworthy level of depth in terms of the complexity of the questions formulated, as well as the rationales offered to support them. The answers were constrained to 500 words, and within that constraint, well-thought-out strategies and examples were provided for all disciplines. Meanwhile, the critiques of the responses were substantial, with strengths and weaknesses, together with suggested improvements, offered as requested, demonstrating the AI agent's thorough and in-depth understanding of each of the topics.

Breadth: Again, given the constraints within which the responses needed to be given, the answers provided explanations of two scenarios in each case, while the suggestions for improvement in each case offered further examples of the agent's capacity for breadth.

Logic: All responses follow logical and consistent reasoning, providing specific examples and explanations. The logical coherence is particularly evident in the rationales given to justify the generation of each question, as well as in the answers to those questions, where different scenarios were explained and well organized.

Persuasiveness: While subject experts may offer additional points of critique to the answers offered by the AI agent, a remarkable feature of the responses is the confidence with which they are expressed. Arguments and evidence are presented in a clear and logical manner, and efforts to address potential counterarguments or objections are mentioned. While the persuasiveness of the responses will ultimately vary depending on the perspective of the reader, it is self-evident that they are, rightly or wrongly, expressed without reservations.

Originality: The AI agent has been trained on a large dataset of human-generated text that contains a vast amount of accumulated knowledge, and by design, it generates responses based on the patterns and conventions found in that dataset. It is therefore hard to conceive that the responses can be original if they are correct. However, since accuracy is a known weak point of large language models, it is conceivable that some connections between topics and events in the dataset corpus made during the training of the models are incorrect (referred to as "hallucinations"), in which case some responses could be original if they are incorrect. In this scenario, it is probable that the responses present information and insights that are based on established knowledge and practices, and while the responses may not necessarily be original or novel in the sense of presenting completely new ideas, they do provide seemingly useful information and insights that are sufficient for answering undergraduate examinations.

5 Discussion
It is clear from the experimental evidence presented in this paper that AI technologies have reached exceptional levels and are now capable of critical thinking rather than just information retrieval. The generated responses can be assessed as clear in exposition, precise with respect to the examples used, relevant to the requests, sufficiently deep and broad given the constraints imposed, and logically coherent in longer texts. These advances in AI technology are not just incremental improvements, but can arguably be regarded as the most significant technological leap in the last 25 years.
One of the most impressive capabilities of ChatGPT is its ability to reason critically, as well as express thoughts and ideas in flawless prose. This technology has demonstrated exceptional competency in these areas, seemingly matching the capabilities of humans. Paul [20] states that a person is a critical thinker to the extent that they regularly learn and critique their thinking in order to improve it. The AI agent's ability to critique its own responses, with clear discussions of their merits and weaknesses and suggested improvements, demonstrates that the agent possesses this capability. The AI agent has exceeded the mere memorization and information retrieval capabilities afforded by current web search engines, and is instead now able to conceptualize and demonstrate higher-order thinking.
The full impact of these advances on society will become apparent over time, but the consequences for the education sector are profound, and particularly relevant and immediate to online examination integrity. There is a risk that students could potentially use ChatGPT to cheat on exams, as the technology is able to generate responses that are indistinguishable from those of a human. Experimental findings in this study support this concern. This raises serious questions about the reliability and validity of online exams going forward and the potential for cheating to occur.
Overall, ChatGPT's competency in thinking and reasoning critically, as well as expressing thoughts and ideas in quality prose, presents both opportunities and challenges. While it has the potential to facilitate the exchange of ideas, increase productivity and support learning, it also poses challenges for the education sector. This technology will require careful consideration and oversight, as well as consensus on what constitutes its ethical and responsible use.

5.1 Recommendations for mitigating strategies


Finally, when ChatGPT was posed the question: "How would you suggest that students
are prevented from using ChatGPT from cheating in online university exams?", the
following response was given:


"There are a few steps that universities could take in order to prevent students
from using ChatGPT or any other AI tool to cheat on online exams.
Firstly, universities could implement measures to prevent students from using
ChatGPT or any similar tool during the exam. For example, they could use
proctoring software to monitor students’ screens and ensure that they are not
using any unauthorized tools or resources during the exam. They could also
use secure browsers or other technologies to prevent students from accessing
external websites or applications during the exam.
Secondly, universities could also implement measures to detect and deter
students who attempt to use ChatGPT or any other AI tool to cheat on the
exam. For example, they could use plagiarism detection software to identify
and flag any exam answers that are suspiciously similar to answers generated
by ChatGPT or any other AI tool. They could also use machine learning
algorithms to analyze the exam answers and identify any patterns or anomalies
that may indicate the use of unauthorized tools or resources.
Thirdly, universities could also educate students about the ethical implications
of using ChatGPT or any other AI tool to cheat on exams. They could
provide students with information about the university’s academic integrity
policies, and could remind them of the consequences of using unauthorized
tools or resources during the exam. This could help to prevent students from
attempting to use ChatGPT or any similar tool to cheat on the exam, and
could encourage them to approach the exam honestly and ethically.
In addition to the steps mentioned above, universities could also consider using
AI to help prevent students from using ChatGPT or any other AI tool to cheat
on online exams. For example, they could use machine learning algorithms
to analyze the exam answers and identify any patterns or anomalies that
may indicate the use of unauthorized tools or resources. They could also use
natural language processing (NLP) algorithms to detect any answers that are
suspiciously similar to answers generated by ChatGPT or any other AI tool.
By using AI in this way, universities could improve their ability to detect and
deter students who attempt to use ChatGPT or any other AI tool to cheat
on the exam. This could help to ensure the fairness, validity, and integrity of
the examination process, and could prevent students from gaining an unfair
advantage over their peers.
Additionally, universities could also consider using AI to help develop more
secure and robust online exam platforms. For example, they could use AI
to improve the security of the exam platform, and to prevent students from
accessing unauthorized tools or resources during the exam. They could also
use AI to improve the user experience of the exam platform, and to make it
easier for students to take the exam from any location.
Overall, there are several ways in which universities could use AI to help
prevent students from using ChatGPT or any other AI tool to cheat on online
exams. By using AI to improve the security and reliability of the exam
platform, and to detect and deter cheating, universities can help to ensure the
fairness, validity, and integrity of their examination processes."

Perhaps some small signs of weakness in ChatGPT's responses are found in the above, though in fairness these are possibly due more to the difficulty of devising a solution to the stated problem than to its capabilities. Firstly, the suggestion of using plagiarism
detection software is unlikely to be effective, as ChatGPT is able to generate unique
answers given the same prompts - this is particularly the case with small perturbations
to the inputs. This means that the answers generated by ChatGPT may not be similar
to any other answers, making it difficult for plagiarism software to identify them as
suspicious.
Secondly, the suggestion of using machine learning and natural language processing (NLP)
algorithms to detect suspicious answers is in line with what some HEIs already employ [14],
but may not be realistic where such solutions have to be developed in-house: these
technologies require significant resources and expertise to implement and maintain, and
are likely to be unaffordable for many institutions. There are, however, some indications
that existing GPT-text output detectors (e.g. https://huggingface.co/openai-detector/)
have the potential to identify AI-generated text through an underlying statistical
signature, although these tools require further research.
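As a rough illustration of how such a detector might be applied to exam scripts, the
following sketch assumes the publicly released RoBERTa-based GPT-2 output detector
checkpoint on Hugging Face (roberta-base-openai-detector); the example answer and the
label names shown are assumptions for illustration only, and detector output would only
ever be grounds for manual review rather than proof of misconduct.

# Minimal sketch, not a production tool: screening an exam answer with a publicly
# available GPT-output detector. Assumes the "roberta-base-openai-detector" checkpoint
# (a RoBERTa classifier fine-tuned on GPT-2 generations) can be downloaded at run time.
from transformers import pipeline

detector = pipeline("text-classification", model="roberta-base-openai-detector")

# Placeholder exam answer standing in for a real submission.
answer = ("Lifestyle branding creates an emotional connection with consumers "
          "by presenting products as a way to enhance a desired image ...")

result = detector(answer)[0]
# Example output: {'label': 'Fake', 'score': 0.98}. The label semantics should be
# confirmed against the detector's model card; a high machine-generated score is
# only a weak signal that warrants human review.
print(result)
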
The suggestion of using AI to improve the security and reliability of the online exam
platform and to detect and deter cheating may likewise prove ineffective. While AI can be
a useful tool in these areas, it is not a panacea and may not fully address cheating with
ChatGPT or any other AI tool. Unfortunately, it does not even appear effective to ask
ChatGPT whether it generated a specific piece of text in order to catch cheating, since
preliminary attempts with this strategy showed that it does not retain records of the
text it has generated in previous sessions.
Thirdly, the suggestion of educating students about the ethical implications of
using ChatGPT or any other AI tool to cheat on exams is unlikely to be effective in
preventing cheating. While education is an important part of promoting academic
integrity, it may not be sufficient on its own to deter students who are determined to
cheat, and such initiatives have already been shown to only be marginally effective [10].
One well-known limitation of ChatGPT is that its input is uni-modal: it can only accept
text as input. Therefore, if online examinations are to be conducted without effective
proctoring software, they would need to pose questions using more than just text. The
following strategies can be considered:
• Use multi-modal channels for exam questions: Embedding images in exam
questions can make it more difficult for students to cheat and for ChatGPT to
generate accurate responses, as the technology relies on text input only.
• Experiment with pre-recorded video questions that combine spoken questions
with images: This adds a further layer of difficulty for students attempting
to cheat and makes it more challenging for ChatGPT to generate accurate
responses.
• GPT output detection: Check responses against GPT language detector models
available at online portals such as https://huggingface.co/openai-detector/.
• Return to oral exams: Require students to demonstrate their knowledge
verbally in real time, either online or on campus.
It is only a matter of time before large language models evolve into more general
AI agents able to accept multiple input channels, including images, video, and audio
(GPT-4 is under development, and multi-modal input capability is reportedly being
researched). For now, exploiting the limitations of the technology is the only way
to stay ahead, while carefully evaluating the effectiveness of different strategies and
continually adapting and refining them as needed.

5.2 Limitations
This is a preliminary investigation into ChatGPT's capability to answer critical
thinking questions in online settings. As such, further improvements could be made; for
instance, independent subject experts from Education, History, and Marketing could be
engaged to evaluate the responses, and future studies could instead use subject experts
together with previous exam questions from a range of courses. Nonetheless, this study
has also demonstrated that ChatGPT is capable of generating effective exam questions.

6 Conclusion
This study has investigated the capability of a recently released AI agent, ChatGPT,
to perform high-order thinking tasks and to generate text that is indistinguishable
from that of humans, and which could be used as a tool for academic dishonesty in
online examinations. The AI agent was prompted to generate questions and to provide a
rationale for each, followed by an answer as well as a critique.
The study has found that the emergence of technologies like ChatGPT presents
a significant threat to the integrity of online exams, particularly in the context of
tertiary education where online exams are becoming increasingly common. These
models demonstrate a high degree of critical thinking and are able to generate highly
realistic text with little input, making it possible for students to cheat on exams.
A return to invigilated and oral exams and the use of advanced proctoring tools
may be effective in combating this threat, though they are not a perfect solution. New
AI and machine learning tools capable of detecting text generated by ChatGPT-like
models also need to be researched. Further research is likewise required to fully
understand the implications of these large language models and to develop strategies
for addressing the potential for cheating with them. In the meantime, it is important
for educators and institutions to be aware of the potential of this tool to facilitate
cheating and to take steps to combat it, in order to maintain the integrity of online
exams and ensure fair and valid assessments for all students.

References
[1] OpenAI. URL https://openai.com/.

[2] Case 1:21-cv-00500-KDB, Document 37, Aaron M. Ogletree, Plaintiff v. Cleveland
State University, Defendant. URL https://law.justia.com/cases/federal/district-courts/ohio/ohndce/1:2021cv00500/275187/37/.
Accessed on December 16, 2022.

[3] K. Ahsan, S. Akbar, and B. Kam. Contract cheating in higher education: a
systematic literature review and future research agenda. Assessment & Evaluation
in Higher Education, pages 1–17, 2021.

[4] H. M. Alessio, N. Malay, K. Maurer, A. J. Bailer, and B. Rubin. Examining the
effect of proctoring on online test scores. Online Learning, 21(1):146–161, 2017.

[5] P. Alin, A. Arendt, and S. Gurell. Addressing cheating in virtual proctored
examinations: toward a framework of relevant mitigation strategies. Assessment
& Evaluation in Higher Education, pages 1–14, 2022.

[6] I. J. Arnold. Cheating at online formative tests: Does it pay off? The Internet
and Higher Education, 29:98–106, 2016.

[7] M. Barber, L. Bird, J. Fleming, E. Titterington-Giles, E. Edwards, and C. Leyland.
Gravity assist: Propelling higher education towards a brighter future – Office for
Students, 2021. URL https://www.officeforstudents.org.uk/publications/gravity-assist-propelling-higher-education-towards-a-brighter-future/.

[8] K. Butler-Henderson and J. Crawford. A systematic review of online examinations:
A pedagogical innovation for scalable authentication and integrity. Computers &
Education, 159:104024, 2020.

[9] S. Coghlan, T. Miller, and J. Paterson. Good proctor or “big brother”? Ethics of
online exam supervision technologies. Philosophy & Technology, 34(4):1581–1606,
2021.

[10] H. Corrigan-Gibbs, N. Gupta, C. Northcutt, E. Cutrell, and W. Thies. Deterring
cheating in online environments. ACM Transactions on Computer-Human Interaction,
22(6), September 2015. ISSN 1073-0516. doi: 10.1145/2810239. URL
https://doi.org/10.1145/2810239.

[11] C. Crook and E. Nixon. How internet essay mill websites portray the student
experience of higher education. The Internet and Higher Education, 48:100775,
2021.

[12] D. Dumulescu and A. I. Muţiu. Academic leadership in the time of COVID-
19—experiences and perspectives. Frontiers in Psychology, 12:648344, 2021.

[13] A. Fask, F. Englander, and Z. Wang. Do online exams facilitate cheating? An
experiment designed to separate possible cheating from the effect of the online
test taking environment. Journal of Academic Ethics, 12(2):101–112, 2014.

[14] M. Garg and A. Goel. A systematic literature review on online assessment security:
Current challenges and integrity strategies. Computers & Security, 113:102544,
2022.

[15] M. Henderson, J. Chung, R. Awdry, M. Mundy, M. Bryant, C. Ashford, and
K. Ryan. Factors associated with online examination cheating. Assessment &
Evaluation in Higher Education, pages 1–15, 2022.

[16] S. Khan, M. E. K. Kambris, and H. Alfalahi. Perspectives of university students
and faculty on remote education experiences during COVID-19 – a qualitative study.
Education and Information Technologies, 27(3):4141–4169, 2022.

[17] J. H. L. Koh and B. K. Daniel. Shifting online during COVID-19: A systematic
review of teaching and learning strategies and their outcomes. International
Journal of Educational Technology in Higher Education, 19(1):1–23, 2022.

[18] F. Noorbehbahani, A. Mohammadi, and M. Aminazadeh. A systematic review of
research on cheating in online exams from 2010 to 2021. Education and Information
Technologies, pages 1–48, 2022.

[19] OpenAI. ChatGPT: A large-scale generative pre-training model for conversation,
2020. URL https://openai.com/blog/chatgpt/.

[20] R. Paul. The state of critical thinking today. New Directions for Community
Colleges, 2005(130):27–38, 2005.

[21] R. Paul and L. Elder. Critical thinking. Sonoma State University, Rohnert Park,
CA, 1990.

[22] K. F. Stanger-Hall. Multiple-choice exams: an obstacle for higher-level thinking in
introductory science classes. CBE—Life Sciences Education, 11(3):294–306, 2012.

[23] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser,
and I. Polosukhin. Attention is all you need. Advances in Neural Information
Processing Systems, pages 5998–6008, 2017.

[24] D. J. Watters and J. J. Watters. Approaches to learning by students in the
biological sciences: Implications for teaching. International Journal of Science
Education, 29(1):19–43, 2007.

[25] B. L. Whisenhunt, C. L. Cathey, D. L. Hudson, and L. M. Needy. Maximizing
learning while minimizing cheating: New evidence and advice for online multiple-
choice exams. Scholarship of Teaching and Learning in Psychology, 2022.

[26] A. Y. Zheng, J. K. Lawhorn, T. Lumley, and S. Freeman. Application of Bloom’s
taxonomy debunks the “MCAT myth”. Science, 319(5862):414–415, 2008.

A Examples of Multiple-choice questions and answers generated by ChatGPT
The following are responses to the requests to generate examples of multiple-choice
questions across four disciplines considered in this study, and to provide answers with
associated explanations.

A.1 Machine Learning


Question: Which of the following is NOT a disadvantage of using a support vector
machine (SVM) model for classification tasks?
• A) SVMs can be sensitive to the choice of kernel and hyperparameters, which
can be difficult to select.
• B) SVMs can have poor generalization performance when the data is not linearly
separable.
• C) SVMs are not sensitive to the scaling of the input features.
• D) SVMs can be slow to train on large datasets.
• E) SVMs can be prone to overfitting when the number of features is much greater
than the number of training examples.
Answer: C) SVMs are not sensitive to the scaling of the input features.

Explanation: SVMs are a type of linear classifier that seeks to find the hyperplane
in feature space that maximally separates the classes. One disadvantage of SVMs is
that they can be sensitive to the choice of kernel and hyperparameters, which can be
difficult to select and can impact the model’s performance. SVMs can also have poor
generalization performance when the data is not linearly separable, and they can be
slow to train on large datasets. In addition, SVMs can be prone to overfitting when the
number of features is much greater than the number of training examples. However,
SVMs are not sensitive to the scaling of the input features, meaning that they can
perform well even if the features are on different scales.

A.2 Education
Question: Which of the following theories best explains the process of learning
through observation and imitation?
• A) Piaget’s cognitive development theory
• B) Vygotsky’s sociocultural theory
• C) Bandura’s social learning theory
• D) Bloom’s taxonomy of educational objectives
Answer: C) Bandura’s social learning theory

Explanation: Bandura’s social learning theory explains the process of learning
through observation and imitation by stating that individuals learn by watching and
mimicking the actions of others. This is known as vicarious learning or modeling.
According to the theory, individuals do not only learn new behaviors through direct
reinforcement, but also through observing and imitating the actions of others. For
example, a child may learn to tie their shoes by watching their parent demonstrate
the action and then practicing it themselves. Piaget’s cognitive development theory
focuses on the child’s mental development and the stages they go through in order to
understand the world around them. Vygotsky’s sociocultural theory emphasizes the
role of cultural and social factors in learning and development. Bloom’s taxonomy
of educational objectives is a classification system that outlines the different levels of
cognitive thinking and learning objectives. None of these theories specifically address
the process of learning through observation and imitation.

A.3 Marketing
Question: In which of the following marketing strategies is the company attempting
to persuade consumers to purchase their product through emotional appeals and
aspirational messaging?
• A) Price skimming
• B) Psychological pricing
• C) Market segmentation
• D) Lifestyle branding
• E) Product positioning
Answer: D) Lifestyle branding

Explanation: Lifestyle branding is a marketing strategy that focuses on creating
an emotional connection with consumers by presenting the company’s products as a
way to enhance the consumer’s desired lifestyle or image. This strategy often involves
aspirational messaging and appeals to the consumer’s emotions in order to persuade
them to purchase the product. The other options do not involve emotional appeals or
aspirational messaging, so they are not the correct answer.

A.4 History
Question: Which of the following best explains the significance of the Indian Mutiny
of 1857 to the British Empire?
• A) It marked the beginning of the end of British colonial rule in India.
• B) It led to the establishment of the British Raj, a period of direct British rule
in India.
• C) It sparked a global debate about the morality of imperialism.
• D) It resulted in the abolition of the East India Company and the transfer of
power to the British Crown.
• E) It contributed to the development of the British industrial revolution.
Answer: D) It resulted in the abolition of the East India Company and the transfer
of power to the British Crown.

Explanation: The Indian Mutiny of 1857, also known as the Sepoy Mutiny, was a
significant event in the history of British India. It was a widespread rebellion against
the British East India Company, which was the main governing body in India at the
time. The mutiny began as a protest against the use of animal fat in the greasing of
rifle cartridges, which was offensive to Hindus and Muslims, but it quickly escalated
into a broader uprising against British rule. The mutiny was eventually suppressed, but
it had far-reaching consequences for the British Empire. One of the most significant
outcomes of the mutiny was the abolition of the East India Company and the transfer
of power to the British Crown. This marked a major shift in the governance of India,
as the British government took direct control of the country and established the British
Raj, a period of direct British rule that lasted until 1947.
