Generation of Assessment Questions from
Textbooks Enriched with Knowledge Models
Lucas Dresscher, Isaac Alpizar Chacon, Sergey Sosnovsky
1a. Introduction
● Assessment questions in digital textbooks:
○ opportunity to practice
○ opportunity to receive feedback
○ opportunity to engage in interactive learning
● Assessment questions in adaptive textbooks:
○ opportunity to assess student knowledge
➢ opportunity to adapt
● Three primary ways to add assessment questions:
○ create them manually
○ integrate external assessment material
○ generate them automatically
1b. Introduction
● AQG: current state-of-the-art
○ High-quality factoid questions
○ Limitations:
■ Simplicity
■ Few generic systems
■ Little adaptivity
● Contributions
○ Unique source type
○ Open domain
○ Variety of questions
○ Supports adaptivity
2. Intextbooks Platform
● Knowledge models
○ Extraction: create semantic model
○ Enrichment: link additional information
○ Serialization: export model
3. Automatic Question Generation System
● Generic semantic rule-based AQG system
○ Domain independent
○ Uses textual and semantic information
○ Can generate questions for a part of a book or a (set of) concept(s)
3A. Source Extraction
● Input
○ Knowledge model of a textbook
● Extraction
○ Relevant sentences
○ Index terms’ enrichments
■ Domain specificity
● Output
○ Initial set of sentences
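Step 3A could be sketched roughly as follows; the `IndexTerm` structure, the `domain_specificity` field, and the threshold value are illustrative assumptions, not the actual Intextbooks model format:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of step 3A: extract candidate sentences for
# sufficiently domain-specific index terms from the knowledge model.
@dataclass
class IndexTerm:
    label: str
    domain_specificity: float  # assumed enrichment score in [0, 1]
    sentences: list = field(default_factory=list)  # sentences about this term

def extract_source_sentences(terms, min_specificity=0.5):
    """Collect (term, sentence) pairs for domain-specific terms."""
    selected = []
    for term in terms:
        if term.domain_specificity >= min_specificity:
            selected.extend((term.label, s) for s in term.sentences)
    return selected

terms = [
    IndexTerm("mean", 0.9, ["The mean is the average."]),
    IndexTerm("page", 0.1, ["See page 12."]),  # too generic, filtered out
]
print(extract_source_sentences(terms))
```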
3B. Preprocessing
● Annotation pipeline: Stanford CoreNLP
○ Parts of Speech (POS)
○ Dependency parsing
● Filtering
○ Incorrect form (questions, imperative sentences, grammatically incomplete, ...)
○ Context reference (needs preceding sentences to make sense)
○ Visual references (figures, tables,...)
○ Numerical examples
○ etc.
● Output
○ Set of grammatically fit, annotated sentences
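The filtering step might look like the sketch below; the real system relies on Stanford CoreNLP annotations, so these plain string heuristics (`VISUAL_REF`, `CONTEXT_REF`, the question check) are simplified stand-ins for the parser-based rules:

```python
import re

# Rough stand-in for the 3B filters; the actual system uses CoreNLP
# POS tags and dependency parses rather than regexes.
VISUAL_REF = re.compile(r"\b(figure|table|fig\.)\b", re.IGNORECASE)
CONTEXT_REF = re.compile(r"^(this|these|it|they|such)\b", re.IGNORECASE)

def keep_sentence(sentence: str) -> bool:
    s = sentence.strip()
    if s.endswith("?"):          # incorrect form: question
        return False
    if VISUAL_REF.search(s):     # visual reference (figures, tables, ...)
        return False
    if CONTEXT_REF.match(s):     # needs preceding context to make sense
        return False
    return True

sentences = [
    "The mean is the average.",
    "What is the mean?",
    "This is shown in Figure 3.",
]
print([s for s in sentences if keep_sentence(s)])
```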
3C. Sentence Selection
● Select most appropriate sentences
● Scoring process
○ Weighted average of individual features
■ Different aspect
■ Feature score f ∈ [0, 1]
■ Weight w
○ Sentence score s ∈ [0, 1]
○ Threshold comparison
● Output
○ Potential source phrases
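The scoring process above can be made concrete with a minimal sketch; the feature names, weights, and threshold value are invented for illustration:

```python
# Sketch of the 3C scoring process: per-feature scores f in [0, 1] are
# combined as a weighted average into a sentence score s in [0, 1],
# which is then compared against a threshold.
def sentence_score(features: dict, weights: dict) -> float:
    """Weighted average of feature scores, itself in [0, 1]."""
    total_w = sum(weights.values())
    return sum(weights[name] * features[name] for name in weights) / total_w

features = {"length": 0.8, "term_density": 0.6, "position": 1.0}  # assumed features
weights  = {"length": 1.0, "term_density": 2.0, "position": 1.0}  # assumed weights

s = sentence_score(features, weights)
THRESHOLD = 0.6  # assumed threshold
print(round(s, 2), s >= THRESHOLD)
```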
3D. Question Type Selection
● Determine generatable question types
● Question types
○ True-false (unmodified): The mean is the average.
○ True-false (negated): The mean is not the average.
○ True-false (substituted): The median is the average.
○ Cloze: The ______ is the average.
○ Multiple-choice: The ______ is the average.
■ A. Median B. Mean C. Mode D. Variance
● Output
○ Definitive source sentences
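One way to picture the type-selection step; the concrete conditions below are assumptions for illustration, not the paper's actual rules:

```python
# Illustrative rule set for step 3D (invented conditions): which question
# types a sentence supports depends on its target concept and related terms.
def generatable_types(sentence, target, related_terms):
    types = ["TFU", "TFN"]          # assumed always possible for a declarative sentence
    if target in sentence:
        types.append("CQ")          # a gap can replace the target concept
        if related_terms:
            types.append("TFS")     # substitution needs a related term
        if len(related_terms) >= 3:
            types.append("MCQ")     # MCQ needs enough distractor candidates
    return types

print(generatable_types("The mean is the average.", "mean",
                        ["median", "mode", "variance"]))
```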
3E. Question Construction
● Create questions in surface form
● TFU
○ Directly use sentence
○ Answer: true
○ Example: The mean is the average. → The mean is the average.
● TFN
○ Negate question stem
○ Answer: false
○ Example: The mean is the average. → The mean is not the average.
● TFS
○ Substitute target concept
○ Answer: false
○ Example: The mean is the average. → The median is the average.
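A toy sketch of the three true-false variants; the negation rule here (inserting "not" after "is") stands in for the real grammar-based transformation:

```python
# Sketch of step 3E for the true-false variants; each helper returns
# a (question stem, correct answer) pair.
def make_tfu(sentence):
    return sentence, True                             # unmodified, answer: true

def make_tfn(sentence):
    # Toy negation rule; the actual system negates via the parse.
    return sentence.replace(" is ", " is not ", 1), False

def make_tfs(sentence, target, substitute):
    # Substitute the target concept with a related term.
    return sentence.replace(target, substitute, 1), False

src = "The mean is the average."
print(make_tfu(src))
print(make_tfn(src))
print(make_tfs(src, "mean", "median"))
```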
3E. Question Construction
● CQ
○ Replace target concept by gap
○ Gap selection
○ Example: The mean is the average. → The ______ is the average.
● MCQ
○ CQ with options
■ Key
■ Distractors
○ Distractor generation
■ Related elements
■ Scoring procedure
○ Example: The mean is the average. → The ______ is the average.
■ A. Median B. Mean C. Mode D. Variance
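A minimal sketch of cloze and multiple-choice construction; reducing distractor generation to picking the top-k of pre-scored related terms is a simplification of the scoring procedure named above:

```python
import random

# Sketch of step 3E for CQ and MCQ: replace the target concept with a
# gap, then (for MCQ) add the key plus the best-scored related terms.
def make_cloze(sentence, target, gap="______"):
    return sentence.replace(target, gap, 1), target   # (stem, key)

def make_mcq(sentence, target, scored_related, k=3):
    stem, key = make_cloze(sentence, target)
    # Simplified distractor "scoring": take the k highest-scored terms.
    distractors = [t for t, _ in sorted(scored_related, key=lambda x: -x[1])][:k]
    options = distractors + [key]
    random.shuffle(options)
    return stem, key, options

related = [("median", 0.9), ("mode", 0.8), ("variance", 0.6), ("page", 0.1)]
stem, key, options = make_mcq("The mean is the average.", "mean", related)
print(stem, key, sorted(options))
```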
4. Evaluation
● Selection procedure
○ Three university-level statistics textbooks
○ Ten randomly selected co-occurring concepts
■ Five for automatic generation
■ Five for manual creation
○ 50 questions in total
● Evaluation approach
○ Expert evaluation
○ Metrics
■ General: wording, assessment value, difficulty
■ Specific: gap quality, distractor quality
4. Evaluation
● Research questions:
1. Is the approach conceptually sound?
2. Is the approach practically sound?
4. Evaluation
● Inter-rater agreement (Fleiss’ Kappa)
○ 0.24 wording
○ 0.27 assessment value
○ -0.02 difficulty
● Comparison between handcrafted and generated questions
(Mann-Whitney U test)
○ Statistically significant difference for the overall assessment value (Handcrafted > Generated)
■ 0.32 (U = 413.5, P = 0.048)
■ No statistically significant difference per question type
○ No significant differences for the overall wording
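For reference, the agreement metric reported above can be computed in a few lines of plain Python; `ratings[i][j]` counts the raters who assigned subject `i` to category `j`:

```python
# Pure-Python Fleiss' kappa, to make the inter-rater agreement metric
# concrete. Assumes the same number of raters for every subject.
def fleiss_kappa(ratings):
    N = len(ratings)             # number of subjects
    n = sum(ratings[0])          # raters per subject
    k = len(ratings[0])          # number of categories
    # Category proportions over all assignments.
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # Per-subject agreement.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N         # mean observed agreement
    P_e = sum(p * p for p in p_j)  # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Perfect agreement: 3 raters, 2 subjects, 2 categories
print(fleiss_kappa([[3, 0], [0, 3]]))
```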
4. Evaluation
● Assessment value needs improvement
● Largest difference: TFUs and MCQs
● TFSs particularly poor
4. Evaluation
● Good overall wording
● Small differences
● TFSs, CQs and MCQs particularly good
4. Evaluation
● Easy to medium
● Small differences
5. Conclusion
● Limitations
○ Nature of educational textbooks
○ Weights of feature sets
● Future work
○ Other domains
○ Additional features
○ Closer Intextbooks integration
Time to Generate Some Questions
4. Evaluation
● Question-type-specific metrics
● Gap quality good
○ No difference
○ Comment: ambiguous question
● Distractor quality mediocre
○ Handcrafted: 2 out of 3 good
○ Generated: 1-2 out of 3 good
○ Comment: unrelated to key


Editor's Notes

  • #2: Welcome, glad you’re all here; an advantage of a digital version is that we don’t have to worry about traffic jams all around Utrecht. Excited to talk about the research of the past 7-8 months on automated generation of assessment questions from textbook models. First I introduce the research area: why, the problem it aims to resolve, and the current state of the art. Then a quick look at the earlier work my research is built upon, before moving on to the actual AQG system I researched. I finalize with results from a pilot evaluation and some concluding words.
  • #3: Motivation for this research area begins with the general value of assessment questions. They improve the learning process: interactivity, reinforcing learning by repeating concepts, and providing evidence of a student’s knowledge. MOOC platforms, like Coursera or Duolingo, have rapidly grown in popularity, further accelerated by the COVID-19 pandemic. With millions of users it is no longer feasible to develop assessment questions manually at such a large scale → so it needs a solution: new techniques and improvements from other research areas, e.g. neural networks.
  • #6: Current state-of-the-art Capable of generating high-quality factoid questions using different types of systems and generation sources Limitations General shortcoming is the simplicity of the questions, targeting only lower cognitive levels Research commonly focuses on just a single question type with systems specifically designed for this type, sometimes single domain Little adaptivity in question scope and difficulty
  • #7: Most important for this research is the extracted domain knowledge, obtained by processing the index at the end of the textbook. Individual index terms are identified. Using the page numbers, each term is recognized in the sentences that are about it, which are then extracted. Terms are connected to DBpedia resources, adding semantic information: abstracts, categories, other concepts to which the term is related, and domain specificity (the relationship the index term has with the domain of the textbook).
  • #9: The system operates in five major steps
  • #10: Output: initial set of relevant sentences
  • #11: Incorrect form: questions, imperative sentences, grammatically incorrect sentences
  • #13: Which QT can be created is mainly determined by looking at the structure of the sentence, its target concept and relations with other related terms
  • #15: A challenging task, because distractors need to be semantically similar but must not be plausible answers themselves. Follows the same scoring procedure as sentence selection (with a feature set that, combined, computes the quality of the distractor). The substituted true-false question is selected in the same way, but requires just a single related element.
  • #16: 1 question of each of the 5 types was created for every concept, resulting in 50 questions in total (25 handcrafted, 25 generated)
  • #17: For CQs and MCQs, they also looked at: how well is the gap chosen? How well are the distractors selected?
  • #19: Low, but expected, as difficulty is hard to estimate objectively → it is usually calibrated based on data produced by real test takers.
  • #20: Compare evaluations of handcrafted and generated questions
  • #23: Compare evaluations of handcrafted and generated questions