Quick Start Tutorial of KH Coder 2: Quantitative Content Analysis or Text Min...
khcoder
This document is a quick start tutorial for KH Coder, free software for quantitative content analysis and text mining, applied here to English-language data. It outlines the steps for configuring KH Coder for English, preparing a project that uses an English novel as sample text, running pre-processing, and analyzing word frequencies. It also demonstrates methods for exploring word co-occurrences, identifying the distinctive words of each chapter, using coding rules to count concepts, and cross-tabulating the resulting codes. The goal is to analyze the themes and characteristics of each chapter in the novel.
3. Preface
This presentation is a tutorial on how to use KH Coder.
KH Coder is free software for quantitative content analysis or
text mining. It is also used in computational linguistics.
Details and downloads: http://khcoder.net/en
Introduction
4. Table of Contents
Introduction
  Data
  Purpose of Analysis
Preparation
  Install KH Coder
  Configure Stopwords
  Create a Project and Run Pre-Processing
Step 1
  Word Frequency List
  The Context Where a Word is Used
  Co-occurrence Network of Words
  Correspondence Analysis of Words
  Closing Remarks for Step 1
Step 2
  Use Coding Rules to Count Concepts
  Retrieve Documents Assigned a Specific Code
  Characters in Each Chapter
  Characters and Verbs
  Change of Words Co-occurring with Marilla
  Conclusions
Introduction
5. Data
We are going to analyze the novel Anne of Green Gables by Montgomery.
When you prepare your own data for analysis, please open the attached
“Anne.xls” file in the “tutorial_en” folder and see the figure below.
(1) Enter column names in the first row
(2) Enter actual data in the second and subsequent rows
(*) Enter data in the first sheet if you use Excel or Calc
Introduction
6. Purpose of Analysis
To confirm whether the quantitative analysis can
also illustrate the centrality of Marilla
It has been pointed out that the heroine Anne’s foster mother Marilla
plays an essential role in the novel and that Marilla is more central
than Anne's best friend Diana, and Gilbert with whom Anne has a
faint romance.
To demonstrate a quantitative content analysis
approach that comprises the following two steps:
[Step 1] Extract words automatically from data and statistically
analyze them to obtain a whole picture and explore the features of
the data while avoiding the prejudices of the researcher.
[Step 2] Specify coding rules, such as "if there is a particular
expression, we regard it as an appearance of the concept A", and
extract concepts from the data. Then, statistically analyze the
concepts to deepen the analysis.
Introduction
8. Install KH Coder
Preparation
(1) Double click the downloaded file
(2) Click
(3) Click
Now you are ready.
The number of unzipped files may vary between versions.
9. Interface Language
Preparation
(1) Double click the shortcut on
your desktop to start KH Coder
In case the menu is not displayed in your
favorite language, please select it here
We call this a “menu”.
Interface translation is not yet complete.
If you find a typo or if you have a suggestion, post it here:
https://github.com/ko-ichi-h/khcoder/issues
10. Configure Stopwords
(1) Go to [Project] [Settings] in the menu of KH Coder
(2) Click
(3) Open the “tutorial_en” folder, drag the file
“stopwords_sample.txt”, and drop it here.
(Alternatively, simply paste the content of the file here.)
(4) Click
Preparation
Black balloons indicate
operations you have to
perform.
11. Notes on Stopwords
You can specify any words as stopwords in KH Coder to exclude
those words from your analysis.
Stopwords will be given the special POS tag “OTHER”.
“OTHER” is NOT checked by default, so words
with the “OTHER” tag are excluded from analyses.
Preparation
Green balloons and
bare texts are notes.
No operation needed.
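The stopword handling described in the notes above can be sketched in a few lines of Python. This is only an illustration of the idea, not KH Coder's actual implementation: the sample stopword set, the placeholder tag "WORD", and the function names are all assumptions made for this example.

```python
# Hypothetical illustration of the stopword notes; not KH Coder's own code.
# Assumed sample list: the real one is in "stopwords_sample.txt".
STOPWORDS = {"the", "a", "of", "to"}

def tag_words(words, stopwords=STOPWORDS):
    """Pair each word with a tag; stopwords get the special tag "OTHER".

    "WORD" is a placeholder tag invented for this sketch.
    """
    return [(w, "OTHER" if w.lower() in stopwords else "WORD") for w in words]

def select_for_analysis(tagged, checked_tags=("WORD",)):
    """Keep only words whose tag is checked; "OTHER" is unchecked by default."""
    return [w for w, tag in tagged if tag in checked_tags]

tagged = tag_words(["the", "girl", "of", "Green", "Gables"])
kept = select_for_analysis(tagged)
print(kept)
```

Passing `checked_tags=("WORD", "OTHER")` would bring the stopwords back into the analysis, which mirrors checking “OTHER” in the settings.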
12. Create a Project & Run Pre-Processing
(1) Go to [Project] [New] in the menu of KH Coder
(2) Click [Browse] and open “anne.xls” in the “tutorial_en” folder
(3) Make sure [text] and [English] are selected
(4) Click
(5) Go to [Pre-Processing] [Run Pre-Processing] in the menu and click [OK]
If you get a “could not find JAVA” error, please install Java.
KH Coder “concentrates” on the task, so it may look frozen
or “not responding”. This is normal while it is busy.
Next time you start KH Coder, go to [Project] [Open] in the
menu and open the project you have created here.
Preparation
14. Word Frequency List (1/2)
Go to [Tools] [Words] [Frequency List (Excel)] in the menu
These numbers are counts of base forms (lemmas)
Step 1
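Counting base forms rather than surface forms can be sketched as follows. The tiny lemma table is a hypothetical stand-in for the base forms that KH Coder obtains during pre-processing, and lowercasing every token is a simplification made for this example.

```python
from collections import Counter

# Hypothetical lemma table for illustration only; KH Coder determines
# base forms during pre-processing.
LEMMAS = {"said": "say", "says": "say", "thought": "think", "girls": "girl"}

def lemma_frequencies(tokens, lemmas=LEMMAS):
    """Count tokens by base form, so "said" and "says" both count as "say"."""
    return Counter(lemmas.get(t.lower(), t.lower()) for t in tokens)

tokens = ["Anne", "said", "Marilla", "says", "Anne", "thought", "girls"]
freq = lemma_frequencies(tokens)
for word, count in freq.most_common(3):
    print(word, count)
```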
15. Word Frequency List (2/2)
After the heroine “ANNE”, the most frequent character name is not
her best friend “Diana” but “MARILLA”.
In the novel, an orphan “girl” or “child” heroine gets adopted, finds a
“home”, and goes to “school”. She also once had an inferiority complex
about her “hair”.
Words    Freq   Words   Freq   Words    Freq
ANNE     1138   little   283   want      149
say       952   girl     267   home      136
MARILLA   849   thing    260   child     134
think     486   tell     252   Barry     132
Diana     414   look     246   school    128
know      364   good     225   sit       126
Matthew   361   feel     215   night     117
just      358   time     208   really    116
come      353   eye      152   hair      114
make      286   Lynde    151   Gilbert   113
Step 1
16. The Context Where a Word is Used
Step 1
(1) Go to [Tools] [Words] [KWIC Concordance] in the menu
(2) Type a word and hit the [Enter] key
(3) Double click a line to view
the whole paragraph
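A KWIC (keyword in context) concordance lists every occurrence of a word together with the words around it. A minimal sketch of the idea, assuming simple whitespace tokenization and a hypothetical function name `kwic`; the sample sentence is the quotation from chapter 6 used later in this tutorial:

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

text = "Marilla looked at Anne and softened at sight of the child's pale face"
lines = kwic(text.split(), "at", width=2)
for left, kw, right in lines:
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>20} [{kw}] {right}")
```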
17. Co-Occurrence Network of Words (1/2)
Step 1
(1) Go to [Tools] [Words] [Co-Occurrence Network] in the menu
(2) Configure as shown in this screen and click [OK]
(3) Select [Subgraph: modularity] here
18. Co-Occurrence Network of Words (2/2)
[Figure: co-occurrence network of words, including the character names ANNE, MARILLA, Diana, Matthew, and Gilbert, with labels (1) to (10)]
“Diana”, “Marilla”, and “Matthew” are closely connected to “Anne”.
“Gilbert” is in a rather remote part of the network and is connected to
“Anne” via “school”.
“Jane”, “Ruby”, “Josie”, and “Stacy” are also connected via “school”.
The figure was retouched with Illustrator.
Step 1
19. Methods for Exploring Co-Occurrences of Words
To explore co-occurrences of words, you can also use:
Hierarchical cluster analysis
Multi-dimensional scaling (MDS)
By interpreting these results, you may find major themes of the text
from groups of words which tend to appear together.
KH Coder uses R as a back end to execute these multivariate methods.
Step 1
[Figures: co-occurrence network, cluster analysis, MDS]
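All of these methods start from a measure of how often two words appear together. The sketch below uses the Jaccard coefficient, a common choice for this purpose; treating it as the measure here is an assumption of this example, as are the function name and the made-up sentences.

```python
from itertools import combinations

def jaccard_cooccurrence(documents):
    """Jaccard coefficient for each word pair that co-occurs at least once.

    Each document (here, a sentence) is treated as a set of words:
    J(a, b) = |docs with a and b| / |docs with a or b|.
    """
    docs_with = {}
    for idx, doc in enumerate(documents):
        for word in set(doc):
            docs_with.setdefault(word, set()).add(idx)
    scores = {}
    for a, b in combinations(sorted(docs_with), 2):
        both = len(docs_with[a] & docs_with[b])
        either = len(docs_with[a] | docs_with[b])
        if both:
            scores[(a, b)] = both / either
    return scores

# Made-up sentences for illustration; not taken from the novel.
sentences = [
    ["anne", "marilla", "look"],
    ["anne", "diana", "school"],
    ["anne", "gilbert", "school"],
    ["marilla", "look"],
]
scores = jaccard_cooccurrence(sentences)
for pair, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(pair, round(score, 2))
```

Pairs with high coefficients would be drawn as edges in a network, or used as similarities for cluster analysis and MDS.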
20. Correspondence Analysis of Words (1/2)
Step 1
(1) Go to [Tools] [Words] [Correspondence Analysis] in the menu
(2) Configure as shown in this screen and click [OK]
(3) Select [grayscale] here
21. Correspondence Analysis of Words (2/2)
Step 1
[Figure: correspondence analysis plot of words and the four parts (01−07, 08−19, 20−28, 29−38). Axes: Dimension 1 (0.1417, 54.59%) and Dimension 2 (0.07, 26.97%). Frequency legend: 300, 600, 900.]
In the beginning [01-07], the “child” Anne was allowed to “stay” in
“Cuthbert’s house”.
Then in [08-19], she met a neighbor girl “Diana” and started going to
“school”. At the school, she met “Gilbert”.
In the latter half of the novel, Anne and Diana went separate ways,
and Anne's schoolmates, such as “Josie”, “Jane”, and “Ruby”, became
characteristic. Anne also learned a lot from adult women such as
Mrs. “Allan” and Miss “Stacy”.
We can understand the story flow throughout the novel
by checking characteristic words of each part.
22. Characteristic Words of each Part
Step 1
(1) Go to [Tools] [Variables & Headings] in the menu
(2) Click “part”
(3) Select “Sentences”
(4) Select “catalogue: Excel”
The top 10 characteristic words of each part are tabulated. This
can be used as an alternative to correspondence analysis.
23. Closing Remarks for Step 1
Statistical analyses of automatically extracted words
are suitable for gaining a whole picture of the data
Main theme (word frequency list or co-occurrence network)
Relations between characters or words (co-occurrence network)
Story flow (correspondence analysis)
About the centrality of Marilla
Her name is the most frequent after that of the heroine Anne
Her relationship with Anne appears to be almost as strong as Diana’s
She is present throughout all four parts of the story
Step 1
We obtained an overview of the entire data in this step. Next, we
are going to put more focus on Marilla using coding rules.
25. Use Coding Rules to Count Concepts
In some cases, we have to count concepts, not words.
To count concepts, you can compose “coding rules” like this:
*Character_name_Gilbert
Gilbert or Gil
Indicates the name of this code: “Character_name_Gilbert”
Not only the documents containing “Gilbert” but
also those containing “Gil” are assigned this code.
If a document matches multiple coding rules, multiple
codes will be assigned to the document.
Step 2
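The behavior of such coding rules can be sketched in Python. This imitates only the simple “A or B” rule shown above and is not KH Coder's rule parser: the dictionary layout, the function name, and the second code (Character_name_Marilla) are hypothetical additions for illustration.

```python
# Hypothetical rule table: code name -> words, any of which triggers the code.
# The Marilla rule is invented here to show multiple codes on one document.
RULES = {
    "Character_name_Gilbert": {"gilbert", "gil"},
    "Character_name_Marilla": {"marilla"},
}

def assign_codes(document, rules=RULES):
    """Return every code whose rule matches at least one word of the document."""
    words = {w.strip('.,!?"').lower() for w in document.split()}
    return [code for code, terms in rules.items() if words & terms]

# Made-up documents for illustration.
docs = [
    "Gil looked at Anne.",
    "Marilla spoke to Gilbert.",
    "Diana went home.",
]
coded = [assign_codes(d) for d in docs]
for doc, codes in zip(docs, coded):
    print(doc, "->", codes)
```

The second document receives two codes because it matches two rules, and the third receives none.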
26. Retrieve Documents Assigned a Specific Code
Step 2
(1) Go to [Tools] [Documents] [Search Documents] in the menu
(2) Click [Browse] and open “code_1.txt” in the “tutorial_en” folder
(3) Double click any of the codes
(4) Double click a line to view
the whole paragraph
27. Characters in Each Chapter (1/2)
Step 2
(1) Go to [Tools] [Coding] [Crosstab] in the menu
(2) Click [Browse] and open “code_1.txt” in the “tutorial_en” folder
(3) Select [Sentences] and [chapter]
(4) Click
(5) Click
28. Characters in Each Chapter (2/2)
Step 2
[Figure: crosstab plot of the codes ANNE, Marilla, Matthew, Diana, and Gilbert across chapters 01 to 38. Legends: Pearson rsd. (−5.0 to 5.0) and Percent (10, 20).]
Marilla and Anne are present almost everywhere
Although Marilla and Anne were apart in chapter 35, there was an
emotional reunion in the following chapter 36. Anne won a scholarship
and rejoiced saying “Oh, won’t Matthew and Marilla be pleased!”
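The two quantities in the plot legend, the percentage of sentences assigned a code and the Pearson residual, can be computed as in the sketch below. The residual follows the standard definition (observed − expected) / √expected; how KH Coder builds the underlying table is not stated here, so the single-code setup, the function name, and the counts are assumptions for illustration.

```python
from math import sqrt

def crosstab(sentence_counts, code_counts):
    """Percent and Pearson residual of one code for each chapter.

    sentence_counts: chapter -> number of sentences in the chapter
    code_counts:     chapter -> number of sentences assigned the code
    """
    total_sentences = sum(sentence_counts.values())
    total_coded = sum(code_counts.values())
    overall_rate = total_coded / total_sentences
    table = {}
    for chapter, n in sentence_counts.items():
        observed = code_counts.get(chapter, 0)
        # Expected count if the code were spread evenly over all chapters.
        expected = n * overall_rate
        table[chapter] = {
            "percent": 100 * observed / n,
            "residual": (observed - expected) / sqrt(expected),
        }
    return table

# Made-up counts for illustration; not taken from the novel.
sentences = {"01": 100, "02": 100}
marilla = {"01": 30, "02": 10}
result = crosstab(sentences, marilla)
for chapter, row in result.items():
    print(chapter, round(row["percent"], 1), round(row["residual"], 2))
```

A positive residual means the code appears in that chapter more often than its overall rate would suggest, and a negative one means less often.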
29. Characters and Verbs (1/2)
Step 2
(1) Go to [Tools] [Coding] [Co-occurrences Network] in the menu
(2) Click [Browse] and open “code_2.txt” in the tutorial folder
(3) Configure as shown in
this screen and Click [OK]
30. Characters and Verbs (2/2)
Step 2
[Figure: co-occurrence network of the character codes (ANNE, Marilla, Matthew, Diana, Gilbert) and the verbs “think”, “know”, “tell”, “look”, and “feel”, with coefficients from .03 to .11]
Anne often expresses what she
“feels” to Marilla:
“I do feel dreadfully sad,
Marilla” (c21)
Marilla and Anne often “look”
at each other:
Marilla looked at Anne and
softened at sight of the child’s
pale face… (c6)
Anne looked at her with eyes
limpid with sympathy (c20)
Marilla looked at her with a
tenderness that would never
have been suffered to reveal
itself in any clearer light… (c30)
Marilla and Anne exchange their feelings by words, and also with their eyes,
meaning that a close and intimate relationship is depicted between the two.
31. Change of Words Co-occurring with Marilla (1/3)
Step 2
(1) Go to [Tools] [Words] [Word Association] in the menu
(2) Click [Browse] and open “code_3.txt”
(3) Click [*Marilla]
(4) Hold down the [Ctrl] key on the keyboard and click [*01-07]
(5) Click
* To search for the words co-occurring with Marilla in the following
part "08-19", repeat procedure (3) and then click [*08-19] instead of
[*01-07] in procedure (4).
32. Change of Words Co-occurring with Marilla (2/3)
Step 2
01-07                 08-19           20-28            29-38
Matthew        .053   say      .072   say       .042   Matthew  .041
mare           .040   ANNE     .059   think     .034   look     .040
Cuthbert       .040   just     .039   ANNE      .032   sit      .039
table          .038   think    .036   cake      .030   ANNE     .038
dish           .037   brooch   .031   make      .028   say      .038
child          .033   tell     .030   minister  .028   face     .031
bed            .032   evening  .025   Allan     .026   girl     .026
say            .032   home     .024   feel      .025   think    .024
uncomfortable  .032   set      .024   know      .024   want     .022
sorrel         .032   let      .023   time      .023   lean     .022
“Marilla really did not know how to talk to the child, and
her uncomfortable ignorance made her crisp and...” (c4)
The “child” is upgraded to “Anne”, implying that it is
impossible to bring up a child without “saying” anything.
The “feel” and “look”
33. Change of Words Co-occurring with Marilla (3/3)
Step 2
Change of Marilla
1. Uncomfortable ignorance [01-07]
2. Calling Anne and saying many things [08-28]
3. Exchanging feelings by words and eyes with Anne [20-38]
The change is depicted throughout the story.
34. Conclusions
Step 2
Results of step 2 showed that:
Marilla is literally present almost everywhere
A close and intimate relationship is depicted between
Marilla and Anne
The change in Marilla and her growing relationship with
Anne are depicted throughout the story
Our analysis supports the assertion that Marilla plays a central
role in the story.
Identifying keywords like “child”, “uncomfortable”, “look”, and
“feel” through quantitative analysis is useful for extracting
passages that specifically describe Marilla’s role and change
in the story.
35. Web site of KH Coder
http://khcoder.net/en
For more details on this tutorial
Part 1: http://www.ritsumei.ac.jp/file.jsp?id=325881
Part 2: http://www.ritsumei.ac.jp/file.jsp?id=346128
Questions or Comments?
https://github.com/ko-ichi-h/khcoder/issues