Quick Start Tutorial of KH Coder:
Quantitative Content Analysis or Text Mining
of English Language Data
Koichi Higuchi
Preface
• This presentation is part of a series of tutorials on using KH Coder.
• KH Coder is free software for quantitative content analysis or text mining. It is also used in computational linguistics.
• Details and downloads: http://khc.sourceforge.net/en/
Table of Contents
Configure KH Coder for English speakers / English data
• 1. Change the interface language to English
• 2. Settings for analyzing English text
• Notes on the stopwords
Create a new project and prepare for analysis
• 3. Create a new project
• 4. Run pre-processing
Frequently appearing words and co-occurrences
• 5. Word frequency list
• 6. KWIC and collocation stats
• 7. Co-occurrence network of words
• Methods for exploring co-occurrences of words
Characteristics of each chapter
• 8. Distinctive words of each chapter
• 9. Correspondence analysis of words and chapters
Coding Rules
• Use coding rules to count concepts
• 10. Search documents with coding rules
• 11. Cross tabulation of the codes
1. Change the Interface Language to English
Go to [Project] [Settings] in the menubar, choose “English” as the interface language, and restart KH Coder.
If you prefer the Japanese interface, you may skip this step.
You may also change the interface font.
2. Settings for Analyzing English Text
(1) Go to [Project] [Settings] in the menubar.
(2) Select “Lemmatization.”
(3) Click “config.”
(4) Open the “tutorial_en” folder, then drag the file “stopwords_sample_en.txt” and drop it into the config window. (Or just paste the content of the file there.)
(5) Click “OK.”
(6) Click “OK.”
Notes on the Stopwords
You can specify any words as stopwords in KH Coder.
The stopwords will be given the special POS tag “OTHER.”
Words with the “OTHER” tag will be excluded from analyses by default.
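The effect of the stopword list can be sketched in a few lines of Python. This is only an illustration of the idea described above, not KH Coder's own code: the function names, the sample POS tags, and the two-word stopword list are all assumptions made for the example.

```python
def tag_stopwords(tagged_words, stopwords):
    """Re-tag every stopword as OTHER; all other words keep their POS tag."""
    stop = {w.lower() for w in stopwords}
    return [(w, "OTHER" if w.lower() in stop else pos) for w, pos in tagged_words]


def words_for_analysis(tagged_words):
    """Drop every word tagged OTHER, mirroring the default exclusion."""
    return [(w, pos) for w, pos in tagged_words if pos != "OTHER"]


# Toy input: (base form, POS tag) pairs; the tags are illustrative only.
tagged = [("the", "DT"), ("teacher", "NN"), ("be", "VB"), ("angry", "JJ")]
retagged = tag_stopwords(tagged, ["the", "be"])
kept = words_for_analysis(retagged)
print(kept)
```

Only “teacher” and “angry” remain in `kept`, because “the” and “be” were given the “OTHER” tag.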
3. Create a New Project
(1) Go to [Project] [New] in the menubar.
(2) Click “Browse” and open the file “tutorial_en/botchan_en.txt”.
(3) Fill in whatever memo you like.
(4) Click “OK.”
In this tutorial we analyze the novel “Botchan” by Soseki. “botchan_en.txt” contains all 11 chapters of the novel. Chapter headings are marked with the h1 tag.
Next time you start KH Coder, go to [Project] [Open] in the menubar and open the project you have created here.
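To see what marking chapter headings with the h1 tag makes possible, here is a minimal Python sketch that splits a text into chapters at each heading. It assumes the headings are written as `<h1>...</h1>`; the sample text and the function name `split_by_h1` are made up for illustration and are not taken from “botchan_en.txt”.

```python
import re


def split_by_h1(text):
    """Return a list of (heading, body) pairs, one per <h1>...</h1> heading."""
    # With one capture group, re.split yields
    # [preamble, heading1, body1, heading2, body2, ...].
    parts = re.split(r"<h1>(.*?)</h1>", text, flags=re.IGNORECASE)
    return [(parts[i].strip(), parts[i + 1].strip())
            for i in range(1, len(parts) - 1, 2)]


# Hypothetical two-chapter sample, not the actual tutorial data.
sample = "<h1>Chapter I</h1>\nFirst paragraph.\n<h1>Chapter II</h1>\nSecond paragraph.\n"
chapters = split_by_h1(sample)
print(chapters)
```

Each heading becomes a label for the text that follows it, which is what later allows words and codes to be compared chapter by chapter.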
4. Run Pre-Processing
Go to [Pre-Processing] [Run Pre-Processing] in the menubar. Then click “OK.”
Sentence splitting, tokenization, POS tagging, and lemmatization are performed. The results are compiled into a MySQL database for searching and statistical analysis.
While processing data, KH Coder “concentrates” on the job, so it sometimes looks frozen. This is normal while the CPU or disk is busy.
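The first two pre-processing stages can be illustrated with a deliberately naive Python sketch. It uses simple regular expressions as a stand-in and is not how KH Coder performs these steps; POS tagging and lemmatization need a trained tagger and are left out. The function names and the sample text are assumptions.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens; apostrophes inside words are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", sentence.lower())


text = "I went out. Why didn't he stay?"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(tokens)
```

A splitter this simple will break on abbreviations such as “Mr.”, which is one reason dedicated tools are used for real data.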
5. Word Frequency List
Go to [Tools] [Words] [Frequency List] in the menubar.
The list shows counts of base forms (lemmas), not surface forms.
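Counting base forms rather than surface forms can be sketched as follows. The tiny lemma table is hand-written and hypothetical; it only stands in for a real lemmatizer so that the counting step is visible.

```python
from collections import Counter

# Hypothetical, hand-written lemma table; a real lemmatizer covers far more forms.
LEMMAS = {"went": "go", "goes": "go", "teachers": "teacher"}


def lemma_frequencies(tokens):
    """Count tokens after mapping each one to its base form."""
    return Counter(LEMMAS.get(t, t) for t in tokens)


freq = lemma_frequencies(["go", "went", "goes", "teacher", "teachers"])
top = freq.most_common(2)
print(top)
```

Here “go”, “went”, and “goes” are counted together as three occurrences of “go”.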
6. KWIC and Collocation Stats 1/2
(1) Go to [Tools] [Words] [KWIC Concordance] in the menubar.
(2) Input the base form of a word and hit “Enter” on the keyboard.
(3) Click “Stats” to open the collocation stats.
When you change sort options, click the “Search” button again.
Double-click any line to view a wider context. You can change the viewing units below.
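A KWIC (keyword in context) concordance lists every occurrence of a word together with the words around it. The sketch below shows the idea on a made-up token list; the function name `kwic` and the `width` parameter are assumptions for the example, not KH Coder options.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node word, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


tokens = "the old teacher met the new teacher at school".split()
lines = kwic(tokens, "teacher", width=2)
for left, node, right in lines:
    print(f"{left:>10} [{node}] {right}")
```

Sorting the hits by the left or right context is what the sort options in the concordance window correspond to.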
6. KWIC and Collocation Stats 2/2
(1) Follow the steps in the previous slide to open the collocation stats.
(2) You can filter words by POS tags.
“L1” stands for “Left 1.” Numbers in this column indicate how many times each word appeared just before the node word (left side, distance 1).
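The positional counts behind a column such as “L1” can be sketched like this. The labels “R1” and “R2” for the right-hand side are assumed by analogy with “L1”, and the function name and `span` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def collocation_stats(tokens, node, span=2):
    """Map each collocate to a Counter of positions such as 'L1' or 'R2'."""
    stats = defaultdict(Counter)
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for d in range(1, span + 1):
            if i - d >= 0:
                stats[tokens[i - d]]["L%d" % d] += 1  # d words to the left
            if i + d < len(tokens):
                stats[tokens[i + d]]["R%d" % d] += 1  # d words to the right
    return stats


tokens = "the old teacher met the new teacher at school".split()
stats = collocation_stats(tokens, "teacher")
print(dict(stats["the"]))
```

In this toy sentence, “old” and “new” each appear once at L1, while “the” appears twice at L2 and once at R2.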
7. Co-Occurrence Network of Words
(1) Go to [Tools] [Words] [Co-Occurrence Network] in the menubar.
(2) Select “Paragraphs” as the unit, then click “OK.”
Now you can see a co-occurrence network of high-frequency words in the text. The color changes from blue (low) to pink (high) and indicates the centrality index.
(3) Click “Config” and check “Larger nodes for higher frequency words,” then click “OK.”
(4) Click “Config” and increase “edges” (co-occurrences) to “top 100,” then click “OK.”
(5) Select “Community: modularity” as “color.”
Which version did you like?
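The edges of such a network come from counting which words appear in the same unit (here, paragraphs) and keeping the strongest pairs. The sketch below ranks pairs by the Jaccard coefficient; choosing Jaccard as the strength measure is an assumption made for this example, and the function name and toy paragraphs are hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(paragraphs, top=100):
    """Rank word pairs by Jaccard coefficient over paragraph-level co-occurrence."""
    docs = {}  # word -> set of paragraph indices containing it
    for idx, words in enumerate(paragraphs):
        for w in set(words):
            docs.setdefault(w, set()).add(idx)
    edges = []
    for a, b in combinations(sorted(docs), 2):
        both = len(docs[a] & docs[b])
        if both:
            edges.append((a, b, both / len(docs[a] | docs[b])))
    # Strongest pairs first; ties are broken alphabetically.
    edges.sort(key=lambda e: (-e[2], e[0], e[1]))
    return edges[:top]


paragraphs = [["teacher", "school"],
              ["teacher", "school", "student"],
              ["student", "dinner"]]
edges = cooccurrence_edges(paragraphs, top=2)
print(edges)
```

Raising `top` corresponds to showing more edges, as in step (4) above.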
Methods for Exploring Co-Occurrences of Words
To explore co-occurrences of words, you can also use:
• hierarchical cluster analysis
• multidimensional scaling (MDS)
By interpreting these results, you may find major themes of the text from groups of words that tend to appear together.
KH Coder uses R as a back end to execute these multivariate methods.
8. Distinctive Words of Each Chapter
(1) Go to [Tools] [Variables & Headings] [List] in the menubar.
(2) Click “Heading 1.”
(3) Select “Sentences.”
(4) Select “catalogue: Excel.”
The top 10 distinctive words of each chapter are tabulated. The “distinctiveness” is calculated using the Jaccard index. Basically, if a word shows a larger probability of appearance in a specific chapter, it is considered distinctive.
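One way to read the Jaccard index here is as the overlap between two sets of sentences: those that contain the word and those that belong to the chapter. The sketch below implements that reading; whether KH Coder applies the index in exactly this way is an assumption, and the function names and toy data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distinctive_words(sentences, top=10):
    """sentences: list of (chapter, words). Returns chapter -> [(word, score), ...]."""
    in_chapter, has_word = {}, {}
    for idx, (chapter, words) in enumerate(sentences):
        in_chapter.setdefault(chapter, set()).add(idx)
        for w in set(words):
            has_word.setdefault(w, set()).add(idx)
    result = {}
    for chapter, ids in in_chapter.items():
        scored = [(w, jaccard(s, ids)) for w, s in has_word.items()]
        scored.sort(key=lambda p: (-p[1], p[0]))
        result[chapter] = scored[:top]
    return result


# Toy data: four sentences in two chapters.
sentences = [("I", ["school", "teacher"]), ("I", ["school"]),
             ("II", ["dinner"]), ("II", ["dinner", "school"])]
table = distinctive_words(sentences, top=1)
print(table)
```

“school” scores 2/3 for chapter I because it appears in both of that chapter's sentences and in one sentence elsewhere, while “dinner” scores 1.0 for chapter II because it appears in every sentence of that chapter and nowhere else.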
9. Correspondence Analysis of Words and Chapters
(1) Go to [Tools] [Words] [Correspondence Analysis] in the menubar.
(2) Click “OK.”
(3) Click “Config,” then reduce words to “Top 30,” check “Bubble plot,” uncheck “Size of variables...,” and click “OK.” (This step is optional.)
Using correspondence analysis, you can visually interpret the characteristics of each chapter.
Use Coding Rules to Count Concepts
In some cases, we have to count concepts, not words. To count concepts, you can compose “coding rules” like this:
*shopping
store or shop or ( merchandise and not develop )
The first line, beginning with “*”, indicates the name of this code.
The second line gives the conditions for attaching this code. Cases that contain the word “store” or “shop” are given the code “shopping.” The parenthetical notation means that cases should contain the word “merchandise” but should not contain the word “develop.”
If a case is acceptable under multiple coding rules, multiple codes will be given to the case.
We use “tutorial_en/themes.txt” as example coding rules in this tutorial. Please open this file and check its content.
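The logic of the “shopping” rule can be written out by hand in Python to check how it treats a given case. This is a manual translation of the one rule shown above, not a parser for coding rule files; the second code, “travel”, is hypothetical and is included only to show a case receiving more than one code.

```python
def shopping(words):
    """Hand translation of: store or shop or ( merchandise and not develop )"""
    w = set(words)
    return "store" in w or "shop" in w or ("merchandise" in w and "develop" not in w)


def travel(words):
    """Hypothetical second code, used only to demonstrate multiple codes."""
    return "train" in set(words)


RULES = {"shopping": shopping, "travel": travel}


def assign_codes(words):
    """Return the names of all codes whose condition the case satisfies."""
    return [name for name, rule in RULES.items() if rule(words)]


codes_a = assign_codes(["he", "take", "the", "train", "to", "the", "shop"])
codes_b = assign_codes(["they", "develop", "new", "merchandise"])
print(codes_a, codes_b)
```

The first case receives both codes. The second receives none, because it contains “merchandise” together with “develop.”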
10. Search Documents with Coding Rules
(1) Go to [Tools] [Documents] [Search Documents] in the menubar.
(2) Click “Browse” and select “tutorial_en/themes.txt”.
(3) Select “Paragraphs.”
(4) Double-click a code.
(5) Double-click a result to view the whole paragraph.
When you compose a coding rule, it is important to search for and check the actual documents that are acceptable under the rule.
11. Cross Tabulation of Codes
(1) Go to [Tools] [Coding] [Crosstab] in the menubar.
(2) Click “Browse” and select “tutorial_en/themes.txt”.
(3) Select “Sentences.”
(4) Click “Run.”
(5) Click “all” to make a graph.
In the latter half of the novel, it looks like “aggression” overwhelms “positive affect” and forms the climax of the story at chapter X.
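A cross tabulation of codes by chapter boils down to counting, for each chapter, what share of its sentences received each code. The sketch below shows that computation on made-up data; the code names come from the slide, but the sentences, the percentages, and the function name `crosstab` are assumptions for illustration and say nothing about the actual novel.

```python
from collections import defaultdict


def crosstab(coded_sentences):
    """coded_sentences: list of (chapter, set of codes).

    Returns chapter -> {code: percent of that chapter's sentences with the code}.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for chapter, codes in coded_sentences:
        totals[chapter] += 1
        for code in codes:
            hits[chapter][code] += 1
    return {ch: {c: 100.0 * n / totals[ch] for c, n in hits[ch].items()}
            for ch in totals}


# Toy data only: two sentences each in chapters I and X.
data = [("I", {"positive affect"}), ("I", set()),
        ("X", {"aggression"}), ("X", {"aggression", "positive affect"})]
table = crosstab(data)
print(table)
```

Plotting these percentages chapter by chapter gives the kind of graph produced in step (5).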
Acknowledgement
I am grateful to the students who attended the 2011 “text mining” class at Doshisha University (Faculty of Culture and Information Science) for giving me some hints on composing coding rules for “Botchan.”
Questions or Comments?
Please feel free to post questions or comments on the web forum:
https://sourceforge.net/p/khc/discussion/