Language Technology Enhanced Learning Fridolin Wild The Open University, UK Gaston Burek University of Tübingen Adriana Berlanga Open University, NL
Workshop Outline 1 | Deep Introduction: Latent-Semantic Analysis (LSA) 2 | Quick Introduction: Working with R 3 | Experiment: Simple Content-Based Feedback 4 | Experiment: Topic Proxy
Latent-Semantic Analysis LSA
Latent Semantic Analysis Assumption: language utterances do have a semantic structure. However, this structure is obscured by word usage (noise, synonymy, polysemy, …). Proposed LSA solution: map the doc-term matrix to conceptual indices derived statistically (truncated SVD) and make similarity comparisons using, e.g., angles.
Input (e.g., documents) { M } = Deerwester, Dumais, Furnas, Landauer, and Harshman (1990): Indexing by Latent Semantic Analysis. In: Journal of the American Society for Information Science, 41(6):391-407. Only the red terms appear in more than one document, so strip the rest. term = feature; vocabulary = ordered set of features. TEXTMATRIX
Singular Value Decomposition M = T S D' [Figure: the doc-term matrix M factored into the term matrix T, the diagonal matrix of singular values S, and the document matrix D']
Truncated SVD: keeping only the first k singular values, we get a different matrix (different values, but still of the same format as M): the latent-semantic space.
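Illustration (not from the slides): a minimal base-R sketch of this truncation, assuming a tiny made-up doc-term matrix M; svd() is base R, and the rank-k product rebuilds the reduced matrix.
M = matrix(c(1,0,1, 0,1,1, 1,1,0, 0,0,1), nrow=4, byrow=TRUE)   # 4 terms x 3 docs (toy data)
dec = svd(M)                                 # M = T S D': dec$u, dec$d, dec$v
k = 2                                        # keep only the first k singular values
Mk = dec$u[,1:k] %*% diag(dec$d[1:k]) %*% t(dec$v[,1:k])   # reduced-rank reconstruction of M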
Reconstructed, Reduced Matrix m4: Graph minors: A survey
Similarity in a Latent-Semantic Space (Landauer, 2007) [Figure: a query vector and two target vectors plotted over an X and Y dimension; similarity is read off the angles between the query and each target]
doc2doc similarities: Unreduced = pure vector space model: based on M = TSD', Pearson correlation over document vectors. Reduced: based on M2 = T S2 D' (the truncated decomposition), Pearson correlation over document vectors.
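A sketch of the two comparisons, reusing M and Mk from the snippet above; cor() over the columns gives the Pearson doc-to-doc similarities for both variants.
sim_unreduced = cor(M)    # pure vector space model
sim_reduced = cor(Mk)     # latent-semantic space
sim_unreduced[1,2]; sim_reduced[1,2]   # doc1 vs doc2 under both models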
(Landauer, 2007)
Configurations 4 x 12 x 7 x 2 x 3  =  2016 Combinations
Updating: Folding-In. SVD factor stability: different texts, different factors. Challenge: avoid unwanted factor changes (e.g., from bad essays). Solution: folding-in instead of recalculating, since the SVD is computationally expensive: 14 seconds (300-doc textbase), 10 minutes (3,500-doc textbase), … and rising!
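The projection behind folding-in is one line of algebra: a new document vector q over the same vocabulary is mapped into the existing space as dhat = t(q) %*% Tk %*% solve(Sk). A sketch reusing the toy decomposition from above (illustrative only, not the lsa package internals):
q = c(1, 0, 1, 0)                    # hypothetical new document over the same 4 terms
Tk = dec$u[,1:k]; Sk = diag(dec$d[1:k])
dhat = t(q) %*% Tk %*% solve(Sk)     # 1 x k coordinates in the existing space, no new SVD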
The Statistical Language and Environment R
Help > ?'+' > ?kmeans > help.search("correlation") http://www.r-project.org => site search => documentation. Mailing list: r-help. Task View NLP: http://cran.r-project.org/ -> Task Views -> NLP
Installation & Configuration
install.packages("lsa", repos="http://cran.r-project.org")
install.packages("tm", repos="http://cran.r-project.org")
install.packages("network", repos="http://cran.r-project.org")
library(lsa)
setwd("d:/denkhalde/workshop")
dir()
ls()
quit()
The lsa Package Available via CRAN, e.g.: http://cran.at.r-project.org/src/contrib/Descriptions/lsa.html   Higher-level Abstraction to Ease Use Five core methods: textmatrix()  /  query() lsa() fold_in() as.textmatrix() Supporting methods for term weighting, dimensionality calculation, correlation measurement, triple binding
Core Processing Workflow
tm = textmatrix("dir/")
tm = lw_logtf(tm) * gw_idf(tm)
space = lsa(tm, dims=dimcalc_share())
tm3 = fold_in(tm, space)
as.textmatrix(space)   # reduced-rank textmatrix of the space (the slide had as.textmatrix(tm))
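A hedged usage example on top of this workflow; query() builds a pseudo-document over the given vocabulary, and the file name doc1.txt is hypothetical:
q = query("latent semantic", rownames(tm))     # pseudo-document for a free-text query
q_red = fold_in(q, space)                      # project the query into the space
cosine(q_red[,1], as.textmatrix(space)[,"doc1.txt"])   # similarity to one training doc (hypothetical name)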
A Simple Evaluation of Students' Writings Feedback
Evaluating Student Writings External Validation? Compare to Human Judgements! (Landauer, 2007)
How to do it...
library("lsa")   # load package
# load training texts
trm = textmatrix("trainingtexts/")
trm = lw_bintf(trm) * gw_idf(trm)    # weighting
space = lsa(trm)                     # create an LSA space
# fold in the essays to be tested (including the gold-standard text)
tem = textmatrix("testessays/", vocabulary=rownames(trm))
tem = lw_bintf(tem) * gw_idf(trm)    # weighting (global weights from the training corpus)
tem_red = fold_in(tem, space)
# score an essay by comparing it with the
# gold-standard text (very simple method!)
cor(tem_red[,"goldstandard.txt"], tem_red[,"E1.txt"])
=> 0.7
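A small extension of the slide's method (sketch; builds on tem_red from above): score every test essay against the gold standard in one pass.
essays = setdiff(colnames(tem_red), "goldstandard.txt")
scores = sapply(essays, function(e) cor(tem_red[,"goldstandard.txt"], tem_red[,e]))
sort(scores, decreasing=TRUE)    # highest-scoring essays first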
Evaluating Effectiveness. Compare machine scores with human scores. Human-to-human correlation: usually around .6; increased by familiarity between assessors, tighter assessment schemes, … Scores vary even more strongly with decreasing subject familiarity (.8 at high familiarity, worst test -.07). Test collection: 43 German essays, scored from 0 to 5 points (ratio scaled), average length 56.4 words. Training collection: 3 'golden essays', plus 302 documents from a marketing glossary, average length 56.1 words.
(Positive) Evaluation Results
LSA machine scores:
  Spearman's rank correlation rho
  data: humanscores[names(machinescores), ] and machinescores
  S = 914.5772, p-value = 0.0001049
  alternative hypothesis: true rho is not equal to 0
  sample estimates: rho 0.687324
Pure vector space model:
  Spearman's rank correlation rho
  data: humanscores[names(machinescores), ] and machinescores
  S = 1616.007, p-value = 0.02188
  alternative hypothesis: true rho is not equal to 0
  sample estimates: rho 0.4475188
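Output of this shape comes from base R's cor.test(); a sketch, assuming humanscores and machinescores are named numeric score vectors:
machine_test = cor.test(humanscores[names(machinescores)], machinescores, method="spearman")
machine_test$estimate; machine_test$p.value   # rho and p-value as reported above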
Concept-Focused Evaluation (using http://eczemablog.blogspot.com/feeds/posts/default?alt=rss)
Visualising Lexical Semantics Topic Proxy
Network Visualisation Term-to-term distance matrix = Graph
      t1     t2    t3    t4
t1    1
t2   -0.2    1
t3    0.5    0.7   1
t4    0.05  -0.5   0.3   1
Classical Landauer Example
tl = landauerSpace$tk %*% diag(landauerSpace$sk)   # term coordinates, scaled by singular values
dl = landauerSpace$dk %*% diag(landauerSpace$sk)   # document coordinates, scaled likewise
dtl = rbind(tl, dl)
s = cosine(t(dtl))         # cosine similarities between all terms and documents
s[which(s < 0.8)] = 0      # threshold: keep only strong links
plot(network(s), displaylabels=T,
     vertex.col = c(rep(2,12), rep(3,9)))   # colour 12 terms vs 9 documents
Divisive Clustering (Diana)
edmedia Terminology
Code Sample
d2000 = cosine(t(dtm2000))
dianac2000 = diana(d2000, diss=T)
clustersc2000 = cutree(as.hclust(dianac2000), h=0.2)
plot(dianac2000, which.plot=2, cex=.1)           # dendrogram
winc = clustersc2000[which(clustersc2000==1)]    # filter for cluster 1
wincn = names(winc)
d = d2000[wincn, wincn]
d[which(d<0)] = 0                                # assignment intended (slide had '==')
btw = betweenness(d, cmode="undirected")         # for node size calculation
btwmax = colnames(d)[which(btw==max(btw))]
btwcex = (btw/max(btw))+1
plot(network(d), displayisolates=F, displaylabels=T, boxed.labels=F,
     edge.col="gray", main=paste("cluster", i),  # i: cluster index from the enclosing loop
     usearrows=F, vertex.border="darkgray", label.col="darkgray",
     vertex.cex=btwcex*3, vertex.col=8-(colnames(d) %in% btwmax))
Permutation Testing Permutation
Permutation test NON-PARAMETRIC: does not assume that the data have a particular probability distribution. Suppose the following ranking of elements of two categories X and Y. Actual data to be evaluated: (x_1, x_2, y_1) = (1, 9, 3). Let T(x_1, x_2, y_1) = abs(mean X - mean Y) = abs(5 - 3) = 2.
Permutation Usually, it is not practical to evaluate all N! permutations. We can approximate the p-value by sampling randomly from the set of permutations.
The permutations are:
permutation   value of T
------------------------------
(1,9,3)       2   (actual data)
(9,1,3)       2
(1,3,9)       7
(3,1,9)       7
(3,9,1)       5
(9,3,1)       5
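A minimal sketch of the sampled approximation described above; the statistic T and the data follow the slides, the helper function itself is illustrative:
perm_pvalue = function(data, n_x, n_perm = 10000) {
  t_obs = abs(mean(data[1:n_x]) - mean(data[-(1:n_x)]))   # T on the actual labelling
  t_perm = replicate(n_perm, {
    p = sample(data)                                      # random relabelling of X and Y
    abs(mean(p[1:n_x]) - mean(p[-(1:n_x)]))
  })
  mean(t_perm >= t_obs)   # share of permutations at least as extreme as the observed T
}
perm_pvalue(c(1, 9, 3), n_x = 2)   # actual data from the slide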
Some results Students' discussions on safe prescribing, classified according to expected learning outcomes and related subtopics. Topics: A=7, B=12, C=53, D=4, E=40, F=7. Graded: poor, fair, good, excellent. Methodology used: LSA, bag of words / maximal repeated phrases, permutation test.
Challenging Questions Discussion
Questions Dangers of using Language Technology? Ontologies = Neat? NLP = Nasty? Other possible application areas? Corpus Collection? What is good effectiveness? When can we say that an algorithm works well? Other aspects not evaluated…
Questions? #eof.
