2017 20th International Conference of Computer and Information Technology (ICCIT), 22-24 December, 2017

A Machine Learning Approach for Stylometric Analysis of Bangla Literature

Urmee Pal, Ayesha Siddika Nipu, Sabir Ismail
Computer Science and Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh

Abstract—The term Stylogenetics refers to the clustering-based stylistic analysis of authors' literary corpora. While writing, a writer subconsciously favors certain recurring patterns. We¹ focused on these patterns and tried to detect the affinity and divergence between the writings of different authors. Our approach relies on a set of features that distinguish the individuality of authors who write about, and establish their own viewpoints on, similar issues. We assembled Bengali blogs written by twenty Bangladeshi authors from two fields, political and educational, and analyzed the corpus. We evaluated features such as negative word frequency in particular positions, how often the most common word length and sentence length occur, suffix count, use of particular punctuation, common recognizable word frequency, classification of parts of speech, and numeric word frequency. We first trained the system using these features and then identified the authors of randomly selected documents using two machine learning approaches, Support Vector Machines (SVM) and the Naive Bayes classifier.
This approach provides higher accuracy than previously established work, as all the collected texts are by different writers writing in analogous fields.
Index Terms—Stylogenetics, Clustering, Affinity, Machine learning, SVM, Naive Bayes, Frequency, Distinguish, Analogous.

¹ The first and second authors contributed equally to this work.

978-1-5386-1150-0/17/$31.00 ©2017 IEEE

I. INTRODUCTION
Language is basically a collection of words through which people share their feelings and views with the world, either orally or in writing. Language differs across geographical areas, nations, cultures and so on. But even within the same language, writing style differs from person to person. It is not easy to recognize a text or its writer manually from the writing alone. To make this task more consistent, technology based on Stylogenetics is used.

The term Stylogenetics refers to the clustering-based stylistic analysis of authors' literary corpora. While writing, a writer subconsciously favors certain recurring patterns. These patterns may depend on the location, gender, age or mentality of the writer, and they vary considerably from writer to writer. By analyzing these categories, modern technology has opened a new era in which writers can be identified from their writing style and statistics in cases of suspected deceit. As a consequence, it becomes difficult to claim another's writing as one's own; if someone does, the fraud can be detected and the copyright preserved.

Although Stylogenetics has previously been applied to English literature, it has only recently been used for Bengali literature. Our approach relies on features such as negative word frequency in particular positions, how often the most common word length and sentence length occur, suffix count, use of particular punctuation, common recognizable word frequency, classification of parts of speech, and numeric word frequency.

II. RELATED WORKS
Stylogenetics has opened a new era, although much work has already been done in Stylometry, which is fundamentally the same as Stylogenetics. Stylometry is the investigation of a writer's body of work, characterized in particular by the repetition of particular patterns. Earlier work has covered distinct parts of this field.

Prapti Das, Rishmita Tasmim and Sabir Ismail presented an overview of the writing patterns of four different Bangladeshi writers [1]. They constructed a Vector Space Model to collect different features from distinct training and testing data sets and then evaluated the maximum correlation values. This paper inspired us most, as it works on clustering-based analysis of Bangla literary corpora, which is the goal of our current research.

Kim Luyckx, Walter Daelemans and Edward Vanhoutte reported an experiment on a large corpus of five million words comprising representative samples of male and female authors [2]. Luyckx, Daelemans and Vanhoutte worked on word frequency, syntax and

lexical analysis of words, which give more explicit decisions, although token-level features are more widely applied [3]. Their contribution to research on Stylogenetics is significant.

Michael Brennan and Rachel Greenstadt examined how stylometry holds up against adversarial attacks [4]. They reviewed three methods, a neural network approach, a synonym-based classifier, and a statistical method using the Signature Stylometric System, against two kinds of adversarial attack.

Roger Peng and Nicolas Hengartner focused mainly on the writing style of particular authors and established distinct classes for each of them [5]. Their paper relied substantially on PCA (Principal Component Analysis) and CDA (Canonical Discriminant Analysis) to characterize corpus structure and distinguish authorship through machine learning.

III. METHODOLOGY
A. CORPUS
We assembled Bengali blogs written by twenty Bangladeshi authors from two fields, political and educational, and analyzed the corpus. We gathered a total of 2,01,628 words of political writing and 1,43,760 words of educational writing. We refer to the twenty writers by abbreviated labels for copyright protection.

Table I
Information of Political Writers Corpus

Writer  Words    Writer  Words
AD      19394    TA      19498
AC      20930    FM      18922
AG      18497    BS      22896
GK      18159    SEA     20897
CS      21418    SK      21017

Table II
Information of Educational Writers Corpus

Writer  Words    Writer  Words
AH      16421    MA      12256
GR      14386    MH      14652
JHM     13920    MMA     15785
MZI     14736    MR      15920
MB      13194    SV      12490

B. FEATURE SELECTION
We selected the following features and analyzed them for the data of the twenty writers.
• Negative Word frequency in Particular Position
• Rapidity of the use of Highest length word
• Rapidly used sentence length
• Suffix Count
• Use of particular Punctuation
• Recurrence of Pronoun according to Person
• Recurrence of Conjunction according to Classification
• Common Recognizable word frequency
• Numeric word frequency
• Future Predicted Word Frequency
• Advisory Word Frequency

C. EXPERIMENTAL STUDY
1) Negative Word frequency in Particular Position: The use of negative words denotes negativity in an author's mentality. We computed the frequency of negative words such as না, নয়, নেই in particular positions. We found that political writers such as AD, AC and BS use negative words more often, whereas GK and TA rarely use them. This feature distinguishes among the writers accurately. The same feature is applied to the educational writers. Figure 1 shows the negative word frequency in particular position for political writers.

Figure 1. Negative Word frequency in Particular Position (Political)

2) The rapidity of the use of Highest length word: Analyzing the data sets of all twenty writers, we found that words of length 4 are the most frequently used in almost all authors' writings. So we computed the frequency of 4-length words for each writer. This feature helps to distinguish among these authors precisely. Figures 2 and 3 show the rapidity of the use of the highest length word for educational and political authors respectively.

3) Rapidly used Sentence length: The frequency of different sentence lengths is another important feature for detecting an author. Using this feature, we categorized authors by the divergence of the sentence lengths they use.

4) Suffix Count: The suffixes used by individual writers often vary from one another. That is why we counted the suffixes used by each writer, to help preserve copyright. Figure 4 shows Suffix Count for political authors.
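The first four features can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the paper does not say which sentence positions it counts (here: first, middle, last), how word length is measured (here: Unicode code points after NFC normalization), or which suffixes it uses (the `SUFFIXES` list is hypothetical, and matching is a plain string-ending check rather than a real stemmer). All function names are made up for this sketch.

```python
import re
import unicodedata
from collections import Counter

# Assumption: a sentence ends at the Bengali danda (U+0964), '?' or '!'.
SENTENCE_END = re.compile("[\u0964?!]+")
# Assumption: tokens are runs of characters other than whitespace and , ; : ( ).
TOKEN = re.compile(r"[^\s,;:()]+")


def nfc(s):
    """Normalize to NFC so lexicon entries and text compare consistently."""
    return unicodedata.normalize("NFC", s)


# Illustrative lexicons only: the paper does not publish its word or suffix lists.
NEGATIVE_WORDS = {nfc(w) for w in ("না", "নয়", "নেই")}
SUFFIXES = sorted((nfc(s) for s in ("গুলো", "দের", "টি", "রা", "কে")),
                  key=len, reverse=True)  # try longer suffixes first


def sentences(text):
    """Return the text as a list of sentences, each a list of tokens."""
    parts = (TOKEN.findall(s) for s in SENTENCE_END.split(nfc(text)))
    return [tokens for tokens in parts if tokens]


def negative_position_counts(text):
    """Count negative words by sentence position: 'first', 'middle' or 'last'.

    A one-token sentence counts as 'last'.
    """
    counts = Counter()
    for tokens in sentences(text):
        for i, tok in enumerate(tokens):
            if tok in NEGATIVE_WORDS:
                if i == len(tokens) - 1:
                    counts["last"] += 1
                elif i == 0:
                    counts["first"] += 1
                else:
                    counts["middle"] += 1
    return counts


def word_length_rate(text, length=4):
    """Fraction of tokens that are exactly `length` code points long."""
    tokens = [t for s in sentences(text) for t in s]
    if not tokens:
        return 0.0
    return sum(len(t) == length for t in tokens) / len(tokens)


def sentence_length_histogram(text):
    """Map sentence length (in tokens) to the number of such sentences."""
    return Counter(len(tokens) for tokens in sentences(text))


def suffix_counts(text):
    """Count tokens ending in each listed suffix (longest match wins)."""
    counts = Counter()
    for tokens in sentences(text):
        for tok in tokens:
            for suf in SUFFIXES:
                if tok.endswith(suf) and len(tok) > len(suf):
                    counts[suf] += 1
                    break
    return counts
```

Each function returns raw counts or a rate for one text; to compare writers as the paper does, the same functions would be run over each writer's documents and the resulting values placed side by side.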
5) Use of particular Punctuation: Among the various punctuation marks, we worked on the interrogative sign, as it is not used very frequently. The frequency of the interrogative sign differs from writer to writer, and we interpret it as indicating a lack of confidence in an author. Figure 5 shows the frequency of the interrogative sign for political writers.
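A minimal sketch of the interrogative-sign feature follows. The paper does not state how the frequency is normalized, so this version assumes the share of sentence terminators that are question marks; `interrogative_rate` and `rank_by_interrogatives` are hypothetical names.

```python
import re

# Assumption: sentence terminators are the Bengali danda (U+0964), '?' and '!'.
TERMINATOR = re.compile("[\u0964?!]")


def interrogative_rate(text):
    """Share of sentence terminators that are question marks (0.0 if none)."""
    marks = TERMINATOR.findall(text)
    if not marks:
        return 0.0
    return marks.count("?") / len(marks)


def rank_by_interrogatives(docs_by_writer):
    """Sort writer labels by interrogative rate, highest first.

    `docs_by_writer` maps a writer label to a list of document strings.
    """
    rates = {writer: interrogative_rate(" ".join(docs))
             for writer, docs in docs_by_writer.items()}
    return sorted(rates, key=rates.get, reverse=True)
```

Normalizing by the number of sentences rather than using raw counts keeps writers with longer corpora from looking more interrogative simply because they wrote more.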

Figure 2. Rapidity of the use of Highest length word (Educational)

Figure 5. Frequency of Interrogative Sign (Political)

6) Recurrence of Pronoun according to Person: There are three grammatical persons in Bengali grammar: 1st person, 2nd person and 3rd person. We computed the recurrence of pronouns according to person to discriminate among our selected authors. Figure 6 shows the recurrence of pronouns according to person for educational authors.
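The pronoun feature can be sketched as a lookup from pronoun to grammatical person followed by a relative-frequency profile. The pronoun lists below are a small illustrative sample and an assumption of this sketch; the paper does not give its lexicon, and a fuller version would also need inflected forms.

```python
from collections import Counter

# Illustrative pronoun lists only; not the lexicon used in the paper.
PRONOUNS_BY_PERSON = {
    "1st": {"আমি", "আমরা", "আমার", "আমাদের"},
    "2nd": {"তুমি", "তোমরা", "আপনি", "তোমার"},
    "3rd": {"সে", "তারা", "তিনি", "তার"},
}
# Reverse index: pronoun -> person label.
PERSON_OF = {word: person
             for person, words in PRONOUNS_BY_PERSON.items()
             for word in words}


def pronoun_person_profile(tokens):
    """Return the share of 1st, 2nd and 3rd person among the pronouns found.

    `tokens` is a list of already tokenized words. Every share is 0.0 when
    no listed pronoun occurs.
    """
    counts = Counter(PERSON_OF[t] for t in tokens if t in PERSON_OF)
    total = sum(counts.values())
    return {person: (counts[person] / total if total else 0.0)
            for person in PRONOUNS_BY_PERSON}
```

The same category-lexicon pattern would apply to any feature defined by word classes, with a different dictionary of categories.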

Figure 3. Rapidity of the use of Highest length word (Political)

Figure 6. Recurrence of Pronoun according to Person (Educational)

Figure 4. Suffix Count (Political)

7) Recurrence of Conjunction according to Classification: Authors do not usually control their use of syntax-based features consciously. That is why such features can differentiate the writings of different writers. In Bengali grammar, four types of conjunction are most significant: বাংলা অব্যয়, তৎসম অব্যয়, বিদেশি অব্যয়, and অনুকার বা ধ্বন্যাত্মক অব্যয়. We computed the recurrence of these four types of conjunction for each writer to individuate authors. Figure 7 shows the recurrence of conjunctions for educational authors.

Figure 7. Recurrence of Conjunction according to Classification (Educational)

8) Common Recognizable word frequency: Each writer has some unique words by which they can be recognized. As we selected authors who write about and establish their own viewpoints on similar issues, some recognizable words are common to all writers. We computed the frequency of these selected common recognizable words and calculated their divergence. Figures 8 and 9 show the common recognizable word frequency for educational and political authors respectively.

Figure 8. Common Recognizable word frequency (Educational)

Figure 9. Common Recognizable word frequency (Political)

9) Numeric word frequency: The use of numeric words is another significant feature. Bengali numeric words such as একটি, প্রথম, চার are used with different frequencies by particular writers. This feature plays an important role in differentiating authors. Figure 10 shows the numeric word frequency for educational authors.

Figure 10. Numeric word frequency (Educational)

10) Future Predicted Word Frequency: Some Bengali words such as হবে, হতে পারে, হয়তো indicate a future-predicting mentality in a writer. We analyzed this feature on our data set of twenty writers, and it turned out to be the most discriminative one.

11) Optative word frequency: In the Bengali language, there are optative (advisory) words such as উচিত, যেওনা, করবে. We analyzed our corpus and computed the frequency of these optative words for each writer to discriminate among them.

IV. RESULT ANALYSIS
Classification algorithms based on two machine learning approaches, Support Vector Machines (SVM) and the Naive Bayes classifier, are implemented to assign an unknown document to its original writer. The documents of the different writers are stemmed first, and the machine is trained with our selected features using 90% of the documents. The remaining 10% of the documents are used for testing, to identify the original author. The accuracy is then calculated for the two classification models.
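The 90%/10% train-and-test procedure can be illustrated with a small standard-library sketch. It is not the authors' implementation, which the paper does not describe: it covers only a Gaussian Naive Bayes classifier written from scratch (the SVM is omitted), the shuffled split with a fixed seed is an assumption, and the two "writers" are synthetic feature vectors generated purely for demonstration.

```python
import math
import random
from collections import defaultdict


def split_90_10(X, y, seed=0):
    """Shuffle the documents and return 90% for training, 10% for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.9 * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def train_gaussian_nb(X, y):
    """Fit per-class log prior, feature means and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant feature cannot divide by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature vector."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


def accuracy(model, X, y):
    """Fraction of feature vectors assigned to their true writer."""
    return sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)


# Synthetic data: two hypothetical writers, two features per document
# (for example a negative-word rate and a 4-length-word rate).
rng = random.Random(1)
X = ([[rng.gauss(0.10, 0.01), rng.gauss(0.30, 0.02)] for _ in range(50)]
     + [[rng.gauss(0.20, 0.01), rng.gauss(0.50, 0.02)] for _ in range(50)])
y = ["A"] * 50 + ["B"] * 50

X_train, y_train, X_test, y_test = split_90_10(X, y)
model = train_gaussian_nb(X_train, y_train)
print(f"held-out accuracy: {accuracy(model, X_test, y_test):.2%}")
```

A single 90/10 split gives one accuracy estimate that depends on which documents land in the test set; repeating the split with different seeds, or using cross-validation, would show how stable the figure is.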
Figure 11 shows the accuracy with which a document is attributed to its original author using both the SVM and Naive Bayes classifiers.

Figure 11. Pie Chart of Accuracy Comparison

In our classification system, when the documents of five or six writers are taken, the model gives its best accuracy: 90.74% with SVM and 86.21% with Naive Bayes. The accuracy decreases to 73.64% with SVM and 70.38% with Naive Bayes when all 20 individual writers from both the political and educational fields are taken.

V. CONCLUSIONS
Many standard multivariate statistical techniques are available in Stylogenetics. This motivates us to explore and analyze literary data to a great extent. In this paper, blogs by twenty different writers in analogous fields are reviewed using fourteen features. These statistical values are then used to compare the writers. Dimension reduction with Principal Component Analysis (PCA) could be applied to find the most significant features, which should give more accurate results in identifying the original author.

By analyzing the statistical values, we may find special words or word types used by each author. In future work, we will take a particular topic and paraphrase it according to each writer's writing patterns. In this way, we may be able to recreate the writing of famous authors who are no longer among us.

As this is a recent field of science and literature, we hope that Stylogenetics will go a long way toward motivating others to work on Bengali literature. A feature for converting one writer's writing into another writer's style can also be worked on.

References
[1] Prapti Das, Rishmita Tasmim and Sabir Ismail, "An Experimental Study of Stylometry in Bangla Literature".
[2] Kim Luyckx, Walter Daelemans and Edward Vanhoutte, "Stylogenetics: Clustering based stylistic analysis of literary corpora".
[3] Luyckx, Kim, Walter Daelemans, and Edward Vanhoutte. "Stylogenetics: Clustering-based stylistic analysis of literary corpora." Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'06), Genoa, Italy. 2006.
[4] Brennan, Michael Robert, and Rachel Greenstadt. "Practical Attacks Against Authorship Recognition Techniques." IAAI. 2009.
[5] Peng, Roger D., and Nicolas W. Hengartner. "Quantitative analysis of literary styles." The American Statistician 56.3 (2002): 175-185.
[6] D. I. Holmes, "A Stylometric Analysis of Mormon Scripture and Related Texts", Journal of the Royal Statistical Society. Series A (Statistics in Society), Vol. 155, No. 1. (1992), pp. 91-120.
[7] Holmes, D. I. (1985), "The Analysis of Literary Style: A Review," Journal of the Royal Statistical Society, Series A, 148, 328-341.
[8] Williams, C. B. (1940), "A Note on the Statistical Analysis of Sentence-Length as a Criterion of Literary Style," Biometrika, 31, 356-361.
[9] Mark Richardson, "Principal Component Analysis", May 2009.
[10] Cristinel Constantin, "Principal Component Analysis – A Powerful Tool in Computing Marketing Information", Bulletin of the Transilvania University of Braşov Series V: Economic Sciences, Vol. 7 (56) No. 2 – 2014.
[11] Alexander Ilin and Tapani Raiko, "Practical Approaches to Principal Component Analysis in the Presence of Missing Values", Journal of Machine Learning Research 11 (2010) 1957-2000.
[12] Hervé Abdi and Lynne J. Williams, "Principal Component Analysis".
[13] P. Julia Grace and A. Sheema, "A Survey on Fake Indian Paper Currency Identification System", Volume 6, Issue 7, July 2016.
[14] Lukic Tina, Blesic Ivana, Basarin Biljana, Ivanovic Bibic Ljubica, Milosevic Dragana and Sakulski Dusan, "Predatory and Fake Scientific Journals/Publishers – A Global Outbreak with Rising Trend: A Review".
[15] Calix, K., et al. "Stylometry for e-mail author identification and authentication." Proceedings of CSIS Research Day, Pace University (2008): 1048-1054.
[16] Celikel, Ebru, and Mehmet Emin Dalkılıç. "Investigating the effects of recency and size of training text on author recognition problem." International Symposium on Computer and Information Sciences. Springer, Berlin, Heidelberg, 2004.
[17] Clark, Jonathan H., and Charles J. Hannon. "A classifier system for author recognition using synonym-based features." Mexican International Conference on Artificial Intelligence. Springer, Berlin, Heidelberg, 2007.
[18] Holmes, David I., and Richard S. Forsyth. "The Federalist revisited: New directions in authorship attribution." Literary and Linguistic Computing 10.2 (1995): 111-127.
[19] Juola, Patrick. "Authorship Attribution." Foundations and Trends in Information Retrieval (2008).
[20] Oakes, Michael. "Ant colony optimisation for stylometry: The federalist papers." Proceedings of the 5th International Conference on Recent Advances in Soft Computing. 2004.
[21] Tweedie, Fiona J., Sameer Singh, and David I. Holmes. "Neural network applications in stylometry: The Federalist Papers." Computers and the Humanities 30.1 (1996): 1-10.
[22] Uzuner, Ozlem, and Boris Katz. "A comparative study of language models for book and author recognition." IJCNLP. 2005.
[23] Rudman, Joe, et al. "The State of Authorship Attribution Studies: (1) The History and the Scope; (2) The Problems—Towards Credibility and Validity." Panel session from ACH/ALLC 1997 (1997).
[24] Burrows, John F. "Word-patterns and story-shapes: The statistical analysis of narrative style." Literary & Linguistic Computing 2.2 (1987): 61-70.
[25] Stamatatos, Efstathios, Nikos Fakotakis, and George Kokkinakis. "Automatic authorship attribution." Proceedings of the ninth conference on European chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 1999.
[26] Khmelev, Dmitri V., and Fiona J. Tweedie. "Using markov chains for identification of writer." Literary and Linguistic Computing 16.3 (2001): 299-307.
[27] Kukushkina, Olga V., Anatoly A. Polikarpov, and Dmitry V. Khmelev. "Using literal and grammatical statistics for authorship attribution." Problems of Information Transmission 37.2 (2001): 172-184.
