Name: Ahmad Mashhood
Subject: Computational Linguistics
Assignment Topic: Summarization of single-document technical articles by computer summarizer tools
Submitted to: Sir Aamir Wali
FAST-NU Lahore Campus
Summarization:
Hovy, E. H. (2005) defined a summary as a text that meets two criteria: it contains a "significant portion of the information in the original text", and its length is no more than half the length of the original text.
Mani and Maybury (1999) described text summarization as a process of "distillation", in which the most essential and significant information is collected from one or more texts and presented in an "abridged form".
Automated Summarization:
Human-made summaries are produced using human intellectual capabilities. Advances in computer processing systems and natural language processing opened a new domain of research whose focus is to produce human-like abstractive summaries of single or multiple documents. Producing summaries with computer programs, and later with online tools, gave rise to the "auto-abstract", a term coined by Luhn (1958), who was the first person to produce significant work in the field of automated summarization.
Need for automated summaries:
The need for automated summaries arises from well-defined purposes and goals. With the growth of text on web pages, in document archives, and in the many newspaper articles and reports written about the same event, it has become difficult for human beings to read such an extensive amount of information. Reading it requires time and human intellectual resources, which makes decision making harder. Automated summaries that can be produced quickly save human resources and effort and support the decision-making process.
Goal of automated summarization research:
The goal of this area of research in natural language processing (NLP) and artificial intelligence is to generate automated abstracts that closely resemble human-written ones. However, more work is needed to move beyond extractive summaries.
Difference between abstract and extract summaries:
The basic difference between an abstract and an extract is as follows:
(1) In an extract, words or sentences are selected from the original text and then combined in the order in which they appear in the original text. Keywords can also be identified with extraction techniques.
(2) An abstract is better organized and sequenced: words are paraphrased or new words are used, and the abstract has (or should have) the ability to replace the original text. Research is still working toward this level.
Types of summaries:
Summaries are generally classified as extracts or abstracts, but research has produced several other kinds of summaries, including the following:
Outlines of a document, headlines of news articles, and snippets, which summarize a web page when we search with a search engine.
Single-document or multi-document summaries, depending on how many source texts are summarized.
Generic summaries: these give the significant information of a text without focusing on a particular user or on user-relevant information.
Query-based summaries: when a computer answers a complex question, it applies summarization after information retrieval and returns a summary of the information relevant to the user's query. Snippets are also summaries of this kind.
Single-document summarization of technical articles:
Summaries produced from a single technical article are generally extracts. According to Jurafsky and Martin, whenever a computer program has to summarize a single document, it faces three general steps (or problems):
(1) Content selection: which parts of the original text should be selected? When summarizing a single technical article, content is selected at the level of sentences; this is generally assumed before a summarizer tool is programmed.
(2) Information ordering: the second problem is how to arrange the extracted sentences. This ordering of information determines the structure of the summary.
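The simplest ordering strategy for a single-document extract keeps the selected sentences in the order in which they appear in the original text, as described for extracts earlier in this assignment. The sketch below assumes that the content-selection step returns sentence indices; the function name `order_by_position` is hypothetical.

```python
def order_by_position(selected_indices, sentences):
    """Return the selected sentences in their original document order.

    selected_indices: indices chosen by content selection, in any order
    sentences: the full list of sentences of the document
    """
    return [sentences[i] for i in sorted(set(selected_indices))]


doc = ["S0 intro.", "S1 method.", "S2 results.", "S3 conclusion."]
print(order_by_position([3, 1], doc))  # ['S1 method.', 'S3 conclusion.']
```

More elaborate orderings, such as grouping sentences that share words as described in step (3), would replace the plain `sorted` call.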
(3) Sentence realization: the third problem is to make the arranged sentences fit together in the context of the summary. To achieve this, some portions of sentences are removed, while other portions are kept because they are important for contextual clarity. Insignificant phrases are removed, and sentences that share similar words are placed together to make a coherent summary.
Flow diagram of a generic single-document summarizer:
(1) Content selection:
Content selection can be of two kinds.
(A) Unsupervised content selection:
Content selection can be viewed as sentence classification: each sentence receives a binary label (important vs. unimportant, or extract-worthy vs. not extract-worthy).
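The simplest unsupervised algorithm, devised by Luhn (1958), selects the most salient, or information-carrying, sentences. Salience can be calculated from raw word frequency, but nowadays it is usually calculated with a weighting scheme such as tf-idf:
$$\mathrm{weight}(w_i) = \mathrm{tf}_{ij} \times \mathrm{idf}_i, \qquad \mathrm{idf}_i = \log\frac{N}{n_i}$$
where $\mathrm{tf}_{ij}$ is the frequency of word $w_i$ in document $j$, $N$ is the number of documents in a collection, and $n_i$ is the number of those documents that contain $w_i$.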
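A minimal sketch of tf-idf-based unsupervised content selection follows. It assumes a simple regular-expression tokenizer, a small background collection of documents for computing idf, and a sentence score equal to the mean tf-idf weight of the sentence's words. This is a tf-idf variant of the salience idea, not Luhn's original significance measure, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased alphabetic word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(background_docs):
    """idf_i = log(N / n_i): N documents, n_i of them containing word w_i."""
    doc_freq = Counter()
    for doc in background_docs:
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(background_docs)
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def select_sentences(sentences, idf, k=2):
    """Return indices of the k sentences with the highest mean tf-idf weight.
    Words missing from the idf table get weight 0."""
    tf = Counter(w for s in sentences for w in tokenize(s))  # tf_ij for this document j

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        return sum(tf[w] * idf.get(w, 0.0) for w in words) / len(words)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return sorted(ranked[:k])  # keep the chosen sentences in document order


sentences = [
    "Summarization selects salient sentences.",
    "The weather was nice.",
    "Salient sentences carry summarization content.",
]
background = [" ".join(sentences), "The weather was nice today.", "The cat was nice."]
print(select_sentences(sentences, idf_table(background), k=2))  # [0, 2]
```

Words such as "the" and "was" occur in every background document, so their idf is 0 and they add nothing to a sentence's salience.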
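(B) Supervised content selection:
A classifier is trained to label sentences, using features such as:
Position: sentences in certain positions are more likely to be selected, for example in the ranked order T1, P2S1, P3S1, P4S1, P1S1, P2S2 (T = title, P = paragraph, S = sentence, so P2S1 is the first sentence of the second paragraph).
Cue phrases: e.g. "in short", "in conclusion", "in summary".
Word significance: sentences that contain more salient words.
Sentence length: very short sentences are rarely selected.
Cohesion: a sentence that contains more terms belonging to lexical chains is more significant.
Probability: a sentence $s$ is extracted if
$$P(\text{extract-worthy}(s) \mid f_1, f_2, \ldots, f_n) > 0.5$$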
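The decision rule above can be sketched with a logistic model over a few of the listed features. The feature set, weights, and bias below are hypothetical placeholders chosen for illustration; a real system would learn them from labeled training data, and other classifiers (for example naive Bayes) can estimate the same probability.

```python
import math

CUE_PHRASES = ("in short", "in conclusion", "in summary")


def features(sentence, position_in_paragraph):
    """Hypothetical binary features f_1..f_3 for one sentence."""
    text = sentence.lower()
    return [
        1.0 if position_in_paragraph == 0 else 0.0,           # position: first sentence of a paragraph
        1.0 if any(c in text for c in CUE_PHRASES) else 0.0,  # contains a cue phrase
        1.0 if len(text.split()) >= 5 else 0.0,               # not a very short sentence
    ]


# Hypothetical weights and bias; a trained classifier would learn these.
WEIGHTS = [1.5, 2.0, 1.0]
BIAS = -2.0


def p_extract_worthy(feats):
    """Logistic estimate of P(extract-worthy(s) | f_1, ..., f_n)."""
    z = BIAS + sum(w * f for w, f in zip(WEIGHTS, feats))
    return 1.0 / (1.0 + math.exp(-z))


def is_extract_worthy(sentence, position_in_paragraph, threshold=0.5):
    """Extract the sentence when the estimated probability exceeds the threshold."""
    return p_extract_worthy(features(sentence, position_in_paragraph)) > threshold


print(is_extract_worthy("In conclusion, the proposed method improves recall.", 3))  # True
print(is_extract_worthy("Thanks.", 2))  # False
```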
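Alignment: to label training data for a supervised classifier, sentences in human-written abstracts are aligned with sentences in the source documents of a parallel corpus (documents paired with their abstracts). Alignment algorithms such as HMMs (Jing, 2002) can be used for this.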
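The sketch below illustrates how alignment turns a document and its human abstract into training labels. It deliberately swaps in a much simpler technique than an HMM: each abstract sentence is aligned to the document sentence with the highest Jaccard word overlap. The function names are hypothetical.

```python
import re


def words(text):
    """Set of lowercased alphabetic words in a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))


def align(abstract_sentences, doc_sentences):
    """For each abstract sentence, return the index of the document sentence
    with the highest Jaccard word overlap (a simple stand-in for HMM alignment)."""
    alignment = []
    for a in abstract_sentences:
        aw = words(a)

        def overlap(i):
            dw = words(doc_sentences[i])
            union = aw | dw
            return len(aw & dw) / len(union) if union else 0.0

        alignment.append(max(range(len(doc_sentences)), key=overlap))
    return alignment


def label_sentences(abstract_sentences, doc_sentences):
    """Binary training labels: True if a document sentence was aligned to the abstract."""
    aligned = set(align(abstract_sentences, doc_sentences))
    return [i in aligned for i in range(len(doc_sentences))]


doc = ["We propose a new parser.", "The weather is nice.", "The parser is fast and accurate."]
abstract = ["A new parser is proposed.", "It is fast and accurate."]
print(label_sentences(abstract, doc))  # [True, False, True]
```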
Sentence simplification:
Sentence simplification, also known as sentence "compression", uses a parser or partial parser and applies elimination rules such as those proposed by Zajic et al. (2007), Conroy et al. (2006), and Vanderwende et al. (2007a). These rules remove appositives, attribution clauses, prepositional phrases without named entities, and initial adverbials.
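The rules cited above operate on parse trees. As a rough illustration only, the sketch below approximates two of them (initial adverbials and sentence-final attribution clauses) with regular expressions on plain strings. The adverbial list and the attribution pattern are hypothetical and will miss or over-match many real sentences.

```python
import re

# Hypothetical list of initial adverbials to delete.
INITIAL_ADVERBIALS = ("for example", "on the other hand", "as a matter of fact", "at this point")
# Sentence-final attribution such as ", international observers said Tuesday."
ATTRIBUTION = re.compile(r",\s+[\w\s]+?\s+said(\s+\w+)?\s*\.$")


def simplify(sentence):
    """Remove an initial adverbial and a trailing attribution clause, if present."""
    s = sentence.strip()
    lower = s.lower()
    for adv in INITIAL_ADVERBIALS:
        if lower.startswith(adv + ","):
            s = s[len(adv) + 1:].lstrip()  # drop the adverbial and its comma
            if s:
                s = s[0].upper() + s[1:]   # re-capitalize the new first word
            break
    return ATTRIBUTION.sub(".", s)


print(simplify("For example, the parser removes appositives."))
# The parser removes appositives.
print(simplify("Rebels agreed to talks with government officials, international observers said Tuesday."))
# Rebels agreed to talks with government officials.
```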
Evaluation of summarizers:
Recall: the fraction of the sentences chosen by humans that the system also selected:
$$\text{Recall} = \frac{|\text{system} \cap \text{human}|}{|\text{sentences selected by human}|}$$
Precision: the fraction of the sentences selected by the system that are correct, i.e. also chosen by humans:
$$\text{Precision} = \frac{|\text{system} \cap \text{human}|}{|\text{sentences selected by system}|}$$
F1 score: the harmonic mean of precision and recall:
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
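A minimal sketch of these three sentence-level measures, assuming that both the system extract and the human extract are given as sets of sentence indices. The function name `evaluate` is hypothetical.

```python
def evaluate(system, human):
    """Sentence-level precision, recall and F1 of a system extract
    against a human extract (both given as sentence indices)."""
    system, human = set(system), set(human)
    overlap = len(system & human)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(human) if human else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = evaluate(system=[0, 2, 5], human=[0, 1, 2, 3])
print(round(p, 3), r, round(f, 3))  # 0.667 0.5 0.571
```

Here two of the three system sentences are also human choices (precision 2/3), and two of the four human sentences were found (recall 1/2), giving F1 = 4/7.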
Automated metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) compare a system summary with human reference summaries automatically, which makes them far cheaper and faster than manual evaluation.
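A minimal sketch of ROUGE-N recall: the number of reference n-grams that also occur in the candidate summary (with counts clipped), divided by the total number of n-grams in the references. It assumes lowercased whitespace tokenization and is not the official ROUGE toolkit, which adds options such as stemming and stopword removal. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=2):
    """ROUGE-N recall: clipped n-gram matches / total n-grams in the references."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matched += sum(min(count, cand[g]) for g, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


refs = ["the cat lay on the mat"]
print(rouge_n("the cat sat on the mat", refs, n=1))  # 0.8333333333333334
print(rouge_n("the cat sat on the mat", refs, n=2))  # 0.6
```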
References:
(1) Hovy, E. H. Automated Text Summarization. In R. Mitkov (ed.), The Oxford Handbook of Computational Linguistics, chapter 32, pages 583–598. Oxford University Press, 2005.
(2) Mani, I., House, D., Klein, G., et al. The TIPSTER SUMMAC Text Summarization Evaluation. In Proceedings of EACL, 1999.
(3) Luhn, H. P. The Automatic Creation of Literature Abstracts. In I. Mani and M. Maybury (eds.), Advances in Automatic Text Summarization. MIT Press, 1999.