
The Halliday Centre Tagger: An Online Platform for Semi-automatic Text Annotation and Analysis


Billy T.M. Wong,1 Ian C. Chow,2 Jonathan J. Webster,3 Hengbin Yan4
1,2 Department of Translation, The Chinese University of Hong Kong
Shatin, NT, Hong Kong SAR
3,4 The Halliday Centre for Intelligent Applications of Language Studies, City University of Hong Kong
Tat Chee Avenue, Kowloon, Hong Kong SAR

Abstract
This paper reports the latest development of The Halliday Centre Tagger (the Tagger), an online platform provided with
semi-automatic features to facilitate text annotation and analysis. The Tagger features a web-based architecture with all functionalities
and file storage space provided online, and a theory-neutral design where users can define their own labels for annotating various kinds
of linguistic information. The Tagger is currently optimized for text annotation of Systemic Functional Grammar (SFG), providing by
default a pre-defined set of SFG grammatical features, and the function of automatic identification of process types for English verbs.
Apart from annotation, the Tagger also offers the features of visualization and summarization to aid text analysis. The visualization
feature combines and illustrates multi-dimensional layers of annotation in a unified way of presentation, while the summarization
feature categorizes annotated entries according to different SFG systems, e.g., transitivity, theme and logical-semantic relations. Such
features help users identify grammatical patterns in an annotated text.

Keywords: The Halliday Centre Tagger, Systemic Functional Grammar, Corpus Annotation

1. Introduction
Annotation enriches a text with linguistic information that is implicitly present, turning it into a
reusable resource for tasks such as language study and the development of natural language
processing (NLP) technology. Many available corpora, such as the Penn Treebank,1 the British
National Corpus2 and the American National Corpus,3 are annotated with basic linguistic
information such as parts of speech, named entities and syntactic structures, with the aid of
automatic taggers.
Our current work lies in the development of a corpus annotated according to Systemic Functional
Grammar (SFG) (Yan & Webster, 2013). SFG describes the realization of meaning in language through
a paradigmatic set of functional-semantic choices. SFG annotation involves a multi-dimensional
analysis of text based on three meta-functions, i.e., ideational, interpersonal and textual, each
representing a layer of meaning with a set of options for annotators to pick. Although there are a
few tools for SFG annotation, such as Systemic Coder (O’Donnell, 1995), SysFan (Wu, 2000), LBIS
Coder (Sugimoto et al., 2005) and UAM CorpusTool (O’Donnell, 2008), they are limited by being
standalone offline applications that offer no support for collaboration and provide no features to
assist manual annotation, which highlights the need for a better tool for the task.
The Halliday Centre Tagger (the Tagger)4 (Chan et al., 2012; Yan & Webster, 2012; Wong et al.,
2013) is an online platform provided with semi-automatic features to facilitate SFG-based text
annotation and analysis. Annotation is performed online with support for collaboration. In order to
reduce annotation effort, some grammatical options are semi-automatically identified. The annotated
entries in a text can be visualized and summarized, to help users intuitively identify and locate
the occurrence of patterning.

1 http://catalog.ldc.upenn.edu/LDC99T42
2 http://www.natcorp.ox.ac.uk
3 http://www.americannationalcorpus.org
4 http://hallidaycentre.cityu.edu.hk/06_hctagger.html

2. The Halliday Centre Tagger
The Tagger is distinguished by its web-based architecture. A user simply needs to upload a text to
the Tagger and then use the available functionalities to annotate it. All texts and annotations are
stored online in the user’s account and are web-accessible. Collaborative annotation of the same
text by multiple users is supported, either synchronously or asynchronously. Although the Tagger is
optimized for SFG annotation, it remains theory-neutral. Apart from a pre-defined set of SFG
grammatical features provided by default, users can also define their own labels for annotating
other types of linguistic information (Figure 1).

Figure 1: User-defined labels for annotation

The Tagger is also designed to support multi-dimensional annotation, with different kinds of
linguistic information annotated on the same text. They are represented as layers below a text
span, each showing a type of linguistic information (Figure 2).

Figure 2: The annotation interface of the Halliday Centre Tagger
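
The layered, theory-neutral design described in Section 2 can be illustrated with a minimal Python sketch. It is not the Tagger's actual data model: the class names (Annotation, AnnotatedText), the layer names and the example labels are all hypothetical, chosen only to show how several independent layers with user-defined label sets can sit on the same text spans.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One label attached to a character span on a named layer (hypothetical model)."""
    layer: str   # e.g. "transitivity", or any user-defined layer name
    label: str   # e.g. "Actor"
    start: int   # inclusive character offset
    end: int     # exclusive character offset


class AnnotatedText:
    """A text plus any number of independent annotation layers."""

    def __init__(self, text, schemes):
        self.text = text
        # schemes maps a layer name to the labels allowed on that layer;
        # users may supply their own layers and labels (theory-neutral).
        self.schemes = {layer: set(labels) for layer, labels in schemes.items()}
        self.annotations = []

    def annotate(self, layer, label, start, end):
        """Attach a label to the span [start, end) on the given layer."""
        if label not in self.schemes.get(layer, ()):
            raise ValueError(f"{label!r} is not defined for layer {layer!r}")
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span is outside the text")
        ann = Annotation(layer, label, start, end)
        self.annotations.append(ann)
        return ann

    def layers_for_span(self, start, end):
        """Group the annotations overlapping [start, end) by layer."""
        by_layer = defaultdict(list)
        for ann in self.annotations:
            if ann.start < end and start < ann.end:
                by_layer[ann.layer].append((self.text[ann.start:ann.end], ann.label))
        return dict(by_layer)


# Example: three layers over one clause, one of them user-defined.
doc = AnnotatedText(
    "The cat chased the mouse",
    {
        "transitivity": {"Actor", "Process: material", "Goal"},
        "theme": {"Theme", "Rheme"},
        "my_layer": {"animate"},  # hypothetical user-defined layer
    },
)
doc.annotate("transitivity", "Actor", 0, 7)
doc.annotate("transitivity", "Process: material", 8, 14)
doc.annotate("transitivity", "Goal", 15, 24)
doc.annotate("theme", "Theme", 0, 7)
doc.annotate("theme", "Rheme", 8, 24)
doc.annotate("my_layer", "animate", 4, 7)

print(doc.layers_for_span(0, 7))
```

Keeping each layer's label set separate from the annotations themselves is one simple way to let users add their own labels without touching the pre-defined SFG layers.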

3. Semi-automatic Annotation
Manual SFG annotation requires considerable human effort as it involves multi-dimensional text
analysis. Progress in the development of automatic SFG parsing remains rudimentary. Nevertheless,
based on our previous work, we have developed and integrated into the Tagger some semi-automatic
features intended to improve the productivity of annotation.
SFG annotation involves text analysis in terms of the three meta-functions: ideational,
interpersonal and textual. Each has its own set of grammatical and semantic categories denoting
different underlying functions from different perspectives.
We focus on semi-automatic analysis of the ideational meta-function, which consists of logical and
experiential meanings. The experiential meaning focuses on construing the flux of experience and is
structurally realized by the system of transitivity. Taking the clause as the most basic
lexico-grammatical unit, clause constituents include a process and possibly one or more
participants and circumstances. The process is typically a verb, defining the type of experiential
meaning and governing the semantic roles of participants.

3.1 Process Type Identification
We provide the Tagger with the feature of semi-automatic identification of process types for
English verbs. This is based on a lexicographical database of SFG process types developed by Chow
(2008), in which the process type of each English verb-sense in WordNet5 is identified by utilizing
various interoperable lexical and ontological resources including GUM,6 FrameNet7 and SUMO8 (Chow &
Webster, 2007, 2008). For the English verbs in WordNet, over 95% of verb-senses are identified with
corresponding SFG process types.
The identification of the process type for an English verb begins with word sense disambiguation
(WSD), in order to first identify the possible sense(s) of the verb based on its context of
occurrence. We employ a WordNet-based WSD system, WordNet::SenseRelate::AllWords (SR-AW) (Pedersen &
Kolhatkar, 2009), for this purpose, which performs satisfactorily in determining word senses.9 The
process type of each possible word sense is then identified with the use of the lexicographical
database.
The feature of process type identification is provided on the Tagger by eliminating (i.e., greying
out) the improbable process type options when annotating English verbs. Upon the user's selection
of a correct process type, the improbable options of semantic roles (participants) in the clause
being annotated, which depend on the choice of process type, are then greyed out. Users only need
to pick the correct ones from a reduced set of options, which substantially reduces the manual
effort of annotation, especially for semantic roles, which have more than twenty possible options.

5 http://wordnet.princeton.edu
6 http://glotta.ntua.gr/StateoftheArt/Ontologies/newUM.html
7 https://framenet.icsi.berkeley.edu/fndrupal
8 http://www.ontologyportal.org
9 According to Pedersen and Kolhatkar (2009), SR-AW yields F-measure results of 61% in SemCor, 59%
in SENSEVAL-2 and 54% in SENSEVAL-3.

An experiment was performed to verify the effectiveness of this semi-automatic feature. In our current annotation
project, a set of annotated speeches of 2,426 words was selected as test data, containing 238
clauses in total. For process type identification, 167 out of 231 process types were correctly
identified, i.e., an accuracy rate of 72%. Most errors originate in the WSD system, where parts of
speech are incorrectly identified or phrasal verbs are not recognized.

4. Features for Text Analysis
Apart from annotation, the Tagger also provides the features of visualization and summarization to
aid text analysis based on the annotation. By reorganizing the presentation of annotated entries,
these features help users to identify grammatical patterns in the annotated texts.

4.1 Visualization
The visualization feature combines and illustrates the multi-dimensional annotation in a unified
presentation. Based on the Brat rapid annotation tool (Stenetorp et al., 2012), the
multi-dimensional meta-functions in a text are now visualized in a color-coded, interactive and
presentable manner (Figure 3). It shows the distribution of different SFG annotated entries in a
text, which helps one to intuitively identify and locate the occurrence of patterning.

Figure 3: The visualization feature

4.2 Summarization
The summarization feature categorizes annotated entries according to different SFG systems, e.g.,
transitivity, theme and logical-semantic relations. The use of various constituents in a text is
systematically summarized, with statistics showing their occurrence frequency (Figure 4). Users can
opt for an overall or focused summary depending on their interest in the use of all or particular
constituents. The summary is provided in a table format popularly used in the study of SFG, saving
users considerable time in preparing this kind of material on their own.
Both features are fully automatic, requiring no human intervention. Users only need to work on the
annotation. The visualized and summarized results can then be generated by the system.

Figure 4: The summarization feature

5. Summary
The latest development of the Halliday Centre Tagger has increased the automatic capability of the
SFG annotation tool. The web-based architecture makes it convenient for users to access the Tagger
and manage annotation tasks online, and supports collaboration on annotation with other users. The
theory-neutral design allows users to flexibly define their own labels for annotation of any kind
of linguistic information. The feature of semi-automatic process type identification can raise
users’ productivity. The features of visualization and summarization help users to carry out text
analysis, in addition to the annotation work itself. The Tagger is currently employed in our
project of developing an SFG-annotated corpus for corpus-based study and development of related NLP
technology for SFG, a field in which progress remains rudimentary. The ongoing progress and some
intermediate statistics will be presented.

6. Acknowledgements
The research described in this paper was supported by City University of Hong Kong through the SRG
grants 7002802 and 7004163.

7. References
Chan, C.L., Yan, H., Lee, S.Y., Webster, J. and Wong, H.K. (2012). A database design for complex
linguistic data in collaborative Web application. In Proceedings of the Second International
Conference on Digital Information and Communication Technology and it's Applications (DICTAP).
Bangkok, Thailand, pp. 159–165.
Chow, I.C. (2008). Constructing a Linguistic Resource of Verbs: An Ontology Engineering Approach.
PhD dissertation. City University of Hong Kong: Hong Kong.
Chow, I.C. and Webster, J.J. (2007). Integration of linguistic resources for verb classification:
FrameNet frame, WordNet verb and SUMO. Lecture Notes in Computer Science (LNCS), Vol. 4394, pp.
1–11.
Chow, I.C. and Webster, J.J. (2008). Supervised clustering of the WordNet verb hierarchy for
Systemic Functional Process type identification. In Proceedings of International Conference on
Global Interoperability for Language Resources (ICGL 2008). Hong Kong, pp. 51–58.
O’Donnell, M. (1995). From corpus to codings: Semi-automating the acquisition of linguistic
features. In Proceedings of the AAAI Spring Symposium on Empirical Methods in Discourse
Interpretation and Generation. Stanford University, California.
O’Donnell, M. (2008). The UAM CorpusTool: Software for corpus annotation and exploration. In
Bretones Callejas, Carmen M. et al. (eds) Applied Linguistics Now: Understanding Language and Mind
/ La Lingüística Aplicada Hoy: Comprendiendo el Lenguaje y la Mente. Almería: Universidad de
Almería, pp. 1433–1447.
Pedersen, T. and Kolhatkar, V. (2009). WordNet::SenseRelate::AllWords - A broad coverage word sense
tagger that maximizes semantic relatedness. In Proceedings of the North American Chapter of the
Association for Computational Linguistics - Human Language Technologies Conference (NAACL-HLT
2009). Boulder, CO, pp. 17–20.
Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S. and Tsujii, J. (2012). Brat: a
Web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations Session at
the 13th Conference of the European Chapter of the Association for Computational Linguistics
(EACL). Avignon, France, pp. 102–107.
Sugimoto, T., Ito, N., Iwashita, S. and Sugeno, M. (2005). A computational framework for text
processing based on systemic functional linguistics. In Proceedings of the 1st Computational
Systemic Functional Grammar Conference. University of Sydney, pp. 2–11.
Wong, B.T.M., Pun, C.F.K. and Webster, J.J. (2013). The Halliday Centre Tagger for corpus-based
study of Systemic Functional Linguistics. In Proceedings of the 7th Annual International Free
Linguistics Conference, Hong Kong.
Wu, C. (2000). Modelling Linguistic Resources: A Systemic-functional Approach. PhD thesis,
Department of Linguistics, Macquarie University, Sydney, Australia.
Yan, H. and Webster, J. (2012). Collaborative annotation and visualization of functional and
discourse structures. In Proceedings of the 24th Conference on Computational Linguistics and Speech
Processing (ROCLING). Taiwan, pp. 366–374.
Yan, H. and Webster, J. (2013). A corpus-based approach to linguistic function. In Proceedings of
the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27). Taiwan, pp.
215–221.
