Halliday Tagger PDF
Abstract
This paper reports the latest development of The Halliday Centre Tagger (the Tagger), an online platform provided with
semi-automatic features to facilitate text annotation and analysis. The Tagger features a web-based architecture with all functionalities
and file storage space provided online, and a theory-neutral design where users can define their own labels for annotating various kinds
of linguistic information. The Tagger is currently optimized for text annotation of Systemic Functional Grammar (SFG), providing by
default a pre-defined set of SFG grammatical features, and the function of automatic identification of process types for English verbs.
Apart from annotation, the Tagger also offers visualization and summarization features to aid text analysis. The visualization
feature combines and illustrates multi-dimensional layers of annotation in a unified presentation, while the summarization
feature categorizes annotated entries according to different SFG systems, e.g., transitivity, theme, and logico-semantic relations. Such
features help users identify grammatical patterns in an annotated text.
Keywords: The Halliday Centre Tagger, Systemic Functional Grammar, Corpus Annotation
Figure 2: The annotation interface of the Halliday Centre Tagger
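As a rough illustration of the theory-neutral, multi-layer annotation described in the abstract, the following sketch stores each annotation as a labelled character span and counts labels per layer. It is a minimal sketch under stated assumptions, not the Tagger's actual data model: the names `Annotation` and `summarize`, the sample sentence, and the sample labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One stand-off annotation: a labelled character span (hypothetical model)."""
    start: int   # inclusive character offset into the text
    end: int     # exclusive character offset into the text
    system: str  # user-defined layer, e.g. "transitivity" or "theme"
    label: str   # user-defined label within that layer


def summarize(annotations):
    """Count how often each label occurs within each system."""
    summary = {}
    for ann in annotations:
        summary.setdefault(ann.system, Counter())[ann.label] += 1
    return summary


text = "The cat chased the mouse."
annotations = [
    Annotation(0, 7, "transitivity", "Actor"),
    Annotation(8, 14, "transitivity", "Process: material"),
    Annotation(15, 24, "transitivity", "Goal"),
    Annotation(0, 7, "theme", "Theme"),
    Annotation(8, 24, "theme", "Rheme"),
]

# Print each layer with its label frequencies.
for system, counts in summarize(annotations).items():
    print(system, dict(counts))
```

Because the layer and label names are plain strings, the same structure would hold labels from any annotation scheme, and several layers can cover the same span of text.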
Figure 3: The visualization feature
project, a set of annotated speeches of 2,426 words was selected as test data, containing 238 clauses in total.
For process type identification, 167 out of 231 process types were correctly identified, i.e., an accuracy rate of
72%. Most errors come from the word sense disambiguation (WSD) system: parts of speech are incorrectly identified,
or phrasal verbs are not recognized.
4. Features for Text Analysis
Apart from annotation, the Tagger also provides visualization and summarization features to aid text
analysis based on the annotation. By reorganizing the presentation of annotated entries, these features help
users identify grammatical patterns in the annotated texts.
4.1 Visualization
The visualization feature combines and illustrates the multi-dimensional annotation in a unified
presentation. Based on the brat rapid annotation tool (Stenetorp et al., 2012), the multi-dimensional
metafunctions in a text are now visualized in a color-coded, interactive and presentable manner (Figure 3).
The display shows the distribution of different SFG annotated entries in a text, which helps users
intuitively identify and locate patterns.
4.2 Summarization
The summarization feature categorizes annotated entries according to different SFG systems, e.g., transitivity,
theme, and logico-semantic relations. The use of various constituents in a text is systematically summarized, with
statistics showing their frequency of occurrence (Figure 4). Users can opt for an overall or a focused summary,
depending on whether they are interested in all constituents or in particular ones. The summary is provided in a
table format widely used in the study of SFG, saving users considerable time in preparing this kind of material
on their own.
Both features are fully automatic, requiring no human intervention. Users only need to work on the annotation;
the visualized and summarized results can then be generated by the system.
5. Summary
The latest development of the Halliday Centre Tagger has increased the automatic capability of the SFG annotation
tool. The web-based architecture gives users convenient access to the Tagger and online management of
annotation tasks, and supports collaboration on annotation with other users. The theory-neutral design
allows users to flexibly define their own labels for annotating any kind of linguistic information. The
semi-automatic process type identification feature can raise users' productivity. The visualization and
summarization features help users carry out text analysis in addition to annotation. The Tagger is
currently employed in our project to develop an SFG-annotated corpus for corpus-based study and for the
development of related NLP technology for SFG, an area in which progress so far remains rudimentary. The ongoing
progress and some intermediate statistics will be presented.
6. Acknowledgements
The research described in this paper was supported by City University of Hong Kong through the SRG grants
7002802 and 7004163.
7. References
Chan, C.L., Yan, H., Lee, S.Y., Webster, J. and Wong, H.K. (2012). A database design for complex linguistic data in collaborative Web application. In Proceedings of the Second International Conference on Digital Information and Communication Technology and its Applications (DICTAP). Bangkok, Thailand, pp. 159--165.
Chow, I.C. (2008). Constructing a Linguistic Resource of Verbs: An Ontology Engineering Approach. PhD dissertation. City University of Hong Kong: Hong Kong.
Chow, I.C. and Webster, J.J. (2007). Integration of linguistic resources for verb classification: FrameNet frame, WordNet verb and SUMO. Lecture Notes in Computer Science (LNCS), Vol. 4394, pp. 1--11.
Chow, I.C. and Webster, J.J. (2008). Supervised clustering of the WordNet verb hierarchy for Systemic Functional Process type identification. In Proceedings of International Conference on Global Interoperability for Language Resources (ICGL 2008). Hong Kong, pp. 51--58.
O’Donnell, M. (1995). From corpus to codings: Semi-automating the acquisition of linguistic features. In Proceedings of the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation. Stanford University, California.
O’Donnell, M. (2008). The UAM CorpusTool: Software for corpus annotation and exploration. In Bretones Callejas, Carmen M. et al. (eds) Applied Linguistics Now: Understanding Language and Mind / La Lingüística Aplicada Hoy: Comprendiendo el Lenguaje y la Mente. Almería: Universidad de Almería, pp. 1433--1447.
Figure 4: The Summarization feature
Pedersen, T. and Kolhatkar, V. (2009). WordNet::SenseRelate::AllWords - A broad coverage word sense tagger that maximizes semantic relatedness. In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference (NAACL-HLT 2009). Boulder, CO, pp. 17--20.
Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S. and Tsujii, J. (2012). Brat: a Web-based tool for NLP-assisted text annotation. In Proceedings of the Demonstrations Session at the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Avignon, France, pp. 102--107.
Sugimoto, T., Ito, N., Iwashita, S. and Sugeno, M. (2005). A computational framework for text processing based on systemic functional linguistics. In Proceedings of the 1st Computational Systemic Functional Grammar Conference. University of Sydney, pp. 2--11.
Wong, B. T.M., Pun, C. F.K. and Webster, J.J. (2013). The Halliday Centre Tagger for corpus-based study of Systemic Functional Linguistics. In Proceedings of the 7th Annual International Free Linguistics Conference, Hong Kong.
Wu, C. (2000). Modelling Linguistic Resources: A Systemic-functional Approach. PhD thesis, Department of Linguistics, Macquarie University, Sydney, Australia.
Yan, H. and Webster, J. (2012). Collaborative annotation and visualization of functional and discourse structures. In Proceedings of the 24th Conference on Computational Linguistics and Speech Processing (ROCLING). Taiwan, pp. 366--374.
Yan, H. and Webster, J. (2013). A corpus-based approach to linguistic function. In Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27). Taiwan, pp. 215--221.