2007 - Shawar - Are Chatbots Useful in Education
Chatbots are computer programs that interact with users using natural language. This technology started in the 1960s; the aim was to see if chatbot systems could fool users into believing that they were real humans. However, chatbot systems are not only built to mimic human conversation and entertain users. In this paper, we investigate other applications where chatbots could be useful, such as education, information retrieval, business, and e-commerce. A range of chatbots with useful applications, including several based on the ALICE/AIML architecture, are presented in this paper.
1 Introduction
The need of conversational agents has become acute with the widespread use of personal machines with the wish to communicate and the desire of their makers to provide natural language interfaces (Wilks, 1999). Just as people use language for human communication, people want to use their language to communicate with computers. Zadrozny et al. (2000) agreed that the best way to facilitate Human Computer Interaction (HCI) is by allowing users to express their interest, wishes, or queries directly and naturally, by speaking, typing, and pointing. This was the driver behind the development of chatbots. A chatbot system is a software program that interacts with users using natural language. Different terms have been used for a chatbot, such as: machine conversation system, virtual agent, dialogue system, and chatterbot. The purpose of a chatbot system is to simulate a human conversation; the chatbot architecture integrates a language model and computational algo-
2 The ALICE Chatbot System
A.L.I.C.E. (Artificial Intelligence Foundation, 2007; Abu Shawar and Atwell, 2003a; Wallace, 2003) is the Artificial Linguistic Internet Computer Entity, which was first implemented by Wallace in 1995. Alice's knowledge about English conversation patterns is stored in AIML files. AIML, or Artificial Intelligence Mark-up Language, is a derivative of Extensible Mark-up Language (XML). It was developed by Wallace and the Alicebot free software community from 1995 onwards to enable people to input dialogue pattern knowledge into chatbots based on the A.L.I.C.E. open-source software technology. AIML consists of data objects called AIML objects, which are made up of units called topics and categories. The topic is an optional top-level element; it has a name attribute and a set of categories related to that topic. Categories are the basic unit of knowledge in AIML. Each category is a rule for matching an input and converting to an output, and consists of a pattern, which matches against the user input, and a template, which
LDV-FORUM
The <that> tag is optional and means that the current pattern depends on a previous chatbot output. The AIML pattern is simple, consisting only of words, spaces, and the wildcard symbols _ and *. The words may consist of letters and numerals, but no other characters. Words are separated by a single space, and the wildcard characters function like words. The pattern language is case invariant. The idea of the pattern matching technique is based on finding the best, longest pattern match.
2.1 Types of ALICE/AIML Categories
There are three types of categories: atomic categories, default categories, and recursive categories.
a. Atomic categories are those with patterns that do not have the wildcard symbols _ and *, e.g.:
<category>
  <pattern>10 Dollars</pattern>
  <template>Wow, that is cheap.</template>
</category>
In the above category, if the user inputs "10 dollars", then ALICE answers "Wow, that is cheap."
b. Default categories are those with patterns having the wildcard symbols * or _. The wildcard symbols match any input but they differ in their alphabetical order. Assuming the previous input "10 Dollars", if the robot does not find the previous category with an atomic pattern, then it will try to find a category with a default pattern such as:
<category>
  <pattern>10 *</pattern>
  <template>It is ten.</template>
</category>
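The precedence between atomic and default categories described above can be sketched in Python. This is a minimal illustration, not the ALICE implementation: the names `matches`, `rank`, and `respond` are hypothetical, and the ranking rule (at each word position `_` is preferred over an exact word, which is preferred over `*`) is an assumed simplification of the real matching order.

```python
def matches(pattern, words):
    """True if the pattern words cover all input words.

    A wildcard ('_' or '*') absorbs one or more input words.
    """
    if not pattern:
        return not words
    head, rest = pattern[0], pattern[1:]
    if head in ("_", "*"):
        # Try every non-empty run of input words for the wildcard.
        return any(matches(rest, words[i:]) for i in range(1, len(words) + 1))
    return bool(words) and words[0] == head and matches(rest, words[1:])


def rank(pattern):
    """Assumed priority per position: '_' (0) < exact word (1) < '*' (2)."""
    return [0 if w == "_" else 2 if w == "*" else 1 for w in pattern]


def respond(categories, user_input):
    """Return the template of the best-ranked matching category, or None."""
    words = user_input.upper().split()  # patterns are case invariant
    hits = []
    for pattern_text, template in categories:
        pattern = pattern_text.upper().split()
        if matches(pattern, words):
            hits.append((rank(pattern), template))
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[0])[1]


# The two categories from the text, as (pattern, template) pairs.
CATEGORIES = [
    ("10 Dollars", "Wow, that is cheap."),
    ("10 *", "It is ten."),
]

print(respond(CATEGORIES, "10 dollars"))  # atomic category wins
print(respond(CATEGORIES, "10 euros"))    # only the default category matches
```

With the two categories above, the input "10 dollars" is answered by the atomic category, while any other input starting with "10" falls through to the default one.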
In this example <srai> is used to reduce the input to the simpler form "what is *".
c.2 Divide and conquer
<category>
  <pattern>YES *</pattern>
  <template>
    <srai>YES</srai> <sr/>
  </template>
</category>
The input is partitioned into two parts, "yes" and the second part; * is matched with the <sr/> tag, where <sr/> = <srai><star/></srai>.
c.3 Synonyms
<category>
  <pattern>HALO</pattern>
  <template>
    <srai>Hello</srai>
  </template>
</category>
The input is mapped to another form, which has the same meaning.
2.2 ALICE Pattern Matching Algorithm
Before the matching process starts, a normalization process is applied to each input: all punctuation is removed, the input is split into two or more sentences if appropriate, and the text is converted to uppercase. For example, if the input is: "I do not know. Do you, or will you, have a robots.txt file?" then after normalization the second sentence will be: "DO YOU OR WILL YOU HAVE A ROBOTS DOT TXT FILE".
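The normalization step described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the ALICE code: the function name `normalize` is hypothetical, and the rule that a dot between two word characters is spelled out as "DOT" is inferred only from the robots.txt example.

```python
import re


def normalize(text):
    """Split input into sentences, strip punctuation, and convert to uppercase."""
    # Assumption: a dot inside a token (e.g. "robots.txt") is spelled out
    # so that it is not mistaken for a sentence boundary.
    text = re.sub(r"(?<=\w)\.(?=\w)", " dot ", text)
    sentences = []
    for part in re.split(r"[.!?]+", text):
        part = re.sub(r"[^\w\s]", " ", part)   # remove remaining punctuation
        part = " ".join(part.upper().split())  # uppercase, collapse spaces
        if part:
            sentences.append(part)
    return sentences


print(normalize("I do not know. Do you, or will you, have a robots.txt file?"))
# ['I DO NOT KNOW', 'DO YOU OR WILL YOU HAVE A ROBOTS DOT TXT FILE']
```

Each resulting sentence would then be matched against the AIML patterns separately.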
After applying the text processing module in phase two, the result is:
F72PS002: Hello
PS000: Hello Donald
The second version of the program has a more general approach to finding the best match against user input from the training dialogue. Two machine learning category-generation techniques were adapted: the first word approach and the most significant word approach. In the first word approach we assumed that the first word of an utterance may be a good clue to an appropriate response: if we cannot match the input against a complete corpus utterance, then at least we can try matching just the first word of a corpus utterance. For each atomic pattern, we generated a default version that holds the first word followed by a wildcard to match any text, and then associated it with the same atomic template. One advantage of the machine-learning approach to re-training ALICE is that we can automatically build AIML from a corpus even if we don't understand the domain or even the language; to demonstrate this, the program was tested using the Corpus of Spoken Afrikaans (van Rooy, 2003). Unfortunately this approach still failed to satisfy our trial users, who found some of the responses of the chatbot were inappropriate; so instead of simply assuming that the first word is the best signpost, we look for the word in the utterance with the highest information content, the word that is most specific to this utterance compared to other utterances in the corpus. This should be the
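The two category-generation ideas described above can be sketched in Python. This is a rough illustration under stated assumptions, not the authors' Java program: the function names and the toy training pairs are hypothetical, and "highest information content" is approximated here by the lowest word count in the corpus.

```python
from collections import Counter


def first_word_defaults(atomic_categories):
    """For each atomic (pattern, template), build a 'FIRSTWORD *' default category."""
    return [(pattern.split()[0] + " *", template)
            for pattern, template in atomic_categories if pattern.split()]


def most_significant_word(utterance, word_counts):
    """Pick the word of the utterance that is rarest in the corpus (assumed proxy)."""
    return min(utterance.upper().split(), key=lambda w: word_counts[w])


# Hypothetical training pairs: (corpus utterance, following turn).
TRAINING = [
    ("WHERE IS THE STATION", "NEXT TO THE BANK"),
    ("WHERE IS THE BANK", "ON THE HIGH STREET"),
    ("IS THE BANK OPEN", "YES IT IS"),
]
WORD_COUNTS = Counter(w for pattern, _ in TRAINING for w in pattern.split())

print(first_word_defaults(TRAINING)[0])
print(most_significant_word("where is the station", WORD_COUNTS))
```

Note that the first word approach can produce several default categories with the same pattern (here two with "WHERE *"); how such clashes are resolved is left open in this sketch.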
keyword match is found, such as "Very interesting. Please go on." or "Can you think of a special example?" Figure 2 shows an example of chatting with ELIZA. When ELIZA was released, at least some users believed that they were talking to a real therapist, and spent hours talking about their problems. Even though ELIZA was not able to understand, and a user can realise after a while
of chatting that many ELIZA responses are extracted from the user's input, it was the inspiration for many modern chatbots which aim mainly to fool users into believing that they are talking to another human, as applied in the imitation game (the Turing Test; Turing, 1950). After ELIZA, many chatbots or human-computer dialogue systems have been developed to simulate different fictional or real personalities using different pattern-matching algorithms, ranging from simple keyword matching as in ELIZA to more linguistically sophisticated models such as the Markov models used in MegaHAL (Hutchens and Alder, 1998). Another example in this field is ALICE, the chatbot engine we used in our research, which was built mainly to entertain users and talk to them like a real person. ALICE won the Loebner Prize competition (Loebner, 2003) three times, in 2000, 2001, and 2004. The Loebner competition is the method used nowadays to judge how well a chatbot can convince a user that it is a real human by chatting for 10 minutes. Figure 3 shows an example of chatting with ALICE. This conversation seems good; however, if you chat for longer, you will probably figure out that ALICE does not save the history of the conversation, and that ALICE does not truly understand what you said: it gives you responses from the knowledge domain stored in her "brain". These are also the most common drawbacks of almost all chatbots.
5 A Chatbot as a Tool to Learn and Practice a Language
We used our Java program, described in section 3, to read the Corpus of Spoken Afrikaans (Korpus Gesproke Afrikaans) (van Rooy, 2003) and to convert it to AIML format files. Since the corpus does not cover topics like greetings, some manual atomic categories were added for this purpose, and the default ones were generated by the program
automatically. As a result two Afrikaans chatbots were generated: Afrikaana (2002), which speaks just Afrikaans, and a bilingual version speaking English and Afrikaans, named AVRA (2002); this was inspired by our observation that the Korpus Gesproke Afrikaans actually includes some English, as Afrikaans speakers are generally bilingual and switch between languages comfortably. We mounted prototypes of the chatbots on websites using the Pandorabot service, and encouraged open-ended testing and feedback from remote users in South Africa. Unfortunately, users found that many responses were unrelated to the topic or nonsensical. Most of the users' feedback can be traced to three issues. Firstly, the dialogue corpus does not cover a wide range of domains, so Afrikaana can only talk about the domain of the training corpus. Secondly, the repeated approach that we used to determine the pattern and the template when there are more than two speakers may lead to incoherent transcripts: if the training corpus does not have straightforward equivalents of "user" and "chatbot", then it can be non-trivial to model turn-taking correctly in machine-learnt AIML (Abu Shawar and Atwell, 2005b). Thirdly, our machine-learnt models have not included linguistic analysis markup, such as grammatical, semantic or dialogue-act annotations (Atwell, 1996; Atwell et al., 2000), as ALICE/AIML makes no use of such linguistic knowledge in generating conversation responses. However, users found it an interesting tool for practising the language and enjoyed chatting, and we concluded that even with its keyword-based matching technique, a chatbot could be used as a tool for unknown languages, where "unknown" means (i) unknown to the chatbot author/developer, and/or (ii) unknown to computational linguistics, that is, where there is a shortage of existing tools to deal with the languages.
Analysis of all dialogues generated and feedback from students revealed that 1256 users chatted with Alice; 88% of them chatted only once and did not come back to the site; the duration of chatting was short; 17% made positive comments such as "you are so nice" and "you are clever"; and 24% evaluated it negatively. In this respect, Jia concluded that the failure of this experiment was due to the pattern matching technique used in Alice, which is based on keyword matching without any attempt to understand what is said. The topics of chatting covered every aspect of daily life, for example: study, emotion, life, computer, free time, travel/world and job. 11.39% of students talked about English study and exams, and 13% mentioned love; mostly, students younger than 30 years old treated Alice as a friend rather than as a teacher, and told her about some private emotional problems and experiences. Jia (2004b) concluded that the conversational chatbot should not only work as a teacher or learning partner with rich special knowledge, but also as a dear friend who may enjoy the joy and suffer the pain of the users. After that, Jia (2004a) developed an intelligent web-based teaching system for foreign language learning which consists of: a natural language mark-up language that labels grammar elements; a natural language object model in Java which represents the grammatical elements; a natural language database; and a communication response mechanism which considers the discourse context, the world model and the personality of the users and of the system itself. In the same respect, Chantarotwong (2005) reported that the responses of most chatbots are frequently predictable, redundant, lacking in personality, and without memory of previous responses, which could lead to very circular conversation. However, in contrast to these findings, Fryer and Carpenter (2006) claimed that chatbots could provide a means of language practice for students anytime and virtually
a. Direct answer generated by FAQchat using the FAQ of the School of Computing at the University of Leeds
Human: How can I remove a job from the printer?
FAQchat: If you want to remove a job from the printer queue then find out the job number using the lpq command, and in a UNIX shell type:
%lprm -P printername job_no
You can only remove jobs that you have sent to the printer yourself - apart from Support that is. For more information look at: How do I remove a job from the printer queue? (Printing)
b. Related links generated by FAQchat
Human: How can I delete a job from a printer?
FAQchat: You may find your answer in the following links:
How do I check the printer queue? (Printing)
How do I remove a job from the printer queue? (Printing)
How do I check my printer credits? (Printing)
How do I use the printer from a Windows machine? (Other Windows information)
Figure 5: Answers generated by FAQchat
Nevertheless, using a chatbot as an information retrieval system is not restricted to the education field. The YPA is a natural language dialogue system that allows users to retrieve information from British Telecom's Yellow Pages (Kruschwitz et al., 1999, 2000). The Yellow Pages contain advertisements, with the advertiser name and contact information. The YPA system returns addresses; if no address is found, a conversation is started and the system asks the user for more details in order to provide the required address. The YPA is composed of the Dialog Manager, the Natural Language Front-end, the Query Construction Component, and the Backend database. The Backend includes a relational database that contains tables extracted from the Yellow Pages. The conversation starts by accepting the user's input through a graphical user interface; the Dialog Manager then sends the textual input to the Natural Language Front-end for parsing. After that, the parse tree is sent to the Query Construction Component, which translates the input into a database query, queries the Backend database, and returns the retrieved address. If no addresses are found, the Dialog Manager starts asking the user questions to obtain more clarification. To evaluate the YPA, 75 queries were extracted from a query corpus, and a response sheet was prepared to record whether the returned addresses were appropriate, how many dialog steps were necessary, the total number of addresses recalled, and the number of those relevant to the original query. Results show that 62 out of 75 queries managed to return addresses, and 74% of those addresses were relevant to the original query. The YPA answers questions such as "I need a plumber with an emergency service?" and "Which restaurants are there in Colchester high school?"
7 Chatbot Assistants in E-Commerce, Business, and other Domains
"Shopping assistant is one of the most persuasive sales in traditional commerce. Shopping assistant offer help in a store, provide additional information on products and simplify decision making process helping to find a good that satisfies customer's requirements and various constraints" (Bogdanovych et al., 2005). Happy Assistant is a natural language dialog-based navigation system that helps users access e-commerce sites to find relevant information about products and services (Chai et al., 2000). The system is composed of three main modules: the presenta-
U: I am looking for a notebook for my consulting business
S: Please describe your financial constraints.
(In this stage a list of most popular ones is displayed.)
U: not important thing performance is essential
S: are you looking for something that is top of the line?
U: Yes, absolutely
S: I have found something for you.
(The target notebook is displayed for the user, and beneath it a summary of the user's request is displayed to explain why this product is displayed.)
Figure 7: A sample of chatting with the Happy Assistant system
Evaluations and results showed that users found the system easy to use and felt that it met their needs; users liked the idea that they could express their needs in their own language; users felt that the computer did all the work for them; and, moreover, users found that the system reduced the interaction time. Sanelma (2003) is a fictional person to talk with in a museum, who provides background information concerning a certain piece of art. Sanelma is a 26-year-old woman from the Helsinki of the 1930s, as shown in figure 8. Rita (real time Internet technical assistant), an eGain graphical avatar, is used at ABN AMRO Bank to help customers carry out some financial tasks such as a wire money transfer (Voth, 2005). If Rita does not understand, it can redirect the customer to another channel such as e-mail or live chat.
8 Conclusion
We have surveyed several chatbot systems which succeed in practical domains like education, information retrieval, business, and e-commerce, as well as for amusement. "In the future, you could imagine Chatterbots acting as talking books for children, Chatterbots for foreign language instruction, and teaching Chatterbots in general" (Wallace et al., 2003). However, in the education domain Knill et al. (2004) concluded that the teacher is the backbone in the teaching process. Technology like computer algebra systems, multimedia presentations or chatbots can serve as amplifiers but not replace a good guide. In general, the aim of chatbot designers should be to build tools that help people and facilitate their work and their interaction with computers using natural language, but not to replace the human role totally or imitate human conversation perfectly. Finally, as Colby (1999) states, We need not take human-human conversation as the gold standard for conversational exchanges. If one had a perfect simulation of
Artificial Intelligence Foundation (2007). The A. L. I. C. E. Artificial Intelligence Foundation. Published online: http://www.alicebot.org or http://alicebot.franz.com/.
Atwell, E. (1996). Comparative evaluation of grammatical annotation models. In Sutcliffe, R., Koch, H.-D., and McElligott, A., editors, Industrial Parsing of Technical Manuals, pages 25–46. Rodopi, Amsterdam.
Atwell, E. (2005). Web chatbots: the next generation of speech systems? European CEO, November-December:142–144.
Atwell, E., Demetriou, G., Hughes, J., Schiffrin, A., Souter, C., and Wilcock, S. (2000). A comparative evaluation of modern English corpus grammatical annotation schemes. ICAME Journal, 24:7–23.
AVRA (2002). Published online: http://www.pandorabots.com/pandora/talk?botid=daf612c52e3406bb.
Batacharia, B., Levy, D., Catizone, R., Krotov, A., and Wilks, Y. (1999). CONVERSE: a conversational companion. In Wilks, Y., editor, Machine conversations, pages 205–215. Kluwer, Boston/Dordrecht/London.
Jia, J. (2004a). CSIEC (computer simulator in educational communication): An intelligent web-based teaching system for foreign language learning. In Kommers, P. and Richards, G., editors, Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004, pages 4147–4152, Chesapeake, VA. AACE press.
Jia, J. (2004b). The study of the application of a web-based chatbot system on the teaching of foreign languages. In Proceedings of the SITE2004 (The 15th annual conference of the Society for Information Technology and Teacher Education), pages 1201–1207. AACE press.
Jurafsky, D. and Martin, J. (2000). Introduction. In Speech and Language Processing: an Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, pages 1–18. Prentice Hall, New Jersey.
Kerfoot, B. P., Baker, H., Jackson, T. L., Hulbert, W. C., Federman, D. D., Oates, R. D., and DeWolf, W. C. (2006). A multi-institutional randomized controlled trial of adjuvant web-based teaching to medical students. Academic Medicine, 81(3):224–230.
Kruschwitz, U., De Roeck, A., Scott, P., Steel, S., Turner, R., and Webb, N. (1999). Natural language access to yellow pages. In Third International conference on knowledge-based intelligent information engineering systems, pages 34–37.
Kruschwitz, U., De Roeck, A., Scott, P., Steel, S., Turner, R., and Webb, N. (2000). Extracting semistructured data - lessons learnt. In Proceedings of the 2nd international conference on natural language processing (NLP2000), pages 406–417.
Loebner, H. (2003). Home page of the Loebner Prize - the first Turing Test. Published online: http://www.loebner.net/Prizef/loebner-prize.html.
Mann, W. (2002). Dialog diversity corpus. Published online: http://www-rcf.usc.edu/billmann/diversity/DDivers-site.htm.
Molla, D. and Vicedo, J. (2007). Question answering in restricted domains: An overview. Computational Linguistics, 33(1):41–61.
Pandorabot (2002). Published online: http://www.pandorabots.com/pandora.
Sanelma (2003). Published online: http://www.mlab.uiah.fi/mummi/sanelma/.
Schumaker, R. P., Ginsburg, M., Chen, H., and Liu, Y. (2007). An evaluation of the chat and knowledge delivery components of a low-level dialog system: The AZ-ALICE experiment. Decision Support Systems, 42(2):2236–2246.
Turing, A. (1950). Computing machinery and intelligence. Mind, 59:433–460.
van Rooy, B. (2003). Transkripsiehandleiding van die Korpus Gesproke Afrikaans. [Transcription Manual of the Corpus of Spoken Afrikaans.] Potchefstroom University, Potchefstroom.
Voth, D. (2005). Practical agents help out. IEEE Intelligent Systems, 20(2):4–6.
Wallace, R. (2003). The Elements of AIML Style. A.L.I.C.E. Artificial Intelligence Foundation, Inc.
Wallace, R., Tomabechi, H., and Aimless, D. (2003). Chatterbots go native: Considerations for an eco-system fostering the development of artificial life forms in a human world. Published online: http://www.pandorabots.com/pandora/pics/chatterbotsgonative.doc.
Webber, G. M. (2005). Data representation and algorithms for biomedical informatics applications. PhD thesis, Harvard University.
Weizenbaum, J. (1966). ELIZA - A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1):36–45.
Weizenbaum, J. (1967). Contextual understanding by computers. Communications of the ACM, 10(8):474–480.
Wilensky, R., Chin, D., Luria, M., Martin, J., Mayfield, J., and Wu, D. (1988). The Berkeley UNIX Consultant project. Computational Linguistics, 14(4):35–84.