Figure 1 High-level architecture of our chatbot for responding to vehicle-related complaints
In this paper, we describe the challenges and lessons learned from deploying a virtual assistant for suggesting repairs in response to equipment-related complaints. We demonstrate on two popular frameworks, namely ASK and gActions. Here, we focus on understanding and responding to vehicle-related problems, with vehicles as the example equipment; complaints could be initiated by a driver or a technician. Throughout the paper, we try to answer three questions: 1) how can we facilitate the acquisition of domain-specific knowledge about entities related to equipment problems? 2) how much knowledge can off-the-shelf frameworks digest effectively? and 3) how accurately can these frameworks' built-in LU engines identify entities in user utterances?

Due to the scalability and accuracy limitations we experienced with ASK and gActions, we describe an alternative scalable pipeline for: 1) extracting knowledge about equipment components and their associated problem entities, and 2) learning to identify such entities in user utterances. We show, through evaluation on a real dataset, that the proposed framework's understanding accuracy scales better with a large volume of domain-specific entities, being up to 30% more accurate.

2 Background and Related Work

Figure 1 shows the main components of our chatbot. In a nutshell, the user utterance is first transcribed into text using the Automatic Speech Recognition (ASR) module. Then, the LU module identifies the entities (component, problem) in the input. Afterwards, the parsed input is passed to the dialog manager, which keeps track of the conversation state and decides the next response (e.g., a recommended repair), which is finally uttered back to the user using the language generation module.
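To make the Figure 1 data flow concrete, below is a minimal Python sketch of the loop connecting the LU module and the dialog manager. The class and function names are ours, not those of ASK or gActions, and the toy lookup in understand() merely stands in for a trained LU engine:

```python
from dataclasses import dataclass

@dataclass
class DialogState:
    """Tracks entities gathered so far in the conversation."""
    component: str | None = None   # e.g., "coolant tank"
    problem: str | None = None     # e.g., "leaking"

def understand(utterance: str) -> dict:
    """LU step: identify (component, problem) entities in the text.
    A toy lookup stands in for a trained model here; in the paper this
    role is played by the frameworks' built-in LU engines or S2STagger."""
    entities = {}
    if "coolant tank" in utterance:
        entities["component"] = "coolant tank"
    if "leaking" in utterance:
        entities["problem"] = "leaking"
    return entities

def dialog_manager(state: DialogState, entities: dict) -> str:
    """Update the conversation state and decide the next response."""
    state.component = entities.get("component", state.component)
    state.problem = entities.get("problem", state.problem)
    if state.component is None:
        return "Which component is giving you trouble?"
    if state.problem is None:
        return f"What problem do you see with the {state.component}?"
    return f"Suggested repair for '{state.component} {state.problem}': inspect and replace if needed."

state = DialogState()
text = "coolant tank is leaking"                 # output of the ASR module
print(dialog_manager(state, understand(text)))   # response handed to language generation
```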
As we mentioned earlier, we focus here on the cognitive-intensive task of creating the Knowledge Base (KB) of target entities on which the LU engine will be trained.

KB construction from text aims at converting unstructured, noisy textual data into structured, task-specific, actionable knowledge that captures entities (elements of interest (EOI)), their attributes, and their relationships (Pujara & Singh, 2018). KBs are key components of many AI and knowledge-driven tasks such as question answering (Hao, et al., 2017), decision support systems (Dikaleh, Pape, Mistry, Felix, & Sheikh, 2018), recommender systems (Zhang, Yuan, Lian, Xie, & Ma, 2016), and others. KB construction has been an attractive research topic for decades, resulting in many general KBs such as DBpedia (Auer, et al., 2007), Freebase (Bollacker, Evans, Paritosh, Sturge, & Taylor, 2008), Google Knowledge Vault (Dong, et al., 2014), ConceptNet (Speer & Havasi, 2013), NELL (Carlson, et al., 2010), and YAGO (Hoffart, Suchanek, Berberich, & Weikum, 2013), as well as domain-specific KBs such as the Amazon Product Graph and the Microsoft Academic Graph (Sinha, et al., 2015).

The first step toward building such KBs is to extract information about target entities, their attributes, and the relationships between them. Several information extraction frameworks have been proposed in the literature, including OpenIE (Banko, Cafarella, Soderland, Broadhead, & Etzioni, 2007), DeepDive (Niu, Zhang, Ré, & Shavlik, 2012), Fonduer (Wu, et al., 2018), Microsoft QnA Maker (Shaikh, 2019), and others. Most current information extraction systems utilize Natural Language Processing (NLP) techniques such as Part-of-Speech (POS) tags, shallow parsing, and dependency parse trees to extract linguistic features for recognizing entities.

Despite the extensive focus of academic and industrial labs on constructing general-purpose KBs, identifying component names and their associated problems in text has been only lightly studied in the literature.
Figure 2 The Knowledge base construction framework is a pipeline of five main stages
Table 1 Sample vehicle complaint utterances (problems in red, and components in blue)

Complaint Utterance
low oil pressure
fuel filter is dirty
leak at oil pan
coolant reservoir cracked
pan leaking water
coolant tank is leaking

The closest work to ours is (Niraula, Whyatt, & Kao, 2018), who proposed an approach to identify component names in service and maintenance logs using a combination of linguistic analysis and machine learning. The authors start with seed head nouns representing high-level part names (e.g., valve, switch), then extract all n-grams ending with these head nouns. Afterwards, the extracted n-grams are purified using heuristics. Finally, the purified part names are used to create annotated training data for a Conditional Random Fields (CRF) model (Lafferty, McCallum, & Pereira, 2001) that extracts part names from raw sentences.

Similarly, (Chandramouli, et al., 2013) introduced a simple approach using n-gram extraction from service logs. Given a seed of part types, the authors extract all n-grams, with a maximum of three tokens, that end with these part types. Candidate n-grams are then scored using a mutual information metric and purified using POS tagging.

Our framework automatically constructs a KB of equipment component and problem entities with "component <has-a> problem" relationships. Unlike previous work, we go one step further by extracting not only components and part names, but also their associated problems. Unlike (Niraula, Whyatt, & Kao, 2018), we start with syntactic rules rather than seed head nouns. The rules require less domain knowledge and should yield higher coverage. We then expand the constructed KB through two steps: 1) reorganizing the extracted vocabulary of components into a hierarchy using a simple traversal mechanism, introducing <is-a> relationships (e.g., stop light <is-a> light), and 2) aggregating the problems associated with subtype components in the hierarchy and associating them with supertype components, introducing more <has-a> relationships (e.g., coolant gauge <has-a> not reading yields gauge <has-a> not reading). Unsupervised curation and purification of extracted entities is another key differentiator of our framework compared to prior work. The proposed framework utilizes state-of-the-art deep learning for sequence tagging to annotate raw sentences with component(s) and problem(s).
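A minimal sketch of the two KB expansion steps described above (the backward traversal of Figure 4 introducing <is-a> links, and the aggregation of subtype problems to supertypes), assuming the KB is a simple mapping from component names to problem sets; the function names are ours:

```python
from collections import defaultdict

def backward_traversal(component: str) -> list[str]:
    """Derive supertypes of a multi-token component by dropping leading
    tokens, e.g. 'engine oil pressure gauge' ->
    ['oil pressure gauge', 'pressure gauge', 'gauge']."""
    tokens = component.split()
    return [" ".join(tokens[i:]) for i in range(1, len(tokens))]

def expand_kb(has_a: dict[str, set[str]]):
    """Step 1: introduce <is-a> links via backward traversal.
    Step 2: aggregate subtype problems up to supertypes (<has-a>)."""
    is_a = set()
    expanded = defaultdict(set, {c: set(p) for c, p in has_a.items()})
    for component, problems in has_a.items():
        supertypes = backward_traversal(component)
        # chain of <is-a> edges: each type points to its direct supertype
        for child, parent in zip([component] + supertypes, supertypes):
            is_a.add((child, parent))
        for supertype in supertypes:
            expanded[supertype] |= problems
    return is_a, dict(expanded)

kb = {"coolant gauge": {"not reading"}, "stop light": {"cracked"}}
is_a, has_a = expand_kb(kb)
# ('stop light', 'light') in is_a; has_a['gauge'] == {'not reading'}
```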
3 A Pipeline for KB Extraction

Existing chatbot development frameworks require knowledge about the target entities (slots in ASK terminology) that will appear in user utterances. For each entity type (e.g., component, problem, etc.), an extensive vocabulary of possible values of such entities should be provided by the virtual assistant developer. These vocabularies are then used to train the underlying LU engine to identify entities in user utterances.

We propose a pipeline for creating a KB of entities related to vehicle complaint understanding from short texts, specifically posts in public Questions and Answers (QA) forums. Nevertheless, the design of the proposed framework is flexible and generic enough to be applied to several other maintenance scenarios for different equipment, given a corpus with mentions of the same target entities. Table 1 shows sample complaint utterances from QA posts. As we can notice, most of these utterances are short sentences composed of a component along with an ongoing problem.

As shown in Figure 2, the proposed KB construction system is organized as a pipeline. We start with a domain-specific corpus that contains our target entities. We then process the corpus through five main stages, including preprocessing, candidate generation using POS-based syntactic rules, embedding-based filtration and curation, and enrichment through training a sequence-to-sequence (seq2seq) slot tagging model. Our pipeline produces two outputs:

- A KB of three types of entities, including car options (car, truck, vehicle, etc.), components, and their associated problems. These entities can be used to populate the vocabulary needed to build the voice-based agent in both ASK and DialogFlow.
- A tagging model, which we call Sequence-to-Sequence Tagger (S2STagger). Besides its value in enriching the KB with new entities, S2STagger can also be used as a standalone LU system that is able to extract target entities from raw user utterances.
In the following sub-sections, we describe in more detail each of the stages presented in Figure 2.

3.1 Preprocessing

Dealing with noisy text is challenging. In the case of equipment troubleshooting, service and repair records and QA posts include complaint, diagnosis, and correction text, which represents a highly rich resource of components and of the problems that might arise with each of them. Nevertheless, these records are typically written by technicians and operators who have time constraints and may lack language proficiency. Consequently, the text is full of typos, spelling mistakes, inconsistent use of vocabulary, and domain-specific jargon and abbreviations. For these reasons, cautious use of preprocessing is required to reduce such inconsistencies while avoiding inappropriate corrections. We perform the following preprocessing steps, sketched in code below:

- Lowercasing.
- Soft normalization: removing punctuation characters separating single characters (e.g., a/c, a.c, a.c. → ac).
- Hard normalization: collecting all frequent tokens that are prefixes of a larger token and manually replacing them with their normalized version (e.g., temp → temperature, eng → engine, diag → diagnose, etc.).
- Dictionary-based normalization: creating a dictionary of frequent abbreviations and using it to normalize tokens in the original text (e.g., chk, ch, ck → check).
- Manual tagging: manually tagging terms such as vehicle, car, truck, etc. as a car-option entity.
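A minimal sketch of the normalization steps; the abbreviation dictionary here is a toy stand-in for the one compiled manually from frequent corpus tokens:

```python
import re

# Hypothetical abbreviation dictionary; the real one is built by hand
# from frequent tokens in the corpus (hard + dictionary-based normalization).
ABBREVIATIONS = {"chk": "check", "ch": "check", "ck": "check",
                 "temp": "temperature", "eng": "engine", "diag": "diagnose"}

def preprocess(utterance: str) -> str:
    text = utterance.lower()  # lowercasing
    # soft normalization: drop punctuation separating single characters
    # (a/c, a.c, a.c. -> ac)
    text = re.sub(r"\b(\w)[./](\w)\.?(?=\s|$)", r"\1\2", text)
    # hard + dictionary-based normalization, token by token
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

print(preprocess("Chk A/C temp"))  # -> "check ac temperature"
```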
3.2 Candidate Generation

To extract candidate entities, we define a set of syntactic rules based on the POS tags of complaint utterances. First, all sentences are extracted and parsed using the Stanford CoreNLP library (Manning, et al., 2014). Second, we employ linguistic heuristics to define chunks of tokens corresponding to component and problem entities based on their POS tags. Specifically, we define the rules considering only the most frequent POS patterns in our dataset.

Table 2 POS-based syntactic rules for candidate entity generation (problems in red, and components in blue)

Utterance | POS | Rule
replace water pump | VB (NN\S*\s?)+ | (NN\S*\s?)+ component
low oil pressure | JJ (NN\S*\s?)+ | JJ problem, (NN\S*\s?)+ component
fuel filter is dirty | (NN\S*\s?)+ VBZ JJ | (NN\S*\s?)+ component, JJ problem
coolant reservoir cracked | (NN\S*\s?)+ VBD | (NN\S*\s?)+ component, VBD problem
pan leaking water | (NN\S*\s?)+ VBG (NN\S*\s?)+ | (NN\S*\s?)+ component, VBG (NN\S*\s?)+ problem
coolant tank is leaking | (NN\S*\s?)+ VBZ VBG | (NN\S*\s?)+ component, VBG problem

Table 2 shows the rules defined for the six most frequent POS patterns. For example, whenever a sentence's POS pattern matches an adjective followed by a sequence of nouns of arbitrary length (JJ (NN\S*\s?)+$), e.g., "low air pressure", the adjective chunk is considered a candidate problem entity ("low") and the noun sequence chunk is considered a candidate component entity ("air pressure"). It is worth mentioning that the defined heuristics are designed to capture components with long multi-term names, which are common in our corpus (e.g., "intake manifold air pressure sensor"). We also discard irrelevant tokens in the extracted chunks, such as determiners (a, an, the) preceding noun sequences.
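The following sketch illustrates how two of the Table 2 rules can be applied, using NLTK's POS tagger as a stand-in for Stanford CoreNLP; the function names are ours, and it assumes the nltk package and its tagger models are installed:

```python
import re
import nltk  # assumes nltk and its 'averaged_perceptron_tagger' data are installed

def tag_string(utterance: str) -> tuple[list[str], str]:
    tokens = utterance.split()
    tags = [t for _, t in nltk.pos_tag(tokens)]
    return tokens, " ".join(tags)

# Two rules from Table 2, written as regexes over the POS string:
#   JJ (NN\S*\s?)+      -> JJ chunk = problem, noun chunk = component
#   (NN\S*\s?)+ VBZ VBG -> noun chunk = component, VBG = problem
def extract_candidates(utterance: str) -> dict[str, str]:
    tokens, pos = tag_string(utterance)
    tags = pos.split()
    if re.fullmatch(r"JJ( NN\S*)+", pos):
        return {"problem": tokens[0], "component": " ".join(tokens[1:])}
    if re.fullmatch(r"(NN\S* )+VBZ VBG", pos):
        n = tags.index("VBZ")
        return {"component": " ".join(tokens[:n]), "problem": tokens[n + 1]}
    return {}

print(extract_candidates("low oil pressure"))
# e.g. {'problem': 'low', 'component': 'oil pressure'}
```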
Figure 3 S2STagger utilizes an LSTM encoder-decoder to generate the IOB tags of the input utterance. An attention layer is used to learn to softly align the input/output sequences
Figure 4 Component hierarchy construction through backward traversal. Left – traversal through “engine oil pressure gauge” resulting
in three higher level components. Right – example hierarchy with “sensor” as the root supertype component
3.3 Curation

In this stage, we prune incorrect and noisy candidate entities using weak supervision. We found that most of the wrong extractions were due to incorrect annotations by the POS tagger, caused by the noisy nature of the text. For example, "clean" in "clean tank" was incorrectly tagged as an adjective rather than a verb, causing "clean" to be added to the candidate problems pool. In another example, "squeals" in "belt squeals" was tagged as a plural noun rather than a verb, causing "belt squeals" to be added to the candidate components pool. To alleviate these issues, we employ several weak supervision methods to prune incorrectly extracted entities, as follows:

- Statistical-based pruning: a simple pruning rule is to eliminate candidates that rarely appear in our corpus, i.e., with frequency less than F.

- Linguistic-based pruning: these rules focus on the number and structure of tokens in the candidate entity. For example, a candidate entity cannot exceed T terms, must have terms with a minimum of L letters each, and cannot contain alphanumeric tokens.

- Embedding-based pruning: fixed-length distributed representation models (aka embeddings) have proven effective for representing words and entities in many NLP tasks (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013) (Shalaby, Zadrozny, & Jin, 2019). We exploit the fact that these models can effectively capture similarity and relatedness relationships between pairs of words and entities using their embeddings. To this end, we employ the model proposed by (Shalaby, Zadrozny, & Jin, 2019) to obtain the vector representations of all candidates. Then, we normalize all vectors and compute the similarity score between pairs of candidates as the dot product of their corresponding vectors. Afterwards, we prune all candidate problems that do not have at least P other problem entities with a minimum similarity score of Sp, and prune all candidate components that do not have at least C other component entities with a minimum similarity score of Sc (see the sketch after this list).

- Sentiment-based pruning: utterances that express problems and issues usually carry negative sentiment. Under this assumption, we prune all candidate problem entities that are not semantically similar to at least one word from the list of negative sentiment words created by (Hu & Liu, 2004). Here, we measure the similarity score using the embeddings of the candidate problem entities and the sentiment words, as in embedding-based pruning. Sentiment-based pruning helps discard wrong extractions such as "clean" in "clean tank", where "clean" is incorrectly tagged as an adjective.
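A minimal NumPy sketch of embedding-based pruning, assuming the candidate vectors (e.g., from the Shalaby, Zadrozny, & Jin (2019) model) are given as a dictionary; min_neighbors and min_sim correspond to P/C and Sp/Sc above:

```python
import numpy as np

def prune_by_embedding(candidates: list[str],
                       vectors: dict[str, np.ndarray],
                       min_neighbors: int,   # P for problems, C for components
                       min_sim: float) -> list[str]:
    """Keep a candidate only if at least `min_neighbors` other candidates
    of the same type are similar to it (normalized dot product)."""
    names = [c for c in candidates if c in vectors]
    mat = np.stack([vectors[c] for c in names]).astype(float)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)  # normalize rows
    sims = mat @ mat.T                                  # pairwise dot products
    np.fill_diagonal(sims, -1.0)                        # ignore self-similarity
    keep = (sims >= min_sim).sum(axis=1) >= min_neighbors
    return [name for name, k in zip(names, keep) if k]
```

Sentiment-based pruning follows the same pattern, except each candidate problem is compared against the embeddings of the negative sentiment lexicon instead of the other candidates.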
3.4 Slot Tagging (S2STagger)

A desideratum of any information extraction system is to be lexical-agnostic, i.e., to be able to generalize well and identify unknown entities that have no mentions in the original dataset. Another desideratum is to be structure-agnostic, i.e., to be able to generalize well and identify seen or new entities in utterances with structures different from those in the original dataset. Rule-based candidate extraction typically yields highly precise extractions. However, depending solely on rules limits the system's recognition capacity to mentions in structures that match the predefined rules. Moreover, it is infeasible to handcraft rules that cover all possible complaint structures, limiting the system's recall. It is also expected that new components and problems will emerge, especially in highly dynamic domains, and rerunning the rules on an updated snapshot of the corpus would be an expensive solution.

A more practical and efficient solution is to build a machine learning model to tag raw sentences and identify the chunks of tokens that correspond to our target entities. To this end, we adopt a neural attention-based seq2seq model, called S2STagger, to tag raw sentences and extract target entities from them. To train S2STagger, we create a dataset from utterances that match our syntactic rules and label terms in these utterances using the inside-outside-beginning (IOB) notation (Ramshaw & Marcus, 1999). For example, "the car air pressure is low" would be tagged as "<O> <car-options> <B-component> <I-component> <O> <B-problem>". As the extractions from the syntactic rules followed by curation are highly accurate, we expect to have highly accurate training data for our tagging model. It is worth mentioning that we only use utterances with mentions of entities not pruned during the curation phase.

As shown in Figure 3, S2STagger utilizes an encoder-decoder Recurrent Neural Network (RNN) architecture with Long Short-Term Memory (LSTM) cells (Gers, Schmidhuber, & Cummins, 1999). During encoding, the raw terms in each sentence are processed sequentially through an RNN and encoded into a fixed-length vector that captures the semantic and syntactic structure of the sentence. Then, a decoder RNN takes this vector and produces a sequence of IOB tags, one for each term in the input sentence. Because each tag might depend on some terms in the input but not on others, we utilize an attention mechanism so that the network learns which terms in the input are most relevant for each output tag.
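The following PyTorch sketch conveys the idea, not the authors' exact architecture: an LSTM encoder, an LSTMCell decoder that runs one step per input token, and dot-product attention over the encoder states. The hyperparameters, the <start>-tag convention, and greedy decoding are illustrative choices of ours:

```python
import torch
import torch.nn as nn

class S2STaggerSketch(nn.Module):
    """Minimal encoder-decoder tagger with dot-product attention.
    One IOB tag is emitted per input token, so the decoder runs for
    exactly len(input) steps."""
    def __init__(self, vocab_size: int, tagset_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder_cell = nn.LSTMCell(dim, dim)
        self.tag_embed = nn.Embedding(tagset_size, dim)
        self.out = nn.Linear(2 * dim, tagset_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) of word ids
        enc_states, (h, c) = self.encoder(self.embed(tokens))
        h, c = h.squeeze(0), c.squeeze(0)
        prev_tag = tokens.new_zeros(tokens.size(0))  # <start> tag id 0
        logits = []
        for _ in range(tokens.size(1)):
            h, c = self.decoder_cell(self.tag_embed(prev_tag), (h, c))
            # dot-product attention over encoder states
            scores = torch.bmm(enc_states, h.unsqueeze(2)).squeeze(2)
            weights = torch.softmax(scores, dim=1)
            context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
            step_logits = self.out(torch.cat([h, context], dim=1))
            logits.append(step_logits)
            prev_tag = step_logits.argmax(dim=1)  # greedy; teacher forcing in training
        return torch.stack(logits, dim=1)  # (batch, seq_len, tagset_size)

model = S2STaggerSketch(vocab_size=1000, tagset_size=8)
dummy = torch.randint(0, 1000, (2, 5))  # batch of 2 utterances, 5 tokens each
print(model(dummy).shape)               # torch.Size([2, 5, 8])
```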
in our KB to both models as slot values and entities for AlexaSkill and DiagFlow, respectively. The third model is S2STagger trained on all the tagged utterances in the QA dataset. It is important to emphasize that the training utterances for S2STagger are the same ones from which KB entities and utterance structures are extracted and fed to both AlexaSkill and DiagFlow. Due to the model size limitations imposed by these frameworks, we could not feed the raw utterances to both agents as we did with S2STagger.

We create an evaluation dataset of manually tagged utterances. The dataset describes vehicle-related complaints and is shown in Table 4. The utterances are chosen such that three aspects of the model are assessed. Specifically, we quantitatively measure the model accuracy on utterances with: 1) the same syntactic structures and same entities as in the training utterances (119 in total), 2) the same syntactic structures but different entities from the training utterances (75 in total), and 3) different syntactic structures but the same entities as in the training utterances (20 in total). It is worth mentioning that, to alleviate the out-of-vocabulary (OOV) problem, the different entities are created from terms in the model vocabulary. This way, incorrect tagging can only be attributed to the model's inability to generalize to entities tagged differently from the training ones.

Table 5 shows the accuracy of S2STagger compared to the other models on the car complaints evaluation dataset. We report the exact match accuracy, in which each and every term in the utterance must be tagged the same as in the ground truth for the utterance to be considered correctly tagged. As we can notice, S2STagger gives the best exact accuracy, outperforming the other models significantly. Interestingly, on the utterances that have the same structure but different entities, S2STagger's tagging accuracy is close to that on same structure / same entities utterances. This indicates that our model is more lexical-agnostic and can generalize better than the other two models. The AlexaSkill model comes second, while the DiagFlow model can only tag a few utterances correctly, indicating its heavy dependence on exact entity matching and its limited generalization ability.

On the other hand, the three models seem more sensitive to variation in utterance syntactic structure than to lexical variation. As we can notice in Table 6, the three models fail to correctly tag almost all the utterances with different structure (S2STagger tags 2 of 20 correctly). Even when we measure accuracy at the entity extraction level rather than over the whole utterance, both AlexaSkill and DiagFlow still struggle with understanding the different-structure utterances. S2STagger, on the other hand, can tag 9 component entities and 4 problem entities correctly, which is still lower than its accuracy on the utterances with the same structure.
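The exact match metric can be stated compactly; this sketch assumes predictions and ground truth are parallel lists of per-token tag sequences:

```python
def exact_match_accuracy(predictions: list[list[str]],
                         ground_truth: list[list[str]]) -> float:
    """An utterance counts as correct only if every term's tag
    matches the ground truth exactly."""
    correct = sum(pred == gold for pred, gold in zip(predictions, ground_truth))
    return correct / len(ground_truth)

gold = [["<O>", "<car-option>", "<B-component>", "<I-component>", "<B-problem>"]]
pred = [["<O>", "<car-option>", "<B-component>", "<I-component>", "<B-problem>"]]
print(exact_match_accuracy(pred, gold))  # 1.0
```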
Table 7 Success and failure tagging of the three models. Bold indicates incorrect tagging

Same structure / Same entities

Utterance: my car steering wheel wobbles
  Ground Truth: <O> <car-option> <B-component> <I-component> <B-problem>
  AlexaSkill:   <O> <car-option> <B-component> <I-component> <B-problem>
  DiagFlow:     <O> <car-option> <B-component> <I-component> <B-problem>
  S2STagger:    <O> <car-option> <B-component> <I-component> <B-problem>

Utterance: car has low coolant
  Ground Truth: <car-option> <O> <B-problem> <B-component>
  AlexaSkill:   <car-option> <O> <B-problem> <B-component>
  DiagFlow:     fail
  S2STagger:    <car-option> <O> <B-problem> <B-component>

Same structure / Different entities

Utterance: clutch pedal is hard to push
  Ground Truth: <B-component> <I-component> <O> <B-problem> <I-problem> <I-problem>
  AlexaSkill:   <B-component> <I-component> <O> <B-problem> <I-problem> <I-problem>
  DiagFlow:     fail
  S2STagger:    <B-component> <I-component> <O> <B-problem> <I-problem> <I-problem>

Utterance: wrapped brake rotor
  Ground Truth: <B-problem> <B-component> <I-component>
  AlexaSkill:   fail
  DiagFlow:     fail
  S2STagger:    <B-problem> <B-component> <I-component>

Different structure / Same entities

Utterance: the steering wheel in my car wobbles
  Ground Truth: <O> <B-component> <I-component> <O> <O> <car-option> <B-problem>
  AlexaSkill:   fail
  DiagFlow:     fail
  S2STagger:    <O> <B-problem> <B-component> <I-component> <O> <car-option> <B-problem>

Utterance: low car coolant
  Ground Truth: <B-problem> <car-option> <B-component>
  AlexaSkill:   <B-problem> <O> <B-component>
  DiagFlow:     fail
  S2STagger:    <B-problem> <car-option> <B-component>
problem(s) in the noisy user complaints text, and focusing on these entities only while predicting the repair.

The results demonstrate the superior performance of the proposed knowledge construction pipeline, including S2STagger, the slot tagging model, over popular systems such as ASK and DialogFlow in understanding vehicle-related complaints. One important and must-have feature is to increase the effectiveness of S2STagger and off-the-shelf NLU systems in handling utterances with structures different from those seen during training. We think augmenting the training data with carrier phrases is one approach. Additionally, training the model to paraphrase and tag jointly could be a more genuine approach, as it does not require manually defining the paraphrasing or carrier phrase patterns.

There were also some issues that impacted the development of this research, for example, the limited scalability of off-the-shelf NLU systems: the ASK model size cannot exceed 1.5MB, while DialogFlow agents cannot contain more than 10K different entities. Deployment of the constructed KB on any of these platforms would therefore be limited to a subset of the extracted knowledge. Consequently, it seems mandatory for businesses and R&D labs to develop in-house NLU technologies to bypass such limitations.

REFERENCES

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. Springer.

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., & Etzioni, O. (2007). Open information extraction from the web. IJCAI, 7, pp. 2670-2676.

Bollacker, K., Evans, C., Paritosh, P., Sturge, T., & Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. Proceedings of the 2008 ACM SIGMOD international conference on Management of data, (pp. 1247-1250).

Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr, E. R., & Mitchell, T. M. (2010). Toward an architecture for never-ending language learning. AAAI, 5, p. 3.

Chandramouli, A., Subramanian, G., Bal, D., Ao, S. I., Douglas, C., Grundfest, W. S., & Burgstone, J. (2013). Unsupervised extraction of part names from service logs. Proceedings of the World Congress on Engineering and Computer Science, 2.

Dikaleh, S., Pape, D., Mistry, D., Felix, C., & Sheikh, O. (2018). Refine, restructure and make sense of data visually, using IBM Watson Studio. Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering, (pp. 344-346).

Dong, X., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Murphy, K., . . . Zhang, W. (2014). Knowledge vault: A web-scale approach to probabilistic knowledge fusion. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, (pp. 601-610).

Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with LSTM.

Hao, Y., Zhang, Y., Liu, K., He, S., Liu, Z., Wu, H., & Zhao, J. (2017). An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1, pp. 221-231.

Hoffart, J., Suchanek, F. M., Berberich, K., & Weikum, G. (2013). YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 194, pp. 28-61.

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, (pp. 168-177).

Kumar, A., Gupta, A., Chan, J., Tucker, S., Hoffmeister, B., Dreyer, M., . . . others. (2017). Just ASK: Building an architecture for extensible self-service spoken language understanding. arXiv preprint arXiv:1711.00549.

Lafferty, J., McCallum, A., & Pereira, F. C. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data.

López, G., Quesada, L., & Guerrero, L. A. (2017). Alexa vs. Siri vs. Cortana vs. Google Assistant: A comparison of speech-based natural user interfaces. International Conference on Applied Human Factors and Ergonomics, (pp. 241-250).

Luong, M.-T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.

Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. Proceedings of the 52nd annual meeting of the Association for Computational Linguistics: system demonstrations, (pp. 55-60).

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems, (pp. 3111-3119).

Niraula, N. B., Whyatt, D., & Kao, A. (2018). A novel approach to part name discovery in noisy text. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), 3, pp. 170-176.

Niu, F., Zhang, C., Ré, C., & Shavlik, J. W. (2012). DeepDive: Web-scale knowledge-base construction using statistical learning and inference.