In this paper, we introduce Dual-CNN, a semantically enhanced deep learning model that targets event detection in crisis situations from social media data. A layer of semantics is added to a traditional Convolutional Neural Network (CNN) to capture the contextual information that is generally scarce in short, ill-formed social media messages. Our results show that our methods successfully identify the existence of events and event types (hurricane, floods, etc.) with high accuracy (>79% F-measure), but the model's performance drops significantly (61% F-measure) when identifying fine-grained event-related information (affected individuals, damaged infrastructure, etc.).
These results are competitive with more traditional Machine Learning models, such as SVM.
http://oro.open.ac.uk/49639/1/event_detection.pdf
IRJET - Fake News Detection and Rumour Source Identification (IRJET Journal)
This document discusses methods for detecting fake news and identifying the source of rumours on social media. It proposes Bayesian classification to label information as real or fake based on the classifier outputs; if the combined outputs from the classes do not match, the information is considered fake. It also discusses a reverse dissemination strategy that narrows the search to a group of suspects for the original rumour source, rather than examining each individual, which addresses the difficulty of source identification. The method aims to identify the source node based on which nodes have accepted the rumour. Machine learning and natural language processing techniques are used to detect fake news from article content.
FAKE NEWS DETECTION WITH SEMANTIC FEATURES AND TEXT MINING (ijnlc)
Nearly 70% of people are concerned about the propagation of fake news. This paper aims to detect fake news in online articles through the use of semantic features and various machine learning techniques. In this research, we investigated recurrent neural networks versus naive Bayes and random forest classifiers using five groups of linguistic features. Evaluated on a real-or-fake news dataset from kaggle.com, the best performing model achieved an accuracy of 95.66% using bigram features with the random forest classifier. The fact that bigrams outperform unigrams, trigrams, and quadgrams shows that word pairs, as opposed to single words or longer phrases, best indicate the authenticity of news.
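As a rough illustration of the feature setup this abstract reports as best-performing, here is a minimal scikit-learn sketch restricting features to bigrams and classifying with a random forest; the texts, labels, and hyperparameters are illustrative stand-ins, not the study's.

```python
# Minimal sketch: bigram count features (ngram_range=(2, 2)) with a random
# forest, mirroring the best-performing setup the abstract reports.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

texts = ["senate passes budget bill after long debate",
         "shocking miracle cure doctors don't want you to know"]
labels = [0, 1]  # 0 = real, 1 = fake (toy labels)

model = make_pipeline(CountVectorizer(ngram_range=(2, 2)),  # bigrams only
                      RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(texts, labels)
print(model.predict(["budget bill passes senate after debate"]))
```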
This document summarizes research on detecting fake news using text analysis techniques. It discusses how social media consumption of news has increased and the challenges of identifying trustworthy sources. Various types of fake news are described based on visual/text content or the targeted audience. Methods for detection include clustering similar news reports and using predictive models to analyze linguistic features like punctuation, semantic levels, and readability. The proposed approach uses text summarization, web crawling to find related articles, latent semantic analysis to compare articles, and fuzzy logic to determine the authenticity score of a target news article. The goal is to develop a system to help users identify fake news on social media platforms.
Classifying Crises-Information Relevancy with Semantics (COMRADES project)
Prashant Khare, Gregoire Burel, and Harith Alani
Knowledge Media Institute, The Open University, United Kingdom
{prashant.khare,g.burel,h.alani}@open.ac.uk
IRJET - Fake News Detection using Machine Learning (IRJET Journal)
This document presents a machine learning approach for detecting fake news. It discusses existing fake news detection methods and their limitations. The proposed system uses natural language processing and machine learning techniques like TF-IDF vectorization, naive Bayes classification and XGBoost to build a model that classifies news articles as real or fake. It extracts linguistic features from news content and social context to train models that can identify fake news with greater accuracy than existing approaches. The system is intended to help reduce the spread of misinformation on social media platforms.
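A minimal sketch of the TF-IDF plus naive Bayes stage described above, assuming a list of article texts with binary labels; the XGBoost stage the summary mentions could be swapped in for the naive Bayes step.

```python
# Minimal sketch of a TF-IDF + naive Bayes fake-news classifier; the article
# texts and labels below are illustrative, not the system's training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

articles = ["Senate passes budget bill after debate",
            "Aliens endorse candidate, sources say"]
labels = [0, 1]  # 0 = real, 1 = fake (toy labels)

model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(articles, labels)
print(model.predict(["Budget bill heads to the senate floor"]))
```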
Semantic Wide and Deep Learning for Detecting Crisis-Information Categories o... (Gregoire Burel)
When crises hit, many flock to social media to share or consume information related to the event. Social media posts during crises tend to provide valuable reports on affected people, donation offers, help requests, advice provision, etc. Automatically identifying the category of information (e.g., reports on affected individuals, donations and volunteers) contained in these posts is vital for their efficient handling and consumption by affected communities and concerned organisations. In this paper, we introduce Sem-CNN; a wide and deep Convolutional Neural Network (CNN) model designed for identifying the category of information contained in crisis-related social media content. Unlike previous models, which mainly rely on the lexical representations of words in the text, the proposed model integrates an additional layer of semantics, representing the named entities in the text, into a wide and deep CNN network. Results show that the Sem-CNN model consistently outperforms the baselines, which consist of statistical and non-semantic deep learning models.
Paper access: http://oro.open.ac.uk/51726/
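For orientation, here is a hedged sketch (not the authors' code) of how a wide and deep two-channel CNN of the kind Sem-CNN describes can be wired in Keras, with one channel over word ids and a second over named-entity type ids; all vocabulary sizes and layer shapes are assumptions.

```python
# Illustrative two-input "wide and deep" CNN: a deep lexical channel over word
# ids and a wide semantic channel over entity-type ids, merged before the
# classification layer. Shapes and sizes are assumed, not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers

max_len, vocab_size, entity_vocab, n_classes = 50, 20000, 50, 5

words = layers.Input(shape=(max_len,), name="word_ids")
ents = layers.Input(shape=(max_len,), name="entity_type_ids")

w = layers.Embedding(vocab_size, 100)(words)
w = layers.Conv1D(128, 3, activation="relu")(w)
w = layers.GlobalMaxPooling1D()(w)          # deep lexical channel

e = layers.Embedding(entity_vocab, 20)(ents)
e = layers.GlobalAveragePooling1D()(e)      # wide semantic channel

merged = layers.concatenate([w, e])
out = layers.Dense(n_classes, activation="softmax")(merged)

model = tf.keras.Model([words, ents], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```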
This document proposes using a convolutional neural network (CNN) to detect and classify fake news. It first discusses the implications of fake news spreading on social media and the need for automated identification. It then explores existing fake news datasets and data preprocessing techniques. Deep learning approaches like word embeddings and CNNs are presented as promising techniques for capturing semantics in text for classification. The document outlines a CNN architecture with word embedding, convolutional, max pooling, and fully connected layers that outputs probabilities for fake/real classification. It reports the CNN approach achieved 99.8% accuracy on a 2.5GB dataset, significantly outperforming baseline models like SVM and naive Bayes. Finally, contact information is provided for questions.
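The layer stack listed in that summary (embedding, convolution, max pooling, fully connected, fake/real probability) maps directly onto a few lines of Keras; the hyperparameters below are illustrative, not the document's.

```python
# Sketch of the described layer stack; all sizes are placeholder choices.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Embedding(input_dim=30000, output_dim=128),  # token ids -> vectors
    layers.Conv1D(filters=128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),                # fully connected layer
    layers.Dense(1, activation="sigmoid"),              # P(fake)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```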
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB... (caijjournal)
Quick access to information on social media networks, together with its exponential rise, has made it difficult to distinguish fake information from real information. Fast dissemination through sharing has enhanced its falsification exponentially. It is also important for the credibility of social media networks to avoid the spread of fake information. It is therefore an emerging research challenge to automatically check information for misstatement through its source, content, or publisher, and to prevent unauthenticated sources from spreading rumours. This paper demonstrates an artificial-intelligence-based approach for identifying false statements made by social network entities. Two variants of deep neural networks are applied to evaluate datasets and analyse them for the presence of fake news. The implementation setup produced up to 99% classification accuracy when the dataset was tested for binary (true or false) labeling over multiple epochs.
Recently, fake news has been causing many problems for our society. As a result, many researchers have been working on identifying fake news. Most fake news detection systems utilize linguistic features of the news, but they have difficulty sensing highly ambiguous fake news, which can be detected only after identifying its meaning and the latest related information. In this paper, to resolve this problem, we present a new Korean fake news detection system using a fact DB that is built and updated by direct human judgement after collecting obvious facts. Our system receives a proposition and searches for semantically related articles in the fact DB, verifying whether the given proposition is true by comparing it with those related articles. To achieve this, we utilize a deep learning model, Bidirectional Multi-Perspective Matching for Natural Language Sentences (BiMPM), which has demonstrated good performance on the sentence matching task. However, BiMPM has some limitations: the longer the input sentence, the lower its performance, and it has difficulty making an accurate judgement when an unlearned word or an unlearned relation between words appears. To overcome these limitations, we propose a new matching technique that exploits article abstraction as well as an entity matching set in addition to BiMPM. In our experiments, we show that our system improves the overall performance of fake news detection.
Prasanth. K | Praveen. N | Vijay. S | Auxilia Osvin Nancy. V, "Fake News Detection using Machine Learning", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-2, February 2020.
URL: https://www.ijtsrd.com/papers/ijtsrd30014.pdf
Paper Url : https://www.ijtsrd.com/engineering/information-technology/30014/fake-news-detection-using-machine-learning/prasanth-k
A general stochastic information diffusion model in social networks based on ... (IJCNCJournal)
Social networks are an important infrastructure for the propagation of information, viruses, and innovations. Since users' behavior is influenced by other users' activity, groups of people form according to the similarity of users' interests. Many real-world events can likewise be modeled on social networks; the spread of disease is one instance. People's behavior and infection severity are the most important parameters in the dissemination of diseases, and together they determine whether the diffusion leads to an epidemic or not. SIRS is a hybrid of the SIR and SIS disease-spread models, in which a person can return to the susceptible state after being removed. Based on the communities established on the social network, we use the compartmental form of the SIRS model. In this paper, a general compartmental information diffusion model is proposed and several useful parameters are extracted to analyze it. To adapt the model to realistic behavior, we use a Markov model, which gives the proposed model a stochastic character. In the stochastic case, we can calculate the probabilities of transition between states and predict the value of each state. A comparison between the two modes of the model shows that the population prediction is verified in each state.
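For background, the textbook SIRS system that this compartmental model builds on can be integrated numerically in a few lines; the rate constants below are arbitrary illustrative values, not the paper's.

```python
# Textbook SIRS dynamics integrated with SciPy: susceptible -> infected ->
# recovered, with immunity waning back to susceptible at rate xi.
import numpy as np
from scipy.integrate import odeint

def sirs(y, t, beta, gamma, xi, n):
    s, i, r = y
    ds = -beta * s * i / n + xi * r   # recovered individuals lose immunity
    di = beta * s * i / n - gamma * i
    dr = gamma * i - xi * r
    return ds, di, dr

n = 1000.0
t = np.linspace(0, 200, 400)
sol = odeint(sirs, (n - 1, 1.0, 0.0), t, args=(0.3, 0.1, 0.05, n))
print("peak infected:", sol[:, 1].max())
```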
IRJET - Fake News Detection using Logistic Regression (IRJET Journal)
1) The document discusses a study that uses logistic regression to classify news articles as real or fake. It outlines the methodology which includes data preprocessing, feature extraction using bag-of-words and TF-IDF, and using a logistic regression classifier to predict fake news.
2) The model achieved an accuracy of approximately 72% at classifying news as real or fake when using TF-IDF features and logistic regression (a minimal pipeline of this kind is sketched after this list).
3) The study aims to address the growing issue of fake news proliferation online by developing a computational method for identifying unreliable news sources.
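A minimal sketch of the TF-IDF plus logistic-regression pipeline the study describes; the two example texts and labels stand in for the study's dataset.

```python
# TF-IDF features feeding a logistic-regression classifier, as in the study;
# training data here is a toy placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Official statement released by the ministry",
         "You won't believe this outrageous hoax"]
labels = [0, 1]  # 0 = real, 1 = fake

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict_proba(["Ministry issues official statement"]))
```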
The mimetic virus: A vector for cyberterrorism (Nicholas Ayres)
The document discusses the potential for a "mimetic virus" to be used as a vector for cyberterrorism against the general public. It presents research involving a survey of 100 participants about their understanding and fear of cyberterrorism. The survey found that while participants had some knowledge of cyberterrorism, their fear of an attack was initially low. Participants were then shown a fabricated video claiming to depict a real computer virus that caused laptop batteries to explode. The data showed this increased participants' fear levels and made them likely to modify their future behaviors. The research suggests a mimetic virus could be an effective method for cyberterrorists to target and influence the general public, though the ability of such a virus to spread via social media is unclear.
WARRANTS GENERATIONS USING A LANGUAGE MODEL AND A MULTI-AGENT SYSTEM (ijnlc)
Each argument begins with a conclusion, which is followed by one or more premises supporting it. The warrant is a critical component of Toulmin's argument model; it explains why the premises support the claim. Despite its critical role in establishing the claim's veracity, it is frequently omitted or left implicit, leaving readers to infer it. We consider the problem of producing more diverse and high-quality warrants in response to a claim and evidence. First, we employ BART [1] as a conditional sequence-to-sequence language model to guide the output generation process, fine-tuning the BART model on the ARCT dataset [2]. Second, we propose the Multi-Agent Network for Warrant Generation, a model for producing more diverse and high-quality warrants by combining Reinforcement Learning (RL) and Generative Adversarial Networks (GAN) with a mechanism of mutual awareness among agents. Our model generates a greater variety of warrants than other baseline models, and the experimental results validate the effectiveness of the proposed hybrid model for generating warrants.
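As a hedged illustration of the first step (conditioning BART on a claim and evidence to draft a warrant), the sketch below uses the generic facebook/bart-base checkpoint from Hugging Face rather than the authors' ARCT-fine-tuned model, so its outputs are only indicative.

```python
# Conditional generation with BART: feed claim + evidence, decode a candidate
# warrant. The base checkpoint is untuned for this task (illustration only).
from transformers import BartForConditionalGeneration, BartTokenizer

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

prompt = ("claim: exercise improves mood. "
          "evidence: endorphin levels rise during workouts.")
inputs = tok(prompt, return_tensors="pt")
ids = model.generate(inputs["input_ids"], max_length=40, num_beams=4)
print(tok.decode(ids[0], skip_special_tokens=True))
```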
A COMPUTER VIRUS PROPAGATION MODEL USING DELAY DIFFERENTIAL EQUATIONS WITH PR... (IJCNCJournal)
The SIR model is used extensively in the field of epidemiology, in particular for the analysis of communicable diseases. One problem with SIR and other existing models is that they are tailored to random or Erdős-type networks, since they do not consider the varying probabilities of infection or immunity per node. In this paper, we present the application and simulation results of the pSEIRS model, which takes these probabilities into account and is thus suitable for more realistic scale-free networks. In the pSEIRS model, the death rate and the excess death rate are constant for infective nodes. Latent and immune periods are assumed to be constant, and the infection rate is assumed to be proportional to I(t)/N(t), where N(t) is the size of the total population and I(t) is the size of the infected population. A node recovers from an infection temporarily with probability p and dies from the infection with probability (1-p).
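Read generically, the quantities named in that abstract correspond to a force of infection proportional to the infected fraction and a p : (1-p) split of outcomes for nodes leaving the infective compartment; the symbols below are generic rate constants sketching that reading, not the paper's exact delay-differential system.

```latex
% Assumed notation: beta is a contact rate, gamma the rate of leaving the
% infective compartment; this is a generic skeleton, not the paper's system.
\lambda(t) = \beta\,\frac{I(t)}{N(t)}, \qquad
p\,\gamma\, I(t) \;\longrightarrow\; \text{temporarily recovered}, \qquad
(1-p)\,\gamma\, I(t) \;\longrightarrow\; \text{dead}.
```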
Probabilistic models for anomaly detection based on usage of network traffic (Alexander Decker)
This document discusses probabilistic models for anomaly detection based on network traffic usage. It introduces several probabilistic methods and statistical models that can be used for network traffic anomaly detection, including Bayesian theorem, mean and standard deviation models, point and interval estimations, multivariate regression models, Markov processes, and time series models. As an example, it describes modeling the spread of computer worms using epidemiological models such as linear, exponential, logistic, and differential equation models. It also discusses the different possible scenarios an intrusion detection system can encounter and how to calculate probabilities of outcomes using Bayesian theorem.
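A tiny worked example of the Bayesian-theorem calculation mentioned above: the posterior probability of a real intrusion given an alarm, computed from an assumed prior, detection rate, and false-alarm rate (all three numbers are invented for illustration).

```python
# Bayes' theorem for intrusion-detection outcomes: even with a good detector,
# a rare event plus a small false-alarm rate yields a modest posterior.
p_intrusion = 0.001              # prior probability of an intrusion
p_alarm_given_intrusion = 0.95   # detection rate
p_alarm_given_normal = 0.01      # false-alarm rate

p_alarm = (p_alarm_given_intrusion * p_intrusion
           + p_alarm_given_normal * (1 - p_intrusion))
posterior = p_alarm_given_intrusion * p_intrusion / p_alarm
print(f"P(intrusion | alarm) = {posterior:.3f}")   # ~0.087
```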
Tweet Segmentation and Its Application to Named Entity Recognition (1crore projects)
These days, considering the expansion of networks, the dissemination of information has become a significant research topic. In social networks, beyond social structure and people's influence on one another, applications include increasing sales profits, publishing news or rumors, and spreading or diffusing an idea. In social communities people affect each other, and when an individual joins a group, his friends may join that group as well. When publishing a piece of news, independent of its nature, there are different ways to spread it. Since information is not always suitable and positive, this article introduces an immunization mechanism against such information, where immunization means slowing the publication of the information in the network. This article therefore tries to slow down the publishing of such information or even stop it. By comparing the presented immunization methods, and introducing a rate-delay parameter, the methods were evaluated and the most effective immunization method was identified. Among the existing and recommended immunization methods, the recommended methods also play an effective role in preventing the spread of malicious rumors.
This document proposes a computational model to represent and analyze transmedia ecosystems. It begins by defining transmedia as story-based contents that expand narratives across multiple media.
It then reviews existing definitions and taxonomies of transmedia, which primarily focus on media expansions. The proposed model instead focuses on expansions of narrative worlds.
The document introduces a taxonomy with eight categories for classifying narrative expansion methods: spin off, reboot, prequel, sequel, interquel, midquel, sidequel, and paraquel. It illustrates the taxonomy using examples from the Star Wars transmedia ecosystem.
Massively Parallel Simulations of Spread of Infectious Diseases over Realisti... (Subhajit Sahu)
Highlighted notes while studying for project work:
Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks
Abhinav Bhatele†
Jae-Seung Yeom†
Nikhil Jain†
Chris J. Kuhlman∗
Yarden Livnat‡
Keith R. Bisset∗
Laxmikant V. Kale§
Madhav V. Marathe∗
†Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94551 USA
∗Biocomplexity Institute & Department of Computer Science, Virginia Tech, Blacksburg, Virginia 24061 USA
‡Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, Utah 84112 USA
§Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801 USA
E-mail: †{bhatele, yeom2, nikhil}@llnl.gov, ∗{ckuhlman, kbisset, mmarathe}@vbi.vt.edu
Abstract: Controlling the spread of infectious diseases in large populations is an important societal challenge. Mathematically, the problem is best captured as a certain class of reaction-diffusion processes (referred to as contagion processes) over appropriate synthesized interaction networks. Agent-based models have been successfully used in the recent past to study such contagion processes. We describe EpiSimdemics, a highly scalable, parallel code written in Charm++ that uses agent-based modeling to simulate disease spread over large, realistic, co-evolving interaction networks. We present a new parallel implementation of EpiSimdemics that achieves unprecedented strong and weak scaling on different architectures (Blue Waters, Cori, and Mira). EpiSimdemics achieves five times greater speedup than the second fastest parallel code in this field. This unprecedented scaling is an important step toward supporting the long-term vision of real-time epidemic science. Finally, we demonstrate the capabilities of EpiSimdemics by simulating the spread of influenza over a realistic synthetic social contact network spanning the continental United States (~280 million nodes and 5.8 billion social contacts).
A data mining tool for the detection of suicide in social networks (Yassine Bensaoucha)
This document describes a dissertation that developed a program to detect suicidal tendencies in users on Twitter through data mining and text classification techniques. The program first collects and preprocesses tweets, then classifies them using naive Bayes classifiers into three categories: positive, negative, and suicidal. It analyzes the results to determine if a given user has suicidal tendencies based on the percentage of tweets classified in each category. While initial results were promising, future work could compare this approach to other classifiers and potentially combine it with decision tree classification.
1) The document discusses how extended metaphors can bolster the persuasive influence of metaphoric frames used to describe important issues.
2) An experiment was conducted where participants were presented with issues framed using different metaphors (e.g., crime as a virus or beast) and response options described using consistent or inconsistent extended metaphors.
3) Preliminary studies confirmed that the response options and extended metaphors used in the experiment were conceptually and lexically related to the metaphor frames in the intended ways. The experiment aimed to test if extended metaphors influence how people reason about issues framed metaphorically.
1) The document discusses how extended metaphors can bolster the persuasive influence of metaphoric frames used to describe important issues.
2) A study was conducted where participants matched metaphoric frames ("crime as a virus" or "crime as a beast") to policy responses. The results showed clear conceptual mappings between the frames and responses, indicating extended metaphors could reinforce these relationships.
3) A follow up experiment will manipulate whether an extended metaphor used to describe a response is conceptually consistent or inconsistent with the initial metaphor frame, to test if extended metaphors can influence response endorsement even when inconsistent.
This document describes a Bayesian approach to identifying individuals of high concern regarding infection in a population using contact network and epidemiological data. The approach models infection spread and uses Bayesian statistical methods to calculate the probability that an individual is infected but not yet presenting symptoms or will become infected soon. The approach was tested using a simulation model and showed reasonable results, though further work is needed to improve integration methods to allow inference on larger contact networks.
Trust Management for Secure Routing Forwarding Data Using Delay Tolerant Netw... (rahulmonikasharma)
Delay Tolerant Networks (DTNs) establish connections between source and destination but often face disconnections and unreliable wireless links; a DTN is a network that tolerates disruption or delay. Delay tolerant networks operate with limited resources such as memory size and central processing unit. The trust management protocol uses dynamic threshold updating, which overcomes these problems; the dynamic threshold update reduces the false-detection probability for malicious nodes. The system proposes secure routing management schemes that successfully adopt information security principles. It analyzes the basic security principles and operations for trust authentication applicable to delay tolerant networks. For security, the proposed system identifies the store-and-forward approach in network communications and analyzes routing in cases such as selfish-contact and collaboration-contact methods. The proposed method adopts the ZRP (Zone Routing Protocol) scheme and enhances it using distributed operation, mobility, delay analysis, security association, and trust modules. The scheme's performance is analyzed with respect to time, authentication, security, and secure routing. From this analysis, the research work identifies the issues in DTN secure routing and enhances ZRP by suggesting an authentication principle as a core security principle for information security.
This document describes an adaptive model for the spread of infection on networks. It begins by introducing network analysis and percolation models. It then presents the basic SIS (Susceptible-Infected-Susceptible) model for modeling infection spread and derives the steady-state infection rates. It introduces the concept of network adaptation and reviews models of financial contagion that lack adaptation. It then presents a new popularity-based network model and develops an adaptive SIS model that incorporates network changes in response to infection spread. Computational analysis shows the existence of a phase boundary for this adaptive model.
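For reference, the steady state of the basic mean-field SIS model that the document starts from can be derived in one line; the adaptive model's own derivation may differ.

```latex
% Mean-field SIS: the infected fraction i grows by contact (beta) and decays
% by recovery (gamma); setting the derivative to zero gives the endemic level.
\frac{di}{dt} = \beta\, i\,(1 - i) - \gamma\, i = 0
\quad\Longrightarrow\quad
i^{*} = 0 \quad \text{or} \quad i^{*} = 1 - \frac{\gamma}{\beta}
\quad (\beta > \gamma).
```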
Helping Crisis Responders Find the Informative Needle in the Tweet Haystack (COMRADES project)
Leon Derczynski - University of Sheffield,
Kenny Meesters - TU Delft, Kalina Bontcheva - University of Sheffield, Diana Maynard - University of Sheffield
WiPe Paper – Social Media Studies
Proceedings of the 15th ISCRAM Conference – Rochester, NY, USA May 2018
Meliorating usable document density for online event detection (IJICTJOURNAL)
Online event detection (OED) has seen rising interest in the research community, as it can quickly identify possible events happening in the world. Through these systems, potential events can be indicated well before they are reported by the news media, by grouping similar documents shared over social media by users. Most OED systems use textual similarity for this purpose. Similar documents that may indicate a potential event are further reinforced by the replies made by other users, strengthening the evidence for the group. However, these documents are at times unusable as independent documents, as they may replace previously mentioned noun phrases with pronouns, causing OED systems to fail when grouping these replies into their suitable clusters. In this paper, a pronoun resolution system that tries to replace pronouns with the relevant nouns in social media data is proposed. Results show significant improvement in performance using the proposed system.
Event detection and summarization based on social networks and semantic query... (ijnlc)
Events can be characterized by a set of descriptive, collocated keywords extracted from documents. Intuitively, documents describing the same event will contain similar sets of keywords, and the keyword graph for a document collection will contain clusters corresponding to individual events. Helping users understand events is an acute problem nowadays, as users struggle to keep up with the tremendous amount of information published every day on the Internet. The challenging task of detecting events from online web resources is receiving growing attention. An important data source for event detection is the Web search log, because the information it contains reflects users' activities and interest in various real-world events. Several major issues play a role in event detection from web search logs, including the effectiveness and efficiency of event detection. We focus on modeling the content of events through their semantic relations with other events and on generating structured summarizations. Event mining is a useful way to understand computer system behaviors, and the focus of recent work on event mining has shifted from discovering frequent patterns to event summarization, which provides a comprehensible explanation of an event sequence based on certain aspects.
Classification of Disastrous Tweets on Twitter using BERT Model (IRJET Journal)
This document summarizes a research paper that used the BERT model to classify disaster-related tweets on Twitter. The researchers collected tweet data from Twitter related to disasters and emergencies and labeled them as referring to a genuine disaster or not. They preprocessed the tweet text and used the BERT model as well as other algorithms like SVM and TF-IDF to classify the tweets. The BERT model was able to understand the context of words in tweets to better determine if they referred to a disaster compared to methods that did not consider context. The researchers trained models on two different tweet datasets and evaluated the results, finding that the BERT model performed well at classifying disaster tweets.
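A hedged sketch of scoring tweets with a BERT-family sequence classifier via Hugging Face; the bare bert-base-uncased head below is untrained for this task, so in practice it would first be fine-tuned on the labeled disaster tweets.

```python
# Tokenize tweets and score them with a two-label BERT classification head.
# Without fine-tuning, the head's outputs are random; shown for the mechanics.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

tweets = ["Forest fire near La Ronge Sask. Canada",
          "I love this song so much"]
batch = tok(tweets, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**batch).logits, dim=-1)
print(probs)  # column 1 ~ P(disaster) once the head is fine-tuned
```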
This document discusses a blockchain-based framework for event detection and trust verification using natural language processing and machine learning on social media data. Key points:
- The framework aims to improve emergency response to disasters by analyzing social media posts to extract real-time information about events, while also verifying the trustworthiness of detected events and eliminating single authorities.
- Machine learning, deep learning, and natural language processing techniques are used on social media datasets to detect emergency events and filter information to support relief efforts.
- The blockchain framework is incorporated to improve security, transparency, and avoid sharing of wrong information by verifying the trust of detected events without a single point of authority.
- The overall goal is developing a trustworthy, decentralised system for real-time event detection and verified information sharing during disasters.
HIGH ACCURACY LOCATION INFORMATION EXTRACTION FROM SOCIAL NETWORK TEXTS USING... (kevig)
Terrorism has become a worldwide plague with severe consequences for the development of nations. Besides killing innocent people daily and preventing educational activities from taking place, terrorism is also hindering economic growth. Machine Learning (ML) and Natural Language Processing (NLP) can contribute to fighting terrorism by predicting future terrorist attacks in real time if accurate data is available. This paper is part of a research project that uses text from social networks to extract the information necessary to build an adequate dataset for terrorist attack prediction. We collected a set of 3000 social network texts about terrorism in Burkina Faso and used a subset to experiment with existing NLP solutions. The experiment reveals that existing solutions have poor accuracy for location recognition, which our solution resolves. We will extend the solution to extract date and action information to achieve the project's goal.
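An off-the-shelf NER baseline of the kind such experiments compare against can be run with spaCy, which tags place names as GPE/LOC entities; the improved recognizer the paper proposes is not reproduced here.

```python
# Extract location entities from a sentence with spaCy's pretrained English
# model; requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("An attack was reported near Ouagadougou in Burkina Faso on Friday.")
locations = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]
print(locations)  # e.g. ['Ouagadougou', 'Burkina Faso']
```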
A Machine Learning Ensemble Model for the Detection of Cyberbullying (gerogepatton)
The pervasive use of social media platforms, such as Facebook, Instagram, and X, has significantly amplified our electronic interconnectedness. Moreover, these platforms are now easily accessible from any location at any given time. However, the increased popularity of social media has also led to cyberbullying. It is imperative to address the need for finding, monitoring, and mitigating cyberbullying posts on social media platforms. Motivated by this necessity, we present this paper to contribute to developing an automated system for detecting binary labels of aggressive tweets. Our study has demonstrated remarkable performance compared to previous experiments on the same dataset. We employed the stacking ensemble machine learning method, utilizing four different feature extraction techniques to optimize performance within the stacking ensemble learning framework. Combining five machine learning algorithms (Decision Trees, Random Forest, Linear Support Vector Classification, Logistic Regression, and K-Nearest Neighbors) into an ensemble method, we achieved superior results compared to traditional machine learning classifier models. The stacking classifier achieved a high accuracy rate of 94.00%, outperforming traditional machine learning models and surpassing the results of prior experiments that utilized the same dataset. Our experiments showed an accuracy of 0.94 in classifying tweets as aggressive or non-aggressive.
Dialectal Arabic sentiment analysis based on tree-based pipeline optimizatio...IJECEIAES
This document summarizes a research paper that proposes using a tree-based pipeline optimization tool (TPOT) to improve sentiment classification of dialectal Arabic texts. The paper provides background on sentiment analysis and challenges in analyzing informal Arabic texts. It then discusses related work applying TPOT and AutoML techniques to optimize machine learning for various tasks. The proposed approach uses TPOT for sentiment analysis of three Arabic dialect datasets to automatically optimize hyperparameters and improve over similar prior work.
Crime prediction using a hybrid sentiment analysis approach based on the bidi...nooriasukmaningtyas
Sentiment analysis (SA) is widely used today in many areas, such as crime detection (security intelligence), to detect potential security threats in real time using social media platforms such as Twitter. The most promising techniques in sentiment analysis are those of deep learning (DL), particularly bidirectional encoder representations from transformers (BERT) in the field of natural language processing (NLP). However, employing the BERT algorithm to detect crimes requires a crime dataset labeled by the lexicon-based approach. In this paper, we used a hybrid approach that combines both lexicon-based and deep learning methods, with BERT as the DL model. We employed the lexicon-based approach to label our Twitter dataset with a set of normal and crime-related lexicons; then, we used the obtained labeled dataset to train our BERT model. The experimental results show that our hybrid technique outperforms existing approaches in several metrics, with 94.91% and 94.92% in accuracy and F1-score respectively.
Crisis Event Extraction Service (CREES) – Automatic Detection and Classificat...Gregoire Burel
Social media posts tend to provide valuable reports during crises. However, this information can be hidden in large amounts of unrelated documents. Providing tools that automatically identify relevant posts, event types (hurricane, floods, etc.) and information categories (reports on affected individuals, donations and volunteers, etc.) in social media posts is vital for their efficient handling and consumption. We introduce the Crisis Event Extraction Service (CREES), an open-source web API that automatically classifies posts during crisis situations. The API provides annotations for crisis-related documents, event types and information categories through an easily deployable and accessible web API that can be integrated into multiple platforms and tools. The annotation service is backed by Convolutional Neural Networks (CNN) and validated against traditional machine learning models. Results show that the CNN-based API is consistent with the baselines and can be relied upon when dealing with specific crises.
A ROBUST JOINT-TRAINING GRAPHNEURALNETWORKS MODEL FOR EVENT DETECTIONWITHSYMM...kevig
This document proposes a Joint-training Graph Convolution Networks (JT-GCN) model to address the challenge of event detection tasks with noisy labels. The model uses two Graph Convolution Networks with edge enhancement that make predictions simultaneously. A joint loss is calculated combining the detection loss from the predictions and a contrast loss between the two networks. Additionally, a small-loss selection mechanism is used to mitigate the impact of mislabeled samples during training, by excluding samples with large losses from backpropagation. Experiments on the ACE2005 benchmark dataset show the proposed model is robust to label noise and outperforms state-of-the-art models for event detection tasks.
A Robust Joint-Training Graph Neural Networks Model for Event Detection with ...kevig
Events are the core element of information in a descriptive corpus. Although much progress has been made in Event Detection (ED), it is still a challenge in Natural Language Processing (NLP) to detect event information from data with unavoidable noisy labels. A robust Joint-training Graph Convolution Networks (JT-GCN) model is proposed to meet the challenge of ED tasks with noisy labels in this paper. Specifically, we first employ two Graph Convolution Networks with Edge Enhancement (EE-GCN) to make predictions simultaneously. A joint loss combining the detection loss and the contrast loss from the two networks is then calculated for training. Meanwhile, a small-loss selection mechanism is introduced to mitigate the impact of mislabeled samples in the network training process. These two networks gradually reach an agreement on the ED tasks as joint-training progresses. Corrupted data with label noise are generated from the benchmark dataset ACE2005. Experiments on ED tasks have been conducted with both symmetric and asymmetric label noise at different levels. The experimental results show that the proposed model is robust to the impact of label noise and superior to state-of-the-art models for ED tasks.
The document proposes creating a social network to improve information sharing and resilience in communities in the Greater Mekong Subregion. It would involve surveying communication infrastructure and skills, developing guidelines and protocols, and setting up a simulation to test emergency response. The goal is to promote formation of a social network to organize sharing of knowledge and communication for disaster preparedness and community development across Cambodia, China, Laos, Myanmar, Thailand and Vietnam.
Abstract: Fake news, which is used to mislead people, is a major issue, and deep learning techniques can be applied to detect it. For the experiments, several types of datasets, models, and methodologies have been used to detect fake news. Most of the datasets contain text IDs, tweet IDs, user IDs, and user-based features. To obtain proper results and accuracy, various models such as CNN (Convolutional Neural Network), deep CNN, and LSTM (Long Short-Term Memory) are used.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847. This material reflects only the author's view and the European Commission is not responsible for any use that may be made of the information it contains.
Evaluating Platforms for Community Sensemaking: Using the Case of the Kenyan ...COMRADES project
This document describes a study that evaluated how platforms can support community sensemaking during disruptive events. The researchers conducted a scenario-based evaluation using data from Kenya's 2017 elections. Twelve students participated in the evaluation. They were given the task of mapping reports of voting incidents and irregularities from Kenya's Uchaguzi platform to assess the validity of the elections and support security forces. The goal was to examine how such a platform could aid non-mandated responders' situational understanding. Data was collected on the participants' sensemaking process to identify requirements for resilience platforms and inform future research.
An Extensible Multilingual Open Source LemmatizerCOMRADES project
This document summarizes an article that presents an open-source lemmatizer called GATE DictLemmatizer. The lemmatizer currently supports lemmatization for English, German, Italian, French, Dutch, and Spanish. It uses a combination of automatically generated lemma dictionaries from Wiktionary and the Helsinki Finite-State Transducer Technology (HFST). An evaluation shows it achieves similar or better results than the TreeTagger lemmatizer for languages with HFST support, and still provides satisfactory results for languages without HFST support. The lemmatizer and tools to generate dictionaries are made freely available as open source.
D6.2 First report on Communication and Dissemination activitiesCOMRADES project
COMRADES project was launched in January 2016 with a lifetime of 36 months and it aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
COMRADES consortium will build a next generation, intelligent resilience platform to provide high socio-technical innovation and to support community resilience in crisis situations. The platform will capture and process in real-time, multilingual social information streams from distributed communities, for the purpose of identifying, aggregating, and verifying reported events at the citizen and community levels. Resilience frameworks, guidelines and best practices will be embedded into the platform design and functionality, enriched with open datasets and open source software.
The main objective of the project is to foster social innovation during crises, safeguarding communities during critical scenarios from inaccurate, distrusted, and overhyped information, and raising citizen and community awareness of crisis situations by providing them with filtered, validated, enriched, high quality, and actionable knowledge. Community decision-making will be assisted by automated methods for real-time, intelligent processing and linking of crowdsourced crisis information.
This document forms deliverable D6.2 “First report on communication and dissemination activities”. It outlines the dissemination and communication objectives and strategy of the reporting period and focuses on the tools and activities that were undertaken to accomplish the objectives set. The deliverable reports on dissemination tools (website, social media, press releases, newsletter issues, brochures, etc.) used from M1 to M12 to disseminate the project, implementing the online and offline dissemination strategy of the D6.1 deliverable (M6). Also, it presents the dissemination activities that have been implemented by the partners and are foreseen in the Description of Action for WP6. It will be updated yearly for the whole duration of the project.
It is based on, and is consistent with, the DoA and the CA, but is not a substitute for reading these documents.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847
http://www.comrades-project.eu/outputs/deliverables/82-deliverables/46-d6-2-first-report-on-communication-and-dissemination-activities.html
This report describes the tools developed for multilingual text processing of social media. It gives details about the linguistic approaches used, the scope of the tools, and some results of performance evaluation. WP3 focuses on developing methods to detect relevant and informative posts, as well as content with clear validity, so that these can be dealt with efficiently without the clutter of uninformative or irrelevant posts obscuring the important facts. In order to achieve these objectives, low-level linguistic processing components are first required in order to generate lexical, syntactic and semantic features of the text required by the informativeness and trustworthiness components to be developed later in the project. These tools are also required by components developed in WP4 for the detection of emergency events, modelling and matchmaking. They take as input social media and other kinds of text-based messages and posts, and produce as output additional information about the language of the message, the named entities, and syntactic and semantic information.
In this report, we first describe the suite of tools we have developed for Information Extraction from social media for English, French and German. While English is the main language of messages dealt with by the tools in this project, it is very useful to be able to both recognise and deal with messages in other languages. French and German are therefore used as examples to show the adaptability of our tools to other languages, and our multilingual components thus serve as a testbed for new language adaptation techniques with which we have experimented during the project.
Various aspects of these tools are evaluated for accuracy. Second, we describe the tools we have developed for entity disambiguation and linking from social media, for English, French and German. These ensure not only that we extract relevant instances of locations, names of people and organisations, but that we know which particular instance we are talking about since these names may potentially refer to different things. By linking to a semantic knowledge base, we ensure both disambiguation and also that we have additional knowledge (for example, the coordinates of a location). The tools are evaluated for accuracy as well as speed, since traditionally these techniques are extremely slow and cumbersome to use in real world scenarios. The tools are all made available as GATE Cloud services.
http://www.comrades-project.eu/outputs/deliverables/82-deliverables/43-d3-1-multilingual-content-processing-methods.html
D4.1 Enriched Semantic Models of Emergency EventsCOMRADES project
This document presents a semantic model for representing emergency event data in the COMRADES project. It analyzes requirements from multiple sources, including the Ushahidi platform data structures, COMRADES tool requirements, stakeholder interviews, and crisis datasets. Based on this analysis, an Ontology Requirement Specification Document is created. Finally, an ontology model is presented and evaluated against the competency questions from the requirements analysis to ensure it meets the project needs. The model will be integrated into the COMRADES platform to semantically represent emergency event data and metadata.
SemEval-2017 Task 8: RumourEval: Determining rumour veracity and support for ...COMRADES project
Leon Derczynski and Kalina Bontcheva and Maria Liakata and Rob Procter and Geraldine Wong Sak Hoi and Arkaitz Zubiaga
Media is full of false claims. Even Oxford Dictionaries named “post-truth” as the word of 2016. This makes it more important than ever to build systems that can identify the veracity of a story, and the nature of the discourse around it. RumourEval is a SemEval shared task that aims to identify and handle rumours and reactions to them, in text. We present an annotation scheme, a large dataset covering multiple topics – each having their own families of claims and replies – and use these to pose two concrete challenges as well as the results achieved by participants on these challenges.
http://www.derczynski.com/sheffield/papers/rumoureval-task.pdf
Prospecting Socially-Aware Concepts and Artefacts for Designing for Community...COMRADES project
This document discusses concepts from the Socially-Aware Design approach that could help inform the design of technologies to boost community resilience for refugees. It introduces the Semiotic Onion model, which views a design problem across technical, formal, and informal sociocultural layers. It also discusses Edward Hall's Basic Building Blocks of Culture as a way to understand cultural aspects that may influence design. The document argues these concepts could help identify important values, threats, and resilience factors for refugee communities to ensure designs are aligned with their needs and context.
A Semantic Graph-based Approach for Radicalisation Detection on Social MediaCOMRADES project
This document presents a semantic graph-based approach for detecting radicalization on social media, specifically Twitter. The approach extracts semantic concepts and relations from tweets and represents them as graphs. Frequent subgraph mining is used to identify patterns that distinguish pro-ISIS and anti-ISIS stances. Classifiers are trained using these "semantic features" and are shown to outperform classifiers using only lexical, sentiment, topic and network features. The top entities and relations discussed differ between pro-ISIS and anti-ISIS users.
Behind the Scenes of Scenario-Based Training: Understanding Scenario Design a...COMRADES project
This document discusses scenario-based training exercises for disaster response organizations. It notes that current scenario design processes are often inflexible and do not adapt to how coordination emerges in real disasters. The authors observed a large humanitarian aid exercise called TRIPLEX-2016 to understand current scenario design practices. They propose an adaptive scenario generator that can select and adjust scenarios in real-time to better address both individual and collective learning goals for organizations responding to complex, uncertain disasters.
Sustainable Performance Measurement for Humanitarian Supply Chain Operations COMRADES project
WiPe Paper – Logistics and Supply-Chain Proceedings of the 14th ISCRAM Conference – Albi, France, May 2017 Tina Comes, Frédérick Bénaben, Chihab Hanachi, Matthieu Lauras, Aurélie Montarnal, eds.
http://idl.iscram.org/files/lauralagunasalvado/2017/1510_LauraLagunaSalvado_etal2017.pd
Detecting Important Life Events on Twitter Using Frequent Semantic and Syntac...COMRADES project
Dickinson, Thomas; Fernandez, Miriam; Thomas, Lisa; Mulholland, Paul; Briggs, Pam and Alani, Harith (2016). Detecting Important Life Events on Twitter Using Frequent Semantic and Syntactic Subgraphs. IADIS International Journal on WWW/Internet, 14(2) pp. 23–37.
http://oro.open.ac.uk/48678/
DoRES — A Three-tier Ontology for Modelling Crises in the Digital AgeCOMRADES project
Burel, Gregoire; Piccolo, Lara S. G.; Meesters, Kenny and Alani, Harith (2017). DoRES — A Three-tier Ontology for Modelling Crises in the Digital Age. In: ISCRAM 2017 Conference Proceedings, (in press).
http://oro.open.ac.uk/49285/
D2.1 Requirements for boosting community resilience in crisis situationCOMRADES project
COMRADES (Collective Platform for Community Resilience and Social Innovation during Crises, www.comrades-project.eu) aims to empower communities with intelligent socio-technical solutions to help them reconnect, respond to, and recover from crisis situations.
This deliverable reviews theories, conceptual frameworks, standards (e.g., BS 11200 and ISO 22320-2011) and indicators of community resilience. It analyses relevant use cases and reports of resilience around various types of crises.
These assessments enable the project team to present a definition of community resilience that is tailored specifically to the needs and aims of the COMRADES project, which focuses on the role of information and technology as a driver of community resilience.
The project’s stance to improve resilience can therefore be summarized by:
Continuously enhancing community resilience through ICT, instead of focusing on specific shocks or disruptive events.
Enabling a broad range of actors to acquire a relevant, consistent and coherent understanding of a stressing situation.
Empowering decision makers and triggering community engagement on response and recovery efforts, including long-term mitigation and preparation.
The deliverable uses this definition and the project’s overall understanding of community resilience, as presented in the review sections, to specify initial design and functional requirements for adopting and integrating resilience procedures and methodologies into the COMRADES collective resilience platform. At the same time, we develop an evaluation framework that enables us to measure the contribution of the COMRADES resilience platform to building community resilience.
Finally, we outline three prototypical crisis scenarios that will enable the project development, test and evaluation. As such, the deliverable informs the design of the community workshops and interviews planned that will be conducted in T2.2 and T2.3, as well as the work of technology development, particularly in WP5.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847
COMRADES is creating an open-source, community resilience platform, designed by communities, for communities, to help them reconnect, respond to, and recover from crisis situations. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 687847. This material reflects only the authors’ view and the European Commission is not responsible for any use that may be made of the information it contains.
On Semantics and Deep Learning for Event Detection in Crisis Situations
Grégoire Burel, Hassan Saif, Miriam Fernandez, and Harith Alani
Knowledge Media Institute, The Open University, United Kingdom
{g.burel, h.saif, m.fernandez, h.alani}@open.ac.uk
Abstract. In this paper, we introduce Dual-CNN, a semantically-enhanced deep learning model to target the problem of event detection in crisis situations from social media data. A layer of semantics is added to a traditional Convolutional Neural Network (CNN) model to capture the contextual information that is generally scarce in short, ill-formed social media messages. Our results show that our methods are able to successfully identify the existence of events, and event types (hurricane, floods, etc.) accurately (> 79% F-measure), but the performance of the model significantly drops (61% F-measure) when identifying fine-grained event-related information (affected individuals, damaged infrastructures, etc.). These results are competitive with more traditional Machine Learning models, such as SVM.
Keywords: Event Detection, Semantic Deep Learning, Word Embeddings, Semantic Embeddings, CNN, Dual-CNN.
1 Introduction
Social media has emerged as a dominant channel for communities to gather and spread information during crises. Such media has proven itself as an invaluable information source
in several recent natural and social crisis situations, such as floods [26], earthquakes [21],
wildfires [29], nuclear disasters [28], and civil wars [4].
A survey by the American Red Cross showed that 40% of the population would use
social media during a crisis, and 76% of them expect their help requests to be answered
within three hours. Doing this through manual analysis, however, is far from trivial, due
to the sheer data volumes and velocity. For example, in a single day during the 2011
Japan earthquake, 177 million tweets related to the crisis were sent [5].
Although information is paramount during such major crises, it is almost impossible
for organisations and communities to manually absorb, process, and turn the sheer volume of social media data during a crisis into sensible, actionable information [10].
Tools to automatically identify the type of emergency events reported by citizens (e.g.,
need shelter, trapped in building) are largely unavailable. Genuine help requests are often
difficult to spot, group and validate, and many urgent aid requests by individual citizens
could go unnoticed.
Several works exist in the literature that focus on detecting general and global
events and themes from social media (floods, wildfires, bombings, etc.). However,
the automatic identification of fine-grained emergency-related information [19] (e.g.,
affected individuals, infrastructure, etc.) is still in its infancy.
Current works for event identification from social media data make use of supervised and unsupervised Machine Learning (ML) methods, such as classifiers, clustering and language models [1]. More recently, deep learning has emerged as a promising ML technique able to capture high-level abstractions in the data, providing significant improvement for various tasks over more traditional ML methods, such as text classification [13], machine translation [2, 7] or sentiment analysis [27, 8]. However, to the best of our knowledge, deep learning has not yet been applied to the problem of fine-grained information detection in crisis situations.
An advantage of the usage of deep learning is the capacity of the model to capture
multiple layers of information. Our hypothesis is that, by encapsulating a layer of
semantics into the deep learning model, we can provide a better characterisation of the
contextual information, generally scarce in short, ill-formed social media messages;
leading to a more accurate event identification.
We therefore propose in this paper a semantically enhanced Dual-CNN deep-learning
model to target the problem of event detection in crisis situations. Our results show that
our proposed model is able to successfully identify the existence of an event, and the
event type (hurricane, floods, etc.) with > 79% F-measure, but the performance of the
model significantly drops (61% F-measure) when identifying fine-grained event-related
information, showing competitive results with more traditional ML techniques, such as
SVM.
Our hypothesis is that the semantics extracted from tweets may not be sufficient to
capture the level of contextual information needed for an accurate fine-grained event
identification. Our future work therefore aims to enhance the semantic information
extracted from tweets with additional methods to enrich the data abstraction captured by
our proposed deep learning model.
The contributions of this paper can therefore be summarised as follows: 1) the generation of a deep learning model (Dual-CNN) to target the problem of event identification in crisis situations, and; 2) the exploration of how semantic information can be used to enrich the deep-learning data representations.
The rest of the paper is structured as follows. Section 2 shows related work on the
areas of event detection and deep learning. Section 3 describes the scenario targeted in
this paper and the different types of events that we aim to identify. Section 4 describes
our proposed deep learning model for event identification. Sections 5 and 6 show our
evaluation set up and the results of our experiments. Section 7 describes our reflections
and our planned future work. Section 8 concludes the paper.
2 Related Work
Recently, several works have introduced the use of deep learning for event detection [6, 9, 17, 11, 31]. Unlike traditional ML feature-based methods, deep learning models do not generally require heavy feature engineering, and are therefore less prone to the error propagation caused by using external NLP and text processing tools. Also, deep learning models are more generic and tolerant to domain and context variations than feature-based models, as the former use word embeddings as a more general and richer representation of words [17].
Pioneer works in this vein include [6, 9, 17]. These works address the problem of
event detection at the sentence and/or phrase level by first identifying the event triggers
in a given sentence (which could be a verb or a nominalisation) and classifying them into specific types. For example, the word “release” in “The European Union will release 20 million euros to Iraq” is a trigger for the event “Transfer-Money”. Multiple deep
learning models have been proposed to address the above problem. For example, Nguyen
and Grishman [17] use a Convolutional Neural Network (CNN) [15] with three input
channels, corresponding to word embeddings, word position embeddings and entity type
embeddings, to learn a word representation and use it to infer whether a word is an event
trigger or not. Chen et al. [6] argue that a sentence may contain two or more events and that using a traditional CNN model with a max-pooling layer¹ often leads to capturing clues of one event in the sentence while missing the rest. To address this issue, the authors propose using a CNN with a dynamic multi-pooling layer to obtain a maximum value for each part of a sentence and therefore cover more valuable clues of the events within it.
Feng et al. [9] use a hybrid neural network model for cross-language event detection. The proposed model incorporates both a bidirectional LSTM (Bi-LSTM) [24] and a CNN component. The Bi-LSTM captures the contextual semantics of a given word by means of its preceding and following information in the text, while the CNN is used to capture structural information from the local contexts (e.g., sentence chunks). Results show that the proposed model achieves relatively high and robust performance when applied to data in multiple languages including English, Chinese and Spanish, in comparison with traditional feature-based approaches.
It is worth noting that the above works experiment with their approach on the ACE
2005 event extraction corpus [30], which consists of a set of news articles collected from
several online newspapers.
Our work in this paper differs from the above works in two main aspects: First,
while the above works target the problem at the sentence level, our proposed model
aims to detect events related to crisis situations at different detection levels (see Section 3). Secondly, in addition to using word embeddings, our model uses conceptual semantic embeddings (i.e., semantics extracted from external knowledge sources) as an additional input layer to better capture the events’ contextual and conceptual clues in the tweets, as described in Section 4.
3 Scenario
During crises, a very large number, sometimes in the millions, of messages are often
posted on various social media platforms by using the hashtags dedicated to the crises at
hand. However, a good percentage of those messages are irrelevant or uninformative. Olteanu and colleagues observe that crisis reports can be classified into three main categories of informativeness: related and informative, related but not informative, and not related [19]. The percentage of relevant and informative social reports during crises varies a great deal, ranging from 10% in some cases² to 65% in others [25]. However, buried under very many mundane and irrelevant tweets, sometimes one emerges that needs an urgent response.
¹ In a CNN, a max-pooling layer applies a max operation over the representation of an entire sentence to capture the most useful information.
² Behavioral & Linguistic Analysis of Disaster Tweets, http://irevolution.net/2012/07/18/disaster-tweets-for-situational-awareness/.
Our goal in this paper is to develop models to efficiently identify the messages of
sufficient relevance and value. For this purpose, and based on the event types identified
by [19] we consider the following three tasks when developing our approach:
– Task 1 - Crisis vs. non-crisis related messages: The goal of this task is to differentiate posts that are related to a crisis situation from posts that are not.
– Task 2 - Type of crisis: The goal of this task is to identify the type of crisis the message is related to. Following the work of [19], we consider the following natural and human-induced types of crises: shooting, explosion, building collapse, fires, floods, meteorite fall, haze, bombing, typhoon, crash, earthquake and derailment.
– Task 3 - Type of information: The goal of this task is to provide fine-grained information detection in crisis situations. Following the work of [19], we consider the following categories of crisis-related information: affected individuals, infrastructures and utilities, donations and volunteer, caution and advice, sympathy and emotional support, useful information, other.
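For concreteness, the three label sets can be written down directly (a plain Python transcription of the categories listed above, nothing more):

# Label sets for the three detection tasks, following Olteanu et al. [19].
TASK1_LABELS = ["related", "unrelated"]

TASK2_LABELS = [  # crisis types (Task 2)
    "shooting", "explosion", "building collapse", "fires", "floods",
    "meteorite fall", "haze", "bombing", "typhoon", "crash",
    "earthquake", "derailment",
]

TASK3_LABELS = [  # information types (Task 3)
    "affected individuals", "infrastructures and utilities",
    "donations and volunteer", "caution and advice",
    "sympathy and emotional support", "useful information", "other",
]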
4 A Semantic Deep Learning Approach for Event Detection
Event detection in the context of Twitter is a text classification task where the aim is to
identify if a given document (post) describes or is related to an event. In this section we
describe our proposed Dual-CNN model, a semantically enriched deep learning model
for event detection on Twitter.
Besides relying on word embeddings, the proposed model also learns a semantic embedding representation from word concepts that aims at better capturing the latent clues of the event description in tweets, consequently enhancing the automatic detection of events.
The pipeline of our model consists of five main phases, as depicted in Figure 1:
1. Text Processing: A collection of input tweets is cleaned and tokenised for later stages;
2. Word Vector Initialisation: Given a bag of words produced in the previous stage and pre-trained word embeddings, a matrix of word embeddings is constructed to be used for model training;
3. Concept Extraction: This phase runs in parallel with the previous phase. Here the semantic concepts of named entities in tweets are extracted using an external semantic extraction tool;
4. Concepts Vector Initialisation: This stage constructs a vector representation for each of the extracted entities as well as the entities’ associated concepts;
5. Dual-CNN Training: In this phase our proposed Dual-CNN model is trained from both the word embeddings matrix and the semantic embeddings matrix.
In the following subsections we describe each of the phases of the pipeline in more
detail.
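The five phases can also be sketched as a simple driver function. This is a minimal outline, not the authors' released code: preprocess, build_word_embedding_matrix and build_concept_embedding_matrix are sketched in the following subsections, while extract_concepts and train_dual_cnn remain hypothetical stubs.

def run_pipeline(tweets, pretrained_embeddings_path):
    # Phase 1: clean and tokenise the raw tweets into bags of words.
    bags_of_words = [preprocess(t) for t in tweets]
    # Phase 2: build the word embedding matrix from pre-trained embeddings.
    vocabulary = sorted({w for bag in bags_of_words for w in bag})
    word_matrix = build_word_embedding_matrix(vocabulary, pretrained_embeddings_path)
    # Phase 3: extract entity/subtype concepts (conceptually in parallel with phase 2).
    bags_of_concepts = [extract_concepts(t) for t in tweets]
    # Phase 4: build the (much smaller) concept embedding matrix.
    concept_matrix, concept_index = build_concept_embedding_matrix(bags_of_concepts)
    # Phase 5: train the Dual-CNN on both embedding matrices.
    return train_dual_cnn(word_matrix, concept_matrix)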
4.1 Text Preprocessing
Tweets are usually composed of incomplete, noisy and poorly structured sentences due to the frequent presence of abbreviations, irregular expressions, ill-formed words and non-dictionary terms. This phase therefore applies a series of preprocessing steps to reduce the amount of noise in tweets including, for example, the removal of URLs and of all non-ASCII and non-English characters. After that, the processed tweets are tokenised into words that are consequently passed as input to the word embeddings phase.
[Fig. 1: Pipeline of the proposed semantic Dual-CNN event detection model, showing the five phases (Preprocessing, Word Vectors Initialisation from pre-trained embeddings, Concept Extraction, Concepts Vectors Initialisation, and Dual-CNN Training) applied to the example tweet T = “Obama attends vigil for Boston Marathon bombing victims”, with word tokens W = [obama, attends, vigil, for, boston, marathon, bombing, victims] and concept tokens C = [obama, politician, none, none, none, boston, location, none, none, none].]
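A minimal sketch of the cleaning step described above, assuming simple regular expressions rather than the paper's exact rules:

import re

def preprocess(tweet):
    """Remove URLs and non-ASCII characters, then tokenise into words."""
    tweet = re.sub(r"https?://\S+", "", tweet)        # drop URLs
    tweet = tweet.encode("ascii", "ignore").decode()  # drop non-ASCII characters
    return re.findall(r"[a-zA-Z']+", tweet.lower())   # simple word tokens

preprocess("Obama attends vigil for Boston Marathon bombing victims http://t.co/x")
# -> ['obama', 'attends', 'vigil', 'for', 'boston', 'marathon', 'bombing', 'victims']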
4.2 Word Vector Initialisation
An important part of applying deep neural networks to text classification is the use of word embeddings. As such, this phase aims to initialise a matrix of word embeddings for training the event classification model.
Word embeddings is a general name that refers to a vectorised representation of words, where words are mapped to vectors instead of a one-dimensional space [3]. The main idea is that semantically close words should have a similar vector representation rather than a distinct representation. Different methods have been proposed for generating embeddings, such as Word2Vec [16] and GloVe [20], and they have been shown to improve performance in multiple NLP tasks. Hence, in this work we choose to bootstrap our model with Google’s pre-trained Word2Vec model [16] to construct our word embeddings matrix, where rows in the matrix represent the embedding vectors of the words in the Twitter dataset.
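This initialisation can be sketched with the gensim library (a tooling assumption on our part; the paper does not name its loader). Words missing from the pre-trained model get small random vectors:

import numpy as np
from gensim.models import KeyedVectors

def build_word_embedding_matrix(vocabulary, w2v_path, dim=300):
    """One row per vocabulary word; unknown words keep small random vectors."""
    w2v = KeyedVectors.load_word2vec_format(w2v_path, binary=True)
    matrix = np.random.uniform(-0.25, 0.25, (len(vocabulary), dim))
    for i, word in enumerate(vocabulary):
        if word in w2v:
            matrix[i] = w2v[word]
    return matrix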
4.3 Concept Extraction and Semantics Vector Initialisation
As mentioned in the previous step, using word embeddings for training deep learning classification models has been shown to substantially improve classification performance. However, conventional word embedding methods merely rely on the context of a word in the text to learn its embeddings. As such, learning word embeddings from Twitter data might not be sufficient for training our classifier, because tweets often lack context due to their short length and noisy nature.
To address this issue, we propose to enrich the training process of our proposed model with the semantic embeddings of words in order to better capture the context of tweets. To this end, we use AlchemyAPI³ to first extract named entities from tweets (e.g. ‘Oklahoma’, ‘Obama’, ‘Red Cross’) and map them to their corresponding semantic subtypes (e.g. ‘Location’, ‘Politician’, ‘Non-Profit Organisation’) using multiple semantic knowledge bases including DBpedia⁴ and Freebase⁵.
After that, we represent each of the extracted entities and semantic types as a vector using an approach similar to the word embeddings. As a result, the semantic representation of documents (i.e. the entities and their associated semantic subtypes) becomes a semantic embedding matrix, which is used for training the proposed Dual-CNN model.
³ Alchemy API, http://www.ibm.com/watson/alchemy-api.html.
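The semantic channel can be initialised analogously; a minimal sketch, assuming randomly initialised 30-dimensional vectors (the dimensionality used in Section 6) over the entity/subtype vocabulary, including the ‘none’ placeholder token:

import numpy as np

def build_concept_embedding_matrix(bags_of_concepts, dim=30):
    """Randomly initialised embeddings for entity/subtype tokens ('none' included)."""
    vocabulary = sorted({c for bag in bags_of_concepts for c in bag})
    index = {concept: i for i, concept in enumerate(vocabulary)}
    matrix = np.random.uniform(-0.25, 0.25, (len(vocabulary), dim))
    return matrix, index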
4.4 Dual-representation CNN Model for Text Classification
This phase aims to train our Dual-CNN model from the word and semantic embeddings matrices. Below we describe our CNN model along with the proposed training procedure.
As discussed in Section 2, CNNs can be used for classifying sentences or documents [13]. The main idea is to use word embeddings coupled with multiple convolutions of varying sizes that extract important information from a set of words in a given sentence, or a document, and then apply a softmax function that predicts its class.
Kim’s model [13] is a simple CNN model widely used for text classification. It consists of a convolution layer (with three region sizes and multiple filters per region) followed by a max-pooling phase and a fully connected layer where the softmax function is applied for predicting the document classes.
In this paper, we propose to extend the aforementioned CNN model with an additional semantic representation layer representing the named entities in tweets and their associated semantic subtypes. Although, in principle, the most logical method for adding a semantic representation to an existing word-embedding CNN model is to use an additional channel, as is commonly done in image classification, this requires one-to-one mappings between the embedding channels, meaning that the word and semantic tokenisations of a document need to match exactly (i.e. for a given document, the word and semantic embeddings need to have the same length and width).
Nonetheless, one-to-one mappings between word tokens and their meanings cannot be enforced. For example, a document D = ‘Obama attends vigil for Boston Marathon bombing victims.’ may be tokenised as Tw = [‘obama’, ‘attends’, ‘vigil’, ‘for’, ‘boston’, ‘marathon’, ‘bombing’, ‘victims’] by a word tokeniser, whereas a semantic tokeniser may split D as Ts = [‘obama’, ‘politician’, ‘none’, ‘none’, ‘none’, ‘boston’, ‘location’, ‘none’, ‘none’, ‘none’] using entity and entity-type tokens. In this context, the embeddings of Tw and Ts cannot be used directly as different channels of the embedding representation of D, as they have different lengths.
In order to deal with this particular issue, we decided to add a parallel convolutional layer that is computed separately from the word embeddings. This is done before a merging step that concatenates the max-pooling steps for each representation layer, and before applying the softmax step that classifies individual documents, as depicted in Figure 2.
⁴ DBpedia, http://dbpedia.org.
⁵ Freebase, http://www.freebase.com.
Fig. 2: Dual-representation Convolutional Neural Network (CNN) for text classification with word embeddings and semantic embeddings representations.
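Our reading of this architecture, sketched in Keras (an illustrative reconstruction of Figure 2, not the authors' implementation; filter counts and sizes follow Section 6):

from tensorflow.keras import layers, models

def build_dual_cnn(word_len, word_vocab, sem_len, sem_vocab,
                   n_classes, filters=128, sizes=(3, 4, 5)):
    def branch(seq_len, vocab, dim):
        # One convolutional branch over its own embedding sequence.
        inp = layers.Input(shape=(seq_len,))
        emb = layers.Embedding(vocab, dim)(inp)
        pools = [layers.GlobalMaxPooling1D()(
                     layers.Conv1D(filters, s, activation="relu")(emb))
                 for s in sizes]
        return inp, layers.Concatenate()(pools)

    word_in, word_feat = branch(word_len, word_vocab, 300)  # word channel
    sem_in, sem_feat = branch(sem_len, sem_vocab, 30)       # semantic channel
    merged = layers.Concatenate()([word_feat, sem_feat])    # merge after max-pooling
    merged = layers.Dropout(0.5)(merged)
    out = layers.Dense(n_classes, activation="softmax")(merged)
    return models.Model([word_in, sem_in], out)

Each branch convolves and max-pools its own embedding sequence, so the two tokenisations never need to align; they only meet at the concatenation step.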
5 Experimental Setup
Here we present the experimental setup used to assess our event detection model. As
mentioned in Section 3, we aim to apply and test the proposed model in three different
tasks. As such, our evaluation setup requires the selection of (i) Twitter datasets, (ii) the
semantic extraction tool, and (iii) baseline models for cross-comparison.
5.1 Dataset
To assess the performance of the event detection model we require the use of datasets where each tweet is annotated with: whether or not it relates to a crisis event, the type of crisis (earthquake, flood, etc.) and the type of information (affected individuals, infrastructures, etc.); see Section 3 for more details. For the purpose of this work we use the CrisisLexT26 dataset [18].
CrisisLexT26 includes tweets collected during 26 crisis events in 2012 and 2013. Each crisis contains around 1,000 annotated tweets, for a total of around 28,000 tweets, with labels that indicate if a tweet is related or unrelated to a crisis event (i.e. related/unrelated, Task 1).
For the second task (see Section 3), we need a list of crisis types. In order to obtain such information, we consider that the annotated tweets that are from the same sub-collection belong to the same type of event. Using this approach we obtain 12 different crisis types (shooting, explosion, building collapse, fires, floods, meteorite fall, haze, bombing, typhoon, crash, earthquake and derailment) (Task 2).
The CrisisLexT26 tweets are also annotated with additional labels indicating the type of information present in the tweet (affected individuals, infrastructures and utilities, donations and volunteer, caution and advice, sympathy and emotional support, useful information, and unknown; Task 3). More information about the CrisisLexT26 dataset can be found on the CrisisLex website⁶.
Since the annotations tend to be unbalanced, we also create a balanced version of the dataset for each task by performing biased random undersampling using tweets from each sub-collection. As a result, the first task dataset is reduced to 6,703 tweets (24%), the second task to 12,997 tweets (46.5%) and the final task to 9,105 tweets (32.6%).
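The exact biasing procedure is not specified, but per-collection undersampling can be sketched as follows (illustrative only; it downsamples each label within each sub-collection to the local minority count):

import random
from collections import defaultdict

def undersample(labels, collections, seed=42):
    """Return kept tweet indices, balanced per label within each sub-collection."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, (c, y) in enumerate(zip(collections, labels)):
        groups[(c, y)].append(i)
    per_collection = defaultdict(list)
    for (c, y), idx in groups.items():
        per_collection[c].append(idx)
    kept = []
    for label_groups in per_collection.values():
        n = min(len(g) for g in label_groups)  # minority count in this collection
        for g in label_groups:
            kept.extend(rng.sample(g, n))
    return sorted(kept)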
5.2 Semantic Extraction
As mentioned in Section 4, the Dual-CNN model integrates the conceptual semantics of
words as semantic embeddings to better capture event clues in tweets. We take conceptual
semantics to refer to the semantic types (e.g. ‘Location’ , ‘Politician’, ‘Non-Profit
Organisation’) of named-entities (e.g. ‘Oklahoma’ , ‘Obama’, ‘Red Cross’) in tweets. To
extract this type of semantics from our Twitter datasets, we use the AlchemyAPI semantic extraction tool due to its accuracy and high coverage of semantic types in comparison
with other semantic extraction services [22, 23]. Nevertheless, only 16.6% of the dataset
tweets get annotated by the semantic extraction tool.
6 Evaluation
In this section, we report the results obtained from using the proposed Dual-CNN
model for crisis event detection of tweets under three evaluation tasks: (Task1) Crisis
vs. non crisis related tweets, (Task2) type of crisis, and (Task3) type of information.
Our baselines of comparison are three traditional machine learning classifiers: Naive
Bayes, Classification and Regression Trees (CART), and SVM with RBF kernels trained
from words unigrams. We initialise our CNN models with the Google News 3 million
words and phrases pre-trained word embeddings data.7
Results for all experiments are
computed using 5-fold cross validation. For each task, we perform the evaluation on the
full and undersampled versions of the dataset.
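These baselines map directly onto standard scikit-learn components; a minimal sketch under the stated setup (TF-IDF unigram features, 5-fold cross-validation; the macro-F1 scoring is our choice for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def evaluate_baselines(texts, labels):
    # TF-IDF over word unigrams, as in the setup above.
    X = TfidfVectorizer(ngram_range=(1, 1)).fit_transform(texts)
    for name, clf in [("Naive Bayes", MultinomialNB()),
                      ("CART", DecisionTreeClassifier()),
                      ("SVM (RBF)", SVC(kernel="rbf"))]:
        scores = cross_val_score(clf, X, labels, cv=5, scoring="f1_macro")
        print(name, round(scores.mean(), 3))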
We train the CNN model using 300-dimensional word embedding vectors with Fn = 128 convolutional filters of sizes Fs = [3, 4, 5]. For the Dual-CNN model, we use the same parameters, except that for the semantic embeddings we use 30-dimensional vectors, since we have very few semantic concepts compared to the size of the word lexicon. To avoid over-fitting, we use a dropout of 0.5 during training and the Adam gradient descent algorithm [14]. We perform 400 iterations with a batch size of 256.
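Under those settings, compiling and fitting the Keras sketch from Section 4.4 would look roughly as follows (word_ids, concept_ids and y are hypothetical prepared arrays; the vocabulary sizes echo Section 6.2, while the sequence lengths and class count are illustrative):

# Assumes build_dual_cnn from the Section 4.4 sketch and prepared arrays
# word_ids, concept_ids and y (all hypothetical placeholders).
model = build_dual_cnn(word_len=40, word_vocab=57578, sem_len=40,
                       sem_vocab=266, n_classes=12)
model.compile(optimizer="adam",  # Adam, as stated above
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# 400 iterations with batches of 256; with N training tweets this is
# roughly 400 * 256 / N epochs.
model.fit([word_ids, concept_ids], y, batch_size=256,
          epochs=max(1, 400 * 256 // len(y)))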
Table 1 shows the results of our event detection classifiers for the three evaluation
tasks on the full and undersampled versions of the dataset. In particular, the table reports
the precision (P), recall (R), and F1-measure (F1) for each evaluation task and model.
The table also reports the types of features and embeddings used to train the different
classifiers.
6.1 Baselines Results
As seen in Table 1, the results for each task and each baseline show that the first two tasks are relatively easy to predict, whereas predicting information types is much more complex. In general, we also observe that SVM is the best performing algorithm, followed
⁶ CrisisLex T26 Dataset, http://www.crisislex.org/data-collections.html#CrisisLexT26.
⁷ Google Word2Vec, https://code.google.com/archive/p/word2vec.
Table 1: Event detection performance of baselines and our proposed CNN models under the three evaluation tasks on full and undersampled datasets. PT-Embed: pre-trained word embeddings. PTS-Embed: pre-trained word embeddings and semantic word embeddings.

Model       | Data   | Features  | Related/Unrelated (P / R / F1) | Event Types (P / R / F1) | Information Types (P / R / F1)
Naive Bayes | Full   | TF-IDF    | 0.846 / 0.684 / 0.733          | 0.941 / 0.927 / 0.933    | 0.600 / 0.570 / 0.579
CART        | Full   | TF-IDF    | 0.742 / 0.707 / 0.723          | 0.992 / 0.992 / 0.992    | 0.506 / 0.491 / 0.497
SVM         | Full   | TF-IDF    | 0.870 / 0.738 / 0.785          | 0.997 / 0.996 / 0.997    | 0.642 / 0.604 / 0.616
CNN         | Full   | PT-Embed  | 0.861 / 0.744 / 0.797          | 0.991 / 0.986 / 0.988    | 0.634 / 0.590 / 0.609
Dual-CNN    | Full   | PTS-Embed | 0.857 / 0.762 / 0.798          | 0.990 / 0.985 / 0.988    | 0.648 / 0.581 / 0.601
Naive Bayes | Sample | TF-IDF    | 0.795 / 0.787 / 0.785          | 0.929 / 0.928 / 0.928    | 0.558 / 0.563 / 0.556
CART        | Sample | TF-IDF    | 0.770 / 0.769 / 0.769          | 0.988 / 0.988 / 0.988    | 0.471 / 0.464 / 0.464
SVM         | Sample | TF-IDF    | 0.833 / 0.830 / 0.829          | 0.995 / 0.995 / 0.995    | 0.606 / 0.609 / 0.605
CNN         | Sample | PT-Embed  | 0.839 / 0.838 / 0.838          | 0.983 / 0.983 / 0.983    | 0.610 / 0.610 / 0.610
Dual-CNN    | Sample | PTS-Embed | 0.835 / 0.833 / 0.833          | 0.985 / 0.985 / 0.985    | 0.615 / 0.615 / 0.613
by CART and Naive Bayes. For the first two tasks with the full data, each method achieves precision, recall and F1 > 0.72, and SVM appears to be the best model with F1 = 0.785 for identifying crisis-related tweets and F1 = 0.997 for identifying event types. The task of identifying information types shows much lower F1 across the board. This is probably because, compared to the previous tasks, information types contain much more general terms in each class. Similarly to the previous tasks, SVM performs the best with F1 = 0.616.
With the balanced datasets, the results are similar. However, the predictions for the first task increase by around +4.8%. This result is likely due to the fact that the first task was the most imbalanced and benefits the most from the undersampling process.
The high precision and recall observed for the second task (F1 = 0.997) suggest that the different models overfit the data. The issue was not resolved by undersampling the data, which yields an F1 of 0.995. Looking at the data in more detail, we observe that each category contains very clear category indicators. For instance, 77% of the tweets about meteorite falls contain the word meteor, whereas 76.2% of the tweets about explosions contain the word Boston. In order to reduce such an issue, we could for instance remove some of these words from the dataset so the models become less tied to particular event instances (e.g. the Boston bombings).
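As a sketch, such a mitigation amounts to a token filter applied before training (the blocked word list is illustrative, based on the two indicators noted above):

EVENT_SPECIFIC = {"boston", "meteor"}  # strong per-event indicators noted above

def strip_indicators(tokens, blocked=EVENT_SPECIFIC):
    """Drop tokens that identify a concrete event instance rather than a type."""
    return [t for t in tokens if t not in blocked]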
6.2 CNN and Dual-CNN Results
In general, applying CNNs with pre-trained word embeddings (PT-Embed) for both the
full and undersampled data does not improve significantly over SVM. Using the full
dataset, we obtain an F1 of 0.797 for the crises related tweets and full dataset, 0.988 for
event types detection and 0.616 for information type identification. We also observer
very little difference between the CNN model and Dual-CNN model despite adding an
additional semantic layer.
When using the undersampled datasets, the results are similar to the previous ob-
servations with an increase of +3.7% in F1 for the first task. There is also a slight
improvement for the last task with +0.6% in F1.
Adding semantics seems to not improve much the accuracy compared to the standard
CNN model. This result may be explained by different factors. First, the size of our
semantic concepts and entities vocabulary is much smaller than the word lexicon with
only 265 semantic terms compared to 57,577 words. Second more than 83.5% of the
10. Tweets appear to not have any concepts. This means that very little semantic context is
available for each Tweet and that the extracted semantic information has little impact
on the predictive power of our model. Such issue could be alleviated by using better
semantic extraction techniques or using a more complex semantic representation of
Tweets. We could also increase the number of iterations and the size of the batches to
improving the performance of the model.
7 Discussion and Future Work
In this paper we introduced the use of conceptual semantics embeddings in deep learning
CNN models for detecting events on Twitter. This section discusses the limitations of
the presented work as well as different areas of future investigations.
We experimented with our proposed Dual-CNN model on three event detection
tasks (Section 3) and observed that identifying crisis related events and event types in
tweets (i.e. Task 1 and Task 2) with high accuracy appears to be a relatively easy task
that can be fulfilled well with both traditional models such as SVM and CNN models.
Identifying the types of information provided in crisis related tweets (Task 3) is much
more challenging, as tweets mentioning event information types tend to contain much more general terms in each class than the tweets that are related or unrelated to crises or are discussing different types of events.
Looking into the details of the second task, we observed that the models
were generally overfitted even after balancing the data. The reason seems to be associated
with the presence of very clear category indicators (e.g., place names). To reduce
such an issue, we could remove place names from training instances or try to collect
additional data so that the associations between event types and locations are reduced.
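A minimal sketch of the place-name removal idea, assuming an off-the-shelf spaCy NER model; the chosen entity labels and placeholder token are illustrative.

```python
# Sketch: mask place names before training so that models cannot latch onto
# event-specific locations such as "Boston".
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def mask_place_names(text, placeholder="<PLACE>"):
    doc = nlp(text)
    masked = text
    # Replace entities right-to-left so character offsets stay valid.
    for ent in reversed(doc.ents):
        if ent.label_ in {"GPE", "LOC", "FAC"}:
            masked = masked[:ent.start_char] + placeholder + masked[ent.end_char:]
    return masked

print(mask_place_names("Explosion reported near the Boston marathon finish line"))
# e.g. "Explosion reported near the <PLACE> marathon finish line"
```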
Despite using the semantic concepts of words in the proposed Dual-CNN model, we
found no significant improvement compared to the original CNN model. As stated in
the previous section, the lack of improvement is probably linked to the small size of
the semantic vocabulary, as well as to the limited ability of the Alchemy API semantic
extraction tool to extract concepts from tweets (only 16.5% of the tweets had semantic
concepts extracted by the Alchemy API). We also observed that some of the extracted
concepts were too abstract (e.g., Location) and were mapped to entities in both event-related
and event-unrelated tweets. This might affect the discriminative power of such concepts and
lead to inaccurate event classification.
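One simple way to quantify this effect is to score each concept by how unevenly it appears across event-related and event-unrelated tweets, so that undiscriminative concepts such as Location can be identified and filtered; the sketch below does so under a hypothetical data format.

```python
# Sketch: score concepts by how unevenly they split across event-related
# and unrelated tweets (0 = no discrimination, 1 = perfectly discriminative).
from collections import Counter

def concept_discrimination(samples):
    pos, neg = Counter(), Counter()
    for concepts, is_event_related in samples:
        (pos if is_event_related else neg).update(set(concepts))
    scores = {}
    for concept in set(pos) | set(neg):
        p, n = pos[concept], neg[concept]
        scores[concept] = abs(p - n) / (p + n)
    return scores

samples = [(["Location", "NaturalDisaster"], True),
           (["Location", "Sport"], False),
           (["NaturalDisaster"], True)]
print(concept_discrimination(samples))
# {'Location': 0.0, 'NaturalDisaster': 1.0, 'Sport': 1.0}
```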
As future work, we plan to investigate methods to improve both the extraction
and the integration of words’ conceptual semantics into our proposed model. For the
semantic extraction part, we plan to increase the number as well as the specificity of the
conceptual semantics, perhaps with the aid of Linked Data or using alternative extractors,
such as TextRazor.8
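As an indication of how such an alternative extractor might be queried, the sketch below uses the TextRazor Python client; the extractor list and entity attribute names follow the client's public documentation but should be treated as assumptions, and a valid API key is required.

```python
# Sketch: extract entities from a tweet with the TextRazor Python client.
# Requires: pip install textrazor (and a valid API key).
import textrazor

textrazor.api_key = "YOUR_API_KEY"  # placeholder, not a real key
client = textrazor.TextRazor(extractors=["entities"])

response = client.analyze("Explosion reported near the Boston marathon finish line")
for entity in response.entities():
    # Attribute names taken from the client's docs; treat as assumptions.
    print(entity.id, entity.dbpedia_types, entity.relevance_score)
```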
Concerning the event detection model, we plan to improve the Dual-CNN model
by adding additional convolutional layers and performing parameter optimisation. For
instance, we could try to improve the results by modifying the size of the model filters as
well as the number of filters. We could also increase and optimise the number of training
steps in order to obtain better results.
8 TextRazor, http://www.textrazor.com.
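The parameter search described above could be organised as a simple grid; `build_and_evaluate` is a hypothetical helper (stubbed here with a random score) standing in for training a Dual-CNN configuration and returning its validation F1.

```python
# Sketch: grid search over filter sizes, filter counts, and training steps.
import random
from itertools import product

def build_and_evaluate(filter_sizes, n_filters, steps):
    """Hypothetical helper: train the model with these settings and return
    the validation F1. Stubbed with a random score for the sketch."""
    return random.random()

best_config, best_f1 = None, -1.0
for fs, nf, steps in product([(3,), (3, 4, 5)], [64, 128, 256], [1000, 5000]):
    f1 = build_and_evaluate(fs, nf, steps)
    if f1 > best_f1:
        best_config, best_f1 = (fs, nf, steps), f1
print("best config:", best_config, "F1:", round(best_f1, 3))
```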
Our proposed dual layer model is built on top of a CNN network, which assumes
that all inputs (i.e., words and semantic concepts) are loosely coupled with each other.
However, it might be the case that the latent clues of an event can be determined based on
the intrinsic dependencies between the words and semantic concepts of a tweet. Hence,
one avenue for future work is to incorporate this information in our event detection model,
possibly by using recurrent neural networks (RNN) [12], given their ability to capture
sequential information in text.
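A minimal sketch of this RNN direction, assuming the word and concept sequences are aligned token by token; the Keras implementation and all layer sizes are illustrative assumptions.

```python
# Sketch: a bidirectional LSTM over concatenated word and concept embeddings,
# so that sequential dependencies between the two channels can be captured.
from tensorflow.keras import layers, models

MAX_LEN, WORD_VOCAB, CONCEPT_VOCAB, N_CLASSES = 40, 57_577, 265, 2

word_in = layers.Input(shape=(MAX_LEN,))
concept_in = layers.Input(shape=(MAX_LEN,))
w = layers.Embedding(WORD_VOCAB, 300)(word_in)
c = layers.Embedding(CONCEPT_VOCAB, 50)(concept_in)
x = layers.Concatenate()([w, c])               # align each word with its concept
x = layers.Bidirectional(layers.LSTM(128))(x)  # forward + backward context
outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

model = models.Model([word_in, concept_in], outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```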
8 Conclusions
We proposed Dual-CNN, a deep learning model that uses the conceptual semantics of
words for fine-grained event detection in crisis situations. We based our analysis on
Twitter data since it is a social media platform that is widely used during crisis events. We
investigated how named entities in tweets can be extracted and used, together with their
corresponding semantic concepts, as an additional CNN layer to train a deep learning
model for event detection on Twitter. We used our Dual-CNN model on a Twitter dataset
of 26 different crisis events and tested its performance under three event detection tasks.
Results show that our model is able to successfully identify the existence of events and
event types with > 79% F-measure, but the performance of the model significantly drops
(61% F-measure) when identifying fine-grained event-related information. These results
are competitive with more traditional Machine Learning models, such as SVM.
Acknowledgment: This work has received support from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 687847 (COMRADES).
References
1. Atefeh, F., Khreich, W.: A survey of techniques for event detection in Twitter. Computational
Intelligence 31(1), 132–164 (2015)
2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align
and translate. arXiv preprint arXiv:1409.0473 (2014)
3. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model.
Journal of machine learning research 3(Feb), 1137–1155 (2003)
4. Bercovici, J.: Why Time magazine used Instagram to cover Hurricane Sandy. Retrieved from
http://www.forbes.com/sites/jeffbercovici/2012/11/01/why-time-magazine-used-instagram-
to-cover-hurricane-sandy (2012)
5. Campanella, T.J.: Urban resilience and the recovery of New Orleans. Journal of the American
Planning Association 72(2), 141–146 (2006)
6. Chen, Y., Xu, L., Liu, K., Zeng, D., Zhao, J.: Event extraction via dynamic multi-pooling
convolutional neural networks. In: ACL (1). pp. 167–176 (2015)
7. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio,
Y.: Learning phrase representations using RNN encoder-decoder for statistical machine trans-
lation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language
Processing (EMNLP 2014) (2014)
8. Dos Santos, C.N., Gatti, M.: Deep convolutional neural networks for sentiment analysis of
short texts. In: COLING. pp. 69–78 (2014)
9. Feng, X., Huang, L., Tang, D., Qin, B., Ji, H., Liu, T.: A language-independent neural network
for event detection. In: The 54th Annual Meeting of the Association for Computational
Linguistics. p. 66 (2016)
10. Gao, H., Barbier, G., Goolsby, R.: Harnessing the crowdsourcing power of social media for
disaster relief. IEEE Intelligent Systems 26(3), 10–14 (2011)
11. Ghaeini, R., Fern, X.Z., Huang, L., Tadepalli, P.: Event nugget detection with forward-
backward recurrent neural networks. In: The 54th Annual Meeting of the Association for
Computational Linguistics. p. 369 (2016)
12. Graves, A.: Supervised sequence labelling. In: Supervised Sequence Labelling with Recurrent
Neural Networks, pp. 5–13. Springer (2012)
13. Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the
2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014)
(2014)
14. Kingma, D., Ba, J.: Adam: A method for stochastic optimization. In: Proceedings of the 3rd
International Conference on Learning Representations (ICLR) (2014)
15. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document
recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998)
16. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
vector space. arXiv preprint arXiv:1301.3781 (2013)
17. Nguyen, T.H., Grishman, R.: Event detection and domain adaptation with convolutional neural
networks. In: ACL (2). pp. 365–371 (2015)
18. Olteanu, A., Castillo, C., Diaz, F., Vieweg, S.: CrisisLex: A lexicon for collecting and filtering
microblogged communications in crises. In: ICWSM (2014)
19. Olteanu, A., Vieweg, S., Castillo, C.: What to expect when the unexpected happens: Social
media communications across crises. In: Proceedings of the 18th ACM Conference on
Computer Supported Cooperative Work & Social Computing. pp. 994–1009. ACM (2015)
20. Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In:
Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014)
21. Qu, Y., Huang, C., Zhang, P., Zhang, J.: Microblogging after a major disaster in China: a
case study of the 2010 Yushu earthquake. In: Proceedings of the ACM 2011 conference on
Computer supported cooperative work. pp. 25–34. ACM (2011)
22. Rizzo, G., Troncy, R.: NERD: Evaluating named entity recognition tools in the web of data.
In: Workshop on Web Scale Knowledge Extraction (WEKEX11). vol. 21 (2011)
23. Saif, H., He, Y., Alani, H.: Semantic sentiment analysis of twitter. In: Proc. 11th Int. Semantic
Web Conf. (ISWC). Boston, MA (2012)
24. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Transactions on
Signal Processing 45(11), 2673–2681 (1997)
25. Sinnappan, S., Farrell, C., Stewart, E.: Priceless tweets! A study on Twitter messages posted
during crisis: Black Saturday. ACIS 2010 Proceedings 39 (2010)
26. Starbird, K., Palen, L., Hughes, A.L., Vieweg, S.: Chatter on the red: what hazards threat
reveals about the social life of microblogged information. In: Proceedings of the 2010 ACM
conference on Computer supported cooperative work. pp. 241–250. ACM (2010)
27. Tang, D., Qin, B., Liu, T.: Document modeling with gated recurrent neural network for
sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in
Natural Language Processing (EMNLP 2015). pp. 1422–1432 (2015)
28. Thomson, R., Ito, N., Suda, H., Lin, F., Liu, Y., Hayasaka, R., Isochi, R., Wang, Z.: Trusting
tweets: The Fukushima disaster and information source credibility on twitter. In: Proceedings
of the 9th International ISCRAM Conference. pp. 1–10 (2012)
29. Vieweg, S., Hughes, A.L., Starbird, K., Palen, L.: Microblogging during two natural hazards
events: what twitter may contribute to situational awareness. In: Proceedings of the SIGCHI
conference on human factors in computing systems. pp. 1079–1088. ACM (2010)
30. Walker, C., Strassel, S., Medero, J., Maeda, K.: ACE 2005 multilingual training corpus.
Linguistic Data Consortium, Philadelphia 57 (2006)
31. Zeng, Y., Yang, H., Feng, Y., Wang, Z., Zhao, D.: A convolution BiLSTM neural network
model for Chinese event extraction. In: International Conference on Computer Processing of
Oriental Languages. pp. 275–287. Springer (2016)