SlideShare a Scribd company logo
JoeOlson
DataArchitect
SmartChicagoCollaborative
27Mar2014
joe.olson@cct.org
(All the cool buzzwords in one place!)
Social Media,
Cloud Computing,
Machine Learning,
Open Source, and
Big Data Analytics
Social Media - Twitter
• What can we learn from Twitter?
• 400 million tweets per day
source: http://articles.washingtonpost.com/2013-03-21/business/37889387_1_tweets-jack-dorsey-twitter
• 218 million users
source: http://techcrunch.com/2013/10/03/bweeting/
• Excellent source of sentiment
• Excellent source of big data
• Prototyping
• Modeling natural language
• Resume padding
Social Media - Twitter
• How do we get at the data?
• Twitter provided APIs:
• https://dev.twitter.com/docs
• Streaming
• Set up a real time data stream (json) based on keywords
• REST (v1.1)
• Make REST requests, and get results
• Possible parameters:
• Geospatial bounding box
• By time
• By user, hashtag, retweets etc
• Fire hose
• Big $$$. Big data
Social Media - Twitter
• Information & Obstacles
• Who
• What
• At best: Plain English (!)
• Worse: (Spanish or Arabic or Portuguese...)
• Worst: “Textspeak” symbols :-0, UTF8 chars, etc.
• Absolute Worst: combination of all of them
• Where
• 1-2% with latitude / longitude
• Geocode
• When
Social Media - Twitter
JSON Tweet example:
• "created_at":"Sun Oct 27 13:57:40 +0000 2013",
• "id":394462908261740540,
• "text":"Flu :(",
• "source":"<a href="http://twitmania.com" rel="nofollow">TwitMania™</a>",
• "user":{
• "id":594141140,
• "name":"Yultiana Farida N",
• "screen_name":"yultiana",
• "followers_count":231,
• "friends_count":252,
• "created_at":"Tue May 29 23:58:25 +0000 2012",
• "statuses_count":2397,
• },
• "geo":null,
Cloud Computing
• What does cloud computing bring to the table?
• Amazon’s EC2:
• Commoditized hardware
• Low cost
• Only charged for resources you use
• No long term commitments
• Scalable
• "Throwaway" mentality
**IF** you play by their rules!
Cloud Computing – AWS
• Tools
• Virtual Machines
• # of Processors, RAM, OS, disk capacity and I/O – all configurable
• Price range: $.02/hr - $4.60/hr
• Licensed OSes cost 50% more than Linux OSes
• Archive Storage
• S3 / Glacier
• Work Queues
• SQS
• Data Stores
• Dynamo (key value store), Red Shift (analysis store)
• Virtual Networking
• Routers, VPN gateways, access control lists, etc
• APIs
• Command line
• HTTPS REST
• Native programming languages (Python, bash, PHP, Java etc.)
Ideal for rapid prototyping / proof of concepts
Cloud Computing – AWS
• APIs
• Basic
• Start an instance (and start billing)
• Stop an instance (stop billing)
• Insert item into queue
• Remove item from queue
• Write to backup store
• Ultra advanced
• Reserved vs. on demand vs. spot instances
• Price can drop as much as 80% due to market demand
• Instance can disappear at any time
Big Data Analytics
• Can we skirt the “big data” problem by distilling the tweets
down from millions and millions “noise” tweets into a more
desirable data set?
• Enrich in real time, rather than on archived data, and avoid the
overhead of map/reduce?
• Possible Enrichment of raw data:
• Classification – separate tweets into “relevant” and “irrelevant”
• Geocoding – improve on the 1-2% ?
• Aggregation –> map reduce
• Mapping -> Reduce Function -> Output
• AWS – Elastic Map Reduce
• Clustering
Machine Learning
• Classification: relevant, or irrelevant?
• Human trained model
• Once model is established, bounce new data off it for
classification
• Validation of model
• Accuracy =
(Total # of classifications – Mismatches between machine / human)
Total # of classifications
• Crowdsourcing – AWS Mechanical Turk
• Improve model by feeding disagreements back into the model
• Our best text classification model to date: low 90%
Open Source
• Friendly to the commoditized computing paradigm
• Don’t have to worry about licensing issues
• Contributes to the “throwaway” discipline
• Don’t have to re-invent the wheel (collaboration)
• Solutions applicable to all parts of the architecture
• Acquire data: Node.js – non blocking
• Analyze data: R – statistical engine
• Store and query data: MongoDB (document store) or Riak (key-
value database)
Architecture
• We know Twitter is providing a mountain of data from all parts
of the world
• We know Amazon is providing a framework of low cost, on-
demand, no commitment computing
• Open source is providing a rich tool set
• Goals:
• Architect with cost in mind!
• Enrichment - Real time and after-the-fact enrichment (open data)
• Scalable
• Decoupled
• Service based
• Rapid development
• Prove the concepts
Architecture - Acquire
• Acquire the data from Twitter
• If classifying in real time:
• Store then classify?
• Classify then store?
• Tools
• Twitter streaming API
• Keywords
• Node.js
• Several different packages to interface with Twitter APIs
• Amazon
• EC2
• SQS (?) Extremely useful, but drives the cost up
Architecture - Analyze
• Classification interface
• Service based – HTTP REST
• Push or pull?
• Push – classifiers listen on port 80
• Pull – classifier starts pulling from an established work queue
• Both highly scalable and flexible with respect to cost.
• Stateless
• R
• Human trained machine learning packages available
• Cloud friendly – no licenses
• Automatable – from install, configuration, execution
Architecture - Store
• Store JSON as an object (document store) or normalize (relational
database)?
• Relational databases
• disk I/O intensive – not cloud friendly
• allow complex indexing
• Easy to get a business intelligence front end on them
• Requires a schema / ETL
• Key-value document stores
• Designed to be scalable – doesn’t need fast disks
• Indexing is not nearly as flexible as RDBMS
• More difficult to front a UI – no “drag and drop” tools
• No schema / ETL needed.
• Not as mature
• MongoDB / Riak
Architecture – Presentation
• Least need for cloud friendly scalability here?
• Options
• Licensed BI software – Tableau, Endeca, Jaspersoft, Pentaho
• Open source BI software – SpagoBI
• Roll your own - PHP, Ruby, Visual Basic, Javascript, etc
• Connect to an existing system instead?
Costs – Real Time Classification
• Number of tweets collected per day: 1,000,000 (comfortable - .25%)
• Machine used on EC2 to acquire (node.js): micro
• $.02/hr * 24 hrs = .48/day
• Machine used on EC2 to classify (R): small (x2)
• $.06/hr * 24 hrs = $1.44/day*2 = $2.88/day
• Machine used on EC2 to store (MongoDB): large
• $.24/hr * 24 hrs = $5.76 /day
• Machine used on EC2 for GUI (Apache): small
• $.06/hr * 24 = $1.44
•
$0.48+$2.88+$5.76+$1.44 = $10.56 / 1,000,000 =
.00001056 cents/tweet
Can add more zeros if you relax real-time classification (spot instances)
Costs - Archive
• Size of average tweet: 2.5 KB
• Cost to archive:
• s3 : .095 GB/month
• 0.0000002 per tweet per month
• Glacier: .01 GB/month
• 0.00000002 per tweet per month
• Compression will add even more zeros, but will require more
computing power, and mean more latency for post collection
data analysis. Can be automated.
Use Cases
• Foodborne Chicago (http://foodborne.smartchicagoapps.org/)
• Public-private partnership with City of Chicago Dept. of Public Health
and Smart Chicago Collaborative
• Reach out to city residents on Twitter tweeting about food poisoning
symptoms, in an attempt to get them to log information in the City’s
311 database (via the Open311 API)
• Once in the 311 database, it follows established City workflows, and
becomes actionable
• Numbers (1 year):
• 2,390 tweets classified as related to food poisoning
• 282 tweets responded to
• 205 reports submitted
• 145 inspections
• Real time classification examples:
• “Ugh! I got food poisoning from the McDonalds’s on Halstead!”
http://184.73.52.31/cgi-bin/R/fp_classifier?text=Ugh!%20I%20got%20food%20poisoning%20from%20McDonalds%20on%20Halstead
• “U of Chicago releases a new paper on the effects of food poisoning”
http://184.73.52.31/cgi-bin/R/fp_classifier?text=U%20of%20Chicago%20releases%20new%20paper%20on%20the%20effects%20of%20food%20poisoning
• Video:http://www.youtube.com/watch?v=RNf9XQ_25Yw&feature=youtu.be
Use Cases
• Disease Tracker
• Large scale attempt to track disease occurrences in the United
States.
• Sponsored by the Dept. of HHS
• Approximately 1 million tweets a day (cold, flu) classified in real
time
• EC2 scalable instances
• Geolocation
• Cost to run for 6 months: $850
Future Directions
• Turnkey service
• Can all this functionality be abstracted down to a pushbutton
service?
• Open data
• Can you advertise the data collected, how you enriched it, and
allow others to come along an enrich it as well?
• General purpose bridge between Twitter and issue tracking
databases
• Big industry problem
Github Sources
• Tweet Collector
• https://github.com/smartchicago/TweetCollector
• Classifier Code
• https://github.com/corynissen/foodborne_classifier
Ad

More Related Content

What's hot (20)

Comparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform TechnologiesComparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform Technologies
Jen Stirrup
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server Pro
Lynn Langit
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
ALTER WAY
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
Treasure Data, Inc.
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
Mars Lan
 
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
DataStax
 
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudOTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
Mark Rittman
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
Relational to Big Graph
Relational to Big GraphRelational to Big Graph
Relational to Big Graph
Neo4j
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
MongoDB
 
Clickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache SparkClickstream & Social Media Analysis using Apache Spark
Clickstream & Social Media Analysis using Apache Spark
TUMRA | Big Data Science - Gain a competitive advantage through Big Data & Data Science
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
Treasure Data, Inc.
 
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB
 
Yahoo's Next Generation User Profile Platform
Yahoo's Next Generation User Profile PlatformYahoo's Next Generation User Profile Platform
Yahoo's Next Generation User Profile Platform
DataWorks Summit/Hadoop Summit
 
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - Dataiku
Dataiku
 
Comparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform TechnologiesComparing Microsoft Big Data Platform Technologies
Comparing Microsoft Big Data Platform Technologies
Jen Stirrup
 
NoSQL for the SQL Server Pro
NoSQL for the SQL Server ProNoSQL for the SQL Server Pro
NoSQL for the SQL Server Pro
Lynn Langit
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
Séminaire Big Data Alter Way - Elasticsearch - octobre 2014
ALTER WAY
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
Treasure Data, Inc.
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
Mars Lan
 
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
Spectator to Participant. Contributing to Cassandra (Patrick McFadin, DataSta...
DataStax
 
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle CloudOTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
OTN EMEA Tour 2016 : Deploying Full BI Platforms to Oracle Cloud
Mark Rittman
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
Relational to Big Graph
Relational to Big GraphRelational to Big Graph
Relational to Big Graph
Neo4j
 
MongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big DataMongoDB & Hadoop - Understanding Your Big Data
MongoDB & Hadoop - Understanding Your Big Data
MongoDB
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
Treasure Data, Inc.
 
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...
MongoDB
 
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy Online Course - Gain Competitive Advantage Using Microsoft Azure Data...
Experfy
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
OWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - DataikuOWF 2014 - Take back control of your Web tracking - Dataiku
OWF 2014 - Take back control of your Web tracking - Dataiku
Dataiku
 

Viewers also liked (9)

Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Kevin O'Shea
 
Today's social media and cloud computing in business environment
Today's social media and cloud computing in business environmentToday's social media and cloud computing in business environment
Today's social media and cloud computing in business environment
Novi Research Center
 
Understanding meteor
Understanding meteorUnderstanding meteor
Understanding meteor
M A Hossain Tonu
 
Big data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offeringsBig data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offerings
Vijayananda Mohire
 
Social Media, Cloud Computing and architecture
Social Media, Cloud Computing and architectureSocial Media, Cloud Computing and architecture
Social Media, Cloud Computing and architecture
Rick Mans
 
The Cloud and Mobile - Guppers
The Cloud and Mobile - GuppersThe Cloud and Mobile - Guppers
The Cloud and Mobile - Guppers
Andy Harjanto
 
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
Gruter
 
Understanding Microservices
Understanding Microservices Understanding Microservices
Understanding Microservices
M A Hossain Tonu
 
Big data and Social Media Analytics
Big data and Social Media AnalyticsBig data and Social Media Analytics
Big data and Social Media Analytics
Simplify360
 
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Avoiding Technical Fouls:Selected Ethical Issues in Advertising, Social Media...
Kevin O'Shea
 
Today's social media and cloud computing in business environment
Today's social media and cloud computing in business environmentToday's social media and cloud computing in business environment
Today's social media and cloud computing in business environment
Novi Research Center
 
Big data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offeringsBig data-Cloud-Mobile service offerings
Big data-Cloud-Mobile service offerings
Vijayananda Mohire
 
Social Media, Cloud Computing and architecture
Social Media, Cloud Computing and architectureSocial Media, Cloud Computing and architecture
Social Media, Cloud Computing and architecture
Rick Mans
 
The Cloud and Mobile - Guppers
The Cloud and Mobile - GuppersThe Cloud and Mobile - Guppers
The Cloud and Mobile - Guppers
Andy Harjanto
 
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
GRUTER가 들려주는 Big Data Platform 구축 전략과 적용 사례: GRUTER의 빅데이터 플랫폼 및 전략 소개
Gruter
 
Understanding Microservices
Understanding Microservices Understanding Microservices
Understanding Microservices
M A Hossain Tonu
 
Big data and Social Media Analytics
Big data and Social Media AnalyticsBig data and Social Media Analytics
Big data and Social Media Analytics
Simplify360
 
Ad

Similar to Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data Analytics (Chicago Summit) (20)

Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04
marc_harrison
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
Caserta
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and Maintenance
Mercedes Coyle
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
Jesus Rodriguez
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
Travis Oliphant
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
Claudio Pontili
 
Big problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionBig problems Big data, simple AWS solution
Big problems Big data, simple AWS solution
Jean-Claude Sotto
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
John Adams
 
Unushs susus susujss. Ssuusussjjsjsit 4.pptx
Unushs susus susujss. Ssuusussjjsjsit 4.pptxUnushs susus susujss. Ssuusussjjsjsit 4.pptx
Unushs susus susujss. Ssuusussjjsjsit 4.pptx
AshishHiwale1
 
Lecture1
Lecture1Lecture1
Lecture1
Manish Singh
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
IdontKnow66967
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
AIMLSEMINARS
 
Big data
Big dataBig data
Big data
roysonli
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
AbhishekKumarAgrahar2
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
Tung Nguyen
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
Miguel Pastor
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
Sri Ambati
 
Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04Elasticsearch meetup final_2014_04
Elasticsearch meetup final_2014_04
marc_harrison
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
Caserta
 
Data Care, Feeding, and Maintenance
Data Care, Feeding, and MaintenanceData Care, Feeding, and Maintenance
Data Care, Feeding, and Maintenance
Mercedes Coyle
 
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Big Data Open Source Tools and Trends: Enable Real-Time Business Intelligence...
Perficient, Inc.
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
Jesus Rodriguez
 
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ...
MongoDB
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
Travis Oliphant
 
Big problems Big Data, simple solutions
Big problems Big Data, simple solutionsBig problems Big Data, simple solutions
Big problems Big Data, simple solutions
Claudio Pontili
 
Big problems Big data, simple AWS solution
Big problems Big data, simple AWS solutionBig problems Big data, simple AWS solution
Big problems Big data, simple AWS solution
Jean-Claude Sotto
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
John Adams
 
Unushs susus susujss. Ssuusussjjsjsit 4.pptx
Unushs susus susujss. Ssuusussjjsjsit 4.pptxUnushs susus susujss. Ssuusussjjsjsit 4.pptx
Unushs susus susujss. Ssuusussjjsjsit 4.pptx
AshishHiwale1
 
IARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptxIARE_BDBA_ PPT_0.pptx
IARE_BDBA_ PPT_0.pptx
AIMLSEMINARS
 
Lecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in detailsLecture1 BIG DATA and Types of data in details
Lecture1 BIG DATA and Types of data in details
AbhishekKumarAgrahar2
 
An overview of modern scalable web development
An overview of modern scalable web developmentAn overview of modern scalable web development
An overview of modern scalable web development
Tung Nguyen
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
Miguel Pastor
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
Sri Ambati
 
Ad

More from Open Analytics (20)

Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)
Open Analytics
 
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Open Analytics
 
CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)
Open Analytics
 
An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)
Open Analytics
 
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
Open Analytics
 
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Open Analytics
 
Using Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationUsing Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & Personalization
Open Analytics
 
M&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsM&A Trends in Telco Analytics
M&A Trends in Telco Analytics
Open Analytics
 
Competing in the Digital Economy
Competing in the Digital EconomyCompeting in the Digital Economy
Competing in the Digital Economy
Open Analytics
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)
Open Analytics
 
Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)
Open Analytics
 
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
Open Analytics
 
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Open Analytics
 
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Open Analytics
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)
Open Analytics
 
Easybib Open Analytics NYC
Easybib Open Analytics NYCEasybib Open Analytics NYC
Easybib Open Analytics NYC
Open Analytics
 
MarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupMarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics Meetup
Open Analytics
 
The caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupThe caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetup
Open Analytics
 
Verifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalVerifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_final
Open Analytics
 
HDScores OA DC Pitch
HDScores OA DC PitchHDScores OA DC Pitch
HDScores OA DC Pitch
Open Analytics
 
Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)
Open Analytics
 
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Open Analytics
 
CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)
Open Analytics
 
An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)
Open Analytics
 
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
Open Analytics
 
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Open Analytics
 
Using Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationUsing Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & Personalization
Open Analytics
 
M&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsM&A Trends in Telco Analytics
M&A Trends in Telco Analytics
Open Analytics
 
Competing in the Digital Economy
Competing in the Digital EconomyCompeting in the Digital Economy
Competing in the Digital Economy
Open Analytics
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)
Open Analytics
 
Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)
Open Analytics
 
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
Open Analytics
 
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Open Analytics
 
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Open Analytics
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)
Open Analytics
 
Easybib Open Analytics NYC
Easybib Open Analytics NYCEasybib Open Analytics NYC
Easybib Open Analytics NYC
Open Analytics
 
MarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupMarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics Meetup
Open Analytics
 
The caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupThe caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetup
Open Analytics
 
Verifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalVerifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_final
Open Analytics
 

Recently uploaded (20)

Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 

Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data Analytics (Chicago Summit)

  • 1. JoeOlson DataArchitect SmartChicagoCollaborative 27Mar2014 [email protected] (All the cool buzzwords in one place!) Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data Analytics
  • 2. Social Media - Twitter • What can we learn from Twitter? • 400 million tweets per day source: http://articles.washingtonpost.com/2013-03-21/business/37889387_1_tweets-jack-dorsey-twitter • 218 million users source: http://techcrunch.com/2013/10/03/bweeting/ • Excellent source of sentiment • Excellent source of big data • Prototyping • Modeling natural language • Resume padding
  • 3. Social Media - Twitter • How do we get at the data? • Twitter provided APIs: • https://dev.twitter.com/docs • Streaming • Set up a real time data stream (json) based on keywords • REST (v1.1) • Make REST requests, and get results • Possible parameters: • Geospatial bounding box • By time • By user, hashtag, retweets etc • Fire hose • Big $$$. Big data
  • 4. Social Media - Twitter • Information & Obstacles • Who • What • At best: Plain English (!) • Worse: (Spanish or Arabic or Portuguese...) • Worst: “Textspeak” symbols :-0, UTF8 chars, etc. • Absolute Worst: combination of all of them • Where • 1-2% with latitude / longitude • Geocode • When
  • 5. Social Media - Twitter JSON Tweet example: • "created_at":"Sun Oct 27 13:57:40 +0000 2013", • "id":394462908261740540, • "text":"Flu :(", • "source":"<a href="http://twitmania.com" rel="nofollow">TwitMania™</a>", • "user":{ • "id":594141140, • "name":"Yultiana Farida N", • "screen_name":"yultiana", • "followers_count":231, • "friends_count":252, • "created_at":"Tue May 29 23:58:25 +0000 2012", • "statuses_count":2397, • }, • "geo":null,
  • 6. Cloud Computing • What does cloud computing bring to the table? • Amazon’s EC2: • Commoditized hardware • Low cost • Only charged for resources you use • No long term commitments • Scalable • "Throwaway" mentality **IF** you play by their rules!
  • 7. Cloud Computing – AWS • Tools • Virtual Machines • # of Processors, RAM, OS, disk capacity and I/O – all configurable • Price range: $.02/hr - $4.60/hr • Licensed OSes cost 50% more than Linux OSes • Archive Storage • S3 / Glacier • Work Queues • SQS • Data Stores • Dynamo (key value store), Red Shift (analysis store) • Virtual Networking • Routers, VPN gateways, access control lists, etc • APIs • Command line • HTTPS REST • Native programming languages (Python, bash, PHP, Java etc.) Ideal for rapid prototyping / proof of concepts
  • 8. Cloud Computing – AWS • APIs • Basic • Start an instance (and start billing) • Stop an instance (stop billing) • Insert item into queue • Remove item from queue • Write to backup store • Ultra advanced • Reserved vs. on demand vs. spot instances • Price can drop as much as 80% due to market demand • Instance can disappear at any time
  • 9. Big Data Analytics • Can we skirt the “big data” problem by distilling the tweets down from millions and millions “noise” tweets into a more desirable data set? • Enrich in real time, rather than on archived data, and avoid the overhead of map/reduce? • Possible Enrichment of raw data: • Classification – separate tweets into “relevant” and “irrelevant” • Geocoding – improve on the 1-2% ? • Aggregation –> map reduce • Mapping -> Reduce Function -> Output • AWS – Elastic Map Reduce • Clustering
  • 10. Machine Learning • Classification: relevant, or irrelevant? • Human trained model • Once model is established, bounce new data off it for classification • Validation of model • Accuracy = (Total # of classifications – Mismatches between machine / human) Total # of classifications • Crowdsourcing – AWS Mechanical Turk • Improve model by feeding disagreements back into the model • Our best text classification model to date: low 90%
  • 11. Open Source • Friendly to the commoditized computing paradigm • Don’t have to worry about licensing issues • Contributes to the “throwaway” discipline • Don’t have to re-invent the wheel (collaboration) • Solutions applicable to all parts of the architecture • Acquire data: Node.js – non blocking • Analyze data: R – statistical engine • Store and query data: MongoDB (document store) or Riak (key- value database)
  • 12. Architecture • We know Twitter is providing a mountain of data from all parts of the world • We know Amazon is providing a framework of low cost, on- demand, no commitment computing • Open source is providing a rich tool set • Goals: • Architect with cost in mind! • Enrichment - Real time and after-the-fact enrichment (open data) • Scalable • Decoupled • Service based • Rapid development • Prove the concepts
  • 13. Architecture - Acquire • Acquire the data from Twitter • If classifying in real time: • Store then classify? • Classify then store? • Tools • Twitter streaming API • Keywords • Node.js • Several different packages to interface with Twitter APIs • Amazon • EC2 • SQS (?) Extremely useful, but drives the cost up
  • 14. Architecture - Analyze • Classification interface • Service based – HTTP REST • Push or pull? • Push – classifiers listen on port 80 • Pull – classifier starts pulling from an established work queue • Both highly scalable and flexible with respect to cost. • Stateless • R • Human trained machine learning packages available • Cloud friendly – no licenses • Automatable – from install, configuration, execution
  • 15. Architecture - Store • Store JSON as an object (document store) or normalize (relational database)? • Relational databases • disk I/O intensive – not cloud friendly • allow complex indexing • Easy to get a business intelligence front end on them • Requires a schema / ETL • Key-value document stores • Designed to be scalable – doesn’t need fast disks • Indexing is not nearly as flexible as RDBMS • More difficult to front a UI – no “drag and drop” tools • No schema / ETL needed. • Not as mature • MongoDB / Riak
  • 16. Architecture – Presentation • Least need for cloud friendly scalability here? • Options • Licensed BI software – Tableau, Endeca, Jaspersoft, Pentaho • Open source BI software – SpagoBI • Roll your own - PHP, Ruby, Visual Basic, Javascript, etc • Connect to an existing system instead?
  • 17. Costs – Real Time Classification • Number of tweets collected per day: 1,000,000 (comfortable - .25%) • Machine used on EC2 to acquire (node.js): micro • $.02/hr * 24 hrs = .48/day • Machine used on EC2 to classify (R): small (x2) • $.06/hr * 24 hrs = $1.44/day*2 = $2.88/day • Machine used on EC2 to store (MongoDB): large • $.24/hr * 24 hrs = $5.76 /day • Machine used on EC2 for GUI (Apache): small • $.06/hr * 24 = $1.44 • $0.48+$2.88+$5.76+$1.44 = $10.56 / 1,000,000 = .00001056 cents/tweet Can add more zeros if you relax real-time classification (spot instances)
  • 18. Costs - Archive • Size of average tweet: 2.5 KB • Cost to archive: • s3 : .095 GB/month • 0.0000002 per tweet per month • Glacier: .01 GB/month • 0.00000002 per tweet per month • Compression will add even more zeros, but will require more computing power, and mean more latency for post collection data analysis. Can be automated.
  • 19. Use Cases • Foodborne Chicago (http://foodborne.smartchicagoapps.org/) • Public-private partnership with City of Chicago Dept. of Public Health and Smart Chicago Collaborative • Reach out to city residents on Twitter tweeting about food poisoning symptoms, in an attempt to get them to log information in the City’s 311 database (via the Open311 API) • Once in the 311 database, it follows established City workflows, and becomes actionable • Numbers (1 year): • 2,390 tweets classified as related to food poisoning • 282 tweets responded to • 205 reports submitted • 145 inspections • Real time classification examples: • “Ugh! I got food poisoning from the McDonalds’s on Halstead!” http://184.73.52.31/cgi-bin/R/fp_classifier?text=Ugh!%20I%20got%20food%20poisoning%20from%20McDonalds%20on%20Halstead • “U of Chicago releases a new paper on the effects of food poisoning” http://184.73.52.31/cgi-bin/R/fp_classifier?text=U%20of%20Chicago%20releases%20new%20paper%20on%20the%20effects%20of%20food%20poisoning • Video:http://www.youtube.com/watch?v=RNf9XQ_25Yw&feature=youtu.be
  • 20. Use Cases • Disease Tracker • Large scale attempt to track disease occurrences in the United States. • Sponsored by the Dept. of HHS • Approximately 1 million tweets a day (cold, flu) classified in real time • EC2 scalable instances • Geolocation • Cost to run for 6 months: $850
  • 21. Future Directions • Turnkey service • Can all this functionality be abstracted down to a pushbutton service? • Open data • Can you advertise the data collected, how you enriched it, and allow others to come along an enrich it as well? • General purpose bridge between Twitter and issue tracking databases • Big industry problem
  • 22. Github Sources • Tweet Collector • https://github.com/smartchicago/TweetCollector • Classifier Code • https://github.com/corynissen/foodborne_classifier