Introduction to IoT Data Stream Mining

Jacob Montiel & Albert Bifet

Paris, 20 November 2019


Who are We

- Albert Bifet
  - Professor at Télécom Paris
  - Data stream mining algorithms and systems
  - MOA: Massive Online Analytics
  - Apache SAMOA: Scalable Advanced Massive Online Analytics
- Jesse Read
  - Professor at École Polytechnique
  - MultiLabel Learning, Data stream mining and Deep Learning
  - MEKA: Multilabel Learning
  - MOA: Massive Online Analytics
Who are We

- Heitor Gomes
  - Senior Research Fellow at the University of Waikato
  - Big Data and Data Stream Mining
  - StreamDM: mining big data streams using Spark Streaming
  - MOA: Massive Online Analytics
- Jacob Montiel
  - Research Fellow at the University of Waikato
  - Machine Learning for Evolving Data Streams, OSS
  - scikit-multiflow: multi-output/multi-label and stream data
IoT Data Stream Mining

Outline
1. Introduction
2. Open Source Tools
3. Concept Drift
4. Classification
5. Ensemble Methods
6. Clustering
7. Streaming for Time Series
8. Classification in Multi-output Data Streams
9. Stream Algorithmics
IoT Data Stream Mining

Assessment
10% Lab Assignments
30% Project
60% Test

Classes
20/11, 27/11, 11/12, 18/12 Wednesdays at 9:00

Lab Sessions: 4/12 and 8/01

Important Dates
Project Presentation: January 15
Motivation

Source: IDC’s Digital Universe Study (EMC), June 2011

Data is growing
Motivation

Memory unit         Size (bytes)   Binary size (bytes)

kilobyte (kB/KB)    10^3           2^10
megabyte (MB)       10^6           2^20
gigabyte (GB)       10^9           2^30
terabyte (TB)       10^12          2^40
petabyte (PB)       10^15          2^50
exabyte (EB)        10^18          2^60
zettabyte (ZB)      10^21          2^70
yottabyte (YB)      10^24          2^80

Data is growing
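
The two size columns in the memory-unit table drift apart as the units grow. The following is a minimal sketch of that arithmetic; the helper names `unit_sizes` and `relative_gap` are illustrative and do not come from the slides.

```python
# Decimal vs. binary size of each memory unit listed in the table.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def unit_sizes(index):
    """Return (decimal_bytes, binary_bytes) for the unit at 0-based index."""
    k = index + 1
    return 10 ** (3 * k), 2 ** (10 * k)


def relative_gap(index):
    """Fraction by which the binary size exceeds the decimal size."""
    decimal, binary = unit_sizes(index)
    return (binary - decimal) / decimal


for i, name in enumerate(UNITS):
    exponent10, exponent2 = 3 * (i + 1), 10 * (i + 1)
    print(f"{name:<10} 10^{exponent10:<3} 2^{exponent2:<3} "
          f"gap {relative_gap(i):.1%}")
```

The gap is 2.4% for the kilobyte and grows with every row, so at large scales it matters which convention a reported size uses.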
Streaming Data

Big Data & Real Time


Big Data

McKinsey Global Institute (MGI) Report on Big Data, 2011.

Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.
Methodology

Sampling and distributed systems
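
As one possible illustration of sampling on a stream, the sketch below uses reservoir sampling (Algorithm R), which keeps a uniform random sample of fixed size without storing the stream. The slide does not name a specific method, so the choice of technique and the function name `reservoir_sample` are assumptions made here for illustration.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Illustrative sketch of Algorithm R; memory use is O(k) regardless of
    how many items the stream produces.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # The n-th item replaces a random slot with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=10, seed=42)
print(sample)
```

Each item is seen once and then either kept or discarded, which matches the single-pass setting of data streams.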


Methodology

Paolo Boldi

Big Data does not need big machines, it needs big intelligence
Real time analytics

We want to analyze what is happening now.


Time and Memory

Number 8 Wire Mentality

Time and memory are the resource dimensions of the process.
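
To make the time and memory constraint concrete, here is a minimal sketch of a summary that processes each item once, in constant time and constant memory. It uses Welford's online update for the mean and variance; the class name `RunningStats` is hypothetical and the example is not taken from the slides.

```python
class RunningStats:
    """Running mean and sample variance over a stream (Welford's method).

    Illustrative sketch: only three numbers are stored, no matter how
    many items arrive.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Incorporate one new observation in O(1) time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 for fewer than two items."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```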
Algorithms

Classification, Regression, Clustering, Frequent Pattern Mining.
Applications

- sensor data: industry, cities
- telecom data
- social networks: Twitter, Facebook, Yahoo
- marketing: sales business

Data may come from: humans, sensors, or machines.
Data Streams

Big Data & Real Time