02_data-ingestion

The document discusses data ingestion, focusing on time series data, which can be categorized into regular metrics and irregular events. It outlines the data model, including measurements, tags, fields, and timestamps, and explains how data is ingested through various views (conceptual, logical, and physical). Additionally, it introduces Telegraf as a data collection agent and provides examples of line protocol for data representation.

Data Ingestion

Emanuele Della Valle


Prof. @ Politecnico di Milano
Founder & Partner @ Quantia Consulting
Marco Balduini
Founder & CEO @ Quantia Consulting
Riccardo Tommasini
Prof. @ INSA Lyon (France)
© 2021 InfluxData. All rights reserved.
Introduction

Data Lifecycle

What’s a time series?

Let’s start with some examples

• Weather conditions
• Stock exchange
• Healthcare
• Cluster monitoring

Now some different types of time series

• Logs
• Traces


What’s the difference?
Both of them are time series, but …

• Metrics (regular): we monitor the phenomena
VS.
• Events (irregular): the phenomena happen and we observe them


Metrics
• Regular time series
• Measurements gathered at regular time intervals

Events
• Irregular time series
• Measurements observed at irregular time intervals


Summarization of Events
turns events into metrics, for example

• Summarizing the average trade price of Apple stock every 10 minutes over the course of a day
• Summarizing the average response time for requests in an application over 1-minute intervals
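A minimal sketch of how irregular events can be summarized into a regular metric, assuming events arrive as (timestamp in seconds, value) pairs. The function name `summarize` and the sample trades are hypothetical and only illustrate fixed-window averaging.

```python
from collections import defaultdict

def summarize(events, window_s):
    """Average event values per fixed time window.

    events: iterable of (timestamp_in_seconds, value) pairs, in any order.
    Returns a list of (window_start, mean_value), sorted by window start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # window_start -> [total, count]
    for ts, value in events:
        window_start = ts - ts % window_s  # align to the window boundary
        sums[window_start][0] += value
        sums[window_start][1] += 1
    return [(start, total / count) for start, (total, count) in sorted(sums.items())]

# Hypothetical trades: (seconds since market open, price)
trades = [(5, 170.0), (130, 171.0), (599, 172.0), (600, 180.0), (1150, 182.0)]

# One averaged value per 10-minute (600 s) window.
metrics = summarize(trades, window_s=600)
print(metrics)
```

The events fall at arbitrary times, while the resulting metric has exactly one value per window.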


Characteristics of time series
• All data is time-stamped
• Generated at
– regular (metric) and
– irregular (event) time intervals
• Huge volumes of data
• High variety of semi-structured data
• Real-time
• Time-sensitive


How is data ingested?

Conceptual vs. logical vs. physical views

[Src: https://www.oreilly.com/library/view/data-modeling-made/9781935504481/ ]

How is data ingested?
Conceptual View
Let’s start from the anatomy of a time-series line graph

• The type of measurement is the title of the line graph
• Timestamps are on the X-axis
• Data is on the Y-axis
• The legend distinguishes the three time series in the graph

Data Model
• Measurement
– The name of the measurement, used as a high-level grouping of data
• Tag set
– Other, lower-level grouping criteria for the data
• Field set
– The actual data
• Timestamp
– The time of the data (preferably the time at which the data was observed)
• Series
– Data points in time order, grouped by measurement and tags


How is data ingested?
Logical View

Data Model
• Measurement
– A name to group data at a high level
• Tag set
– A set of key-value pairs to group data at a low level (values are strings)
• Field set
– A set of key-value pairs to represent data (values are numbers, strings, or booleans)
• Timestamp
– Time of the data, with nanosecond precision
• Series
– A unique combination of measurement+tags
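The logical data model can be sketched as a small Python type. This is an illustration only, not an InfluxDB client API: the class `Point`, the method `series_key`, and the second timestamp are hypothetical. The series key follows the definition on the slide (measurement plus tag set).

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

FieldValue = Union[float, int, str, bool]

@dataclass(frozen=True)
class Point:
    measurement: str               # high-level grouping
    tags: Dict[str, str]           # low-level grouping, string values only
    fields: Dict[str, FieldValue]  # the actual data
    timestamp_ns: int              # nanosecond precision

    def series_key(self) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
        """Measurement plus the sorted tag set identifies the series."""
        return (self.measurement, tuple(sorted(self.tags.items())))

p1 = Point("obs", {"host": "ovenA", "region": "west"}, {"temp": 301.0}, 1492214400000000000)
# Same tags in a different order and a later (hypothetical) timestamp: same series.
p2 = Point("obs", {"region": "west", "host": "ovenA"}, {"temp": 305.0}, 1492214460000000000)
# A different tag value: a different series.
p3 = Point("obs", {"host": "ovenB", "region": "west"}, {"temp": 125.0}, 1492214400000000000)

print(p1.series_key() == p2.series_key(), p1.series_key() == p3.series_key())
```

Fields and timestamps do not take part in the key, so many points can share one series.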
Data model vs ingestion & storage
[Diagram labels: Buckets, Measurement (set of series), Series, Tag set, Field set, Timestamp]


How is data ingested?
Physical View

An example of Line Protocol

obs,host=ovenA,num=1,region=west temp=301,hum=23 1492214400000000000

• Measurement: obs
• Tags: host=ovenA,num=1,region=west
• whitespace
• Fields: temp=301,hum=23
• whitespace
• Timestamp: 1492214400000000000

Reference: https://v2.docs.influxdata.com/v2.0/reference/line-protocol/
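To make the anatomy concrete, here is a minimal parsing sketch for the example line. It is a simplification, not a full line protocol parser: it assumes no escaped spaces or commas, no quoted string fields, and numeric field values only. The function name `parse_line` is hypothetical.

```python
def parse_line(line):
    """Split one line-protocol line into measurement, tags, fields, timestamp.

    Simplification: assumes no escaping, no quoted strings, numeric fields only.
    """
    # The two whitespace separators split the line into three parts.
    head, field_part, ts = line.split(" ")
    # The first comma-separated item is the measurement; the rest are tags.
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_part.split(","):
        key, value = pair.split("=", 1)
        fields[key] = float(value)
    return measurement, tags, fields, int(ts)

measurement, tags, fields, ts = parse_line(
    "obs,host=ovenA,num=1,region=west temp=301,hum=23 1492214400000000000"
)
print(measurement, tags, fields, ts)
```

Note that the tag `num=1` stays a string, while the field values become numbers.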


Bucket physical view
• Columnar data stores
• Best for column selection and aggregation thanks to
– Disk + memory locality
– Cache locality

_time    _m    host    num   line   temp   humidity
1492…1   obs   ovenB   1     west   301    23
1492…0   obs   ovenA   1     west   125    75
…        …     …       …     …      …      …

_m = _measurement; host, num, and line are tags; temp and humidity are fields
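The benefit of a columnar layout can be sketched in plain Python with the two rows of the table. This only illustrates the idea (one contiguous sequence per column) and makes no claim about how InfluxDB lays out data on disk.

```python
# Row-oriented layout: one record per point.
rows = [
    {"host": "ovenB", "temp": 301, "humidity": 23},
    {"host": "ovenA", "temp": 125, "humidity": 75},
]

# Column-oriented layout: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Aggregating one column reads only that column's values,
# without touching host or humidity.
mean_temp = sum(columns["temp"]) / len(columns["temp"])
print(columns["temp"], mean_temp)
```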
So, the line protocol representations of two metrics are …

stock_price,ticker=A price=170 1465839830100400200

stock_price,ticker=AA price=42 1465839840100400200
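The two lines share a measurement but differ in their tag value, so they belong to two different series. A small sketch that groups lines by the part before the first whitespace (measurement plus tag set), assuming no escaped spaces:

```python
from collections import defaultdict

lines = [
    "stock_price,ticker=A price=170 1465839830100400200",
    "stock_price,ticker=AA price=42 1465839840100400200",
]

series = defaultdict(list)
for line in lines:
    # Measurement and tag set come before the first space (no escaping assumed).
    key, field_part, ts = line.split(" ")
    series[key].append((int(ts), field_part))

for key, points in sorted(series.items()):
    print(key, points)
```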


How is data automatically ingested?
Telegraf

• Telegraf is a data collection agent
• It is based on a plug-and-play architecture
• It offers a variety of input plugins
• It can be configured from the InfluxDB Cloud UI

Download and install:
https://docs.influxdata.com/telegraf/latest/introduction/installation/


Positioning Telegraf in the ingestion pipeline
[Diagram: Telegraf in the ingestion pipeline]


Quiz

• Provide the correct relationship: 1:1, 1:N, N:1, or N:N
– 1 time series to ? measurement
– 1 time series to ? tag (key-value pair)
– 1 time series to ? field key
– ? field key to ? field value
– 1 time series to ? timestamp
– 1 <measurement,tag,field key,timestamp> to ? value

Quiz answers

– 1 time series to 1 measurement
– 1 time series to N tags (key-value pairs)
– 1 time series to N field keys
– N field keys to N field values
– 1 time series to N timestamps
– 1 <measurement,tag,field key,timestamp> to 1 value


Let’s get dirty!

Use Case: Continuous Linear Pizza Oven
Sensors observe
• temperature (°C)
• relative humidity (%)
of the two ovens

Learning Goals
• Line protocol usage
• First query


Task 1

Model the following data representing the temperature and the humidity observations, from both sensors, over time.

component   sensor   temperature   humidity   ts
iot-oven    S1       290           30         1636372800000000000
iot-oven    S2       105           55         1636372815000000000
iot-oven    S1       305           38         1636372860000000000
iot-oven    S2       120           65         1636372875000000000


Task 2
Load data into an InfluxDB bucket named training

iot-oven,sensor=S1 temperature=290,humidity=30 1636372800000000000
iot-oven,sensor=S2 temperature=105,humidity=55 1636372815000000000
iot-oven,sensor=S1 temperature=305,humidity=38 1636372860000000000
iot-oven,sensor=S2 temperature=120,humidity=65 1636372875000000000
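The modeling choice behind these lines can be sketched as a mapping from the Task 1 table: component becomes the measurement, sensor becomes a tag, temperature and humidity become fields, and ts becomes the timestamp. The helper `to_line` is hypothetical and does no escaping.

```python
# Rows from the Task 1 table: component, sensor, temperature, humidity, ts
rows = [
    ("iot-oven", "S1", 290, 30, 1636372800000000000),
    ("iot-oven", "S2", 105, 55, 1636372815000000000),
    ("iot-oven", "S1", 305, 38, 1636372860000000000),
    ("iot-oven", "S2", 120, 65, 1636372875000000000),
]

def to_line(component, sensor, temperature, humidity, ts):
    # component -> measurement, sensor -> tag, readings -> fields, ts -> timestamp
    return f"{component},sensor={sensor} temperature={temperature},humidity={humidity} {ts}"

lines = [to_line(*row) for row in rows]
print("\n".join(lines))
```

The sensor is a tag because it identifies which series a reading belongs to, while the readings are the values that change over time.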
Let’s do some live coding



Task 3
Run your first query

from(bucket: "training")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "iot-oven")
  |> filter(fn: (r) => r._field == "temperature")
  |> filter(fn: (r) => r.sensor == "S2")
  |> filter(fn: (r) => r._value > 100)
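To see what the chain of filters selects, here is a rough in-memory Python analogue over the Task 1 data. It is an illustration, not how Flux executes: it assumes every point falls inside the selected time range (so the range step is omitted) and models each field of a point as its own record with `_field` and `_value` keys.

```python
# (sensor, temperature, humidity, timestamp_ns) from the Task 1 table
points = [
    ("S1", 290, 30, 1636372800000000000),
    ("S2", 105, 55, 1636372815000000000),
    ("S1", 305, 38, 1636372860000000000),
    ("S2", 120, 65, 1636372875000000000),
]

# One record per field, mirroring the _field and _value columns used in the query.
records = [
    {"_measurement": "iot-oven", "sensor": sensor, "_field": field, "_value": value, "_time": ts}
    for sensor, temperature, humidity, ts in points
    for field, value in (("temperature", temperature), ("humidity", humidity))
]

# The four filter steps, applied as one combined predicate.
result = [
    r for r in records
    if r["_measurement"] == "iot-oven"
    and r["_field"] == "temperature"
    and r["sensor"] == "S2"
    and r["_value"] > 100
]
print([(r["_time"], r["_value"]) for r in result])
```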


Quiz

• Should a bucket contain the same number of distinct measurements, tag keys, tag values, field keys, field values, and timestamps?
• If not, please put them in order using:
– >> (at least an order of magnitude more)
– ~ (about the same order of magnitude)

e.g., field values >> field keys

Quiz answers

timestamps >> field values >> tag values >> field keys ~ tag keys >> measurements

