SlideShare a Scribd company logo
Queue with
asyncio and Kafka
Showcase
Ondřej Veselý
What kind of data we have
Problem:
store JSON to database
Just a few records
per second.
But
● Slow database
● Unreliable database
● Increasing traffic (20x)
def save_data(conn, cur, ts, data):
cur.execute(
"""INSERT INTO data (timestamp, data)
VALUES (%s,%s) """, (ts, ujson.dumps(data)))
conn.commit()
@app.route('/store', method=['PUT', 'POST'])
def logstash_route():
data = ujson.load(request.body)
conn = psycopg2.connect(**config.pg_logs)
t = datetime.now()
with conn.cursor(cursor_factory=DictCursor) as cur:
for d in data:
save_data(conn, cur, t, d)
conn.close()
Old code
Architecture
internet
Kafka producer
/store
Kafka consumer
Kafka queue
Postgres
… time to kill consumer ...
Asyncio, example
import asyncio
async def factorial(name, number):
f = 1
for i in range(2, number+1):
print("Task %s: Compute factorial(%s)..." % (name, i))
await asyncio.sleep(1)
f *= i
print("Task %s: factorial(%s) = %s" % (name, number, f))
loop = asyncio.get_event_loop()
tasks = [
asyncio.ensure_future(factorial("A", 2)),
asyncio.ensure_future(factorial("B", 3)),
asyncio.ensure_future(factorial("C", 4))]
loop.run_until_complete(asyncio.gather(*tasks))
loop.close()
Task A: Compute factorial(2)...
Task B: Compute factorial(2)...
Task C: Compute factorial(2)...
Task A: factorial(2) = 2
Task B: Compute factorial(3)...
Task C: Compute factorial(3)...
Task B: factorial(3) = 6
Task C: Compute factorial(4)...
Task C: factorial(4) = 24
What we used
Apache Kafka
Not ujson
Concurrency - doing lots of slow things at once.
No processes, no threads.
Producer
from aiohttp import web
import json
Consumer
import asyncio
import json
from aiokafka import AIOKafkaConsumer
import aiopg
Producer #1
async def kafka_send(kafka_producer, data, topic):
message = {
'data': data,
'received': str(arrow.utcnow())
}
message_json_bytes = bytes(json.dumps(message), 'utf-8')
await kafka_producer.send_and_wait(topic, message_json_bytes)
async def handle(request):
post_data = await request.json()
try:
await kafka_send(request.app['kafka_p'], post_data, topic=settings.topic)
except:
slog.exception("Kafka Error")
await destroy_all()
return web.Response(status=200)
app = web.Application()
app.router.add_route('POST', '/store', handle)
app['kafka_p'] = get_kafka_producer()
Destroying the loop
async def destroy_all():
loop = asyncio.get_event_loop()
for task in asyncio.Task.all_tasks():
task.cancel()
await loop.stop()
await loop.close()
slog.debug("Exiting.")
sys.exit()
def get_kafka_producer():
loop = asyncio.get_event_loop()
producer = AIOKafkaProducer(
loop=loop,
bootstrap_servers=settings.queues_urls,
request_timeout_ms=settings.kafka_timeout,
retry_backoff_ms=1000)
loop.run_until_complete(producer.start())
return producer
Getting producer
Producer #2
Consume
… time to resurrect consumer ...
DB
connected
1. Receive data record from Kafka
2. Put it to the queue
start
yesno
Flush
queue full enough
or
data old enough
Store data from queue to DB
yesno
Connect to DB
start
asyncio.Queue()
Consumer #1
def main():
dbs_connected = asyncio.Future()
batch = asyncio.Queue(maxsize=settings.batch_max_size)
asyncio.ensure_future(consume(batch, dbs_connected))
asyncio.ensure_future(start_flushing(batch, dbs_connected))
loop.run_forever()
async def consume(queue, dbs_connected):
await asyncio.wait_for(dbs_connected, timeout=settings.wait_for_databases)
consumer = AIOKafkaConsumer(
settings.topic, loop=loop, bootstrap_servers=settings.queues_urls,
group_id='consumers'
)
await consumer.start()
async for msg in consumer:
message = json.loads(msg.value.decode("utf-8"))
await queue.put((message.get('received'), message.get('data')))
await consumer.stop()
Consumer #2
async def start_flushing(queue, dbs_connected):
db_logg = await aiopg.create_pool(settings.logs_db_url)
while True:
async with db_logg.acquire() as logg_conn, logg_conn.cursor() as logg_cur:
await keep_flushing(dbs_connected, logg_cur, queue)
await asyncio.sleep(2)
async def keep_flushing(dbs_connected, logg_cur, queue):
dbs_connected.set_result(True)
last_stored_time = time.time()
while True:
if not queue.empty() and (queue.qsize() > settings.batch_flush_size or
time.time() - last_stored_time > settings.batch_max_time):
to_store = []
while not queue.empty():
to_store.append(await queue.get())
try:
await store_bulk(logg_cur, to_store)
except:
break # DB down, breaking to reconnect
last_stored_time = time.time()
await asyncio.sleep(settings.batch_sleep)
Consumer #3
Code is public on gitlab
https://gitlab.skypicker.com/ondrej/faqstorer
www.orwen.org
code.kiwi.com
www.kiwi.com/jobs/
Check graphs...
Ad

More Related Content

What's hot (20)

Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Toshihiro Suzuki
 
アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...
アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...
アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...
Google Cloud Platform - Japan
 
MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)
MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)
MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)
Yuji Otani
 
Atomicity In Redis: Thomas Hunter
Atomicity In Redis: Thomas HunterAtomicity In Redis: Thomas Hunter
Atomicity In Redis: Thomas Hunter
Redis Labs
 
AppArmorの話
AppArmorの話AppArmorの話
AppArmorの話
Hiroki Ishikawa
 
A Threat Hunter Himself
A Threat Hunter HimselfA Threat Hunter Himself
A Threat Hunter Himself
Sergey Soldatov
 
Spring Cloud Data Flow の紹介 #streamctjp
Spring Cloud Data Flow の紹介  #streamctjpSpring Cloud Data Flow の紹介  #streamctjp
Spring Cloud Data Flow の紹介 #streamctjp
Yahoo!デベロッパーネットワーク
 
Red Hat Update Infrastructure 2.0
Red Hat Update Infrastructure 2.0Red Hat Update Infrastructure 2.0
Red Hat Update Infrastructure 2.0
Etsuji Nakai
 
MySQLからPostgreSQLへのマイグレーションのハマリ所
MySQLからPostgreSQLへのマイグレーションのハマリ所MySQLからPostgreSQLへのマイグレーションのハマリ所
MySQLからPostgreSQLへのマイグレーションのハマリ所
Makoto Kaga
 
PostgreSQLレプリケーション徹底紹介
PostgreSQLレプリケーション徹底紹介PostgreSQLレプリケーション徹底紹介
PostgreSQLレプリケーション徹底紹介
NTT DATA OSS Professional Services
 
脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに
脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに
脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに
Hiroshi Tokumaru
 
これからSpringを使う開発者が知っておくべきこと
これからSpringを使う開発者が知っておくべきことこれからSpringを使う開発者が知っておくべきこと
これからSpringを使う開発者が知っておくべきこと
土岐 孝平
 
Kheirkhabarov24052017_phdays7
Kheirkhabarov24052017_phdays7Kheirkhabarov24052017_phdays7
Kheirkhabarov24052017_phdays7
Teymur Kheirkhabarov
 
Osc2015北海道 札幌my sql勉強会_波多野_r3
Osc2015北海道 札幌my sql勉強会_波多野_r3Osc2015北海道 札幌my sql勉強会_波多野_r3
Osc2015北海道 札幌my sql勉強会_波多野_r3
Nobuhiro Hatano
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestrator
YoungHeon (Roy) Kim
 
Practical Celery
Practical CeleryPractical Celery
Practical Celery
Cameron Maske
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
confluent
 
Structured Streaming - The Internal -
Structured Streaming - The Internal -Structured Streaming - The Internal -
Structured Streaming - The Internal -
NTT DATA OSS Professional Services
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and Tuning
NGINX, Inc.
 
GitLabのAutoDevOpsを試してみた
GitLabのAutoDevOpsを試してみたGitLabのAutoDevOpsを試してみた
GitLabのAutoDevOpsを試してみた
富士通クラウドテクノロジーズ株式会社
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Toshihiro Suzuki
 
アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...
アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...
アプリ開発者、DB 管理者視点での Cloud Spanner 活用方法 | 第 10 回 Google Cloud INSIDE Games & App...
Google Cloud Platform - Japan
 
MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)
MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)
MariaDB+GaleraClusterの運用事例(MySQL勉強会2016-01-28)
Yuji Otani
 
Atomicity In Redis: Thomas Hunter
Atomicity In Redis: Thomas HunterAtomicity In Redis: Thomas Hunter
Atomicity In Redis: Thomas Hunter
Redis Labs
 
Red Hat Update Infrastructure 2.0
Red Hat Update Infrastructure 2.0Red Hat Update Infrastructure 2.0
Red Hat Update Infrastructure 2.0
Etsuji Nakai
 
MySQLからPostgreSQLへのマイグレーションのハマリ所
MySQLからPostgreSQLへのマイグレーションのハマリ所MySQLからPostgreSQLへのマイグレーションのハマリ所
MySQLからPostgreSQLへのマイグレーションのハマリ所
Makoto Kaga
 
脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに
脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに
脆弱性は誰のせい? PHP、MySQL、Joomla! の責任やいかに
Hiroshi Tokumaru
 
これからSpringを使う開発者が知っておくべきこと
これからSpringを使う開発者が知っておくべきことこれからSpringを使う開発者が知っておくべきこと
これからSpringを使う開発者が知っておくべきこと
土岐 孝平
 
Osc2015北海道 札幌my sql勉強会_波多野_r3
Osc2015北海道 札幌my sql勉強会_波多野_r3Osc2015北海道 札幌my sql勉強会_波多野_r3
Osc2015北海道 札幌my sql勉強会_波多野_r3
Nobuhiro Hatano
 
My sql failover test using orchestrator
My sql failover test  using orchestratorMy sql failover test  using orchestrator
My sql failover test using orchestrator
YoungHeon (Roy) Kim
 
Apache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-PatternApache Kafka – (Pattern and) Anti-Pattern
Apache Kafka – (Pattern and) Anti-Pattern
confluent
 
NGINX Installation and Tuning
NGINX Installation and TuningNGINX Installation and Tuning
NGINX Installation and Tuning
NGINX, Inc.
 

Viewers also liked (7)

codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
DataStax Academy
 
美团技术沙龙04 - Kv Tair best practise
美团技术沙龙04 - Kv Tair best practise 美团技术沙龙04 - Kv Tair best practise
美团技术沙龙04 - Kv Tair best practise
美团点评技术团队
 
Communication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed SystemsCommunication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed Systems
guest61205606
 
Inter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsInter-Process Communication in distributed systems
Inter-Process Communication in distributed systems
Aya Mahmoud
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
SHATHAN
 
大数据时代feed架构 (ArchSummit Beijing 2014)
大数据时代feed架构 (ArchSummit Beijing 2014)大数据时代feed架构 (ArchSummit Beijing 2014)
大数据时代feed架构 (ArchSummit Beijing 2014)
Tim Y
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
DataStax Academy
 
美团技术沙龙04 - Kv Tair best practise
美团技术沙龙04 - Kv Tair best practise 美团技术沙龙04 - Kv Tair best practise
美团技术沙龙04 - Kv Tair best practise
美团点评技术团队
 
Communication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed SystemsCommunication And Synchronization In Distributed Systems
Communication And Synchronization In Distributed Systems
guest61205606
 
Inter-Process Communication in distributed systems
Inter-Process Communication in distributed systemsInter-Process Communication in distributed systems
Inter-Process Communication in distributed systems
Aya Mahmoud
 
Synchronization in distributed systems
Synchronization in distributed systems Synchronization in distributed systems
Synchronization in distributed systems
SHATHAN
 
大数据时代feed架构 (ArchSummit Beijing 2014)
大数据时代feed架构 (ArchSummit Beijing 2014)大数据时代feed架构 (ArchSummit Beijing 2014)
大数据时代feed架构 (ArchSummit Beijing 2014)
Tim Y
 
Ad

Similar to Python queue solution with asyncio and kafka (20)

Introduction to asyncio
Introduction to asyncioIntroduction to asyncio
Introduction to asyncio
Saúl Ibarra Corretgé
 
Tools for Making Machine Learning more Reactive
Tools for Making Machine Learning more ReactiveTools for Making Machine Learning more Reactive
Tools for Making Machine Learning more Reactive
Jeff Smith
 
Future Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NETFuture Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NET
Gianluca Carucci
 
ZeroMQ: Messaging Made Simple
ZeroMQ: Messaging Made SimpleZeroMQ: Messaging Made Simple
ZeroMQ: Messaging Made Simple
Ian Barber
 
Asynchronous web apps with the Play Framework 2.0
Asynchronous web apps with the Play Framework 2.0Asynchronous web apps with the Play Framework 2.0
Asynchronous web apps with the Play Framework 2.0
Oscar Renalias
 
JS Fest 2019 Node.js Antipatterns
JS Fest 2019 Node.js AntipatternsJS Fest 2019 Node.js Antipatterns
JS Fest 2019 Node.js Antipatterns
Timur Shemsedinov
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
Databricks
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Sages
 
Websockets talk at Rubyconf Uruguay 2010
Websockets talk at Rubyconf Uruguay 2010Websockets talk at Rubyconf Uruguay 2010
Websockets talk at Rubyconf Uruguay 2010
Ismael Celis
 
TDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com GoTDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com Go
tdc-globalcode
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 
Lego: A brick system build by scala
Lego: A brick system build by scalaLego: A brick system build by scala
Lego: A brick system build by scala
lunfu zhong
 
Think Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJSThink Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJS
Adam L Barrett
 
Writing Redis in Python with asyncio
Writing Redis in Python with asyncioWriting Redis in Python with asyncio
Writing Redis in Python with asyncio
James Saryerwinnie
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
InfluxData
 
Rntb20200805
Rntb20200805Rntb20200805
Rntb20200805
t k
 
Avoiding Callback Hell with Async.js
Avoiding Callback Hell with Async.jsAvoiding Callback Hell with Async.js
Avoiding Callback Hell with Async.js
cacois
 
Stream or not to Stream?

Stream or not to Stream?
Stream or not to Stream?

Stream or not to Stream?

Lukasz Byczynski
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Databricks
 
Futures e abstração - QCon São Paulo 2015
Futures e abstração - QCon São Paulo 2015Futures e abstração - QCon São Paulo 2015
Futures e abstração - QCon São Paulo 2015
Leonardo Borges
 
Tools for Making Machine Learning more Reactive
Tools for Making Machine Learning more ReactiveTools for Making Machine Learning more Reactive
Tools for Making Machine Learning more Reactive
Jeff Smith
 
Future Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NETFuture Decoded - Node.js per sviluppatori .NET
Future Decoded - Node.js per sviluppatori .NET
Gianluca Carucci
 
ZeroMQ: Messaging Made Simple
ZeroMQ: Messaging Made SimpleZeroMQ: Messaging Made Simple
ZeroMQ: Messaging Made Simple
Ian Barber
 
Asynchronous web apps with the Play Framework 2.0
Asynchronous web apps with the Play Framework 2.0Asynchronous web apps with the Play Framework 2.0
Asynchronous web apps with the Play Framework 2.0
Oscar Renalias
 
JS Fest 2019 Node.js Antipatterns
JS Fest 2019 Node.js AntipatternsJS Fest 2019 Node.js Antipatterns
JS Fest 2019 Node.js Antipatterns
Timur Shemsedinov
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
Databricks
 
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data EcosystemWprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Wprowadzenie do technologii Big Data / Intro to Big Data Ecosystem
Sages
 
Websockets talk at Rubyconf Uruguay 2010
Websockets talk at Rubyconf Uruguay 2010Websockets talk at Rubyconf Uruguay 2010
Websockets talk at Rubyconf Uruguay 2010
Ismael Celis
 
TDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com GoTDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com Go
tdc-globalcode
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 
Lego: A brick system build by scala
Lego: A brick system build by scalaLego: A brick system build by scala
Lego: A brick system build by scala
lunfu zhong
 
Think Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJSThink Async: Asynchronous Patterns in NodeJS
Think Async: Asynchronous Patterns in NodeJS
Adam L Barrett
 
Writing Redis in Python with asyncio
Writing Redis in Python with asyncioWriting Redis in Python with asyncio
Writing Redis in Python with asyncio
James Saryerwinnie
 
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry PiMonitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
Monitoring Your ISP Using InfluxDB Cloud and Raspberry Pi
InfluxData
 
Rntb20200805
Rntb20200805Rntb20200805
Rntb20200805
t k
 
Avoiding Callback Hell with Async.js
Avoiding Callback Hell with Async.jsAvoiding Callback Hell with Async.js
Avoiding Callback Hell with Async.js
cacois
 
Stream or not to Stream?

Stream or not to Stream?
Stream or not to Stream?

Stream or not to Stream?

Lukasz Byczynski
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Databricks
 
Futures e abstração - QCon São Paulo 2015
Futures e abstração - QCon São Paulo 2015Futures e abstração - QCon São Paulo 2015
Futures e abstração - QCon São Paulo 2015
Leonardo Borges
 
Ad

Recently uploaded (20)

chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 
chapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptxchapter 4 Variability statistical research .pptx
chapter 4 Variability statistical research .pptx
justinebandajbn
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Digilocker under workingProcess Flow.pptx
Digilocker  under workingProcess Flow.pptxDigilocker  under workingProcess Flow.pptx
Digilocker under workingProcess Flow.pptx
satnamsadguru491
 
Ch3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendencyCh3MCT24.pptx measure of central tendency
Ch3MCT24.pptx measure of central tendency
ayeleasefa2
 
4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf4. Multivariable statistics_Using Stata_2025.pdf
4. Multivariable statistics_Using Stata_2025.pdf
axonneurologycenter1
 
Classification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptxClassification_in_Machinee_Learning.pptx
Classification_in_Machinee_Learning.pptx
wencyjorda88
 
Geometry maths presentation for begginers
Geometry maths presentation for begginersGeometry maths presentation for begginers
Geometry maths presentation for begginers
zrjacob283
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnTemplate_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
Template_A3nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
cegiver630
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Day 1 - Lab 1 Reconnaissance Scanning with NMAP, Vulnerability Assessment wit...
Abodahab
 
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptxPerencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
Perencanaan Pengendalian-Proyek-Konstruksi-MS-PROJECT.pptx
PareaRusan
 
Deloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining ProjectsDeloitte - A Framework for Process Mining Projects
Deloitte - A Framework for Process Mining Projects
Process mining Evangelist
 

Python queue solution with asyncio and kafka

  • 1. Queue with asyncio and Kafka Showcase Ondřej Veselý
  • 2. What kind of data we have
  • 3. Problem: store JSON to database Just a few records per second. But ● Slow database ● Unreliable database ● Increasing traffic (20x)
  • 4. def save_data(conn, cur, ts, data): cur.execute( """INSERT INTO data (timestamp, data) VALUES (%s,%s) """, (ts, ujson.dumps(data))) conn.commit() @app.route('/store', method=['PUT', 'POST']) def logstash_route(): data = ujson.load(request.body) conn = psycopg2.connect(**config.pg_logs) t = datetime.now() with conn.cursor(cursor_factory=DictCursor) as cur: for d in data: save_data(conn, cur, t, d) conn.close() Old code
  • 5. Architecture internet Kafka producer /store Kafka consumer Kafka queue Postgres … time to kill consumer ...
  • 6. Asyncio, example import asyncio async def factorial(name, number): f = 1 for i in range(2, number+1): print("Task %s: Compute factorial(%s)..." % (name, i)) await asyncio.sleep(1) f *= i print("Task %s: factorial(%s) = %s" % (name, number, f)) loop = asyncio.get_event_loop() tasks = [ asyncio.ensure_future(factorial("A", 2)), asyncio.ensure_future(factorial("B", 3)), asyncio.ensure_future(factorial("C", 4))] loop.run_until_complete(asyncio.gather(*tasks)) loop.close() Task A: Compute factorial(2)... Task B: Compute factorial(2)... Task C: Compute factorial(2)... Task A: factorial(2) = 2 Task B: Compute factorial(3)... Task C: Compute factorial(3)... Task B: factorial(3) = 6 Task C: Compute factorial(4)... Task C: factorial(4) = 24
  • 7. What we used Apache Kafka Not ujson Concurrency - doing lots of slow things at once. No processes, no threads. Producer from aiohttp import web import json Consumer import asyncio import json from aiokafka import AIOKafkaConsumer import aiopg
  • 8. Producer #1 async def kafka_send(kafka_producer, data, topic): message = { 'data': data, 'received': str(arrow.utcnow()) } message_json_bytes = bytes(json.dumps(message), 'utf-8') await kafka_producer.send_and_wait(topic, message_json_bytes) async def handle(request): post_data = await request.json() try: await kafka_send(request.app['kafka_p'], post_data, topic=settings.topic) except: slog.exception("Kafka Error") await destroy_all() return web.Response(status=200) app = web.Application() app.router.add_route('POST', '/store', handle) app['kafka_p'] = get_kafka_producer()
  • 9. Destroying the loop async def destroy_all(): loop = asyncio.get_event_loop() for task in asyncio.Task.all_tasks(): task.cancel() await loop.stop() await loop.close() slog.debug("Exiting.") sys.exit() def get_kafka_producer(): loop = asyncio.get_event_loop() producer = AIOKafkaProducer( loop=loop, bootstrap_servers=settings.queues_urls, request_timeout_ms=settings.kafka_timeout, retry_backoff_ms=1000) loop.run_until_complete(producer.start()) return producer Getting producer Producer #2
  • 10. Consume … time to resurrect consumer ... DB connected 1. Receive data record from Kafka 2. Put it to the queue start yesno Flush queue full enough or data old enough Store data from queue to DB yesno Connect to DB start asyncio.Queue() Consumer #1
  • 11. def main(): dbs_connected = asyncio.Future() batch = asyncio.Queue(maxsize=settings.batch_max_size) asyncio.ensure_future(consume(batch, dbs_connected)) asyncio.ensure_future(start_flushing(batch, dbs_connected)) loop.run_forever() async def consume(queue, dbs_connected): await asyncio.wait_for(dbs_connected, timeout=settings.wait_for_databases) consumer = AIOKafkaConsumer( settings.topic, loop=loop, bootstrap_servers=settings.queues_urls, group_id='consumers' ) await consumer.start() async for msg in consumer: message = json.loads(msg.value.decode("utf-8")) await queue.put((message.get('received'), message.get('data'))) await consumer.stop() Consumer #2
  • 12. async def start_flushing(queue, dbs_connected): db_logg = await aiopg.create_pool(settings.logs_db_url) while True: async with db_logg.acquire() as logg_conn, logg_conn.cursor() as logg_cur: await keep_flushing(dbs_connected, logg_cur, queue) await asyncio.sleep(2) async def keep_flushing(dbs_connected, logg_cur, queue): dbs_connected.set_result(True) last_stored_time = time.time() while True: if not queue.empty() and (queue.qsize() > settings.batch_flush_size or time.time() - last_stored_time > settings.batch_max_time): to_store = [] while not queue.empty(): to_store.append(await queue.get()) try: await store_bulk(logg_cur, to_store) except: break # DB down, breaking to reconnect last_stored_time = time.time() await asyncio.sleep(settings.batch_sleep) Consumer #3
  • 13. Code is public on gitlab https://gitlab.skypicker.com/ondrej/faqstorer www.orwen.org code.kiwi.com www.kiwi.com/jobs/ Check graphs...

Editor's Notes

  • #3: Talk more about Kiwi.com Skyscanner, Momondo
  • #4: 5 TB Postgres Database
  • #7: PEP 492 -- Coroutines with async and await syntax, 09-Apr-2015 Python 3.5