SlideShare a Scribd company logo
Improving Mobile
Payments with Real time
Spark
● Madhukara Phatak
● Big data consultant and
trainer at datamantra.io
● Consult in Hadoop, Spark
and Scala
● www.madhukaraphatak.com
Agenda
● Mobile as drive for big data
● Our customer solution
● Existing data solution
● Improved solution
● Technical details
● Future enhancements
● Q & A
Mobile as Big data drive
● Mobile has changed the way in which we interact with
world
● Most of the buy/sell happens on mobile today
○ Myntra went fully mobile
○ Flipkart and amazon say their 50% buy happens on
mobile
○ Quikr and OLX is mobile based selling platform
○ Ola etc
Challenges in Mobile
● Customers expect the service to available 24/7
● Tiny screens make very challenging to typical software
flows
● Flaky connectivity of mobile networks makes it tougher
● Constant moving results in drop in interactions
● No more downtime
● Everything has to be done in realtime
Mobile payments
● Almost every app earlier mentioned needs some kind of
payment
● Getting payments right on mobile is very hard
● Globally 21% of online shoppers abandon their basket
due to payment failures or delays
● Some companies are building sdk’s to help the app
developers
● Our customer is one of them
Why mobile payments are hard?
Too many inputs
Terrible interface by Banks
OTP vs Password
Our customer solution
Our customer solution
● Mobile sdk for applications simplify the payments
● SDK provides better user interface like big buttons to
generate OTP or other flows
● SDK also helps in filling up different kind of forms given
by different banks using consistent UI
● Better user experience across applications
● Application sends anonymous payments details across
apps to our customer servers
Some numbers
● 40 + customers
● Over 1 million transactions per month as per March
● Around 55% success rate ( 5 % above average)
● Supports major banks, payment gateways and wallet
providers
● Soon will be available in other than mobile payment
space
Why data matters?
● As number of transaction increases, things will go
wrong
● There are so many different combinations to go wrong
● Example
○ Airtel OTP failing with state bank netbanking
○ Customers stuck in password page
○ Not able to read OTP from some specific
● Understanding customer pain and reacting to it is
paramount
● Every help results in payment
Initial BI solution
Events
Hourly
Push
JSON
Data
S3FS
Session Wise
Aggregations
Initial BI solution
● Phone sdk pushes events like transaction initiation,
payment complete to logging servers
● Logging servers roll log for every one hour and push to
s3
● A single node spark machine aggregates data by
sessions and pushes it to mysql
● Google BigQuery is used for adhoc querying
Challenges with BI solution
● Batch processing
● Geared towards more of report generation oriented flow
● Very minimal use of Spark API’s as team was not well
aware of it’s potential
● No integration with mobile sdk for feed back loop
Requirements for consulting
● Bring the same reporting calculations to real time
● Understanding the user behaviour and tracking his/her
flow over a session
● Closing the loop by providing automatic alerts based on
the metric calculations
● Some new specific business cases like loyalty
management etc
● Improving team expertise on spark
Choosing Spark streaming
● Company was already invested in Spark so spark
streaming was no brainer
● Also porting spark batch code to streaming was mostly
straight forward as both talk same API
● Company used python as Spark API language which
was supported by streaming also
● So we didn't consider storm we went ahead with Spark
streaming
First version
Events
Five Minute Push
JSON
Data
FileStream
Session Wise
Aggregations
First version
● We used fileStream API of spark streaming which
allowed us to poll a s3 bucket for every few mins
● A new rolling appender was added to log servers to
push logs to s3 every 5 mins
● Exact same batch code was used for calculations which
made transition very easy
● All downstream applications remained same
Second version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Amazon Kinesis
● A kafka like distributed message queue by Amazon
● It’s used as managed kafka source on AWS web
services
● Highly scalable and low latency support
● Persistence with fault tolerance across multiple
availability zones
● Great integration with Spark
Second version
● Amazon kinesis is added as real time stream source
● Logging server push logs to kinesis as they arrive
● Streaming application pulls the data from kinesis for
every few mins
● Multiple partitions support added for parallel streams
Challenges with Python
● Spark streaming API for python was introduced in 1.2
whereas spark-streaming for Scala/Java is available
from 0.8
● No aws kinesis connector was available as of March
● Team has to write it’s own
● No support for python in Spark job server
Challenges from batch to streaming
● Session typically last from 1-10 mins. Batch is easy
most of the time session is done for a one hour data but
challenging for real time data
● Designing state for session
● Designing checkpointing and deciding on interval
● Weird checkpointing issues with s3 due to eventual
consistency
Improvements to batch code
● Most of the code was written in rdd paradigm as it was
only know to team
● Team was trained on spark sql and spark streaming
● Majority code was ported to Spark sql based solution to
improve readability and maintainability
● Recently moved into Dataframe based code
Third version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Choosing Mesos
● Mesos is a great cluster manager for Spark only
workloads
● Has specific coarse-grain mode which is dedicated for
the real time systems
● Minimal overhead compared to YARN
● Easy to setup on EC2
Fourth version
Events
JSON
Data
Session Wise
Aggregations
Hourly
Push Realtime
Grafana
● Added grafana for visualization and dashboards
● Graphana = Graphite + influxDB
● Moved away from mysql to time series database influx
DB
● Scales much better compared to mysql
● Data scientists or product managers can monitor
customers using these dashboards
● Integrates with mobile sdk
Ad

More Related Content

What's hot (20)

Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
datamantra
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
datamantra
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
datamantra
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
datamantra
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
datamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
datamantra
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
datamantra
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
datamantra
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
datamantra
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache spark
datamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
datamantra
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
datamantra
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
datamantra
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Garindra Prahandono
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
datamantra
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
datamantra
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
datamantra
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
datamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
datamantra
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
datamantra
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
datamantra
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
datamantra
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
datamantra
 
Exploratory Data Analysis in Spark
Exploratory Data Analysis in SparkExploratory Data Analysis in Spark
Exploratory Data Analysis in Spark
datamantra
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
datamantra
 
Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
datamantra
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
datamantra
 
Evolution of apache spark
Evolution of apache sparkEvolution of apache spark
Evolution of apache spark
datamantra
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
datamantra
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
datamantra
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
datamantra
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Garindra Prahandono
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
datamantra
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
datamantra
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
datamantra
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
datamantra
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 

Viewers also liked (16)

Digital Currencies: Where to from here?
Digital Currencies: Where to from here?Digital Currencies: Where to from here?
Digital Currencies: Where to from here?
Chartered Accountants Australia and New Zealand
 
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Bradley Perry
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
Rajesh Ananda Kumar
 
Google Wallet Presentation
Google Wallet PresentationGoogle Wallet Presentation
Google Wallet Presentation
Raghav Sharma
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
Krishna-Kumar
 
Visa
VisaVisa
Visa
Rudi Chatab
 
Digital currencies new technology new business model
Digital currencies new technology new business modelDigital currencies new technology new business model
Digital currencies new technology new business model
Shiva Bissessar
 
Payments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditPayments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, credit
Bloomberg LP
 
Apple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowApple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to know
Vibes_Thought_Leadership
 
Demonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyDemonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital Economy
Shreyas Kamath
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
DataWorks Summit/Hadoop Summit
 
CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102
Blockstrap.com
 
CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102
Blockstrap.com
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
Uri Laserson
 
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Digital assets And Currencies In The Information Age: Do Your Ones And Zero H...
Bradley Perry
 
Google Wallet Presentation
Google Wallet PresentationGoogle Wallet Presentation
Google Wallet Presentation
Raghav Sharma
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
Krishna-Kumar
 
Digital currencies new technology new business model
Digital currencies new technology new business modelDigital currencies new technology new business model
Digital currencies new technology new business model
Shiva Bissessar
 
Payments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, creditPayments industry watches for inflections in mobile, credit
Payments industry watches for inflections in mobile, credit
Bloomberg LP
 
Apple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to knowApple's Passbook and Google Wallet: 5 key differences you need to know
Apple's Passbook and Google Wallet: 5 key differences you need to know
Vibes_Thought_Leadership
 
Demonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital EconomyDemonetisation: Push Towards a Digital Economy
Demonetisation: Push Towards a Digital Economy
Shreyas Kamath
 
Introduction to Apache Flink
Introduction to Apache FlinkIntroduction to Apache Flink
Introduction to Apache Flink
datamantra
 
CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102CBGTBT - Part 5 - Blockchains 102
CBGTBT - Part 5 - Blockchains 102
Blockstrap.com
 
CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102CBGTBT - Part 6 - Transactions 102
CBGTBT - Part 6 - Transactions 102
Blockstrap.com
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
datamantra
 
Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)Python in the Hadoop Ecosystem (Rock Health presentation)
Python in the Hadoop Ecosystem (Rock Health presentation)
Uri Laserson
 
Ad

Similar to Improving Mobile Payments With Real time Spark (20)

Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
Dani Solà Lagares
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
datamantra
 
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processing
idan_by
 
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays
 
Google App Engine Overview and Update
Google App Engine Overview and UpdateGoogle App Engine Overview and Update
Google App Engine Overview and Update
Chris Schalk
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
confluent
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microservice
Chandresh Pancholi
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
Scaling Production Data across Microservices
Scaling Production Data across MicroservicesScaling Production Data across Microservices
Scaling Production Data across Microservices
Erik Ashepa
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
shyamraj55
 
Nginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryNginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the Industry
Benjamin Scholler
 
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
WSO2
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
Databricks
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Bowen Li
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functions
Mohsiur Rahman
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)
Corneil du Plessis
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
Digital Vidya
 
Simply Business' Data Platform
Simply Business' Data PlatformSimply Business' Data Platform
Simply Business' Data Platform
Dani Solà Lagares
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
datamantra
 
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays LIVE India - Asynchronous and Broadcasting APIs using Kafka by Rohit ...
apidays
 
Simply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event ProcessingSimply Business - Near Real Time Event Processing
Simply Business - Near Real Time Event Processing
idan_by
 
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays LIVE New York 2021 - Managing the usage of Asynchronous APIs: What do...
apidays
 
Google App Engine Overview and Update
Google App Engine Overview and UpdateGoogle App Engine Overview and Update
Google App Engine Overview and Update
Chris Schalk
 
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward San Francisco 2018: Gregory Fee - "Bootstrapping State In Apach...
Flink Forward
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
confluent
 
Instruments to play microservice
Instruments to play microserviceInstruments to play microservice
Instruments to play microservice
Chandresh Pancholi
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
WSO2
 
Scaling Production Data across Microservices
Scaling Production Data across MicroservicesScaling Production Data across Microservices
Scaling Production Data across Microservices
Erik Ashepa
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
shyamraj55
 
Nginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the IndustryNginx Conference 2016 - Learnings and State of the Industry
Nginx Conference 2016 - Learnings and State of the Industry
Benjamin Scholler
 
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
[APIdays NY] Managing the usage of Asynchronous APIs: What does it take?
WSO2
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
Databricks
 
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Streaming at Lyft, Gregory Fee, Seattle Flink Meetup, Jun 2018
Bowen Li
 
The good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functionsThe good, the bad, the ugly side of step functions
The good, the bad, the ugly side of step functions
Mohsiur Rahman
 
Sweet Streams (Are made of this)
Sweet Streams (Are made of this)Sweet Streams (Are made of this)
Sweet Streams (Are made of this)
Corneil du Plessis
 
Structured Streaming in Spark
Structured Streaming in SparkStructured Streaming in Spark
Structured Streaming in Spark
Digital Vidya
 
Ad

More from datamantra (16)

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
datamantra
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
datamantra
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
datamantra
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
datamantra
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
datamantra
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
datamantra
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
datamantra
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
datamantra
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
datamantra
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
datamantra
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
datamantra
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
datamantra
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
datamantra
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
datamantra
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
datamantra
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
datamantra
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
datamantra
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
datamantra
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
datamantra
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
datamantra
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
datamantra
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
datamantra
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
datamantra
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
datamantra
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
datamantra
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
datamantra
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
datamantra
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
datamantra
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
datamantra
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
datamantra
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
datamantra
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
datamantra
 

Recently uploaded (20)

183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 
183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag183409-christina-rossetti.pdfdsfsdasggsag
183409-christina-rossetti.pdfdsfsdasggsag
fardin123rahman07
 
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your CompetitorsAI Competitor Analysis: How to Monitor and Outperform Your Competitors
AI Competitor Analysis: How to Monitor and Outperform Your Competitors
Contify
 
computer organization and assembly language.docx
computer organization and assembly language.docxcomputer organization and assembly language.docx
computer organization and assembly language.docx
alisoftwareengineer1
 
Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..Secure_File_Storage_Hybrid_Cryptography.pptx..
Secure_File_Storage_Hybrid_Cryptography.pptx..
yuvarajreddy2002
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
Developing Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response ApplicationsDeveloping Security Orchestration, Automation, and Response Applications
Developing Security Orchestration, Automation, and Response Applications
VICTOR MAESTRE RAMIREZ
 
Deloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit contextDeloitte Analytics - Applying Process Mining in an audit context
Deloitte Analytics - Applying Process Mining in an audit context
Process mining Evangelist
 
Data Science Courses in India iim skills
Data Science Courses in India iim skillsData Science Courses in India iim skills
Data Science Courses in India iim skills
dharnathakur29
 
LLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bertLLM finetuning for multiple choice google bert
LLM finetuning for multiple choice google bert
ChadapornK
 
04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story04302025_CCC TUG_DataVista: The Design Story
04302025_CCC TUG_DataVista: The Design Story
ccctableauusergroup
 
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
Safety Innovation in Mt. Vernon A Westchester County Model for New Rochelle a...
James Francis Paradigm Asset Management
 
Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...Thingyan is now a global treasure! See how people around the world are search...
Thingyan is now a global treasure! See how people around the world are search...
Pixellion
 
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbbEDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
EDU533 DEMO.pptxccccvbnjjkoo jhgggggbbbb
JessaMaeEvangelista2
 
GenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.aiGenAI for Quant Analytics: survey-analytics.ai
GenAI for Quant Analytics: survey-analytics.ai
Inspirient
 
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
Adobe Analytics NOAM Central User Group April 2025 Agent AI: Uncovering the S...
gmuir1066
 
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptxmd-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
md-presentHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHation.pptx
fatimalazaar2004
 
VKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptxVKS-Python-FIe Handling text CSV Binary.pptx
VKS-Python-FIe Handling text CSV Binary.pptx
Vinod Srivastava
 
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.pptJust-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
Just-In-Timeasdfffffffghhhhhhhhhhj Systems.ppt
ssuser5f8f49
 
Flip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptxFlip flop presenation-Presented By Mubahir khan.pptx
Flip flop presenation-Presented By Mubahir khan.pptx
mubashirkhan45461
 
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
CTS EXCEPTIONSPrediction of Aluminium wire rod physical properties through AI...
ThanushsaranS
 

Improving Mobile Payments With Real time Spark

  • 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Mobile as drive for big data ● Our customer solution ● Existing data solution ● Improved solution ● Technical details ● Future enhancements ● Q & A
  • 4. Mobile as Big data drive ● Mobile has changed the way in which we interact with world ● Most of the buy/sell happens on mobile today ○ Myntra went fully mobile ○ Flipkart and amazon say their 50% buy happens on mobile ○ Quikr and OLX is mobile based selling platform ○ Ola etc
  • 5. Challenges in Mobile ● Customers expect the service to available 24/7 ● Tiny screens make very challenging to typical software flows ● Flaky connectivity of mobile networks makes it tougher ● Constant moving results in drop in interactions ● No more downtime ● Everything has to be done in realtime
  • 6. Mobile payments ● Almost every app earlier mentioned needs some kind of payment ● Getting payments right on mobile is very hard ● Globally 21% of online shoppers abandon their basket due to payment failures or delays ● Some companies are building sdk’s to help the app developers ● Our customer is one of them
  • 12. Our customer solution ● Mobile sdk for applications simplify the payments ● SDK provides better user interface like big buttons to generate OTP or other flows ● SDK also helps in filling up different kind of forms given by different banks using consistent UI ● Better user experience across applications ● Application sends anonymous payments details across apps to our customer servers
  • 13. Some numbers ● 40 + customers ● Over 1 million transactions per month as per March ● Around 55% success rate ( 5 % above average) ● Supports major banks, payment gateways and wallet providers ● Soon will be available in other than mobile payment space
  • 14. Why data matters? ● As number of transaction increases, things will go wrong ● There are so many different combinations to go wrong ● Example ○ Airtel OTP failing with state bank netbanking ○ Customers stuck in password page ○ Not able to read OTP from some specific ● Understanding customer pain and reacting to it is paramount ● Every help results in payment
  • 16. Initial BI solution ● Phone sdk pushes events like transaction initiation, payment complete to logging servers ● Logging servers roll log for every one hour and push to s3 ● A single node spark machine aggregates data by sessions and pushes it to mysql ● Google BigQuery is used for adhoc querying
  • 17. Challenges with BI solution ● Batch processing ● Geared towards more of report generation oriented flow ● Very minimal use of Spark API’s as team was not well aware of it’s potential ● No integration with mobile sdk for feed back loop
  • 18. Requirements for consulting ● Bring the same reporting calculations to real time ● Understanding the user behaviour and tracking his/her flow over a session ● Closing the loop by providing automatic alerts based on the metric calculations ● Some new specific business cases like loyalty management etc ● Improving team expertise on spark
  • 19. Choosing Spark streaming ● Company was already invested in Spark so spark streaming was no brainer ● Also porting spark batch code to streaming was mostly straight forward as both talk same API ● Company used python as Spark API language which was supported by streaming also ● So we didn't consider storm we went ahead with Spark streaming
  • 20. First version Events Five Minute Push JSON Data FileStream Session Wise Aggregations
  • 21. First version ● We used fileStream API of spark streaming which allowed us to poll a s3 bucket for every few mins ● A new rolling appender was added to log servers to push logs to s3 every 5 mins ● Exact same batch code was used for calculations which made transition very easy ● All downstream applications remained same
  • 23. Amazon Kinesis ● A kafka like distributed message queue by Amazon ● It’s used as managed kafka source on AWS web services ● Highly scalable and low latency support ● Persistence with fault tolerance across multiple availability zones ● Great integration with Spark
  • 24. Second version ● Amazon kinesis is added as real time stream source ● Logging server push logs to kinesis as they arrive ● Streaming application pulls the data from kinesis for every few mins ● Multiple partitions support added for parallel streams
  • 25. Challenges with Python ● Spark streaming API for python was introduced in 1.2 whereas spark-streaming for Scala/Java is available from 0.8 ● No aws kinesis connector was available as of March ● Team has to write it’s own ● No support for python in Spark job server
  • 26. Challenges from batch to streaming ● Session typically last from 1-10 mins. Batch is easy most of the time session is done for a one hour data but challenging for real time data ● Designing state for session ● Designing checkpointing and deciding on interval ● Weird checkpointing issues with s3 due to eventual consistency
  • 27. Improvements to batch code ● Most of the code was written in rdd paradigm as it was only know to team ● Team was trained on spark sql and spark streaming ● Majority code was ported to Spark sql based solution to improve readability and maintainability ● Recently moved into Dataframe based code
  • 29. Choosing Mesos ● Mesos is a great cluster manager for Spark only workloads ● Has specific coarse-grain mode which is dedicated for the real time systems ● Minimal overhead compared to YARN ● Easy to setup on EC2
  • 31. Grafana ● Added grafana for visualization and dashboards ● Graphana = Graphite + influxDB ● Moved away from mysql to time series database influx DB ● Scales much better compared to mysql ● Data scientists or product managers can monitor customers using these dashboards ● Integrates with mobile sdk