Building the High Speed Cyber Security Data Pipeline Using Apache NiFi
Praveen Kanumarlapudi
Cyber Security
60% of Small Businesses Fold Within 6 Months of a Cyber Attack.
How do we make it a success?
Global Security Key Stakeholders

Security Operations Center
An information security operations center ("ISOC" or "SOC") is a facility where enterprise information systems (websites, applications, databases, data centers and servers, networks, desktops and other endpoints) are monitored, assessed, and defended.
Technology: SIEM

Data Scientists
Security data scientists have the skills to understand complex algorithms, build advanced models for threat and anomaly detection, and apply these concepts to real security data sets in single or clustered environments.
Technology: Python, R, Big Data, Spark/Scala or MATLAB…

Data Analysts
• Map and trace the data from system to system to solve a given business or incident problem.
• Design and create data reports using various reporting tools that help business executives make better decisions.
• Implement new metrics for the business (KPIs).
Technology: SQL, SIEM, Big Data, Reporting tools

Executives
CSOs, CISOs
Cyber Security ‘Big Data’ challenges
• Speed, Volume and Variety
   • Data Ingestion
   • Cleansing
   • Transformation
• Data reliance
   • Executives – KPI Metrics
   • Data scientists
   • SOC
   • Data Analysts
• Real-Time context
A couple of years ago!
[Architecture diagram. Stages: Data Source, Ingestion, Integration, Delivery]
Data sources (unstructured and (semi)structured): Network logs, Web logs, AD logs, Infrastructure logs, Application logs, Threat Intel, 3rd Party RG, RDBMS
Tools shown: Syslog servers, SIEM app, Sqoop, Flume, PySpark, SIEM tool, UBA tools
Delivery: SOC, Data Science, KPI/Reporting
Challenges
• Complexity of Architecture
• Debugging
• Data Source Dependencies
• Lack of Centralized logging
• Multiple Data Copies
• Stress on Network
• Transformations with respect to destination
Solution Framework
• Single data entry point – avoids redundant network traffic and duplicate data flowing around
• Transformations according to destination – reduces the reliance on the source
• Should be capable of handling different formats and different sources
[Flow diagram. Stages: Ingest, Clean/Route, Transform for 1, Transform for 2, Route to 1, Route to 2, Archive]
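
The framework above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's implementation and not NiFi code: the `Gateway` class, the `to_siem` and `to_data_lake` transforms, and the destination names are all hypothetical. It shows one entry point that cleans each event once, archives one copy, and then applies a separate transform for each destination.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class Gateway:
    """Hypothetical single entry point: clean once, transform per destination."""

    transforms: Dict[str, Callable[[Event], Event]]
    archive: List[Event] = field(default_factory=list)
    delivered: Dict[str, List[Event]] = field(default_factory=dict)

    def clean(self, raw: Event) -> Event:
        # Trim whitespace from keys and values and drop fields that are empty.
        return {k.strip(): v.strip() for k, v in raw.items() if v and v.strip()}

    def ingest(self, raw: Event) -> None:
        event = self.clean(raw)
        self.archive.append(event)  # one archived copy of the cleaned event
        for destination, transform in self.transforms.items():
            # Each destination gets its own shape, built from a copy of the event.
            shaped = transform(dict(event))
            self.delivered.setdefault(destination, []).append(shaped)


def to_siem(event: Event) -> Event:
    # Hypothetical SIEM shape: upper-case field names.
    return {k.upper(): v for k, v in event.items()}


def to_data_lake(event: Event) -> Event:
    # Hypothetical data-lake shape: original fields plus a pipeline tag.
    return {**event, "pipeline": "gateway"}


gateway = Gateway(transforms={"siem": to_siem, "data_lake": to_data_lake})
gateway.ingest({" src_ip ": " 10.0.0.5 ", "user": "alice", "note": "  "})
```

Because the transforms live at the gateway, adding a third destination in this sketch means adding one more entry to `transforms`, with no change at the data source.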
Deployment Models
Data Sources
Challenges
• Good architectural understanding of all systems
• Good amount of coding effort
• Long development hours
• Maintenance overheads
• Keeping the systems in sync
• Provenance
• Guaranteed delivery
• Processors that support multiple formats
• Flows that are easy to develop and deploy in minutes
• Open source with a rich community
The Data Gateway
[Architecture diagram. Stages: Data Source, Data Gateway, Delivery]
Data sources (unstructured and (semi)structured): Network logs, Web logs, AD logs, Infrastructure logs, Application logs, Threat Intel, 3rd Party RG, RDBMS
Delivery: SOC, Data Science, KPI/Reporting
Sample Flow
To Azure
To Splunk
Grafana dashboard – Last 7 days
Yesterday
In the middle of the day
Metrics
• 100+ production flows
• ~20 billion events
• 1000+ transformations
Next?
• MiNiFi
• Stateless NiFi
• Registry
• SAM
• Real-Time Model training
• CI/CD, NiFi APIs

Editor's Notes

  • #4: 60 percent of small businesses went out of business within six months of a cyber attack.