Scalable Data Pipelines: Architecting For The Petabyte Era

Ebook, 213 pages, 2 hours

Name: Scalable Data Pipelines: Architecting For The Petabyte Era
Author: Oreoluwa Adebayo
ISBN: 9789235783285

About this ebook

Scalable Data Pipelines: Architecting for the Petabyte Era is a timely and essential guide for professionals navigating the explosive growth of data in today's digital world. As organizations generate data at unprecedented rates, from everyday user interactions to complex industrial IoT readings, the need for robust scalabl

Language: English

Publisher: Plexity Digital

Release date: Jul 17, 2021



Dedication

This book is dedicated to the countless data engineers and architects who work tirelessly behind the scenes to build and maintain the invisible infrastructure that powers our data-driven world. Your dedication to reliability, scalability, and efficiency often goes unnoticed, but your contributions are fundamental to the progress of science, business, and society. It is also dedicated to my family, whose unwavering support and encouragement made this endeavor possible. Your patience and understanding during the long hours of writing were a constant source of motivation.

Finally, this book is for the aspiring data professionals who are embarking on their journey to master the art of building scalable data pipelines. May this book serve as a guiding light in your exploration of the petabyte era.

Table of Contents

Dedication
Foreword
Preface
Introduction
Chapter one: The Rise of Petabyte-Scale Data: Navigating the Uncharted Territories of the Digital Deluge
Chapter two: Fundamentals of Scalable Data Architectures
Chapter three: Designing for Fault Tolerance and Resilience
Chapter four: Data Ingestion at Scale
Chapter five: Storage Solutions for Massive Datasets
Chapter six: Efficient Data Processing Frameworks
Chapter seven: Orchestrating Complex Data Workflows
Chapter eight: Real-Time and Batch Processing Trade-offs
Chapter nine: Security and Governance in Large-Scale Pipelines
Chapter ten: Future-Proofing Your Data Infrastructure

Foreword

It is with great pleasure that I write the foreword for Tochukwu Njoku's timely and insightful book, Scalable Data Pipelines: Architecting for the Petabyte Era. In today's data-saturated world, the ability to effectively manage and process vast amounts of information is no longer a luxury but a fundamental necessity for any organization seeking to thrive.

Tochukwu, through his extensive experience and deep understanding of the data landscape, has crafted a comprehensive guide that tackles the critical challenges of building data pipelines at scale. This book goes beyond theoretical concepts and delves into the practical considerations and architectural patterns that are essential for navigating the complexities of the petabyte era.

The insights shared within these pages are not just relevant for today's challenges but also provide a solid foundation for building data infrastructure that can adapt to the evolving demands of tomorrow. Whether you are a seasoned data engineer, a budding data architect, or a technology leader grappling with the realities of big data, this book offers invaluable guidance and practical strategies.

Tochukwu's passion for the field and his commitment to sharing knowledge are evident throughout the book. He has successfully distilled complex concepts into accessible language, making this a valuable resource for a wide audience.

I highly recommend Scalable Data Pipelines: Architecting for the Petabyte Era to anyone who is serious about harnessing the power of data at scale. It is a must-read for those who are building the data infrastructure of the future.

Preface

The digital landscape is being reshaped at an unprecedented pace, driven by an exponential surge in data generation. From the mundane clicks of online interactions to the complex sensor readings of industrial IoT devices, data is no longer a trickle but a torrential downpour. This deluge presents both immense opportunities and significant challenges. Organizations that can effectively harness, process, and analyze this vast ocean of information stand to gain invaluable insights, drive innovation, and achieve a competitive edge. However, the traditional approaches to data management and processing often falter when confronted with the sheer volume, velocity, and variety of data in the petabyte era.

This book, Scalable Data Pipelines: Architecting for the Petabyte Era, is born out of the necessity to navigate this new data reality. It is a guide for data engineers, architects, scientists, and anyone involved in building and maintaining robust and scalable data infrastructure. We delve into the core principles, architectural patterns, and practical techniques required to design and implement data pipelines that not only handle today's massive datasets but are also future-proofed for the even greater data volumes to come.

Within these pages, you will find a comprehensive exploration of the key considerations for building scalable data pipelines, from data ingestion and storage to transformation, processing, and delivery. We will examine various technologies and frameworks, discuss best practices for performance optimization and fault tolerance, and explore the evolving landscape of cloud-based data solutions.

This book is not just about theoretical concepts; it is grounded in practical experience and real-world challenges. It aims to equip you with the knowledge and tools necessary to architect data pipelines that are not only scalable but also reliable, efficient, and adaptable to the ever-changing demands of the petabyte era. Join us on this journey to unlock the power of big data through the art and science of scalable data pipelines.

Introduction

The petabyte era is no longer a futuristic concept; it is the present reality for many organizations. The sheer scale of data being generated daily necessitates a fundamental shift in how we approach data management and processing. Traditional batch-oriented systems and monolithic architectures often struggle to cope with the velocity and volume of modern datasets. This is where the concept of scalable data pipelines becomes critical.

A data pipeline is a series of interconnected steps that transform raw data into usable information. In the petabyte era, these pipelines must be designed with scalability as a core principle. They need to be able to handle massive data volumes, process them efficiently, and adapt to fluctuating data loads without compromising performance or reliability.

This book provides a comprehensive guide to architecting such scalable data pipelines. We will explore the fundamental building blocks of a modern data pipeline, including data ingestion techniques for efficiently and reliably bringing data from various sources into the pipeline; scalable and cost-effective data storage solutions capable of handling petabyte-scale datasets; data transformation methods for cleaning, shaping, and preparing data for analysis at scale; data processing using distributed computing frameworks and techniques for processing massive datasets in parallel; data delivery strategies for making processed data accessible to downstream systems and users; and monitoring and management tools and techniques for ensuring the health, performance, and reliability of data pipelines.
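As a rough illustration of how the building blocks listed above fit together, the following minimal Python sketch chains an ingestion step, a transformation step, and a delivery step using generators, so that records stream through one at a time rather than being loaded all at once. The function names, the record fields (`device_id`, `temp_f`), and the sample data are hypothetical and chosen only for this example; a production pipeline would typically rely on a distributed framework rather than a single process.

```python
import json
from typing import Iterable, Iterator


def ingest(lines: Iterable[str]) -> Iterator[dict]:
    """Ingestion: parse raw JSON lines, skipping lines that fail to parse."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Assumption: malformed input is dropped here; a real pipeline
            # would more likely route it to a dead-letter store.
            continue


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Transformation: keep complete records and convert Fahrenheit to Celsius."""
    for rec in records:
        if "device_id" in rec and "temp_f" in rec:
            yield {
                "device_id": rec["device_id"],
                "temp_c": round((rec["temp_f"] - 32) * 5 / 9, 2),
            }


def deliver(records: Iterable[dict]) -> dict:
    """Delivery: aggregate a per-device maximum for downstream consumers."""
    out: dict = {}
    for rec in records:
        dev = rec["device_id"]
        out[dev] = max(out.get(dev, float("-inf")), rec["temp_c"])
    return out


# Hypothetical raw input: two valid devices, one malformed line,
# and one record missing its reading.
RAW = [
    '{"device_id": "a", "temp_f": 212}',
    'not json',
    '{"device_id": "a", "temp_f": 32}',
    '{"device_id": "b", "temp_f": 50}',
    '{"device_id": "c"}',
]

result = deliver(transform(ingest(RAW)))
print(result)
```

Because each stage is a generator, no stage holds the full dataset in memory. The same idea, applied across many machines, is roughly what distributed processing frameworks provide.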

We will also delve into key architectural patterns for building scalable systems, such as distributed computing, microservices, and cloud-native architectures. We will examine various technologies and frameworks commonly used in the big data ecosystem, including but not limited to Apache Spark, Apache Kafka, cloud-based data warehousing solutions, and serverless computing.

This book is intended for a broad audience, including data engineers looking to deepen their understanding of scalable architectures, data architects responsible for designing robust data infrastructure, data scientists seeking to optimize their data processing workflows, and technology leaders aiming to leverage the power of big data within their organizations. While some familiarity with data processing concepts will be beneficial, we will strive to explain complex topics in a clear and accessible manner.

Our goal is to empower you with the knowledge and practical insights needed to design, build, and maintain data pipelines that can not only handle the challenges of today's petabyte era but also pave the way for future data-driven innovation.

Chapter one

The Rise of Petabyte-Scale Data: Navigating the Uncharted Territories of the Digital Deluge

The opening decades of the 21st century have witnessed a transformation of unprecedented scale, a fundamental reshaping of the very fabric of our digital existence. At the heart of this metamorphosis lies an explosive growth in data generation, a phenomenon so profound that it has propelled us into an era defined by petabyte-scale datasets. Organizations across the globe, irrespective of their size or industry, now grapple with daily influxes of information that were once the realm of science fiction. This chapter embarks on a comprehensive exploration of the multifaceted drivers fueling this relentless expansion of the digital universe, meticulously dissecting the key technological and societal forces that have irrevocably altered the landscape of data management. We will delve into the intricate mechanisms by which the burgeoning ecosystem of Internet of Things (IoT) devices, the pervasive influence of social media platforms, the transformative power of artificial intelligence and machine learning (AI/ML), and the imperative for real-time analytics have collectively conspired to unleash this tidal wave of data. Furthermore, we will undertake a critical examination of the inherent limitations and growing inadequacies of traditional data architectures when confronted with datasets of this magnitude, highlighting the fundamental reasons why modern enterprises can no longer afford to cling to outdated paradigms. The chapter will culminate in a compelling argument for a radical rethinking of data pipelines, emphasizing the urgent need for organizations to embrace innovative approaches to data ingestion, processing, storage, and analysis to not only survive the challenges posed by petabyte-scale data but to harness its immense potential to achieve sustained competitive advantage in an increasingly data-centric world.

The Unstoppable Floodgates: Unpacking the Multifarious Drivers of Petabyte Proliferation

The exponential surge in data volumes is not a singular event but rather the confluence of several powerful and interconnected technological advancements and pervasive societal trends. To truly comprehend the magnitude of the petabyte-scale data phenomenon, it is crucial to meticulously dissect the individual contributions of these driving forces.

The relentless proliferation of Internet of Things (IoT) devices stands as a cornerstone of this data explosion. This vast and rapidly expanding network of interconnected physical objects embedded with sensors, software, and other technologies is capable of collecting and exchanging data. From the mundane yet ubiquitous smart home appliances that monitor energy consumption and user preferences to the intricate networks of industrial sensors that optimize manufacturing processes and the increasingly sophisticated wearable technology that tracks our health and activity levels, the sheer diversity and pervasiveness of IoT devices contribute significantly to the data deluge. Connected vehicles, equipped with an array of sensors and communication capabilities, generate massive amounts of data related to navigation, performance, and even driver behavior. The defining characteristic of IoT data is its continuous, granular nature. Each device, often operating autonomously, constantly streams data points, accumulating into colossal datasets over time. As the number of connected devices continues its projected exponential growth, reaching tens of billions in the coming years, the volume of data generated will only intensify, creating an ever-increasing burden on traditional data management systems that were never conceived to handle such relentless and granular streams of information.
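To make the accumulation effect concrete, here is a back-of-the-envelope calculation in Python. Every input is an illustrative assumption rather than a figure from this chapter: a hypothetical fleet of one million devices, each sending one 200-byte reading per second. The sketch estimates the daily data volume and how long such a fleet would take to accumulate one (decimal) petabyte.

```python
# All inputs below are illustrative assumptions, not measured figures.
DEVICES = 1_000_000        # hypothetical fleet size
READINGS_PER_SECOND = 1    # hypothetical sampling rate per device
BYTES_PER_READING = 200    # hypothetical payload size, including metadata

SECONDS_PER_DAY = 24 * 60 * 60
PETABYTE = 10**15          # decimal petabyte, in bytes


def bytes_per_day(devices: int, rate_hz: float, payload_bytes: int) -> float:
    """Total bytes produced by the whole fleet in one day."""
    return devices * rate_hz * payload_bytes * SECONDS_PER_DAY


daily = bytes_per_day(DEVICES, READINGS_PER_SECOND, BYTES_PER_READING)
days_to_petabyte = PETABYTE / daily

print(f"{daily / 10**12:.2f} TB per day")                 # 17.28 TB per day
print(f"{days_to_petabyte:.1f} days to reach 1 PB")       # 57.9 days to reach 1 PB
```

Under these assumptions, a modest per-device stream adds up to a petabyte in under two months. Changing any of the three inputs scales the result linearly.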

Simultaneously, the pervasive and deeply ingrained influence of social media platforms on modern life continues to be a monumental contributor to this exponential data growth. Billions of users across the globe actively engage with these platforms daily, generating an overwhelming variety and volume of predominantly unstructured data. Text posts, status updates, images, videos, live streams, and intricate networks of user interactions, including likes, shares, and comments, all contribute to this massive digital footprint. The velocity at which this social media data is generated is staggering, with millions of posts and interactions occurring every minute. Furthermore, the inherent variety of this data, ranging from simple text to rich multimedia content, presents unique and complex challenges for storage, processing, and meaningful analysis. Extracting valuable insights from this dynamic and often noisy data requires sophisticated techniques that go far beyond the capabilities of traditional relational databases and batch-oriented processing methods. The sheer scale and dynamism of social media data have fundamentally stretched the limits of existing data management capabilities, demanding entirely new approaches to handle its unique characteristics.

The transformative field of artificial intelligence and machine learning (AI/ML) plays a dual role in the rise of petabyte-scale data, acting as both a significant driver of its generation and a voracious consumer of its vast quantities. The development and training of increasingly sophisticated AI/ML
