Big Data Simplified: Book Description

4/17/2021 Big Data Simplified

Big Data Simplified


2 REVIEWS
by Sayan Goswami, Amit Kumar Das, Sourabh Mukherjee
Publisher: Pearson Education India
Release Date: June 2019
Topic: Data Lake

Book Description
"Big Data Simplified blends technology with strategy and delves into applications of big data in specialized areas, such as recommendation engines, data
science and Internet of Things (IoT) and enables a practitioner to make the right technology choice. The steps to strategize a big data implementation are
also discussed in detail. This book presents a holistic approach to the topic, covering a wide landscape of big data technologies like Hadoop 2.0 and package
implementations, such as Cloudera. In-depth discussion of associated technologies, such as MapReduce, Hive, Pig, Oozie, Apache ZooKeeper, Flume, Kafka,
Spark, Python and NoSQL databases like Cassandra, MongoDB, GraphDB, etc., is also included.
https://learning.oreilly.com/library/view/big-data-simplified/9789353941505/ 1/14

About the Publisher


Learning isn’t a destination, starting and stopping at the classroom door. It’s a never-ending road of discovery, challenge,
inspiration, and wonder.

For many people, learning is the route to a job to support their family, ...

Table of Contents
Cover

About Pearson

Title

Copyright

Dedication

Brief Contents

Contents (1/2)

Contents (2/2)

Preface

Acknowledgements

About the Authors

Model Syllabus for Big Data

Lesson Plan

Chapter 1 A Closer Look at Data


1.1 Introduction

1.2 Types of Data

1.2.1 Structured Data

1.2.2 Unstructured Data

1.2.3 Semi-Structured Data

1.3 The Emergence of ‘New Data’

1.4 ‘New’ Data and ‘Traditional’ Data Compared

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 2 Introducing Big Data


2.1 Introduction

2.2 The Transition to Big Data

2.3 The Definition of Big Data

2.4 The V’s

2.5 Sources of Big Data



2.6 Common Applications of Big Data

2.7 An Introduction to Big Data Technologies

2.7.1 Hadoop

2.7.2 MapReduce

2.7.3 Hadoop Affiliate Technologies

2.7.4 Massively Parallel Processing

2.7.5 NoSQL

2.7.6 Hadoop Hybrids

2.8 An Overview of Popular Vendors

2.8.1 Hadoop Distributions

2.8.2 Hadoop in the Cloud

2.8.3 HDFS-Alternative Products

2.8.4 NoSQL

2.8.5 MPP Products

2.8.6 Hybrids

2.8.7 Data Integration, Visualization, Analytics

2.8.8 Business Intelligence (BI)

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 3 Introducing Hadoop


3.1 Introduction

3.2 An Overview of Hadoop

3.3 Configuring a Hadoop Cluster (1/2)


3.3 Configuring a Hadoop Cluster (2/2)

3.4 Storing Data with HDFS

3.4.1 The NameNode and DataNodes

3.4.2 Storing and Reading Files from HDFS

3.4.3 Fault Tolerance with Replication

3.4.4 NameNode Failure Management

3.5 HDFS Technical Commands

3.6 Hadoop Distributions

3.7 Hadoop in the Cloud

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 4 Introducing MapReduce


4.1 Introduction

4.2 Processing Data with MapReduce

4.2.1 A MapReduce Example

4.2.2 Technical Flow of a MapReduce Job

4.2.3 End-to-End Technical Anatomy of a MapReduce Job

4.3 Parallelism in Map and Reduce Phases


4.3.1 Using a Single Reducer

4.3.2 Using Multiple Reducers


4.4 Optimize the Map Phase Using a Combiner

4.4.1 Reducers as Combiners

4.5 What is YARN?

4.5.1 Scheduling and Managing Tasks

4.5.2 Job Execution in the Hadoop Cluster

4.5.3 Troubleshoot a MapReduce Job in Hadoop Cluster

4.6 Example Use Case on MapReduce: Development and Execution Step-by-step (1/2)

4.6 Example Use Case on MapReduce: Development and Execution Step-by-step (2/2)

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 5 Introducing NoSQL


5.1 Introduction

5.2 NoSQL Databases in the Light of CAP Theorem

5.3 NoSQL Product Categories

5.3.1 Key-value Stores

5.3.2 Wide Column Stores or Columnar Stores

5.3.3 Document Stores

5.3.4 Graph Databases


5.4 NoSQL Database: Cassandra

5.4.1 Characteristics of Cassandra


5.4.2 Cassandra Architecture

5.4.3 Components of Cassandra

5.4.4 Cassandra Write Operations at a Node Level

5.4.5 Cassandra Node Level Read Operation

5.4.6 KEYSPACE in Cassandra

5.4.7 Starting Cassandra Server and Cqlsh Query Editor

5.4.8 DataStax Distribution Package

5.5 NoSQL Databases in the Cloud

5.6 NoSQL – Do’s and Don’ts

5.7 Business Intelligence and NoSQL

5.8 Big Data and NoSQL

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 6 Introducing Spark and Kafka


6.1 Introducing Spark

6.1.1 Hadoop and Spark

6.1.2 Spark Programming Languages

6.1.3 Understanding Spark Architecture


6.1.4 Spark Libraries: Spark SQL

6.1.5 Spark Libraries: Streaming


6.1.6 Spark Libraries: Machine Learning

6.1.7 Spark Libraries: GraphX

6.1.8 PySpark: Spark with Python

6.2 Working with Kafka

6.2.1 What is Apache Kafka

6.2.2 Kafka Architecture

6.2.3 Need of Apache Kafka in Big Data

6.2.4 Kafka Use Cases

6.2.5 Why is Kafka so Fast?

6.2.6 Kafka Needs ZooKeeper

6.2.7 Different Components in Kafka

6.2.8 Difference between Apache Kafka and Apache Flume

6.2.9 Kafka Demonstration—How Messages are Passing from Publisher to Consumer through a Topic

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 7 Other Big Data Tools and Technologies


7.1 Introduction

7.2 Hive

7.2.1 Hive Architecture

7.2.2 Data Flow in Hive


7.2.3 Data Types in Hive

7.2.4 Different Types of Tables in Hive (1/2)

7.2.4 Different Types of Tables in Hive (2/2)

7.2.5 Partitioning and Bucketing in Hive

7.3 Pig

7.3.1 Why Apache Pig

7.3.2 Features of Apache Pig

7.3.3 Apache Pig vs. MapReduce

7.3.4 Pig Architecture

7.4 Sqoop and Flume

7.4.1 Sqoop EXPORT (Data Transfer from HDFS to MySQL)

7.4.2 Sqoop IMPORT (Importing Fresh Table from MySQL to HIVE)

7.4.3 Flume

7.4.4 Components of Flume

7.4.5 Configure Flume to Ingest Web Log Data from a Local Directory to HDFS

7.5 Oozie

7.5.1 Oozie Workflow

7.6 Lucene and Solr

7.6.1 Lucene in Search Applications

7.6.2 Features of Apache Solr



7.6.3 Apache Solr—Basic Commands

7.7 ZooKeeper

7.8 Apache NiFi

7.8.1 What Apache NiFi Does

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 8 Working with Big Data in R


8.1 Prerequisites

8.1.1 Install R in Your System

8.1.2 Know How to Manage R Scripts

8.1.3 Introduction to Basic R Commands (1/3)

8.1.3 Introduction to Basic R Commands (2/3)

8.1.3 Introduction to Basic R Commands (3/3)

8.2 Exploratory Data Analysis

8.2.1 Basic Statistical Techniques for Data Exploration

8.2.2 Basic Plots for Data Exploration

8.3 R Libraries for Dealing with Large Data Sets

8.3.1 ff and ffbase Packages

8.3.2 Parallel Package

8.3.3 data.table Package

8.4 Integrating Hadoop with R


8.5 Simple R Program with Hadoop

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 9 Working with Big Data in Python


9.1 Prerequisites

9.1.1 Install Python in Your System

9.1.2 Know How to Manage Python Scripts

9.1.3 Introduction to Basic Python Commands

9.2 Basic Libraries in Python

9.2.1 NumPy Library (1/2)

9.2.1 NumPy Library (2/2)

9.2.2 Pandas Library

9.2.3 Matplotlib Library

9.3 Python Libraries for Dealing with Large Data Sets

9.3.1 numpy.memmap Object

9.3.2 Parallel Computing Using mpi4py Library

9.4 Python-MapReduce Using Hadoop Streaming

9.4.1 What is Hadoop Streaming?

9.4.2 Python MapReduce Code

9.4.3 Step by Step Execution


9.4.4 Running the MapReduce Python Code on Hadoop

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 10 Big Data Applied


10.1 Introduction

10.2 Big Data and Data Science

10.2.1 What is Data Science?

10.2.2 Who is a Data Scientist?

10.2.3 How Do We Define ‘Data Science’?

10.2.4 Common Pitfalls of Data Science

10.3 Big Data and IoT

10.3.1 What is IoT?

10.3.2 Overview of IoT Architecture

10.3.3 IoT in Action

10.3.4 Impacts of IoT

10.3.5 Applications of Big Data and IoT

10.4 Big Data and Recommendation Engines

10.4.1 What is a Recommendation?

10.4.2 What are Recommendation Engines?

10.4.3 What are the Types of Recommendation Engines?


10.4.4 How is Big Data Used in a Recommendation Engine?

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 11 Big Data Strategy


11.1 Introduction

11.2 Two Typical Big Data Use Cases

11.2.1 Big Data Primarily for Cost Reduction

11.2.2 Big Data Primarily for Enhanced Value

11.3 Data Warehouses vs. Data Lakes—What is Your Strategy?

11.3.1 Differences between Data Warehouse and Data Lake

11.4 Key Questions to Ask

11.5 Getting Ready for a Big Data Program

11.6 Making Technology Choices

11.7 Making Tooling Choices

Summary

Short-answer Type Questions (5 Marks Questions)

Long-answer Type Questions (10 Marks Questions)

Chapter 12 Case Study: Retail Near Real-time Analytics


12.1 Introduction to Retail Domain

12.1.1 What is Retail in the First Place?

12.1.2 So, Why is Retailing So Important?

12.2 Near Real-time Analytics: Problem Statement

12.3 NRT Analytics: Solution Approach

12.4 NRT Analytics: Details of Solution Implemented (1/3)

12.4 NRT Analytics: Details of Solution Implemented (2/3)

12.4 NRT Analytics: Details of Solution Implemented (3/3)

12.4.1 Data from Producer

12.4.2 Output After Running Analysis Using Spark

12.4.3 Data Saved in Cassandra

12.4.4 Kafka Producer Streamed in Batch Mode After Every 2 Minutes

12.4.5 Data Streamed After 2 Minutes Containing the New Data

12.4.6 New Data Got Entered in Cassandra

Summary

Multiple-choice Questions (1 Mark Questions)

Short-answer Type Questions (5 Marks Questions)

Appendix (1/2)

Appendix (2/2)

Index
