Data Science

Apache Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It uses a distributed file system called HDFS to store data and a processing framework called MapReduce to analyze data in parallel across nodes in a cluster. Hadoop provides reliable, scalable storage and processing of large datasets in a distributed computing environment.

Uploaded by

Umar Ahmad


CS439 DATA SCIENCE

Muhammad Salman Saeed

Big Data
What is Big Data?
• Big Data is a term for data sets that are so large or
complex that traditional data processing application
software is inadequate to deal with them.

• Big Data challenges include capturing data, data
storage, processing, data analysis, search, sharing,
transfer, visualization, querying, updating and
information privacy/security.
Can you identify these trend lines in the field of data?
[Figure: five trend lines, labelled Trend 1 to Trend 5]

VOLUME + VARIETY + VELOCITY + VERACITY + COST = BIG DATA
Big Data
Process A lot of Data

High Speed

With Less Cost

What is Big Data?

• We often use the concept of the 4 Vs to describe Big
Data:
1. Volume - Amount of Data (Petabytes, Zettabytes)
2. Variety - Forms of Data (Structured, Unstructured)
3. Velocity - Speed of Data (GBs/sec)
4. Veracity - Uncertainty of Data (Accuracy)
What is Big Data?
1. Volume
• Transaction-based data stored through years.
Banks, Telcos, Health, etc.
Data Warehousing

• Unstructured data streaming from social media.

Images, Videos, Audio, Tweets

• Sensor and machine-to-machine data.

IoT, M2M
2. Variety
• Structured data in traditional databases

• Semi-structured data like XML or JSON.

• Unstructured data like emails, images, click-stream

3. Velocity
• Megabytes per second, Gigabytes per second.

• Data needs to be dealt with in a timely manner.

• Inconsistent data flows with periodic peaks.

Speed of Creating Data
Speed of Storing Data
Speed of Processing Data
Speed of Analyzing Data
3. Velocity (cont.)
Batch Processing

Collect Data -> Clean Data -> Feed in Batches -> Wait for Process -> Act

Real-Time Processing

Capture Streaming Data -> Feed Real-time to Machine -> Process Real-time -> Act

Big Data enables real-time decision pipelines!
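
The difference between the two processing styles can be illustrated with a minimal Python sketch. The sensor readings and the function names (`clean`, `batch_average`, `streaming_averages`) are hypothetical and chosen only for illustration: the batch version waits for all the data before producing one answer, while the streaming version updates its answer as each reading arrives.

```python
def clean(readings):
    """Drop missing readings (the 'Clean Data' step)."""
    return [r for r in readings if r is not None]


def batch_average(readings):
    """Batch style: collect everything, clean it, then process once."""
    data = clean(readings)
    return sum(data) / len(data)


def streaming_averages(readings):
    """Streaming style: update the running average for every valid reading."""
    total, count = 0.0, 0
    for r in readings:
        if r is None:
            continue
        total += r
        count += 1
        yield total / count


# Hypothetical sensor readings; None marks a missing value.
sensor = [10.0, None, 20.0, 30.0]

print(batch_average(sensor))             # one result after the whole batch
print(list(streaming_averages(sensor)))  # one updated result per reading
```

Both styles end with the same final value; the streaming version simply makes intermediate results available earlier, which is what allows acting in real time.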

3. Velocity (cont.)
Speed of Data Generation Speed of Data Processing

Slow Slow

Fast Fast

Which path to choose in what scenario?

4. Veracity
• Untrusted and Unreliable.

• Data Inconsistency and Incompleteness.

• Biased, Unclean and Ambiguous Data
Big Data Problem
&
Big Data Platforms
We Buy Machines
Storage Processing
A Big Data Platform
Hadoop is one of the platforms that
solve the Big Data problem
Distributed Storage, Parallel Processing
Why Big Data Platforms?
Scalable
Cost Effective
Flexible
Fast
Resilient
1. Scalable
• It can store and distribute very large data sets across
hundreds of inexpensive servers that operate in
parallel.
• It enables businesses to run applications on
thousands of nodes involving thousands of terabytes
of data.

• It manages horizontal scalability seamlessly.

2. Cost Effective
• A scale-out architecture (as seen in previous slide)
that can affordably store all of a company's data for
later use.
• In the past, many companies would have had to
down-sample data, in an effort to reduce costs.

• The raw data would be deleted in relational DBs, as it
would be too cost-prohibitive to keep.

The cost savings are staggering!

3. Flexible
• Enables businesses to easily access new data
sources and tap into different types of data
(structured, unstructured, semistructured).
• A single system deriving valuable business insights
from data sources as variable as social media, email
conversations or clickstream data.

• A single system used for a wide variety of purposes,
such as log processing, recommendation systems,
data warehousing, market campaign analysis and
fraud detection.
4. Fast
• Storage method is based on a
distributed file system that basically
'maps' data wherever it is located on
a cluster.
• The tools for data processing are
often on the same servers where the
data is located, resulting in much
faster data processing.

• If you're dealing with large volumes of unstructured
data, it is able to efficiently process terabytes of data in
just minutes, and petabytes in hours.
5. Resilient
• Data is replicated to many nodes in the cluster,
which means that in the event of failure, there
is another copy available for use.
Sources of Big Data
Machines

People

Organizations
1. Machines
• Machine generated data is the
biggest source of Big Data.

A Boeing 787 produces half a terabyte of data per flight!

• Internet of Things, Smart Devices - phones & sensors.

‘A lot of smart devices’ x ‘A lot of data capture’ = Big Data

2. People
• Mostly unstructured and text-heavy.

• 80-90% of the total data in the world is unstructured.

• 75% of the total data on the internet is images/videos.
It’s called the Dark Matter of the web.
3. Organizations
• Most data is Structured: Commercial Transactions,
Govt. Open Data, Banking Stock Records, Medical
Health Records, E-Commerce, etc.
• At least as important as unstructured data.
• It often gets ‘compartmentalized’ into isolated
information islands called Data Silos.
• Benefits can be generated only by linking with other
structured and non-structured data.
• Walmart collects 2.5 petabytes
of data per hour!
Applications of Big Data
• Personalized Marketing.

• Recommendation Engines.

• Sentiment Analysis.

• Mobile Advertising.

• Biomedical Applications.

• Smart Cities.
Traditional Data Warehouse
Modern Data Warehouse
Modern Data Pipelines
Differentiate between DBMS
and DSMS
How to Get Value Out of
Big Data?

Data Science
1) Exploratory Data Analysis (EDA)
Question -> Acquire -> Ingest/ETL -> Wrangling -> Visualize

2) Modelling
Choose -> Build/Train -> Validate -> Deploy -> Test
5 P’s of Data Science

People

Process Programmability

Purpose

Platform
What is Apache Hadoop?
• Apache Hadoop software library is a framework that
allows for the distributed processing of large data
sets across clusters of computers using simple
programming models.

• It is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.

• Rather than rely on hardware to deliver
high-availability, the library itself is designed to detect
and handle failures at the application layer.
Basic Hadoop Stack
Data Management Frameworks

HDFS: Hadoop Distributed File System. A Java-based,
distributed file system that provides scalable, reliable,
high-throughput access to application data stored across
commodity servers.

YARN: Yet Another Resource Negotiator. A framework for
cluster resource management and job scheduling.
Basic Hadoop Stack
Operations Frameworks

Ambari: A web-based framework for provisioning,
managing and monitoring Hadoop Clusters.

ZooKeeper: A high-performance coordination service for
distributed applications.

Cloudbreak: A tool for provisioning and managing
Hadoop Clusters in the cloud.

Oozie: A server-based workflow engine used to
execute Hadoop Jobs.
Basic Hadoop Stack
Data Access Frameworks

Pig: A high-level platform for extracting,
transforming and analyzing large datasets.

Hive: A data warehouse infrastructure that
supports ad hoc SQL queries.

HCatalog: A table information, schema and metadata
management layer supporting Hive, Pig,
MapReduce, and Tez processing.

Cascading: Application development framework for
building data applications, abstracting details
of complex MapReduce programming.

HBase: A scalable distributed NoSQL database that
supports structured data storage for large
tables.
Basic Hadoop Stack
Data Access Frameworks

Phoenix: A client-side SQL layer over HBase that
provides low latency access to HBase data.

Accumulo: A low latency, large table data storage and
retrieval system with cell-level security.

Storm: A distributed computation system for
processing continuous streams of real-time
data.

Solr: A distributed search platform capable of
indexing petabytes of data.

Spark: A fast, general purpose processing engine
used to build and run sophisticated SQL,
streaming, machine learning or graph-processing
applications.
Basic Hadoop Stack
Governance and Integration Frameworks

Falcon: A data governance tool providing workflow
orchestration, data lifecycle management, and
data replication services.

WebHDFS: A REST API that uses standard HTTP verbs
to access, operate and manage HDFS.

HDFS NFS Gateway: A gateway that enables access to
HDFS as an NFS mounted file system.

Flume: A distributed, reliable and highly available
service that efficiently collects, aggregates and
moves streaming data.
Basic Hadoop Stack
Governance and Integration Frameworks

Sqoop: A set of tools for importing and exporting
data between Hadoop and RDBMS systems.

Kafka: A fast, scalable, durable, and fault-tolerant
publish-subscribe messaging system.

Atlas: A scalable and extensible set of core
governance services enabling enterprises to
meet compliance and data integration
requirements.
Basic Hadoop Stack
Security Frameworks

HDFS: A storage management service providing
file and directory permissions, even more
granular file and directory access control lists,
and transparent data encryption.

YARN: A resource management service with access
control lists controlling access to compute
resources and YARN administrative functions.

Hive: A data warehouse infrastructure service
providing granular access controls to table
columns and rows.
Basic Hadoop Stack
Security Frameworks

Falcon: A data governance tool providing access
control lists that limit who may submit Hadoop
Jobs.

Knox: A gateway providing perimeter security to a
Hadoop Cluster.

Ranger: A centralized security framework offering
fine-grained policy controls for HDFS, Hive,
HBase, Knox, Storm, Kafka and Solr.
Hadoop as +1 Architecture
• Though it has the potential to replace all others, it can
also be used to complement existing systems if they
can’t be removed due to any constraints.
Cloudera
Hortonworks Data Platform (HDP)
MapR
Cloudera vs Hortonworks vs MapR (1)
Cloudera vs Hortonworks vs MapR (2)
Distributed File System
• We said Big Data has a lot of Volume. Is it then
possible to store Big Data in a single system?
• It’s not. We need to distribute Big Data into
multiple systems, for which we need a Distributed
File System (DFS).

Node
Rack
Distributed File System
• To achieve parallelization, we distribute data across
nodes and also move computation to each node.

[Figure: a data set split into blocks 1 to 5, with the blocks placed on different nodes]
Distributed File System
Reading 1 TB Data

1 Machine: 4 I/O Channels, 100 MB/s per Channel -> about 43 Minutes
10 Machines: 4 I/O Channels each, 100 MB/s per Channel -> about 4.3 Minutes
10 Times Faster!
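
The figures on this slide can be checked with a short calculation. The sketch below assumes each channel delivers 100 megabytes per second (the reading of the slide's throughput that reproduces roughly 43 minutes) and that 1 TB is counted as 1024 x 1024 MB; the function name `read_time_minutes` is made up for this example.

```python
def read_time_minutes(data_mb, machines, channels=4, mb_per_sec=100.0):
    """Time to read `data_mb` megabytes when every channel works in parallel."""
    throughput = machines * channels * mb_per_sec  # total MB per second
    return data_mb / throughput / 60


ONE_TB_IN_MB = 1024 * 1024  # assumption: binary terabyte

single = read_time_minutes(ONE_TB_IN_MB, machines=1)
cluster = read_time_minutes(ONE_TB_IN_MB, machines=10)

print(f"1 machine:   {single:.1f} minutes")
print(f"10 machines: {cluster:.1f} minutes")
print(f"speed-up:    {single / cluster:.0f}x")
```

The speed-up is linear here only because the sketch assumes the data is already spread evenly across the machines and ignores coordination overhead.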
Distributed Computing
• It can be defined as the use of a distributed system to
solve a single large problem by breaking it down
into several tasks where each task is computed in the
individual computers of the distributed system.

• All the computers connected in a network
communicate with each other to attain a common
goal by making use of their own local memory.

• Hadoop makes use of Distributed Computing.
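
As a rough illustration of breaking one large problem into tasks, the following sketch counts words by splitting the input into chunks, counting each chunk independently, and merging the partial results. It runs on a single machine with a thread pool standing in for the computers of a cluster, so it only imitates the idea; it is not how Hadoop itself is implemented, and the names `count_words` and `distributed_word_count` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Task run by one worker: count the words in its own chunk of lines."""
    return Counter(word for line in chunk for word in line.split())


def distributed_word_count(lines, workers=3):
    """Split the lines into chunks, count each in parallel, merge the results."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["big data big cluster", "data node data", "cluster node"]
totals = distributed_word_count(lines)
print(totals["data"], totals["big"], totals["node"])
```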

Computer Cluster
• A group of network connected
computers working as a single unit
to perform a task.
Hadoop Server Roles

Client Machines, Master Nodes, Slave Nodes
Hadoop Server Roles
Clients

Distributed Data Processing (MapReduce): Job Trackers
Distributed Data Storage (HDFS): Name Nodes, Secondary Name Nodes

Slave nodes: each runs a DataNode and a TaskTracker
Typical Hadoop Cluster
[Figure: a Hadoop Cluster in which top-level switches connect Rack 1, Rack 2, Rack 3 ... Rack n; each rack has its own switch and holds Node 1, Node 2 ... Node n]

HDFS Overview
• A fault-tolerant distributed file system for very large
files.
• Write Once, Read Many Times (WORM)
• Divide files into big blocks and distribute across the
cluster.

• Store multiple replicas of each block for reliability.

• Programs can ask “Where do the pieces of my file
live?”
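
The points above (split into blocks, replicate each block, ask where the pieces live) can be sketched as a toy model. The block size of 128 MB and the replication factor of 3 are assumptions based on commonly used HDFS defaults, and the round-robin placement, node names and function names are invented for illustration; real HDFS placement is more involved.

```python
BLOCK_SIZE_MB = 128  # assumption: a commonly used HDFS block size
REPLICATION = 3      # assumption: a commonly used replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement giving each block `replication` distinct nodes."""
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]  # hypothetical DataNodes
blocks = split_into_blocks(500)              # a hypothetical 500 MB file
locations = place_replicas(len(blocks), nodes)

print(blocks)        # block sizes in MB
print(locations[0])  # "where does block 0 live?"
```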
HDFS Overview
[Figure: a logical file is divided into blocks 1 to 4, and the blocks are spread across Rack 1, Rack 2 and Rack 3 of the Hadoop Cluster]
HDFS Overview
• It looks & acts just like a file system.
hdfs dfs -command [args]

• A few of almost 30 HDFS Commands:

- -cat: display file content (uncompressed).
- -text: just like cat but also works on compressed files.
- -chgrp, -chmod, -chown: change file group, permissions and ownership.
- -put, -get, -copyFromLocal, -copyToLocal: copy files from
the local file system to HDFS and vice versa.
- -ls, -ls -R: list files/directories.
- -mv, -moveFromLocal, -moveToLocal: move files.
- -stat: statistical info for any given file.
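
To show how the commands listed above are put together, here is a small Python sketch that builds `hdfs dfs` command lines following the `hdfs dfs -command [args]` pattern. The file names and HDFS paths are hypothetical, and the helper `hdfs_dfs` is made up for this example; the `run` function is only defined, not called, since it needs a machine with a Hadoop client installed.

```python
import subprocess


def hdfs_dfs(command, *args):
    """Build the argument list for `hdfs dfs -<command> [args]`."""
    return ["hdfs", "dfs", f"-{command}", *args]


def run(argv):
    """Execute a command; requires the `hdfs` client to be on the PATH."""
    return subprocess.run(argv, check=True)


# Hypothetical local file and HDFS paths, for illustration only.
commands = [
    hdfs_dfs("put", "sales.csv", "/data/sales.csv"),  # copy local file to HDFS
    hdfs_dfs("ls", "-R", "/data"),                    # list recursively
    hdfs_dfs("cat", "/data/sales.csv"),               # display file content
]

for argv in commands:
    print(" ".join(argv))
```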
HDFS Components

NameNode, DataNode
NameNode
• It acts as HDFS Master Component.
• It determines and maintains how chunks of data are
distributed and replicated across the DataNodes.

• It maintains critical HDFS information/system state
information.
• To enhance HDFS performance, it maintains and
serves this information from memory.
• Therefore it is critical to ensure NameNode always
has sufficient memory.
• If NameNode fails, HDFS fails.
NameNode
• Overview of information stored by NameNode.

Namespace: hierarchy, directory names, file names

Metadata: permissions & ownership, ACLs, block size &
replication levels, access & modification times, user quotas

Block Map: file names > block IDs
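
The three kinds of information listed above can be pictured as simple in-memory structures. The sketch below is only a mental model with invented class and field names (`FileMetadata`, `NameNodeState`); it does not mirror the actual NameNode classes.

```python
from dataclasses import dataclass, field


@dataclass
class FileMetadata:
    """Per-file metadata of the kind the slide lists."""
    owner: str
    permissions: str
    block_size_mb: int
    replication: int


@dataclass
class NameNodeState:
    """Toy in-memory state: a namespace plus a file-to-blocks map."""
    namespace: dict = field(default_factory=dict)  # path -> FileMetadata
    block_map: dict = field(default_factory=dict)  # path -> list of block IDs

    def add_file(self, path, meta, block_ids):
        self.namespace[path] = meta
        self.block_map[path] = list(block_ids)

    def blocks_of(self, path):
        return self.block_map[path]


state = NameNodeState()
# Hypothetical file, owner and block IDs.
state.add_file(
    "/data/sales.csv",
    FileMetadata(owner="analyst", permissions="rw-r--r--",
                 block_size_mb=128, replication=3),
    ["blk_1", "blk_2"],
)
print(state.blocks_of("/data/sales.csv"))
```

Keeping everything in plain dictionaries mirrors the point that this information is served from memory, which is why the amount of memory available matters.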
Secondary NameNode
[Figure: NameNode and Secondary NameNode]

• It does housekeeping and backup of NameNode
namespace and metadata.
• It is not a hot-standby for NameNode.

• It connects to NameNode every hour.

• Saved metadata can rebuild a failed NameNode.

NameNode High Availability
[Figure: NameNode and Secondary NameNode]

• The HDFS NameNode is a Single Point of Failure.

• NameNode HA:
- Uses a redundant NameNode.
- Enables fast fail-over in response to
NameNode failure.
- Is configured by Ambari.
- Is configured in an active/standby
configuration.
- Permits administrator-initiated failover for
maintenance.
DataNode
• It acts as HDFS Slave Component.
• It is the only place where chunks of data are actually
stored.
• Other than storing data, it is also responsible for
replicating data.

• It keeps on sending its heartbeat to NameNode to tell
about its availability.

• Every 10th heartbeat is a Block Report.

• DataNodes are heterogeneous: they support different
types of storage: disks, SSDs, memory.
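
The heartbeat behaviour described above can be sketched as a message generator. The rule that every 10th heartbeat carries a block report is taken from the slide; the message layout, node name and block IDs are hypothetical.

```python
BLOCK_REPORT_EVERY = 10  # from the slide: every 10th heartbeat is a Block Report


def heartbeat_messages(node_id, blocks, count):
    """Yield the messages a DataNode would send over `count` heartbeats."""
    for n in range(1, count + 1):
        if n % BLOCK_REPORT_EVERY == 0:
            yield {"node": node_id, "type": "block_report",
                   "blocks": sorted(blocks)}
        else:
            yield {"node": node_id, "type": "heartbeat"}


# Hypothetical DataNode holding two blocks.
messages = list(heartbeat_messages("dn1", {"blk_2", "blk_1"}, count=20))
reports = [m for m in messages if m["type"] == "block_report"]

print(len(messages), "messages,", len(reports), "block reports")
```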
HDFS Architecture
[Figure: the HDFS Client sends fs/namespace/meta ops to the NameNode; the Secondary NameNode keeps a namespace backup; the NameNode exchanges heartbeats, balancing, block ops, etc. with the DataNodes; the DataNodes replicate blocks among themselves and serve data to the client]
HDFS Rack Awareness
• Never lose all data if an entire rack fails. How?
• Store replicas on multiple racks.
• Keep bulky flows in-rack.
• There is higher bandwidth and lower latency in-rack.
[Figure: Hadoop Cluster with Rack 1, Rack 2 and Rack 3, each with its own switch and Node 1, Node 2 ... Node n]
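
One way to satisfy both goals above (survive a rack failure, keep most traffic in-rack) is the placement commonly described for a replication factor of 3: the first replica on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The sketch below is a simplified model under that assumption; the topology, node names and `place_block` function are hypothetical.

```python
import random


def place_block(writer, topology, rng=random):
    """Pick 3 replica nodes: the writer, then two nodes on one remote rack.

    `topology` maps rack name -> list of node names; every remote rack is
    assumed to contain at least two nodes.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    remote_racks = [rack for rack in topology if rack != rack_of[writer]]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [writer, second, third]


# Hypothetical three-rack cluster.
topology = {
    "rack1": ["r1n1", "r1n2"],
    "rack2": ["r2n1", "r2n2"],
    "rack3": ["r3n1", "r3n2"],
}
replicas = place_block("r1n1", topology, random.Random(0))
print(replicas)
```

With this layout the loss of any single rack still leaves at least one replica, while the transfer between the second and third replica stays inside one rack.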
HDFS Write Pipeline
Note: All DataNodes are in constant communication with
NameNode so no arrows are drawn for that.

[Figure: an HDFS Client and the NameNode connected through a switch to a Hadoop Cluster with Rack 1, Rack 2 and Rack 3, each with its own switch and Node 1, Node 2 ... Node n]
HDFS Write Pipeline
• HDFS manages writing of file block by block.
• Many files are being written in parallel to save time.

• All communication happens through TCP
connections so high bandwidth is required.

• An HDFS Write Operation has two major cycles:
Acknowledgements and Data Transfer.
• Starting node for each block isn’t necessarily the same.

• NameNode updates metadata with the help of Block
Reports sent by DataNodes.
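
The two cycles of a write can be traced with a small simulation: the data for a block travels from the client along a chain of DataNodes, and acknowledgements travel back along the same chain in reverse. This is a simplified sketch with hypothetical node names and a made-up `write_block` function; it models the order of messages only, not the actual network protocol.

```python
def write_block(block_id, pipeline):
    """Return the ordered events for writing one block through a pipeline."""
    events = []
    hops = ["client"] + pipeline

    # Data transfer cycle: client -> first DataNode -> next DataNode -> ...
    for src, dst in zip(hops, hops[1:]):
        events.append(("data", block_id, src, dst))

    # Acknowledgement cycle: last DataNode -> ... -> first DataNode -> client
    reverse = hops[::-1]
    for src, dst in zip(reverse, reverse[1:]):
        events.append(("ack", block_id, src, dst))

    return events


# Hypothetical pipeline of three DataNodes for one block.
events = write_block("blk_1", ["dn1", "dn2", "dn3"])
for event in events:
    print(event)
```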
Spanning HDFS Cluster
• Keep Block Size small.
• Small Block Size means more Blocks for a file.
• More Blocks means file is spread on more machines.

• More CPU cores and disk drives that have a block of
the file mean more parallel processing power and faster
results.
• This is why we build large wide clusters.
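
The reasoning above can be put into numbers with a toy model: the number of blocks limits how many machines can work on one file at the same time. The file size, block sizes and cluster size below are hypothetical, and the model ignores the per-block overhead that very small blocks would add in practice.

```python
import math


def max_parallel_readers(file_mb, block_mb, machines):
    """Upper bound on machines that can each process a different block."""
    blocks = math.ceil(file_mb / block_mb)
    return min(blocks, machines)


# Hypothetical 1024 MB file on a 10-machine cluster.
for block_mb in (256, 128, 64):
    print(block_mb, "MB blocks ->",
          max_parallel_readers(1024, block_mb, machines=10), "machines busy")
```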
Re-replicating in HDFS Cluster
• NameNode automatically takes care of recovering
missing and corrupted blocks.
• Missing heartbeats signify lost Nodes.
• NameNode consults metadata, finds affected data.
• NameNode consults Rack Awareness script.
• NameNode tells a DataNode to re-replicate the affected
blocks to a specified DataNode that is available.
[Figure: Hadoop Cluster with Rack 1, Rack 2 and Rack 3]
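
The recovery steps above can be sketched as follows: given the block locations the NameNode knows about and the set of nodes still sending heartbeats, find the blocks that have lost replicas and plan a copy from a surviving node to an available one. This is a simplified model with invented names; in particular, it picks destinations alphabetically and leaves out the rack awareness step.

```python
def find_under_replicated(block_locations, live_nodes, target=3):
    """Return {block: surviving replica nodes} for blocks below the target."""
    affected = {}
    for block, nodes in block_locations.items():
        alive = [n for n in nodes if n in live_nodes]
        if len(alive) < target:
            affected[block] = alive
    return affected


def plan_re_replication(block_locations, live_nodes, target=3):
    """Return (block, source, destination) copy instructions."""
    plan = []
    affected = find_under_replicated(block_locations, live_nodes, target)
    for block, alive in affected.items():
        if not alive:
            continue  # no surviving copy to replicate from
        candidates = sorted(live_nodes - set(alive))
        for dest in candidates[: target - len(alive)]:
            plan.append((block, alive[0], dest))
    return plan


# Hypothetical cluster in which dn2 has stopped sending heartbeats.
block_locations = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn4", "dn5"],
}
live_nodes = {"dn1", "dn3", "dn4", "dn5", "dn6"}

plan = plan_re_replication(block_locations, live_nodes)
print(plan)
```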
HDFS Read
• HDFS manages reading of file block by block.
• Many files are being read in parallel to save time.

• All communication happens through TCP
connections so high bandwidth is required.
• NameNode updates metadata with the help of Block
Reports sent by DataNodes.
• In case a DataNode needs a block that it does not
have, the NameNode provides rack local DataNodes
first to leverage in-rack bandwidth.
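
The preference for rack-local DataNodes can be expressed as a simple ordering of replica locations by distance from the reader: the same node first, then nodes in the same rack, then nodes in other racks. The node names, rack assignment and the three-level distance used below are assumptions made for this sketch.

```python
def order_replicas(reader, replicas, rack_of):
    """Sort replica locations from closest to farthest for the given reader."""
    def distance(node):
        if node == reader:
            return 0  # data is on the reader itself
        if rack_of[node] == rack_of[reader]:
            return 1  # same rack: in-rack bandwidth
        return 2      # different rack
    return sorted(replicas, key=distance)


# Hypothetical rack assignment.
rack_of = {"dn1": "rack1", "dn2": "rack1", "dn7": "rack2", "dn9": "rack3"}

ordered = order_replicas("dn2", ["dn7", "dn1", "dn9"], rack_of)
print(ordered)
```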
