
Department of Computer Science and Engineering

10212CS210 – Big Data Analytics

Course Category : Program Elective
Credits : 4
Slot : S1 & S5
Semester : Summer
Academic Year : 2024-2025
Faculty Name : Dr. S. Jagan

School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Unit 4: Big Data Visualization and Prediction

Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User Defined Functions, Data Processing Operators. Hive: Hive Shell, Hive Services, Hive Metastore, Comparison with Traditional Databases, HiveQL, Tables, Querying Data and User Defined Functions. NoSQL Databases: Schema-less Models: Increasing Flexibility for Data Manipulation – Key-Value Stores – Document Stores – Tabular Stores – Object Data Stores – Graph Databases – Sharding – HBase – Analyzing Big Data with Twitter – Big Data for E-Commerce – Big Data for Blogs.
Introduction to PIG

• Developed at Yahoo!; now a top-level Apache project
• Immediately makes data on a cluster available to non-Java programmers via Pig Latin, a dataflow language
• Interprets Pig Latin and generates MapReduce jobs that run on the cluster
• Enables easy data summarization, ad-hoc reporting and querying, and analysis of large volumes of data
• The Pig interpreter runs on a client machine – no administrative overhead required

Pig Terms

• All data in Pig is one of four types:
  • An Atom is a simple data value, stored as a string but usable as either a string or a number
  • A Tuple is a data record consisting of a sequence of "fields"; each field is a piece of data of any type (atom, tuple or bag)
  • A Bag is a set of tuples (also referred to as a 'Relation') – conceptually, a kind of table
  • A Map is a map from keys that are string literals to values that can be any data type – conceptually, a hash map
Pig Capabilities

• Support for:
  • Grouping
  • Joins
  • Filtering
  • Aggregation
• Extensibility
  • Support for User Defined Functions (UDFs)
• Leverages the same massive parallelism as native MapReduce


Pig Basics

• Pig is a client application
  • No cluster software is required
• Interprets Pig Latin scripts to MapReduce jobs
  • Parses Pig Latin scripts
  • Performs optimization
  • Creates an execution plan
  • Submits MapReduce jobs to the cluster


Execution Modes

• Pig has two execution modes:
  • Local Mode – all files are installed and run using your local host and file system
  • MapReduce Mode – all files are installed and run on a Hadoop cluster and HDFS installation
• Interactive – use the Grunt shell by invoking Pig on the command line:
  $ pig
  grunt>
• Batch – run Pig in batch mode using Pig scripts and the "pig" command:
  $ pig -f id.pig -p <param>=<value> ...
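As a quick illustration, the execution mode can also be selected explicitly with the -x flag (the script name here is illustrative):

$ pig -x local                  # Grunt shell against the local file system
$ pig -x mapreduce              # Grunt shell against the cluster (the default)
$ pig -x local wordcount.pig    # run a script in local mode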



Pig Latin

• Pig Latin scripts are generally organized as follows:
  • A LOAD statement reads data
  • A series of "transformation" statements process the data
  • A STORE statement writes the output to the filesystem
  • A DUMP statement displays output on the screen
• Logical vs. physical plans:
  • All statements are stored and validated as a logical plan
  • Once a STORE or DUMP statement is found, the logical plan is executed


Example Pig Script

-- Load the content of a file into a pig bag named 'input_lines'
input_lines = LOAD 'CHANGES.txt' AS (line:chararray);

-- Extract words from each line and put them into a pig bag named 'words'
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Filter out any words that are just white spaces
filtered_words = FILTER words BY word MATCHES '\\w+';

-- Create a group for each word
word_groups = GROUP filtered_words BY word;

-- Count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

-- Order the records by count
ordered_word_count = ORDER word_count BY count DESC;

-- Store the results (executes the pig script)
STORE ordered_word_count INTO 'output';


Basic "grunt" Shell Commands

• Help is available:
  $ pig -h
• Pig supports HDFS commands:
  grunt> pwd
  • put, get, cp, ls, mkdir, rm, mv, etc.
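A few illustrative Grunt file-system commands (the paths and file names are hypothetical):

grunt> ls /user/hadoop
grunt> mkdir /user/hadoop/output
grunt> cp employees.txt employees_backup.txt
grunt> cat employees.txt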



About Pig Scripts

• Pig Latin statements grouped together in a file
• Can be run from the command line or the shell
• Support parameter passing (example below)
• Comments are supported:
  • Inline comments: '--'
  • Block comments: /* */
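A small sketch of both comment styles and parameter passing (the script, file, and parameter names are hypothetical):

/* daily_report.pig: keep one day of records */
-- $date is supplied on the command line
records = LOAD 'events.txt' AS (day:chararray, msg:chararray);
today = FILTER records BY day == '$date';
DUMP today;

Run it as:
$ pig -param date=2024-01-31 -f daily_report.pig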



Simple Data Types

Type        Description
int         4-byte integer
long        8-byte integer
float       4-byte (single precision) floating point
double      8-byte (double precision) floating point
bytearray   Array of bytes; blob
chararray   String ("hello world")
boolean     True/False (case insensitive)
datetime    A date and time
biginteger  Java BigInteger
bigdecimal  Java BigDecimal


Complex Data Types

Type   Description
Tuple  Ordered set of fields (a "row / record")
Bag    Collection of tuples (a "resultset / table")
Map    A set of key-value pairs; keys must be of type chararray
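A hedged sketch of how all four types can appear in one LOAD schema (the file and field names are illustrative):

-- atom, tuple, bag and map fields in one schema
students = LOAD 'students.txt' AS (
    name:chararray,
    address:tuple(street:chararray, city:chararray),
    grades:bag{t:tuple(course:chararray, gpa:float)},
    info:map[chararray]
);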



Pig Data Formats

• BinStorage
  • Loads and stores data in machine-readable (binary) format
• PigStorage
  • Loads and stores data as structured, field-delimited text files
• TextLoader
  • Loads unstructured data in UTF-8 format
• PigDump
  • Stores data in UTF-8 format
• YourOwnFormat!
  • via UDFs


Loading Data Into Pig

• Loads data from an HDFS file:
  var = LOAD 'employees.txt';
  var = LOAD 'employees.txt' AS (id, name, salary);
  var = LOAD 'employees.txt' USING PigStorage() AS (id, name, salary);
• Each LOAD statement defines a new bag
  • Each bag can have multiple elements (atoms)
  • Each element can be referenced by name or position ($n)
• A bag is immutable
• A bag can be aliased and referenced later


Storing Data Into Pig

• STORE
  • Writes output to an HDFS file in a specified directory:
  grunt> STORE processed INTO 'processed_txt';
  • Fails if the directory exists
  • Writes output files, part-[m|r]-xxxxx, to the directory
  • PigStorage can be used to specify a field delimiter (example below)
• DUMP
  • Writes output to the screen:
  grunt> DUMP processed;
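A short example of setting the field delimiter with PigStorage (the output directory name is illustrative):

grunt> STORE processed INTO 'processed_csv' USING PigStorage(',');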



Relational Operators

• FOREACH
  • Applies expressions to every record in a bag
• FILTER
  • Filters by expression
• GROUP
  • Collects records with the same key
• ORDER BY
  • Sorts records
• DISTINCT
  • Removes duplicates


Relational Operators

• Use the FOREACH … GENERATE operator to work with rows of data, call functions, etc.
• Basic syntax:
  alias2 = FOREACH alias1 GENERATE expression;
• Example:
  DUMP alias1;
  (1,2,3) (4,2,1) (8,3,4) (4,3,3) (7,2,5) (8,4,3)
  alias2 = FOREACH alias1 GENERATE col1, col2;
  DUMP alias2;
  (1,2) (4,2) (8,3) (4,3) (7,2) (8,4)


Relational Operators

• Use the FILTER operator to restrict tuples or rows of data
• Basic syntax:
  alias2 = FILTER alias1 BY expression;
• Example:
  DUMP alias1;
  (1,2,3) (4,2,1) (8,3,4) (4,3,3) (7,2,5) (8,4,3)
  alias2 = FILTER alias1 BY (col1 == 8) OR (NOT (col2+col3 > col1));
  DUMP alias2;
  (4,2,1) (8,3,4) (7,2,5) (8,4,3)


Relational Operators

• Use the GROUP…ALL operator to group data
  • Use GROUP when only one relation is involved
  • Use COGROUP when multiple relations are involved
• Basic syntax:
  alias2 = GROUP alias1 ALL;
• Example:
  DUMP alias1;
  (John,18,4.0F) (Mary,19,3.8F) (Bill,20,3.9F) (Joe,18,3.8F)
  alias2 = GROUP alias1 BY col2;
  DUMP alias2;
  (18,{(John,18,4.0F),(Joe,18,3.8F)})
  (19,{(Mary,19,3.8F)})
  (20,{(Bill,20,3.9F)})
Relational Operators

• Use the ORDER…BY operator to sort a relation based on one or more fields
• Basic syntax:
  alias = ORDER alias BY field_alias [ASC|DESC];
• Example:
  DUMP alias1;
  (1,2,3) (4,2,1) (8,3,4) (4,3,3) (7,2,5) (8,4,3)
  alias2 = ORDER alias1 BY col3 DESC;
  DUMP alias2;
  (7,2,5) (8,3,4) (1,2,3) (4,3,3) (8,4,3) (4,2,1)


Relational Operators

• Use the DISTINCT operator to remove duplicate tuples in a relation
• Basic syntax:
  alias2 = DISTINCT alias1;
• Example:
  DUMP alias1;
  (8,3,4) (1,2,3) (4,3,3) (4,3,3) (1,2,3)
  alias2 = DISTINCT alias1;
  DUMP alias2;
  (8,3,4) (1,2,3) (4,3,3)


Relational Operators

• FLATTEN
  • Used to un-nest tuples as well as bags
• INNER JOIN
  • Used to perform an inner join of two or more relations based on common field values
• OUTER JOIN
  • Used to perform left, right or full outer joins
• SPLIT
  • Used to partition the contents of a relation into two or more relations
• SAMPLE
  • Used to select a random data sample with the stated sample size (see the sketch below)
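A hedged sketch of SPLIT and SAMPLE (the alias and field names are illustrative, and assume a schema with a salary field):

SPLIT employees INTO seniors IF salary >= 50000, juniors IF salary < 50000;
some_emps = SAMPLE employees 0.1;   -- keep roughly 10% of the tuples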



Relational Operators

• Use the JOIN operator to perform an inner equi-join of two or more relations based on common field values
• The JOIN operator always performs an inner join
• Inner joins ignore null keys
  • Filter out null keys before the join
• The JOIN and COGROUP operators perform similar functions
  • JOIN creates a flat set of output records
  • COGROUP creates a nested set of output records


Relational Operators

DUMP Alias1;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)

DUMP Alias2;
(2,4)
(8,9)
(1,3)
(2,7)
(2,9)
(4,6)
(4,9)

Join Alias1 by Col1 to Alias2 by Col1:
Alias3 = JOIN Alias1 BY Col1, Alias2 BY Col1;

DUMP Alias3;
(1,2,3,1,3)
(4,2,1,4,6)
(4,3,3,4,6)
(4,2,1,4,9)
(4,3,3,4,9)
(8,3,4,8,9)
(8,4,3,8,9)


Relational Operators

• Use the OUTER JOIN operator to perform left, right, or full outer joins
  • Pig Latin syntax closely adheres to the SQL standard
  • The keyword OUTER is optional
  • The keywords LEFT, RIGHT and FULL imply left outer, right outer and full outer joins respectively
• Outer joins will only work provided the relations which need to produce nulls (in the case of non-matching keys) have schemas
• Outer joins will only work for two-way joins
  • To perform a multi-way outer join, chain multiple two-way outer join statements (see the sketch below)
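A hedged example of a left outer join (the aliases, files and fields are illustrative; note the schemas on the null-producing side):

emps = LOAD 'employees.txt' AS (id:int, name:chararray);
bonuses = LOAD 'bonuses.txt' AS (id:int, amount:float);
emp_bonus = JOIN emps BY id LEFT OUTER, bonuses BY id;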



User-Defined Functions

• Natively written in Java and packaged as a jar file
  • Other languages include JavaScript, Ruby, Groovy, and Python
• Register the jar with the REGISTER statement
• Optionally, alias it with the DEFINE statement

REGISTER /src/myfunc.jar;
A = LOAD 'students';
B = FOREACH A GENERATE myfunc.MyEvalFunc($0);


DEFINE

• DEFINE can be used to work with UDFs and also streaming commands
• Useful when dealing with complex input/output formats

/* read and write comma-delimited data */
DEFINE Y 'stream.pl' INPUT(stdin USING PigStreaming(',')) OUTPUT(stdout USING PigStreaming(','));
A = STREAM X THROUGH Y;

/* define UDFs to a more readable format */
DEFINE MAXNUM org.apache.pig.piggybank.evaluation.math.MAX;
A = LOAD 'student_data' AS (name:chararray, gpa1:float, gpa2:double);
B = FOREACH A GENERATE name, MAXNUM(gpa1, gpa2);
DUMP B;


Hive – a data warehousing package built on top of Hadoop


Hive Background

• Started at Facebook
• Data was collected and stored in an Oracle database
• Data grew from tens of GB (2006) to 1 TB/day of new data (2007)
• By 2020, on the order of 1,024 TB of new data was being generated every minute


Hive use case @ Facebook



What is Hive

• Data warehousing package built on top of Hadoop
• Used for data analysis
• Targeted towards users comfortable with SQL
• Its query language is similar to SQL and is called HiveQL
• For managing and querying structured data
• No need to learn Java and Hadoop APIs
• Developed by Facebook, with contributions from the community
• Facebook analyzed several terabytes of data every day using Hive


Features of Hive

• Hive is fast and scalable
• It provides SQL-like queries (HQL) that are implicitly transformed into MapReduce or Spark jobs
• It is capable of analyzing large datasets stored in HDFS
• It allows different storage types such as plain text, RCFile, and HBase
• It uses indexing to accelerate queries
• It can operate on compressed data stored in the Hadoop ecosystem
• It supports user-defined functions (UDFs), through which users can plug in their own functionality


What is Hive

ETL – Extract, Transform, Load


Why go for Hive? When Pig is there



Hive Architecture and components



Why go for Hive When Pig is there

Pig Latin: a procedural data-flow language, e.g.
  A = LOAD 'mydata';
  DUMP A;
Pig is used by programmers and researchers.

Hive QL: a declarative, SQL-ish language, e.g.
  SELECT * FROM mytable;
Hive is used by analysts generating daily reports.


Pig vs Hive

Feature                       Hive                Pig
Language                      SQL-like            Pig Latin
Schemas/Types                 Yes (explicit)      Yes (implicit)
Partitions                    Yes                 No
Server                        Optional (Thrift)   No
User Defined Functions (UDF)  Yes (Java)          Yes (Java)
DFS Direct Access             Yes                 Yes
Join/Order/Sort               Yes                 Yes
Shell                         Yes                 Yes
Web Interface                 Yes                 No
JDBC/ODBC                     Yes                 No


Differences between Hive and Pig

Hive                                           Pig
Commonly used by data analysts                 Commonly used by programmers
Follows SQL-like queries                       Follows a data-flow language
Handles structured data                        Handles semi-structured data
Works on the server side of an HDFS cluster    Works on the client side of an HDFS cluster
Slower than Pig                                Comparatively faster than Hive


Hive Architecture



Apache Hive Installation

• Java installation – check whether Java is installed using the following command:
  $ java -version
• Hadoop installation – check whether Hadoop is installed using the following command:
  $ hadoop version

Steps to install Apache Hive:
1. Download the Apache Hive tar file:
   http://mirrors.estointernet.in/apache/hive/hive-1.2.2/
2. Unzip the downloaded tar file.


Apache Hive Installation

3. Extract the tar file:
   $ tar -xvf apache-hive-1.2.2-bin.tar.gz
4. Open the .bashrc file:
   $ sudo nano ~/.bashrc
5. Provide the following HIVE_HOME path:
   export HIVE_HOME=/home/codegyani/apache-hive-1.2.2-bin
   export PATH=$PATH:/home/codegyani/apache-hive-1.2.2-bin/bin
6. Update the environment variables:
   $ source ~/.bashrc
7. Start Hive with the following command:
   $ hive


Hive Components



Metastore



Limitations of Hive



Abilities of Hive Query Language



Hive Data Models



Partitioning



Partitioning in Hive

• Partitioning in Hive means dividing the table into parts based on the values of a particular column, such as date, course, city or country
• The advantage of partitioning is that, since the data is stored in slices, query response time becomes faster
• Since Hadoop is used to handle huge amounts of data, it is always worth using the best approach to deal with it; partitioning in Hive is a good example of this


Partitioning in Hive

• Let's assume we have data on 10 million students studying in an institute
• Now, we have to fetch the students of a particular course
• With a traditional approach, we have to scan through the entire data set
• This leads to performance degradation
• In such a case, we can adopt a better approach, i.e., partitioning in Hive, and divide the data among different datasets based on particular columns

Partitioning in Hive can be performed in two ways (sketched below):
• Static partitioning
• Dynamic partitioning
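A hedged HiveQL sketch of both styles (the table, column and file names are illustrative):

-- a partitioned table
CREATE TABLE student (id INT, name STRING)
PARTITIONED BY (course STRING);

-- static partitioning: the partition value is given explicitly
LOAD DATA LOCAL INPATH '/home/user/bda_students.txt'
INTO TABLE student PARTITION (course = 'BDA');

-- dynamic partitioning: partition values come from the data itself
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO TABLE student PARTITION (course)
SELECT id, name, course FROM student_staging;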



Bucketing

• The bucket concept is based on (hash of the column value) mod (total number of buckets)


Bucketing in Hive

• Bucketing in Hive is a data organizing technique
• It is similar to partitioning in Hive, with the added functionality that it divides large datasets into more manageable parts known as buckets
• So, we can use bucketing in Hive when the implementation of partitioning becomes difficult
• However, we can also divide partitions further into buckets


Bucketing in Hive

• The concept of bucketing is based on the hashing technique
• Here, the modulus of the hashed column value and the number of required buckets is calculated (say, hash(x) % 3)
• Based on the resulting value, the data is stored in the corresponding bucket


Example of Bucketing in Hive

• First, select the database in which we want to create the table:
hive> use showbucket;
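Continuing the example, a hedged sketch of creating and populating a bucketed table (the table and column names are illustrative):

hive> CREATE TABLE emp_bucket (id INT, name STRING, salary FLOAT)
      CLUSTERED BY (id) INTO 3 BUCKETS
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> SET hive.enforce.bucketing = true;
hive> INSERT OVERWRITE TABLE emp_bucket SELECT * FROM emp_staging;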



SerDe - Serialization and Deserialization

Introduction to Hive SerDe

• For the purpose of IO, Apache Hive uses the Hive SerDe interface; it handles both serialization and deserialization in Hive
• It also interprets the results of deserialization as individual fields for processing
• A SerDe allows Hive to read data from a table, and to write it back out to HDFS in any custom format
• Anyone can write their own SerDe for their own data formats
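A hedged HiveQL sketch of attaching a SerDe to a table, here using the CSV SerDe that ships with Hive (the table and column names are illustrative):

CREATE TABLE csv_table (name STRING, city STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;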



SerDe

• HDFS files –> InputFileFormat –> <key, value> –> Deserializer –> Row object
• Row object –> Serializer –> <key, value> –> OutputFileFormat –> HDFS files


UDF

• User Defined Functions, also known as UDFs, allow you to create custom functions to process records or groups of records
• Hive comes with a comprehensive library of functions
• There are, however, some omissions, and some specific cases for which UDFs are the solution


UDF

A UDF processes one or several columns of one row and outputs one value. For example:

SELECT lower(str) FROM table;

For each row in "table", the "lower" UDF takes one argument, the value of "str", and outputs one value, the lowercase representation of "str".

SELECT datediff(date_begin, date_end) FROM table;


UDF

For each row in "table", the "datediff" UDF takes two arguments, the values of "date_begin" and "date_end", and outputs one value, the difference in time between these two dates.

Each argument of a UDF can be:
• A column of the table
• A constant value
• The result of another UDF
• The result of an arithmetic computation
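Once a custom UDF is compiled into a jar, it is wired in roughly like this (the jar path, class name and table are hypothetical):

hive> ADD JAR /home/user/my_udfs.jar;
hive> CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.MyLowerUDF';
hive> SELECT my_lower(name) FROM student;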



Types of Built-in Functions in Hive

• Collection functions
• Date functions
• Mathematical functions
• Conditional functions
• String functions


NoSQL – Not Only SQL

• Lightweight and open source
• NoSQL databases are used in:
  • Big data
  • Real-time web applications
  • Log analysis
  • Social networking feeds
• Non-relational
• Distributed
• No support for ACID properties
• No fixed table schema
NoSQL - Types

• Key-value or big hash table – Dynamo, Redis, Riak
• Document – MongoDB, Apache CouchDB, MarkLogic
• Columnar – Cassandra, HBase
• Graph – Neo4j, HyperGraphDB, InfiniteGraph




What is it?

• NoSQL databases are not relational
• Data is stored as key-value pairs, document-oriented, column-oriented, or graph-oriented

Key-value (big hash table):
Key        Value
Firstname  Rahul
Lastname   Dravid

Document-oriented:
• Maintains data in collections made up of documents
• e.g. MongoDB, Apache CouchDB, Couchbase, MarkLogic
{
  "Book Name" : "BDA",
  "Publisher" : "Wiley India",
  "Year of publication" : 2011
}
Column

• Column-oriented – each storage block has data from only one column


Graph

• Also called network databases – a graph database stores data in nodes and the relationships (edges) between them
(diagram: three linked nodes, ID 1001, ID 1002, ID 1003)


NoSQL – Types & Tools



Advantages of NoSQL

• Can easily scale up and down
• Does not require a predefined schema
• Cheap and easy to implement
• Relaxes the data consistency requirement
• Data can be replicated to multiple nodes and can be partitioned


SQL vs NoSQL


NoSQL Vendors

Company    Product     Most widely used by
Amazon     DynamoDB    LinkedIn, Mozilla
Facebook   Cassandra   Netflix, Twitter, eBay
Google     BigTable    Adobe Photoshop


HBase

HBase is an open-source, distributed, column-oriented database built on top of HDFS, based on Google's BigTable.


HBase

• A distributed data store that can scale horizontally to thousands of commodity servers and petabytes of indexed storage
• Designed to operate on top of the Hadoop Distributed File System (HDFS) or the Kosmos File System (KFS, aka Cloudstore) for scalability, fault tolerance, and high availability


HBase

• Distributed storage
• Table-like data structure: a multi-dimensional map
• High scalability
• High availability
• High performance


HBase

• Started by Chad Walters and Jim Kellerman
• 2006.11 – Google releases its paper on BigTable
• 2007.2 – Initial HBase prototype created as a Hadoop contrib module
• 2007.10 – First usable HBase
• 2008.1 – Hadoop becomes an Apache top-level project and HBase becomes a subproject
• 2008.10~ – HBase 0.18, 0.19 released
HBase

• Tables have one primary index, the row key
• No join operators
• Scans and queries can select a subset of available columns, perhaps by using a wildcard
• There are three types of lookups (see the shell sketch below):
  • Fast lookup using row key and optional timestamp
  • Full table scan
  • Range scan from region start to end
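These three access patterns map onto the HBase shell roughly as follows (the table and row keys are hypothetical):

hbase> get 'mytable', 'row-17'                                       # point lookup by row key
hbase> scan 'mytable'                                                # full table scan
hbase> scan 'mytable', {STARTROW => 'row-10', STOPROW => 'row-20'}   # range scan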



HBase

• HBase is a BigTable clone
• It is open source
• It has a good community and promise for the future
• It is developed on top of, and has good integration with, the Hadoop platform, if you are using Hadoop already
• It has a Cascading connector




Analyzing big data with twitter



Big data for E-Commerce



Big data for blogs

