Big Data Analytics lab file

The document provides detailed installation instructions for Apache Hive and HBase, including prerequisites, setup steps, and configuration commands. It also includes examples of Pig Latin scripts for data manipulation tasks such as sorting, grouping, joining, filtering, and counting words. Additionally, it outlines steps to find the maximum temperature for each year using Pig Latin.


GL BAJAJ Institute of Technologies & Management, Greater Noida
[Approved by AICTE, Govt. of India & Affiliated to Dr. APJ Abdul Kalam Technical University, Lucknow, U.P., India]
Department of Applied Computational Science & Engineering

Program-6

Aim: Installation of Hive along with practice examples.


Theory: Apache Hive is a distributed, fault-tolerant data warehouse system that enables
analytics at a massive scale. A data warehouse provides a central store of
information that can easily be analyzed to make informed, data-driven decisions.
Hive allows users to read, write, and manage petabytes of data using SQL.
Prerequisites:
 JDK (Java) installed on the system
 Hadoop installed
 7 Zip

Steps:
Step 1: Check whether Java is installed using the following command.
$ java -version
Step 2: Check whether Hadoop is installed using the following command.

$ hadoop version
Step 3: Download the Apache Hive archive apache-hive-2.3.9-bin.tar.gz from the Apache Hive
downloads page.

Step 4: Unzip and Install Hive:


The following commands are used to extract the Hive archive and verify the download:

$ tar zxvf apache-hive-2.3.9-bin.tar.gz


$ ls

On successful extraction, you get to see the following response:


apache-hive-2.3.9-bin apache-hive-2.3.9-bin.tar.gz
Step 5: Copying files to the /usr/local/hive directory:
We need to copy the files as the superuser ("su -"). The following commands are
used to move the extracted directory to the /usr/local/hive directory.

$ su -
password:

# cd /home/cloudera/user/Downloads
# mv apache-hive-2.3.9-bin /usr/local/hive
# exit

Step 6: Setting up environment for Hive:


You can set up the Hive environment by appending the following lines to ~/.bashrc file:

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.

The following command is used to execute ~/.bashrc file:


$ source ~/.bashrc

Step 7: Configuring Hive:


To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is placed
in the $HIVE_HOME/conf directory. The following commands change to the Hive config folder
and copy the template file:
$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh

Edit the hive-env.sh file by appending the following line:


export HADOOP_HOME=/usr/local/hadoop
Hive installation is completed successfully.

Step 8: Verify the Hive installation with the following commands:

$ cd $HIVE_HOME
$ bin/hive
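As a practice example, the following HiveQL statements can be tried once Hive starts. This is a minimal sketch: the database name (practice), table name (employee), and data file path (/home/cloudera/emp.txt) are illustrative assumptions, not part of the installation steps above. Saved as, say, practice.hql, the script can be run with hive -f practice.hql, or typed line by line at the hive> prompt.

-- create a database and a comma-delimited table (names are assumed for illustration)
CREATE DATABASE IF NOT EXISTS practice;
USE practice;
CREATE TABLE IF NOT EXISTS employee (id INT, name STRING, salary FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- load a local comma-separated file into the table (path is an assumption)
LOAD DATA LOCAL INPATH '/home/cloudera/emp.txt' INTO TABLE employee;
-- simple queries on the loaded data
SELECT name, salary FROM employee WHERE salary > 30000;
SELECT COUNT(*) FROM employee;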

Program-7

Aim: Installation of HBase and Thrift, along with practice examples.


Theory: HBase provides low latency random read and write access to petabytes of data by
distributing requests from applications across a cluster of hosts. Each host has
access to data in HDFS and S3, and serves read and write requests in milliseconds.

Steps:
Step 1: Download the HBase file hbase-3.0.0-beta-1-bin.tar.gz from Apache website.

Step 2: Unzip and Move:

$ cd /home/cloudera/user/Downloads
$ sudo tar -zxvf hbase-3.0.0-beta-1-bin.tar.gz
$ sudo mv hbase-3.0.0-beta-1 /usr/local/hbase

Step 3: Edit hbase-env.sh and hbase-site.xml:

$ cd /usr/local/hbase/conf

In the hbase-env.sh file, you need to export the JAVA_HOME path. First, check the
JAVA_HOME path on your system. You can display your Java home using the command
given below:

$ echo $JAVA_HOME

On this system the above command gives the output /usr/lib/jvm/java-7-openjdk-amd64.
To update hbase-env.sh, we need to run the command given below:

$ sudo nano hbase-env.sh

Copy and paste the lines below into the hbase-env.sh file:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HBASE_REGIONSERVERS=/usr/local/hbase/conf/regionservers
export HBASE_MANAGES_ZK=true

Use Ctrl+X and Y to save.



Your hbase-env.sh file will look like the image given below:

Now update the .bashrc file to export hbase variables:

$ sudo nano ~/.bashrc

Copy and paste the below lines at the end of .bashrc:

#HBASE VARIABLES START


export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
#HBASE VARIABLES END

Image below explains how to append the HBASE variables:

Use Ctrl+X and Y to save.

To apply the above changes to .bashrc in the current shell session, we need to run the
command given below:

$ source ~/.bashrc

Now to update the hbase-site.xml file, use the command given below:

$ cd /usr/local/hbase/conf
$ sudo nano hbase-site.xml

Add the following properties between the <configuration> tags, so the file contains:

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:54310/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/hadoop/zookeeper</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>

Use Ctrl+X and Y to save

Your hbase-site.xml file will look like the image given below:

Step 4: Starting HBASE:

$ cd /usr/local/hbase/bin
$ sudo chown -R hduser:hadoop /usr/local/hbase/
$ ./start-hbase.sh

At start-up, you will get a message that looks like the one shown in the image given below:

Now run the jps command to check the HBase daemons:

$ jps
You will get the output as shown in the image given below:

To check the HBase Directory in HDFS:

$ hadoop fs -ls hdfs://localhost:54310/hbase/

You will get the output as shown in the image given below:
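Once the daemons are running, a few practice commands can be tried in the HBase shell. This is a minimal sketch; the table name (student) and column family (info) are assumptions made for illustration.

$ hbase shell
hbase> create 'student', 'info'                # table with one column family
hbase> put 'student', '1', 'info:name', 'Asha' # insert a cell into row '1'
hbase> put 'student', '1', 'info:marks', '85'
hbase> get 'student', '1'                      # read back a single row
hbase> scan 'student'                          # list all rows in the table
hbase> list                                    # show all tables
hbase> disable 'student'                       # a table must be disabled before dropping
hbase> drop 'student'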

Program-8

Aim: Write Pig commands: write Pig Latin scripts to sort, group, join, project, and filter
your data.

Pig Latin Script for Sorting:


The ORDER BY operator is used to display the contents of a relation in a sorted order based
on one or more fields.
Syntax:

grunt> Relation2_name = ORDER Relation1_name BY field_name (ASC|DESC);
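For example, assuming a hypothetical comma-separated file student.txt with id, name, and age fields, the records can be sorted by age in descending order as follows:

grunt> students = LOAD '/user/cloudera/student.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int);
grunt> sorted_students = ORDER students BY age DESC;   -- highest age first
grunt> DUMP sorted_students;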

Pig Latin Script for Group:


The GROUP operator is used to group the data in one or more relations. It collects the data
having the same key.
Syntax:
grunt> Group_data = GROUP Relation_name BY age;
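Using the same hypothetical student relation, grouping by age collects all tuples that share the same age value:

grunt> students = LOAD '/user/cloudera/student.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int);
grunt> group_by_age = GROUP students BY age;
grunt> DUMP group_by_age;   -- each output tuple is (age, {bag of student tuples with that age})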

Pig Latin Script for Join:


The JOIN operator is used to combine records from two or more relations. While performing
a join operation, we declare one (or a group of) fields from each relation as keys. When
these keys match, the two particular tuples are matched; otherwise the records are dropped.
Joins can be of the following types −

 Self-join
 Inner-join
 Outer-join − left join, right join, and full join
Syntax:
Given below is the syntax for performing a self-join operation using the JOIN operator.
grunt> Relation3_name = JOIN Relation1_name BY key, Relation2_name BY key;

Here is the syntax for performing an inner join operation using the JOIN operator.
grunt> result = JOIN relation1 BY columnname, relation2 BY columnname;
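As an illustration, assume two hypothetical comma-separated files, customers.txt (id, name) and orders.txt (order_id, customer_id, amount). An inner join on the customer id can be written as:

grunt> customers = LOAD '/user/cloudera/customers.txt' USING PigStorage(',') AS (id:int, name:chararray);
grunt> orders = LOAD '/user/cloudera/orders.txt' USING PigStorage(',') AS (order_id:int, customer_id:int, amount:double);
grunt> customer_orders = JOIN customers BY id, orders BY customer_id;   -- keeps only matching keys
grunt> DUMP customer_orders;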

Pig Latin Script for Filter:


The FILTER operator is used to select the required tuples from a relation based on a
condition.
Syntax:

grunt> Relation2_name = FILTER Relation1_name BY (condition);
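For example, keeping only the students older than 20 from the hypothetical student relation used above:

grunt> students = LOAD '/user/cloudera/student.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int);
grunt> adults = FILTER students BY age > 20;   -- keep tuples satisfying the condition
grunt> DUMP adults;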



Program-9

Aim: Run Pig Latin scripts to find the word count.


Program:
Assume we have data in a file like the lines below:
This is a hadoop class
hadoop is a bigdata technology
We want to generate the count of each word as output, like below:

(a,2)
(is,2)
(This,1)
(class,1)
(hadoop,2)
(bigdata,1)
(technology,1)

Steps to generate this using Pig Latin:

Step 1: Load the data from HDFS


Use the LOAD statement to load the data into a relation.
The AS keyword is used to declare column names; since the file has no defined columns, we
declare only one column, named line. (Note: input is a reserved keyword in Pig, so the relation is named lines here.)
lines = LOAD '/path/to/file/' AS (line:chararray);
Step 2: Convert the sentences into words
The data we have is in sentences, so we have to convert it into words using the
TOKENIZE function:

TOKENIZE(line)
or, if we have a delimiter such as a space, we can specify it as:
TOKENIZE(line,' ')

Output will be like this:


({(This),(is),(a),(hadoop),(class)})

({(hadoop),(is),(a),(bigdata),(technology)})

but we have to convert it into multiple rows like below:


(This)
(is)
(a)
(hadoop)
(class)
(hadoop)
(is)
(a)
(bigdata)
(technology)
Convert columns into rows

We have to convert every line of data into multiple rows; for this, Pig provides the
FLATTEN function.

Using the FLATTEN function, the bag is converted into tuples, i.e. the array of strings is
converted into multiple rows.

words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line,' ')) AS word;

Then the output is like below:

(This)
(is)
(a)
(hadoop)
(class)
(hadoop)
(is)
(a)
(bigdata)
(technology)

Step 3: Apply GROUP BY

We have to count the occurrences of each word; for that, we have to group all the words.
Grouped = GROUP words BY word;

Step 4: Generate word count


wordcount = FOREACH Grouped GENERATE group, COUNT(words);

We can print the word count on console using Dump.


DUMP wordcount;

Output will be like below:


(a,2)
(is,2)
(This,1)
(class,1)
(hadoop,2)
(bigdata,1)
(technology,1)

Below is the complete program for the same:

lines = LOAD '/path/to/file/' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line,' ')) AS word;
Grouped = GROUP words BY word;
wordcount = FOREACH Grouped GENERATE group, COUNT(words);
DUMP wordcount;
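If the statements above are saved in a script file (say wordcount.pig, an assumed name), the script can be run either in local mode or against the cluster:

$ pig -x local wordcount.pig    # run using the local file system
$ pig wordcount.pig             # run in MapReduce mode against HDFS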

Program-10

Aim: Run Pig Latin scripts to find the maximum temperature for each year.

Steps for Word Count using Pig Latin:


Step 1:
1. Create a text file having a few lines of text and save it as bd.txt.
2. Create a directory in HDFS named wc.
3. Copy the bd.txt file from the local file system to the HDFS directory wc.

Step 2:
inputline = load '/user/cloudera/wc/bd.txt' using PigStorage('\t') as (data:chararray);
words = FOREACH inputline GENERATE FLATTEN(TOKENIZE(data)) AS word;
filtered_words = FILTER words BY word MATCHES '\\w+';
word_groups = GROUP filtered_words BY word;
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
ordered_word_count = ORDER word_count BY count DESC;
DUMP ordered_word_count;

You can use the below command to save the result in HDFS:
grunt> STORE ordered_word_count INTO '/user/cloudera/wc/output/';

Find the maximum temperature in a given dataset using Pig


Option 1: Using GROUP ALL and MAX
A = LOAD 'input' USING PigStorage() AS (Year:int,Temp:int);
B = GROUP A ALL;
C = FOREACH B GENERATE MAX(A.Temp);
DUMP C;

or

Option 2: Using ORDER and LIMIT


A = LOAD 'input' USING PigStorage() AS (Year:int,Temp:int);
B = ORDER A BY Temp DESC;
C = LIMIT B 1;
D = FOREACH C GENERATE Temp;
DUMP D;
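Note that both options above return a single overall maximum temperature. To match the aim of this program, i.e. the maximum temperature for each year, the relation should be grouped by Year instead of using GROUP ALL. A minimal sketch, assuming the same tab-delimited (Year, Temp) input:

A = LOAD 'input' USING PigStorage() AS (Year:int, Temp:int);
B = GROUP A BY Year;                               -- one group per year
C = FOREACH B GENERATE group AS Year, MAX(A.Temp) AS MaxTemp;
DUMP C;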
