
Big Data (BCS061/BCDS-601/KOE-097)

Unit – 5: Hadoop Ecosystem Frameworks, Pig, Hive, HBase

Edushine Classes
Follow Us
Download Notes : https://rzp.io/rzp/JV7zlavG
https://telegram.me/rrsimtclasses/

🐷 What is Pig? (in Hadoop)


Pig is a data flow language used to analyze big data in Hadoop.
It uses a simple language called Pig Latin, which is easier than writing Java MapReduce code.
✅ Why Use Pig?
• It helps process huge data sets.
• It reduces coding time (just like SQL is easier than full programming).
• It converts your code into MapReduce jobs automatically.
⚙ Execution Modes of Pig :
Pig can run in 2 modes – Local Mode (runs on a single machine and uses the local file system, good for testing) and MapReduce Mode (runs on a Hadoop cluster and uses HDFS).

➡ You can choose the mode using the command:


pig -x local // for local mode
pig -x mapreduce // for Hadoop cluster

🌟 Features of Pig
i. Easy to Learn – Uses Pig Latin, similar to SQL.
ii. Handles Big Data – Good for analyzing huge datasets.
iii. Extensible – You can write your own functions (called UDFs).
iv. Automatically Converts to MapReduce – No need to write complex code.
v. Supports Joins, Filters, Grouping – Like SQL operations.
vi. Error Handling – Provides good debugging and error messages.
Pig is a tool to process big data using Pig Latin.
It runs in local or MapReduce mode and makes data handling easy and fast in Hadoop.

🐷 Pig Latin vs SQL (Database) :
• Pig Latin is a procedural data-flow language; SQL is declarative.
• Pig Latin works on files in HDFS; SQL works on tables in an RDBMS.
• In Pig, a schema is optional; in SQL, a schema is mandatory.
• Pig Latin suits step-by-step transformation of large, semi-structured data; SQL suits structured, transactional data.




🐷💻 What is Grunt in Pig? (Short Note)


• Grunt is the command-line interface (CLI) of Pig.
• It’s like a place where you type Pig commands and run them step by step.
✅ What You Can Do in Grunt:
• Write and run Pig Latin commands
• Load, filter, join, and process data
• See outputs and debug easily
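For example, a short Grunt session might look like this (file and field names are illustrative):
grunt> data = LOAD 'file.txt' USING PigStorage(',') AS (name:chararray, age:int);
grunt> adults = FILTER data BY age >= 18;
grunt> DUMP adults;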

✅ Syntax and Semantics of Pig Latin :


✅ Syntax of Pig Latin
Pig Latin is a data flow language. Its syntax defines how we write statements to process
data step by step.
It includes commands like:
1. LOAD – To load data from HDFS
data = LOAD 'file.txt' USING PigStorage(',') AS (name:chararray, age:int);
2. FILTER – To select rows based on a condition
adults = FILTER data BY age >= 18;
3. FOREACH…GENERATE – To select specific columns
names = FOREACH adults GENERATE name;
4. GROUP – To group records
grouped = GROUP data BY age;
5. JOIN – To combine two datasets
joined = JOIN A BY id, B BY id;

6. STORE/DUMP – To save or display the result


DUMP names;
STORE names INTO 'output';

✅ Semantics of Pig Latin


Semantics means the meaning of the Pig Latin statements. Each line is a step in the data
flow and describes how data moves and is processed.
Example :
data = LOAD 'students.csv' AS (name, marks);
passed = FILTER data BY marks >= 33;
DUMP passed;
Meaning:
• Load student data
• Select only those who passed
• Show the result on screen
Pig Latin has a simple syntax and clear semantics, making it easy to process large data in Hadoop. It supports step-by-step data flow, similar to SQL but more flexible for big data.

✅ What is a UDF in Pig?


A User Defined Function (UDF) in Pig is a custom function created by the user to perform
operations that are not available in built-in functions.
Pig has many built-in functions, but if you need something special (like custom string or
math logic), you can create your own.
✅ Language Used:
• UDFs are usually written in Java
• They can also be written in Python, Ruby, or JavaScript
✅ Example Use:
Let’s say you want to convert names to uppercase but there’s no built-in function:
You can write a UDF like ToUpper() and use it in Pig like:
Example :
REGISTER myudfs.jar;
data = LOAD 'file.txt' AS (name:chararray);
upper_names = FOREACH data GENERATE ToUpper(name);
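For reference, here is a minimal sketch of what such a UDF could look like in Java, using Pig's standard EvalFunc API (the class and file names are illustrative):
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Converts the first input field to uppercase; returns null for empty input
public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;
        return ((String) input.get(0)).toUpperCase();
    }
}
The compiled class is packed into a jar (myudfs.jar above) and made available to Pig with REGISTER.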

✅ Data Processing Operators in Pig


Pig Latin provides several data processing operators that help in analyzing and transforming
large datasets efficiently. These operators allow step-by-step data processing similar to SQL
but are more suitable for parallel processing in Hadoop.
🔹 1. LOAD
Used to load data from a file or HDFS into a relation.
🔹 2. FILTER
Used to select records that meet a specific condition.
🔹 3. FOREACH…GENERATE
Used to perform operations on each record and generate new output.
🔹 4. GROUP
Used to group records based on the value of a specific field.
🔹 5. JOIN
Used to join two or more relations based on a common key.

🔹 6. ORDER
Used to sort the data based on one or more fields.
🔹 7. DISTINCT
Used to remove duplicate records from a dataset.
🔹 8. LIMIT
Used to return a specified number of rows.
🔹 9. DUMP
Used to display the result on the console.
🔹 10. STORE
Used to save the result into a file or directory in HDFS.
These operators are essential for performing tasks like filtering, grouping, joining, and
storing data in big data applications using Pig.
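As an illustration, several of these operators can be chained in one script (relation and field names are assumptions):
-- load, filter, sort, and keep the top rows of a dataset
data   = LOAD 'students.txt' USING PigStorage(',') AS (id:int, name:chararray, marks:int);
passed = FILTER data BY marks >= 33;
sorted = ORDER passed BY marks DESC;
top5   = LIMIT sorted 5;
DUMP top5;
STORE top5 INTO 'top_students';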

✅ Apache Hive and Its Architecture


🔹 What is Hive?
Hive is a data warehouse tool built on top of Hadoop. It helps in reading, writing, and
managing large datasets using HiveQL (a SQL-like language). It converts HiveQL queries
into MapReduce jobs for processing.

🏗 Architecture of Hive:
1. User Interfaces:
Used to interact with Hive.
Examples:
• Web UI
• Hive Command Line
• HDInsight
2. Meta Store:
• Stores metadata (info about tables, columns, data types).
• Helps Hive know where and how the data is stored in HDFS.
3. HiveQL Process Engine:
• Receives queries written in HiveQL.
• Checks the syntax and passes the query to the execution engine.

4. Execution Engine:
• Converts queries into MapReduce jobs.
• Executes them on the Hadoop cluster.
5. HDFS or HBase Storage:
• Hive stores actual data in HDFS or HBase.
• It just processes queries over this stored data.
Hive lets you run SQL-like queries on big data stored in HDFS. It uses components like
Metastore, HiveQL engine, and Execution engine to turn your queries into results.

✍ Working of Hive with Hadoop (Step-by-Step)


When a user runs a HiveQL query, this is what happens:

🔹 1. Interface (Step 1 & 10):


The user writes the query using Hive Command Line, Web UI, or other interfaces.
🔹 2. Driver (Steps 2, 6, 9):
The driver receives the query and manages the full process:
• Sends the query to the compiler
• Monitors the execution
• Returns results to the user
🔹 3. Compiler (Steps 3 & 5):
The compiler checks the query for errors and converts it into a logical plan.
It also asks the Metastore for table info.
🔹 4. Metastore (Step 4):
Stores metadata (data about data), like table names, columns, data types, location in HDFS.
🔹 5. Execution Engine (Steps 7, 7.1, 8):
The query is passed to the Execution Engine, which converts it into MapReduce jobs.

🔹 6. Hadoop Framework (MapReduce + HDFS):


• MapReduce processes the data
• HDFS provides the data from DataNodes
• Once processed, results are sent back to the Hive Execution Engine
🔹 7. Final Result (Step 9 & 10):
The result is collected by the Driver and shown to the user.

Hive converts your SQL-like query into MapReduce jobs, runs them using Hadoop, gets the
results from HDFS, and gives you the answer — just like a smart translator between SQL and big
data.

📄 Short Note: Apache Hive Installation :


1. Install Java and Hadoop
• Make sure Java and Hadoop are installed and working properly.
• Set environment variables for both.
2. Download Hive
• Go to the official Hive website and download the Hive software.
• Extract the files and place them in a folder like /usr/local/hive.
3. Set Environment Variables
• Add the Hive path to the system using .bashrc or .bash_profile.
4. Create Directories in HDFS
• Make folders /tmp and /user/hive/warehouse in HDFS.
• Give permissions using Hadoop commands.
5. Initialize Metastore
• Use the Derby database (default) and run the command to initialize the schema:
schematool -initSchema -dbType derby

6. Start Hive
• Type hive in the terminal to open the Hive shell and start writing HiveQL queries.
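For steps 3–5, the commands might look like this (paths are illustrative):
# add Hive to the environment (e.g. in .bashrc)
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin

# create the HDFS directories Hive needs and grant write permission
hdfs dfs -mkdir -p /tmp /user/hive/warehouse
hdfs dfs -chmod g+w /tmp /user/hive/warehouse

# initialize the default Derby metastore, then start the shell
schematool -initSchema -dbType derby
hive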
✅ Hive Shell :
Hive Shell is a command-line tool where we write and run Hive queries.
• It looks like a terminal screen where we type HiveQL commands.
• It is used to create tables, load data, and run queries on big data stored in HDFS.
📝 Example:
You open Hive shell by typing hive in the terminal. Then you can write:
SELECT * FROM student;
✅ Hive Services :
Hive has several services that help it work smoothly. Main services are:
1. Driver
Manages query execution and keeps track of its progress.

2. Compiler
Checks your Hive query and converts it into a MapReduce job.
3. Metastore
Stores information (metadata) about Hive tables like names, columns, types, etc.
4. Execution Engine
Runs the query and fetches the result using MapReduce.

✅ What is Hive Metastore?


• Hive Metastore is like a library catalog for Hive.
• It stores all the information about Hive tables—like their names, columns, data types,
where data is stored, etc.
📌 Think of it as a database about your data.

Hive Metastore is a service that stores metadata about Hive tables, columns, data types, and HDFS locations. It helps Hive know how and where the data is stored.

✅ Comparison: Hive vs Traditional Database
• Hive is built for batch analytics on huge datasets in HDFS; a traditional database handles transactional (OLTP) workloads.
• Hive uses schema-on-read; a traditional database uses schema-on-write.
• Hive queries run as MapReduce jobs with high latency; database queries respond in real time.
• Hive has limited support for updates and deletes; a traditional database supports them fully.



✅ 1. What is HiveQL?
HiveQL (Hive Query Language) is a SQL-like language used to interact with Hive.
It helps to create tables, insert data, and run queries on large datasets stored in HDFS.
📌 Example:
SELECT name FROM students WHERE marks > 80;

✅ 2. What is a Hive Table?


A Hive table is like a virtual table where data is stored in HDFS.
It has rows and columns just like in SQL.
📝 Types:
i. Managed Table: Hive manages both data and metadata.
ii. External Table: Hive manages only metadata. Data remains outside.
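As a sketch, the two types could be created like this (table and column names are assumptions):
-- Managed table: Hive owns both the data and the metadata
CREATE TABLE students (name STRING, marks INT);

-- External table: Hive tracks only the metadata; the data stays at the given path
CREATE EXTERNAL TABLE students_ext (name STRING, marks INT)
LOCATION '/user/data/students';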

✅ 3. What is Partition in Hive?


Partition means dividing a table into smaller parts based on column values.
Helps in faster query performance by scanning only required parts.
📌 Example:
Partition a sales table by year:
• PARTITIONED BY (year INT)
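In a full table definition, the clause might appear like this (the other column names are assumptions):
-- each year's rows are stored in a separate partition directory in HDFS
CREATE TABLE sales (item STRING, amount DOUBLE)
PARTITIONED BY (year INT);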
✅ 4. What is Bucketing in Hive?
Bucketing further divides data inside a partition into equal-sized files (buckets) based on the
hash function.
Helps in faster joins and sampling.
📌 Example:
• CLUSTERED BY (student_id) INTO 4 BUCKETS;
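In a complete definition, it might look like this (column names are assumptions):
-- rows are hashed on student_id into 4 bucket files
CREATE TABLE students (student_id INT, name STRING)
CLUSTERED BY (student_id) INTO 4 BUCKETS;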

✅ 5. Storage Formats in Hive


Hive supports multiple file formats for storing data, such as TextFile (the default), SequenceFile, RCFile, ORC, Parquet, and Avro.

✅ 6. Sorting in Hive
• Sorting means arranging data in ascending or descending order.
• Done using ORDER BY (a total ordering of all data through a single reducer) or SORT BY (orders data only within each reducer).
📌 Example: SELECT * FROM student ORDER BY marks DESC;

✅ 7. Aggregating in Hive
Aggregation means using functions like COUNT, SUM, AVG, MAX, MIN to summarize data.
📌 Example: SELECT AVG(marks) FROM student;

✅ 8. Joins in Hive
Joins are used to combine rows from two or more tables based on a related column.
📌 Types:
• INNER JOIN – returns matching rows
• LEFT OUTER JOIN – returns all from left + matches from right
• RIGHT OUTER JOIN – returns all from right + matches from left
• FULL OUTER JOIN – all rows from both tables
Example :
SELECT s.name, m.marks
FROM students s
JOIN marks m ON s.id = m.student_id;

✅ 9. Subqueries in Hive
A subquery is a query inside another query.
It helps in filtering, grouping, or complex logic.
📌 Example:
SELECT name FROM student
WHERE marks > (SELECT AVG(marks) FROM student);

✅ What is HBase?
• HBase is a NoSQL database that runs on top of Hadoop.
• It is used to store and manage very large data (billions of rows) in a table format, just like an
Excel sheet — but distributed across many machines.
• It works well for real-time read and write of big data.
📌 Think of it as a giant Excel sheet spread across many computers!
✨ Features of HBase :
• Horizontally scalable – tables are automatically split into regions across servers.
• Real-time, random read/write access to big data.
• Strongly consistent reads and writes.
• Fault tolerant – data is stored on HDFS with replication.

✅ HBase Data Model :


HBase stores data in tables, just like SQL — but the structure is different and more
flexible.
📦 Basic Structure of HBase:

✅ HBase Data Model Components :
• Table – a collection of rows.
• Row Key – the unique identifier of each row.
• Column Family – a group of related columns, defined at table creation.
• Column Qualifier – the actual column name inside a family.
• Cell – the intersection of a row and a column; it stores the value.
• Timestamp – a version number attached to every cell value.

✅ Client Options for Interacting with HBase Cluster


There are many ways to interact with an HBase cluster:
1. HBase Shell – This is a command-line tool that lets us run commands to create
tables, insert data, read data, and manage the database easily.
2. Java API – Developers can use Java programming to connect with HBase and
perform read/write operations in their programs.
3. REST API – HBase can be accessed using web URLs, which is helpful for web
applications and services.
4. Thrift API – It allows other languages like Python, PHP, and C++ to connect with
HBase.
5. MapReduce – Hadoop's MapReduce can be used to process data stored in HBase
in large batches.
6. Hive Integration – Hive can be used to write SQL-like queries (HiveQL) on HBase
tables for easier data analysis.
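For example, a short HBase Shell session might look like this (table, column family, and values are illustrative):
create 'student', 'info'                      # table with one column family
put 'student', '1001', 'info:name', 'Priya'   # write a cell
get 'student', '1001'                         # read one row by Row Key
scan 'student'                                # read all rows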

✅ Difference between HBase and RDBMS :
• HBase is a column-oriented NoSQL store with a flexible schema; an RDBMS is row-oriented with a fixed schema.
• HBase scales horizontally across commodity machines; an RDBMS usually scales vertically.
• HBase has no built-in SQL, joins, or transactions; an RDBMS supports all of these.
• HBase suits very large, sparse datasets; an RDBMS suits structured, transactional data.



✅ Schema Design in HBase :


In HBase, designing the schema means deciding how to organize your data in tables. But
it’s very different from SQL databases.
• HBase is schema-less for columns — you only need to define column families, not
individual columns.
• Each row is identified by a Row Key — it should be unique and well-designed (like a roll
number or user ID).
• Column families group related columns (like student:name, student:marks).
• It’s important to group data that is usually accessed together into the same column
family.
• Avoid putting too many column families because each one is stored separately, which
slows down performance.
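A minimal sketch of such a design in the HBase Shell (table, family, and key names are assumptions):
# row key = roll number; related columns grouped into two families
create 'student', 'personal', 'academic'
put 'student', 'roll_1003', 'personal:name', 'Priya'
put 'student', 'roll_1003', 'academic:marks', '91'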



✅ What is Indexing in HBase?


In HBase, data is stored and searched based on Row Keys only.
That means:
If you know the Row Key, data retrieval is very fast.
But if you want to search by some other column, like "name" or "city", it becomes slow —
because HBase doesn't create indexes on those columns by default.

✅ What is Advanced Indexing?


Advanced Indexing means creating a secondary index (extra structure) to make searching
faster by non-key columns.
This helps you search HBase tables like SQL-style queries:
• Search by name, email, or age, not just Row Key.
Big Data(BCS061/BCDS-601

✅ Example :
Suppose you have an HBase table Student:

If you want:
"Find student whose Name = Priya"
➡️ This is slow, because HBase will check each row one by one (called a full scan).
We can create a Secondary Index Table:

Now:
First, you search in the index table using "Priya" → it gives you 1003.
Then, go to the main table with 1003 → get full student data.
✅ Faster than full table scan.
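In HBase Shell terms, the two-step lookup might look like this (both table layouts are assumptions):
# the index table maps Name -> Row Key of the main table
get 'student_name_idx', 'Priya'   # returns the row key 1003
get 'student', '1003'             # fetch the full student row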
✅ Short Note on ZooKeeper and Its Role in Monitoring a Cluster
ZooKeeper is a tool used in Hadoop and HBase systems to manage and coordinate
different machines (nodes) in a cluster.
It helps in:
• Tracking node status: ZooKeeper keeps an eye on which servers are active and which
are down.
• Leader election: If the main/master server fails, ZooKeeper helps to choose a new
leader automatically.
• Communication: It helps all nodes in the cluster talk to each other smoothly.
• Fail recovery: When a server fails, ZooKeeper informs the system so it can recover
quickly.

• ZooKeeper makes sure that the cluster runs smoothly, with less downtime and better
coordination.
✅ IBM Big Data Strategy :
IBM's Big Data strategy focuses on helping businesses use their data in a smart way to make
better decisions, faster.
IBM believes that Big Data is not just about collecting a lot of data, but about using that
data to get useful insights.
✅ Key Points of IBM’s Big Data Strategy:
1. Volume, Variety, Velocity:
IBM handles all types of data – big in size, different in format (text, video, etc.), and
coming at high speed.
2. Unified Platform:
IBM provides a complete platform where you can store, manage, analyze, and visualize
your data in one place.

3. Infosphere BigInsights:
IBM offers this tool to process and analyze Big Data using Hadoop technology.
4. Big SQL:
You can use SQL queries to analyze big data easily, even if it’s stored in Hadoop.
5. Security and Governance:
IBM ensures that data is safe, secure, and managed properly, with proper rules.
6. Integration with AI and Cloud:
IBM connects Big Data with AI (Watson) and Cloud to provide real-time intelligence
and smart decisions.

✅ 1. InfoSphere (by IBM)


InfoSphere is a set of IBM tools that helps in:
• Collecting, managing, and analyzing big data.
• It makes sure data is clean, organized, and ready to be used in analytics.
• It supports data integration, data quality, and data governance.
📌 In Easy Words:
InfoSphere is IBM’s tool to manage big data properly so companies can trust and use their
data easily.
✅ 2. BigInsights
BigInsights is IBM’s platform for working with Big Data using Hadoop.
• It is built on Apache Hadoop but has extra features like better security, analytics, and a
user-friendly interface.
• Helps to process large data and get useful results.
• Includes tools for developers, data scientists, and business users.

📌 In Easy Words:
BigInsights is IBM’s software that adds more power and features to Hadoop for better
big data processing.

✅ 3. BigSheets
BigSheets is a tool in BigInsights that looks like Excel but works on Big Data.
• It allows users to analyze large datasets without coding.
• You can filter, sort, group, and visualize big data using an easy spreadsheet-style interface.
• Great for business users who don’t know programming.
📌 In Easy Words:
BigSheets is like Excel for Big Data. It helps non-technical people explore and analyze big data
in a simple way.

✅ What is BigSQL?
BigSQL is a tool by IBM that lets you use SQL queries to work with Big Data stored in Hadoop.
• Just like we use SQL for normal databases (like MySQL, Oracle),
• with BigSQL, we can write the same SQL queries to read data from Hadoop (HDFS), Hive, or HBase.
📌 In Easy Words:
BigSQL helps you use familiar SQL language to work with huge data stored in big data systems
like Hadoop.
✅ Key Features of BigSQL
• ✅ Works with standard SQL
• ✅ Can access data from Hive, HDFS, HBase
• ✅ Faster and more efficient than using Hive alone
• ✅ Supports joins, subqueries, sorting, grouping
• ✅ Provides security and governance features

✅ How BigSQL Works?


1. 📝 You write SQL queries, like:
SELECT * FROM customers WHERE city = 'Lucknow';
2. ⚙ BigSQL takes your SQL and translates it into commands that Hadoop can understand.
3. 🗃 It fetches data from different big data sources like HDFS, Hive tables, or HBase.
4. ⚡ It processes the data using a powerful engine (faster than normal Hive).
5. 📄 It returns results just like a normal SQL database does.


Thank You….
