Bigdata Ass2

MapReduce is a programming model and processing framework for distributed big data processing and generation. It was popularized by Google and is commonly used with Apache Hadoop for big data processing, batch processing, log analysis, search engines, and recommendation systems due to its scalability, fault tolerance, simplicity, and flexibility. Pig and Hive are query languages used for data processing on Hadoop - Pig operates on the client side and uses a procedural language while Hive operates on the server side and uses a SQL-like language mainly used by data analysts.

Uploaded by

anuragmodi018

Que 1

MapReduce is a programming model and processing framework designed to process and
generate large volumes of data in a parallel and distributed computing environment.
It was popularized by Google and is commonly associated with Apache Hadoop.
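
To make the model concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps, using word counting as the example task. It is only an illustration: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are assumptions made for this sketch, not part of Hadoop's API, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster, each line (or block of lines) would be mapped on a
    # different node; here everything runs in one process for illustration.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]  # made-up input
    print(word_count(sample))
```

Because each map call looks at only one line and each reduce call at only one key, the work can be split across machines, and a failed piece can simply be rerun.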

Where it is used:

Big Data Processing
Batch Processing
Log Analysis
Search Engines
Recommendation Systems
Data Transformation

Why it is used:

Scalability
Fault Tolerance
Simplicity
Flexibility
-----------------------------------------------------------------------------------
-----------------------------------------------------------

Que 2
S.No. | Pig                                                    | Hive
1.    | Pig operates on the client side of a cluster.          | Hive operates on the server side of a cluster.
2.    | Pig uses the Pig Latin language.                       | Hive uses the HiveQL language.
3.    | Pig is a procedural data flow language.                | Hive is a declarative, SQL-like language.
4.    | It was developed by Yahoo.                             | It was developed by Facebook.
5.    | It is used by researchers and programmers.             | It is mainly used by data analysts.
6.    | It is used to handle structured and semi-structured data. | It is mainly used to handle structured data.
7.    | It is used for programming.                            | It is used for creating reports.
8.    | Pig scripts end with the .pig extension.               | In Hive, all extensions are supported.
9.    | It does not support partitioning.                      | It supports partitioning.
10.   | It loads data quickly.                                 | It loads data slowly.
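
The procedural versus declarative difference in row 3 can be sketched in Python. This is an analogy only, not Pig Latin or HiveQL: the first function spells out each data-flow step in order, in the spirit of a Pig script, while the second states the wanted result in one SQL query, in the spirit of Hive, using Python's built-in sqlite3 module as a stand-in. The visits table, its columns, and the sample rows are made up for this sketch.

```python
import sqlite3
from collections import defaultdict

# Made-up sample rows: (visitor, page)
ROWS = [("ann", "home"), ("bob", "home"), ("ann", "cart"), ("ann", "home")]


def procedural_counts(rows):
    """Data-flow style: load, filter, group, and count as explicit steps."""
    loaded = list(rows)                                # step 1: load
    filtered = [r for r in loaded if r[1] == "home"]   # step 2: filter
    groups = defaultdict(list)                         # step 3: group by visitor
    for visitor, page in filtered:
        groups[visitor].append(page)
    # step 4: count the rows in each group
    return {visitor: len(pages) for visitor, pages in groups.items()}


def declarative_counts(rows):
    """Declarative style: describe the result and let the engine plan the steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (visitor TEXT, page TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    result = dict(conn.execute(
        "SELECT visitor, COUNT(*) FROM visits WHERE page = 'home' GROUP BY visitor"
    ).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(procedural_counts(ROWS))
    print(declarative_counts(ROWS))
```

Both functions return the same counts; what differs is who decides the order of the steps, the programmer or the query engine.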

-----------------------------------------------------------------------------------
--------------------------------------------------------

Que 3

NoSQL, or "Not Only SQL," is a type of database management system designed for
handling large volumes of unstructured or semi-structured data. It provides
flexible data models, horizontal scalability, and high performance, making it well-
suited for modern, data-intensive applications. NoSQL databases come in various
forms, including document, key-value, column-family, and graph databases, each
tailored to specific use cases.

Variations of NoSQL:

Document Databases: These store data in semi-structured documents.

Key-Value Stores: Data is stored as key-value pairs.

Column-Family Stores: Data is organized into column families rather than tables.

Graph Databases: These are optimized for storing and querying graph-like data
structures.

Wide-Column Stores: Another common name for column-family stores; designed for
large-scale data with high write throughput.

Object Databases: These store data in the form of objects, allowing for complex
data structures.
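
As a rough illustration of how these data models differ, the sketch below writes one hypothetical user record in four of the shapes listed above, using plain Python dictionaries. No real database is involved; the record contents, key formats, and helper names (city_from_key_value, followers_of) are assumptions made for this example.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:1": '{"name": "Asha", "city": "Pune"}'}

# Document store: one semi-structured, possibly nested document per record.
document = {"_id": 1, "name": "Asha", "address": {"city": "Pune"}, "tags": ["admin"]}

# Column-family store: row key -> column family -> columns.
column_family = {"row1": {"profile": {"name": "Asha"}, "location": {"city": "Pune"}}}

# Graph store: nodes plus labeled edges between them.
graph = {"nodes": {1: "Asha", 2: "Ravi"}, "edges": [(1, "FOLLOWS", 2)]}


def city_from_key_value(store, key):
    """The store only knows the key; the application must decode the value."""
    return json.loads(store[key])["city"]


def followers_of(g, node_id):
    """Graph query: which nodes have a FOLLOWS edge pointing at node_id?"""
    return [src for src, rel, dst in g["edges"] if rel == "FOLLOWS" and dst == node_id]


if __name__ == "__main__":
    print(city_from_key_value(key_value, "user:1"))
    print(document["address"]["city"])
    print(column_family["row1"]["location"]["city"])
    print(followers_of(graph, 2))
```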
-----------------------------------------------------------------------------------
----------------------------

Que 4
The Hadoop ecosystem is a collection of open-source software tools and frameworks
for distributed storage and processing of big data. The following components
collectively form the Hadoop ecosystem:

HDFS: Hadoop Distributed File System
YARN: Yet Another Resource Negotiator
MapReduce: Programming-based data processing
Spark: In-memory data processing
Pig, Hive: Query-based processing of data services
HBase: NoSQL database
Mahout, Spark MLlib: Machine learning algorithm libraries
Solr, Lucene: Searching and indexing
ZooKeeper: Managing the cluster
Oozie: Job scheduling

-----------------------------------------------------------------------------------
---------------
Que 5
Social network mining refers to the process of extracting insights and patterns
from social media data. It involves collecting and analyzing user-generated content
from platforms like Facebook, Twitter, and Instagram.

Applications:

Sentiment Analysis: Assessing public sentiment towards products, brands, or events.
Marketing and Advertising: Targeting audiences and optimizing ad campaigns.
Recommendation Systems: Personalizing content and product recommendations.
Crisis Management: Monitoring social media during crises and providing support.
Competitive Intelligence: Analyzing competitor strategies and consumer perceptions.
Influencer Marketing: Identifying and collaborating with social media influencers.
Political Analysis: Analyzing public opinion and trends during elections.
Healthcare: Tracking disease outbreaks and public health concerns.
Security and Fraud Detection: Identifying security threats and fraud.
Content Creation: Identifying popular topics for content creation.
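
As one small example of the applications above, sentiment analysis can be sketched with a toy word-list approach: count positive and negative words in a post and compare the totals. This is a deliberately simple assumption-laden sketch, not how production systems work; the word lists, the function names (sentiment_score, label), and the sample posts are all made up for illustration.

```python
import re

# Tiny hand-picked word lists, assumed only for this example.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment_score(post):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(post):
    """Turn the numeric score into a coarse sentiment label."""
    score = sentiment_score(post)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    posts = [  # made-up posts
        "I love this phone, great camera!",
        "Terrible battery, bad support",
        "It arrived on Monday",
    ]
    for post in posts:
        print(label(post), "-", post)
```

A word-list scorer like this misses negation and sarcasm, which is why real pipelines usually rely on trained models instead.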
