Facebook Data Analysis Using Hadoop and Hive

Project Proposal

By

Full name (member 1)


(Member 2)
(Member 3)
(Member 4)

A project submitted in partial fulfillment of the requirements for the degree of Bachelor of
Computer Science

Examination Committee: Mr. Samiullah Hassan (Main Advisor)

Mr. Dilawar Shah Zwakman (Co-Advisor)

Mr. Javid Hamdard (Co-Advisor)

Shaikh Zayed University


Computer Science Faculty
Information Technology Department
Month Year 1403
Table of Contents

1. Introduction
1.1. Background
1.2. Problem statement
1.3. Objective
1.4. Scope
2. Literature Review
2.1. Technology used
2.2. Case studies
3. Methodology
3.1. Data collection
3.2. Data analysis
3.3. Physical design
3.3.1. Database design
3.3.2. Interface design
4. Project Milestones, Timelines and Deliverables
5. Estimated Project cost
6. References
1. Introduction
1.1 Background
Facebook is one of the largest social media platforms, with over 2.9 billion active users generating
vast amounts of structured and unstructured data daily. Understanding user engagement, trending
topics, and content performance requires big data technologies that can handle high-velocity,
high-volume data. Traditional relational database management systems (RDBMS) and other
conventional SQL-based systems struggle to process such massive datasets efficiently. This project
focuses on using Hadoop and Hive to perform large-scale data analysis on Facebook datasets.

1.2 Problem statement


With the rapid growth of social media platforms, Facebook has become one of the largest sources of
user-generated data. This data includes posts, comments, reactions, and user interactions, which
hold valuable insights for businesses, researchers, and data analysts. However, traditional
database systems struggle to process and analyze such large-scale unstructured data efficiently
because of their limited scalability and performance.

To address this challenge, Hadoop and Hive provide a distributed and scalable solution for
handling big data. Hadoop’s ecosystem enables efficient storage and processing of massive
datasets, while Hive simplifies querying and analysis using an SQL-like interface. This project
focuses on analyzing Facebook data using Hadoop and Hive to extract meaningful insights,
such as user engagement patterns, trending topics, and content popularity. The project aims to
demonstrate how big data technologies can enhance social media (Facebook) analysis by
improving data processing speed, scalability, and accuracy.
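
To make this division of labour concrete, the following is a minimal Python sketch of the map, shuffle, and reduce phases that a Hive `GROUP BY` query is typically compiled into when Hive runs on MapReduce. The record fields (`post_type`, `likes`, `comments`, `shares`), the table name `facebook_posts` in the comment, and the sample rows are illustrative assumptions; the proposal does not yet define the dataset schema.

```python
from collections import defaultdict

# Hypothetical sample rows; the real dataset and schema are not defined in this proposal.
POSTS = [
    {"post_id": "p1", "post_type": "photo", "likes": 120, "comments": 14, "shares": 6},
    {"post_id": "p2", "post_type": "video", "likes": 300, "comments": 40, "shares": 25},
    {"post_id": "p3", "post_type": "photo", "likes": 80, "comments": 6, "shares": 4},
    {"post_id": "p4", "post_type": "status", "likes": 15, "comments": 3, "shares": 0},
]

# Roughly equivalent HiveQL (table and column names are assumed):
#   SELECT post_type, SUM(likes + comments + shares) AS engagement
#   FROM facebook_posts
#   GROUP BY post_type;


def map_phase(post):
    """Emit one (key, value) pair per post: (post_type, engagement)."""
    engagement = post["likes"] + post["comments"] + post["shares"]
    return post["post_type"], engagement


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def engagement_by_type(posts):
    """Run the three phases in sequence on an in-memory list of posts."""
    return reduce_phase(shuffle_phase(map_phase(p) for p in posts))


if __name__ == "__main__":
    for post_type, total in sorted(engagement_by_type(POSTS).items()):
        print(post_type, total)
```

On a real cluster the map and reduce steps would run in parallel across the nodes that hold the data blocks; the sketch runs them in a single process only to show the data flow.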

1.3 Objective
The primary objectives of this project are:

• To collect and preprocess Facebook data.
• To store data efficiently using HDFS (Hadoop Distributed File System).
• To use Apache Hive for SQL-like querying of big data.
• To analyze user engagement trends, post popularity, and sentiment.
• To evaluate the performance of Hadoop and Hive in big data analysis.
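
As an illustration of the first two objectives, the sketch below shows one possible preprocessing step: cleaning raw post records into tab-separated lines that could be copied into HDFS and read by a Hive table declared with tab-delimited fields. The field names, the column order, and the cleaning rules (dropping records without an ID, treating unparseable counts as zero) are assumptions made for this example, not decisions taken in the proposal.

```python
def clean_record(raw):
    """Return a cleaned row as a list of strings, or None if the record is unusable.

    Assumed column order: post_id, post_type, message, likes, comments, shares.
    """
    post_id = str(raw.get("post_id") or "").strip()
    if not post_id:
        return None  # a post without an identifier cannot be counted reliably
    post_type = str(raw.get("post_type") or "").strip().lower() or "unknown"
    # Collapse tabs and newlines so the message cannot break the row format.
    message = " ".join(str(raw.get("message") or "").split())
    counts = []
    for field in ("likes", "comments", "shares"):
        try:
            counts.append(str(max(int(raw.get(field) or 0), 0)))
        except (TypeError, ValueError):
            counts.append("0")  # assumption: unparseable counts are treated as zero
    return [post_id, post_type, message] + counts


def to_tsv(records):
    """Serialize the usable records as tab-separated lines, one post per line."""
    rows = (clean_record(r) for r in records)
    return "\n".join("\t".join(row) for row in rows if row is not None)


if __name__ == "__main__":
    sample = [
        {"post_id": " p1 ", "post_type": "Photo", "message": "Hello\tworld\n!",
         "likes": "12", "comments": None, "shares": "x"},
        {"post_id": "", "message": "record without an ID is dropped"},
    ]
    print(to_tsv(sample))
```

The resulting file could then be uploaded with `hdfs dfs -put` and exposed through a Hive table created with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`; the exact table definition depends on the schema the project eventually adopts.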

1.4 Scope
2. Literature Review
2.1 Technology used
• Hadoop: For distributed storage and processing.
• Hive: For querying large datasets using SQL-like syntax.
• VMware:
• Python:
• Linux:

2.2 Case studies
3. Methodology
3.1 Data collection
3.2 Data analysis
3.3 Physical design
3.3.1 Database design
3.3.2 Interface design
4. Project Milestones, Timelines and Deliverables
5. Estimated Project cost
6. References

