Facebook Data Analysis Using Hadoop and Hive
Project Proposal
By
A project submitted in partial fulfillment of the requirements for the Bachelor's degree in
Computer Science
1. Introduction
   1.1. Background
   1.2. Problem statement
   1.3. Objective
   1.4. Scope
2. Literature Review
   2.1. Technology used
   2.2. Case studies
3. Methodology
   3.1. Data collection
   3.2. Data Analysis
   3.3. Physical design
      3.3.1. Database design
      3.3.2. Interface design
4. Project Milestones, Timelines and Deliverables
5. Estimated Project cost
6. References
1. Introduction
1.1 Background
Facebook is one of the largest social media platforms, with over 2.9 billion active users generating
vast amounts of structured and unstructured data daily. Understanding user engagement, trending
topics, and content performance requires big data technologies that can handle high-velocity,
high-volume data. Traditional relational database management systems (RDBMS) struggle to store
and process such massive datasets efficiently.
To address this challenge, Hadoop and Hive provide a distributed and scalable solution for
handling big data. Hadoop’s ecosystem enables efficient storage and processing of massive
datasets, while Hive simplifies querying and analysis through an SQL-like interface. This project
analyzes Facebook data using Hadoop and Hive to extract meaningful insights,
such as user engagement patterns, trending topics, and content popularity. It aims to
demonstrate how big data technologies can enhance social media (Facebook) analysis by
improving data processing speed, scalability, and accuracy.
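As a rough illustration of the SQL-like analysis described above, the sketch below ranks topics by total engagement. It is only a stand-in: Python's built-in sqlite3 module is used in place of Hive so that the example runs without a cluster, and the `posts` table, its columns, and the sample rows are hypothetical rather than taken from the project's dataset. A comparable GROUP BY query could be written in HiveQL with minor changes.

```python
import sqlite3

# Hypothetical sample rows: (post_id, topic, likes, comments, shares).
SAMPLE_POSTS = [
    (1, "sports", 120, 30, 10),
    (2, "music", 80, 12, 5),
    (3, "sports", 200, 45, 25),
    (4, "news", 60, 20, 8),
    (5, "music", 150, 40, 20),
]

# Aggregation query; sqlite3 stands in for Hive in this sketch.
ENGAGEMENT_QUERY = """
    SELECT topic,
           COUNT(*) AS post_count,
           SUM(likes + comments + shares) AS total_engagement
    FROM posts
    GROUP BY topic
    ORDER BY total_engagement DESC
"""


def engagement_by_topic(rows):
    """Load rows into an in-memory table and rank topics by engagement."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE posts (post_id INTEGER, topic TEXT, "
            "likes INTEGER, comments INTEGER, shares INTEGER)"
        )
        conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(ENGAGEMENT_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for topic, post_count, total in engagement_by_topic(SAMPLE_POSTS):
        print(f"{topic}: {post_count} posts, {total} interactions")
```

In the actual project the same kind of query would run against a Hive table stored in HDFS, so the work is distributed across the cluster instead of being done in a single process.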
1.3 Objective
The primary objectives of this project are:
1.4 Scope
2. Literature Review
2.1 Technology used
Hadoop: For distributed storage and processing.
Hive: For querying large datasets using SQL-like syntax.
VMware:
Python:
Linux:
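To make the distributed processing role of Hadoop more concrete, the following minimal sketch imitates the map, shuffle, and reduce phases of a MapReduce job in plain Python, counting hashtags to find trending topics. It runs in a single process and does not use Hadoop itself; the sample post texts and all function names are hypothetical and serve only to illustrate the processing model.

```python
from collections import defaultdict

# Hypothetical post texts; a real job would read its input from HDFS.
SAMPLE_TEXTS = [
    "Great match today #sports #football",
    "New album out now #music",
    "What a goal #football",
    "Concert tonight #music #live",
    "Final score update #football #sports",
]


def map_phase(text):
    """Emit a (hashtag, 1) pair for every hashtag in one post."""
    for word in text.split():
        if word.startswith("#") and len(word) > 1:
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one hashtag."""
    return key, sum(values)


def trending_hashtags(texts, top_n=3):
    """Run the three phases in sequence and return the most used hashtags."""
    pairs = (pair for text in texts for pair in map_phase(text))
    grouped = shuffle_phase(pairs)
    counts = [reduce_phase(key, values) for key, values in grouped.items()]
    # Sort by count (descending), then by hashtag for a deterministic order.
    return sorted(counts, key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    for hashtag, count in trending_hashtags(SAMPLE_TEXTS):
        print(hashtag, count)
```

On a Hadoop cluster the map and reduce steps would run in parallel on many nodes, with the framework handling the grouping step between them.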
2.2 Case studies
3. Methodology
3.1 Data collection
3.2 Data Analysis
3.3 Physical design
3.3.1 Database design
3.3.2 Interface design
4. Project Milestones, Timelines and Deliverables
5. Estimated Project cost
6. References