
Big Data1 Project Updated With Scenario Winter

This project requires students to: 1. Analyze business data from a selected company using HDFS, Hive, Zeppelin, and HBase, presenting the insights as a report or as graphical representations. 2. Give a 15-20 minute presentation explaining the analysis, the results, and how the different technologies were used. An example scenario is provided: a healthcare analysis of hospital patient data that involves creating HBase tables, analyzing the data with Hive and Zeppelin, and presenting insights with charts.

Uploaded by

Muhammad Mazarib
Copyright © All Rights Reserved

Big Data1 Class Project

This is a team project; teams can have between 2 and 4 people.

1. The first part of this project is to select a company or business and demonstrate the use of data by applying the different topics we have covered to date to show a business result. This could take the form of a report or a series of graphical representations that give insight into the business activities of that organization.
2. The second part of this project is to prepare a PowerPoint slide presentation. You will be given a 15-20 minute window to present to the class the business, what it does, and what your analysis found, explaining the results from the first part of the project and why you used the different technologies in the process. A demo after the presentation could be beneficial.

Project Details

For section 1, the expectation is to use all of the following tools:

1. HDFS
2. Hive
3. Zeppelin
4. HBase

Select any business type (from health care to government to a software company, etc.), then research it to determine what you want to show insight into. For example, a bank may want to show the uptake of user accounts year over year or month over month, or compare the different types of accounts to see which ones are growing or shrinking. You can find or create the dataset or datasets pertinent to your business to be used in exercising the above components.
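As a rough illustration of the bank example, the sketch below computes year-over-year growth in newly opened accounts per account type. It uses a small in-memory table. The record layout `(year, account_type, new_accounts)`, the function name `yoy_growth`, and every figure are hypothetical placeholders, not real bank data. In the project itself this aggregation would run in Hive or Zeppelin; here it is plain Python only to show the arithmetic.

```python
from collections import defaultdict

# Hypothetical sample data: (year, account_type, new_accounts_opened).
# These numbers are invented for illustration only.
RECORDS = [
    (2022, "chequing", 1200),
    (2022, "savings", 800),
    (2023, "chequing", 1500),
    (2023, "savings", 600),
]


def yoy_growth(records):
    """Return {account_type: {period: % change vs the previous period present}}."""
    # Total new accounts per (account_type, period).
    totals = defaultdict(dict)
    for period, account_type, count in records:
        totals[account_type][period] = totals[account_type].get(period, 0) + count

    growth = {}
    for account_type, by_period in totals.items():
        periods = sorted(by_period)
        growth[account_type] = {}
        # Compare each period with the one before it in sorted order.
        for prev, curr in zip(periods, periods[1:]):
            before, after = by_period[prev], by_period[curr]
            if before:  # skip a zero baseline to avoid division by zero
                growth[account_type][curr] = round((after - before) / before * 100, 1)
    return growth


if __name__ == "__main__":
    for account_type, by_period in yoy_growth(RECORDS).items():
        for period, pct in by_period.items():
            trend = "growing" if pct > 0 else "shrinking" if pct < 0 else "flat"
            print(f"{account_type} {period}: {pct:+.1f}% ({trend})")
```

With the sample data, chequing shows +25.0% (growing) and savings shows -25.0% (shrinking). You can get a month-over-month view from the same function by keying the records with a `"YYYY-MM"` string instead of the year, because those strings sort chronologically.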

For section 2, the presentation should show the content and a description of what you did, using screenshots of each stage in your process through to the end result.

Assignment Mark Breakdown

1. Usage of the Hadoop components - 40%
a. HDFS
b. Hive
c. Zeppelin
i. DataFrames (Scala)
ii. SQL
iii. Visualization charts (multiple)
d. HBase
2. Usage of data, including visualization - 30%
3. Presentation - 10%
4. Originality and ingenuity - 20%
Example Scenario
There are many use cases to choose from (banking, sales, manufacturing, health care, pharma). Below is an example of a scenario that engages all aspects of the technology we have touched on so far:

1. Health Care - Hospital analysis of patient data

a. Number of critical, severe, and mild accidents in a year
b. Create an HBase table that shows patients with differing columns in a column family, for example:
i. One group of patients has name, address, accident, severity
ii. Another group of patients has name, accident, severity, medication, addict
c. Once you have the HBase table or tables that capture this semi-structured data, show how the data can be seen in Hive by mapping a Hive table onto the HBase structure. You can then dump the Hive data into a flat file or CSV, which in turn is loaded into a DataFrame to be analyzed in Zeppelin.
d. Load the CSV or flat file in Zeppelin using Scala and manipulate it with SQL.
e. Show how many accident patients were admitted to the hospital as opposed to how many were released the same day, based on the severity of the accident.
f. Show these results in charts. Examples include a month-to-month breakdown of accidents, a line graph of accidents in correlation with weather patterns, or a breakdown by age and gender, also as line graphs.
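Before moving to the cluster, you could prototype the scenario's data flow in plain Python. The sketch below is a stand-in, not HBase, Hive, or Spark code.

- Each patient row is a dict of `family:qualifier` to value. This mimics how rows in an HBase column family (here a hypothetical `info` family) can hold different columns for different patients.
- `flatten_to_csv` turns those sparse rows into a CSV whose header is the union of all columns, with empty cells where a patient lacks a column. In the real project, the Hive-over-HBase mapping plus the export to a flat file plays this role.
- `analyse` reads the CSV back and computes the counts behind steps a, e, and f.

The column names `admitted` and `date`, the helper function names, and all sample values are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sparse rows keyed by row key, mimicking an HBase table with a
# single column family "info" whose columns differ from patient to patient.
# All names and values are invented for illustration.
HBASE_LIKE_ROWS = {
    "p001": {"info:name": "A. Smith", "info:address": "12 Elm St",
             "info:accident": "car", "info:severity": "critical",
             "info:admitted": "yes", "info:date": "2023-01-14"},
    "p002": {"info:name": "B. Jones", "info:accident": "fall",
             "info:severity": "mild", "info:medication": "ibuprofen",
             "info:addict": "no", "info:admitted": "no",
             "info:date": "2023-01-20"},
    "p003": {"info:name": "C. Lee", "info:accident": "bike",
             "info:severity": "severe", "info:admitted": "yes",
             "info:date": "2023-02-03"},
}


def flatten_to_csv(rows):
    """Write sparse rows as CSV text; columns a row lacks become empty cells."""
    # Union of all column qualifiers, with the column-family prefix stripped.
    columns = sorted({col.split(":", 1)[1] for row in rows.values() for col in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["row_key"] + columns, restval="")
    writer.writeheader()
    for key in sorted(rows):
        flat = {col.split(":", 1)[1]: value for col, value in rows[key].items()}
        writer.writerow({"row_key": key, **flat})
    return buf.getvalue()


def analyse(csv_text):
    """Count accidents by (year, severity), outcome by severity, and by month."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    # Step a: accidents per severity level per year ("YYYY" prefix of the date).
    by_year_severity = Counter((r["date"][:4], r["severity"]) for r in records)
    # Step e: admitted vs released the same day, split by severity.
    outcome = Counter(
        (r["severity"], "admitted" if r["admitted"] == "yes" else "released same day")
        for r in records
    )
    # Step f: month-to-month breakdown ("YYYY-MM" prefix of the date).
    per_month = Counter(r["date"][:7] for r in records)
    return by_year_severity, outcome, per_month


if __name__ == "__main__":
    csv_text = flatten_to_csv(HBASE_LIKE_ROWS)
    print(csv_text)
    for result in analyse(csv_text):
        print(dict(result))
```

In Zeppelin, the same aggregations would more likely be written as Spark DataFrame operations in Scala or as Spark SQL queries over the loaded CSV. The per-month counts are the kind of series you would then feed into a chart.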
