
DBAS31064 Big Data Storage Management

Group Project
ELK (Elasticsearch, Logstash, Kibana)
Due Date: 6th December 2023
Introduction
In this project you will work with NYC OpenData published by the City of New York, covering 311 service requests collected since 2010: over 34 million rows and 41 columns.
https://nycopendata.socrata.com/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9
This project has the following objectives:
1. To further expose you to the ELK stack as an analytics tool for analyzing realistic streaming big
data.
2. To give you experience working with open-ended problems, similar to those you will face in your
career as a big data professional.
3. By the end of this project you should:
a. Be confident creating Logstash configuration files and building Elasticsearch indices,
advanced queries, charts, maps and dashboards in Kibana; that is, fully using the ELK stack
in realistic big data scenarios.
b. Have an appetite for working with large streaming datasets.
c. Be aware of the potential and benefits of analyzing large streaming datasets with big data
tools.
What is expected from you?
1. You should demonstrate a good understanding of the features the ELK stack provides, such as
advanced queries, indexing, managing tables, aggregations, charts/graphs, maps and dashboards.
2. You should demonstrate your ability to work with multiple large datasets and to use appropriate
big data tools to gain valuable insights from them. You should also be able to present these
insights in a manner that is easily consumable by stakeholders and other interested parties.
Problem Background
You have been hired as a data analyst by the City of New York to gain valuable insights from its
huge 311 service request dataset. Your task is to use the ELK stack hosted on a cloud platform
such as GCP. Successful completion of this task includes: creating a Logstash configuration file as
well as a geo-point template (for maps); provisioning a GCP instance to host a fully installed,
configured and functional ELK stack; running Logstash to ingest and process the NYC 311 service
request data into Elasticsearch (e.g., cleaning, mutating and converting the data; in short, data
preparation); and using Kibana to analyze and visualize the results for the questions given below.
This is a group project with a grade weight of 30% of your final score. The required deliverables
are: the code for your Logstash configuration file and geo-point template, plus the results for the
analytical questions (tables, charts, tag clouds, maps and a dashboard) in an MS Word or PDF
document. Where applicable, show the syntax/code or capture screenshots for all your analysis.
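As a sketch of the geo-point template deliverable, an index template along the following lines could map a combined latitude/longitude field to Elasticsearch's geo_point type. The index name nyc311 and the field name geo_location are assumptions for illustration; your Logstash filter would need to populate that field from the dataset's Latitude and Longitude columns. This uses the composable index template API available in Elasticsearch 7.8 and later (older versions use the legacy _template API):

```
PUT _index_template/nyc311_geo
{
  "index_patterns": ["nyc311*"],
  "template": {
    "mappings": {
      "properties": {
        "geo_location": { "type": "geo_point" }
      }
    }
  }
}
```

Loading the template before the first ingest matters: if Elasticsearch auto-maps the field as text instead, it cannot be used in Kibana map visualizations.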
Analytical Questions
1. Create a table showing the top 10 cities with the highest call counts, alongside the counts of the
top 10 complaint calls (by Descriptor) in each city.

2. Create a pie chart showing the top 5 cities with the highest call counts, alongside the top five
calls (by Descriptor) in each city.
3. Create a tag cloud representing the top 20 call descriptors.
4. Create a coordinate map of all the major call descriptors in each city.
5. Create a dashboard combining all the visualizations from 1 to 4 above.
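For question 1, one way to sanity-check your numbers outside Kibana's visualization editor is a nested terms aggregation in the Dev Tools console. This is a sketch under assumptions: the index is named nyc311, and the City and Descriptor columns were indexed with keyword sub-fields (field names may differ in your mapping):

```
GET nyc311*/_search
{
  "size": 0,
  "aggs": {
    "top_cities": {
      "terms": { "field": "city.keyword", "size": 10 },
      "aggs": {
        "top_descriptors": {
          "terms": { "field": "descriptor.keyword", "size": 10 }
        }
      }
    }
  }
}
```

The Kibana data table for question 1 is built from the same two bucket aggregations: a terms split on city followed by a terms split on descriptor.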

Note:
1. 10% of the project weight is awarded for integrating Kafka into your ELK stack.
2. You do not have to download the dataset directly onto your instances; instead, use the Logstash
http_poller input plugin to ingest the data directly from the available API endpoints. You should
still view and explore the dataset on the website above: knowing and understanding your dataset will
certainly help you do better analysis.
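A minimal sketch of an http_poller-based pipeline is shown below. The Socrata JSON endpoint for dataset erm2-nwe9, the $limit page size, the schedule, the field names latitude/longitude and the Elasticsearch host are all illustrative assumptions to be adapted to your own setup (in particular, a single $limit query only fetches one page of a 34-million-row dataset, so you will need a paging strategy):

```
input {
  http_poller {
    urls => {
      # Assumed Socrata SODA endpoint for dataset erm2-nwe9 (JSON output)
      nyc311 => "https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=50000"
    }
    schedule => { every => "10m" }
    request_timeout => 120
    # The json codec emits one event per element of the returned array
    codec => "json"
  }
}

filter {
  mutate {
    # Combine the latitude/longitude columns into a single "lat,lon"
    # string that a geo_point-mapped field can parse
    add_field => { "geo_location" => "%{latitude},%{longitude}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "nyc311"
  }
}
```

The add_field step assumes the geo-point template deliverable maps geo_location to the geo_point type; without that mapping, the field will be indexed as plain text.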
