
DBAS31064 Big Data Storage Management

Group Project
ELK (Elasticsearch, Logstash, Kibana)
Due Date: 6th December 2023
Introduction
In this project you will work with NYC OpenData published by the City of New York, covering 311 service requests collected since 2010: over 34 million rows and 41 columns.
https://nycopendata.socrata.com/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9
This project has the following objectives:
1. To further expose you to the ELK stack as an analytics tool for analyzing realistic streaming big
data.
2. To give you experience working with open-ended problems, similar to those you will face in your
career as a big data professional.
3. By the end of this project you should:
a. Be confident creating Logstash configuration files and building Elasticsearch indices,
advanced queries, charts, maps and dashboards in Kibana; that is, fully using the ELK stack
in realistic big data scenarios.
b. Have an appetite for working with large streaming datasets.
c. Be aware of the potential and benefits of analyzing large streaming datasets with big data
tools.
What is expected from you?
1. You should demonstrate a good understanding of the features the ELK stack provides, such as
advanced queries, indexing, managing tables, aggregations, charts/graphs, maps and dashboards.
2. You should demonstrate your ability to work with multiple large datasets and to use appropriate
big data tools to gain valuable insights from them. You should also be able to present these
insights in a manner that is easily consumable by stakeholders and other interested parties.
Problem Background
You have been hired as a data analyst by the City of New York to gain valuable insights from its
huge 311 service request dataset. Your task is to use the ELK stack hosted on a cloud platform
such as GCP. Successful completion of this task includes: creating a Logstash configuration file as
well as a geo-point template (for maps); provisioning a GCP instance to host a fully installed,
configured and functional ELK stack; running Logstash to ingest and process the NYC 311 service
request data into Elasticsearch (e.g., cleaning, mutating and converting the data; in short, data
preparation); and using Kibana to analyze and visualize the results for the questions given below.
This is a group project with a grade weight of 30% of your final score. The required deliverables
are: the code for your Logstash configuration file and geo-point template, plus the results for the
analytical questions (tables, charts, tag clouds, maps and a dashboard) in an MS Word or PDF
document. Where applicable, show the syntax/code or capture screenshots for all your analysis.
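As a sketch of the geo-point template deliverable, an index template along the following lines could map a combined latitude/longitude field to Elasticsearch's geo_point type. The index name nyc311 and the field name geo_location are assumptions for illustration; your Logstash filter would need to populate that field from the dataset's Latitude and Longitude columns. This uses the composable index template API available in Elasticsearch 7.8 and later (older versions use the legacy _template API):

```
PUT _index_template/nyc311_geo
{
  "index_patterns": ["nyc311*"],
  "template": {
    "mappings": {
      "properties": {
        "geo_location": { "type": "geo_point" }
      }
    }
  }
}
```

Loading the template before the first ingest matters: if Elasticsearch auto-maps the field as text instead, it cannot be used in Kibana map visualizations.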
Analytical Questions
1. Create a table showing the top 10 cities with the highest call counts, alongside the counts of the
top 10 complaint calls (by Descriptor) in each city.

2. Create a pie chart showing the top 5 cities with the highest call counts, alongside the top five
calls (by Descriptor) in each city.
3. Create a tag cloud representing the top 20 call descriptors.
4. Create a coordinate map of all the major call descriptors in each city.
5. Create a dashboard combining all the visualizations from 1 to 4 above.
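For question 1, one way to sanity-check your numbers outside Kibana's visualization editor is a nested terms aggregation in the Dev Tools console. This is a sketch under assumptions: the index is named nyc311, and the City and Descriptor columns were indexed with keyword sub-fields (field names may differ in your mapping):

```
GET nyc311*/_search
{
  "size": 0,
  "aggs": {
    "top_cities": {
      "terms": { "field": "city.keyword", "size": 10 },
      "aggs": {
        "top_descriptors": {
          "terms": { "field": "descriptor.keyword", "size": 10 }
        }
      }
    }
  }
}
```

The Kibana data table for question 1 is built from the same two bucket aggregations: a terms split on city followed by a terms split on descriptor.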

Note:
1. 10% of the project weight is awarded for integrating Kafka into your ELK stack.
2. You do not have to download the dataset directly onto your instances; instead, use the Logstash
http_poller input plugin to ingest the data directly from the available API endpoints. You should
still view and explore the dataset on the website above: knowing and understanding your dataset will
certainly help you do better analysis.
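A minimal sketch of an http_poller-based pipeline is shown below. The Socrata JSON endpoint for dataset erm2-nwe9, the $limit page size, the schedule, the field names latitude/longitude and the Elasticsearch host are all illustrative assumptions to be adapted to your own setup (in particular, a single $limit query only fetches one page of a 34-million-row dataset, so you will need a paging strategy):

```
input {
  http_poller {
    urls => {
      # Assumed Socrata SODA endpoint for dataset erm2-nwe9 (JSON output)
      nyc311 => "https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=50000"
    }
    schedule => { every => "10m" }
    request_timeout => 120
    # The json codec emits one event per element of the returned array
    codec => "json"
  }
}

filter {
  mutate {
    # Combine the latitude/longitude columns into a single "lat,lon"
    # string that a geo_point-mapped field can parse
    add_field => { "geo_location" => "%{latitude},%{longitude}" }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "nyc311"
  }
}
```

The add_field step assumes the geo-point template deliverable maps geo_location to the geo_point type; without that mapping, the field will be indexed as plain text.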
