ITU Big Data Projects Summer15
Problem Statement
A.
B.
C.
D.
E.
F.
G.
Dataset
https://edureka.wistia.com/medias/cpj3ljetym/download?media_file_id=64495520
Dataset Description
https://edureka.wistia.com/medias/410pi4dlfe/download?media_file_id=64495579
Team B
Varsha Tomar, Rupesh, Nallapaneni Swetha, Sheny, Sandeep
Problem Statement
1. Find the display name and number of posts created by the user with the maximum reputation.
2. Find the average age of users on the Stack Overflow site.
3. Find the display name of the user who posted the oldest post on Stack Overflow (in terms of date).
4. Find the display name and number of comments made by the user with the maximum reputation.
5. Find the display name of the user who has created the maximum number of posts on Stack Overflow.
6. Find the owner name and id of the user whose post has the maximum view count so far.
7. Find the title and owner name of the post that has the maximum comment count.
8. Find the location with the maximum number of Stack Overflow users.
Dataset
https://edureka.wistia.com/medias/d06fdpiiec/download?media_file_id=64431552
Dataset Description
https://edureka.wistia.com/medias/btr6i3e0p5/download?media_file_id=81524799
Team C
Vidya Rani Gidiginjala Manne, Remya Nekkuth
Melath, Simmy Payyappilly, Varghese, Monali Modi,
Ajuba Benazir Riyaz
Problem Statement
1. Find the number of movies released between
1950 and 1960.
2. Find the number of movies with a rating of more
than 4.
3. Find the movies whose ratings are between 3 and
4.
4. Find the number of movies with a duration of more
than 2 hours (7200 seconds).
5. Find the list of years and number of movies
released each year.
6. Find the total number of movies in the dataset.
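The questions above can be sketched in plain Python over a handful of records. The record layout (title, year, rating, duration in seconds) and the sample rows are assumptions made for illustration, since the dataset description is not reproduced here; the real file may be laid out differently.

```python
from collections import Counter
from typing import NamedTuple


class Movie(NamedTuple):
    # Hypothetical record layout; the real dataset columns may differ.
    title: str
    year: int
    rating: float
    duration_seconds: int


# Made-up sample rows, for illustration only.
SAMPLE_MOVIES = [
    Movie("Sample A", 1955, 4.2, 7500),
    Movie("Sample B", 1958, 3.5, 6000),
    Movie("Sample C", 1972, 3.9, 8100),
    Movie("Sample D", 1972, 2.1, 5400),
]


def count_released_between(movies, start_year, end_year):
    """Question 1: count movies with start_year <= year <= end_year."""
    return sum(1 for m in movies if start_year <= m.year <= end_year)


def count_rated_above(movies, threshold):
    """Question 2: count movies with a rating strictly above threshold."""
    return sum(1 for m in movies if m.rating > threshold)


def titles_rated_between(movies, low, high):
    """Question 3: titles with low <= rating <= high (inclusive bounds assumed)."""
    return [m.title for m in movies if low <= m.rating <= high]


def count_longer_than(movies, seconds):
    """Question 4: count movies whose duration exceeds the given seconds."""
    return sum(1 for m in movies if m.duration_seconds > seconds)


def movies_per_year(movies):
    """Question 5: mapping of release year to number of movies, sorted by year."""
    return dict(sorted(Counter(m.year for m in movies).items()))
```

Question 6 is simply `len(SAMPLE_MOVIES)`. Whether the year and rating ranges should include their end points is not stated in the problem, so the inclusive bounds here are a choice, not a requirement.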
Dataset
https://edureka.wistia.com/medias/7qd5lgmko4
Dataset Description
Team D
Xincheng Tang, Xiaoran An, Yelin Lu, Jingran Xu
Problem Statement
1. Find out the top 5 categories with maximum
number of videos uploaded.
2. Find out the top 10 rated videos.
3. Find out the most viewed videos.
Dataset
https://edureka.wistia.com/medias/6cchxi6to4
Dataset Description
Column1: Video ID of 11 characters.
Column2: Uploader of the video, of string data type.
Column3: Interval between the day YouTube was established and the date the video was uploaded, of integer data type.
Column4: Category of the video, of string data type.
Column5: Length of the video, of integer data type.
Column6: Number of views for the video, of integer data type.
Column7: Rating on the video, of float data type.
Column8: Number of ratings given on the video.
Column9: Number of comments on the video, of integer data type.
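The three questions for this dataset can be sketched in plain Python using the column positions from the description above. The tab delimiter, the sample rows and all function names are assumptions made for illustration; the actual file format should be checked before reuse.

```python
import heapq
from collections import Counter

# 0-based positions of Column1..Column9 from the dataset description.
(VIDEO_ID, UPLOADER, AGE_DAYS, CATEGORY, LENGTH,
 VIEWS, RATING, NUM_RATINGS, NUM_COMMENTS) = range(9)

# Made-up sample lines; a tab delimiter is assumed.
SAMPLE_LINES = [
    "aaaaaaaaaaa\tuser1\t100\tMusic\t210\t5000\t4.5\t120\t30",
    "bbbbbbbbbbb\tuser2\t200\tComedy\t95\t12000\t4.9\t300\t75",
    "ccccccccccc\tuser3\t300\tMusic\t180\t800\t3.2\t15\t2",
    "short\tline",  # malformed row, dropped by parse_lines
]


def parse_lines(lines, delimiter="\t"):
    """Split each line into fields and drop rows with fewer than 9 columns."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) >= 9:
            rows.append(fields)
    return rows


def top_categories(rows, n=5):
    """Question 1: the n categories with the most uploaded videos."""
    return Counter(row[CATEGORY] for row in rows).most_common(n)


def top_rated(rows, n=10):
    """Question 2: the n highest rated videos as (video id, rating)."""
    best = heapq.nlargest(n, rows, key=lambda row: float(row[RATING]))
    return [(row[VIDEO_ID], float(row[RATING])) for row in best]


def most_viewed(rows, n=1):
    """Question 3: the n most viewed videos as (video id, views)."""
    best = heapq.nlargest(n, rows, key=lambda row: int(row[VIEWS]))
    return [(row[VIDEO_ID], int(row[VIEWS])) for row in best]
```

Calling `top_categories(parse_lines(SAMPLE_LINES))` on the sample rows ranks Music ahead of Comedy. The same grouping and ranking maps naturally onto a MapReduce, Pig or Hive job if the data is too large for one machine.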
Team E
Minghao (Murphy) Zhai, Jaime Shien, Yuanqi (Linda) Zhou
Problem Statement
1. Count the number of countries by landmass.
2. Find the top 5 countries by the sum of bars and stripes in the flag.
3. Count the countries with an icon in the flag.
4. Count the countries which have the same top-left and top-right colour in the flag.
5. Count the number of countries by zone.
6. Find the largest country in terms of area in the NE zone.
7. Find the least populated country in the S. America landmass.
8. Find the most widely spoken language among all countries.
9. Find the most common colour among flags from all countries.
10. Find the sum of all circles present in all country flags.
11. Count the countries which have both an icon and text in the flag.
Dataset
http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.data
Dataset Description
http://archive.ics.uci.edu/ml/machine-learning-databases/flags/flag.names
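A few of the flag questions can be sketched in plain Python once the comma-separated rows are mapped to named columns. The column names below follow my reading of flag.names and should be verified against that file; the sample rows and the function names are made up for illustration.

```python
import csv
import io
from collections import Counter

# Column names as read from flag.names (30 attributes, no header row in
# flag.data); check them against the file before relying on them.
COLUMNS = [
    "name", "landmass", "zone", "area", "population", "language", "religion",
    "bars", "stripes", "colours", "red", "green", "blue", "gold", "white",
    "black", "orange", "mainhue", "circles", "crosses", "saltires",
    "quarters", "sunstars", "crescent", "triangle", "icon", "animate",
    "text", "topleft", "botright",
]

# Made-up rows in the same shape as flag.data, for illustration only.
SAMPLE_DATA = """\
CountryA,1,1,100,10,1,0,0,3,2,1,0,0,0,1,0,0,red,0,0,0,0,0,0,0,1,0,1,red,red
CountryB,1,4,50,5,2,1,2,0,3,1,1,0,0,1,0,0,green,1,0,0,0,0,0,0,1,0,0,green,white
CountryC,2,3,200,20,2,1,0,5,2,1,0,0,0,1,0,0,red,2,0,0,0,0,0,0,0,0,0,white,red
"""


def load_flags(text):
    """Parse comma-separated flag rows into dictionaries keyed by COLUMNS."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=COLUMNS))


def countries_per_landmass(rows):
    """Question 1: number of countries for each landmass code."""
    return dict(Counter(row["landmass"] for row in rows))


def top_bars_plus_stripes(rows, n=5):
    """Question 2: the n countries with the largest bars + stripes total."""
    totals = [(row["name"], int(row["bars"]) + int(row["stripes"])) for row in rows]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)[:n]


def total_circles(rows):
    """Question 10: sum of circles over all flags."""
    return sum(int(row["circles"]) for row in rows)


def count_icon_and_text(rows):
    """Question 11: countries whose flag has both an icon and text."""
    return sum(1 for row in rows if row["icon"] == "1" and row["text"] == "1")
```

To run this on the real data, pass the contents of flag.data to `load_flags` in place of `SAMPLE_DATA`. The counts by zone (question 5) follow the same pattern as `countries_per_landmass` with the `zone` column.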
Team F
Abinas Roy, Anushree Randad, Lorena Arague, Vivek
Narang
Problem Statement
1. Find the list of airports operating in India.
2. Find the list of airlines having zero stops.
3. Find the list of airlines operating with code share.
4. Find the country or territory that has the highest number of airports.
5. Find the list of active airlines in the United States.
Dataset
https://edureka.wistia.com/medias/67vuzsza8j/download?media_file_id=66596539
Dataset Description
In this use case there are 3 data sets: Final_airlines, routes.dat and airports_mod.dat.
Airports dataset, i.e. airports_mod.dat
It contains the following fields:
Airport ID: Unique OpenFlights identifier for this airport.
Airlines dataset:
Routes dataset, i.e. routes.dat
It contains the following fields:
Airline: 2-letter (IATA) or 3-letter (ICAO) code of the airline.
Airline ID: Unique OpenFlights identifier for airline (see Airline).
Source airport: 3-letter (IATA) or 4-letter (ICAO) code of the source airport.
Source airport ID: Unique OpenFlights identifier for source airport (see Airport).
Destination airport: 3-letter (IATA) or 4-letter (ICAO) code of the destination airport.
Destination airport ID: Unique OpenFlights identifier for destination airport (see Airport).
Codeshare: "Y" if this flight is a codeshare (that is, not operated by Airline, but another carrier), empty otherwise.
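As an illustration of the code share question, the sketch below filters route rows on the Codeshare field using the field order listed for routes.dat. It assumes comma-separated rows; the sample rows, field keys and function names are made up for illustration, and any columns after Codeshare are ignored.

```python
import csv

# Field order as listed in the routes dataset description.
ROUTE_FIELDS = [
    "airline", "airline_id",
    "source_airport", "source_airport_id",
    "destination_airport", "destination_airport_id",
    "codeshare",
]

# Made-up sample rows, for illustration only.
SAMPLE_ROUTES = [
    "XX,1,AAA,10,BBB,20,Y",
    "YY,2,BBB,20,CCC,30,",
    "XX,1,BBB,20,CCC,30,",
    "ZZ,3,CCC,30,AAA,10,Y",
]


def parse_routes(lines):
    """Turn comma-separated lines into dicts; rows with too few columns are skipped."""
    routes = []
    for row in csv.reader(lines):
        if len(row) < len(ROUTE_FIELDS):
            continue
        # zip stops at the last named field, so extra trailing columns are dropped.
        routes.append(dict(zip(ROUTE_FIELDS, row)))
    return routes


def codeshare_airlines(routes):
    """Distinct airline codes that have at least one route marked "Y"."""
    return sorted({r["airline"] for r in routes if r["codeshare"].strip() == "Y"})
```

On the sample rows, `codeshare_airlines(parse_routes(SAMPLE_ROUTES))` returns the codes XX and ZZ. To use the real file, pass an open file handle for routes.dat to `parse_routes` instead of the sample list.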
Team G
Vasanth Nair Swarali Chaudhari
Shweta Tiwari
Shikha Saxena
Introduction
This document will tell you how to analyse the NFL
dataset and generate the optimised output for the
same.
13. Point towards the folder containing the interim data, i.e. Cluster_Out.csv.
The Cluster_Out.csv will look like this:
DataSet for this project:
https://www.dropbox.com/s/ykgr2yh67b47rs0/edureka-nfl-dataset-.zip?dl=0
Team H
Prathyusha Kota Saritha Buchireddy Raghureddy
Laxmi Madhu Kumar Brahmandam Venkesh
Ethiraj