KIT306/606: Data Analytics Unit Coordinator: A/Prof. Quan Bai University of Tasmania

Here are the key steps I would follow:
1. Collect relevant historical match data between Aston Villa and Sunderland, including goals scored by each team.
2. Analyze the data: look at trends, averages and distributions to understand patterns in previous matches.
3. Based on the analysis, predict the probability of different outcomes for this match (e.g. the probability of Aston Villa scoring 0, 1 or 2+ goals, and likewise for Sunderland).
4. After the match, compare the actual outcome with my predictions to evaluate the model's accuracy. Over time, as more data is collected, the model can be refined.
5. Visualize the data and predictions to better understand trends and communicate insights.

Uploaded by Jason Zeng
Copyright © All Rights Reserved
Introduction

KIT306/606: Data Analytics


Unit Coordinator: A/Prof. Quan Bai
University of Tasmania
Teaching Staff
Unit Coordinator/Lecturer
A/Prof. Quan Bai
• Email: [email protected]

Lecturer/tutor
Dr. Wenli Yang
• Email: [email protected]

Tutor
Mr. Shiqing Wu
• Email: [email protected]
About me
Dr Quan Bai
• Associate professor, School of ICT, UTAS
• PhD, Wollongong University, Australia
• Tasmanian ICT Centre, CSIRO
• Auckland University of Technology (AUT), New Zealand
My research:
• AI
• Machine learning
• Distributed systems
My teaching:
• Programming
• Software engineering
• AI
Contact information:
• Email: [email protected]
• Office: Centenary Building 456
About you
• Program?
• Major?
• Why select this unit?
• Future plan?
Data, big data, data science, analytics, machine learning
Data and big data
• Data are characteristics that are collected through observation.[1] In a
more technical sense, data is a set of values of qualitative or 
quantitative variables about one or more persons or objects, while
a datum (singular of data) is a single value of a single variable
(Wikipedia)
• Big data is a field that treats ways to analyze, systematically
extract information from, or otherwise deal with data sets that are
too large or complex to be dealt with by traditional data-processing 
application software. Data with many cases (rows) offer greater 
statistical power, while data with higher complexity (more attributes
or columns) may lead to a higher false discovery rate (Wikipedia)
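The point that more attributes (columns) can raise the false discovery rate can be illustrated with a small simulation. This is a sketch under stated assumptions, not material from the unit: every column is pure random noise, so any column that looks "significantly" correlated with the (also random) target is a false discovery. The cut-off 1.96/√n is a large-sample approximation of a two-sided 5% test on the correlation coefficient, and the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def count_false_discoveries(n_rows=200, n_cols=1000, seed=0):
    """Correlate n_cols pure-noise columns with a pure-noise target.

    No column is truly related to the target, so every "significant"
    column counted here is a false discovery.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    # Approximate two-sided 5% threshold for |r| under the null (large n).
    threshold = 1.96 / math.sqrt(n_rows)
    hits = 0
    for _ in range(n_cols):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson(column, target)) > threshold:
            hits += 1
    return hits


for cols in (10, 100, 1000):
    print(cols, "columns ->", count_false_discoveries(n_cols=cols), "spurious hits")
```

Roughly 5% of the noise columns pass the test, so the number of spurious "findings" grows with the number of columns even though none of them is real.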

Example: SKA: https://www.youtube.com/watch?v=Hog411ZSzEY
[Figure: big data (information overload, complexity, …) → data processing/preparation → data mining/KDD → service development]
Data Analytics
• Analysis of data is a process of inspecting, cleaning, transforming, and
modeling data with the goal of discovering useful information,
suggesting conclusions, and supporting decision-making.
• Data preparation
• Data mining
• Data visualization
Data Science…is not just data…
Databases vs Data Science
• Databases: querying the past
• Data science: querying the future
Machine Learning vs Data Science
• Machine learning:
• Develop new (individual) models
• Prove mathematical properties of models
• Improve/validate on a few, relatively clean, small datasets
• Data science:
• Explore many models, build and tune hybrids
• Understand empirical properties of models
• Develop/use tools that can handle massive datasets
Why all the Excitement?
• Google Flu Trends: detecting outbreaks two weeks ahead of CDC data
• New models are estimating which cities are most at risk for spread of the Ebola virus.
• https://covid19.who.int/
Nate Silver and the 2012 Elections
The unreasonable effectiveness of Deep Learning
2012 ImageNet challenge: classify 1 million images into 1000 classes.
https://www.youtube.com/watch?v=kPclKYSd8dw
Data Analysis Has Been Around for a While
[Figure: timeline featuring R.A. Fisher, W.E. Deming, Peter Luhn and Howard Dresner]
Abridged version of Jeff Hammerbacher’s timeline for CS 194, 2012


WHY NOW?
• User generated (Web & Mobile)
• Everything on the Internet
• Internet of Things / M2M
• Every click, ad impression, billing event, fast forward, pause, …, server request, transaction, network message, fault
What can you do with the data?
How to change “data” to “wisdom”?
Can you give me any examples?
What can you do with the data?
Crowdsourcing, sensing and data assimilation to produce:
[Figure from Alex Bayen, UCB]

Data can be misleading
Misleading statistics:
• “More than 80% of Dentists recommend Colgate.”
The slogan in question was positioned on an advertising billboard in the U.K. and was deemed to be in breach of U.K. advertising rules. The claim, which was based on surveys of dentists and hygienists carried out by the manufacturer, was found to be misrepresentative as it allowed the participants to select one or more toothpaste brands…
• In 1948 the Chicago Tribune printed “Dewey defeats Truman”, based on a phone survey conducted before the actual election. At that time phones were not standard; they were predominantly available to the upper class.
Source: towardsdatascience.com/
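The Dewey/Truman poll failed because the sample (phone owners) differed systematically from the population of voters. A small simulation can show this kind of sampling bias. It is a sketch with invented numbers: the 30% phone-ownership rate and the 65%/40% Dewey preferences in `simulate_poll` are hypothetical assumptions chosen only to make the effect visible, not historical figures.

```python
import random


def simulate_poll(n_voters=100_000, sample_size=1_000, seed=1):
    """Compare a phone-only poll with a simple random sample.

    All rates below are hypothetical, chosen only to illustrate bias.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n_voters):
        has_phone = rng.random() < 0.30          # assumed phone ownership
        p_dewey = 0.65 if has_phone else 0.40    # assumed voting preference
        population.append((has_phone, rng.random() < p_dewey))

    def dewey_share(voters):
        return sum(votes_dewey for _, votes_dewey in voters) / len(voters)

    phone_owners = [v for v in population if v[0]]
    biased = rng.sample(phone_owners, sample_size)     # who a phone survey reaches
    unbiased = rng.sample(population, sample_size)     # every voter equally likely
    return dewey_share(population), dewey_share(biased), dewey_share(unbiased)


true_share, phone_poll, random_poll = simulate_poll()
print(f"true Dewey share   {true_share:.3f}")
print(f"phone-only poll    {phone_poll:.3f}")
print(f"random-sample poll {random_poll:.3f}")
```

Under these assumptions the phone-only poll gives Dewey a clear majority while the population as a whole prefers Truman; a random sample of the same size lands close to the true share.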
• Wrong data sampling
Abraham Wald was part of the Statistical Research Group during World War II. The Air Force approached Wald with a problem: too many planes were being shot down, while simply increasing the armour overall would make them too heavy. They asked how they could add armour efficiently. Wald investigated planes returning from missions, collecting and analysing statistics on where they had been hit.
“Gentlemen, you need to put more armour-plate where the holes aren’t, because that’s where the holes were on the planes that didn’t return.” (Abraham Wald)
Source: towardsdatascience.com/
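Wald's argument is an instance of survivorship bias, and a short simulation makes the mechanism concrete. This is an illustrative sketch only: the four aircraft sections, the three hits per plane and the per-hit loss probabilities in `LOSS_PER_HIT` are hypothetical numbers, not Wald's data. Hits land uniformly across all sections, yet engine holes are rare on the returning planes, because planes hit there mostly did not come back.

```python
import random
from collections import Counter

SECTIONS = ["engine", "fuselage", "wings", "tail"]
# Hypothetical chance that a single hit in each section brings the plane down.
LOSS_PER_HIT = {"engine": 0.6, "fuselage": 0.05, "wings": 0.05, "tail": 0.1}


def observed_holes(n_planes=10_000, hits_per_plane=3, seed=2):
    """Count bullet holes per section on all planes and on returning planes only."""
    rng = random.Random(seed)
    all_holes, returned_holes = Counter(), Counter()
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]  # uniform hits
        all_holes.update(hits)
        survived = all(rng.random() >= LOSS_PER_HIT[s] for s in hits)
        if survived:
            returned_holes.update(hits)  # only these holes are ever inspected
    return all_holes, returned_holes


all_holes, returned_holes = observed_holes()
for s in SECTIONS:
    print(f"{s:9s} hits overall {all_holes[s]:6d}   seen on returning planes {returned_holes[s]:6d}")
```

Looking only at `returned_holes` would suggest armouring the fuselage and wings; comparing it with `all_holes` shows that the engine is where the missing planes were hit.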
Need for “Data Analysts” now
• “… the …job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist
• New data science institutes being created or repurposed – NYU, Columbia, Washington, UCB, ...
• New degree programs, courses, boot-camps:
• e.g., at Berkeley: Stats, I-School, CS, Astronomy; Nanjing University (China): College of AI, …
What you learn in this unit
• Data Preparation
• Data Cleaning
• Data Transformation
• Data exploration
• Data blending
• Data mining
• Classification
• Clustering
• Regression
• Association rule mining
• Data visualisation
• Data analytics applications and advanced topics
Data Cleaning

Example 1: Example 2:

Date      Water level (m)
Day 1     5
Day 2     6.8
Day 3     9.8
Day 4     0.2
Day 5     1.3
Day 6     3.5
Day 7     4.8
Day 8     7.8
Day 9     0.1
Day 10    2.1
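One simple first step when cleaning a series like this is to flag readings that change implausibly fast from one day to the next, so they can be checked before analysis. The sketch below assumes a maximum plausible daily change of 5 m; that threshold and the helper name `flag_abrupt_changes` are hypothetical. Whether a flagged drop is a measurement error or a genuine event is a judgement the numbers alone cannot make.

```python
# Water levels from the table above, Day 1 to Day 10 (metres).
water_level_m = [5, 6.8, 9.8, 0.2, 1.3, 3.5, 4.8, 7.8, 0.1, 2.1]


def flag_abrupt_changes(levels, max_daily_change=5.0):
    """Return the day numbers (1-based) whose change from the previous day exceeds the limit."""
    flagged = []
    for i in range(1, len(levels)):
        if abs(levels[i] - levels[i - 1]) > max_daily_change:
            flagged.append(i + 1)  # list index i corresponds to "Day i+1"
    return flagged


print(flag_abrupt_changes(water_level_m))  # → [4, 9]
```

With a 5 m limit, the drops into Day 4 (9.8 → 0.2) and Day 9 (7.8 → 0.1) are flagged for inspection.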
Looking at data (Long-tailed data)
Statistics: Hypothesis Testing
• If you toss a coin 10 times and it comes up heads 9 times:
• Should we conclude it is not fair? Why?
• How sure are we?
Now we toss a coin 4 times, and it comes up heads every time.
• What do we conclude?
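Both questions can be answered with a binomial test: assume the coin is fair and compute how likely a result at least this extreme would be. This is a sketch that assumes a two-sided test and the conventional 5% significance level; the function names are hypothetical. For 9 heads in 10 tosses the p-value is 22/1024 ≈ 0.021, so we would reject fairness at the 5% level. For 4 heads in 4 tosses it is 2/16 = 0.125, so even a perfect run of heads is not convincing evidence with so few tosses.

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads in n tosses."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def two_sided_p(k, n):
    """Two-sided p-value for k heads in n tosses of a fair coin (uses symmetry of p = 0.5)."""
    extreme = max(k, n - k)
    return min(1.0, 2 * p_at_least(extreme, n))


print(f"9 heads in 10 tosses: p = {two_sided_p(9, 10):.4f}")  # exactly 22/1024
print(f"4 heads in 4 tosses:  p = {two_sided_p(4, 4):.4f}")   # exactly 2/16
```

A one-sided test (only "biased towards heads") halves both p-values to about 0.011 and 0.0625; the conclusions at the 5% level stay the same.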
Clustering
• Given a collection of (unlabelled) objects, find meaningful groups
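K-means (listed in Week 6) is one common way to find such groups. The sketch below is a minimal pure-Python version of Lloyd's algorithm on 2-D points; the toy points are hypothetical, and real projects would normally use a library implementation and several random restarts.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k), key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabelled data: two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

No labels are used anywhere: the groups emerge only from distances between the objects.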
Classification

Given a training set of labelled objects, learn a decision rule
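k-nearest neighbours (KNN, listed in Week 5) is one of the simplest decision rules learned from labelled objects: a new object gets the majority label of its k closest training examples. This is a sketch; the height/weight data and labels are hypothetical, and because the features are left unscaled here, in practice you would usually normalise them first.

```python
import math
from collections import Counter


def knn_predict(training_set, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled neighbours.

    training_set is a list of ((feature1, feature2, ...), label) pairs.
    """
    by_distance = sorted(training_set, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled data: (height_cm, weight_kg) -> "adult" / "child"
training = [((180, 80), "adult"), ((165, 60), "adult"), ((170, 72), "adult"),
            ((120, 25), "child"), ((130, 30), "child"), ((110, 20), "child")]
print(knn_predict(training, (160, 55)))   # → adult
print(knn_predict(training, (125, 28)))   # → child
```

Unlike clustering, the labels in the training set drive the prediction.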


Applications of data analytics

Book Recommendation Movie Recommendation


How to Analyse Data or Do Data Science
Process of data analytics using statistics

1 Identify a question or problem

2 Collect relevant data on the topic

3 Analyze the data

4 Form a conclusion
Data mining/KDD Process
Raw Data
→ Data Selection → Target Data
→ Data Pre-processing → Preprocessed Data
→ Data Transformation → Transformed Data
→ Data Mining → Pattern
→ Pattern Evaluation → Knowledge
Data Science Process
Identify the Problem → Raw Data Collection → Data Preparation → Exploratory Data Analysis → Data Mining Algorithm → Model Evaluation → Data Product, Data Visualisation, Make Decisions
A Simple Data Analysis Exercise…
Let us take the example of English Premier League soccer:
Aston Villa vs. Sunderland
Predict the outcome:
                          Sunderland goals
                          0      1      2+
Aston Villa goals   0
                    1
                    2+
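One common simple baseline for this table (not necessarily the model this unit has in mind) treats each team's goals as a Poisson count with a rate estimated from past matches, and assumes the two teams score independently. The sketch below fills in the 3×3 grid under those assumptions; the `past_matches` scores are invented placeholders, not real results.

```python
import math

# Hypothetical past scores (Aston Villa goals, Sunderland goals); replace with real data.
past_matches = [(2, 1), (0, 0), (1, 1), (3, 1), (1, 0), (0, 2), (2, 2), (1, 0)]


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def bucket_probs(lam):
    """P(0 goals), P(1 goal), P(2+ goals) under a Poisson(lam) model."""
    p0, p1 = poisson_pmf(0, lam), poisson_pmf(1, lam)
    return [p0, p1, 1 - p0 - p1]


villa_rate = sum(v for v, _ in past_matches) / len(past_matches)
sunderland_rate = sum(s for _, s in past_matches) / len(past_matches)

# Independence assumption: each cell is P(Villa bucket) * P(Sunderland bucket).
grid = [[pv * ps for ps in bucket_probs(sunderland_rate)] for pv in bucket_probs(villa_rate)]

labels = ["0", "1", "2+"]
print("Villa \\ Sunderland   " + "".join(f"{lab:>8}" for lab in labels))
for label, row in zip(labels, grid):
    print(f"{label:>18}   " + "".join(f"{p:8.3f}" for p in row))
```

The independence and Poisson assumptions are exactly the kind of "safe or not?" assumptions the next slide asks about; after the match, the grid can be compared with the actual score to judge the model.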
Analysis
What kinds of data will you use?
• Almost anything is OK?
• History: individual or pair-wise?
• Team or players?
• Numerical or text?
• What kind of model will you build?
• What assumptions are safe to make?
About this unit
Learning Activity
• Lecture: 2 hours/week

• Tutorial: 2 hours/week
Prerequisites
• Skill prerequisites
• Enthusiasm for:
• AI
• Machine learning
• Data analytics
• Data mining
Learning Activities: Tutorial Activities
• Practical Work
• Using Splunk for practical data analysis skills

• Splunk:
• https://www.splunk.com
• Why Splunk?
• Great demand in the industry
• Easy to use
• With or without coding skills
Teaching schedule:
Week      Topics
Week 1    Introduction; data and data analytics
Week 2    Data types, data quality and data exploration
Week 3    Data preparation
Week 4    Regression (linear, logistic, non-linear)
Week 5    Classification (KNN, DT, Bayesian, SVM)
Week 6    Clustering (K-Means, DBSCAN, Hierarchical)
Week 7    Association rule mining
Week 8    Data visualization 1
Week 9    Data visualization 2
Week 10   Advanced topics in data analytics 1
Week 11   Advanced topics in data analytics 2
Week 12   Advanced topics in data analytics 3
Week 13   Recap
Assessment Items
• Final mark / 100%
- At least 50% of the overall/final mark
• In-semester assessments (50%)
- At least 45% of the total mark for in-semester assessment items
• Formal exams (50%)
- At least 45% of the mark for the formal examination
In-semester: Assessment (50%)
Aim of assignments:
To learn the methodologies, tools, techniques and research in data analytics and data mining.
• Assessment 1 (10%): Quiz, during tutorial
• Assessment 2 (15%): Practical part, during tutorial
• Assessment 3 (25%): Project, due in Week 12
In-semester (1 and 2): Lecture/Tutorial Work
• Short quiz (10%):
• Multiple choice questions based on what we introduced in the lectures
• 5 quizzes in total (2% each)
• Tutorial work (15%):
• Attend and complete the tutorial work and show it to the tutor in the
tutorial: 2% each time
• You can get at most 15% in this assessment
In-semester (3): Project (25%)
• Project: 25%
• Group work
• 3-4 students in each group
• Form your own group, or we will assign you to a group
• Data sets will be posted on MyLo for analysis
• You need to choose one data set.
• You need to submit a report (in research paper format) and an implementation of the data mining
Formal Exam (50%)
• Can be online (TBC)
• Length: 2hrs
• Weighting: 50%
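The weights and hurdles above involve a little arithmetic, sketched below. This is one reading of the rules, with assumptions: quiz marks are entered as marks out of 2 each, tutorial credit is 2 marks per completed tutorial capped at 15, and "at least 45%" is read as 45% of each 50-mark component (22.5 marks). The function names are hypothetical, and the official unit outline takes precedence.

```python
def in_semester_mark(quiz_marks, tutorials_completed, project_mark):
    """Total in-semester mark out of 50.

    quiz_marks: up to five quiz results, each worth at most 2 marks
    tutorials_completed: number of tutorials whose work was shown to the tutor
    project_mark: group project mark out of 25
    """
    quiz_total = sum(quiz_marks[:5])
    tutorial_total = min(2 * tutorials_completed, 15)  # 2% each, at most 15%
    return quiz_total + tutorial_total + project_mark


def passes_unit(in_semester, exam):
    """Apply the hurdles: 50% overall, 45% of in-semester marks, 45% of the exam mark."""
    overall_ok = in_semester + exam >= 50
    in_semester_ok = in_semester >= 0.45 * 50   # 22.5 out of 50
    exam_ok = exam >= 0.45 * 50                 # 22.5 out of 50
    return overall_ok and in_semester_ok and exam_ok


mark = in_semester_mark([2, 2, 1.5, 2, 2], tutorials_completed=9, project_mark=18)
print(mark, passes_unit(mark, exam=22), passes_unit(mark, exam=24))
```

In this example the student has 42.5 in-semester marks, so a 22/50 exam still gives 64.5 overall but misses the exam hurdle, while 24/50 clears every hurdle.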
Unit Resources
• Online Learning Material
• Lecture notes, tutorial works and programming resources
• Recommended books
Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. Elsevier, 2011.
Witten, Ian H., and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
Expectations
• Attend all classes (Lecture and Tutorial).
• Review all learning materials
• Seek help anytime if you have a question
• Do not waste your time
• Do not hesitate, PLEASE 
Plagiarism…
• Plagiarism is a form of cheating. It is taking and using someone else's
thoughts, writings or inventions and representing them as your own; for
example, using an author's words without putting them in quotation marks
and citing the source, using an author's ideas without proper
acknowledgment and citation, copying another student's work.
• The intentional copying of someone else's work as one's own is a serious
offence punishable by penalties that may range from a fine or
deduction/cancellation of marks and, in the most serious of cases, to
exclusion from a unit, a course or the University. Details of penalties that can
be imposed are available in the Ordinance of Student Discipline - Part 3
Academic Misconduct, see http://www.utas.edu.au/__data/assets/pdf_file/0006/23991/ord91.pdf.
Contact
• Consultations (Zoom or physical) every week (tutors + me)
• Outside consultation times, if you need any help, send an e-mail.
• E-mails are used for short communications
• DO NOT HESITATE if you have a question.
AI courses in UTAS
BICT:
• KIT102: Data Science
• KIT108: AI
• KIT214: Intelligent Web Development
• KIT306: Data Analytics
• KIT315: ML & applications
- Data science major
- AI major (NEW)
MITS:
• KIT606: Data Analytics
• KIT509: Introduction to AI
• KIT719: AI and Natural Language
The AI Research Group
Prof. Byeong Kang, A/Prof. Quan Bai, Dr. Son Tran, Dr. Saurabh Garg, Dr. Ananda Maiti, Dr. Mira Park, Dr. Jimmy Cao, Dr. Soonja Yeom, Dr. Shuxiang Xu
• Computer vision
• Knowledge discovery
• NLP
• Distributed AI
• Blockchain
• Brain computer interaction
If you are interested in AI and want to be involved in related research projects, please let me know.
