KIT306/606: Data Analytics Unit Coordinator: A/Prof. Quan Bai University of Tasmania
Lecturer/tutor
Dr. Wenli Yang
• Email: [email protected]
Tutor
Mr. Shiqing Wu
• Email: [email protected]
About me
Dr Quan Bai
• Associate professor, School of ICT, UTAS
• PhD, Wollongong University, Australia
• Tasmanian ICT Centre, CSIRO
• Auckland University of Technology (AUT), New Zealand
My research:
• AI
• Machine learning
• Distributed systems
Example: SKA:
https://www.youtube.com/watch?v=Hog411ZSzEY
Information overload, complexity, …
Data processing/preparation
Data mining/KDD
Service development
Data Analytics
• Analysis of data is a process of inspecting, cleaning, transforming, and
modeling data with the goal of discovering useful information,
suggesting conclusions, and supporting decision-making.
• Data preparation
• Data mining
• Data visualization
Data Science…is not just data…
Databases Vs Data Science
Detecting outbreaks two weeks ahead of CDC data
https://covid19.who.int/?gclid=Cj0KCQjwl4v4BRDaARIsAFjATPm1sdKDfdmHba2expkkvjkbQOstKuYhr3fMmK4Mcyq7biJ1CmAf4fIaAlBVEALw_wcB
Nate Silver and the 2012 Elections
The unreasonable effectiveness of Deep Learning
2012 ImageNet challenge:
Classify 1 million images into 1000 classes.
https://www.youtube.com/watch?v=kPclKYSd8dw
Data Analysis Has Been Around for a While
Peter Luhn
R.A. Fisher
W.E. Deming
Howard Dresner
…
Everything on Internet
Internet of Things / M2M
Every:
Click
Ad impression
Billing event
Fast Forward, pause,…
Server request
Transaction
Network message
Fault
…
What can you do with the data?
to produce:
Source: towardsdatascience.com/
• Wrong data sampling
Source: towardsdatascience.com/
The need for “Data Analysts” now
• “… the …job in the next 10 years will be
statisticians,” Hal Varian, Google Chief Economist
• New Data Science institutes being created or
repurposed – NYU, Columbia, Washington,
UCB,...
• New degree programs, courses, boot-camps:
• e.g., at Berkeley: Stats, I-School, CS, Astronomy,
Nanjing University (China): College of AI, …
What you will learn in this unit
• Data Preparation
  - Data Cleaning
  - Data Transformation
  - Data exploration
  - Data blending
• Data mining
  - Classification
  - Clustering
  - Regression
  - Association rule mining
• Data visualisation
• Data analytics applications and advanced topics
Data Cleaning
Example 1: Example 2:
Day 1 5
Day 2 6.8
Day 3 9.8
Day 4 0.2
Day 5 1.3
Day 6 3.5
Day 7 4.8
Day 8 7.8
Day 9 0.1
Day 10 2.1
Looking at data (Long-tailed data)
Statistics: Hypothesis Testing
• If you toss a coin 10 times and heads comes up 9 times:
• Should we conclude that the coin is not fair? Why?
• How sure are we?
4 Form a conclusion
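The coin question above can be checked with a short calculation. The sketch below computes how likely 9 or more heads in 10 tosses would be if the coin were fair (the null hypothesis). The helper name `binomial_tail` and the 0.05 significance level are assumptions for illustration; they do not come from the slides.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_tosses, n_heads = 10, 9

# One-sided p-value: probability of 9 or 10 heads from a fair coin.
one_sided = binomial_tail(n_tosses, n_heads)

# Two-sided p-value: also count 9 or 10 tails (symmetric because p = 0.5).
two_sided = 2 * one_sided

alpha = 0.05  # assumed conventional significance level
print(f"one-sided p = {one_sided:.4f}")  # 11/1024
print(f"two-sided p = {two_sided:.4f}")  # 22/1024
print("reject fairness" if two_sided < alpha else "cannot reject fairness")
```

Under these assumptions the two-sided p-value is about 0.02, so the result would be unusual for a fair coin, although a fair coin still produces it roughly once in every 47 experiments.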
Data mining/KDD Process
[Figure: the KDD process]
Raw Data → Data Selection → Target Data → Data Pre-processing → Preprocessed Data → Data Transformation → Transformed Data → Data Mining → Pattern → Pattern Evaluation → Quality Knowledge
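The stages of the KDD process (selection, pre-processing, transformation, mining, pattern evaluation) can be sketched as a chain of small functions. This is only an illustration: the records, the function names, and the toy "mining" rule (flag days above the mean) are all hypothetical and are not taken from the unit materials.

```python
from statistics import mean

# Hypothetical raw records, invented for this sketch.
raw_data = [
    {"day": 1, "value": 5.0, "note": "ok"},
    {"day": 2, "value": None, "note": "sensor offline"},
    {"day": 3, "value": 9.0, "note": "ok"},
    {"day": 4, "value": 1.0, "note": "ok"},
]

def select(records):
    """Data selection: keep only the fields relevant to the question."""
    return [(r["day"], r["value"]) for r in records]

def preprocess(target):
    """Pre-processing: drop rows with missing values."""
    return [(d, v) for d, v in target if v is not None]

def transform(clean):
    """Transformation: min-max scale values into [0, 1].

    Assumes at least two distinct values, otherwise hi - lo is zero.
    """
    values = [v for _, v in clean]
    lo, hi = min(values), max(values)
    return [(d, (v - lo) / (hi - lo)) for d, v in clean]

def mine(transformed):
    """Data mining (toy stand-in): flag days above the mean scaled value."""
    avg = mean(v for _, v in transformed)
    return [d for d, v in transformed if v > avg]

def evaluate(patterns, min_support=1):
    """Pattern evaluation: keep the result only if enough days were flagged."""
    return patterns if len(patterns) >= min_support else []

knowledge = evaluate(mine(transform(preprocess(select(raw_data)))))
print(knowledge)  # days flagged as unusually high
```

Each function takes the output of the previous stage, which mirrors how target data, preprocessed data, transformed data, and patterns are handed along in the process.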
Data Science Process
[Figure: the data science process. Stage labels: Identify the Problem; Raw data collection; Data Preparation; Exploratory Data Analysis; Data Mining Algorithm; Model Evaluation; Analysis]
What kinds of data will you use?
• Almost anything is OK?
• History: individual or pair-wise?
• Team or players?
• Numerical or text?
• What kind of model will you build?
• What assumptions are safe to make?
About this unit
Learning Activity
• Lecture: 2 hours/week
• Tutorial: 2 hours/week
Prerequisites
• Skill prerequisites
• Enthusiasm for:
• AI
• Machine learning
• Data analytics
• Data mining
Learning Activities: Tutorial Activities
• Practical Work
• Using Splunk for practical data analysis skills
• Splunk:
• https://www.splunk.com
• Why Splunk?
• Great demand in the industry
• Easy to use
• With or without coding skills
Teaching schedule:
Week Topics
Week 1 Introduction: data and data analytics
Week 2 Data types, data quality and data exploration
Week 3 Data preparation
Week 4 Regression (linear, logistic, non-linear)
Week 5 Classification (KNN, DT, Bayesian, SVM)
Week 6 Clustering (K-Means, DBSCAN, Hierarchical)
Week 7 Association rule mining
Week 8 Data visualization 1
Week 9 Data visualization 2
Week 10 Advanced topics in data analytics 1
Week 11 Advanced topics in data analytics 2
Week 12 Advanced topics in data analytics 3
Week 13 Recap
Assessment Items
• Final mark / 100%
- At least 50% of the overall/final mark
• In-semester assessments (50%)
- At least 45% of the total mark for in-semester assessment items
• Formal exams (50%)
- At least 45% of the mark for the formal examination
In-semester: Assessment (50%)
Aim of assignments
To learn the methodologies, tools, techniques,
and research in data analytics and data mining.
MITS:
• KIT606: Data Analytics
• KIT509: Introduction to AI
• KIT719: AI and Natural Language
The AI Research Group