CSC522: Automated Learning and Data Analysis Asynchronous Online Class
Teaching assistants (TAs):
Required text: (digital version; copies available at the NCSU Bookstore)
Summary
This course provides an introduction to concepts and methods for extracting knowledge or other
useful forms of information from data. This activity, also known under names including data
mining, knowledge discovery in databases, and exploratory data analysis, plays an important role
in modern science, engineering, medicine, business, and government.
Students will learn basic properties of several common types of knowledge implicit in data,
along with formal representations of these types of knowledge and methods for identifying knowl-
edge of these types contained in specific data sets. Students will also learn about the overall process
of data collection and analysis that provides the setting for knowledge discovery, and concomitant
issues of privacy and security. Examples and projects introduce the students to application areas
including electronic commerce, information security, biology, and medicine.
Prerequisites
Students will find introductions to artificial intelligence and database management very helpful,
but these are not required. The key prerequisites consist of basic knowledge of
Logic CSC 226, LOG 201, or equivalent
Probability and statistics ST 370 or equivalent
Linear algebra MA 305 or equivalent
Course Outcomes
The aim of this class is to introduce the student to concepts and methods of large-scale automated
data analysis. Upon completion, the student will be able to
• List and explain the major types of data and data representations;
• List and explain the problems arising in preparing data for analysis, and the methods for
addressing these problems;
• List and explain representative applications of automated learning and data analysis;
• List and explain representative benefits and dangers of automated learning and data analysis;
• List and explain the fundamental properties of formulations of knowledge and their use in
evaluating and criticizing formulations;
• List and explain some principal representations of knowledge, and compare their strengths
and weaknesses for different representational tasks;
• Apply automated data analysis tools to carry out a data analysis plan; and
• Apply automated data analysis techniques to carry out a data analysis plan.
Organization
The coursework consists of lectures, readings, homework assignments, tests, and a term project.
• Some of the lectures will depart from the text, either in content or in order. Some material
will be covered only in lecture; other material will be covered only in assigned readings.
Tests will include material from lecture and readings. Students are responsible for all mate-
rial presented or discussed in lecture.
• Readings will generally be taken from the textbook, with possible supplements
from the literature.
• The examinations will consist of a midterm and a final exam. Statements of test objectives
will be provided prior to the examinations to indicate their scope. The midterm will cover
roughly the first half of the course content. The final exam will cover the entire course, but
with an emphasis on the last half of the course content.
• Each student must complete an extended data analysis project. The expectation is that the
project will consist of performing and reporting on a data analysis task of your own interest.
Small collaborations (3-4 team members) will be allowed on the analysis tasks. Each group
must submit a written project report on their analysis. Guidelines for the writing of this
report are presented in a separate document.
Teams must NOT talk with other teams about their work, as the analysis should be done by
each team alone. Unwarranted similarity in written reports will constitute a
serious breach of academic integrity, with severe penalties.
Alternatively, individuals can seek permission to do individual projects that analyze data
sets of special interest to the student. Students interested in such projects should, as early
as possible in the semester, submit a brief proposal to the instructor describing the problem,
data source and character, and results sought. Individual project reports are required in this
case as well.
Computation
Project analyses will require the student to use Python, but the course will not teach the use
of software packages in detail.
Resources
Supplementary texts:
– The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by
Trevor Hastie, Robert Tibshirani, and Jerome Friedman
New York: Springer-Verlag, 2001. Available free online.
– Data Mining: Practical Machine Learning Tools and Techniques with Java Implemen-
tations, second edition, by Ian H. Witten and Eibe Frank
San Francisco: Morgan Kaufmann Publishers, 2005.
– Machine Learning, by Tom Mitchell
New York: McGraw Hill, 1997, ISBN 0070428077.
– Artificial Intelligence: A Modern Approach, second edition, by Stuart J. Russell and
Peter Norvig. New York: Prentice Hall, 2003
Journals:
– Data Mining and Knowledge Discovery
http://www.kluweronline.com/issn/1384-5810
– Journal of Machine Learning Research
http://www.jmlr.org/
– Several IEEE Transaction journals
Web resources:
– UCI Knowledge Discovery in Databases Archive
http://kdd.ics.uci.edu/
– KDnuggets: Data Mining, Web Mining & Knowledge Discovery News, Consulting
and Recruiting
http://www.kdnuggets.com/
– Kaggle: a platform for data prediction competitions.
http://www.kaggle.com/
Privacy
Do not include student ID numbers on papers or tests unless specifically instructed to do so
by the instructor.
Grading
Participation will be an important part of the class. It is expected that you will attend all
classes and read all relevant portions of the text and any assigned readings.
You will be assigned homework regularly. For each homework, you are strongly encouraged
to collaborate in teams of 2-3. Each team must submit exactly ONE final homework
report that clearly identifies the team members. Alternatively, individuals can seek
permission to do homework individually. Teams should NOT talk with other teams about
their homework. Unwarranted similarity in written homework will constitute a
serious breach of academic integrity, with severe penalties.
Homework must be submitted via Moodle, and the official deadline for online homework
submissions is 11:45PM on a specific date. Generally speaking, late submissions will NOT
be allowed unless the absence is excused according to the University attendance policy.
You will be allowed 2 total late days without penalty for the entire semester. You may be
late by 1 day on two different submissions or late by 2 days on one submission. Weekends
and holidays are also counted as late days. Late submissions are automatically
considered as using late days. Once those days are used, you will be penalized according
to the following policy:
– Submission is worth full credit before the deadline.
– It is worth 0.75 credit for the next 24 hours.
– It is worth 0.5 credit for the next 48 hours.
– It is worth zero credit after that.
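As a rough illustration of the penalty schedule above, the following Python sketch maps the number of hours past the deadline to a credit multiplier. It assumes that any free late days have already been used up, and it reads "the next 48 hours" as the 48 hours that follow the first 24 (that is, up to 72 hours after the deadline). The function name `late_credit` and that reading are assumptions made for illustration, not a statement of official policy.

```python
def late_credit(hours_late: float) -> float:
    """Return the fraction of credit for a submission (illustrative only).

    hours_late is the time past the deadline, in hours, counted after any
    free late days have been applied. Assumption: the 0.5-credit window is
    the 48 hours that follow the first 24-hour window.
    """
    if hours_late <= 0:
        return 1.0   # submitted on time: full credit
    if hours_late <= 24:
        return 0.75  # within the first 24 hours after the deadline
    if hours_late <= 72:
        return 0.5   # within the following 48 hours (assumed reading)
    return 0.0       # anything later earns no credit


# Example: a 90-point homework submitted 30 hours late.
adjusted_score = 90 * late_credit(30)
```

If the instructor intends the 0.5 window to end 48 hours after the deadline instead, only the `72` threshold would change.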
Incomplete grades will not be assigned except in cases of absences excused according to the
University attendance policy.
The possibility of changing a homework or exam grade can be discussed with the TAs
(first) or me (second); this discussion must be initiated within one week of the graded
work having been returned. After that point, the grade is considered final.
There will be a midterm and a final exam. The date and time of the final exam are listed in the
schedule.
Missed Exams – You must take the exam on the day it is given with the class for which you
are registered. If you miss an exam you will NOT be allowed a make-up without a docu-
mented and verifiable medical excuse or demonstration that a family emergency prevented
you from attending. Please be advised that the student health center is no longer able or
willing to provide such documentation; therefore, documentation from the health center is
likely to be inadequate to allow a make-up exam. While this policy may seem quite strict,
it is necessary in order to be fair to all and so that the TAs and I may spend our time
helping you learn the course material rather than on bureaucratic and administrative
paperwork. If you will be out of town representing the University
on an academic, athletic, or student organization trip, please speak with me about taking the
exam before you leave for your event. I am glad to make arrangements for you. If an exam
conflicts with a religious observance, please see me as soon as possible to make alternative
arrangements.
Clarity of writing and organization forms an important factor in grading of homework and
examinations. Unclear writing can suggest a lack of understanding of the material. Ex-
amples, figures, tables, and results should be accompanied by clear explanations of what
lessons the reader should take away from them. Students should avoid writing in terms of
bullet lists, which usually reflect a lack of thought about how to express the material. Make
sure the paragraphs of the writing clearly express the main points, supporting arguments,
and connections between the main points.
Grades will be assigned based on a weighted combination of performance on different course
activities, according to the scheme
A+ = 97 - 100 C+ = 77 - 79.9
A = 93 - 96.9 C = 73 - 76.9
A- = 90 - 92.9 C- = 70 - 72.9
B+ = 87 - 89.9 D+ = 67 - 69.9
B = 83 - 86.9 D = 63 - 66.9
B- = 80 - 82.9 D- = 60 - 62.9
F = 59 and below
with the weighting given by
Homework 20%
Midterm 17%
Final 30%
Project Reports 23%
Project Presentations 10%
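To make the arithmetic concrete, here is a minimal Python sketch of how the weights and the letter-grade scale above could be combined. The component scores in the example are invented, the names `WEIGHTS`, `CUTOFFS`, `weighted_total`, and `letter_grade` are hypothetical, and treating the lower bound of each band as a cutoff (so that a total such as 92.95 maps to A-) is an assumption about rounding that the syllabus does not state.

```python
# Weights from the syllabus, expressed as fractions of the final grade.
WEIGHTS = {
    "homework": 0.20,
    "midterm": 0.17,
    "final": 0.30,
    "project_reports": 0.23,
    "project_presentations": 0.10,
}

# Lower bound of each letter band, highest first (assumed cutoffs).
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"),
]


def weighted_total(scores):
    """Combine per-component scores (each on a 0-100 scale) into one total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a 0-100 total to a letter using the first cutoff it reaches."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


# Invented scores, for illustration only.
example_scores = {
    "homework": 90,
    "midterm": 80,
    "final": 85,
    "project_reports": 88,
    "project_presentations": 95,
}
total = weighted_total(example_scores)  # roughly 86.84
print(round(total, 2), letter_grade(total))
```

With these invented scores the weighted total is about 86.84, which falls in the B band.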
Attendance
All students are responsible for all material or instructions introduced in class, which may
include, but is not limited to, course material from textbooks, material from other sources
(including from outside of the available slides), changes to the schedule, etc. The instructor
and TAs cannot cover missed material with students who were absent, and much of the material
covered in class is not available online. The university policy for attendance regulations
(REG 02.20.03) can be accessed online at http://policies.ncsu.edu/regulation/reg-02-20-03.
Academic integrity
This course follows the University policy on academic integrity found in the Code of Student
Conduct (POL11.35.01), available at
http://policies.ncsu.edu/policy/pol-11-35-01 and the Honor Pledge.
A student shall be guilty of a violation of academic integrity if he or she:
Violations will be reported to the Office of Student Conduct, which may impose penalties
beyond those recommended by the instructor.
Honor Pledge
Your name on any test or assignment or the electronic submission of an assignment through
Moodle or other class courseware system indicates "I have neither given nor received
unauthorized aid on this test or assignment."
Policy on Class Disruption
This class, like college as a whole, is an at-will activity. You and your classmates are
in class to learn the material and to participate. Willful disruption of the lecture or
other similar actions that impede the progress of your peers will result in deductions from
your total grade. Disruptive students will receive one warning before any deductions
take place. For any persistent or repeated disruption, points will be deducted from your
grade at the discretion of the course and/or lab instructors, and the behavior may also be
reported to the NCSU Office of Student Conduct. Please see POL 11.35.01 (Code of Student
Conduct), available at http://policies.ncsu.edu/policy/pol-11-35-01,
for more details on University policies on classroom conduct.
Class evaluations
Online class evaluations will be available for students to complete during the last two
weeks of the semester. Students will receive an email message directing them to a web-
site (classeval.ncsu.edu) where they can login using their Unity ID and complete
evaluations. All evaluations are confidential; instructors will never know how any one stu-
dent responded to any question, and students will never know the ratings for any particular
instructors. The student help desk can be reached at [email protected]. More
information about evaluations is available at http://www.ncsu.edu/UPA/classeval/.
COVID-19 Related
Due to the Coronavirus pandemic, public health measures have been implemented across
campus. Students should stay current with these practices and expectations through the
Protect the Pack website (https://www.ncsu.edu/coronavirus/). The sections below provide
expectations and conduct related to COVID-19 issues.
Community Standards Related to COVID-19
We are all responsible for protecting ourselves and our community. Please see the com-
munity expectations and Rule 04.21.01 regarding Personal Safety Requirements Related to
COVID-19: https://policies.ncsu.edu/rule/rul-04-21-01/
Course Expectations Related to COVID-19:
Grading/Scheduling Change Options Related to COVID-19
If the delivery mode has a negative impact on your academic performance in this course, the
university has provided tools to potentially reduce the impact:
In some cases, another option may be to request an incomplete in the course. Before using
any of these tools, discuss the options with your instructor and your academic advisor. Be
aware that if you use the enhanced S/U, you will still need to complete the course and receive
at least a C- to pass the course.