Syllabus
What lies beyond the Jupyter notebook? How can we elevate code from concept to production? What
happens when scikit-learn isn’t enough? Will that last script die as a one-off or perform just as well for
the next 10,000 inputs? The last decade has seen an amazing commoditization of cloud computing and
scientific development tools that make it a truly glorious time to be a data scientist, yet the increasing
ease-of-use can paradoxically hinder the development of more sophisticated tools if the scientist relies
too heavily on magics and never opens the hood to explore how things really work. In this course, we
explore the next level of fundamentals that make a difference for truly impactful data science teams in
real organizations using complex data. Key topics include formal collaboration techniques, testing,
continuous integration and deployment, repeatable and intuitive workflows with directed graphs,
recurring themes in practical algorithms, meta-programming and glue, performance optimization, and
an emphasis on practical integration with tools in the broader data science ecosystem such as GitHub,
Docker, Amazon Web Services, and Hadoop.
Key Info
The course will focus on a number of subjects highly relevant to the modern data scientist or engineer.
Broadly, the semester will be divided into the following sections:
1. Workflows: formal tools and methods for capturing work output in a repeatable, testable, and
collaborative manner.
2. Skeletons: establishing the backbone of your project. We’ll examine frameworks and techniques
that establish a common structure across projects or enable quickly tapping into more powerful
and scalable functionality. No project should reinvent the wheel, and a large determiner of
success is choosing the right framework to build upon.
3. Data: how to store and think about data from a serialization and container perspective. Is your
data columnar, slowly evolving, or unstructured? Should it be partitioned or sorted in a key-value
store? How can it be appropriately cached or memoized? Should you optimize for read, write, or
both?
4. Algorithms: every project is different, but some core techniques prove useful time and time
again. From code optimization and compilation options to the magic of pseudorandom hashes,
we’ll explore some key fundamentals that enable wholly new approaches to old problems.
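Two of the ideas named above, memoization and pseudorandom hashes, can be sketched in a few lines of Python. This is only an illustrative sketch, not course material: the function names `slow_square` and `bucket_for` are hypothetical, and SHA-256 is simply one convenient stable hash from the standard library.

```python
import hashlib
from functools import lru_cache


@lru_cache(maxsize=None)
def slow_square(n: int) -> int:
    """Hypothetical stand-in for an expensive computation.

    lru_cache memoizes the result, so repeated calls with the same
    argument are answered from the cache instead of being recomputed.
    """
    return n * n


def bucket_for(key: str, n_buckets: int) -> int:
    """Assign a key to one of n_buckets partitions using a stable hash.

    hashlib gives the same digest in every process, unlike the built-in
    hash(), which is salted per process for strings by default.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


first = slow_square(12)
second = slow_square(12)  # served from the cache
hits = slow_square.cache_info().hits
partition = bucket_for("user-42", 8)
```

Because the hash is deterministic, the same key always lands in the same partition, which is what makes hash-based partitioning and sampling repeatable across runs and machines.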
Assessment
Grades for the course will be determined in part from the following activities:
• Weekly graded assignments (PSET 0 is not graded but is mandatory). Assignments will include
code and data submission and may include an automatic grading component.
• Graduate students may be required to solve additional, more challenging problems on
assignments.
Accessibility
The Extension School is committed to providing an accessible academic community. The Accessibility
Office offers a variety of accommodations and services to students with documented disabilities.
Please visit www.extension.harvard.edu/resources-policies/resources/disability-services-accessibility
for more information.
Academic Integrity
You are responsible for understanding Harvard Extension School policies on academic integrity
(www.extension.harvard.edu/resources-policies/student-conduct/academic-integrity) and how to
use sources responsibly. Not knowing the rules, misunderstanding the rules, running out of time,
submitting the wrong draft, or being overwhelmed with multiple demands are not acceptable
excuses. There are no excuses for failure to uphold academic integrity. To support your learning
about academic citation rules, please visit the Harvard Extension School Tips to Avoid Plagiarism
(www.extension.harvard.edu/resources-policies/resources/tips-avoid-plagiarism), where you’ll find
links to the Harvard Guide to Using Sources and two free online 15-minute tutorials to test your
knowledge of academic citation policy. The tutorials are anonymous open-learning tools.
Homework Assignments
All homework assignments, quizzes, and exams must be your independent work. We encourage
discussion of concepts, solving issues, and asking general questions on our course forum. We take
academic integrity very seriously. Programmers frequently learn to overcome problems by consulting
popular internet resources. If you used any external resource, please ensure that you cite it in your
submission.
Late Policy
We realize that unforeseen circumstances may sometimes prevent you from meeting a homework
deadline. Therefore, we are giving you a credit of 5 extra days that you may use over the duration
of the course. You must inform your TF if you plan to use credit for a homework. You can
apply a maximum of 3 credits to any one homework. 10% of your assignment grade will be subtracted for
each additional late day. Homework will not be accepted after solutions are posted on Canvas. Extra-day
credits cannot be used for PSET 0, the last PSET, or the final graduate project.
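The arithmetic of the late policy can be sketched as follows. This is an illustration under stated assumptions, not the official grading procedure: it assumes the student has told the TF to apply as many credits as allowed, that the 10% penalty is counted per uncovered late day, and it ignores the cutoff when solutions are posted and the assignments on which credits cannot be used. The function name `late_penalty` is hypothetical.

```python
def late_penalty(days_late, credits_remaining,
                 max_credits_per_hw=3, penalty_per_day=0.10):
    """Return (credits_used, fraction_of_grade_deducted) for one homework.

    Assumption: credits are applied first, up to the per-homework cap,
    and each remaining late day costs penalty_per_day of the grade.
    """
    credits_used = min(days_late, credits_remaining, max_credits_per_hw)
    uncovered_days = days_late - credits_used
    # Cap the deduction at 100% of the assignment grade.
    deduction = min(1.0, uncovered_days * penalty_per_day)
    return credits_used, deduction


# Four days late with all 5 credits available: 3 credits apply (the cap),
# leaving one uncovered day.
used, deduction = late_penalty(days_late=4, credits_remaining=5)
```

Under these assumptions, a homework submitted four days late by a student with all 5 credits left uses 3 credits and loses 10% of its grade for the one uncovered day.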
Class Participation
Class participation is evaluated as activity on Piazza. Students who contribute and help
their colleagues will be noticed and rewarded. Piazza is an excellent resource for collaborative learning.
Graduate students will be required to complete a small independent project which contributes to the
open source community, advances a project of their choosing, or otherwise adds to the collective
learning experience of the class. The topic and scope of the project should be established early in the
semester, and must include a code deliverable and presentation to the class.
Grading
Undergraduate Graduate
More Information
We will be using a course management web service called Canvas for all course communication. Please
ensure that you get a Harvard e-mail account and access to Canvas. More importantly, it is critical
that you check the e-mail registered with Canvas, monitor course announcements and participate in
discussions on Piazza (our forum).
Detailed Syllabus