Clare Corthell: Learning Data Science Online

THE OPEN SOURCE DATA SCIENCE MASTERS
(THE DIY DATA SCIENTIST)
Clare Corthell
Data Scientist at Mattermark
@clarecorthell
www.datasciencemasters.org

Deal Intelligence Platform
interface to live data about private companies

TODAY
• What a Data Scientist does
• Paths to becoming a Data Scientist
• Where to start
• Navigating a path
• Why you should run toward hard things

WHAT DOES A DATA SCIENTIST DO?
Data Scientists turn data into knowledge
by answering the right questions
Which is also predicated on asking
the right questions

HOW DO I BECOME A DATA SCIENTIST?
the answer you don’t want…
There’s no paved road, no one way

PATHS
1. Get a Classic Masters from an accredited University
<Warning> I have yet to see one that’s better than the OSDSM
2. Attend a Bootcamp or Academy
• Zipfian Academy (SF)
• Insight Data Science Fellows (Palo Alto, NYC)
• Data Science Retreat (Berlin)
3. Self-Taught
• The Open Source Data Science Masters

THEORY & APPLICATION
or, why universities haven’t figured this out yet
Universities don’t focus on “Data Science” because it’s tightly
bound to application.
Universities develop theory.
Businesses develop applications.
The two exist symbiotically - they do need each other.
The goals are simply very different.

• Math
• Computing
• Algorithms
• Distributed Computing
• Databases
• Data Mining
• Machine Learning
• Graph Theory
• Natural Language Processing
• Analysis
• Visualization
• Python (language & libraries)
The Open Source Data Science Masters
bit.ly/dsmasters
The internet helps me curate - hence Open Source

CLARE’S PATH
Previously Product Designer, front end dev
Transcript bit.ly/corthelldata
6 months of study
Data Scientist & Machine Learning Developer at Mattermark
My team builds domain-specific systems
for classification, recommendation, prediction,
crawling, fact extraction, and more
Languages
• Python
• SQL
Machine learning
• scikit-learn
Data manipulation
• pandas
• NumPy
• matplotlib
• NLTK
Design
• HTML/CSS/JS

1. Get a goal
2. Get a plan
3. Get mentorship
4. Get a project

1. Get a goal
What kind of “Data Scientist” do you want to be?
Explore the different roles
Pick something that sparks your interest
Find out what those people do on a daily basis

Rachel Schutt, Doing Data Science

Analyzing the Analyzers, O’Reilly

2. Get a plan
Figure out what skills you need to be minimally effective
Design a Curriculum (fork the OSDSM!)
Plan a schedule of study

3. Get mentorship
Talk to people on Twitter
Ask to buy them coffee
(with a specific need or question in hand)
Get informational interviews
(a lost art; they can turn into real interviews, but are low-pressure)

4. Get a project
Start with a question
(make it a small question - don’t set yourself up for failure)
Use real-world data to answer a question
• e.g. Who do iguana owners connect to on Twitter?
Work on a real business problem
• e.g. What channels of marketing are working for us?
Help a non-profit* with data they don’t understand
*Orgs that coordinate working with NGOs: Bayes Impact, DataKind
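
A small question like "What channels of marketing are working for us?" can often be answered with a few lines of Python. The sketch below is only an illustration, not part of the talk: the rows, the channel names, and the helper `conversion_by_channel` are all hypothetical, and "working" is assumed to mean the share of visitors who sign up.

```python
from collections import defaultdict

# Hypothetical sample data: (channel, visitors, signups).
# These numbers are made up for illustration only.
rows = [
    ("email", 400, 36),
    ("twitter", 1000, 25),
    ("search", 800, 56),
    ("email", 600, 44),
]


def conversion_by_channel(rows):
    """Return {channel: total signups / total visitors} over all rows."""
    visitors = defaultdict(int)
    signups = defaultdict(int)
    for channel, n_visitors, n_signups in rows:
        visitors[channel] += n_visitors
        signups[channel] += n_signups
    # Skip channels with no visitors to avoid dividing by zero.
    return {ch: signups[ch] / visitors[ch] for ch in visitors if visitors[ch] > 0}


rates = conversion_by_channel(rows)

# Print channels from highest to lowest conversion rate.
for channel, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{channel}: {rate:.1%}")
```

With real data the same aggregation would more likely be a pandas `groupby`; the standard-library version is used here only to keep the sketch self-contained.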

Let’s talk about where this perfect plan
gets really incredibly difficult
(Let’s start with a tautology)

HARD THINGS ARE HARD
Hard things are hard because there are no easy answers or recipes.
They are hard because your emotions are at odds with your logic.
They are hard because you don’t know the answer and you cannot
ask for help without showing weakness.
Ben Horowitz
The Hard Thing about Hard Things

When something scares you
run like hell right into it.
The hardest things are things people avoid the most.
That’s your marginal advantage.
Maybe that’s why there aren’t enough Data Scientists.
You will figure it out.
It’s about ego management and problem solving.

RUN TOWARD HARD THINGS
Choosing what you want to do
and what to work on
Not knowing everything
Being overwhelmed
Time Management
Math
Coding

Not knowing everything
Being overwhelmed
There are a million things you could learn and work on.
That’s overwhelming. But you can’t afford to get overwhelmed.
You won’t know everything.
It’s impractical and impossible to know everything.
Learn to say “I don’t know.”
FYI Programmers don’t read books.
They reference them as needed.

Time Management
How do I do all of this in a reasonable amount of time?
- You don’t.
- Be rigorous.
Ask yourself:
Will this directly help me achieve my goal?
Refine your goals, focus your work.
Don’t switch tasks.
Focus on one thing at a time.

Why is time management so hard?
We’re used to other people telling us what to do:
Teachers
Managers
Parents

a hint for those new to programming
Google “stackoverflow + problem”

HUMANS SHOULD BE HUMANS
AND
COMPUTERS SHOULD BE COMPUTERS.
You must code.
Because automation.
And no, there is no shortcut.

YOUR ADVANTAGE
Self-study in Data Science is hard.
But what you spend in energy and commitment
to self-teaching is returned to you in:
• Choice of professional focus
• Respect from potential employers for managing yourself. You
want to work with people who will respect and recognize that.
• Skills that are tough to get from a university or employer
• A path with no gatekeepers - no one will stop you.

1. Learn to code in Python.
2. Take Intro to Data Science (UW)
3. Go get a coffee
4. Ask one question

i ♥ questions
datasciencemasters.org
clare@mattermark.com
@clarecorthell
