THE OPEN SOURCE DATA SCIENCE MASTERS 
(THE DIY DATA SCIENTIST) 
Clare Corthell 
Data Scientist at Mattermark 
@clarecorthell 
www.datasciencemasters.org
Deal Intelligence Platform 
interface to live data about private companies
TODAY 
• What a Data Scientist does 
• Paths to becoming a Data Scientist 
• Where to start 
• Navigating a path 
• Why you should run toward hard things
WHAT DOES A DATA SCIENTIST DO? 
Data Scientists turn data into knowledge 
by answering the right questions 
Which is also predicated on asking 
the right questions
HOW DO I BECOME A DATA SCIENTIST? 
the answer you don’t want… 
There’s no paved road, no one way
PATHS 
1. Get a Classic Masters from an accredited University 
<Warning> I have yet to see one that’s better than the OSDSM 
2. Attend a Bootcamp or Academy 
• Zipfian Academy (SF) 
• Insight Data Science Fellows (Palo Alto, NYC) 
• Data Science Retreat (Berlin) 
3. Self-Taught 
• The Open Source Data Science Masters
THEORY & APPLICATION 
or, why universities haven’t figured this out yet 
Universities don’t focus on “Data Science” because it’s tightly 
bound to application. 
Universities develop theory. 
Businesses develop applications. 
The two exist symbiotically - they do need each other. 
The goals are simply very different.
• Math 
• Computing 
• Algorithms 
• Distributed Computing 
• Databases 
• Data Mining 
• Machine Learning 
• Graph Theory 
• Natural Language Processing 
• Analysis 
• Visualization 
• Python (language & libraries) 
The Open Source Data Science Masters 
bit.ly/dsmasters 
The internet helps me curate, hence Open Source
(that’s a lot)
CLARE’S PATH 
Previously Product Designer, front end dev 
Transcript bit.ly/corthelldata 
6 months of study 
Data Scientist & 
Machine Learning Developer 
at Mattermark 
My team builds domain-specific systems 
for classification, recommendation, prediction, 
crawling, fact extraction, and more 
languages 
Python 
SQL 
machine learning 
scikit-learn 
data manipulation 
Pandas 
NumPy 
matplotlib 
NLTK 
design 
html/css/js
1. Get a goal 
2. Get a plan 
3. Get mentorship 
4. Get a project
1. Get a goal 
What kind of “Data Scientist” do you want to be? 
Explore the different roles 
Pick something that sparks your interest 
Find out what those people do on a daily basis
Rachel Schutt, Doing Data Science
Analyzing the Analyzers, O’Reilly
2. Get a plan 
Figure out what skills you need to be minimally effective 
Design a Curriculum (fork the OSDSM!) 
Plan a schedule of study
Dave Holtz 
Airbnb
3. Get mentorship 
Talk to people on Twitter 
Ask to buy them coffee 
(with a specific need or question in hand) 
Get informational interviews 
(a lost art; they can turn into real interviews, but are low-pressure)
4. Get a project 
Get a question 
(make it a small question - don’t set yourself up for failure) 
Use real-world data to answer a question 
Who do iguana owners connect to on Twitter? 
Work on a real business problem 
Help a non-profit* with data they don’t understand 
What channels of marketing are working for us? 
*Orgs that coordinate working with NGOs: Bayes Impact, DataKind
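A small question like "What channels of marketing are working for us?" can often be answered with very little code. The sketch below is only an illustration: the channel names, the visitor and signup counts, and the function names are all made up for this example and do not come from the talk. It uses only the Python standard library and treats "working" as signups divided by visitors, which is one possible definition among many.

```python
from collections import defaultdict

# Hypothetical sample data: (channel, visitors, signups). Not from the talk.
CAMPAIGNS = [
    ("email", 1200, 60),
    ("twitter", 3000, 45),
    ("search ads", 800, 56),
    ("email", 800, 40),
]


def conversion_by_channel(rows):
    """Return {channel: signups / visitors}, totalled over all rows per channel."""
    visitors = defaultdict(int)
    signups = defaultdict(int)
    for channel, seen, converted in rows:
        visitors[channel] += seen
        signups[channel] += converted
    # Skip channels with no visitors to avoid dividing by zero.
    return {c: signups[c] / visitors[c] for c in visitors if visitors[c] > 0}


def rank_channels(rows):
    """Return (channel, rate) pairs sorted from highest to lowest conversion rate."""
    rates = conversion_by_channel(rows)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for channel, rate in rank_channels(CAMPAIGNS):
        print(f"{channel}: {rate:.1%}")
```

With real data the interesting work is usually in deciding what counts as a visitor and a signup, which is exactly why the question should stay small.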
Let’s talk about where this perfect plan 
gets really incredibly difficult 
(Let’s start with a tautology)
HARD THINGS ARE HARD 
Hard things are hard because there are no easy answers or recipes. 
They are hard because your emotions are at odds with your logic. 
They are hard because you don’t know the answer and you cannot 
ask for help without showing weakness. 
Ben Horowitz 
The Hard Thing about Hard Things
When something scares you 
run like hell right into it. 
The hardest things are things people avoid the most. 
That’s your marginal advantage. 
Maybe that’s why there aren’t enough Data Scientists. 
You will figure it out. 
It’s about ego management and problem solving.
RUN TOWARD HARD THINGS 
Choosing what you want to do 
and what to work on 
Not knowing everything 
Being overwhelmed 
Time Management 
Math 
Coding
Not knowing everything 
Being overwhelmed 
There are a million things you could learn and work on. 
That’s overwhelming. But you can’t afford to get overwhelmed. 
You won’t know everything. 
It’s impractical and impossible to know everything. 
Learn to say “I don’t know.” 
FYI: programmers don’t read books. 
They reference them as needed.
Time Management 
How do I do all of this in a reasonable amount of time? 
- You don’t. 
- Be rigorous. 
Ask yourself: 
Will this directly help me achieve my goal? 
Refine your goals, focus your work. 
Don’t switch tasks. 
Focus on one thing at a time.
Why is time management so hard? 
We’re used to other people telling us what to do: 
Teachers 
Managers 
Parents
CODING IS HARD.
a hint for those new to programming 
Google: “stackoverflow” + your problem
why code?
HUMANS SHOULD BE HUMANS 
AND 
COMPUTERS SHOULD BE COMPUTERS. 
You must code. 
Because automation. 
And no, there is no shortcut.
YOUR ADVANTAGE 
Self-study in Data Science is hard. 
But what you spend in energy and commitment 
to self-teaching is returned to you in: 
• Choice of professional focus 
• Respect from potential employers for managing yourself. You 
want to work with people who will respect and recognize that. 
• Skills that are tough to get from a university or employer 
• A path with no gatekeepers - no one will stop you.
Take the first step.
1. Learn to code in Python. 
2. Take Intro to Data Science (UW) 
3. Go get a coffee 
4. Ask one question
i ♥ questions 
datasciencemasters.org 
clare@mattermark.com 
@clarecorthell
Minions Want to eat presentacion muy linda
CarlaAndradesSoler1
 
Data Analytics Overview and its applications
Data Analytics Overview and its applicationsData Analytics Overview and its applications
Data Analytics Overview and its applications
JanmejayaMishra7
 
Defense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptxDefense Against LLM Scheming 2025_04_28.pptx
Defense Against LLM Scheming 2025_04_28.pptx
Greg Makowski
 
C++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptxC++_OOPs_DSA1_Presentation_Template.pptx
C++_OOPs_DSA1_Presentation_Template.pptx
aquibnoor22079
 
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdfIAS-slides2-ia-aaaaaaaaaaain-business.pdf
IAS-slides2-ia-aaaaaaaaaaain-business.pdf
mcgardenlevi9
 

Clare Corthell: Learning Data Science Online

  • 1. THE OPEN SOURCE DATA SCIENCE MASTERS (THE DIY DATA SCIENTIST) Clare Corthell Data Scientist at Mattermark @clarecorthell www.datasciencemasters.org
  • 2. Deal Intelligence Platform interface to live data about private companies
  • 3. TODAY • What a Data Scientist does • Paths to becoming a Data Scientist • Where to start • Navigating a path • Why you should run toward hard things
  • 4. WHAT DOES A DATA SCIENTIST DO? Data Scientists turn data into knowledge by answering the right questions Which is also predicated on asking the right questions
  • 5. HOW DO I BECOME A DATA SCIENTIST? the answer you don’t want… There’s no paved road, no one way
  • 6. PATHS 1. Get a Classic Masters from an accredited University <Warning> I have yet to see one that’s better than the OSDSM 2. Attend a Bootcamp or Academy • Zipfian Academy (SF) • Insight Data Science Fellows (Palo Alto, NYC) • Data Science Retreat (Berlin) 3. Self-Taught • The Open Source Data Science Masters
  • 7. THEORY & APPLICATION or, why universities haven’t figured this out yet Universities don’t focus on “Data Science” because it’s tightly bound to application. Universities develop theory. Businesses develop applications. The two exist symbiotically - they do need each other. The goals are simply very different.
  • 8. • Math • Computing • Algorithms • Distributed Computing • Databases • Data Mining • Machine Learning • Graph Theory • Natural Language Processing • Analysis • Visualization • Python (language & libraries) The Open Source Data Science Masters bit.ly/dsmasters The internet helps me curate - hence Open Source
  • 10. CLARE’S PATH Previously Product Designer, front end dev Transcript bit.ly/corthelldata 6 months of study Data Scientist & Machine Learning Developer at Mattermark My team builds domain-specific systems for classification, recommendation, prediction, crawling, fact extraction, and more languages Python SQL machine learning Scikit Learn data manipulation Pandas Numpy matplotlib NLTK design html/css/js
  • 11. 1. Get a goal 2. Get a plan 3. Get mentorship 4. Get a project
  • 12. 1. Get a goal What kind of “Data Scientist” do you want to be? Explore the different roles Pick something that sparks your interest Find out what those people do on a daily basis
  • 13. Rachel Schutt, Doing Data Science
  • 15. 2. Get a plan Figure out what skills you need to be minimally effective Design a Curriculum (fork the OSDSM!) Plan a schedule of study
  • 17. 3. Get mentorship Talk to people on Twitter Ask to buy them coffee (with a specific need or question in hand) Get informational interviews (a lost art; they can turn into real interviews, but are low-pressure)
  • 18. 4. Get a question (make it a small question - don’t set yourself up for failure) Project Use real-world data to answer a question Who do iguana owners connect to on Twitter? Work on a real business problem Help a non-profit* with data they don’t understand What channels of marketing are working for us? *Orgs that coordinate working with NGOs: Bayes Impact, DataKind
  • 19. Let’s talk about where this perfect plan gets really incredibly difficult (Let’s start with a tautology)
  • 20. HARD THINGS ARE HARD Hard things are hard because there are no easy answers or recipes. They are hard because your emotions are at odds with your logic. They are hard because you don’t know the answer and you cannot ask for help without showing weakness. Ben Horowitz The Hard Thing about Hard Things
  • 21. When something scares you run like hell right into it. The hardest things are things people avoid the most. That’s your marginal advantage. Maybe that’s why there aren’t enough Data Scientists. You will figure it out. It’s about ego management and problem solving.
  • 22. RUN TOWARD HARD THINGS Choosing what you want to do and what to work on Not knowing everything Being overwhelmed Time Management Math Coding
  • 23. Not knowing everything Being overwhelmed There are a million things you could learn and work on. That’s overwhelming. But you can’t afford to get overwhelmed. You won’t know everything. It’s impractical and impossible to know everything. Learn to say “I don’t know.” FYI Programmers don’t read books. They reference them as needed.
  • 24. Time Management How do I do all of this in a reasonable amount of time? - You don’t. - Be rigorous. Ask yourself: Will this directly help me achieve my goal? Refine your goals, focus your work. Don’t switch tasks. Focus on one thing at a time.
  • 25. Why is time management so hard? We’re used to other people telling us what to do; Teachers Managers Parents
  • 27. a hint for those new to programming google stackoverflow + problem
  • 29. HUMANS SHOULD BE HUMANS AND COMPUTERS SHOULD BE COMPUTERS. You must code. Because automation. And no, there is no shortcut.
  • 30. YOUR ADVANTAGE Self-study in Data Science is hard. But what you spend in energy and commitment to self-teaching is returned to you in: • Choice of professional focus • Respect from potential employers for managing yourself. You want to work with people who will respect and recognize that. • Skills that are tough to get from a university or employer • A path with no gatekeepers - no one will stop you.
  • 31. Take the first step.
  • 32. 1. Learn to code in Python. 2. Take Intro to Data Science (UW) 3. Go get a coffee 4. Ask one question
  • 33. i ♥ questions datasciencemasters.org [email protected] @clarecorthell