(Skiena, 2017) - Book - The Data Science Design Manual - 2

This chapter introduces data science and how it differs from computer science and traditional programming. It explains that data scientists must think like real scientists by focusing on understanding data rather than just algorithms. Real scientists are data-driven and care about results and discovery, while computer scientists are often more focused on methods and producing numbers efficiently. The chapter aims to explain how data scientists think and ask broader questions of data to find meaningful answers.

Uploaded by

Antero
Copyright
© All Rights Reserved

CHAPTER 1. WHAT IS DATA SCIENCE?

This introductory chapter has three missions. First, I will try to explain how
good data scientists think, and how this differs from the mindset of traditional
programmers and software developers. Second, we will look at data sets in terms
of the potential for what they can be used for, and learn to ask the broader
questions they are capable of answering. Finally, I introduce a collection of
data analysis challenges that will be used throughout this book as motivating
examples.

1.1 Computer Science, Data Science, and Real Science

Computer scientists, by nature, don’t respect data. They have traditionally
been taught that the algorithm was the thing, and that data was just meat to
be passed through a sausage grinder.

So to qualify as an effective data scientist, you must first learn to think like
a real scientist. Real scientists strive to understand the natural world, which
is a complicated and messy place. By contrast, computer scientists tend to
build their own clean and organized virtual worlds and live comfortably within
them. Scientists obsess about discovering things, while computer scientists
invent rather than discover.

People’s mindsets strongly color how they think and act, causing
misunderstandings when we try to communicate outside our tribes. So fundamental are
these biases that we are often unaware we have them. Examples of the cultural
differences between computer science and real science include:

• Data vs. method centrism: Scientists are data driven, while computer
scientists are algorithm driven. Real scientists spend enormous amounts
of effort collecting data to answer their question of interest. They invent
fancy measuring devices, stay up all night tending to experiments, and
devote most of their thinking to how to get the data they need.
By contrast, computer scientists obsess about methods: which algorithm
is better than which other algorithm, which programming language is best
for a job, which program is better than which other program. The details
of the data set they are working on seem comparatively unexciting.

• Concern about results: Real scientists care about answers. They analyze
data to discover something about how the world works. Good scientists
care about whether the results make sense, because they care about what
the answers mean.
By contrast, bad computer scientists worry about producing
plausible-looking numbers. As soon as the numbers stop looking grossly wrong,
they are presumed to be right. This is because they are personally less
invested in what can be learned from a computation, as opposed to getting
it done quickly and efficiently.
