Data Science Fundamentals with R Python and Open Data 1st Edition Marco Cremonini - Get instant access to the full ebook content

The document promotes the ebook 'Data Science Fundamentals with R, Python, and Open Data' by Marco Cremonini, which aims to introduce data science concepts to a broad audience, including non-specialists from various fields. It emphasizes that anyone, regardless of their primary profession, can learn the fundamentals of data science and apply it in their respective domains. The text also provides links to download additional recommended ebooks related to data science and its applications.

Data Science Fundamentals with R, Python, and Open Data, 1st Edition
Author(s): Marco Cremonini
ISBN(s): 9781394213269, 1394213263
Edition: 1
Year: 2024
Language: English
Table of Contents
Cover
Table of Contents
Title Page
Copyright
Preface
About the Companion Website
Introduction
  Approach
  Open Data
  What You Don't Learn
1 Open-Source Tools for Data Science
  1.1 R Language and RStudio
  1.2 Python Language and Tools
  1.3 Advanced Plain Text Editor
  1.4 CSV Format for Datasets
  Questions
2 Simple Exploratory Data Analysis
  2.1 Missing Values Analysis
  2.2 R: Descriptive Statistics and Utility Functions
  2.3 Python: Descriptive Statistics and Utility Functions
  Questions
3 Data Organization and First Data Frame Operations
  Datasets
  3.1 R: Read CSV Datasets and Column Selection
  3.2 R: Rename and Relocate Columns
  3.3 R: Slicing, Column Creation, and Deletion
  3.4 R: Separate and Unite Columns
  3.5 R: Sorting Data Frames
  3.6 R: Pipe
  3.7 Python: Column Selection
  3.8 Python: Rename and Relocate Columns
  3.9 Python: NumPy Slicing, Selection with Index, Column Creation and Deletion
  3.10 Python: Separate and Unite Columns
  3.11 Python: Sorting Data Frame
  Questions
4 Subsetting with Logical Conditions
  4.1 Logical Operators
  4.2 R: Row Selection
5 Operations on Dates, Strings, and Missing Values
  Datasets
  5.1 R: Operations on Dates and Strings
  5.2 R: Handling Missing Values and Data Type Transformations
  5.3 R: Example with Dates, Strings, and Missing Values
  5.4 Python: Operations on Dates and Strings
  5.5 Python: Handling Missing Values and Data Type Transformations
  5.6 Python: Examples with Dates, Strings, and Missing Values
  Questions
6 Pivoting and Wide-long Transformations
  Datasets
  6.1 R: Pivoting
  6.2 Python: Pivoting
7 Groups and Operations on Groups
  Dataset
  7.1 R: Groups
  7.2 Python: Groups
  Questions
8 Conditions and Iterations
  Datasets
  8.1 R: Conditions and Iterations
  8.2 Python: Conditions and Iterations
  Questions
9 Functions and Multicolumn Operations
  9.1 R: User-defined Functions
  9.2 R: Multicolumn Operations
  9.3 Python: User-defined and Lambda Functions
  Questions
10 Join Data Frames
  Datasets
  10.1 Basic Concepts
  10.2 Python: Join Operations
  Questions
11 List/Dictionary Data Format
  Datasets
  11.1 R: List Data Format
  11.2 R: JSON Data Format and Use Cases
  11.3 Python: Dictionary Data Format
  Questions
Index
End User License Agreement

List of Tables

Chapter 2
  Table 2.1 R utility functions.
  Table 2.2 Python utility functions.
Chapter 4
  Table 4.1 Main logical operators.
  Table 4.2 Truth tables for binary operators AND, OR, and XOR.
Chapter 5
  Table 5.1 Main functions of package stringr.
  Table 5.2 Data type verification and transformation functions.
  Table 5.3 Dataset Fahrraddiebstahl in Berlin (translated), column descriptio...
  Table 5.4 Symbols for date formats.
  Table 5.5 Pandas functions for string manipulation.
  Table 5.6 Data type verification and transformation functions.
Chapter 7
  Table 7.1 Columns selected from the US domestic flight dataset.
Chapter 8
  Table 8.1 Unit of measurement and symbols.
Chapter 11
  Table 11.1 Methods for Python dict data format.

List of Illustrations

Chapter 1
  Figure 1.1 RStudio Desktop's standard layout
  Figure 1.2 Example of starting JupyterLab
  Figure 1.3 Ambiguity between textual character and separator symbol
Chapter 4
  Figure 4.1 Binary logical operators AND, OR, and XOR: set theory.
Chapter 6
  Figure 6.1 Example of long-form dataset.
  Figure 6.2 Wide-long transformation schema.
Chapter 10
  Figure 10.1 Example of join between data frames.
Chapter 11
  Figure 11.1 The list structure of a3 with the native RStudio viewer.
  Figure 11.2 RStudio viewer visualization of data frame df2.
  Figure 11.3 Result of the unnest_longer() function.
  Figure 11.4 The Nobel Prize JSON data format.

Data Science Fundamentals with R, Python, and Open Data

Marco Cremonini

University of Milan

Italy

Copyright © 2024 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.


Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, scanning, or otherwise, except as permitted under Section 107 or 108 of
the 1976 United States Copyright Act, without either the prior written permission of
the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978)
750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John
Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of
John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries
and may not be used without written permission. All other trademarks are the
property of their respective owners. John Wiley & Sons, Inc. is not associated with
any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used
their best efforts in preparing this book, they make no representations or warranties
with respect to the accuracy or completeness of the contents of this book and
specifically disclaim any implied warranties of merchantability or fitness for a
particular purpose. No warranty may be created or extended by sales representatives
or written sales materials. The advice and strategies contained herein may not be
suitable for your situation. You should consult with a professional where
appropriate. Further, readers should be aware that websites listed in this work may
have changed or disappeared between when this work was written and when it is
read. Neither the publisher nor authors shall be liable for any loss of profit or any
other commercial damages, including but not limited to special, incidental,
consequential, or other damages.

For general information on our other products and services or for technical support,
please contact our Customer Care Department within the United States at (800)
762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that
appears in print may not be available in electronic formats. For more information
about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data Applied for:

Hardback ISBN: 9781394213245

Cover Design: Wiley

Cover Image: © Andriy Onufriyenko/Getty Images

Preface
Two questions come along with every new text that aims to
teach someone something. The first is, Who is it addressed to?
The second is, Why does it have precisely those contents,
organized in that way? For this text, these two questions are
perhaps even more relevant than usual, because for both
the answer is unconventional (or at least not entirely
conventional), and to some it may seem surprising. It shouldn't
be; or better still, I hope the answers will make the surprise a
pleasant one.

Let's start with the first question: Who is the target of a text that
introduces the fundamentals of two programming languages, R
and Python, for the discipline called data science? Those who
study to become data scientists, computer scientists, or
computer engineers; that seems obvious, right? Actually, it is not
so. Certainly, future data scientists, computer scientists, and
computer engineers could find this text useful. However, the
real audience should be others, simply all the others: the
non-specialists, those who do not work or study to make IT or data
science their main profession. Those who are studying to become, or
already are, sociologists, political scientists, economists,
psychologists, marketing or human resource management
experts, and those aiming for a career in business
management or in managing global supply chains and
distribution networks. Also, those studying to be biologists,
chemists, geologists, climatologists, or even physicians. Then
there are law students, human rights activists, experts in
traditional and social media, memes and social networks,
linguists, archaeologists, and paleontologists (I'm not joking;
there really are fabulous data science works applied to
linguistics, archaeology, and paleontology). Certainly, in this
roundup, I have forgotten many who deserved to be mentioned
like the others. Don't feel left out. I forgot the artists! There are
cross-pollinations between art, data science, and data
visualization of incredible interest. Art absorbs and
re-elaborates, and in a certain way, this is also what data science
does: it absorbs and re-elaborates. Finally, there are also all
those who just don't know yet what they want to be; they will
figure it out along the way, and having certain tools can come in
handy in many cases.

Everyone can successfully learn the fundamentals of data
science and the use of these computational tools, even with
just a few basic computer skills, with some effort and time, of course,
necessary but reasonable. Everyone could find opportunities
for application in all, or almost all, existing professions,
sciences, humanities, and cultural fields. And above all, without
the need to take on the role of computer scientist or data
scientist when you already have other roles to take on, which
rightly demand time and dedication.

Therefore, not considering computer scientists and
data scientists as the principal audience of this book is not meant to
diminish their role for nonexistent reasons; it is simply that
they need no explanation of why a book that presents
programming languages for data science has, at least in theory,
something to do with what they typically do.

It is to the much wider audience of non-specialists that the
exhortation to learn the fundamentals of data science should be
addressed, explaining that they do not have to transform
themselves into computer scientists to do so (or, even
worse, into geeks), which, for excellent reasons that are
difficult to dispute, they have no intention of doing. It doesn't matter if
they have always been convinced they are “unfit for computer
stuff,” or that, frankly, the rhetoric of the past twenty years about
“digital natives,” “being a coder,” or “joining the digital
revolution” sounds just annoying. None of this should matter;
it is time to move on. How? Everyone should look at which digital
skills and technologies would be useful for their own discipline
and train for those goals. Do you want to be a
computer scientist or a data scientist? Well, do it; there is no
shortage of possibilities. Do you want to be an economist, a
biologist, or a marketing expert? Very well, do it, but you should
not be cut off from adequate training in the digital methodologies
and tools you will benefit from, just as you are not
cut off from legal, statistical, historical, or sociological training
if that knowledge is part of the skills needed for your profession
or education. What is the usual objection? No
one can know everything, and generalists end up knowing a
little of everything and nothing adequately. It's as true as clichés
are, but that's not what we're talking about. A doctor who
acquires statistical or legal training is no less a doctor for it;
on the contrary, in many cases she/he is able to practice
medicine better. No one reproaches an
economist who becomes an expert in statistical analysis for not
having taken a degree in statistics. And soon
(indeed, already now), no one, hopefully, will reproach the same
economist who becomes an expert in machine learning techniques
for classification problems in fintech projects, saying that
as an economist she/he should leave those skills to
computer scientists. Like it or not, computer skills are spreading
and will spread more and more among non-computer scientists;
it's a matter of base rate, notoriously easy to misinterpret,
as all students who have taken an introductory course in
statistics know.
Let's consider the second question: Why does this text present two
languages instead of just one, as is usually done? Isn't it
enough to learn just one? Which is better? A friend of mine told
me he's heard that Python is famous, but he has never heard of
the other one. Come on, seriously, two? It's a miracle if I learn half of
just one! Stop. That's enough.

It's not a competition or a beauty contest between
programming languages, nor a question of cheering,
as with sports teams, where you have to choose one (or none),
but you can't root for two. R and Python are tools, in
some ways complex but not necessarily complicated, professional
but also within anyone's reach. Above all, they are the result of
the continuous work of many people; they are evolving objects
and extraordinary teaching aids for those who want to
learn. Speaking of evolution, a recent and interesting development is the
increasingly frequent convergence between the two languages
presented in this text. Convergence means the possibility of
coordinated, alternating, and complementary use: combining
the benefits of both, exploiting what is innovative in one and what
the other has, and above all, and this is the real didactic value, learning
not to be afraid to change technology, because much of what
you learned with one will be found and will be useful with the
other. There is another, more specific reason. It is
true that Python is so famous that almost everyone has heard
its name, while relatively few know R; however,
practically everyone involved in data science knows R, and most
of them use it, for a pretty simple reason: It's a great
tool with a large community of people who have been
contributing new features for many years. What about Python?
Python is used by millions of people, mainly to build web
services, so it has enormous application possibilities. A part of
the Python world has specialized in data science and is growing rapidly,
taking advantage of how easily it extends to dynamic and
web-oriented applications. One last piece of information: Learning
your first programming language can look difficult. The
learning curve (roughly, how fast you learn) is steep at first: you
struggle at the very beginning, but after a while it softens, and
you run. That is for the first one. Is there the same ramp to climb with
the second one? Not at all. Attempting an estimate, I would say
that just one-third of the effort is needed to learn the second, a
bargain that probably few are aware of. Therefore, let's do both
of them.

One last comment, because one might well think that this
discussion is only valid in theory and that putting it into practice is quite
another thing. Over the years I have required hundreds of
social science students to learn the fundamentals of both R and
Python for data science, and I can tell you that
most of them struggled initially, and some complained more or less
aloud that they were unfit; then they learned very quickly and
ended up demonstrating that it was possible for them to
acquire excellent computational skills without having to
transform into computer scientists or data scientists (to tell the
truth, some did become one, but that's fine too),
without being digital-native geniuses (who don't exist anyway), and without
having to be anything other than what they study for: future
experts in social sciences, management, human resources, or
economics. And what is true for them is certainly true for
everyone. This is the pleasant surprise.

Milan, Italy, 2023
Marco Cremonini

About the Companion Website
This book is accompanied by a student companion website:

www.wiley.com/go/DSFRPythonOpenData

The student website includes:

Multiple-choice questions (MCQs)
Software

Introduction
This text introduces the fundamentals of data science using two
main programming languages and open-source technologies: R
and Python. These are accompanied by their respective
application contexts, formed by tools that support coding scripts,
i.e. logical sequences of instructions aimed at producing
certain results or functionalities. The tools can be of the
command line interface (CLI) type, which are consoles to be
used with textual commands, or of the integrated development
environment (IDE) type, which are interactive environments supporting the
use of the languages. Other elements that make up the application
context are the supplementary libraries, which contain functions
beyond the basic ones that come with the language; package
managers, which automate the download and installation of new
libraries; online documentation, cheat sheets, and tutorials; and
online discussion and help forums for users. This context, formed
by a language, tools, additional features, discussions between
users, and online documentation produced by developers, is
what we mean when we say "R" and "Python," not the bare
programming language, which by itself would be very little.
It is like talking only about the engine when instead you want to
explain how to drive a car on busy roads.
R and Python, together and in the sense just described,
represent the knowledge needed to start approaching data science:
carry out the first simple steps, complete the educational
examples, get acquainted with real data, consider more
advanced features, familiarize yourself with other real data,
experiment with particular cases, analyze the logic behind
mechanisms, gain experience with more complex real data,
analyze online discussions of exceptional cases, look for data
sources in the world of open data, think about the results to be
obtained, combine ever more data sources,
familiarize yourself with different data formats, with large
datasets, and with datasets that will drive you crazy before
you obtain a workable version, and finally be ready to move on to
other technologies, other applications, uses, types of results, and
projects of ever-increasing complexity. This is the journey that
starts here, and as discussed in the preface, it is within the
reach of anyone who puts some effort and time into it. A single
book, of course, cannot contain everything, but it can help you
start, proceed in the right direction, and accompany you for a while.

With this text, we will start from the elementary steps and gain
speed quickly. We will use simplified teaching examples, but
also immediately familiarize ourselves with the kind of data
that exists in reality, rather than in the unreality of teaching
examples. We will finish by addressing some elaborate