CRASH COURSE DATA SCIENCE
(BEGINNER LEVEL)
DATA COLLECTION
1) Data collection is the process of gathering relevant information from various sources to analyze and derive insights.

2) In data science, the quality of collected data directly impacts the accuracy of the resulting analysis and models.

3) A well-defined sampling strategy ensures that collected data is representative of the larger population.

4) Surveys, interviews, and questionnaires are common methods for collecting primary data directly from individuals.

5) Web scraping involves extracting information from websites and is often used to collect data from online sources (see the sketch after this list).

6) Sensor networks and Internet of Things (IoT) devices contribute to the collection of real-time data in various applications.

7) Secondary data refers to data collected by someone else for a different purpose but can still be useful for analysis.

8) The bias present in collected data can lead to skewed insights and inaccurate conclusions.

9) Data curation involves organizing, cleaning, and preparing collected data for analysis.

10) The process of data collection should follow ethical guidelines to ensure privacy and respect for individuals' rights.
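
A minimal web-scraping sketch in Python, assuming a hypothetical page and HTML markup (the requests and beautifulsoup4 packages are real, but the URL and the CSS class below are invented for illustration):

import requests
from bs4 import BeautifulSoup

url = "https://example.com/quotes"  # hypothetical page, for illustration only
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early if the request failed

soup = BeautifulSoup(response.text, "html.parser")

# Collect the text of every <p class="quote"> element (assumed markup)
quotes = [p.get_text(strip=True) for p in soup.find_all("p", class_="quote")]
print(quotes)
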
DESCRIPTIVE STATISTICS
1) Descriptive statistics summarize and describe the main features of a dataset (a short Python sketch after this list illustrates the common measures).

2) Descriptive statistics can be used to summarize both categorical and numerical variables.

3) Range is a measure of dispersion that represents the difference between the maximum and minimum values in a dataset.

4) The range is not a measure of central tendency; the median is the measure that represents the middle value in a dataset.

5) The interquartile range (IQR) is a measure of spread that represents the range between the first quartile (Q1) and the third quartile (Q3).

6) The mode is the value that occurs most frequently in a dataset.

7) The median is less affected by outliers than the mean.

8) The median is less influenced by extreme values in the dataset, making it a more robust measure of central tendency compared to the mean.

9) Standard deviation measures the average distance of values from the mean.

10) Standard deviation quantifies the dispersion or spread of data by measuring the average distance between each data point and the mean.

11) Variance is not the square root of the standard deviation; rather, the standard deviation is the square root of the variance.

12) Variance is the square of the standard deviation.

13) Skewness is a measure of the symmetry of a distribution.

14) Skewness indicates the extent to which a distribution is skewed or asymmetrical.

15) Correlation measures the strength and direction of the linear relationship between two numerical variables.
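
A minimal Python sketch of the measures above, using the standard library's statistics module on a small made-up sample:

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

print(statistics.mean(data))      # mean: average of the values
print(statistics.median(data))    # median: middle value, robust to outliers
print(statistics.mode(data))      # mode: most frequent value (here 4)
print(max(data) - min(data))      # range: maximum minus minimum

q1, q2, q3 = statistics.quantiles(data, n=4)
print(q3 - q1)                    # interquartile range (IQR): Q3 - Q1

sd = statistics.stdev(data)       # sample standard deviation
print(sd, sd ** 2)                # variance is the square of the standard deviation

# Pearson correlation between two variables (requires Python 3.10+)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(statistics.correlation(x, y))
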
EXPLORATORY DATA ANALYSIS
1) Exploratory data analysis involves summarizing and visualizing data to gain insights and understand patterns.

2) Exploratory data analysis is typically performed after data cleaning and preprocessing to ensure the data is in a suitable format for analysis.

3) Exploratory data analysis includes identifying outliers (extreme values) and missing values in the dataset, which can impact the validity of the analysis.

4) Descriptive statistics, such as mean, median, and standard deviation, are commonly calculated during exploratory data analysis to summarize the central tendency and dispersion of the data.

5) Exploratory data analysis is a flexible and iterative process.

6) Exploratory data analysis can help detect relationships and correlations between variables, which can provide valuable insights into the dataset.

7) The primary goal of exploratory data analysis is to gain an understanding of the data rather than formal hypothesis testing and statistical inference.

8) Exploratory data analysis can reveal potential data quality issues, such as inconsistent or erroneous values, and identify data anomalies that require further investigation.

9) Graphical techniques, such as histograms, scatter plots, and box plots, are commonly used in exploratory data analysis to visualize the distribution, relationships, and outliers in the data (see the sketch after this list).

10) Exploratory data analysis is an ongoing process that often continues as new questions arise during analysis.
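
A minimal sketch of the graphical techniques in point 9, using pandas and matplotlib on a small made-up DataFrame (the column names and values are illustrative assumptions):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "age": [23, 25, 31, 35, 35, 41, 52, None],
    "income": [28000, 32000, 41000, 49000, 50000, 61000, 90000, 45000],
})

print(df.describe())    # mean, std, and quartiles for numeric columns
print(df.isna().sum())  # count of missing values per column

df["age"].hist()        # histogram: distribution of a single variable
plt.show()

df.plot.scatter(x="age", y="income")  # scatter plot: relationship between two variables
plt.show()

df.boxplot(column="income")  # box plot: spread and potential outliers
plt.show()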


DATA VISUALISATIONS

1) Data visualisation is the presentation of data in a graphical or pictorial format.

2) Bar charts, line charts, and pie charts are some of the common types of visualisation charts.

3) A line chart is a data visualisation technique suitable for displaying trends over time.

4) A heat map is used to represent the distribution of values with colours.

5) A tree map is used to show hierarchical data using nested rectangles.

6) A box plot is used to show the distribution of data.

7) A choropleth map is used to represent geographic data with colour variations.

8) The points on a scatter plot show the relationship between two variables.

9) In a bar chart, the y-axis shows the dependent variable while the x-axis shows the independent variable.

10) Python is among the most commonly used programming languages for creating interactive data visualisations (a short example follows this list).
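
A short matplotlib example of the bar and line charts from points 2, 3, and 9, with made-up categories and values (interactive libraries such as Plotly follow the same idea):

import matplotlib.pyplot as plt

categories = ["A", "B", "C"]        # made-up data
sales = [120, 95, 150]
months = [1, 2, 3, 4, 5]
revenue = [10, 12, 9, 14, 16]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.bar(categories, sales)   # bar chart: compare categories
ax1.set_xlabel("category")   # independent variable on the x-axis
ax1.set_ylabel("sales")      # dependent variable on the y-axis

ax2.plot(months, revenue)    # line chart: trend over time
ax2.set_xlabel("month")
ax2.set_ylabel("revenue")

plt.tight_layout()
plt.show()
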
DATA CLEANING
1) Imputation techniques are used to fill in missing values.

2) Outlier detection is used to identify and handle unusual data points.

3) Standardization is used to bring all variables to a common scale.

4) Deduplication is used to identify and handle duplicate records.

5) Regular expressions are used for pattern matching and extraction.

6) One-hot encoding is used for handling categorical variables.

7) Scaling is used to re-scale numerical variables.

8) Trimming is used to remove unnecessary white spaces.

9) Mean imputation: replacing missing values with the mean of the variable.

10) Forward filling: filling missing values with the value before them.

11) Interpolation: estimating missing values based on the adjacent values.

12) Deleting rows: removing rows with missing values (a pandas sketch of several of these techniques follows this list).
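
A minimal pandas sketch of several of the techniques above (trimming, deduplication, mean imputation, forward filling, one-hot encoding); the DataFrame and its column names are illustrative assumptions:

import pandas as pd

df = pd.DataFrame({
    "name": ["  Ana ", "Bo", "Bo", "Cy"],
    "city": ["Paris", "Lahore", "Lahore", None],
    "score": [10.0, None, None, 14.0],
})

df["name"] = df["name"].str.strip()                   # trimming white space
df = df.drop_duplicates()                             # deduplication
df["score"] = df["score"].fillna(df["score"].mean())  # mean imputation
# Interpolation alternative: df["score"] = df["score"].interpolate()
df["city"] = df["city"].ffill()                       # forward filling
df = pd.get_dummies(df, columns=["city"])             # one-hot encoding

print(df)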


MACHINE LEARNING
1) The two main categories of machine learning models are supervised and unsupervised.

2) Labeled data in supervised learning provides correct answers for training the model to learn relationships between input features and output labels.

3) Precision is the ratio of correctly predicted positive observations to the total predicted positives, while recall is the ratio of correctly predicted positive observations to the total actual positives.

4) Accuracy might not be suitable for imbalanced datasets because it can be dominated by the majority class and may not reflect the true model performance.

5) Cross-validation assesses a machine learning model's performance by dividing the dataset into subsets, training/evaluating the model on different combinations, and providing insights into its generalization capability (a scikit-learn sketch follows this list).
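
A minimal scikit-learn sketch of points 2, 3, and 5, trained on a synthetic dataset (an illustrative assumption, not a benchmark):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split

# Labeled data: X holds the input features, y holds the output labels
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

print("precision:", precision_score(y_test, pred))  # TP / (TP + FP)
print("recall:", recall_score(y_test, pred))        # TP / (TP + FN)

# 5-fold cross-validation: train/evaluate on different subset combinations
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("cv accuracy per fold:", scores)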
