Data Analytics
From Data to Decisions
Introduction to Data Analytics
• What is data? Data refers to raw facts and
figures collected from various sources.
• What is data analytics? Data analytics is
the process of examining data to find useful
information, patterns, or trends that help us
make better decisions.
• It involves:
1. Collecting the right data
2. Processing and organizing it
3. Analyzing it using statistical or
computational techniques
4. Visualizing results using charts, graphs,
dashboards, etc.
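The four steps above can be sketched in a few lines of pandas. This is only an illustration: the daily sales figures and the column names `day` and `cups_sold` are made up, and filling the gap with the column mean is just one possible processing choice.

```python
import pandas as pd

# 1. Collect: hypothetical daily sales records (made-up numbers)
raw = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed", "Thu", "Fri"],
    "cups_sold": [120, 95, None, 110, 150],
})

# 2. Process: fill the missing value with the mean of the known values
clean = raw.copy()
clean["cups_sold"] = clean["cups_sold"].fillna(clean["cups_sold"].mean())

# 3. Analyze: simple summary statistics
average = clean["cups_sold"].mean()
best_day = clean.loc[clean["cups_sold"].idxmax(), "day"]
print("Average cups per day:", average)
print("Best day:", best_day)

# 4. Visualize: omitted to keep the sketch small; with Matplotlib installed,
#    clean.plot(x="day", y="cups_sold", kind="bar") would draw a bar chart.
```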
Types of Data Analytics
The Tale of BeanBrew Café
How data saved the coffee empire
Once upon a time in a bustling city, a cozy little coffee shop called
BeanBrew Café was famous for its rich espresso and warm
pastries. For years, it was the go-to spot for students, professionals,
and tourists alike. But in early 2024, the owner, Maya, noticed
something troubling—sales were slipping.
import numpy as np
# 1D array
array_1d = np.array([10, 20, 30])
print("1D Array:", array_1d)
# 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print("2D Array:", array_2d)
print("Dimensions:", array_2d.ndim)
print("Shape:", array_2d.shape)
print("Data Type:", array_2d.dtype)
Indexing and Slicing arrays
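# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])
print(arr[0])     # first element
print(arr[0:3])   # elements at positions 0, 1 and 2
# Create a 2D array
arr2d = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(arr2d[0, 1])     # row 0, column 1
print(arr2d[0])        # the whole first row
print(arr2d[:, 0])     # the whole first column
print(arr2d[1:, 1:])   # rows 1 onward, columns 1 onward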
Flatten, Reshape and Transpose
arr = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print("Flattened:", arr.flatten())
print("Reshaped:", arr.reshape(3, 2))
print("Transposed:", arr.transpose())
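Reshape and transpose can both turn a 2x3 array into a 3x2 array, but they arrange the values differently. The sketch below reuses the same small array to show the difference, and also shows that flatten() hands back a copy. The variable names are chosen for illustration.

```python
import numpy as np

arr = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

flat = arr.flatten()          # 1D copy of the data, read row by row
reshaped = arr.reshape(3, 2)  # same element order, regrouped into 3 rows of 2
transposed = arr.T            # rows become columns (shorthand for arr.transpose())

print("Flattened:", flat)
print("Reshaped:\n", reshaped)
print("Transposed:\n", transposed)

# flatten() returns a copy, so editing it leaves the original array unchanged
flat[0] = 99
print("Original after editing the copy:", arr[0, 0])
```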
Pandas
• Pandas is an open-source Python library that
provides powerful and easy-to-use data
structures for data analysis and
manipulation.
• At the core of Pandas are two primary data
structures:
1. Series: A one-dimensional labeled array.
2. DataFrame: A two-dimensional, tabular
data structure with labeled rows and
columns.
Creating Series and DataFrames
import pandas as pd
# Series
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print(series)
# DataFrame
data = {
    'Name': ['A', 'B', 'C'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
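To make the word "labeled" concrete, here is a small sketch of a Series with explicit labels, plus a check that a single DataFrame column is itself a Series. The label values and variable names are illustrative choices, not part of the original example.

```python
import pandas as pd

# A Series with explicit labels instead of the default 0, 1, 2, ...
ages = pd.Series([25, 30, 35], index=["A", "B", "C"])
print(ages["B"])   # look up a value by its label

# Each DataFrame column is a Series that shares the DataFrame's row labels
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [25, 30, 35],
})
age_column = df["Age"]
print(type(age_column).__name__)
print(age_column.index.tolist())
```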
Summary methods
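data = {
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Age': [25, 30, 35, 40, 22, 28],
    'Salary': [50000, 60000, 70000, 80000, 45000, 52000]
}
df = pd.DataFrame(data)
print(df.head())      # first five rows
print(df.tail())      # last five rows
df.info()             # prints column types and non-null counts itself
print(df.describe())  # summary statistics for the numeric columns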
loc[ ] and iloc[ ]
data = {
'Name': ['John', 'Emma', 'Liam'],
'Age': [28, 24, 31]
}
df = pd.DataFrame(data)
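print(df.loc[0, 'Name'])   # row labeled 0, column 'Name'
print(df.loc[:, 'Age'])    # all rows, column 'Age'
print(df.iloc[0, 0])       # first row, first column (by position)
print(df.iloc[0:2, 0])     # first two rows of the first column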
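With the default row labels 0, 1, 2, loc and iloc can look interchangeable. The sketch below swaps in text labels (a hypothetical choice for illustration) so the difference shows: loc selects by label and includes the end of a slice, while iloc selects by position and excludes it.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["John", "Emma", "Liam"], "Age": [28, 24, 31]},
    index=["a", "b", "c"],   # hypothetical text labels
)

by_label = df.loc["a":"b", "Name"]   # label slice: includes both "a" and "b"
by_position = df.iloc[0:1, 0]        # position slice: stops before position 1

print(by_label.tolist())
print(by_position.tolist())
```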
Handling missing data
data = {
'Name': ['John', 'Emma', None, 'Liam'],
'Age': [28, None, 22, 31],
'City': ['New York', 'Los Angeles', 'Chicago', None]
}
df = pd.DataFrame(data)
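print(df.isnull())   # True wherever a value is missing
print(df.dropna())   # drop every row that has at least one missing value
# Replace missing names with a placeholder
df['Name'] = df['Name'].fillna('Unknown')
# Replace missing ages with the mean of the known ages
mean_age = df['Age'].mean()
df['Age'] = df['Age'].fillna(mean_age)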
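One way to confirm what the cleaning steps did is to count the missing values per column before and after. The sketch below repeats the same data; the names `missing_before`, `complete_rows`, and `missing_after` are illustrative.

```python
import pandas as pd

data = {
    'Name': ['John', 'Emma', None, 'Liam'],
    'Age': [28, None, 22, 31],
    'City': ['New York', 'Los Angeles', 'Chicago', None]
}
df = pd.DataFrame(data)

missing_before = df.isnull().sum()   # missing values per column
complete_rows = df.dropna()          # new DataFrame; df itself is unchanged

df['Name'] = df['Name'].fillna('Unknown')
df['Age'] = df['Age'].fillna(df['Age'].mean())
missing_after = df.isnull().sum()    # 'City' was never filled, so it still has a gap

print(missing_before)
print(len(complete_rows))
print(missing_after)
```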
Matplotlib
• Matplotlib is an open-source Python library
used for creating a variety of charts and
graphs.
• At the core of Matplotlib are two objects:
1. Figure: The overall window or page that
holds the plot(s).
2. Axes: The individual plot or graph within
the figure where data is visualized.
• Matplotlib makes it easy to generate common
plots such as line graphs, scatter plots,
Creating a plot
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 25, 30])
plt.plot(x, y)
plt.show()
Subplots
x = np.array([1, 2, 3, 4])
y1 = np.array([10, 20, 30, 40])
y2 = np.array([40, 30, 20, 10])
y3 = np.array([5, 15, 10, 25])
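# One figure with three side-by-side axes (the 1x3 layout is an assumption)
fig, axs = plt.subplots(1, 3)
# First subplot
axs[0].plot(x, y1)
# Second subplot
axs[1].plot(x, y2)
# Third subplot
axs[2].plot(x, y3)
plt.show()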
Seaborn
• Seaborn is an open-source Python library
built on top of Matplotlib.
• It is designed for creating attractive and
informative statistical graphics with ease.
• It provides a high-level interface for drawing
visually appealing and complex plots using
fewer lines of code.
• Seaborn makes it easy to visualize
relationships in data, explore patterns, and
enhance plots created with Matplotlib.
Creating a simple plot
import seaborn as sns
import matplotlib.pyplot as plt
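tips = sns.load_dataset("tips")
# Scatter plot of tip against total bill (one possible choice of plot)
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()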
SciPy
• SciPy stands for Scientific Python. It's an
open-source Python library used for scientific
and technical computing.
• It's built on top of NumPy and provides
additional functionality.
• Think of SciPy as a powerful extension of
NumPy — NumPy gives you arrays and
basic math, SciPy gives you advanced tools
to analyze and solve complex problems.
Simple hypothesis testing
from scipy.stats import ttest_rel
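A paired t-test with ttest_rel compares two measurements taken on the same subjects. The sketch below uses made-up numbers (five hypothetical stores measured before and after a change) and an assumed significance level of 0.05. It is an illustration, not a result from real data.

```python
from scipy.stats import ttest_rel

# Hypothetical paired measurements for the same five stores (made-up numbers)
before = [200, 220, 250, 210, 230]
after = [210, 235, 255, 225, 240]

# Null hypothesis: the mean difference between the pairs is zero
result = ttest_rel(before, after)
print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)

alpha = 0.05  # assumed significance level
if result.pvalue < alpha:
    print("Reject the null hypothesis: the paired means differ.")
else:
    print("Not enough evidence to reject the null hypothesis.")
```

The sign of the t-statistic follows the argument order: it is negative here because the values in `before` are smaller than their partners in `after`.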