WORKING WITH GRAPHS
UNIT 4
DATA WRANGLING
Data wrangling in Python refers to the process of cleaning,
transforming, and preparing raw or messy data for analysis,
visualization, or machine learning tasks using the Python
programming language. It involves a series of operations to
make the data more structured, complete, and suitable for the
intended analysis. Python provides various libraries and tools
for efficiently performing data wrangling tasks. Here are some
common steps and techniques involved in data wrangling in
Python:
1. Data Loading: Load the raw data into Python using libraries like Pandas (for structured data such as CSV,
Excel, JSON, or database tables) or NumPy (for numerical arrays).
import pandas as pd
# Load data from a CSV file
df = pd.read_csv('data.csv')
2. Data Exploration: Get a preliminary understanding of the data by examining its structure, summary statistics,
and identifying missing values.
# Display the first few rows of the DataFrame
print(df.head())
# Get basic summary statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
3. Data Cleaning:
Handle missing values by imputing them or dropping rows/columns with missing data.
Remove duplicates.
Correct data errors and inconsistencies.
# Drop rows with missing values
df = df.dropna()
# Remove duplicates
df = df.drop_duplicates()
# Correct data errors (correct_function stands for your own cleaning function)
df['column_name'] = df['column_name'].apply(correct_function)
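The cleaning step above relies on a correct_function that the slide leaves undefined, and it only shows dropping rows rather than imputing. A minimal sketch of both ideas follows. The column names 'city' and 'age' and all of the values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: column names and values are made up for illustration
df = pd.DataFrame({
    'city': [' nagpur', 'Kanpur ', 'KANPUR', None],
    'age': [17.0, None, 12.0, 52.0],
})

def correct_function(value):
    """Normalize a city name: trim spaces and use title case."""
    if pd.isna(value):
        return value  # leave missing values untouched
    return value.strip().title()

# Impute instead of dropping: fill the missing age with the column median
df['age'] = df['age'].fillna(df['age'].median())

# Apply the correction to every value in the column
df['city'] = df['city'].apply(correct_function)

print(df)
```

Imputing with the median keeps all four rows, whereas dropna() would have discarded two of them.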
4. Data Transformation:
Convert data types.
Normalize or scale numerical data.
Encode categorical variables.
Create new features or variables.
# Convert data types
df['numeric_column'] = df['numeric_column'].astype(float)
# Normalize numerical data
df['numeric_column'] = (df['numeric_column'] - df['numeric_column'].mean()) / df['numeric_column'].std()
# Encode categorical variables
df = pd.get_dummies(df, columns=['categorical_column'])
# Create new features
df['new_feature'] = df['feature1'] * df['feature2']
5. Data Aggregation and Grouping:
Aggregate data by grouping based on certain attributes.
Calculate summary statistics for groups.
# Group by a categorical variable and calculate the mean
grouped_data = df.groupby('category_column')['numeric_column'].mean()
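The grouping step can return several summary statistics at once through agg(). The sketch below uses a small made-up DataFrame; only the column names are taken from the slide.

```python
import pandas as pd

# Made-up sample data using the slide's column names
df = pd.DataFrame({
    'category_column': ['a', 'a', 'b', 'b', 'b'],
    'numeric_column': [10, 20, 1, 2, 3],
})

# Several summary statistics per group in one call
summary = df.groupby('category_column')['numeric_column'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```

The result is a DataFrame indexed by category, with one column per statistic.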
6. Data Visualization:
Use libraries like Matplotlib or Seaborn to visualize the data, detect patterns, and gain insights.
import matplotlib.pyplot as plt
# Create a histogram
plt.hist(df['numeric_column'])
plt.xlabel('Numeric Column')
plt.ylabel('Frequency')
plt.show()
7. Data Export:
Save the cleaned and transformed data to a new file if necessary.
# Export cleaned data to a CSV file
df.to_csv('cleaned_data.csv', index=False)
COMBINING AND MERGING DATA SETS IN PYTHON
Combining and merging data sets in Python is a common operation in data analysis and manipulation. You can
achieve this using various libraries, with the most popular one being pandas. Here, I'll provide an overview of
how to combine and merge data sets using pandas.
Combining Data Sets
1. Concatenation:
Concatenation is used to combine data frames either row-wise or column-wise.
a. Row-wise concatenation:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})
result = pd.concat([df1, df2], axis=0) # Concatenate along rows (axis=0)
b. Column-wise concatenation:
result = pd.concat([df1, df2], axis=1) # Concatenate along columns (axis=1)
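One detail worth seeing in action is what happens to the index during concatenation. The sketch below reuses df1 and df2 from the slide; the variable names stacked, renumbered, and side_by_side are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

# Row-wise: the original index labels are kept, so they repeat
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())

# ignore_index=True assigns a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())

# Column-wise: rows are aligned on the index and the columns are placed side by side
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)
```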
2. Appending:
Appending adds the rows of one DataFrame to the end of another. DataFrame.append() was removed in pandas 2.0,
so use pd.concat() instead.
result = pd.concat([df1, df2])
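Ex. 2
import pandas as pd
# Define dictionaries containing employee data
# (the slide omits data1; the values below are placeholders for illustration only)
data1 = {'Name':['Name1', 'Name2', 'Name3', 'Name4'],
'Age':[21, 22, 23, 24],
'Address':['City1', 'City2', 'City3', 'City4'],
'Qualification':['Q1', 'Q2', 'Q3', 'Q4']}
data2 = {'Name':['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
'Age':[17, 14, 12, 52],
'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}
# Convert the first dictionary into a DataFrame
df = pd.DataFrame(data1, index=[0, 1, 2, 3])
# Convert the second dictionary into a DataFrame
df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])
print(df, "\n\n", df1)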
# using a .concat() method
frames = [df, df1]
res1 = pd.concat(frames)
print(res1)
Merging Data Sets
Merging is used to combine data frames based on common columns or indices.
1. Inner Join:
result = pd.merge(df1, df2, on='key_column', how='inner')
2. Left Join:
result = pd.merge(df1, df2, on='key_column', how='left')
3. Right Join:
result = pd.merge(df1, df2, on='key_column', how='right')
4. Outer Join:
result = pd.merge(df1, df2, on='key_column', how='outer')
5. Merging on Multiple Columns:
You can merge on multiple columns by passing a list of column names to the on parameter.
result = pd.merge(df1, df2, on=['key_column1', 'key_column2'], how='inner')
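The four join types are easiest to compare on a small example. The sketch below uses two hypothetical tables, left and right, that share only some keys. The table names, the value columns, and the data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical tables sharing a 'key_column'; only k2 and k3 appear in both
left = pd.DataFrame({'key_column': ['k1', 'k2', 'k3'], 'left_value': [1, 2, 3]})
right = pd.DataFrame({'key_column': ['k2', 'k3', 'k4'], 'right_value': [20, 30, 40]})

# Print the row count and the keys kept by each join type
for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(left, right, on='key_column', how=how)
    print(how, len(result), result['key_column'].tolist())

inner = pd.merge(left, right, on='key_column', how='inner')
outer = pd.merge(left, right, on='key_column', how='outer')
```

The inner join keeps only the keys present in both tables, while the outer join keeps every key and fills the missing side with NaN.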
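# join='inner' keeps only the index labels present in both DataFrames
res2 = pd.concat([df, df1], axis=1, join='inner')
print(res2)
# join='outer' keeps all index labels and fills the gaps with NaN
res2 = pd.concat([df, df1], axis=1, join='outer')
print(res2)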
DATA TRANSFORMATION
Data transformation is the process of converting raw data into a format that is more suitable for analysis,
modeling, or machine learning. It is an essential step in any data science project, and Python is a popular
programming language for data transformation.
There are many different types of data transformation, but some common examples include:
Cleaning and preprocessing: This involves removing errors and inconsistencies from the data, as well as
converting the data to a consistent format.
Feature engineering: This involves creating new features from the existing data, or transforming existing
features in a way that is more informative for the task at hand.
Encoding categorical data: Categorical data, such as text or labels, needs to be converted to numerical data
before it can be used by many machine learning algorithms.
Scaling and normalization: This involves transforming the data so that all features are on a similar scale, which
can improve the performance of machine learning algorithms.
There are a number of different Python libraries that can be used for data transformation, but the most
popular one is Pandas. Pandas is a powerful library for data manipulation and analysis, and it provides a wide
range of functions for data transformation.
import pandas as pd
# Load the data
df = pd.read_csv('data.csv')
# Clean the data
df = df.dropna() # Drop rows with missing values
df['age'] = df['age'].astype('int') # Convert the 'age' column to integers
# Create a new feature
df['age_group'] = df['age'].apply(lambda x: 'young' if x < 18 else 'adult')
# Encode categorical data
df['gender'] = df['gender'].map({'male': 1, 'female': 0})
# Scale the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['height', 'weight']] = scaler.fit_transform(df[['height', 'weight']])
# Save the transformed data
df.to_csv('transformed_data.csv', index=False)
1. Data Cleaning:
Data cleaning involves handling missing values, removing duplicates, and correcting errors in your dataset.
Handling Missing Values:
Pandas provides functions like dropna(), fillna(), and interpolate() to handle missing values.
import pandas as pd
# Each call below returns a new DataFrame; assign the result to keep it
# Remove rows with missing values
# Syntax: dataframe.dropna(axis, how, thresh, subset, inplace)
df.dropna()
# Fill missing values with a specific value
# Syntax: dataframe.fillna(value, method, axis, inplace, limit, downcast)
df.fillna(0)
# Interpolate missing values
# Syntax: dataframe.interpolate(method, axis, inplace, limit, limit_direction, limit_area, downcast, kwargs)
df.interpolate()
Removing Duplicates:
df.drop_duplicates()
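To make the difference between the three missing-value strategies concrete, the sketch below applies each one to the same small Series and then removes a duplicate row. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up Series with two gaps
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()            # keeps only the three known values
filled = s.fillna(0)            # gaps become 0
interpolated = s.interpolate()  # linear by default, so the gaps become 2.0 and 4.0

# Made-up DataFrame whose first two rows are identical
df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'x', 'y']})
deduplicated = df.drop_duplicates()  # the repeated (1, 'x') row is dropped

print(dropped.tolist(), filled.tolist(), interpolated.tolist(), len(deduplicated))
```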
2. Data Filtering:
Filtering allows you to select a subset of data based on certain conditions.
# Filter rows where a condition is met
filtered_df = df[df['column_name'] > 10]
3. Data Aggregation:
Aggregation involves summarizing data by grouping it based on certain criteria.
# Group by a column and calculate aggregate statistics
grouped_df = df.groupby('category_column')['numeric_column'].mean()
4. Data Transformation:
Data transformation includes operations like converting data types, scaling values, or applying mathematical
functions.
# Convert data types
df['numeric_column'] = df['numeric_column'].astype(float)
# Scaling values (e.g., Min-Max scaling)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df['scaled_column'] = scaler.fit_transform(df[['numeric_column']])
5. One-Hot Encoding:
Convert categorical variables into a numerical format using one-hot encoding.
encoded_df = pd.get_dummies(df, columns=['categorical_column'])
6. Reshaping Data:
Reshaping data includes tasks like pivoting, melting, or stacking/unstacking for better analysis.
# Pivot a DataFrame
pivoted_df = df.pivot(index='row_column', columns='column_column', values='value_column')
# Melt a DataFrame
melted_df = pd.melt(df, id_vars=['id_column'], value_vars=['var1', 'var2'], var_name='variable',
value_name='value')
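Pivoting and melting are inverse operations, which a round trip on a small table makes visible. The sketch below is only an illustration: the data is made up, and the column names follow the pivot example above.

```python
import pandas as pd

# Made-up long-format data: one row per (row, column) pair
long_df = pd.DataFrame({
    'row_column': ['r1', 'r1', 'r2', 'r2'],
    'column_column': ['c1', 'c2', 'c1', 'c2'],
    'value_column': [1, 2, 3, 4],
})

# Long to wide: one row per 'row_column', one column per 'column_column' value
wide = long_df.pivot(index='row_column', columns='column_column', values='value_column')
print(wide)

# Wide back to long: melt the value columns into (variable, value) pairs
back_to_long = pd.melt(wide.reset_index(), id_vars=['row_column'],
                       value_vars=['c1', 'c2'],
                       var_name='column_column', value_name='value_column')
print(back_to_long)
```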
7. Text Data Processing:
For text data, you can perform transformations such as tokenization, stemming, and stop-word removal using
libraries like NLTK or spaCy.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('stopwords')
nltk.download('punkt')
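# Tokenization and stop-word removal (the filter below assumes English stop words)
stop_words = set(stopwords.words('english'))
df['text_column'] = df['text_column'].apply(
    lambda x: ' '.join([word for word in word_tokenize(x) if word.lower() not in stop_words]))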
DETECTING AND FILTERING OUTLIERS IN PYTHON
What are Outliers in Python?
Before diving deep into the concept of outliers, let us understand the origin of raw data.
Raw data that is fed to a system is usually generated from surveys and extraction of data from real-time actions
on the web. This may give rise to variations in the data and there exists a chance of measurement error while
recording the data.
An outlier is a data point, or a set of data points, that lies far from the rest of the values in the dataset,
that is, outside the overall distribution of the data.
The methods in this section apply to continuous (numeric) values, so outlier detection and removal are typically
carried out on numeric columns such as regression inputs and targets.
In other words, outliers diverge from the otherwise well-structured distribution of the data and look abnormal
relative to the rest of the class or population.
Why is it necessary to remove outliers from the data?
As discussed above, outliers are data points that lie away from the usual distribution of the data. They have
the following effects:
They distort the standard deviation of the data.
They shift the mean of the data.
They skew the distribution of the data.
They bias the accuracy estimates of a machine learning model.
They affect the distribution and summary statistics of the dataset.
Detection of Outliers
The outliers in the dataset can be detected by the below methods:
• Z-score
• Scatter plots
• Interquartile range (IQR)
1. Visual Inspection:
Start by visualizing your data using histograms, box plots, scatter plots, or other visualization techniques.
Outliers often appear as points far from the main cluster or as values outside the whiskers of box plots.
Visualization can help you identify potential outliers.
import matplotlib.pyplot as plt
import seaborn as sns
# Box plot to visualize outliers
sns.boxplot(x=df['column_name'])
plt.show()
2. Z-Score:
The Z-score measures how far a data point is from the mean in terms of standard deviations. You can use the
Z-score to detect outliers. Typically, data points whose absolute Z-score exceeds a threshold (e.g., 2 or 3) are
considered outliers.
from scipy import stats
z_scores = stats.zscore(df['column_name'])
outliers = df[abs(z_scores) > 2]
filtered_data = df[abs(z_scores) <= 2]
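The same Z-score filter can be written with pandas alone, which also makes the formula explicit. The sketch below uses made-up data. It passes ddof=0 to std() so that the result matches the default of scipy.stats.zscore, which uses the population standard deviation.

```python
import pandas as pd

# Made-up data with one obviously extreme value
df = pd.DataFrame({'column_name': [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]})

col = df['column_name']
# z = (value - mean) / standard deviation, using the population standard deviation
z_scores = (col - col.mean()) / col.std(ddof=0)

outliers = df[z_scores.abs() > 2]
filtered_data = df[z_scores.abs() <= 2]

print(outliers)
```

In a sample this small the extreme value also inflates the standard deviation, so its Z-score stays just under 3. This is one reason the choice of threshold matters.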
3. IQR (Interquartile Range) Method:
The IQR method involves calculating the IQR (the difference between the 75th percentile and the 25th
percentile) and identifying outliers as values outside a specified range.
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]
filtered_data = df[(df['column_name'] >= lower_bound) & (df['column_name'] <= upper_bound)]
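4. Tukey's Fences:
Tukey's fences generalize the IQR method: the fences are placed k * IQR below Q1 and above Q3. With k = 1.5 this
is the IQR rule shown above; the example below uses k = 3, which flags only extreme ("far out") values.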
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
lower_fence = Q1 - 3 * (Q3 - Q1)
upper_fence = Q3 + 3 * (Q3 - Q1)
outliers = df[(df['column_name'] < lower_fence) | (df['column_name'] > upper_fence)]
filtered_data = df[(df['column_name'] >= lower_fence) & (df['column_name'] <= upper_fence)]
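Since the IQR and Tukey's fences snippets differ only in the multiplier, they can share one helper. The function name iqr_filter and the sample data below are assumptions made for illustration, not part of pandas.

```python
import pandas as pd

def iqr_filter(df, column, k=1.5):
    """Split df into (outliers, kept) using fences k * IQR beyond the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inside = df[column].between(lower, upper)  # inclusive on both ends
    return df[~inside], df[inside]

# Made-up data: 30 is a mild outlier, 100 an extreme one
df = pd.DataFrame({'column_name': [10, 11, 12, 13, 14, 15, 16, 17, 18, 30, 100]})

outliers_15, kept_15 = iqr_filter(df, 'column_name', k=1.5)
outliers_3, kept_3 = iqr_filter(df, 'column_name', k=3)

print(outliers_15['column_name'].tolist())
print(outliers_3['column_name'].tolist())
```

With k = 1.5 both 30 and 100 fall outside the fences, while k = 3 flags only 100.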
5. Machine Learning-Based Methods:
You can also use machine learning models, such as Isolation Forest or One-Class SVM, to detect outliers in
your data.
from sklearn.ensemble import IsolationForest
clf = IsolationForest(contamination=0.05) # Adjust contamination based on your dataset
labels = clf.fit_predict(df[['column_name']]) # -1 marks outliers, 1 marks inliers
outliers = df[labels == -1]
filtered_data = df[labels == 1]
STRING MANIPULATION
String manipulation in Python involves performing various operations on strings, such as concatenation,
slicing, searching, replacing, formatting, and more. Python provides a rich set of string manipulation methods
and functions that make it easy to work with text data. Here are some common string manipulation
techniques in Python:
1. String Concatenation:
You can concatenate strings using the + operator or by using the str.join() method.
str1 = "Hello"
str2 = "World"
result = str1 + ", " + str2 # Using the + operator
words = ["Hello", "World"]
result = ", ".join(words) # Using join
2. String Slicing:
String slicing allows you to extract substrings from a string based on their positions.
text = "Python Programming"
substring = text[7:18] # Extract "Programming"
3. String Searching:
You can search for substrings within a string using methods like str.find(), str.index(), or regular expressions with
the re module.
text = "Python is a powerful programming language"
position = text.find("powerful") # Find the position of "powerful"
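The three search approaches mentioned above behave differently when the substring is missing. The sketch below illustrates this; the variable names missing, index_failed, and found, and the regular expression, are chosen here for illustration.

```python
import re

text = "Python is a powerful programming language"

position = text.find("powerful")     # index of the first match
missing = text.find("Java")          # find() returns -1 when nothing matches

index_failed = False
try:
    text.index("Java")               # index() raises ValueError instead of returning -1
except ValueError:
    index_failed = True

match = re.search(r"p\w+ful", text)  # regular-expression search for a pattern
found = match.group() if match else None

print(position, missing, index_failed, found)
```

Use find() when a missing substring is an expected case, and index() when it should be treated as an error.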
4. String Replacement:
Replace specific substrings within a string using the str.replace() method.
text = "Python is a great programming language"
new_text = text.replace("great", "powerful") # Replace "great" with "powerful"
5. String Formatting:
You can format strings using f-strings (Python 3.6+), the .format() method, or the % operator.
name = "Alice"
age = 30
formatted_str = f"My name is {name} and I am {age} years old."
name = "Bob"
age = 25
formatted_str = "My name is {} and I am {} years old.".format(name, age)
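The % operator is mentioned above but not shown, and format specifications are often needed in practice. A short sketch follows; the variable names and values are made up for illustration.

```python
name = "Carol"
age = 41
height = 1.6789

# %-style formatting: %s for strings, %d for integers
percent_style = "My name is %s and I am %d years old." % (name, age)

# Format specifications work in both f-strings and .format()
rounded = f"{height:.2f}"        # two decimal places
padded = "{:>8}".format(name)    # right-aligned in a field of width 8

print(percent_style)
print(rounded)
print(repr(padded))
```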
6. String Splitting:
Split a string into a list of substrings using the str.split() method.
Syntax: string.split(separator, maxsplit)
# 1. Split by comma
text = "Python,Java,C++,JavaScript"
languages = text.split(",") # ['Python', 'Java', 'C++', 'JavaScript']
# 2. Split on whitespace (the default separator)
text = "Python is cool"
print(text.split()) # ['Python', 'is', 'cool']
# 3. Split on a specific character (the separator itself is removed)
text = "abcabc"
print(text.split("c")) # ['ab', 'ab', '']
# 4. Split the contents of a file into lines
with open("sample.txt", "r") as f:
    info = f.read()
print(info.splitlines())
7. String Stripping:
Remove leading and trailing whitespace characters using str.strip() or str.lstrip() and str.rstrip().
text = " Python is awesome! "
cleaned_text = text.strip() # Remove leading and trailing spaces
8. String Case Conversion:
Convert the case of a string using methods like str.lower(), str.upper(), or str.capitalize().
text = "Hello World"
lower_case = text.lower()
upper_case = text.upper()
capitalized = text.capitalize()
Syntax                      Function
string.upper()              Converts all the characters of the string to uppercase.
string.lower()              Converts all the characters of the string to lowercase.
string.title()              Converts the first letter of each word to uppercase and the remaining letters to lowercase.
string.swapcase()           Converts uppercase characters to lowercase and vice versa.
string.capitalize()         Converts the first character of the string to uppercase and the rest to lowercase.
string.isupper()            Returns True if all the alphabetic characters of the string are uppercase.
string.islower()            Returns True if all the alphabetic characters of the string are lowercase.
string.endswith(value)      Returns True if the string ends with the specified value.
string.startswith(value)    Returns True if the string starts with the specified value.
string.index('character')   Returns the position of the first occurrence of the character; raises ValueError if it is not found.
VECTORIZED STRING FUNCTIONS
Vectorized string functions in pandas allow you to efficiently perform operations on string data within a
pandas DataFrame or Series. These functions are accessed through the .str attribute of a pandas Series and
enable you to apply string operations element-wise. Here are some commonly used vectorized string
functions in pandas:
import pandas as pd
series = pd.Series(['Alice', 'Bob', 'Carol'])
series_lowercase = series.str.lower()
print(series_lowercase)
0 alice
1 bob
2 carol
dtype: object
Some of the most commonly used vectorized string functions in Pandas include:
str.lower(): Convert all strings to lowercase.
str.upper(): Convert all strings to uppercase.
str.strip(): Remove whitespace from the beginning and end of all strings.
str.split(): Split all strings into a list of strings, using a specified separator.
str.replace(): Replace all occurrences of a specified substring with another substring in all strings.
str.contains(): Return a boolean Series indicating whether each string contains a specified substring.
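The functions listed above can be chained, and they pass missing values through instead of raising an error. The sketch below is a small illustration with made-up names; the na=False argument of str.contains() makes missing entries count as "no match".

```python
import pandas as pd

# Made-up names with stray whitespace, mixed case, and a missing entry
series = pd.Series(['  Alice Smith ', 'bob jones', None])

cleaned = series.str.strip().str.title()                      # trim, then title-case
parts = cleaned.str.split(' ')                                # list of words per element
has_smith = cleaned.str.contains('Smith', na=False)           # boolean Series
replaced = cleaned.str.replace('Bob', 'Robert', regex=False)  # literal replacement

print(cleaned.tolist())
print(has_smith.tolist())
```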
Vectorized string functions can also be used to perform more complex string operations, such as regular
expression matching and extraction. For example, the following code uses the str.extract() function to extract the
first name from each email address in a Series:
import pandas as pd
series = pd.Series(['alice@example.com',
'bob@example.com', 'carol@example.com'])
first_names = series.str.extract(r'(?P<first_name>\w+)@example\.com')
print(first_names)
Output:
  first_name
0      alice
1        bob
2      carol
PLOTTING AND VISUALIZATION
Plotting and visualization are crucial for understanding and communicating data. In Python, there are several
libraries for creating plots and visualizations, with Matplotlib, Seaborn, and Plotly being some of the most
popular ones. Here's an overview of how to create plots and visualizations in Python:
Matplotlib
Matplotlib is an easy-to-use, low-level data visualization library that is built on NumPy arrays. It supports
many plot types, such as scatter plots, line plots, and histograms, and provides a lot of flexibility.
Installation:
pip install matplotlib
Basic Line Plot:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()
Seaborn:
Seaborn is built on top of Matplotlib and provides a high-level interface for creating informative and attractive
statistical graphics.
Installation:
pip install seaborn
Basic Scatter Plot:
import matplotlib.pyplot as plt
import seaborn as sns
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
sns.scatterplot(x=x, y=y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()
Plotly:
Plotly is a powerful library for creating interactive and web-based visualizations. It is often used for creating
dashboards and web applications.
Installation:
pip install plotly
Basic Bar Chart:
import plotly.express as px
data = {'Category': ['A', 'B', 'C', 'D'],
'Values': [10, 20, 15, 30]}
fig = px.bar(data, x='Category', y='Values', title='Bar Chart')
fig.show()
Other Libraries:
Pandas: Pandas also provides basic plotting functionality through the .plot() method for DataFrames, making it
convenient for quick exploratory data analysis.
Bokeh: Bokeh is another library for interactive web-based visualizations and is well-suited for creating
interactive dashboards.
Altair: Altair is a declarative statistical visualization library for Python, making it easy to create complex
visualizations with concise code.
ggplot (ggpy): ggplot is a Python implementation of the popular ggplot2 package from R, which uses a
grammar of graphics to create plots.
MATPLOTLIB API PRIMER
Matplotlib is a popular Python library for creating static, animated, and interactive visualizations. When
creating plots with Matplotlib, you can customize the appearance of your lines, markers, and line styles. Here's
a primer on how to do that:
Colors:
Matplotlib allows you to specify colors for lines, markers, and other plot elements in several ways:
Named Colors: You can use named colors like 'red', 'blue', 'green', etc.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, color='red', label='Red Line')
RGB Values: You can use RGB tuples to specify colors.
plt.plot(x, y, color=(0.1, 0.2, 0.3), label='Custom Color')
Hexadecimal Colors: You can also use hexadecimal color codes.
plt.plot(x, y, color='#FF5733', label='Hex Color')
Markers:
Markers are used to indicate specific data points on a plot. You can customize markers in Matplotlib:
plt.plot(x, y, marker='o', markersize=8, markerfacecolor='yellow', markeredgecolor='black',
         label='Custom Marker')
marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares, 'x' for crosses).
markersize: Sets the size of the marker.
markerfacecolor: Sets the marker's fill color.
markeredgecolor: Sets the marker's edge color.
Line Styles:
You can customize the line style of your plot:
plt.plot(x, y, linestyle='--', linewidth=2, label='Dashed Line')
linestyle: Specifies the line style (e.g., '-', '--', '-.', ':').
linewidth: Sets the width of the line.
Line Styles:
You can customize the appearance of lines in your plot using various line styles, markers, and colors. Here's an
example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Customizing line style with color, marker, and linestyle
plt.plot(x, y, color='blue', marker='o', linestyle='--', markersize=8, label='Custom Line')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Custom Line Style')
plt.legend()
plt.show()
In the above example:
color: Sets the line color.
marker: Specifies the marker style (e.g., 'o' for circles, 's' for
squares).
linestyle: Sets the line style ('--' for dashed, ':' for dotted,
etc.).
markersize: Adjusts the size of markers.
Ticks and Labels:
You can customize tick locations and labels on the x and y axes using xticks() and yticks() functions:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Line Plot')
# Customizing x-axis ticks and labels
plt.xticks([1, 2, 3, 4, 5], ['A', 'B', 'C', 'D', 'E'])
# Customizing y-axis ticks and labels
plt.yticks([2, 4, 6, 8, 10], ['Low', 'Medium', 'High', 'Very High', 'Max'])
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Custom Ticks and Labels')
plt.legend()
plt.show()
In this example:
xticks() and yticks() specify the locations and labels for the
ticks on the x and y axes, respectively.
Legends:
To add legends to your plot, you can use the legend() function. You should also label your plotted lines or data
points using the label parameter when creating the plot.
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Legend Example')
plt.legend()
plt.show()
In this example:
label is provided when creating each line, and legend() is
called to display the legend.
ANNOTATIONS AND DRAWING ON A SUBPLOT
Annotations and drawing on a subplot in Matplotlib allow you to add textual or graphical elements to your
plots to provide additional information or highlight specific points of interest. Here's how you can add
annotations and draw on a subplot:
Adding Text Annotations:
You can add text annotations to your plot using the text()
function. Here's an example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Line Plot')
# Adding a text annotation
plt.text(3, 7, 'Annotation Here', fontsize=12, color='red')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Text Annotation Example')
plt.legend()
plt.show()
In this example, plt.text(x, y, text, fontsize, color) is used to
add a text annotation at coordinates (3, 7) with the
specified text, fontsize, and color.
Adding Arrows with Annotations:
You can add arrows to point to specific locations on your plot using the annotate() function. Here's an example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Line Plot')
# Adding an arrow with annotation
plt.annotate('Important Point', xy=(3, 6), xytext=(4, 8), fontsize=12,
arrowprops=dict(arrowstyle='->', color='blue'))
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Arrow Annotation Example')
plt.legend()
plt.show()
In this example, plt.annotate(text, xy, xytext, fontsize, arrowprops) is used to add an arrow with text annotation. xy
specifies the point being pointed to, and xytext specifies the location of the text.
Drawing Shapes:
You can draw various shapes, lines, and polygons on a subplot using the patch classes in matplotlib.patches. For
example, to draw a rectangle:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Create a subplot
fig, ax = plt.subplots()
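# Add a rectangle with its lower-left corner at (1, 2), width 2, and height 4
rectangle = patches.Rectangle((1, 2), 2, 4, linewidth=2, edgecolor='red', facecolor='none')
ax.add_patch(rectangle)
# The view is not autoscaled for patches, so set limits that contain the rectangle
ax.set_xlim(0, 5)
ax.set_ylim(0, 8)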
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Drawing Shapes Example')
plt.show()
In this example, we create a subplot, add a rectangle using patches.Rectangle(), and then add it to the plot
with ax.add_patch().
SAVING PLOTS TO FILE
You can save plots created with Matplotlib to various file formats such as PNG, PDF, SVG, and more using the
savefig() function. Here's how to save a plot to a file:
import matplotlib.pyplot as plt
# Create and customize your plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Line Plot')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Saved Plot Example')
plt.legend()
# Save the plot to a file (e.g., PNG)
plt.savefig('saved_plot.png', dpi=300)
In this example, plt.savefig('saved_plot.png', dpi=300) saves the current plot to a PNG file named
"saved_plot.png" with a resolution of 300 dots per inch (dpi). You can specify the file format by changing the
file extension (e.g., ".pdf" for PDF, ".svg" for SVG).
Common parameters for savefig():
fname: The file name and path where the plot will be saved.
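dpi: The resolution in dots per inch. The default, 'figure', uses the figure's own dpi (100 unless changed).
format: The file format (e.g., 'png', 'pdf', 'svg'). If omitted, it is inferred from the file extension.
bbox_inches: Specifies which part of the figure to save. The default (None) saves the full figure canvas; use 'tight' to crop the output to the plot elements.
transparent: If True, the plot will have a transparent background.
orientation: 'portrait' or 'landscape'. Currently only the PostScript backend supports this.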
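The parameters above can be combined in a single call. The sketch below is only an illustration: it writes to a temporary directory and selects the non-interactive 'Agg' backend so that it runs without a display, both of which are choices made here rather than requirements of savefig(). It is generally safest to call savefig() before plt.show(), since the figure may already be closed once the window is dismissed.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], label='Line Plot')
ax.legend()

# Write into a temporary directory (an assumption for this sketch)
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'saved_plot.png')
svg_path = os.path.join(out_dir, 'saved_plot.svg')

# High-resolution PNG, cropped to the plot elements, with a transparent background
fig.savefig(png_path, dpi=300, bbox_inches='tight', transparent=True)

# The format is inferred from the '.svg' extension
fig.savefig(svg_path)

plt.close(fig)
print(os.path.getsize(png_path) > 0, os.path.exists(svg_path))
```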
PLOTTING FUNCTIONS IN PANDAS.
Pandas provides a number of plotting functions that can be used to create a variety of different plots and
visualizations. These functions are simple to use and can be used to create plots with just a few lines of code.
The plotting functions are available directly as the plot() method of a Series or DataFrame, so nothing beyond
pandas itself needs to be imported; Matplotlib must be installed because pandas uses it as its default plotting
backend. The plot() method takes a number of keyword arguments, which can be used to control the appearance of the plot.
Here is a simple example of how to use the Pandas plot() function to create a line chart:
import pandas as pd
import matplotlib.pyplot as plt
# Create a Series
series = pd.Series([2, 4, 6, 8, 10])
# Create a line chart of the Series
series.plot()
# Show the plot
plt.show()
Line Plot:
To create a line plot of a Series or DataFrame,
simply call the .plot() method on the data:
import pandas as pd
import matplotlib.pyplot as plt
# Create a DataFrame
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create a line plot
df['y'].plot()
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Line Plot')
plt.show()
Scatter Plot:
Scatter plots are created similarly, but you specify
kind='scatter':
df.plot(kind='scatter', x='x', y='y')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Scatter Plot')
plt.show()
Bar Plot:
To create a bar plot, you can use kind='bar':
df.plot(kind='bar', x='x', y='y')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Bar Plot')
plt.show()
Histogram:
For histograms, use kind='hist':
df['y'].plot(kind='hist', bins=5)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
Customization:
You can further customize your plots by using Matplotlib functions after calling .plot(). Additionally, you can create
subplots with Matplotlib's plt.subplots() function and pass each Axes to .plot() through the ax parameter to have more control over the layout of multiple plots.
fig, axes = plt.subplots(nrows=2, ncols=2)
df['y'].plot(ax=axes[0, 0], title='Line Plot')
df.plot(kind='scatter', x='x', y='y', ax=axes[0, 1], title='Scatter Plot')
df.plot(kind='bar', x='x', y='y', ax=axes[1, 0], title='Bar Plot')
df['y'].plot(kind='hist', bins=5, ax=axes[1, 1], title='Histogram')
plt.tight_layout() # Automatically adjust subplot spacing so titles and labels do not overlap
plt.show()
dataframe_operations and various functions
JayanthiM19
 
introductiontopandas- for 190615082420.pptx
introductiontopandas- for 190615082420.pptxintroductiontopandas- for 190615082420.pptx
introductiontopandas- for 190615082420.pptx
rahulborate13
 
Data Frame Data structure in Python pandas.pptx
Data Frame Data structure in Python pandas.pptxData Frame Data structure in Python pandas.pptx
Data Frame Data structure in Python pandas.pptx
Ramakrishna Reddy Bijjam
 
Pandas yayyyyyyyyyyyyyyyyyin Python.pptx
Pandas yayyyyyyyyyyyyyyyyyin Python.pptxPandas yayyyyyyyyyyyyyyyyyin Python.pptx
Pandas yayyyyyyyyyyyyyyyyyin Python.pptx
AamnaRaza1
 
Python Pandas.pptx
Python Pandas.pptxPython Pandas.pptx
Python Pandas.pptx
SujayaBiju
 
Pandas-(Ziad).pptx
Pandas-(Ziad).pptxPandas-(Ziad).pptx
Pandas-(Ziad).pptx
Sivam Chinna
 

Unit 4_Working with Graphs _python (2).pptx

  • 2. DATA WRANGLING
    Data wrangling in Python is the process of cleaning, transforming, and preparing raw or messy data for analysis, visualization, or machine learning using the Python programming language. It involves a series of operations that make the data more structured, complete, and suitable for the intended analysis. Python provides various libraries and tools for performing these tasks efficiently. Common steps and techniques in data wrangling include:
  • 3. 1. Data Loading: Load the raw data into Python using libraries like Pandas (for structured data), NumPy (for numerical data), or specialized libraries for other data formats like CSV, Excel, JSON, or databases.
    import pandas as pd
    # Load data from a CSV file
    df = pd.read_csv('data.csv')
    2. Data Exploration: Get a preliminary understanding of the data by examining its structure and summary statistics and by identifying missing values.
    # Display the first few rows of the DataFrame
    print(df.head())
    # Get basic summary statistics
    print(df.describe())
    # Check for missing values
    print(df.isnull().sum())
  • 4. 3. Data Cleaning:
    Handle missing values by imputing them or dropping rows/columns with missing data.
    Remove duplicates.
    Correct data errors and inconsistencies.
    # Drop rows with missing values
    df = df.dropna()
    # Remove duplicates
    df = df.drop_duplicates()
    # Correct data errors (correct_function is a placeholder for your own cleaning function)
    df['column_name'] = df['column_name'].apply(correct_function)
  • 5. 4. Data Transformation:
    Convert data types.
    Normalize or scale numerical data.
    Encode categorical variables.
    Create new features or variables.
    # Convert data types
    df['numeric_column'] = df['numeric_column'].astype(float)
    # Normalize numerical data
    df['numeric_column'] = (df['numeric_column'] - df['numeric_column'].mean()) / df['numeric_column'].std()
    # Encode categorical variables
    df = pd.get_dummies(df, columns=['categorical_column'])
    # Create new features
    df['new_feature'] = df['feature1'] * df['feature2']
  • 6. 5. Data Aggregation and Grouping:
    Aggregate data by grouping based on certain attributes.
    Calculate summary statistics for groups.
    # Group by a categorical variable and calculate the mean
    grouped_data = df.groupby('category_column')['numeric_column'].mean()
    6. Data Visualization: Use libraries like Matplotlib or Seaborn to visualize the data, detect patterns, and gain insights.
    import matplotlib.pyplot as plt
    # Create a histogram
    plt.hist(df['numeric_column'])
    plt.xlabel('Numeric Column')
    plt.ylabel('Frequency')
    plt.show()
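The grouping step can compute several summary statistics at once. The following is a minimal sketch; the `region` and `sales` columns and their values are invented for illustration and do not come from the slides.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One row per region, one column per summary statistic
summary = df.groupby("region")["sales"].agg(["mean", "sum", "count"])
print(summary)
```

Passing a list of function names to `agg` keeps the result in a single table, which is often easier to read than separate `mean()` and `sum()` calls.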
  • 7. 7. Data Export: Save the cleaned and transformed data to a new file if necessary.
    # Export cleaned data to a CSV file
    df.to_csv('cleaned_data.csv', index=False)
  • 8. COMBINING AND MERGING DATA SETS IN PYTHON
    Combining and merging data sets is a common operation in data analysis and manipulation. Several libraries support it; the most popular is pandas. This section gives an overview of how to combine and merge data sets using pandas.
    Combining Data Sets
    1. Concatenation: Concatenation combines data frames either row-wise or column-wise.
    a. Row-wise concatenation:
    import pandas as pd
    df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
    df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})
    result = pd.concat([df1, df2], axis=0)  # Concatenate along rows (axis=0)
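  • 9. b. Column-wise concatenation:
    result = pd.concat([df1, df2], axis=1)  # Concatenate along columns (axis=1)
    2. Appending: Appending adds the rows of one DataFrame to another. DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat() instead.
    result = pd.concat([df1, df2])  # Replaces the removed df1.append(df2)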
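Row-wise concatenation keeps each frame's original index labels, which can leave duplicates. The sketch below, using small frames made up for illustration, shows the difference `ignore_index=True` makes.

```python
import pandas as pd

# Two small frames, invented for illustration
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Default: the original index labels are kept, so 0 and 1 appear twice
stacked = pd.concat([df1, df2])

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```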
  • 10. Ex. 2
    # Define a dictionary containing employee data
    # (data1, used below, is a similar dictionary whose definition does not appear on this slide)
    data2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
             'Age': [17, 14, 12, 52],
             'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
             'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}
    # Convert the dictionaries into DataFrames
    df = pd.DataFrame(data1, index=[0, 1, 2, 3])
    df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])
    print(df, "\n\n", df1)
    # Using the .concat() method
    frames = [df, df1]
    res1 = pd.concat(frames)
    print(res1)
  • 12. Merging Data Sets
    Merging combines data frames based on common columns or indices.
    1. Inner Join:
    result = pd.merge(df1, df2, on='key_column', how='inner')
    2. Left Join:
    result = pd.merge(df1, df2, on='key_column', how='left')
    3. Right Join:
    result = pd.merge(df1, df2, on='key_column', how='right')
    4. Outer Join:
    result = pd.merge(df1, df2, on='key_column', how='outer')
    5. Merging on Multiple Columns: You can merge on multiple columns by passing a list of column names to the on parameter.
    result = pd.merge(df1, df2, on=['key_column1', 'key_column2'], how='inner')
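To see how the join types differ, it helps to run them on two small frames whose keys only partly overlap. This is a sketch with invented data; `indicator=True` is a standard `pd.merge` option that adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Keys 2 and 3 appear in both frames; 1 only on the left, 4 only on the right
left = pd.DataFrame({"key_column": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"key_column": [2, 3, 4], "score": [20, 30, 40]})

inner = pd.merge(left, right, on="key_column", how="inner")   # keys in both frames
left_join = pd.merge(left, right, on="key_column", how="left")  # all left keys
outer = pd.merge(left, right, on="key_column", how="outer", indicator=True)

print(inner)
print(left_join)
print(outer)
```

In the left and outer results, rows without a match on the other side carry NaN in the columns that come from that side.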
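  • 13. # Column-wise concatenation that keeps only the index labels present in both frames
    res2 = pd.concat([df, df1], axis=1, join='inner')
    print(res2)
  • 14. # Column-wise concatenation that keeps the index labels from either frame
    res2 = pd.concat([df, df1], axis=1, join='outer')
    print(res2)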
  • 15. DATA TRANSFORMATION
    Data transformation is the process of converting raw data into a format that is more suitable for analysis, modeling, or machine learning. It is an essential step in any data science project, and Python is a popular programming language for it.
    There are many types of data transformation. Common examples include:
    Cleaning and preprocessing: Removing errors and inconsistencies from the data and converting it to a consistent format.
    Feature engineering: Creating new features from the existing data, or transforming existing features so that they are more informative for the task at hand.
    Encoding categorical data: Categorical data, such as text or labels, needs to be converted to numerical data before many machine learning algorithms can use it.
    Scaling and normalization: Transforming the data so that all features are on a similar scale, which can improve the performance of machine learning algorithms.
    Several Python libraries can be used for data transformation; the most popular is Pandas, a library for data manipulation and analysis that provides a wide range of transformation functions.
  • 16. import pandas as pd
    from sklearn.preprocessing import StandardScaler
    # Load the data
    df = pd.read_csv('data.csv')
    # Clean the data
    df = df.dropna()  # Drop rows with missing values
    df['age'] = df['age'].astype('int')  # Convert the 'age' column to integers
    # Create a new feature
    df['age_group'] = df['age'].apply(lambda x: 'young' if x < 18 else 'adult')
    # Encode categorical data
    df['gender'] = df['gender'].map({'male': 1, 'female': 0})
    # Scale the data
    scaler = StandardScaler()
    df[['height', 'weight']] = scaler.fit_transform(df[['height', 'weight']])
    # Save the transformed data
    df.to_csv('transformed_data.csv', index=False)
  • 17. 1. Data Cleaning: Data cleaning involves handling missing values, removing duplicates, and correcting errors in your dataset.
    Handling Missing Values: Pandas provides functions like dropna(), fillna(), and interpolate() to handle missing values. Each returns a new object unless inplace=True is passed.
    import pandas as pd
    # Remove rows with missing values
    df.dropna()
    Syntax: dataframe.dropna(axis, how, thresh, subset, inplace)
    # Fill missing values with a specific value
    df.fillna(0)
    Syntax: dataframe.fillna(value, method, axis, inplace, limit, downcast)
    # Interpolate missing values
    df.interpolate()
    Syntax: dataframe.interpolate(method, axis, inplace, limit, limit_direction, limit_area, downcast, kwargs)
    Removing Duplicates:
    df.drop_duplicates()
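The three missing-value functions behave quite differently on the same input. A small sketch on an invented Series makes the contrast concrete; note that `interpolate()` uses linear interpolation by default.

```python
import numpy as np
import pandas as pd

# A short numeric Series with two gaps, invented for illustration
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()            # removes the missing entries
filled = s.fillna(0)            # replaces them with a constant
interpolated = s.interpolate()  # fills them from neighbouring values (linear)

print(dropped.tolist())
print(filled.tolist())
print(interpolated.tolist())
```

Which option is appropriate depends on the data: dropping loses rows, a constant can distort statistics, and interpolation assumes neighbouring values are related.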
  • 18. 2. Data Filtering: Filtering selects a subset of the data based on certain conditions.
    # Filter rows where a condition is met
    filtered_df = df[df['column_name'] > 10]
    3. Data Aggregation: Aggregation summarizes data by grouping it based on certain criteria.
    # Group by a column and calculate aggregate statistics
    grouped_df = df.groupby('category_column')['numeric_column'].mean()
    4. Data Transformation: Data transformation includes operations like converting data types, scaling values, or applying mathematical functions.
    # Convert data types
    df['numeric_column'] = df['numeric_column'].astype(float)
    # Scaling values (e.g., Min-Max scaling)
    from sklearn.preprocessing import MinMaxScaler
    scaler = MinMaxScaler()
    df['scaled_column'] = scaler.fit_transform(df[['numeric_column']])
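Min-Max scaling maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. As a sketch of what the scaler computes, the same result can be obtained with plain pandas arithmetic; the column values here are invented.

```python
import pandas as pd

# Invented numeric column
df = pd.DataFrame({"numeric_column": [10.0, 20.0, 30.0]})

col = df["numeric_column"]
# (x - min) / (max - min): smallest value maps to 0, largest to 1
df["scaled_column"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

This form assumes the column is not constant; if max equals min the denominator is zero.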
  • 19. 5. One-Hot Encoding: Convert categorical variables into a numerical format using one-hot encoding.
    encoded_df = pd.get_dummies(df, columns=['categorical_column'])
    6. Reshaping Data: Reshaping data includes tasks like pivoting, melting, or stacking/unstacking for better analysis.
    # Pivot a DataFrame
    pivoted_df = df.pivot(index='row_column', columns='column_column', values='value_column')
    # Melt a DataFrame
    melted_df = pd.melt(df, id_vars=['id_column'], value_vars=['var1', 'var2'], var_name='variable', value_name='value')
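Pivoting and melting are roughly inverse operations: pivot turns long data into a wide table, and melt turns it back. The round trip below is a sketch on invented measurements; the column names are assumptions chosen for the example.

```python
import pandas as pd

# Long format: one row per (id, variable) pair; values invented for illustration
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "variable": ["height", "weight", "height", "weight"],
    "value": [170, 65, 180, 80],
})

# Long -> wide: one row per id, one column per variable
wide_df = long_df.pivot(index="id", columns="variable", values="value")

# Wide -> long again: reset_index() turns the id index back into a column first
back_to_long = wide_df.reset_index().melt(
    id_vars=["id"],
    value_vars=["height", "weight"],
    var_name="variable",
    value_name="value",
)

print(wide_df)
print(back_to_long)
```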
  • 20. 7. Text Data Processing: For text data, you can perform transformations such as tokenization, stemming, and stop-word removal using libraries like NLTK or spaCy.
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize
    nltk.download('stopwords')
    nltk.download('punkt')
    # Tokenization and stop-word removal
    # (the statement below is cut off on the source slide)
    df['text_column'] = df['text_column'].apply(lambda x: ' '.join([word for word in word_tokenize(x)
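The idea behind stop-word removal can be illustrated without any external library. The sketch below is not the NLTK approach from the slide: it splits on whitespace instead of using a tokenizer, and `STOP_WORDS` is a tiny hand-written list assumed for the example.

```python
# Tiny hand-written stop-word list, for illustration only (not NLTK's list)
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}


def remove_stop_words(text):
    """Split on whitespace, drop stop words case-insensitively, and rejoin."""
    tokens = text.split()
    kept = [tok for tok in tokens if tok.lower() not in STOP_WORDS]
    return " ".join(kept)


cleaned = remove_stop_words("The cat is in the garden")
print(cleaned)
```

A function like this can be applied to a text column with `df['text_column'].apply(remove_stop_words)`. A real tokenizer handles punctuation and contractions, which plain whitespace splitting does not.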
  • 21. DETECTING AND FILTERING OUTLIERS IN PYTHON
    What are Outliers in Python?
    Before looking at outliers, consider where raw data comes from. Raw data fed to a system is usually generated from surveys or extracted from real-time actions on the web. This introduces variation, and there is a chance of measurement error when the data is recorded.
    An outlier is a data point, or a set of data points, that lies far from the rest of the values in the dataset, that is, away from the overall distribution of the data.
    The notion used here applies to continuous (numeric) values, so the detection and removal techniques that follow are meant for numeric variables, such as those used in regression.
    In short, outliers diverge from the otherwise well-structured distribution of the data and can be regarded as abnormal values lying away from the rest of the class or population.
  • 22. Why is it necessary to remove outliers from the data?
    As discussed above, outliers are data points that lie away from the usual distribution of the data. They have the following effects on the overall data distribution:
    They affect the overall standard deviation of the data.
    They distort the overall mean of the data.
    They skew the data.
    They bias the accuracy estimate of the machine learning model.
    They affect the distribution and statistics of the dataset.
    Detection of Outliers
    The outliers in a dataset can be detected by the following methods:
    • Z-score
    • Scatter plots
    • Interquartile range (IQR)
  • 23. 1. Visual Inspection: Start by visualizing your data using histograms, box plots, scatter plots, or other visualization techniques. Outliers often appear as points far from the main cluster or as values outside the whiskers of box plots. Visualization can help you identify potential outliers.
    import matplotlib.pyplot as plt
    import seaborn as sns
    # Box plot to visualize outliers
    sns.boxplot(x=df['column_name'])
    plt.show()
    2. Z-Score: The Z-score measures how far a data point is from the mean in terms of standard deviations. You can use the Z-score to detect outliers. Typically, data points whose absolute Z-score exceeds a threshold (e.g., 2 or 3) are considered outliers.
    from scipy import stats
    z_scores = stats.zscore(df['column_name'])
    outliers = df[abs(z_scores) > 2]
    filtered_data = df[abs(z_scores) <= 2]
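The Z-score is simply (x - mean) / standard deviation, so it can also be computed with pandas alone. The sketch below uses invented values and `ddof=0` (population standard deviation), which matches the default of `scipy.stats.zscore`.

```python
import pandas as pd

# Invented measurements with one obviously large value
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 50])

# Z-score: distance from the mean in units of standard deviation
z = (values - values.mean()) / values.std(ddof=0)

outliers = values[z.abs() > 2]
filtered = values[z.abs() <= 2]

print(outliers.tolist())
print(filtered.tolist())
```

Because the mean and standard deviation are themselves pulled by extreme values, the Z-score rule can miss outliers in very small samples; the IQR method is less sensitive to this.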
  • 24. 3. IQR (Interquartile Range) Method: The IQR method calculates the IQR (the difference between the 75th percentile and the 25th percentile) and identifies outliers as values outside a specified range.
    Q1 = df['column_name'].quantile(0.25)
    Q3 = df['column_name'].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]
    filtered_data = df[(df['column_name'] >= lower_bound) & (df['column_name'] <= upper_bound)]
  • 25. 4. Tukey's Fences: Tukey's fences are the general form of the IQR rule: values outside the range from Q1 - k * IQR to Q3 + k * IQR are flagged. With k = 1.5 this is the IQR method shown above; the wider fences with k = 3 used here flag only extreme ("far out") values.
    Q1 = df['column_name'].quantile(0.25)
    Q3 = df['column_name'].quantile(0.75)
    lower_fence = Q1 - 3 * (Q3 - Q1)
    upper_fence = Q3 + 3 * (Q3 - Q1)
    outliers = df[(df['column_name'] < lower_fence) | (df['column_name'] > upper_fence)]
    filtered_data = df[(df['column_name'] >= lower_fence) & (df['column_name'] <= upper_fence)]
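The effect of the fence multiplier is easiest to see by running both settings on the same data. In this sketch, `find_outliers` is a hypothetical helper written for the example and the values are invented; one value sits just beyond the 1.5 × IQR fence and one far beyond it.

```python
import pandas as pd


def find_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Invented data: 17 is a mild outlier, 100 an extreme one
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 100])

mild_and_extreme = find_outliers(values, k=1.5)
extreme_only = find_outliers(values, k=3)

print(mild_and_extreme.tolist())
print(extreme_only.tolist())
```

Here Q1 is 3.5 and Q3 is 8.5, so the upper fence is 16 for k = 1.5 and 23.5 for k = 3.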
  • 26. 5. Machine Learning-Based Methods: You can also use machine learning models, such as Isolation Forest or One-Class SVM, to detect outliers in your data.
    from sklearn.ensemble import IsolationForest
    clf = IsolationForest(contamination=0.05)  # Adjust contamination based on your dataset
    labels = clf.fit_predict(df[['column_name']])  # -1 marks outliers, 1 marks inliers
    outliers = df[labels == -1]
    filtered_data = df[labels == 1]
  • 27. STRING MANIPULATION
    String manipulation in Python involves performing various operations on strings, such as concatenation, slicing, searching, replacing, and formatting. Python provides a rich set of string methods and functions that make it easy to work with text data. Here are some common string manipulation techniques in Python:
    1. String Concatenation: You can concatenate strings using the + operator or the str.join() method.
    str1 = "Hello"
    str2 = "World"
    result = str1 + ", " + str2  # Using the + operator
    words = ["Hello", "World"]
    result = ", ".join(words)  # Using join
  • 28. 2. String Slicing: String slicing extracts substrings from a string based on their positions.
    text = "Python Programming"
    substring = text[7:18]  # Extract "Programming"
    3. String Searching: You can search for substrings within a string using methods like str.find(), str.index(), or regular expressions with the re module.
    text = "Python is a powerful programming language"
    position = text.find("powerful")  # Find the position of "powerful"
    4. String Replacement: Replace specific substrings within a string using the str.replace() method.
    text = "Python is a great programming language"
    new_text = text.replace("great", "powerful")  # Replace "great" with "powerful"
  • 29. 5. String Formatting: You can format strings using f-strings (Python 3.6+), the .format() method, or the % operator.
    name = "Alice"
    age = 30
    formatted_str = f"My name is {name} and I am {age} years old."
    name = "Bob"
    age = 25
    formatted_str = "My name is {} and I am {} years old.".format(name, age)
    6. String Splitting: Split a string into a list of substrings using the str.split() method.
    Syntax: string.split(separator, maxsplit)
    1. text = "Python,Java,C++,JavaScript"
       languages = text.split(",")  # Split by comma
    2. s = "Python is cool"
       print(s.split())  # ['Python', 'is', 'cool']
    3. s = "abcabc"
       print(s.split("c"))  # ['ab', 'ab', '']
    4. f = open("sample.txt", "r")
       info = f.read()
       print(info.splitlines())
       f.close()
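The second argument of `str.split()` limits how many splits are made, and calling it with no separator splits on any run of whitespace. A short sketch with made-up strings shows both, along with `splitlines()`.

```python
text = "Python,Java,C++,JavaScript"

all_parts = text.split(",")      # split at every comma
first_split = text.split(",", 1)  # at most one split, counted from the left

# No separator: split on runs of whitespace and ignore leading/trailing spaces
words = "  Python   is cool ".split()

# splitlines() handles both \n and \r\n line endings
lines = "a\nb\r\nc".splitlines()

print(all_parts)
print(first_split)
print(words)
print(lines)
```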
  • 30. 7. String Stripping: Remove leading and trailing whitespace characters using str.strip(), or only one side using str.lstrip() and str.rstrip().
    text = " Python is awesome! "
    cleaned_text = text.strip()  # Remove leading and trailing spaces
    8. String Case Conversion: Convert the case of a string using methods like str.lower(), str.upper(), or str.capitalize().
    text = "Hello World"
    lower_case = text.lower()
    upper_case = text.upper()
    capitalized = text.capitalize()
  • 31. Syntax: Function
    string.upper(): Converts all characters of the string to uppercase.
    string.lower(): Converts all characters of the string to lowercase.
    string.title(): Converts the first letter of each word to uppercase and the remaining letters to lowercase.
    string.swapcase(): Converts uppercase characters to lowercase and vice versa.
    string.capitalize(): Converts the first character of the string to uppercase and the rest to lowercase.
    string.isupper(): Returns True if all the alphabetic characters of the string are uppercase.
    string.islower(): Returns True if all the alphabetic characters of the string are lowercase.
    string.endswith(): Returns True if the string ends with a specific value.
    string.startswith(): Returns True if the string starts with a specific value.
    string.index('character'): Returns the position of the first occurrence of the character.
  • 32. VECTORIZED STRING FUNCTIONS
    Vectorized string functions in pandas let you efficiently perform operations on string data within a pandas DataFrame or Series. These functions are accessed through the .str attribute of a pandas Series and apply string operations element-wise. For example:
    import pandas as pd
    series = pd.Series(['Alice', 'Bob', 'Carol'])
    series_lowercase = series.str.lower()
    print(series_lowercase)
    Output:
    0    alice
    1      bob
    2    carol
    dtype: object
  • 33. Some of the most commonly used vectorized string functions in Pandas include:
    str.lower(): Convert all strings to lowercase.
    str.upper(): Convert all strings to uppercase.
    str.strip(): Remove whitespace from the beginning and end of all strings.
    str.split(): Split all strings into a list of strings, using a specified separator.
    str.replace(): Replace all occurrences of a specified substring with another substring in all strings.
    str.contains(): Return a boolean Series indicating whether each string contains a specified substring.
    Vectorized string functions can also be used to perform more complex string operations, such as regular expression matching and extraction. For example, the following code uses the str.extract() function to extract the first name from each email address in a Series:
    import pandas as pd
    series = pd.Series(['alice@example.com', 'bob@example.com', 'carol@example.com'])
    # \w+ matches the name before the @; the dot is escaped so it matches literally
    first_names = series.str.extract(r'(?P<first_name>\w+)@example\.com')
    print(first_names)
    Output (str.extract() returns a DataFrame with one column per capture group):
      first_name
    0      alice
    1        bob
    2      carol
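The listed functions are often chained to clean a column in a few steps. The following is a minimal sketch on made-up data; the Series contents and variable names are assumptions for illustration, not part of the original slides.

```python
import pandas as pd

# Made-up sample data with stray whitespace, mixed case, and a missing value.
names = pd.Series(["  Alice Smith ", "BOB JONES", None, "carol-ann lee"])

# Chain .str calls: trim whitespace, then lowercase every entry.
cleaned = names.str.strip().str.lower()

# str.contains returns booleans; na=False treats missing values as "no match".
has_lee = cleaned.str.contains("lee", na=False)
print(has_lee.tolist())   # [False, False, False, True]

# regex=False makes str.replace do a literal substring replacement.
no_hyphen = cleaned.str.replace("-", " ", regex=False)

# str.split returns one list per row; .str[0] then picks the first piece.
first_names = no_hyphen.str.split(" ").str[0]
print(first_names)
```

Missing values pass through the .str methods as missing values rather than raising an error, which is one reason to prefer them over a plain Python loop.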
  • 34. PLOTTING AND VISUALIZATION
    Plotting and visualization are crucial for understanding and communicating data. In Python, there are several libraries for creating plots and visualizations, with Matplotlib, Seaborn, and Plotly being some of the most popular ones. Here's an overview of how to create plots and visualizations in Python:
    Matplotlib: Matplotlib is an easy-to-use, low-level data visualization library that is built on NumPy arrays. It supports many plot types, such as scatter plots, line plots, and histograms, and provides a lot of flexibility.
    Installation:
    pip install matplotlib
  • 35. Basic Line Plot:
    import matplotlib.pyplot as plt
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    plt.plot(x, y)
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Line Plot')
    plt.show()
  • 36. Seaborn: Seaborn is built on top of Matplotlib and provides a high-level interface for creating informative and attractive statistical graphics.
    Installation:
    pip install seaborn
    Basic Scatter Plot:
    import matplotlib.pyplot as plt  # needed for the labels, title, and show() below
    import seaborn as sns
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    sns.scatterplot(x=x, y=y)
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Scatter Plot')
    plt.show()
  • 37. Plotly: Plotly is a powerful library for creating interactive and web-based visualizations. It is often used for creating dashboards and web applications.
    Installation:
    pip install plotly
    Basic Bar Chart:
    import plotly.express as px
    data = {'Category': ['A', 'B', 'C', 'D'], 'Values': [10, 20, 15, 30]}
    fig = px.bar(data, x='Category', y='Values', title='Bar Chart')
    fig.show()
  • 38. Other Libraries:
    Pandas: Pandas also provides basic plotting functionality through the .plot() method for DataFrames, making it convenient for quick exploratory data analysis.
    Bokeh: Bokeh is another library for interactive web-based visualizations and is well-suited for creating interactive dashboards.
    Altair: Altair is a declarative statistical visualization library for Python, making it easy to create complex visualizations with concise code.
    ggplot (ggpy): ggplot is a Python implementation of the popular ggplot2 package from R, which uses a grammar of graphics to create plots.
  • 39. MATPLOTLIB API PRIMER
    Matplotlib is a popular Python library for creating static, animated, and interactive visualizations. When creating plots with Matplotlib, you can customize the colors, markers, and line styles. Here's a primer on how to do that:
    Colors: Matplotlib allows you to specify colors for lines, markers, and other plot elements in several ways:
    Named Colors: You can use named colors like 'red', 'blue', 'green', etc.
    import matplotlib.pyplot as plt
    x = [1, 2, 3, 4, 5]   # sample data used by the snippets on this slide
    y = [2, 4, 6, 8, 10]
    plt.plot(x, y, color='red', label='Red Line')
    RGB Values: You can use RGB tuples, with each component between 0 and 1, to specify colors.
    plt.plot(x, y, color=(0.1, 0.2, 0.3), label='Custom Color')
    Hexadecimal Colors: You can also use hexadecimal color codes.
    plt.plot(x, y, color='#FF5733', label='Hex Color')
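To compare the three color notations side by side, the sketch below draws one line per notation on a single Axes. The data values and labels are made up for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]  # made-up sample data

fig, ax = plt.subplots()

# One line per color notation, shifted vertically so they do not overlap.
ax.plot(x, [v for v in x], color='red', label='named color')
ax.plot(x, [v + 1 for v in x], color=(0.1, 0.2, 0.3), label='RGB tuple')
ax.plot(x, [v + 2 for v in x], color='#FF5733', label='hex code')

ax.set_title('Three ways to specify a color')
ax.legend()
```

Calling plt.show() afterwards displays the figure in an interactive session.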
  • 40. Markers: Markers are used to indicate specific data points on a plot. You can customize markers in Matplotlib:
    plt.plot(x, y, marker='o', markersize=8, markerfacecolor='yellow', markeredgecolor='black', label='Custom Marker')
    marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares, 'x' for crosses).
    markersize: Sets the size of the marker.
    markerfacecolor: Sets the marker's fill color.
    markeredgecolor: Sets the marker's edge color.
  • 41. Line Styles: You can customize the line style of your plot:
    plt.plot(x, y, linestyle='--', linewidth=2, label='Dashed Line')
    linestyle: Specifies the line style ('-' for solid, '--' for dashed, '-.' for dash-dot, ':' for dotted).
    linewidth: Sets the width of the line.
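A quick way to see what each of the four line style codes looks like is to plot them together. This is a minimal sketch with made-up data; the dictionary of style names exists only to build the legend labels.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]  # made-up sample data

# Map each linestyle code to a readable name for the legend.
styles = {'-': 'solid', '--': 'dashed', '-.': 'dash-dot', ':': 'dotted'}

fig, ax = plt.subplots()
for offset, (code, name) in enumerate(styles.items()):
    # Shift each line upward so all four styles are visible at once.
    ax.plot(x, [v + offset for v in x], linestyle=code, linewidth=2,
            label=f"{name} ({code!r})")

ax.set_title('Line style codes')
ax.legend()
```

Calling plt.show() afterwards displays the figure in an interactive session.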
  • 42. Line Styles: You can customize the appearance of lines in your plot using various line styles, markers, and colors. Here's an example:
    import matplotlib.pyplot as plt
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    # Customizing line style with color, marker, and linestyle
    plt.plot(x, y, color='blue', marker='o', linestyle='--', markersize=8, label='Custom Line')
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Custom Line Style')
    plt.legend()
    plt.show()
    In the above example:
    color: Sets the line color.
    marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares).
    linestyle: Sets the line style ('--' for dashed, ':' for dotted, etc.).
    markersize: Adjusts the size of markers.
  • 43. Ticks and Labels: You can customize tick locations and labels on the x and y axes using the xticks() and yticks() functions:
    import matplotlib.pyplot as plt
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    plt.plot(x, y, label='Line Plot')
    # Customizing x-axis ticks and labels
    plt.xticks([1, 2, 3, 4, 5], ['A', 'B', 'C', 'D', 'E'])
    # Customizing y-axis ticks and labels
    plt.yticks([2, 4, 6, 8, 10], ['Low', 'Medium', 'High', 'Very High', 'Max'])
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Custom Ticks and Labels')
    plt.legend()
    plt.show()
    In this example, xticks() and yticks() specify the locations and labels for the ticks on the x and y axes, respectively.
  • 44. Legends: To add a legend to your plot, use the legend() function. Label each plotted line or set of data points with the label parameter when creating the plot.
    import matplotlib.pyplot as plt
    x = [1, 2, 3, 4, 5]
    y1 = [2, 4, 6, 8, 10]
    y2 = [1, 3, 5, 7, 9]
    plt.plot(x, y1, label='Line 1')
    plt.plot(x, y2, label='Line 2')
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Legend Example')
    plt.legend()
    plt.show()
    In this example, label is provided when creating each line, and legend() is called to display the legend.
  • 45. ANNOTATIONS AND DRAWING ON A SUBPLOT
    Annotations and drawing on a subplot in Matplotlib allow you to add textual or graphical elements to your plots to provide additional information or highlight specific points of interest. Here's how you can add annotations and draw on a subplot:
  • 46. Adding Text Annotations: You can add text annotations to your plot using the text() function. Here's an example:
    import matplotlib.pyplot as plt
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    plt.plot(x, y, label='Line Plot')
    # Adding a text annotation
    plt.text(3, 7, 'Annotation Here', fontsize=12, color='red')
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Text Annotation Example')
    plt.legend()
    plt.show()
    In this example, plt.text() is used to add a text annotation at data coordinates (3, 7) with the specified text, fontsize, and color.
  • 47. Adding Arrows with Annotations: You can add arrows to point to specific locations on your plot using the annotate() function. Here's an example:
    import matplotlib.pyplot as plt
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    plt.plot(x, y, label='Line Plot')
    # Adding an arrow with annotation
    plt.annotate('Important Point', xy=(3, 6), xytext=(4, 8), fontsize=12,
                 arrowprops=dict(arrowstyle='->', color='blue'))
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Arrow Annotation Example')
    plt.legend()
    plt.show()
    In this example, plt.annotate() is used to add an arrow with a text annotation. xy specifies the point being pointed to, xytext specifies the location of the text, and arrowprops controls the arrow's appearance.
  • 48. Drawing Shapes: You can draw various shapes, lines, and polygons on a subplot using the matplotlib.patches module. For example, to draw a rectangle:
    import matplotlib.pyplot as plt
    import matplotlib.patches as patches
    # Create a subplot
    fig, ax = plt.subplots()
    # Add a rectangle with lower-left corner (1, 2), width 2, and height 4
    rectangle = patches.Rectangle((1, 2), 2, 4, linewidth=2, edgecolor='red', facecolor='none')
    ax.add_patch(rectangle)
    # Set the axis limits so the rectangle falls inside the visible area
    ax.set_xlim(0, 4)
    ax.set_ylim(0, 7)
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Drawing Shapes Example')
    plt.show()
    In this example, we create a subplot, build a rectangle using patches.Rectangle(), and then add it to the plot with ax.add_patch(). The axis limits are set explicitly because adding a patch does not rescale the view automatically.
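Other shapes follow the same pattern as the rectangle: build a patch, then add it to the Axes. The sketch below adds a circle, a triangle, and a horizontal reference line; the coordinates and colors are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

fig, ax = plt.subplots()

# A filled circle centered at (2, 2) with radius 1.
circle = patches.Circle((2, 2), radius=1, edgecolor='blue', facecolor='lightblue')
ax.add_patch(circle)

# A closed polygon defined by its three corner points (a triangle).
triangle = patches.Polygon([(4, 1), (6, 1), (5, 3)], closed=True,
                           edgecolor='green', facecolor='none')
ax.add_patch(triangle)

# A dashed horizontal line spanning the full width of the Axes.
ax.axhline(y=4, color='gray', linestyle='--')

# Set the limits explicitly so both shapes are inside the visible area,
# and use an equal aspect ratio so the circle is not drawn as an ellipse.
ax.set_xlim(0, 7)
ax.set_ylim(0, 5)
ax.set_aspect('equal')
ax.set_title('Circle, polygon, and reference line')
```

Calling plt.show() afterwards displays the figure in an interactive session.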
  • 49. SAVING PLOTS TO FILE
    You can save plots created with Matplotlib to various file formats such as PNG, PDF, SVG, and more using the savefig() function. Here's how to save a plot to a file:
    import matplotlib.pyplot as plt
    # Create and customize your plot
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 6, 8, 10]
    plt.plot(x, y, label='Line Plot')
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Saved Plot Example')
    plt.legend()
    # Save the plot to a file (e.g., PNG)
    plt.savefig('saved_plot.png', dpi=300)
  • 50. In this example, plt.savefig('saved_plot.png', dpi=300) saves the current plot to a PNG file named "saved_plot.png" with a resolution of 300 dots per inch (dpi). You can specify the file format by changing the file extension (e.g., ".pdf" for PDF, ".svg" for SVG).
    Common parameters for savefig():
    fname: The file name and path where the plot will be saved.
    dpi: The resolution in dots per inch. By default the figure's own dpi is used, which is 100 unless it has been changed.
    format: The file format (e.g., 'png', 'pdf', 'svg'). If omitted, it is inferred from the extension of fname.
    bbox_inches: Specifies which part of the figure to save. By default the whole figure is saved; use 'tight' to crop the output to the plot's contents and trim surrounding whitespace.
    transparent: If True, the plot will have a transparent background.
    orientation: Either 'portrait' or 'landscape'. This is currently only supported by the PostScript backend.
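The parameters above can be combined when writing the same figure to several formats. This is a minimal sketch; the temporary directory and the file names are assumptions made so the example does not write into the working directory.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], label='Line Plot')
ax.set_title('Saved Plot Example')
ax.legend()

# Hypothetical output location: a fresh temporary directory.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'plot.png')
pdf_path = os.path.join(out_dir, 'plot.pdf')
svg_path = os.path.join(out_dir, 'plot.svg')

# High-resolution PNG, cropped to the plot's contents.
fig.savefig(png_path, dpi=300, bbox_inches='tight')

# The format is inferred from the file extension when format is not given.
fig.savefig(pdf_path)

# SVG with a transparent background.
fig.savefig(svg_path, transparent=True)

# Close the figure to release its memory once it has been saved.
plt.close(fig)
print(sorted(os.listdir(out_dir)))
```

Using fig.savefig() rather than plt.savefig() makes it explicit which figure is written, which helps when several figures are open.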
  • 51. PLOTTING FUNCTIONS IN PANDAS
    Pandas provides a number of plotting functions that can be used to create a variety of different plots and visualizations with just a few lines of code. No separate plotting module needs to be imported: every Series and DataFrame has a plot() method. Matplotlib must be installed, because pandas uses it to draw the plots by default. The plot() method takes a number of keyword arguments, which can be used to control the appearance of the plot. Here is a simple example of how to use the Pandas plot() method to create a line chart:
    import pandas as pd
    import matplotlib.pyplot as plt
    # Create a Series
    series = pd.Series([2, 4, 6, 8, 10])
    # Create a line chart of the Series
    series.plot()
    # Show the plot
    plt.show()
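As an illustration of the keyword arguments mentioned above, the sketch below passes a few commonly used ones to DataFrame.plot(). The column names, index values, and numbers are made up for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: two columns indexed by year.
df = pd.DataFrame(
    {'sales': [2, 4, 6, 8, 10], 'costs': [1, 3, 4, 6, 7]},
    index=[2019, 2020, 2021, 2022, 2023],
)

# title, figsize, style, and grid are keyword arguments of plot();
# style takes one Matplotlib format string per column.
ax = df.plot(title='Sales vs. Costs', figsize=(6, 4),
             style=['-o', '--s'], grid=True)

# plot() returns a Matplotlib Axes, so further tweaks use the Axes API.
ax.set_xlabel('Year')
ax.set_ylabel('Amount')
```

Calling plt.show() afterwards displays the figure in an interactive session.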
  • 52. Line Plot: To create a line plot of a Series or DataFrame, simply call the .plot() method on the data:
    import pandas as pd
    import matplotlib.pyplot as plt
    # Create a DataFrame
    data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
    df = pd.DataFrame(data)
    # Create a line plot
    df['y'].plot()
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Line Plot')
    plt.show()
    Scatter Plot: Scatter plots are created similarly, but you specify kind='scatter' and name the columns to use for x and y:
    df.plot(kind='scatter', x='x', y='y')
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Scatter Plot')
    plt.show()
  • 53. Bar Plot: To create a bar plot, you can use kind='bar':
    df.plot(kind='bar', x='x', y='y')
    plt.xlabel('X-axis Label')
    plt.ylabel('Y-axis Label')
    plt.title('Bar Plot')
    plt.show()
    Histogram: For histograms, use kind='hist':
    df['y'].plot(kind='hist', bins=5)
    plt.xlabel('Values')
    plt.ylabel('Frequency')
    plt.title('Histogram')
    plt.show()
    Customization: You can further customize your plots by using Matplotlib functions after calling .plot(). Additionally, you can create a grid of subplots with plt.subplots() and pass each Axes to .plot() through the ax argument to control the layout of multiple plots.
    fig, axes = plt.subplots(nrows=2, ncols=2)
    df['y'].plot(ax=axes[0, 0], title='Line Plot')
    df.plot(kind='scatter', x='x', y='y', ax=axes[0, 1], title='Scatter Plot')
    df.plot(kind='bar', x='x', y='y', ax=axes[1, 0], title='Bar Plot')
    df['y'].plot(kind='hist', bins=5, ax=axes[1, 1], title='Histogram')
    # Adjust the subplot spacing so titles and labels do not overlap
    plt.tight_layout()
    plt.show()