Unit 4_Working with Graphs _python (2).pptx

DATA WRANGLING
Data wrangling in Python refers to the process of cleaning,
transforming, and preparing raw or messy data for analysis,
visualization, or machine learning tasks using the Python
programming language. It involves a series of operations to
make the data more structured, complete, and suitable for the
intended analysis. Python provides various libraries and tools
for efficiently performing data wrangling tasks. Here are some
common steps and techniques involved in data wrangling in
Python:

1. Data Loading: Load the raw data into Python using libraries like Pandas (for structured, tabular data) or NumPy
(for numerical arrays). Pandas can read formats such as CSV, Excel, JSON, and SQL databases; other formats may need specialized libraries.
import pandas as pd
# Load data from a CSV file
df = pd.read_csv('data.csv')
2. Data Exploration: Get a preliminary understanding of the data by examining its structure, summary statistics,
and identifying missing values.
# Display the first few rows of the DataFrame
print(df.head())
# Get basic summary statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
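The exploration calls above assume a file named data.csv exists. A minimal self-contained sketch, using a small hypothetical in-memory DataFrame in place of the file (the column names are illustrative), shows what each call reports:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for data.csv
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40],
    "city": ["Pune", "Delhi", None, "Pune"],
})

print(df.head())          # first rows of the DataFrame
print(df.describe())      # summary statistics; numeric columns only by default
print(df.isnull().sum())  # number of missing values in each column
```

Note that describe() skips the text column here; pass include="all" to summarize every column.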

3. Data Cleaning:
Handle missing values by imputing them or dropping rows/columns with missing data.
Remove duplicates.
Correct data errors and inconsistencies.
# Drop rows with missing values
df = df.dropna()
# Remove duplicates
df = df.drop_duplicates()
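# Correct data errors with a user-defined function
# (correct_function is a hypothetical example: it trims and title-cases text values)
def correct_function(value):
    return value.strip().title() if isinstance(value, str) else value

df['column_name'] = df['column_name'].apply(correct_function)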
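A runnable sketch of the cleaning steps on hypothetical data. Besides dropping rows, it shows imputation, the alternative mentioned above, by filling missing scores with the column mean. The column names and the choice of mean imputation are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: inconsistent text, a missing value, and a duplicate row
raw = pd.DataFrame({
    "name": ["  alice", "Bob ", "Bob ", "carol"],
    "score": [88.0, np.nan, np.nan, 92.0],
})

# Option 1: drop rows that contain missing values
dropped = raw.dropna()

# Option 2: keep the rows and impute instead
cleaned = raw.copy()
cleaned["name"] = cleaned["name"].str.strip().str.title()              # fix text inconsistencies
cleaned["score"] = cleaned["score"].fillna(cleaned["score"].mean())    # mean imputation
cleaned = cleaned.drop_duplicates()                                    # remove exact duplicates
print(cleaned)
```

Correcting the text before calling drop_duplicates() matters: rows that differ only in spacing or case are not exact duplicates until they are normalized.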

4. Data Transformation:
Convert data types.
Normalize or scale numerical data.
Encode categorical variables.
Create new features or variables.
# Convert data types
df['numeric_column'] = df['numeric_column'].astype(float)
# Standardize numerical data (z-score: subtract the mean, divide by the standard deviation)
df['numeric_column'] = (df['numeric_column'] - df['numeric_column'].mean()) / df['numeric_column'].std()
# Encode categorical variables
df = pd.get_dummies(df, columns=['categorical_column'])
# Create new features
df['new_feature'] = df['feature1'] * df['feature2']
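The same transformation steps can be run end to end on a small hypothetical DataFrame. This sketch assumes a price column that arrives as text; note that pandas' std() uses the sample standard deviation (ddof=1) by default, which is what the z-score line relies on.

```python
import pandas as pd

# Hypothetical data: 'price' arrives as strings, 'color' is categorical
df = pd.DataFrame({
    "price": ["10", "20", "30"],
    "qty": [1, 2, 3],
    "color": ["red", "blue", "red"],
})

df["price"] = df["price"].astype(float)                                  # convert data type
df["price_z"] = (df["price"] - df["price"].mean()) / df["price"].std()  # standardize (sample std)
df["revenue"] = df["price"] * df["qty"]                                  # create a new feature
df = pd.get_dummies(df, columns=["color"])                               # one-hot encode 'color'
print(df)
```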

5. Data Aggregation and Grouping:
Aggregate data by grouping based on certain attributes.
Calculate summary statistics for groups.
# Group by a categorical variable and calculate the mean
grouped_data = df.groupby('category_column')['numeric_column'].mean()
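Grouping is not limited to a single statistic. A short sketch on hypothetical sales data computes the per-group mean, as above, and then several statistics at once with agg():

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "sales": [100, 200, 300, 400, 500],
})

# Mean per group, as in the example above
mean_sales = df.groupby("region")["sales"].mean()

# Several summary statistics per group in one call
summary = df.groupby("region")["sales"].agg(["count", "mean", "sum", "max"])
print(summary)
```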
6. Data Visualization:
Use libraries like Matplotlib or Seaborn to visualize the data, detect patterns, and gain insights.
import matplotlib.pyplot as plt
# Create a histogram
plt.hist(df['numeric_column'])
plt.xlabel('Numeric Column')
plt.ylabel('Frequency')
plt.show()

7. Data Export:
Save the cleaned and transformed data to a new file if necessary.
# Export cleaned data to a CSV file
df.to_csv('cleaned_data.csv', index=False)

COMBINING AND MERGING DATA SETS IN PYTHON
Combining and merging data sets in Python is a common operation in data analysis and manipulation. You can
achieve this using various libraries, with the most popular one being pandas. Here, I'll provide an overview of
how to combine and merge data sets using pandas.
Combining Data Sets
1. Concatenation:
Concatenation is used to combine data frames either row-wise or column-wise.
a. Row-wise concatenation:
import pandas as pd
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})
result = pd.concat([df1, df2], axis=0)  # Concatenate along rows (axis=0)

b. Column-wise concatenation:
result = pd.concat([df1, df2], axis=1) # Concatenate along columns (axis=1)
2. Appending:
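Appending adds rows to an existing DataFrame. The older DataFrame.append() method was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat() instead:
result = pd.concat([df1, df2])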
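When stacking rows with pd.concat(), the original index labels are kept, so both frames in the concatenation example contribute labels 0, 1, 2. A short sketch of two standard options: ignore_index=True renumbers the result, and keys= records which input each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

stacked = pd.concat([df1, df2])                               # index: 0, 1, 2, 0, 1, 2
renumbered = pd.concat([df1, df2], ignore_index=True)         # index: 0 to 5
labeled = pd.concat([df1, df2], keys=['first', 'second'])     # MultiIndex: (source, row)
print(labeled.loc['second'])                                  # only the rows that came from df2
```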

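Ex. 2
import pandas as pd
# data1 belongs to an earlier example that is not included in these notes;
# the values below are a hypothetical stand-in with the same keys
data1 = {'Name': ['Asha', 'Ravi', 'Meena', 'Karan'],
         'Age': [27, 24, 22, 32],
         'Address': ['Pune', 'Delhi', 'Mumbai', 'Indore'],
         'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
# Define a dictionary containing employee data
data2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
         'Age': [17, 14, 12, 52],
         'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
         'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}
# Convert each dictionary into a DataFrame
df = pd.DataFrame(data1, index=[0, 1, 2, 3])
df1 = pd.DataFrame(data2, index=[4, 5, 6, 7])
print(df, "\n\n", df1)
# Concatenate the two DataFrames with pd.concat()
frames = [df, df1]
res1 = pd.concat(frames)
print(res1)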

Merging Data Sets
Merging is used to combine data frames based on common columns or indices.
1. Inner Join:
result = pd.merge(df1, df2, on='key_column', how='inner')
2. Left Join:
result = pd.merge(df1, df2, on='key_column', how='left')
3. Right Join:
result = pd.merge(df1, df2, on='key_column', how='right')
4. Outer Join:
result = pd.merge(df1, df2, on='key_column', how='outer')
5. Merging on Multiple Columns:
You can merge on multiple columns by passing a list of column names to the on parameter.
result = pd.merge(df1, df2, on=['key_column1', 'key_column2'], how='inner')
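To see how the four join types differ, here is a small sketch on two hypothetical tables that share only some key values (the table and column names are made up). It also uses merge's indicator=True option, which adds a _merge column reporting where each row came from.

```python
import pandas as pd

# Hypothetical tables sharing the key column 'emp_id' (keys 2 and 3 overlap)
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Asha', 'Ravi', 'Meena']})
salaries = pd.DataFrame({'emp_id': [2, 3, 4], 'salary': [50000, 60000, 55000]})

for how in ['inner', 'left', 'right', 'outer']:
    result = pd.merge(employees, salaries, on='emp_id', how=how)
    print(how, sorted(result['emp_id'].tolist()))

# '_merge' holds 'left_only', 'right_only', or 'both' for each row
flagged = pd.merge(employees, salaries, on='emp_id', how='outer', indicator=True)
print(flagged)
```

Rows without a match on the other side get NaN in that side's columns (for example, salary for emp_id 1 in the left join).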
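Concatenating the Ex. 2 DataFrames side by side (axis=1) with different join options:
# join='inner' keeps only index labels present in both frames
# (df uses labels 0-3 and df1 uses 4-7, so this result is empty)
res2 = pd.concat([df, df1], axis=1, join='inner')
print(res2)

# join='outer' (the default) keeps every index label and fills the gaps with NaN
res2 = pd.concat([df, df1], axis=1, join='outer')
print(res2)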

DATA TRANSFORMATION
Data transformation is the process of converting raw data into a format that is more suitable for analysis,
modeling, or machine learning. It is an essential step in any data science project, and Python is a popular
programming language for data transformation.
There are many different types of data transformation, but some common examples include:
Cleaning and preprocessing: This involves removing errors and inconsistencies from the data, as well as
converting the data to a consistent format.
Feature engineering: This involves creating new features from the existing data, or transforming existing
features in a way that is more informative for the task at hand.
Encoding categorical data: Categorical data, such as text or labels, needs to be converted to numerical data
before it can be used by many machine learning algorithms.
Scaling and normalization: This involves transforming the data so that all features are on a similar scale, which
can improve the performance of machine learning algorithms.
There are a number of different Python libraries that can be used for data transformation, but the most
popular one is Pandas. Pandas is a powerful library for data manipulation and analysis, and it provides a wide
range of functions for data transformation.

import pandas as pd
# Load the data
df = pd.read_csv('data.csv')
# Clean the data
df = df.dropna() # Drop rows with missing values
df['age'] = df['age'].astype('int') # Convert the 'age' column to integers
# Create a new feature
df['age_group'] = df['age'].apply(lambda x: 'young' if x < 18 else 'adult')
# Encode categorical data
df['gender'] = df['gender'].map({'male': 1, 'female': 0})
# Scale the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['height', 'weight']] = scaler.fit_transform(df[['height', 'weight']])
# Save the transformed data
df.to_csv('transformed_data.csv', index=False)

1. Data Cleaning:
Data cleaning involves handling missing values, removing duplicates, and correcting errors in your dataset.
Handling Missing Values:
Pandas provides functions like dropna(), fillna(), and interpolate() to handle missing values.
import pandas as pd
# Each method returns a new DataFrame unless inplace=True is passed
# Remove rows with missing values
# Syntax: DataFrame.dropna(axis, how, thresh, subset, inplace)
df_dropped = df.dropna()
# Fill missing values with a specific value
# Syntax: DataFrame.fillna(value, method, axis, inplace, limit, downcast)
df_filled = df.fillna(0)
# Interpolate missing values
# Syntax: DataFrame.interpolate(method, axis, inplace, limit, limit_direction, limit_area, downcast, **kwargs)
df_interpolated = df.interpolate()
Removing Duplicates:
df_unique = df.drop_duplicates()
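A small sketch comparing the three missing-value strategies on the same hypothetical Series, plus drop_duplicates() on a tiny DataFrame. interpolate() uses linear interpolation by default, so each gap is filled with the midpoint of its neighbors here.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

dropped = s.dropna()            # removes the two missing entries
filled = s.fillna(0)            # replaces them with a constant
interpolated = s.interpolate()  # linear interpolation between neighboring values

# Exact duplicate rows are removed; the first occurrence is kept by default
dups = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'x', 'y']})
unique_rows = dups.drop_duplicates()
```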

2. Data Filtering:
Filtering allows you to select a subset of data based on certain conditions.
# Filter rows where a condition is met
filtered_df = df[df['column_name'] > 10]
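Conditions can be combined. This sketch, on a hypothetical product table, uses & (and) with each condition in parentheses, isin() for membership tests, and query() to write the same kind of filter as a string:

```python
import pandas as pd

df = pd.DataFrame({'product': ['pen', 'book', 'bag', 'lamp'],
                   'price': [5, 15, 40, 25],
                   'in_stock': [True, False, True, True]})

# & means "and", | means "or"; each condition needs its own parentheses
cheap_in_stock = df[(df['price'] < 30) & (df['in_stock'])]

# isin() keeps rows whose value appears in a list
selected = df[df['product'].isin(['pen', 'bag'])]

# query() expresses a filter as a string expression
expensive = df.query('price > 20')
```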
3. Data Aggregation:
Aggregation involves summarizing data by grouping it based on certain criteria.
# Group by a column and calculate aggregate statistics
grouped_df = df.groupby('category_column')['numeric_column'].mean()
4. Data Transformation:
Data transformation includes operations like converting data types, scaling values, or applying mathematical
functions.
# Convert data types
df['numeric_column'] = df['numeric_column'].astype(float)
# Scaling values (e.g., Min-Max scaling)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df['scaled_column'] = scaler.fit_transform(df[['numeric_column']])

5. One-Hot Encoding:
Convert categorical variables into a numerical format using one-hot encoding.
encoded_df = pd.get_dummies(df, columns=['categorical_column'])
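A concrete sketch with a hypothetical size column: get_dummies() creates one indicator column per category (named prefix_category) and drops the original column. In pandas 2.0 and later the indicators are boolean by default; pass dtype=int if 0/1 values are needed.

```python
import pandas as pd

df = pd.DataFrame({'size': ['S', 'M', 'L', 'M'], 'price': [10, 12, 15, 12]})

# One indicator column per category; 'size' itself is removed
encoded_df = pd.get_dummies(df, columns=['size'])
print(encoded_df)
```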
6. Reshaping Data:
Reshaping data includes tasks like pivoting, melting, or stacking/unstacking for better analysis.
# Pivot a DataFrame
pivoted_df = df.pivot(index='row_column', columns='column_column', values='value_column')
# Melt a DataFrame
melted_df = pd.melt(df, id_vars=['id_column'], value_vars=['var1', 'var2'], var_name='variable',
value_name='value')
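melt() and pivot() are inverses of each other. A sketch on a hypothetical wide table (one column per subject) melts it into long form and pivots it back:

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject
wide = pd.DataFrame({'student': ['Ann', 'Raj'], 'math': [90, 75], 'science': [85, 80]})

# Wide -> long: one row per (student, subject) pair
long_df = pd.melt(wide, id_vars=['student'], value_vars=['math', 'science'],
                  var_name='subject', value_name='score')

# Long -> wide again
back = long_df.pivot(index='student', columns='subject', values='score')
print(long_df)
print(back)
```

pivot() raises an error if an index/column pair occurs more than once; pivot_table() with an aggregation function handles that case.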

7. Text Data Processing:
For text data, you can perform transformations such as tokenization, stemming, and stop-word removal using
libraries like NLTK or spaCy.
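import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data required by recent NLTK releases
stop_words = set(stopwords.words('english'))
# Tokenization and stop-word removal
df['text_column'] = df['text_column'].apply(
    lambda x: ' '.join([word for word in word_tokenize(x) if word.lower() not in stop_words]))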

DETECTING AND FILTERING OUTLIERS IN PYTHON
What are Outliers in Python?
Before diving deep into the concept of outliers, let us understand the origin of raw data.
Raw data fed to a system is usually generated from surveys or extracted from real-time activity on the web. Such data can vary
widely, and measurement errors may occur while the data is being recorded.
An outlier is a data point, or a set of data points, that lies far away from the rest of the values in the dataset,
that is, away from the overall distribution of the data.
The detection methods in this section work on continuous (numeric) values, so they are applied mainly to numeric
features and regression-type data.
In short, outliers diverge from the overall, well-structured distribution of the data. They can be viewed as abnormal
observations that lie away from the rest of their class or population.

Why is it necessary to remove outliers from the data?
As discussed above, outliers lie away from the usual distribution of the data and have the following effects on
the overall data distribution:
They inflate the standard deviation of the data.
They distort the mean of the data.
They skew the data.
They bias the accuracy estimates of a machine learning model.
They affect the distribution and statistics of the dataset.
Detection of Outliers
The outliers in the dataset can be detected by the methods below:
• Z-score
• Scatter plots
• Interquartile range (IQR)

1. Visual Inspection:
Start by visualizing your data using histograms, box plots, scatter plots, or other visualization techniques.
Outliers often appear as points far from the main cluster or as values outside the whiskers of box plots.
Visualization can help you identify potential outliers.
import seaborn as sns
import matplotlib.pyplot as plt
# Box plot to visualize outliers
sns.boxplot(x=df['column_name'])
plt.show()
2. Z-Score:
The Z-score measures how far a data point is from the mean in terms of standard deviations. You can use the Z-score
to detect outliers. Typically, data points whose absolute Z-score exceeds a threshold (e.g., 2 or 3) are considered
outliers.
from scipy import stats
z_scores = stats.zscore(df['column_name'])
outliers = df[abs(z_scores) > 2]
filtered_data = df[abs(z_scores) <= 2]
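The same Z-score filter can be written with pandas alone. This sketch uses hypothetical measurements and the population standard deviation (ddof=0), which matches the default of scipy.stats.zscore. One caution on thresholds: with ddof=0, no point in a sample of n values can have an absolute Z-score above the square root of n - 1, so with the 7 values below a threshold of 3 could never flag anything.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value
df = pd.DataFrame({'column_name': [10, 12, 11, 13, 12, 11, 50]})

col = df['column_name']
# Population standard deviation (ddof=0), as scipy.stats.zscore uses by default
z_scores = (col - col.mean()) / col.std(ddof=0)

outliers = df[z_scores.abs() > 2]
filtered_data = df[z_scores.abs() <= 2]
print(outliers)
```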

3. IQR (Interquartile Range) Method:
The IQR method involves calculating the IQR (the difference between the 75th percentile and the 25th
percentile) and identifying outliers as values outside a specified range.
Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]
filtered_data = df[(df['column_name'] >= lower_bound) & (df['column_name'] <= upper_bound)]
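A self-contained run of the IQR method on the same kind of hypothetical data. pandas' quantile() interpolates linearly between sorted values by default, so here Q1 = 11 and Q3 = 12.5, giving bounds of 8.75 and 14.75. Series.between() is an equivalent, inclusive way to write the "keep" filter.

```python
import pandas as pd

# Hypothetical measurements with one suspicious value
df = pd.DataFrame({'column_name': [10, 12, 11, 13, 12, 11, 50]})

Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]
# between() includes both endpoints by default
filtered_data = df[df['column_name'].between(lower_bound, upper_bound)]
print(lower_bound, upper_bound)
```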

4. Tukey's Fences:
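Tukey's fences are built from the IQR. The 1.5 * IQR bounds of the IQR method are Tukey's inner fences; 3 * IQR gives the outer fences, which flag only extreme ("far out") values. The code below uses the outer fences and reuses Q1 and Q3 from the IQR method.
# Outer fences: 3 * IQR beyond the quartiles
lower_fence = Q1 - 3 * (Q3 - Q1)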
upper_fence = Q3 + 3 * (Q3 - Q1)
outliers = df[(df['column_name'] < lower_fence) | (df['column_name'] > upper_fence)]
filtered_data = df[(df['column_name'] >= lower_fence) & (df['column_name'] <= upper_fence)]

5. Machine Learning-Based Methods:
You can also use machine learning models, such as Isolation Forest or One-Class SVM, to detect outliers in
your data.
from sklearn.ensemble import IsolationForest
clf = IsolationForest(contamination=0.05, random_state=42)  # Adjust contamination based on your dataset
labels = clf.fit_predict(df[['column_name']])  # -1 = outlier, 1 = inlier
outliers = df[labels == -1]
filtered_data = df[labels == 1]

STRING MANIPULATION
String manipulation in Python involves performing various operations on strings, such as concatenation,
slicing, searching, replacing, formatting, and more. Python provides a rich set of string manipulation methods
and functions that make it easy to work with text data. Here are some common string manipulation
techniques in Python:
1. String Concatenation:
You can concatenate strings using the + operator or by using the str.join() method.
str1 = "Hello"
str2 = "World"
result = str1 + ", " + str2 # Using the + operator
words = ["Hello", "World"]
result = ", ".join(words) # Using join

2. String Slicing:
String slicing allows you to extract substrings from a string based on their positions.
text = "Python Programming"
substring = text[7:18] # Extract "Programming"
3. String Searching:
You can search for substrings within a string using methods like str.find(), str.index(), or regular expressions with
the re module.
text = "Python is a powerful programming language"
position = text.find("powerful") # Find the position of "powerful"
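The three search options behave differently when nothing is found. A short sketch: find() returns -1, index() raises ValueError, and re.search() returns None (or a match object whose group() is the matched text).

```python
import re

text = "Python is a powerful programming language"

print(text.find("powerful"))   # 12
print(text.find("Java"))       # -1: find() signals "not found" with -1
try:
    text.index("Java")         # index() raises ValueError instead
except ValueError:
    print("not found")

# Regex: a lowercase 'p', one or more word characters, then 'ful'
match = re.search(r"p\w+ful", text)
print(match.group())           # powerful
```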
4. String Replacement:
Replace specific substrings within a string using the str.replace() method.
text = "Python is a great programming language"
new_text = text.replace("great", "powerful") # Replace "great" with "powerful"

5. String Formatting:
You can format strings using f-strings (Python 3.6+), the .format() method, or the % operator.
name = "Alice"
age = 30
formatted_str = f"My name is {name} and I am {age} years old."
name = "Bob"
age = 25
formatted_str = "My name is {} and I am {} years old.".format(name, age)
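The % operator mentioned above uses printf-style placeholders such as %s (string) and %d (integer). The sketch also shows that a format specifier like .2f (two decimal places) works in all three styles:

```python
name = "Carol"
age = 28

# printf-style formatting with the % operator
old_style = "My name is %s and I am %d years old." % (name, age)

# The same specifier in each style
pi = 3.14159
print(f"{pi:.2f}")          # 3.14
print("{:.2f}".format(pi))  # 3.14
print("%.2f" % pi)          # 3.14
```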
6. String Splitting:
Split a string into a list of substrings using the str.split() method. Syntax: string.split(separator, maxsplit)
# Example 1: split by a comma
text = "Python,Java,C++,JavaScript"
languages = text.split(",")  # ['Python', 'Java', 'C++', 'JavaScript']
# Example 2: with no separator, split on whitespace
s = "Python is cool"
print(s.split())  # ['Python', 'is', 'cool']
# Example 3: split on the character 'c'
s = "abcabc"
print(s.split('c'))  # ['ab', 'ab', '']
# Example 4: split the contents of a file into lines
with open("sample.txt", "r") as f:
    info = f.read()
print(info.splitlines())
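The second argument of split() (named maxsplit in Python) caps the number of splits, which is handy when only the first few fields are delimited. A sketch on a hypothetical log line, plus rsplit(), which counts splits from the right:

```python
line = "2024-01-15 ERROR disk full on /dev/sda1"

# At most 2 splits: the remainder of the line stays in one piece
date, level, message = line.split(" ", 2)

# rsplit() with maxsplit=1 splits only at the last separator
head, last = "a.b.c".rsplit(".", 1)
```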

7. String Stripping:
Remove leading and trailing whitespace characters using str.strip(), or from one side only with str.lstrip() (left) and str.rstrip() (right).
text = " Python is awesome! "
cleaned_text = text.strip() # Remove leading and trailing spaces
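A sketch of the one-sided variants, and of passing an argument: strip() then removes any of the listed characters (not a substring) from both ends.

```python
text = "  Python is awesome!  "

left = text.lstrip()    # removes leading whitespace only
right = text.rstrip()   # removes trailing whitespace only

# Any mix of '-' and '=' is removed from both ends
framed = "--==Python==--".strip("-=")
```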
8. String Case Conversion:
Convert the case of a string using methods like str.lower(), str.upper(), or str.capitalize().
text = "Hello World"
lower_case = text.lower()
upper_case = text.upper()
capitalized = text.capitalize()

Syntax                      Function
string.upper()              Converts all characters of the string to uppercase.
string.lower()              Converts all characters of the string to lowercase.
string.title()              Converts the first letter of each word to uppercase and the rest to lowercase.
string.swapcase()           Converts uppercase characters to lowercase and vice versa.
string.capitalize()         Converts the first character of the string to uppercase and the rest to lowercase.
string.isupper()            Returns True if all cased characters are uppercase (and there is at least one).
string.islower()            Returns True if all cased characters are lowercase (and there is at least one).
string.endswith(value)      Returns True if the string ends with the specified value.
string.startswith(value)    Returns True if the string starts with the specified value.
string.index(substring)     Returns the position of the first occurrence; raises ValueError if not found.
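Running the methods from the table on one string makes the differences visible, in particular title() versus capitalize(), and that startswith()/endswith() are case-sensitive:

```python
s = "hello WORLD"

print(s.upper())        # HELLO WORLD
print(s.lower())        # hello world
print(s.title())        # Hello World
print(s.swapcase())     # HELLO world
print(s.capitalize())   # Hello world  (the rest is lowercased)
print(s.isupper(), "ABC 123".isupper())            # False True
print(s.startswith("hello"), s.endswith("world"))  # True False
print(s.index("o"))     # 4 (first occurrence)
```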

VECTORIZED STRING FUNCTIONS
Vectorized string functions in pandas allow you to efficiently perform operations on string data within a
pandas DataFrame or Series. These functions are accessed through the .str attribute of a pandas Series and
enable you to apply string operations element-wise. Here are some commonly used vectorized string
functions in pandas:
import pandas as pd
series = pd.Series(['Alice', 'Bob', 'Carol'])
series_lowercase = series.str.lower()
print(series_lowercase)
Output:
0 alice
1 bob
2 carol
dtype: object

Some of the most commonly used vectorized string functions in Pandas include:
str.lower(): Convert all strings to lowercase.
str.upper(): Convert all strings to uppercase.
str.strip(): Remove whitespace from the beginning and end of all strings.
str.split(): Split all strings into a list of strings, using a specified separator.
str.replace(): Replace all occurrences of a specified substring with another substring in all strings.
str.contains(): Return a boolean Series indicating whether each string contains a specified substring.
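A sketch applying several of the listed functions to a hypothetical Series of names. Each call works element-wise and returns a new Series. Passing regex=False to str.replace() makes the pattern a literal string.

```python
import pandas as pd

names = pd.Series(['  Alice Smith', 'bob jones ', 'Carol White'])

cleaned = names.str.strip()                          # trim both ends of every string
upper = cleaned.str.upper()                          # element-wise uppercase
parts = cleaned.str.split()                          # list of words per element
has_o = cleaned.str.contains('o', case=False)        # boolean Series
dashed = cleaned.str.replace(' ', '-', regex=False)  # literal replacement
print(has_o)
```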
Vectorized string functions can also perform more complex operations, such as regular expression matching and
extraction. For example, the following code uses str.extract() to pull the part before "@" (here, the first name) out
of each email address in a Series. Because the pattern has a named group, the result is a DataFrame with a
first_name column:
import pandas as pd
series = pd.Series(['alice@example.com', 'bob@example.com', 'carol@example.com'])
first_names = series.str.extract(r'(?P<first_name>\w+)@example\.com')
print(first_names)
Output:
  first_name
0      alice
1        bob
2      carol

PLOTTING AND VISUALIZATION
Plotting and visualization are crucial for understanding and communicating data. In Python, there are several
libraries for creating plots and visualizations, with Matplotlib, Seaborn, and Plotly being some of the most
popular ones. Here's an overview of how to create plots and visualizations in Python:
Matplotlib
Matplotlib is an easy-to-use, low-level data visualization library that is built on NumPy arrays. It supports
various plot types such as scatter plots, line plots, and histograms, and offers a lot of flexibility.
pip install matplotlib
Basic Line Plot:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()

Seaborn:
Seaborn is built on top of Matplotlib and provides a high-level interface for creating informative and attractive
statistical graphics.
Installation:
pip install seaborn
Basic Scatter Plot:
import seaborn as sns
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
sns.scatterplot(x=x, y=y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()

Plotly:
Plotly is a powerful library for creating interactive and web-based visualizations. It is often used for creating
dashboards and web applications.
Installation:
pip install plotly
Basic Bar Chart:
import plotly.express as px
data = {'Category': ['A', 'B', 'C', 'D'],
'Values': [10, 20, 15, 30]}
fig = px.bar(data, x='Category', y='Values', title='Bar Chart')
fig.show()

Other Libraries:
Pandas: Pandas also provides basic plotting functionality through the .plot() method for DataFrames, making it
convenient for quick exploratory data analysis.
Bokeh: Bokeh is another library for interactive web-based visualizations and is well-suited for creating
interactive dashboards.
Altair: Altair is a declarative statistical visualization library for Python, making it easy to create complex
visualizations with concise code.
ggplot (ggpy): ggplot is a Python implementation of the popular ggplot2 package from R, which uses a
grammar of graphics to create plots.

MATPLOTLIB API PRIMER
Matplotlib is a popular Python library for creating static, animated, and interactive visualizations. When
creating plots with Matplotlib, you can customize the appearance of your lines, markers, and line styles. Here's
a primer on how to do that:
Colors:
Matplotlib allows you to specify colors for lines, markers, and other plot elements in several ways:
Named Colors: You can use named colors like 'red', 'blue', 'green', etc.
plt.plot(x, y, color='red', label='Red Line')
RGB Values: You can use RGB tuples to specify colors.
plt.plot(x, y, color=(0.1, 0.2, 0.3), label='Custom Color')
Hexadecimal Colors: You can also use hexadecimal color codes.
plt.plot(x, y, color='#FF5733', label='Hex Color')

Markers:
Markers are used to indicate specific data points on a plot. You can customize markers in Matplotlib:
plt.plot(x, y, marker='o', markersize=8, markerfacecolor='yellow', markeredgecolor='black', label='Custom Marker')
marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares, 'x' for crosses).
markersize: Sets the size of the marker.
markerfacecolor: Sets the marker's fill color.
markeredgecolor: Sets the marker's edge color.

Line Styles:
You can customize the line style of your plot:
plt.plot(x, y, linestyle='--', linewidth=2, label='Dashed Line')
linestyle: Specifies the line style (e.g., '-', '--', '-.', ':').
linewidth: Sets the width of the line.
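A small sketch that combines the color, marker, and line-style options and reads them back from the Line2D object that plt.plot() returns. It assumes a non-interactive environment, so it selects the 'Agg' backend, which renders without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# plt.plot() returns a list of Line2D objects, one per plotted line
(line,) = plt.plot(x, y, color='#FF5733', marker='s', markersize=8,
                   linestyle=':', linewidth=2, label='Styled Line')
plt.legend()

print(line.get_color(), line.get_marker(), line.get_linestyle(), line.get_linewidth())
plt.close()
```

Keeping a reference to the returned line also lets you restyle it later, for example with line.set_linestyle('--').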

Line Styles:
You can customize the appearance of lines in your plot using various line styles, markers, and colors. Here's an
example:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
# Customizing line style with color, marker, and linestyle
plt.plot(x, y, color='blue', marker='o', linestyle='--', markersize=8, label='Custom Line')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Custom Line Style')
plt.legend()
plt.show()
In the above example:
color: Sets the line color.
marker: Specifies the marker style (e.g., 'o' for circles, 's' for
squares).
linestyle: Sets the line style ('--' for dashed, ':' for dotted,
etc.).
markersize: Adjusts the size of markers.

Ticks and Labels:
You can customize tick locations and labels on the x and y axes using xticks() and yticks() functions:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Line Plot')
# Customizing x-axis ticks and labels
plt.xticks([1, 2, 3, 4, 5], ['A', 'B', 'C', 'D', 'E'])
# Customizing y-axis ticks and labels
plt.yticks([2, 4, 6, 8, 10], ['Low', 'Medium', 'High', 'Very High', 'Max'])
plt.title('Custom Ticks and Labels')
plt.legend()
plt.show()
In this example:
xticks() and yticks() specify the locations and labels for the
ticks on the x and y axes, respectively.

Legends:
To add legends to your plot, you can use the legend() function. You should also label your plotted lines or data
points using the label parameter when creating the plot.
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.title('Legend Example')
plt.legend()
plt.show()
In this example:
label is provided when creating each line, and legend() is
called to display the legend.

ANNOTATIONS AND DRAWING ON A SUBPLOT
Annotations and drawing on a subplot in Matplotlib allow you to add textual or graphical elements to your
plots to provide additional information or highlight specific points of interest. Here's how you can add
annotations and draw on a subplot:

Adding Text Annotations:
You can add text annotations to your plot using the text()
function. Here's an example:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Data')  # Plot the data so legend() has a labeled line to show
# Adding a text annotation
plt.text(3, 7, 'Annotation Here', fontsize=12, color='red')
plt.title('Text Annotation Example')
plt.legend()
plt.show()
In this example, plt.text(x, y, text, fontsize, color) is used to
add a text annotation at coordinates (3, 7) with the
specified text, fontsize, and color.

Adding Arrows with Annotations:
You can add arrows to point to specific locations on your plot using the annotate() function. Here's an example:
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Data')  # Plot the data so legend() has a labeled line to show
# Adding an arrow with annotation
plt.annotate('Important Point', xy=(3, 6), xytext=(4, 8), fontsize=12,
arrowprops=dict(arrowstyle='->', color='blue'))
plt.title('Arrow Annotation Example')
plt.legend()
plt.show()
In this example, plt.annotate(text, xy, xytext, fontsize, arrowprops) is used to add an arrow with text annotation. xy
specifies the point being pointed to, and xytext specifies the location of the text.

Drawing Shapes:
You can draw various shapes, lines, and polygons on a subplot using Matplotlib's plotting functions. For
example, to draw a rectangle:
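import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Create a subplot
fig, ax = plt.subplots()
# Add a rectangle: lower-left corner (1, 2), width 2, height 4
rectangle = patches.Rectangle((1, 2), 2, 4, linewidth=2, edgecolor='red', facecolor='none')
ax.add_patch(rectangle)
# add_patch() does not rescale the view, so set limits that contain the rectangle
ax.set_xlim(0, 5)
ax.set_ylim(0, 8)
plt.title('Drawing Shapes Example')
plt.show()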
In this example, we create a subplot, add a rectangle using patches.Rectangle(), and then add it to the plot
with ax.add_patch().

SAVING PLOTS TO FILE
You can save plots created with Matplotlib to various file formats such as PNG, PDF, SVG, and more using the
savefig() function. Here's how to save a plot to a file:
# Create and customize your plot
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Data')  # Plot the data so legend() has a labeled line to show
plt.title('Saved Plot Example')
plt.legend()
# Save the plot to a file (e.g., PNG)
plt.savefig('saved_plot.png', dpi=300)

In this example, plt.savefig('saved_plot.png', dpi=300) saves the current plot to a PNG file named
"saved_plot.png" with a resolution of 300 dots per inch (dpi). You can specify the file format by changing the
file extension (e.g., ".pdf" for PDF, ".svg" for SVG).
Common parameters for savefig():
fname: The file name and path where the plot will be saved.
dpi: The resolution in dots per inch (defaults to the figure's dpi, which is 100 unless changed).
format: The file format (e.g., 'png', 'pdf', 'svg'); if omitted, it is inferred from the file extension.
bbox_inches: Which part of the figure to save. 'tight' trims extra whitespace around the plot; the default (None) saves the full figure area.
transparent: If True, the plot will have a transparent background.
orientation: For PostScript output, 'portrait' or 'landscape'.
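savefig() also accepts a file-like object instead of a path. This sketch assumes you want the image bytes in memory (for example, to embed in a web response) rather than on disk; with no file extension to infer from, format= tells Matplotlib what to write. It uses the non-interactive 'Agg' backend.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without opening a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], label='Data')
ax.legend()

# An in-memory buffer stands in for a file path
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=150, bbox_inches='tight', transparent=True)
plt.close(fig)

png_bytes = buffer.getvalue()
print(png_bytes[:8])  # every PNG file starts with this 8-byte signature
```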

PLOTTING FUNCTIONS IN PANDAS.
Pandas provides a number of plotting functions that can be used to create a variety of different plots and
visualizations. These functions are simple to use and can be used to create plots with just a few lines of code.
Series and DataFrame objects have a built-in plot() method (Matplotlib must be installed), so no separate import is
needed; the pandas.plotting module only holds extra helpers such as scatter_matrix(). The plot() method takes a
number of keyword arguments, which can be used to control the appearance of the plot.
Here is a simple example of how to use the Pandas plot() method to create a line chart:
import pandas as pd
import matplotlib.pyplot as plt
# Create a Series
series = pd.Series([2, 4, 6, 8, 10])
# Create a line chart of the Series
series.plot()
# Show the plot
plt.show()

Line Plot:
To create a line plot of a Series or DataFrame,
simply call the .plot() method on the data:
import pandas as pd
# Create a DataFrame
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create a line plot
df['y'].plot()
plt.title('Line Plot')
plt.show()
Scatter Plot:
Scatter plots are created similarly, but you specify
kind='scatter':
df.plot(kind='scatter', x='x', y='y')
plt.title('Scatter Plot')
plt.show()

Bar Plot:
To create a bar plot, you can use kind='bar':
df.plot(kind='bar', x='x', y='y')
plt.title('Bar Plot')
plt.show()
Histogram:
For histograms, use kind='hist':
df['y'].plot(kind='hist', bins=5)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
Customization:
You can further customize your plots by calling Matplotlib functions after .plot(). For more control over the layout
of multiple plots, create a grid of subplots with plt.subplots() and pass each Axes to .plot() through its ax parameter.
fig, axes = plt.subplots(nrows=2, ncols=2)
df['y'].plot(ax=axes[0, 0], title='Line Plot')
df.plot(kind='scatter', x='x', y='y', ax=axes[0, 1], title='Scatter Plot')
df.plot(kind='bar', x='x', y='y', ax=axes[1, 0], title='Bar Plot')
df['y'].plot(kind='hist', bins=5, ax=axes[1, 1], title='Histogram')
plt.tight_layout()  # Adjust subplot spacing automatically so titles and labels do not overlap
plt.show()
