
Experiment 6.

Perform the following operations using pandas


a. Creating a DataFrame
b. concat()
c. Setting conditions
d. Adding a new column

Pandas: A Powerful Data Analysis Library in Python

Pandas is a Python library used for data manipulation, analysis, and cleaning. It provides two
primary data structures:

 Series (1D labeled array)


 DataFrame (2D table similar to an Excel spreadsheet)

Installing and Importing Pandas

First, install pandas using the command below:

pip install pandas

Now, import pandas:

import pandas as pd

1. import pandas → This imports the pandas library, which is used for data analysis and
manipulation.
2. as pd → This assigns a short alias (pd) to pandas, so we can refer to it as pd instead of
writing pandas every time.

a. Creating DataFrame

A DataFrame is the core structure in pandas, similar to a table in SQL or Excel.

First, we create a basic DataFrame from a dictionary:

import pandas as pd

# Creating a simple DataFrame


data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}

df = pd.DataFrame(data)
print(df)

Output:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
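A DataFrame can also be built from a list of row dictionaries, optionally with a
custom index. A minimal sketch (the 'emp1'/'emp2' index labels are made up for
illustration, not part of the experiment):

# Same idea, built row by row, with custom index labels
rows = [
    {'Name': 'Alice', 'Age': 25, 'Salary': 50000},
    {'Name': 'Bob', 'Age': 30, 'Salary': 60000},
]
df_rows = pd.DataFrame(rows, index=['emp1', 'emp2'])
print(df_rows)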

b. Using concat()

The concat() function is used to concatenate DataFrames along a particular axis (rows or
columns). Here, we concatenate two DataFrames along the rows (axis=0):

# Creating a simple DataFrame


data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
}

df = pd.DataFrame(data)

# Creating another DataFrame
data2 = {
    'Name': ['Eve', 'Frank'],
    'Age': [45, 50],
    'Salary': [90000, 100000]
}

df2 = pd.DataFrame(data2)

# Concatenating along rows (axis=0)


df_concat = pd.concat([df, df2], axis=0, ignore_index=True)
print(df_concat)

Output:

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
4      Eve   45   90000
5    Frank   50  100000
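concat() can also place DataFrames side by side with axis=1, aligning rows by
index. A hedged sketch, assuming the df from above and a made-up df_dept holding
one extra column:

# Hypothetical extra column to attach alongside df (same row index assumed)
df_dept = pd.DataFrame({'Department': ['HR', 'IT', 'Finance', 'Sales']})

# Concatenating along columns (axis=1) joins on the row index
df_wide = pd.concat([df, df_dept], axis=1)
print(df_wide)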

c. Setting Conditions

You can filter data in a DataFrame based on conditions. For example, selecting people who have
a salary greater than 70,000:

# Applying condition (Salary > 70000)


condition = df_concat[df_concat['Salary'] > 70000]
print(condition)

Output:

    Name  Age  Salary
3  David   40   80000
4    Eve   45   90000
5  Frank   50  100000
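Conditions can also be combined with & (and) and | (or); each condition must be
wrapped in parentheses. A short sketch, assuming the df_concat from above (the
threshold values are arbitrary):

# People older than 40 who also earn more than 70000
filtered = df_concat[(df_concat['Salary'] > 70000) & (df_concat['Age'] > 40)]
print(filtered)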

d. Adding a New Column

To add a new column based on a condition or calculation, we can add a 'Bonus'
column that is 10% of the salary:

# Adding a new column 'Bonus'


df_concat['Bonus'] = df_concat['Salary'] * 0.1
print(df_concat)

Output:

      Name  Age  Salary    Bonus
0    Alice   25   50000   5000.0
1      Bob   30   60000   6000.0
2  Charlie   35   70000   7000.0
3    David   40   80000   8000.0
4      Eve   45   90000   9000.0
5    Frank   50  100000  10000.0

Here, the Bonus column is calculated as 10% of the Salary column for each person.
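Since the new column can also depend on a condition, here is a sketch using
numpy.where() with made-up 'Senior'/'Junior' labels (not part of the original
experiment):

import numpy as np

# 'Senior' where Age >= 40, otherwise 'Junior'
df_concat['Level'] = np.where(df_concat['Age'] >= 40, 'Senior', 'Junior')
print(df_concat)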

Experiment 7.

Perform the following operations using pandas


a. Filling NaN with string
b. Sorting based on column values
c. groupby()

a. Filling NaN with String

You can fill missing values (NaN) in a DataFrame using the fillna() method. For
instance, if a column contains NaN and we want to replace it with a specific string
like 'Unknown':

import pandas as pd
import numpy as np

# Creating a DataFrame with NaN values


data = {
    'Name': ['Alice', 'Bob', 'Charlie', np.nan],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, np.nan, 70000, 80000]
}

df = pd.DataFrame(data)

# Filling NaN values with a string


df_filled = df.fillna('Unknown')
print(df_filled)
Output:

      Name  Age   Salary
0    Alice   25  50000.0
1      Bob   30  Unknown
2  Charlie   35  70000.0
3  Unknown   40  80000.0

The NaN values in both the Name and Salary columns are filled with the string 'Unknown'.
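fillna() also accepts a dict, so each column can get its own replacement value,
which lets numeric columns keep a numeric fill. A minimal sketch, assuming the df
above (the fill values are arbitrary choices):

# Per-column fills: 'Unknown' for names, 0 for missing salaries
df_filled2 = df.fillna({'Name': 'Unknown', 'Salary': 0})
print(df_filled2)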

b. Sorting Based on Column Values

You can sort a DataFrame based on the values of one or more columns using the sort_values()
method. Below is an example of sorting the DataFrame based on the Age column in ascending
order:

# Sorting by the 'Age' column in ascending order


df_sorted = df.sort_values(by='Age', ascending=True)
print(df_sorted)

Output:

      Name  Age   Salary
0    Alice   25  50000.0
1      Bob   30      NaN
2  Charlie   35  70000.0
3      NaN   40  80000.0

Here, the DataFrame is sorted by the Age column, starting from the smallest age
(the original df is used, so its NaN values are still present).
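sort_values() can also sort by several columns at once, with a separate direction
for each. A sketch on the same df:

# Age descending, then Name ascending as a tie-breaker
df_sorted2 = df.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(df_sorted2)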

c. Using groupby()

The groupby() method in pandas allows you to group rows based on a column and perform
some aggregation. For example, let's group the DataFrame by the Age column and calculate the
average Salary for each age group:

# Grouping by 'Age' and calculating the average Salary

df_grouped = df.groupby('Age')['Salary'].mean().reset_index()
print(df_grouped)

Explanation:

 The groupby('Age') groups the data based on unique values in the Age column.
 We then use .mean() to compute the average salary for each age group.
 The reset_index() is used to convert the resulting grouped data back into a regular
DataFrame.

Output:
   Age   Salary
0   25  50000.0
1   30      NaN
2   35  70000.0
3   40  80000.0
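groupby() is not limited to a single aggregation; .agg() can compute several at
once. A sketch on the same df:

# Mean, minimum, and maximum salary per age group in one call
df_stats = df.groupby('Age')['Salary'].agg(['mean', 'min', 'max']).reset_index()
print(df_stats)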

Experiment 8.

Read the following file formats using pandas


a. Text files
b. CSV files
c. Excel files
d. JSON files

a. Reading Text Files

You can read plain text files in pandas using the read_csv() function by specifying
the delimiter (e.g., space, tab, etc.). Here is an example for reading a text file
that uses spaces or tabs as delimiters.

Example:

import pandas as pd

# Read a text file (assuming it has space or tab-separated data)


df_text = pd.read_csv('file.txt', delimiter=' ')  # or use '\t' for tab-delimited files
print(df_text)

Explanation:

 read_csv() is versatile and can read text files as long as we specify the correct delimiter.
 You can replace ' ' with the actual delimiter used in your text file.
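For files with irregular spacing, a regular-expression separator is more robust. A
hedged sketch ('file.txt' and the column names are assumptions for illustration):

# r'\s+' matches any run of whitespace; header=None plus names= supplies
# column names when the file has no header row
df_ws = pd.read_csv('file.txt', sep=r'\s+', header=None,
                    names=['Name', 'Age', 'Salary'])
print(df_ws)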
b. Reading CSV Files

CSV files are very common, and pandas makes it very easy to read them using the read_csv()
function.

Example:

# Read a CSV file


df_csv = pd.read_csv('file.csv')
print(df_csv)

Explanation:

 read_csv() reads the file and automatically handles comma-separated data.


 You can pass extra arguments like header, index_col, etc., if needed.

Output:

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
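To illustrate the extra arguments mentioned above, a hedged sketch (the file name,
column names, and row limit are assumptions):

# index_col turns a column into the row index, usecols limits which columns
# are read, and nrows caps how many rows are loaded
df_part = pd.read_csv('file.csv', index_col='Name',
                      usecols=['Name', 'Salary'], nrows=100)
print(df_part)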

c. Reading Excel Files

For reading Excel files, you can use the read_excel() function. You may need to
install openpyxl (for .xlsx files) or xlrd (for legacy .xls files), depending on
the file format.

Example:

# Read an Excel file


df_excel = pd.read_excel('file.xlsx', sheet_name='Sheet1')
print(df_excel)

Explanation:

 read_excel() reads an Excel file.


 sheet_name allows you to specify which sheet to load (if there are multiple sheets).

Output:

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
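Passing sheet_name=None loads every sheet at once into a dict of DataFrames keyed
by sheet name. A short sketch, assuming 'file.xlsx' exists:

# Read all sheets; iterate to see each sheet's name and shape
all_sheets = pd.read_excel('file.xlsx', sheet_name=None)
for name, sheet_df in all_sheets.items():
    print(name, sheet_df.shape)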

d. Reading JSON Files

For JSON files, you can use the read_json() function. JSON files are commonly used for
hierarchical or nested data.
Example:

# Read a JSON file


df_json = pd.read_json('file.json')
print(df_json)

Explanation:

 read_json() loads data from JSON files into a pandas DataFrame.
 If the JSON is a simple list of records, pandas converts it directly into tabular
format; deeply nested JSON may need pd.json_normalize() (see the sketch after the
output below).

Output (example):

Contents of file.json:

[
{"Name": "Alice", "Age": 25, "Salary": 50000},
{"Name": "Bob", "Age": 30, "Salary": 60000},
{"Name": "Charlie", "Age": 35, "Salary": 70000}
]

Resulting DataFrame:

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
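For genuinely nested JSON, pd.json_normalize() flattens inner objects into dotted
column names. A minimal sketch with a made-up nested record:

# One record with a nested 'contact' object (illustrative data)
records = [{'Name': 'Alice', 'contact': {'city': 'Pune', 'phone': '123'}}]
df_nested = pd.json_normalize(records)
print(df_nested)  # columns: Name, contact.city, contact.phone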

Experiment 9.

Read the following file formats


a. Pickle files
b. Image files using PIL
c. Multiple files using Glob
d. Importing data from database

a. Reading Pickle Files

Pickle files are used to serialize Python objects. You can load them back into memory using the
read_pickle() function in pandas.

Example:

import pandas as pd

# Read a Pickle file


df_pickle = pd.read_pickle('file.pkl')
print(df_pickle)

Explanation:

 Pickle files save data in a binary format and pandas can read them directly with
read_pickle().
 This is useful when you want to save the state of a DataFrame (or other Python objects)
and load it back later.
Output:

# Example output (depending on the content of the pickle file)

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
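A round trip shows the typical workflow: to_pickle() saves a DataFrame and
read_pickle() restores it exactly. A sketch (note that pickle can execute arbitrary
code, so only unpickle files you trust):

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df.to_pickle('file.pkl')              # serialize to disk
df_back = pd.read_pickle('file.pkl')  # restore the object
print(df.equals(df_back))             # True: identical contents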

b. Reading Image Files using PIL

To read and work with image files, you can use Pillow, the actively maintained fork
of PIL (the Python Imaging Library). It is installed as Pillow (pip install Pillow)
but imported under the name PIL.

Example:

from PIL import Image

# Open an image file


img = Image.open('image.jpg')
img.show() # This will display the image

Explanation:

 Image.open() opens the image file.


 The show() method displays the image (you can also save or manipulate the image as
needed).
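Beyond displaying, Pillow supports common manipulations. A hedged sketch
('image.jpg' is assumed to exist):

from PIL import Image

img = Image.open('image.jpg')
print(img.size, img.mode)        # e.g. (800, 600) and 'RGB'
small = img.resize((200, 150))   # resize to a new (width, height)
gray = img.convert('L')          # convert to grayscale
gray.save('image_gray.png')      # save under another name/format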

c. Reading Multiple Files using Glob

You can use the glob module to match file names against patterns (such as *.csv,
*.txt). This allows you to work with multiple files at once.

Example:

import glob
import pandas as pd

# Using glob to get all CSV files in the directory


files = glob.glob('*.csv')

# Reading all files into a list of DataFrames


dfs = [pd.read_csv(file) for file in files]

# Concatenating all DataFrames into one


df_combined = pd.concat(dfs, ignore_index=True)
print(df_combined)

Explanation:

 glob.glob('*.csv') retrieves all CSV files in the current directory.


 We loop through the file paths, read each one using pd.read_csv(), and store the
DataFrames in a list.
 pd.concat() is used to combine all DataFrames into a single DataFrame.
Output (example):

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
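A useful variation on the loop above is to tag each row with the file it came from,
which helps when debugging the combined data. A sketch (the 'source_file' column
name is made up):

import glob
import pandas as pd

dfs = []
for file in glob.glob('*.csv'):
    part = pd.read_csv(file)
    part['source_file'] = file   # remember where each row originated
    dfs.append(part)
df_combined = pd.concat(dfs, ignore_index=True)
print(df_combined)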

d. Importing Data from a Database

To import data from a database (such as SQLite, MySQL, etc.), you can use pandas along with a
database connector. The example here uses SQLite with sqlite3.

Example:

import sqlite3
import pandas as pd

# Create a connection to the SQLite database


conn = sqlite3.connect('database.db')

# Query to select data


query = "SELECT * FROM employees"

# Import data into a pandas DataFrame


df_db = pd.read_sql_query(query, conn)
print(df_db)

# Close the database connection


conn.close()

Explanation:

 sqlite3.connect() establishes a connection to the SQLite database (for other
databases, you'd use the corresponding connector, like mysql.connector for MySQL).
 pd.read_sql_query() runs the SQL query and loads the results into a DataFrame.

Output (example):

   ID     Name  Age Department  Salary
0   1    Alice   25         HR   50000
1   2      Bob   30         IT   60000
2   3  Charlie   35    Finance   70000

For databases like MySQL, you can use pymysql.connect() or similar connectors and follow a
similar process.
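When the query depends on a user-supplied value, pass it through the params
argument instead of formatting it into the SQL string; this avoids SQL injection. A
sketch following the employees table above ('?' is SQLite's placeholder style):

import sqlite3
import pandas as pd

conn = sqlite3.connect('database.db')
# The value ('IT',) is bound safely to the '?' placeholder
df_it = pd.read_sql_query(
    "SELECT * FROM employees WHERE Department = ?",
    conn,
    params=('IT',),
)
conn.close()
print(df_it)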
Experiment 10.

Demonstrate web scraping using Python

Web scraping in Python can be done using libraries like requests for making HTTP requests,
and BeautifulSoup (from the bs4 library) for parsing and extracting data from HTML content.
Below is a simple demonstration of web scraping.

Steps:

1. Install the required libraries: If you don't have requests and beautifulsoup4
installed, you can install them using pip:

pip install requests beautifulsoup4

2. Web Scraping Process:
   o Send an HTTP request to the website.
   o Parse the HTML content using BeautifulSoup.
   o Extract the relevant data (e.g., text, links, tables).

Example: Scraping Quotes from a Website

Let's demonstrate web scraping by extracting quotes from the sample website
http://quotes.toscrape.com/ (a site built for scraping practice).

1. Sending a Request and Parsing the HTML

import requests
from bs4 import BeautifulSoup

# Step 1: Send a request to the website


url = 'http://quotes.toscrape.com/'
response = requests.get(url)

# Step 2: Parse the HTML content using BeautifulSoup


soup = BeautifulSoup(response.text, 'html.parser')

# Step 3: Extracting data: In this case, we extract all quotes from the page
quotes = soup.find_all('span', class_='text')

# Step 4: Print the quotes


for quote in quotes:
    print(quote.text)

Explanation:

 requests.get(url) sends an HTTP request to the specified URL and retrieves the
webpage's content.
 BeautifulSoup(response.text, 'html.parser') parses the HTML content.
 soup.find_all('span', class_='text') finds all <span> tags with the class 'text',
which contain the quotes.
 Finally, we loop through the quotes and print them.

Output:
“The world as we have created it is a process of our thinking. It cannot be
changed without changing our thinking.”
“Life is what happens when you're busy making other plans.”
“It is our choices that show what we truly are, far more than our abilities.”
“Never let the fear of striking out keep you from playing the game.”
“You have within you right now, everything you need to deal with whatever the
world can throw at you.”
“The person, be it gentleman or lady, who has not pleasure in a good novel,
must be intolerably stupid.”

2. Extracting Additional Information (e.g., Author and Tags)

You can also extract other information such as the author of each quote and tags associated with
it. Here’s how you can extend the previous example:

# Extracting authors
authors = soup.find_all('small', class_='author')

# Extracting tags
tags = soup.find_all('div', class_='tags')

# Step 4: Print quotes, authors, and tags


for quote, author, tag in zip(quotes, authors, tags):
    print(f'Quote: {quote.text}')
    print(f'Author: {author.text}')
    print(f'Tags: {[t.text for t in tag.find_all("a")]}\n')

Explanation:

 soup.find_all('small', class_='author') finds all <small> tags with the class
'author', which contain the authors of the quotes.
 soup.find_all('div', class_='tags') finds all <div> tags with the class 'tags',
which contain the tags related to each quote.
 We then loop through the quotes, authors, and tags, printing each set of
information.

Output:

Quote: “The world as we have created it is a process of our thinking. It
cannot be changed without changing our thinking.”
Author: Albert Einstein
Tags: ['change', 'deep-thoughts', 'thinking', 'world']

Quote: “Life is what happens when you're busy making other plans.”
Author: John Lennon
Tags: ['life', 'adulthood', 'quotes']

Quote: “It is our choices that show what we truly are, far more than our
abilities.”
Author: J.K. Rowling
Tags: ['abilities', 'choices']

Quote: “Never let the fear of striking out keep you from playing the game.”
Author: Babe Ruth
Tags: ['sports', 'fear', 'inspirational']

Quote: “You have within you right now, everything you need to deal with
whatever the world can throw at you.”
Author: Brian Tracy
Tags: ['self-confidence', 'inspirational', 'world']
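The page scraped above is only the first of several; the site exposes a "Next" link
that can be followed until it disappears. A hedged sketch of paginated scraping
(the 'li.next > a' selector matches quotes.toscrape.com's markup at the time of
writing):

import requests
from bs4 import BeautifulSoup

base = 'http://quotes.toscrape.com'
url = base + '/'
all_quotes = []
while url:
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    all_quotes += [q.text for q in soup.find_all('span', class_='text')]
    next_link = soup.select_one('li.next > a')   # None on the last page
    url = base + next_link['href'] if next_link else None
print(len(all_quotes))   # total quotes collected across all pages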
