Experiments 6, 7, 8, 9, 10
Pandas is a Python library used for data manipulation, analysis, and cleaning. It provides two
primary data structures: Series (a one-dimensional labeled array) and DataFrame (a two-dimensional
labeled table).
import pandas as pd
1. import pandas → This imports the pandas library, which is used for data analysis and
manipulation.
2. as pd → This assigns a short alias (pd) to pandas, so we can refer to it as pd instead of
writing pandas every time.
a. Creating DataFrame
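import pandas as pd
# Sample data; each key becomes a column
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40], 'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)
print(df)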
Output:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
b. Using concat()
The concat() function is used to concatenate DataFrames along a particular axis (rows or
columns). Here, we concatenate two DataFrames along the rows (axis=0):
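df = pd.DataFrame(data)  # the DataFrame created in part a
# Creating another DataFrame
data2 = {
    'Name': ['Eve', 'Frank'],
    'Age': [45, 50],
    'Salary': [90000, 100000]
}
df2 = pd.DataFrame(data2)
# Concatenate along the rows; ignore_index=True renumbers the rows from 0 to 5
df_concat = pd.concat([df, df2], ignore_index=True)
print(df_concat)
Output:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
4      Eve   45   90000
5    Frank   50  100000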
c. Setting Conditions
You can filter data in a DataFrame based on conditions. For example, selecting people who have
a salary greater than 70,000:
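A minimal sketch of this filter, assuming it is applied to the combined six-row DataFrame from part b. The variable name high_earners is only illustrative.

```python
import pandas as pd

# Sample records matching the outputs shown in this experiment
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
    'Age': [25, 30, 35, 40, 45, 50],
    'Salary': [50000, 60000, 70000, 80000, 90000, 100000],
})

# df['Salary'] > 70000 builds a boolean Series; indexing with it keeps the True rows
high_earners = df[df['Salary'] > 70000]
print(high_earners)
```

Several conditions can be combined with & (and) or | (or), with each condition wrapped in parentheses.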
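A new column can be derived from an existing one. For example, adding a Bonus column to the
combined DataFrame gives the following.
Output: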
      Name  Age  Salary    Bonus
0    Alice   25   50000   5000.0
1      Bob   30   60000   6000.0
2  Charlie   35   70000   7000.0
3    David   40   80000   8000.0
4      Eve   45   90000   9000.0
5    Frank   50  100000  10000.0
Here, the Bonus column is calculated as 10% of the Salary column for each person.
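The calculation described above can be sketched as follows. The three sample rows are a shortened stand-in for the full table, and the 0.10 factor is simply the 10% mentioned in the text.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, 60000, 70000],
})

# Arithmetic on a column is applied element by element; the result is stored as 'Bonus'
df['Bonus'] = df['Salary'] * 0.10
print(df)
```

Because the multiplication involves a float, the Bonus column is stored as floating point, which is why the values in the output carry a decimal point.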
Experiment 7.
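You can fill missing values (NaN) in a DataFrame using the fillna() method. For instance, if
a DataFrame contains NaN values and we want to replace them with a specific string like 'Unknown':
import pandas as pd
import numpy as np
# data is assumed to be a dict of columns with np.nan entries in Name and Salary
df = pd.DataFrame(data)
df_filled = df.fillna('Unknown')  # replace every NaN with 'Unknown'
print(df_filled)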
The NaN values in both the Name and Salary columns are filled with the string 'Unknown'.
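A self-contained sketch of the same idea. The records below are hypothetical: the position of the missing Name is an assumption, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical records with one missing Name and one missing Salary
data = {
    'Name': ['Alice', 'Bob', np.nan, 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, np.nan, 70000, 80000],
}
df = pd.DataFrame(data)

# fillna() returns a new DataFrame; df itself is left unchanged
df_filled = df.fillna('Unknown')
print(df_filled)
```

Filling a numeric column such as Salary with a string mixes numbers and text in that column, so it can no longer be treated as purely numeric. Passing a dictionary, for example df.fillna({'Name': 'Unknown'}), limits the fill to chosen columns.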
You can sort a DataFrame based on the values of one or more columns using the sort_values()
method. Below is an example of sorting the DataFrame based on the Age column in ascending
order:
The resulting DataFrame is sorted by the Age column, starting from the smallest age.
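A minimal sketch of sorting by Age, using a small set of deliberately unordered sample rows (the rows are assumptions made for illustration).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Charlie', 'Alice', 'David', 'Bob'],
    'Age': [35, 25, 40, 30],
})

# ascending=True is the default; pass ascending=False for largest-first
df_sorted = df.sort_values(by='Age', ascending=True)
print(df_sorted)
```

sort_values() returns a new DataFrame and keeps the original row labels, so the index of the result appears out of order unless reset_index(drop=True) is applied.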
c. Using groupby()
The groupby() method in pandas allows you to group rows based on a column and perform
some aggregation. For example, let's group the DataFrame by the Age column and calculate the
average Salary for each age group:
df_grouped = df.groupby('Age')['Salary'].mean().reset_index()
print(df_grouped)
Explanation:
The groupby('Age') groups the data based on unique values in the Age column.
We then use .mean() to compute the average salary for each age group.
The reset_index() is used to convert the resulting grouped data back into a regular
DataFrame.
Output:
   Age   Salary
0   25  50000.0
1   30      NaN
2   35  70000.0
3   40  80000.0
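The NaN for age 30 in the output above is consistent with that group containing only a missing Salary. A small sketch, with records assumed from the output rather than taken from the original data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Salary': [50000, np.nan, 70000, 80000],
})

# mean() skips NaN values, so a group holding only NaN has nothing to average
df_grouped = df.groupby('Age')['Salary'].mean().reset_index()
print(df_grouped)
```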
Experiment 8.
You can read plain text files in pandas with the read_csv() function by specifying the
delimiter (e.g., space or tab). Here is an example of reading a text file that uses spaces or
tabs as delimiters.
Example:
import pandas as pd
Explanation:
read_csv() is versatile and can read text files as long as we specify the correct delimiter.
You can replace ' ' with the actual delimiter used in your text file.
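A runnable sketch of the approach described above. The file name people.txt and its contents are hypothetical; the file is written to a temporary folder first so the example does not depend on any existing file.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('Name Age Salary\nAlice 25 50000\nBob 30 60000\n')

    # sep=' ' splits on a single space; use sep='\t' for tab-separated files
    df_text = pd.read_csv(path, sep=' ')

print(df_text)
```

The delimiter argument of read_csv() is an alias for sep, so either spelling can be used.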
b. Reading CSV Files
CSV files are very common, and pandas makes it very easy to read them using the read_csv()
function.
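A minimal sketch of reading CSV data. The CSV text below is a made-up stand-in for a file on disk; with a real file, the path would be passed to read_csv() instead of the StringIO object.

```python
import io

import pandas as pd

# Stand-in for a file on disk; a path such as 'your_file.csv' is passed the same way
csv_text = 'Name,Age,Salary\nAlice,25,50000\nBob,30,60000\n'
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.head())   # first rows of the data
print(df_csv.dtypes)   # column types inferred by pandas
```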
For reading Excel files, you can use the read_excel() function. Depending on the file format,
you may need to install openpyxl (for .xlsx files) or xlrd (for .xls files).
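A hedged sketch of an Excel round trip. It assumes openpyxl is available and only prints a hint otherwise; the file name employees.xlsx and the sheet name are hypothetical.

```python
import importlib.util
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

excel_df = None
if importlib.util.find_spec('openpyxl') is None:
    print('openpyxl is not installed; install it with: pip install openpyxl')
else:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'employees.xlsx')  # hypothetical file name
        df.to_excel(path, index=False, sheet_name='Sheet1')   # write a sample workbook
        excel_df = pd.read_excel(path, sheet_name='Sheet1')   # read it back
    print(excel_df)
```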
For JSON files, you can use the read_json() function. JSON files are commonly used for
hierarchical or nested data.
Example:
Suppose the JSON file contains the following array of records:
[
  {"Name": "Alice", "Age": 25, "Salary": 50000},
  {"Name": "Bob", "Age": 30, "Salary": 60000},
  {"Name": "Charlie", "Age": 35, "Salary": 70000}
]
Reading it with read_json() produces the following DataFrame:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
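A minimal sketch that loads the JSON records shown above. The text is wrapped in StringIO to stand in for a file; with a real file, its path would be passed to read_json() instead.

```python
import io

import pandas as pd

json_text = '''[
  {"Name": "Alice", "Age": 25, "Salary": 50000},
  {"Name": "Bob", "Age": 30, "Salary": 60000},
  {"Name": "Charlie", "Age": 35, "Salary": 70000}
]'''

# Each JSON object becomes one row; the keys become the column names
df_json = pd.read_json(io.StringIO(json_text))
print(df_json)
```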
Experiment 9.
Pickle files are used to serialize Python objects. You can load them back into memory using the
read_pickle() function in pandas.
Example:
import pandas as pd
Explanation:
Pickle files save data in a binary format and pandas can read them directly with
read_pickle().
This is useful when you want to save the state of a DataFrame (or other Python objects)
and load it back later.
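A short round-trip sketch: a DataFrame is saved with to_pickle() and loaded back with read_pickle(). The file name data.pkl is hypothetical and lives in a temporary folder.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.pkl')  # hypothetical file name
    df.to_pickle(path)                 # serialize the DataFrame to disk
    df_loaded = pd.read_pickle(path)   # load it back into memory

print(df_loaded.equals(df))
```

Pickle files can execute code when loaded, so read_pickle() should only be used on files from a trusted source.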
To read and work with image files, you can use Pillow, the actively maintained fork of PIL
(Python Imaging Library). It is still imported under the name PIL.
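A hedged sketch of opening an image and inspecting it. It assumes Pillow is installed and prints a hint otherwise. The image is generated in memory so no file is needed; with a real file, its path would be passed to Image.open().

```python
import io

try:
    from PIL import Image  # provided by the Pillow package
except ImportError:
    Image = None

info = None
if Image is None:
    print('Pillow is not installed; install it with: pip install Pillow')
else:
    # Build a small red PNG in memory to stand in for an image file
    buffer = io.BytesIO()
    Image.new('RGB', (64, 32), color=(255, 0, 0)).save(buffer, format='PNG')
    buffer.seek(0)

    with Image.open(buffer) as img:
        info = (img.format, img.size, img.mode)
    print(info)
```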
You can use the glob module to match files using patterns (such as *.csv or *.txt). This
allows you to work with multiple files at once.
Example:
import glob
import pandas as pd
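A runnable sketch of reading several files matched by a pattern. The two CSV files and their names are created only for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create two small CSV files to stand in for real data files
    for i in (1, 2):
        sample = pd.DataFrame({'id': [i], 'value': [i * 10]})
        sample.to_csv(os.path.join(tmp_dir, f'part_{i}.csv'), index=False)

    # '*.csv' matches every file ending in .csv; sorted() gives a stable order
    csv_files = sorted(glob.glob(os.path.join(tmp_dir, '*.csv')))
    combined = pd.concat([pd.read_csv(f) for f in csv_files], ignore_index=True)

print(combined)
```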
To import data from a database (such as SQLite, MySQL, etc.), you can use pandas along with a
database connector. The example here uses SQLite with sqlite3.
Example:
import sqlite3
import pandas as pd
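A self-contained sketch using an in-memory SQLite database. The employees table, its columns, and the query are assumptions made for illustration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')  # in-memory database, nothing is written to disk
conn.execute('CREATE TABLE employees (name TEXT, age INTEGER, salary INTEGER)')
conn.executemany(
    'INSERT INTO employees VALUES (?, ?, ?)',
    [('Alice', 25, 50000), ('Bob', 30, 60000)],
)
conn.commit()

# read_sql_query() runs the SQL and returns the result set as a DataFrame
df_sql = pd.read_sql_query('SELECT * FROM employees WHERE salary > 55000', conn)
conn.close()
print(df_sql)
```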
For databases like MySQL, you can use pymysql.connect() or similar connectors and follow a
similar process.
Experiment 10.
Web scraping in Python can be done using libraries like requests for making HTTP requests,
and BeautifulSoup (from the bs4 library) for parsing and extracting data from HTML content.
Below is a simple demonstration of web scraping.
Steps:
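1. Install the required libraries: If you don't have requests and beautifulsoup4
installed, you can install them using pip:
pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup
response = requests.get(url)  # Step 1: send the request; url is the page to scrape
soup = BeautifulSoup(response.text, 'html.parser')  # Step 2: parse the HTML
# Step 3: Extracting data: In this case, we extract all quotes from the page
quotes = soup.find_all('span', class_='text')
for quote in quotes:  # print the text of each quote
    print(quote.text)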
Explanation:
requests.get(url) sends an HTTP request to the specified URL and retrieves the
webpage's content.
BeautifulSoup(response.text, 'html.parser') parses the HTML content.
soup.find_all('span', class_='text') finds all <span> tags with the class 'text',
which contain the quotes.
Finally, we loop through the quotes and print them.
Output:
“The world as we have created it is a process of our thinking. It cannot be
changed without changing our thinking.”
“Life is what happens when you're busy making other plans.”
“It is our choices that show what we truly are, far more than our abilities.”
“Never let the fear of striking out keep you from playing the game.”
“You have within you right now, everything you need to deal with whatever the
world can throw at you.”
“The person, be it gentleman or lady, who has not pleasure in a good novel,
must be intolerably stupid.”
You can also extract other information such as the author of each quote and tags associated with
it. Here’s how you can extend the previous example:
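# Extracting authors
authors = soup.find_all('small', class_='author')
# Extracting tags
tags = soup.find_all('div', class_='tags')
# Print each quote with its author and tags (assumes each tag is a link inside the tags div)
for quote, author, tag_div in zip(quotes, authors, tags):
    tag_list = [a.text for a in tag_div.find_all('a')]
    print(f"Quote: {quote.text}\nAuthor: {author.text}\nTags: {tag_list}")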
Output:
Quote: “Life is what happens when you're busy making other plans.”
Author: John Lennon
Tags: ['life', 'adulthood', 'quotes']
Quote: “It is our choices that show what we truly are, far more than our
abilities.”
Author: J.K. Rowling
Tags: ['abilities', 'choices']
Quote: “Never let the fear of striking out keep you from playing the game.”
Author: Babe Ruth
Tags: ['sports', 'fear', 'inspirational']
Quote: “You have within you right now, everything you need to deal with
whatever the world can throw at you.”
Author: Brian Tracy
Tags: ['self-confidence', 'inspirational', 'world']
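The extraction logic can be tried without a network connection by parsing a small HTML string. The markup below is hypothetical: it only mimics the structure that the find_all() calls in this experiment expect, and the sketch prints a hint if beautifulsoup4 is not installed.

```python
try:
    from bs4 import BeautifulSoup  # provided by the beautifulsoup4 package
except ImportError:
    BeautifulSoup = None

# Hypothetical markup with one quote, one author, and two tags
html = """
<div class="quote">
  <span class="text">A sample quote.</span>
  <small class="author">Sample Author</small>
  <div class="tags"><a class="tag">alpha</a><a class="tag">beta</a></div>
</div>
"""

results = None
if BeautifulSoup is None:
    print('beautifulsoup4 is not installed; install it with: pip install beautifulsoup4')
else:
    soup = BeautifulSoup(html, 'html.parser')
    quotes = soup.find_all('span', class_='text')
    authors = soup.find_all('small', class_='author')
    tags = soup.find_all('div', class_='tags')

    results = []
    # zip() pairs the n-th quote with the n-th author and the n-th tags block
    for quote, author, tag_div in zip(quotes, authors, tags):
        tag_list = [a.text for a in tag_div.find_all('a')]
        results.append((quote.text, author.text, tag_list))
    print(results)
```

Pairing by position with zip() works only when every quote has exactly one author element and one tags element. Iterating over each enclosing quote block and searching inside it is a more robust alternative when that is not guaranteed.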