UNIT-4 (1)

The document provides an overview of key concepts in NumPy and Pandas, including reshaping arrays, differences between iloc and loc indexers, and methods for installing Python libraries. It also discusses data manipulation techniques in NumPy, importing data from CSV files into Pandas, and the attributes of Pandas Series. Additionally, it highlights how NumPy and Pandas can be integrated in a data analysis workflow.

Uploaded by

byjuslearn874

UNIT-4

1. How to reshape arrays in NumPy. What happens when you use the reshape() function with
a -1 parameter?

In NumPy, the reshape() function is used to change the shape (or dimensions) of an array without
changing its data.

Syntax:

numpy.reshape(a, newshape)

Or, called as a method on a NumPy array:

a.reshape(newshape)

Example:

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

b = a.reshape((2, 3)) # 2 rows, 3 columns

print(b)

Output:

[[1 2 3]

[4 5 6]]

What happens when you use -1 in reshape()?

The -1 is a special placeholder in reshape() that tells NumPy to infer the correct dimension
automatically based on the original array's size.

📘 Example:

a = np.array([1, 2, 3, 4, 5, 6])

b = a.reshape((3, -1)) # Let NumPy determine the number of columns

print(b)

Output:

[[1 2]

[3 4]
[5 6]]
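
The inference rule can be sketched as follows (the array and variable names are illustrative and not part of the original notes). NumPy divides the total number of elements by the product of the known dimensions; only one dimension may be -1, and a ValueError is raised when the division is not exact.

```python
import numpy as np

a = np.arange(12)            # 12 elements: 0..11

# -1 is replaced by 12 / 3 = 4
b = a.reshape(3, -1)
print(b.shape)               # (3, 4)

# -1 also works with more than two dimensions: 12 / (2 * 3) = 2
c = a.reshape(2, 3, -1)
print(c.shape)               # (2, 3, 2)

# reshape(-1) flattens any array back to 1D
flat = c.reshape(-1)
print(flat.shape)            # (12,)

# If the size is not divisible, NumPy raises ValueError
try:
    a.reshape(5, -1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```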

2. Difference between iloc and loc indexers in Pandas

Feature      iloc                          loc
Stands for   Integer location              Label location
Index type   Integer-based (positional)    Label-based (index names)
Syntax       df.iloc[row_idx, col_idx]     df.loc[row_label, col_label]
Slicing      Excludes the end position     Includes the end label
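
A minimal sketch of the slicing difference, using a small hypothetical DataFrame whose labels are letters so that labels and positions cannot be confused:

```python
import pandas as pd

# Hypothetical example data; the labels are chosen to differ from positions
df = pd.DataFrame(
    {"score": [85, 78, 92, 60]},
    index=["a", "b", "c", "d"],
)

by_position = df.iloc[0:2]     # positions 0 and 1 (end excluded)
by_label = df.loc["a":"c"]     # labels 'a', 'b' and 'c' (end included)

print(len(by_position))        # 2
print(len(by_label))           # 3
```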

3. Explain the process of installing Python libraries using different methods. Compare pip,
conda, and manual installation, highlighting their advantages and limitations.

Python supports multiple ways to install libraries. The three main methods are:

Method Description

pip The default Python package manager

conda A package/environment manager provided by Anaconda

Manual installation Directly downloading and installing packages

🔹 1. pip (Python Package Installer)

pip is Python’s official package manager that downloads packages from PyPI.
Installation:
pip install package_name
Advantages:
• Bundled with Python 3.4 and later
• Wide support (hundreds of thousands of packages on PyPI)
• Lightweight and fast
• Works in virtual environments (venv)
⚠️ Limitations:
• Dependency conflicts can occur
• Doesn’t manage non-Python dependencies itself (e.g., system libraries like OpenCV's C++ components)

🔹 2. conda (Anaconda Package Manager)

conda is a powerful package and environment manager, part of the Anaconda distribution.
Installation:
conda install package_name
Advantages:
• Handles both Python and non-Python packages (e.g., NumPy with C optimizations)
• Manages isolated environments easily
• Good for data science and machine learning setups
Limitations:
• Heavier footprint (Anaconda is ~3GB; Miniconda is a lighter alternative)
• Smaller package repository than PyPI
• Packages may lag behind PyPI in updates

🔹 3. Manual Installation
Download a wheel (.whl) or source archive (.tar.gz), or clone the source code, and install it manually.
Steps:
# Install from a downloaded wheel file
pip install /path/to/package.whl

# OR clone from GitHub and install from the source tree

git clone https://github.com/author/project.git
cd project
pip install .    # preferred over the deprecated "python setup.py install"
Advantages:
• Full control over version and build
• Useful for unreleased or custom packages
⚠️ Limitations:
• Requires more technical knowledge
• Dependencies may need to be resolved manually
• Risk of incompatible builds

4. Compare and contrast the primary data structures in Pandas: Series and DataFrame

Feature Series DataFrame

Dimensionality 1D 2D

Index type Single axis (index) Two axes (rows and columns)

Data type Homogeneous Heterogeneous (multiple dtypes)

Example Column of data Full table with rows & columns

Usage example Time series, scores Structured datasets, CSVs, DBs

Example:
import pandas as pd

s1 = pd.Series([85, 78, 92], index=['Math', 'English', 'Science'])

s2 = pd.Series([90, 88, 80], index=['Math', 'English', 'Science'])

df = pd.DataFrame({'Student1': s1, 'Student2': s2}).T

df['Average'] = df.mean(axis=1)

print(df)

Output:

          Math  English  Science  Average
Student1    85       78       92     85.0
Student2    90       88       80     86.0

5. Explain the process of manipulating array shapes in NumPy. Discuss transpose operations,
reshaping, stacking, and splitting arrays with appropriate examples.

NumPy provides powerful tools to change the shape or structure of arrays for mathematical and data
operations.

🔹 1. Reshaping Arrays

Purpose: Change the shape (dimensions) of an array without changing the data.

import numpy as np

a = np.arange(6) # [0, 1, 2, 3, 4, 5]

b = a.reshape((2, 3)) # Reshape to 2 rows, 3 columns

print(b)

Output:

[[0 1 2]

[3 4 5]]

🔹 2. Transposing Arrays

Purpose: Flip rows and columns (useful in linear algebra and image processing).
a = np.array([[1, 2], [3, 4]])

print(a.T) # or np.transpose(a)

Output:

[[1 3]

[2 4]]

🔹 3. Stacking Arrays

Purpose: Combine multiple arrays into one.

• Vertical Stack (vstack) – Stack arrays row-wise (like adding more rows):

a = np.array([[1, 2], [3, 4]])

b = np.array([[5, 6]])

print(np.vstack((a, b))) # result has shape (3, 2)

• Horizontal Stack (hstack) – Stack arrays column-wise (like adding more columns):

c = np.array([[7], [8]])

print(np.hstack((a, c))) # result has shape (2, 3)

🔹 4. Splitting Arrays

Purpose: Divide arrays into multiple sub-arrays.

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.hsplit(a, 3)) # three arrays of shape (2, 1), one per column

print(np.vsplit(a, 2)) # two arrays of shape (1, 3), one per row
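
The stacking and splitting calls above can be checked by looking at the resulting shapes. This is a small sketch reusing the same arrays (the name m is introduced here for the array being split):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])      # shape (2, 2)
b = np.array([[5, 6]])              # shape (1, 2)
c = np.array([[7], [8]])            # shape (2, 1)

v = np.vstack((a, b))               # rows are appended -> shape (3, 2)
h = np.hstack((a, c))               # columns are appended -> shape (2, 3)
print(v.shape, h.shape)             # (3, 2) (2, 3)

m = np.array([[1, 2, 3], [4, 5, 6]])
cols = np.hsplit(m, 3)              # three (2, 1) column blocks
rows = np.vsplit(m, 2)              # two (1, 3) row blocks
print([p.shape for p in cols])      # [(2, 1), (2, 1), (2, 1)]
print([p.shape for p in rows])      # [(1, 3), (1, 3)]

# Splitting and re-stacking gives back the original array
restored = np.hstack(cols)
print(np.array_equal(restored, m))  # True
```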

6. Explain the process of importing data from CSV files into Pandas DataFrames. Discuss
various parameters that can be used to handle different CSV formats, missing values, and
data types.

CSV (Comma-Separated Values) is a common format for datasets. Pandas makes it easy to import
them.
Basic import:

import pandas as pd

df = pd.read_csv("students.csv")

print(df.head())

Key parameters:

Parameter    Purpose
sep          Delimiter (default is comma)
header       Row number(s) to use as column names
names        Provide column names manually
index_col    Column to set as index
usecols      Load only specific columns
dtype        Specify data types
na_values    Define custom missing value representations
skiprows     Skip rows at the start
nrows        Read only N rows
encoding     Handle different file encodings (like utf-8, latin1)

Examples:

1. Load a CSV with a custom delimiter:

df = pd.read_csv("data.csv", sep=";")

2. Handle missing values and data types:

df = pd.read_csv("data.csv", na_values=["NA", "n/a"], dtype={"Age": "Int64"}) # nullable integer dtype; plain int cannot hold missing values

3. Use a specific column as index:

df = pd.read_csv("data.csv", index_col="StudentID")

4. Read specific columns and skip rows:

df = pd.read_csv("data.csv", usecols=["Name", "Score"], skiprows=[1]) # skips the first data row; skiprows=1 would skip the header line itself
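
Several of these parameters can be combined in one call. The sketch below reads from an in-memory string instead of a file so that it is self-contained; the column names and values are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content, semicolon-delimited, with "n/a" marking a missing age
raw = io.StringIO(
    "StudentID;Name;Age;Score\n"
    "101;Alice;20;88.5\n"
    "102;Bob;n/a;92.0\n"
    "103;Charlie;22;79.0\n"
)

df = pd.read_csv(
    raw,
    sep=";",
    na_values=["n/a"],
    index_col="StudentID",
    dtype={"Age": "Int64", "Score": "float64"},  # "Int64" is the nullable integer dtype
)

print(df.dtypes)
print(df["Age"].isna().sum())   # 1
```

The nullable "Int64" dtype is used for Age because a plain int column cannot represent the missing entry.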

7. How array slicing works in NumPy. Provide an example

Array slicing in NumPy works similarly to Python list slicing, but it's more powerful because it
supports multi-dimensional arrays.

🔹 Syntax:

array[start:stop:step]

• start: starting index (inclusive)

• stop: ending index (exclusive)

• step: stride (optional)

✅ 1D Example:

import numpy as np

a = np.array([10, 20, 30, 40, 50])

print(a[1:4])

# Elements at index 1, 2, 3 → [20, 30, 40]

2D Example:

b = np.array([

[1, 2, 3],

[4, 5, 6],

[7, 8, 9]

])

print(b[:2, :2])

Output:

[[1 2]
[4 5]]
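
One difference from Python lists is that a NumPy slice is a view onto the original data rather than a copy. A short sketch of the step argument and the view behaviour (the names view and independent are illustrative):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

print(a[::2])      # every second element: [10 30 50]
print(a[::-1])     # reversed: [50 40 30 20 10]

# A slice is a view, so writing to it changes the original array
view = a[1:4]
view[0] = 99
print(a)           # [10 99 30 40 50]

# Use .copy() when an independent array is needed
independent = a[1:4].copy()
independent[0] = -1
print(a[1])        # still 99
```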

8. Write two methods to import data from a CSV file into a Pandas DataFrame

Method 1: Using pd.read_csv()

This is the most common and recommended way.

import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())

Method 2: Using csv module with Pandas DataFrame

This is useful when you want more control while reading the file manually.

import csv
import pandas as pd

with open("data.csv", newline='') as file:
    reader = csv.reader(file)
    data = list(reader)

# Convert to DataFrame manually (assumes first row is header)
# Note: csv.reader yields strings, so every column is text until converted

df = pd.DataFrame(data[1:], columns=data[0])
print(df.head())

9. Identify the Python library commonly used for solving differential equations numerically

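The library commonly used is SciPy. Its scipy.integrate module provides solve_ivp() for initial value problems (and the older odeint() function).

Example: solve the ODE dy/dt = −2y with y(0) = 1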

import numpy as np

from scipy.integrate import solve_ivp

import matplotlib.pyplot as plt

def dydt(t, y):
    return -2*y

y0 = [1]

t_span = (0, 5)

t_eval = np.linspace(*t_span, 100)

solution = solve_ivp(dydt, t_span, y0, t_eval=t_eval)

plt.plot(solution.t, solution.y[0])

plt.xlabel("Time")

plt.ylabel("y(t)")

plt.title("Solution of dy/dt = -2y")

plt.grid()

plt.show()
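
Because this ODE has the known analytical solution y(t) = e^(−2t), the numerical result can be checked against it. The sketch below does that without plotting; the tightened rtol and atol values are an illustrative choice, not something taken from the notes.

```python
import numpy as np
from scipy.integrate import solve_ivp

def dydt(t, y):
    # Right-hand side of dy/dt = -2y
    return -2 * y

t_eval = np.linspace(0, 5, 100)
# rtol/atol are tightened here as an illustrative choice; the defaults are looser
sol = solve_ivp(dydt, (0, 5), [1.0], t_eval=t_eval, rtol=1e-8, atol=1e-10)

exact = np.exp(-2 * t_eval)                 # analytical solution with y(0) = 1
max_error = np.max(np.abs(sol.y[0] - exact))
print(sol.success, max_error)
```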

10. Describe the function in Matplotlib used to plot a graph

Function: plot()

• Belongs to: matplotlib.pyplot

• Used for 2D line plots.

🔹 Basic Example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]

y = [2, 4, 1, 8, 7]

plt.plot(x, y) # Plot the line

plt.title("Simple Line Plot")

plt.xlabel("X-axis")

plt.ylabel("Y-axis")

plt.grid(True)

plt.show()

🧠 Other Useful Plot Functions:

Function     Purpose
scatter()    Scatter plots
bar()        Bar charts
hist()       Histograms
imshow()     Display images or 2D arrays
subplot()    Multiple plot layout

11. Describe the various ways to create NumPy arrays from Python lists, ranges, and using
built-in functions. Explain the significance of the dtype parameter when creating arrays.

NumPy arrays can be created in multiple ways, primarily from Python sequences, iterables, and built-
in NumPy functions. Here’s an explanation of each method, along with the role of the dtype
parameter:

A. Creating Arrays from Python Lists

You can convert Python lists (or nested lists) into NumPy arrays using numpy.array().

Example:

import numpy as np

list1 = [1, 2, 3, 4]

arr1 = np.array(list1)

print(arr1)

Output:

[1 2 3 4]

For multi-dimensional arrays:

list2 = [[1, 2], [3, 4]]

arr2 = np.array(list2)

print(arr2)

# Output:

[[1 2]

[3 4]]
B. Creating Arrays from Ranges

You can use Python’s range() function in combination with np.array(), or use NumPy’s own arange()
function which is more flexible.

Example using Python range:

arr3 = np.array(range(0, 10, 2))

print(arr3) # Output: [0 2 4 6 8]

Example using NumPy’s arange():

arr4 = np.arange(0, 10, 2)

print(arr4) # Output: [0 2 4 6 8]

C. Using Built-in NumPy Functions

NumPy provides several built-in functions to create arrays efficiently:

1. np.zeros() – Creates an array filled with zeros.

np.zeros((2, 3)) # Output: array([[0., 0., 0.], [0., 0., 0.]])

2. np.ones() – Creates an array filled with ones.

np.ones((2, 2)) # Output: array([[1., 1.], [1., 1.]])

3. np.full() – Creates an array filled with a specified value.

np.full((2, 2), 7) # Output: array([[7, 7], [7, 7]])

4. np.eye() – Creates an identity matrix.

np.eye(3) # Output: 3x3 identity matrix

5. np.linspace() – Creates a specified number of evenly spaced values between two endpoints.

np.linspace(0, 1, 5) # Output: array([0. , 0.25, 0.5 , 0.75, 1. ])

6. np.random.rand() / np.random.randn() – Create arrays with random values.

np.random.rand(2, 3) # Random values in [0, 1)

D. Significance of the dtype Parameter

The dtype parameter defines the data type of the elements in the array. This is important for:
• Memory efficiency: Specifying float32 instead of the default float64 halves the memory used per element.

• Precision control: Choose int32, float64, or complex128 depending on the required precision.

• Type enforcement: Ensures consistency in calculations and avoids unintended type conversions.

Example:

np.array([1, 2, 3], dtype=float)

Output:

array([1., 2., 3.])

Supported dtypes include int32, int64, float32, float64, bool, complex, str, and object, among others.
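
The effect of dtype on memory and on stored values can be seen directly. This is a small sketch; the array sizes and values are arbitrary examples.

```python
import numpy as np

a64 = np.ones(1_000_000, dtype=np.float64)
a32 = np.ones(1_000_000, dtype=np.float32)

print(a64.nbytes)   # 8000000 bytes
print(a32.nbytes)   # 4000000 bytes

# Without dtype, NumPy infers a type from the input values
inferred = np.array([1, 2, 3.5])
print(inferred.dtype)                         # float64 (ints are upcast)

# astype converts after creation; float -> int truncates toward zero
truncated = np.array([1.9, -1.9]).astype(np.int32)
print(truncated)                              # [ 1 -1]
```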

12. Explain the attributes and properties of Pandas Series with examples.

A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers,
strings, floats, Python objects, etc.).

Creating a Series

import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

Key Attributes and Properties of Series

Attribute     Description                                Example
s.values      Returns the underlying NumPy array         array([10, 20, 30, 40])
s.index       Returns the index labels                   Index(['a', 'b', 'c', 'd'], dtype='object')
s.dtype       Returns the data type of the Series        dtype('int64')
s.shape       Returns the shape (number of elements,)    (4,)
s.size        Number of elements in the Series           4
s.ndim        Number of dimensions (always 1)            1
s.name        Name of the Series (optional)              Can be set via s.name = 'my_series'
s.isnull()    Method that detects missing values         Returns a Boolean Series
s.notnull()   Method, opposite of isnull()               Returns a Boolean Series
s.hasnans     Checks if Series contains NaNs             False in the above case

Examples:

print(s.values) # [10 20 30 40]

print(s.index) # Index(['a', 'b', 'c', 'd'], dtype='object')

print(s.dtype) # int64

print(s.shape) # (4,)

print(s.name) # None (initially)

Accessing Elements

• By label: s['b'] → 20

• By position: s.iloc[1] → 20 (plain s[1] on a label-indexed Series is deprecated in recent pandas versions)

Vectorized Operations

Pandas Series supports element-wise operations:

s + 5 # Adds 5 to each element

Summary Statistics

s.mean(), s.min(), s.max(), s.describe()

13. Explain how NumPy and Pandas can be used together in a data analysis workflow.

NumPy and Pandas are two core libraries in Python's data analysis stack. They complement each
other in various ways. NumPy provides fast, efficient numerical computations, while Pandas builds on
NumPy by offering powerful, user-friendly data structures like Series and DataFrames.

Typical Data Analysis Workflow Using NumPy and Pandas

Step 1: Data Collection

• Data is often imported using Pandas (pd.read_csv(), pd.read_excel(), etc.).

import pandas as pd

df = pd.read_csv('data.csv')

Step 2: Data Cleaning

• Use Pandas for handling missing values, renaming columns, converting types, etc. For missing values, choose one strategy: either drop them or fill them.

df = df.dropna() # Option 1: remove rows with missing values

df = df.fillna(0) # Option 2: replace NaNs with 0 instead

Step 3: Data Transformation

• Convert columns to NumPy arrays for efficient numerical processing.

import numpy as np

values = df['column1'].to_numpy()

normalized = (values - np.mean(values)) / np.std(values)

df['normalized'] = normalized

Step 4: Feature Engineering

• Use NumPy functions for mathematical transformations.

df['log_sales'] = np.log(df['sales'] + 1)

Step 5: Statistical Analysis

• Perform descriptive statistics using both Pandas and NumPy.

mean = np.mean(df['sales'])

summary = df.describe()

Step 6: Visualization (using external libraries)

• Libraries like Matplotlib and Seaborn can plot Pandas Series/DataFrames directly.

Why Use Both?

Task                            Library Preferred
Efficient numeric computation   NumPy
Data manipulation               Pandas
Handling missing data           Pandas
Descriptive statistics          Pandas + NumPy
Matrix algebra                  NumPy
Indexing and labeling           Pandas
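
The workflow steps can be put together in a compact sketch. The DataFrame is built in memory with hypothetical values so that the example runs without a data file, and filling the gap with the column mean is just one possible cleaning choice.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing value
df = pd.DataFrame({
    "region": ["N", "S", "E", "W"],
    "sales": [100.0, np.nan, 300.0, 200.0],
})

# Cleaning (Pandas): fill the gap with the column mean
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Transformation (NumPy): z-score normalisation on the raw array
values = df["sales"].to_numpy()
df["normalized"] = (values - values.mean()) / values.std()

# Feature engineering (NumPy ufunc applied to a Pandas column)
df["log_sales"] = np.log1p(df["sales"])   # log1p(x) computes log(1 + x)

print(df)
print(df["sales"].mean())   # 200.0
```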

14. Demonstrate mathematical operations and statistical functions that can be performed on
Series objects. How do NaN values affect these operations?

Pandas Series supports vectorized operations and built-in statistical functions, making it ideal for
performing computations on data columns.

A. Mathematical Operations on Series

Operation        Example   Result
Addition         s + 2     Adds 2 to all elements
Subtraction      s - 1     Subtracts 1 from all elements
Multiplication   s * 3     Multiplies each element by 3
Division         s / 2     Divides each element by 2

import pandas as pd

s = pd.Series([10, 20, 30, 40])

print(s * 2) # Output: [20, 40, 60, 80]

B. Statistical Functions on Series

Function       Description
s.sum()        Sum of all elements
s.mean()       Mean (average)
s.median()     Median value
s.std()        Standard deviation
s.var()        Variance
s.min()        Minimum value
s.max()        Maximum value
s.count()      Count of non-NaN values
s.describe()   Summary statistics

s = pd.Series([10, 20, 30, 40])

print(s.mean()) # Output: 25.0

print(s.describe())

C. Handling of NaN (Missing) Values

NaN (Not a Number) values are automatically excluded from most statistical computations unless skipna=False is passed.

s = pd.Series([10, 20, None, 40])

print(s.mean()) # Output: 23.33... (ignores None/NaN)

print(s.sum()) # Output: 70.0

print(s.count()) # Output: 3 (non-null values)

Detecting and Handling NaN:

Function      Use
s.isnull()    Returns a boolean Series indicating NaNs
s.notnull()   Opposite of isnull()
s.dropna()    Removes NaNs
s.fillna(x)   Replaces NaNs with a specified value

s.fillna(0, inplace=True) # Replace NaNs with 0

Key Notes on NaN in Mathematical Ops

• Element-wise operations preserve NaN positions (they don’t get removed automatically).

import numpy as np

s = pd.Series([1, np.nan, 3])

print(s * 2) # Values: 2.0, NaN, 6.0

• Use skipna=False to make NaNs propagate into the statistic instead of being skipped (this returns NaN; it does not raise an error):

s.mean(skipna=False) # Output: nan
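
The difference between Pandas (which skips NaN by default) and plain NumPy reductions (which propagate it) can be sketched as follows:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, np.nan, 40])

print(s.sum())                  # 70.0  (NaN skipped)
print(s.sum(skipna=False))      # nan
print(s.mean())                 # 70 divided by the 3 non-missing values

arr = s.to_numpy()
print(np.sum(arr))              # nan   (plain NumPy propagates NaN)
print(np.nansum(arr))           # 70.0  (NaN-aware variant)

# cumsum keeps NaN in place but continues after it
running = s.cumsum().tolist()
print(running)                  # [10.0, 30.0, nan, 70.0]
```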

15. How to concatenate NumPy arrays both horizontally and vertically. What happens when
the arrays have different shapes?

Concatenation in NumPy refers to joining multiple arrays along an axis. NumPy provides several
functions for concatenation, such as np.concatenate(), np.vstack(), and np.hstack().

✅ A. Horizontal Concatenation (Along Columns / Axis=1)

Method 1: np.concatenate()

import numpy as np

a = np.array([[1, 2], [3, 4]]) # shape: (2, 2)

b = np.array([[5, 6], [7, 8]]) # shape: (2, 2)

# Horizontal concatenation

result = np.concatenate((a, b), axis=1)

print(result)

Output:

[[1 2 5 6]

[3 4 7 8]]

Method 2: np.hstack()

result = np.hstack((a, b))

Same output as above.

✅ B. Vertical Concatenation (Along Rows / Axis=0)

Method 1: np.concatenate()

result = np.concatenate((a, b), axis=0)

print(result)

Output:

[[1 2]

[3 4]

[5 6]

[7 8]]

Method 2: np.vstack()

result = np.vstack((a, b))

Same output as above.

C. What Happens When Arrays Have Different Shapes?

Arrays can only be concatenated if their shapes match on every axis except the concatenation axis. Otherwise, NumPy raises a ValueError.

Example of Shape Mismatch (Horizontal):

a = np.array([[1, 2], [3, 4]]) # shape: (2, 2)

b = np.array([[5], [6], [7]]) # shape: (3, 1)

np.concatenate((a, b), axis=1) # Raises ValueError!

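Error (exact wording varies by NumPy version):

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 3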

Example of Shape Mismatch (Vertical):

a = np.array([[1, 2], [3, 4]]) # shape: (2, 2)

b = np.array([[5, 6, 7]]) # shape: (1, 3)

np.concatenate((a, b), axis=0) # Raises ValueError!

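Error (exact wording varies by NumPy version):

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 3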

✅ D. Handling Shape Mismatches

To resolve mismatches:

• Reshape the arrays using np.reshape() or np.expand_dims() if needed.

• Pad the smaller array (e.g., with zeros using np.pad()) if that is logically valid for the data.

Example: Reshaping Before Concatenation

a = np.array([1, 2, 3]) # shape: (3,)

b = np.array([[4], [5], [6]]) # shape: (3,1)

# Reshape a to (3,1) for horizontal stacking

a = a.reshape((3, 1))

result = np.hstack((a, b)) # Both shapes are now (3,1); the result has shape (3,2)

print(result)
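
Padding is the other way to resolve a mismatch. The sketch below catches the ValueError from the vertical example and then pads the narrower array with a zero column; whether zero is a sensible filler depends on the data, so treat this as an illustration only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6, 7]])        # shape (1, 3)

try:
    np.concatenate((a, b), axis=0)
    failed = False
except ValueError:
    failed = True
print(failed)                    # True: column counts 2 and 3 differ

# One possible fix (an illustrative choice): pad `a` with a zero column
a_padded = np.pad(a, ((0, 0), (0, 1)), constant_values=0)   # shape (2, 3)
result = np.concatenate((a_padded, b), axis=0)
print(result)
```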

16. Discuss DataFrame creation methods and attributes in detail.

A Pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different
types — like an Excel spreadsheet or SQL table.

A. DataFrame Creation Methods

1. From a Dictionary of Lists or Arrays

Each key becomes a column label.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)

2. From a Dictionary of Series

data = {
    'A': pd.Series([1, 2, 3], index=['x', 'y', 'z']),
    'B': pd.Series([4, 5], index=['x', 'y'])
}

df = pd.DataFrame(data)

Labels missing from one of the Series (here 'z' in column B) are filled with NaN.

3. From a List of Dictionaries

data = [
    {'Name': 'Alice', 'Age': 25},
    {'Name': 'Bob', 'Salary': 60000}
]

df = pd.DataFrame(data) # keys missing from a dictionary become NaN in that row

4. From a 2D NumPy Array

import numpy as np

array = np.array([[1, 2], [3, 4]])

df = pd.DataFrame(array, columns=['A', 'B'])

5. From a List of Tuples

data = [('Alice', 25), ('Bob', 30)]

df = pd.DataFrame(data, columns=['Name', 'Age'])

6. From External Sources

• CSV: pd.read_csv('file.csv')

• Excel: pd.read_excel('file.xlsx')

• SQL: pd.read_sql(query, connection)

✅ B. Common DataFrame Attributes

Attribute    Description                              Example
df.shape     Returns (rows, columns)                  (3, 3)
df.columns   Returns column labels as an Index        Index(['Name', 'Age', 'Salary'])
df.index     Returns row labels                       RangeIndex(start=0, stop=3, step=1)
df.dtypes    Returns the data types of each column    Name: object, Age: int64, ...
df.size      Total number of elements                 9 for 3x3 DataFrame
df.ndim      Number of dimensions (always 2)          2
df.values    NumPy array of all values                array([...])

The following are methods rather than attributes, but they are commonly used alongside them:

df.info()    Summary of index, columns, and data types
df.head(n)   First n rows (default 5)
df.tail(n)   Last n rows (default 5)
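
These attributes can be tried on the three-column DataFrame from creation method 1. This is a minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

print(df.shape)            # (3, 3)
print(df.size)             # 9
print(df.ndim)             # 2
print(list(df.columns))    # ['Name', 'Age', 'Salary']
print(df.head(2))          # first two rows
df.info()                  # prints its summary directly and returns None
```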

17. Discuss indexing, reindexing, and aligning of Pandas Series objects with examples

A. Indexing in Series

Series objects are like dictionaries; they map labels (indices) to data (values).

1. Positional Indexing

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.iloc[0]) # 10 (plain s[0] on a label-indexed Series is deprecated in recent pandas versions)

2. Label-based Indexing

print(s['b']) # 20

3. Slicing

print(s.iloc[1:]) # Uses position (end excluded)

print(s['a':'c']) # Uses label (inclusive of 'c')

4. Boolean Indexing

print(s[s > 15]) # Output: Series with values > 15

B. Reindexing Series

Reindexing conforms a Series to a new set of labels: new labels that were not in the original index get NaN, and original labels left out of the new index are dropped.

Example:

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Reindexing with a new index

s2 = s.reindex(['a', 'b', 'd'])

print(s2)

Output:

a 10.0

b 20.0

d NaN

You can fill in missing values:

s2 = s.reindex(['a', 'b', 'd'], fill_value=0)

✅ C. Aligning Series

Alignment happens automatically during arithmetic operations between Series with different
indexes.

Example:

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

print(s1 + s2)

Output:

a NaN

b 12.0

c 23.0

d NaN

• Only matching indices are summed.

• Non-matching indices result in NaN.

✅ Handling Missing Data After Alignment

You can use .add() with a fill_value:

s1.add(s2, fill_value=0)

Now, missing values are treated as zero:

a 1.0

b 12.0

c 23.0

d 30.0
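
Alignment can also be requested explicitly with Series.align(), which returns both Series reindexed to a common index. The sketch below uses the same s1 and s2 and compares the default outer join with an inner join:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Outer join (default): union of labels, gaps become NaN
left, right = s1.align(s2)
print(list(left.index))            # ['a', 'b', 'c', 'd']

# Inner join: only the labels present in both Series
left_in, right_in = s1.align(s2, join="inner")
total = left_in + right_in
print(total.to_dict())             # {'b': 12, 'c': 23}
```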

18. Compare the performance of NumPy's statistical operations with equivalent operations in
pure Python. Include examples of calculating mean, median, standard deviation,
correlation, and other statistical measures.

NumPy offers fast, vectorized operations using compiled C code, while pure Python uses interpreted
loops, which are slower and more verbose. Let’s compare both in terms of performance, simplicity,
and readability.

✅ A. Setup

import numpy as np

import statistics

import time

data = list(range(1, 1_000_001)) # 1 million numbers

np_data = np.array(data)

✅ B. Mean Calculation

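Pure Python:

start = time.perf_counter() # perf_counter() is better suited to short timings than time()

mean_py = sum(data) / len(data)

end = time.perf_counter()

print("Pure Python Mean:", mean_py, "Time:", end - start)

NumPy:

start = time.perf_counter()

mean_np = np.mean(np_data)

end = time.perf_counter()

print("NumPy Mean:", mean_np, "Time:", end - start)

🔹 NumPy is often roughly 10–50x faster for large datasets, though the exact ratio depends on the machine, the data type, and the NumPy version.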

C. Median Calculation

Pure Python:

median_py = statistics.median(data)

NumPy:

median_np = np.median(np_data)

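🔸 NumPy uses a quickselect-style partition (introselect, via np.partition) instead of a full sort and runs in compiled code, so it is much faster than statistics.median(), which sorts the data.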

D. Standard Deviation

Pure Python:

std_py = statistics.stdev(data) # sample std dev

NumPy:

std_np = np.std(np_data, ddof=1) # ddof=1 for sample std dev

NumPy is significantly faster, especially with millions of numbers.
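
A single timing with start/end timestamps is noisy, so the standard library's timeit module is a more reliable way to compare the two. The sketch below uses 100,000 values instead of one million only to keep the run short; no timing figures are quoted because they depend on the machine.

```python
import statistics
import timeit

import numpy as np

data = list(range(1, 100_001))     # smaller than 1 million to keep the run short
np_data = np.array(data, dtype=np.float64)

py_std = statistics.stdev(data)
np_std = np.std(np_data, ddof=1)
print(abs(py_std - np_std) < 1e-6)   # the two results agree

# timeit runs each statement several times; timings depend on the machine
t_py = timeit.timeit(lambda: statistics.stdev(data), number=3)
t_np = timeit.timeit(lambda: np.std(np_data, ddof=1), number=3)
print(f"statistics.stdev: {t_py:.4f} s, np.std: {t_np:.4f} s")
```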

E. Correlation Coefficient

Pure Python:
# Manually compute the Pearson correlation coefficient

def pearson_corr(x, y):
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denom = (sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)) ** 0.5
    return num / denom

corr_py = pearson_corr(data, data)

NumPy:

corr_np = np.corrcoef(np_data, np_data)[0, 1]

💡 NumPy is much faster and more concise. np.corrcoef() returns the full correlation matrix, so [0, 1] selects the coefficient between the two inputs.

F. Performance Summary

Operation       Pure Python (time)   NumPy (time)
Mean            Slower               Very Fast
Median          Slower (sorts)       Fast (Quickselect)
Std Deviation   Slower               Fast
Correlation     Verbose & slow       Fast & built-in

G. Conclusion

• Use NumPy for any statistical work on large or even moderately sized datasets.

19. Explain how to select, filter, and manipulate data using different indexing methods (loc,
iloc, at, iat) used in DataFrame creation in Pandas.

Pandas provides four main indexing methods to access data in DataFrames:

✅ A. loc[] – Label-based Indexing

Access rows and columns by labels (names, not positions).

df = pd.DataFrame({

'Name': ['Alice', 'Bob', 'Charlie'],

'Age': [25, 30, 35],

}, index=['a', 'b', 'c'])

df.loc['a'] # Row with label 'a'

df.loc['a', 'Age'] # Single value: Age of Alice

df.loc[:, 'Age'] # Entire Age column

🔹 Can also be used with boolean filtering:

df.loc[df['Age'] > 25]

✅ B. iloc[] – Position-based Indexing

Access rows and columns by integer position (like NumPy arrays).

df.iloc[0] # First row

df.iloc[0, 1] # First row, second column

df.iloc[:, 1] # All rows, second column

🔸 Useful when labels are unknown or not sequential.

✅ C. at[] – Fast Scalar Access (Label-based)

Access a single value using row and column labels (faster than loc[]).

df.at['a', 'Age'] # Faster than df.loc['a', 'Age']

✅ Best for scalar access when performance matters.

✅ D. iat[] – Fast Scalar Access (Position-based)

Access a single value using integer positions (like iloc, but faster).

df.iat[0, 1] # Age of first row (Alice)

✅ E. Comparison Summary

Method   Based On    Can Slice?   Fast?    Use Case
loc      Labels      ✅ Yes       Slow     Named rows/columns
iloc     Positions   ✅ Yes       Medium   Index-based access
at       Labels      ❌ No        ✅✅     Fast access to one value (label)
iat      Positions   ❌ No        ✅✅     Fast access to one value (pos)

✅ F. Filtering Example with loc

# Select people older than 25

df.loc[df['Age'] > 25]

✅ G. Updating Data

df.loc['a', 'Age'] = 26 # Using label

df.iloc[0, 1] = 27 # Using position

df.at['a', 'Age'] = 28 # Fast single update

df.iat[0, 1] = 29 # Fastest single update
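
Selection, filtering, and updating can be combined in a single loc call. The sketch below rebuilds the same small DataFrame so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

# Filter rows and select columns in one step
older = df.loc[df["Age"] > 25, ["Name"]]
print(list(older["Name"]))        # ['Bob', 'Charlie']

# Conditional update through loc (avoids chained indexing)
df.loc[df["Name"] == "Alice", "Age"] = 26

# Scalar reads: label-based and position-based give the same cell
print(df.at["a", "Age"], df.iat[0, 1])   # 26 26
```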
