Data Analysis and Visualization LAB

The document outlines a Data Analysis and Visualization Lab course that focuses on using NumPy and Pandas for data manipulation and analysis. It includes course objectives, outcomes, a CO-PO articulation matrix, lab experiments, required tools, and preferred textbooks. Students will learn to create, manipulate, and visualize data through various programming techniques and libraries.

Uploaded by shobitha
Copyright © All Rights Reserved

22CSCxx

DATA ANALYSIS AND VISUALIZATION LAB


Instruction 2 Hours per week
Duration of SEE - Hours
SEE 50 Marks
CIE 50 Marks
Credits

Prerequisite: Python Programming

Course Objectives:
1. To introduce the fundamental concepts of NumPy for efficient array creation, manipulation, and
broadcasting.
2. To develop proficiency in performing numerical computations using universal and aggregation
functions in NumPy.
3. To familiarize students with structured arrays and compound data types for managing heterogeneous
data.
4. To enable students to handle and analyze data using Pandas structures such as Series and DataFrames.
5. To equip students with skills to merge, group, and visualize data using advanced Pandas functionalities
and perform basic inferential statistical analysis.
Course Outcomes:
1. Describe and differentiate between 1D, 2D, and 3D arrays and their attributes such as shape, size, and
data type using NumPy.
2. Implement NumPy functions to perform slicing, reshaping, broadcasting, and array-based computations
on structured and unstructured data.
3. Analyze and manipulate tabular datasets using Pandas through filtering, indexing, and hierarchical operations.
4. Evaluate and combine datasets using merge, join, and group-based aggregation methods; summarize
data using pivot tables.
5. Design and simulate data analysis workflows involving string processing, time series analysis, and
statistical testing using NumPy and Pandas.
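
As an illustration of the array concepts named in Outcomes 1 and 2, the following is a minimal sketch rather than a prescribed solution. The array values, the variable names, and the fields of the structured array (name, age, weight) are hypothetical and chosen only for demonstration.

```python
import numpy as np

# Hypothetical sample arrays; the values are arbitrary.
a1 = np.arange(6)                     # 1D array, shape (6,)
a2 = a1.reshape(2, 3)                 # 2D array, shape (2, 3)
a3 = np.arange(24).reshape(2, 3, 4)   # 3D array, shape (2, 3, 4)

# Inspect the attributes of each array.
for name, arr in [("a1", a1), ("a2", a2), ("a3", a3)]:
    print(name, arr.ndim, arr.shape, arr.size, arr.dtype)

# Slicing: the second row of a2, and the last column of each 2D block in a3.
row = a2[1, :]
last_col = a3[:, :, -1]               # shape (2, 3)

# Broadcasting: the (3,) vector is stretched across both rows of the (2, 3) array.
offsets = np.array([10, 20, 30])
shifted = a2 + offsets

# Structured array with a compound data type (hypothetical fields).
people = np.array(
    [("Asha", 21, 55.5), ("Ravi", 23, 68.0)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)
mean_age = people["age"].mean()       # field-based access
```

Here reshape() returns a view with a new shape where possible, so a1 and a2 share the same underlying data.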

CO-PO Articulation Matrix

CO    PO1   PO2   PO3   PO4   PO5   PO6   PO7   PO8   PO9   PO10  PO11  PSO1  PSO2  PSO3
CO 1  2     1     -     -     2     -     -     -     -     -     -     1     1     -
CO 2  1     1     -     -     1     -     -     -     -     -     -     1     1     -
CO 3  2     1     -     -     -     -     -     -     -     -     -     1     1     -
CO 4  1     1     -     1     1     -     -     -     -     -     -     1     2     -
CO 5  1     1     -     2     1     -     -     -     -     -     -     1     2     -

Lab Experiments:

1. Create and manipulate 1D, 2D, and 3D arrays using NumPy. Inspect array attributes (e.g., shape, size,
dtype) and demonstrate slicing, reshaping, and broadcasting operations.
2. Apply NumPy universal functions (e.g., np.sin(), np.exp()), aggregation functions (e.g., np.sum(),
np.mean()), Boolean indexing, and fancy indexing. Perform sorting using np.sort() and np.argsort().
3. Create structured arrays using NumPy with compound data types and perform field-based data access
and manipulation.
4. Create Pandas Series and DataFrames. Perform data selection (e.g., loc, iloc) and filtering (e.g.,
Boolean masks), and handle missing values using isnull(), fillna(), and dropna().
5. Perform arithmetic operations on Series and DataFrames. Implement hierarchical indexing,
stack/unstack operations, and demonstrate index alignment in operations.
6. Combine datasets using merge(), join(), and concat(). Demonstrate different join strategies (e.g., inner,
outer, left, right). Note: Avoid the append() method, which was deprecated in pandas 1.4 and removed in
pandas 2.0; use concat() instead.
7. Perform group-based aggregation using groupby() and construct pivot tables for summarizing
multidimensional data.
8. Apply string operations using the .str accessor (e.g., lower(), split()). Work with time-indexed data by
parsing dates, resampling, and applying frequency conversions (e.g., daily to monthly).
9. Use Pandas query() and eval() functions for high-performance filtering and computation.
10. Perform time zone localization, conversion, and arithmetic operations with time zone-aware timestamp
objects.
11. Use a dataset with irregular timestamps (e.g., stock prices or sensor readings). Convert the data into
different frequencies using aggregations. Apply a rolling mean to reduce noise and highlight trends. Plot
the original, resampled, and smoothed data using Matplotlib/Seaborn to observe differences.
12. Create visualizations using Matplotlib and Seaborn, including line plots, bar charts, scatter plots,
histograms, kernel density estimates (KDEs), pair plots, violin plots, and heatmaps.
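
The following is a minimal sketch of how a few of the experiments above (4, 6, 7, and 11) might be approached; it is one possible starting point, not a model answer. The sales and stores tables, their column names, and the timestamped readings are hypothetical data invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "month": ["Jan", "Feb", "Jan", "Feb", "Jan"],
    "units": [10, 15, 7, np.nan, 4],
})
stores = pd.DataFrame({"store": ["A", "B", "D"],
                       "city": ["Pune", "Delhi", "Goa"]})

# Experiment 4: handle the missing value.
sales["units"] = sales["units"].fillna(0)

# Experiment 6: compare join strategies; stack rows with concat(), not append().
inner = sales.merge(stores, on="store", how="inner")   # only stores A and B
outer = sales.merge(stores, on="store", how="outer")   # keeps C and D as well
stacked = pd.concat([sales, sales.head(1)], ignore_index=True)

# Experiment 7: group-based aggregation and a pivot table.
totals = sales.groupby("store")["units"].sum()
pivot = sales.pivot_table(index="store", columns="month",
                          values="units", aggfunc="sum")

# Experiment 11: irregular timestamps -> daily means -> rolling mean.
readings = pd.Series(
    [1.0, 3.0, 2.0, 6.0, 4.0],
    index=pd.to_datetime([
        "2024-01-01 09:15", "2024-01-01 17:40", "2024-01-02 08:05",
        "2024-01-04 12:30", "2024-01-04 23:59",
    ]),
)
daily = readings.resample("D").mean()                  # 2024-01-03 has no data
smooth = daily.rolling(window=2, min_periods=1).mean()
```

Setting min_periods=1 lets the rolling mean produce a value even when one of the two days in the window is empty. The three series (readings, daily, smooth) could then be drawn on one set of axes with their plot() methods to compare the original, resampled, and smoothed data.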

Tools Required
1. Python (3.7+)
2. Jupyter Notebook or Google Colab
3. Libraries: NumPy, Pandas, Matplotlib, Seaborn, SciPy, statsmodels

Preferred Textbooks
1. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly Media, 2016.
2. Samir Madhavan, “Mastering Python for Data Science”, Packt Publishing, 2015.

References

1. Wes McKinney, “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”,
O’Reilly Media, 2018.
