
Data Analysis and Visualization-Theory -R22A

The document outlines a course on Data Analysis and Visualization for Computer Science and Engineering students, focusing on the Python libraries NumPy, Pandas, and Matplotlib. It details course objectives, outcomes, and a curriculum divided into five units covering topics from data manipulation to visualization techniques. It also lists recommended textbooks and online resources for further learning.

Uploaded by

shobitha

BE-COMPUTER SCIENCE AND ENGINEERING

22CSCxx
DATA ANALYSIS AND VISUALIZATION

Instruction        3 Hours per week
Duration of SEE    3 Hours
SEE                60 Marks
CIE                40 Marks
Credits            3

Prerequisite: Python Programming

Course Objectives: This course aims to:


1. Introduce the NumPy library in Python to support storage of and operations on large multidimensional arrays and matrices.
2. Introduce a large collection of mathematical functions that operate on multidimensional sequential data structures.
3. Demonstrate the functionality of the Pandas library in Python for open-source data analysis and manipulation.
4. Demonstrate Data Aggregation, Grouping and Time Series analysis with Pandas.
5. Introduce the Matplotlib library in Python for creating static, animated and interactive visualizations.

Course Outcomes: Upon completion of this course, students will be able to:
1. Create, manipulate, and analyze numerical data using NumPy arrays and associated functions.
2. Perform various preprocessing operations on datasets using Pandas Series and DataFrame objects.
3. Combine and manipulate complex datasets using a variety of Pandas techniques, including concatenation, merging, grouping, aggregation, and time series analysis.
4. Apply inferential statistics to analyze data and draw valid conclusions about populations, based on hypothesis testing, confidence intervals, and correlation analysis.
5. Create and interpret different types of data visualizations using Matplotlib and Seaborn.

CO-PO Articulation Matrix


CO    PO1   PO2   PO3   PO4   PO5   PO6   PO7   PO8   PO9   PO10  PO11  PSO1  PSO2  PSO3
CO 1  2     -     -     -     -     -     -     -     -     -     -     1     1     -
CO 2  3     2     -     1     1     -     -     -     -     -     -     1     1     -
CO 3  3     1     -     3     1     -     -     -     -     -     -     2     2     2
CO 4  3     2     1     3     1     -     -     -     -     -     -     2     2     -
CO 5  2     2     -     2     1     -     -     -     -     -     -     2     1     2

UNIT - I
Introduction to NumPy: Data types in Python - Fixed type arrays, creating arrays, array indexing, array slicing,
reshaping arrays, array concatenation and splitting, Universal Functions, Aggregations, Broadcasting rules,
Comparisons, Boolean Arrays, Masks, Fancy Indexing, Fast Sorting using np.sort and np.argsort, partial sorting,
Creating Structured Arrays, Compound types and Record Arrays.
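
A minimal sketch touching several Unit I topics, assuming NumPy is installed. The array values and the names in the structured array are made up for illustration and are not taken from the syllabus.

```python
import numpy as np

# Fixed-type array: every element shares one dtype (creating + reshaping).
a = np.arange(12, dtype=np.int64).reshape(3, 4)

# Indexing and slicing.
first_row = a[0]
sub = a[:2, 1:3]                      # rows 0-1, columns 1-2

# Concatenation and splitting.
stacked = np.concatenate([a, a], axis=0)      # shape (6, 4)
top, bottom = np.split(stacked, 2, axis=0)    # two (3, 4) halves

# Aggregation and broadcasting: a (3, 4) array minus a (4,) array.
col_means = a.mean(axis=0)
centered = a - col_means

# Comparison, Boolean mask, fancy indexing.
mask = a % 2 == 0
evens = a[mask]
picked = a[[0, 2], [1, 3]]            # elements at (0, 1) and (2, 3)

# Sorting: full sort, argsort, and partial sort with np.partition.
x = np.array([7, 2, 9, 4])
sorted_x = np.sort(x)
order = np.argsort(x)                 # indices that would sort x
smallest_two = np.partition(x, 2)[:2] # two smallest, in no guaranteed order

# Structured array with a compound dtype (hypothetical records).
people = np.zeros(2, dtype=[("name", "U10"), ("age", "i4")])
people["name"] = ["Asha", "Ravi"]
people["age"] = [21, 23]
```

Structured arrays keep heterogeneous fields in one NumPy buffer; for most tabular work the Pandas DataFrame covered in Unit II is the more convenient choice.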

UNIT - II
Introduction to Pandas: Series Object, DataFrame Object, Data Indexing and Selection for Series and
DataFrames, Universal Functions: Index Preservation, Index Alignment, and Operations between Series and
DataFrames, Handling missing data, operating on Null values, Hierarchical Indexing.
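
A short sketch of some Unit II ideas, assuming Pandas and NumPy are installed. All labels and values here are hypothetical and chosen only to make index alignment and missing data visible.

```python
import numpy as np
import pandas as pd

# Two Series with partly overlapping labels.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Index alignment: values are matched by label; unmatched labels give NaN.
total = s1 + s2
filled = s1.add(s2, fill_value=0)     # treat the missing side as 0

# A NumPy ufunc applied to a Series preserves its index.
roots = np.sqrt(s1)

# Selection by label (loc) and by position (iloc).
df = pd.DataFrame(
    {"x": [1.0, np.nan, 3.0], "y": [4.0, 5.0, np.nan]},
    index=["r1", "r2", "r3"],
)
by_label = df.loc["r1", "x"]
by_position = df.iloc[0, 1]

# Handling missing data.
n_missing = df.isnull().sum().sum()
dropped = df.dropna()                 # keep only complete rows
imputed = df.fillna(0)                # replace NaN with 0

# Hierarchical (multi-level) indexing.
idx = pd.MultiIndex.from_product(
    [["2023", "2024"], ["A", "B"]], names=["year", "section"]
)
marks = pd.Series([70, 65, 80, 75], index=idx)
year_2024 = marks.loc["2024"]         # Series indexed by section
```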

CHAITANYA BHARATHI INSTITUTE OF TECHNOLOGY(A)



UNIT - III
Combining Datasets: Concat, Append, Merge and Joins, Aggregation and Grouping, Pivot Tables, Vectorized
String Operations, High-Performance functions - query() and eval()
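
The Unit III operations can be sketched as follows, assuming Pandas is installed. The tables and column names are hypothetical. The sketch uses pd.concat for row-wise appending because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical tables.
students = pd.DataFrame({"id": [1, 2, 3], "name": ["asha", "ravi", "meena"]})
more = pd.DataFrame({"id": [4], "name": ["john"]})
scores = pd.DataFrame({
    "id": [1, 1, 2, 3, 4],
    "subject": ["math", "cs", "math", "cs", "math"],
    "score": [80, 90, 70, 60, 50],
})

# Concatenate rows, then merge (join) on the shared key.
all_students = pd.concat([students, more], ignore_index=True)
merged = all_students.merge(scores, on="id", how="inner")

# Grouping and aggregation.
mean_by_subject = merged.groupby("subject")["score"].mean()

# Pivot table: one row per student, one column per subject.
table = merged.pivot_table(
    values="score", index="name", columns="subject", aggfunc="mean"
)

# Vectorized string operation on a whole column.
merged["name"] = merged["name"].str.capitalize()

# query() filters rows; eval() computes a new column from an expression.
high = merged.query("score >= 70")
merged = merged.eval("bonus = score + 5")
```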

UNIT - IV
Time Series: Date and Time Data Types and Tools, Time Series Basics, Date Ranges, Frequencies, and
Shifting, Time Zone Handling, Time Zone Localization and Conversion, Operations with Time Zone-Aware
Timestamp Objects, Operations Between Different Time Zones, Periods and Period Arithmetic, Resampling
and Frequency Conversion, Moving Window Functions.
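
A brief sketch of the Unit IV topics, assuming Pandas and NumPy are installed along with time zone data. The dates, values, and the choice of the Asia/Kolkata zone are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Date range with a daily frequency, used as a time series index.
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series(np.arange(6, dtype=float), index=idx)

# Shifting: values move one step later; the first slot becomes NaN.
shifted = ts.shift(1)

# Time zone localization and conversion.
utc = ts.tz_localize("UTC")
india = utc.tz_convert("Asia/Kolkata")

# Operating on series in different zones aligns them on the same instants.
combined = utc + india

# Periods and period arithmetic.
p = pd.Period("2024-01", freq="M")
next_month = p + 1

# Resampling to a lower frequency and a moving window function.
two_day_sum = ts.resample("2D").sum()
rolling_mean = ts.rolling(window=3).mean()
```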

UNIT - V
Visualization with Matplotlib: Simple Line plots, Scatter plots, Visualizing errors, Density and Contour plots,
Histograms, Binning, Multiple subplots, Three-dimensional plotting with Matplotlib, Geographic data with
Basemap, Visualization with Seaborn.
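
A small sketch covering four of the Unit V plot types on one figure, assuming Matplotlib and NumPy are installed. The data is synthetic, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)
samples = rng.normal(size=500)

# Multiple subplots on a 2 x 2 grid.
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Simple line plot.
axes[0, 0].plot(x, y)
axes[0, 0].set_title("Line plot")

# Scatter plot of noisy observations.
axes[0, 1].scatter(x, y + rng.normal(scale=0.1, size=x.size), s=10)
axes[0, 1].set_title("Scatter plot")

# Visualizing errors with error bars on every fifth point.
axes[1, 0].errorbar(x[::5], y[::5], yerr=0.2, fmt="o")
axes[1, 0].set_title("Error bars")

# Histogram with 20 bins.
counts, bins, _ = axes[1, 1].hist(samples, bins=20)
axes[1, 1].set_title("Histogram")

fig.tight_layout()
```

To view the result, call fig.savefig with a file name of your choice, or switch to an interactive backend and call plt.show().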

Text Books:
1. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly Media, 2016.
2. Wes McKinney, “Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter”, 3rd
Edition, O’Reilly Media, 2022.

Suggested Reading:
1. Samir Madhavan, “Mastering Python for Data Science”, Packt Publishing, 2015.

Online Resources:
1. https://numpy.org/doc/stable/user/index.html
2. https://pandas.pydata.org/
3. https://matplotlib.org/
4. https://seaborn.pydata.org/tutorial.html
5. https://www.coursera.org/learn/data-analysis-with-python
