FIN_FODS

The document outlines the vision and mission of an ISO 9001:2015 certified engineering institution affiliated with Anna University, Chennai, focusing on academic excellence, industry readiness, and innovation. It details the objectives and outcomes of the Computer Science and Engineering program, including educational goals and course specifics for Foundations of Data Science. Additionally, it includes a list of students enrolled in the program and various academic resources such as timetables and lesson plans.

An ISO 9001:2015 Certified Institution,
Approved by AICTE, New Delhi
& Affiliated to Anna University, Chennai

INSTITUTION
VISION AND MISSION

COLLEGE VISION AND MISSION


VISION
Our vision is to become the most preferred engineering college in Tamil Nadu by creating a student-centric ecosystem that fosters excellence.

MISSION
 Academic Excellence

 Industry Readiness

 Industry Collaboration

 Quality Accreditation

 Innovation Ecosystem

DEPARTMENT
VISION AND MISSION

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

VISION
“Empower minds through technology, transforming possibilities into realities” embodies our commitment to leveraging the power of technology to unlock the potential of individuals and communities. We believe in empowering minds by providing access to cutting-edge tools, knowledge, and resources that enable individuals to innovate, create, and achieve their goals.

MISSION
 Unleashing potential: Our educational approach is designed to uncover and unleash this potential, providing students with the tools, resources and opportunities needed to excel and succeed.
 Inspiring innovation: By fostering a culture of innovation, we empower students to become agents of change and develop groundbreaking solutions to real-world problems.
 Driving excellence in education: Through rigorous academic standards, personalized support, and a focus on continuous improvement, we strive to equip our students with the knowledge, skills, and values needed to excel in an ever-changing world.

PROGRAM EDUCATIONAL OBJECTIVES (PEOs),
PROGRAM SPECIFIC OUTCOMES (PSOs) AND
PROGRAM OUTCOMES (POs)

ANNA UNIVERSITY, CHENNAI


NON - AUTONOMOUS AFFILIATED COLLEGES
REGULATIONS 2021
CHOICE BASED CREDIT SYSTEM
B.E. COMPUTER SCIENCE AND ENGINEERING

I. PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

Graduates can
 Apply their technical competence in computer science to solve real
world problems, with technical and people leadership.
 Conduct cutting edge research and develop solutions on problems of
social relevance.
 Work in a business environment, exhibiting team skills, work ethics,
adaptability and lifelong learning.

II. PROGRAM SPECIFIC OUTCOMES (PSOs)

The students will be able to
 Exhibit design and programming skills to build and automate business solutions using cutting-edge technologies.
 Demonstrate a strong theoretical foundation leading to excellence and excitement towards research, to provide elegant solutions to complex problems.
 Work effectively with various engineering fields as a team to design, build and develop system applications.

III. PROGRAM OUTCOMES (POs)



1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.

9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.

CLASS
TIMETABLE

INDIVIDUAL
TIMETABLE

COURSE OUTCOMES, COURSE OBJECTIVES AND SYLLABUS

CS3352 FOUNDATIONS OF DATA SCIENCE    L T P C    3 0 0 3

COURSE OBJECTIVES:
 To understand the data science fundamentals and process.
 To learn to describe the data for the data science process.
 To learn to describe the relationship between data.
 To utilize the Python libraries for Data Wrangling.
 To present and interpret data using visualization libraries in Python.

SYLLABUS:

UNIT I INTRODUCTION 9
Data Science: Benefits and uses – facets of data - Data Science Process:
Overview – Defining research goals – Retrieving data – Data preparation -
Exploratory Data analysis – build the model– presenting findings and building
applications - Data Mining - Data Warehousing – Basic Statistical descriptions
of Data
UNIT II DESCRIBING DATA 9
Types of Data - Types of Variables -Describing Data with Tables and Graphs –
Describing Data with Averages - Describing Variability - Normal Distributions
and Standard (z) Scores
UNIT III DESCRIBING RELATIONSHIPS 9
Correlation –Scatter plots –correlation coefficient for quantitative data –
computational formula for correlation coefficient – Regression –regression line
–least squares regression line – Standard error of estimate – interpretation of r2
–multiple regression equations –regression towards the mean
UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING 9
Basics of NumPy arrays – aggregations – computations on arrays – comparisons, masks, boolean logic – fancy indexing – structured arrays – Data manipulation with Pandas – data indexing and selection – operating on data – missing data – Hierarchical indexing – combining datasets – aggregation and grouping – pivot tables
UNIT V DATA VISUALIZATION 9
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density
and contour plots – Histograms – legends – colors – subplots – text and
annotation – customization – three dimensional plotting - Geographic Data with
Basemap - Visualization with Seaborn

COURSE OUTCOMES
At the end of this course, the students will be able to:
CO1: Define the data science process
CO2: Understand different types of data description for data science process
CO3: Gain knowledge on relationships between data
CO4: Use the Python Libraries for Data Wrangling
CO5: Apply visualization Libraries in Python to interpret and explore data
TOTAL: 45 PERIODS
TEXT BOOKS
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning Publications, 2016. (Unit I)
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017. (Units II and III)
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Units IV and V)
REFERENCES:
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”,
Green Tea Press,2014

CO – PO MAPPING

CO's   PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12   PSO1 PSO2 PSO3
CO1     2   2   1   2   2   -   -   -   1   1    1    2      2    2    2
CO2     2   1   -   1   1   -   -   -   2   1    1    2      2    3    1
CO3     2   2   1   2   2   1   1   -   1   2    1    3      2    2    3
CO4     3   2   2   1   2   -   -   -   1   1    2    2      3    3    2
CO5     2   2   1   2   2   -   -   -   1   1    1    2      2    2    2
Avg.    2   2   1   2   2   1   1   -   1   1    1    2      2    2    2

1 - low, 2 - medium, 3 - high, '-' - no correlation



STUDENT
NAME LIST

STUDENT NAME LIST

Reg No Name

513523104047 NADEESH A P
513523104048 NASREEN BANU M
513523104049 NAVINKUMARAN P
513523104050 NAVYA D
513523104051 NIRANJAN G
513523104052 NIWIN RAJ J
513523104053 POOMA R
513523104054 POORANI A
513523104055 PRATHAP T
513523104056 PRATHIPA K
513523104057 PREETHA ZEN S
513523104058 PRIYADHARSHINI A
513523104059 PRIYADHARSHINI R
513523104060 PRIYANKA S
513523104061 RAGHUL RAJ S
513523104062 RAKSHITHA S
513523104063 RASIGAPRIYA K
513523104064 ROOPAN K
513523104065 RUTHIRAN C

513523104066 SAARIYA KOWNEN K



513523104067 SAKTHIVEL K
513523104068 SALEEM AHMED A
513523104069 SANGEETHA P
513523104070 SANJEEVAN SURYA KUMAR
513523104071 SATHISH KUMAR D
513523104072 SATHYA BAMA S P
513523104073 SHALINI V
513523104074 SHARANYA M
513523104075 SHREE DEVI V G
513523104076 SIVA KUMAR T
513523104077 SONIYA S
513523104078 SUBASHINI T
513523104079 SUGAVANAN A
513523104080 SUJAN U
513523104081 SULTHAN BABU E A
513523104082 TAMILSELVAN P
513523104083 THARUN KUMAR M
513523104084 THULASI DEVI S
513523104085 VARUN MOHAN M D
513523104086 VENKATESAN S
513523104087 VIDHYA J
513523104088 VIGNESHWARAN B
513523104089 YUVASHRI E
513523104090 YUVASHRI P
513523104301 DEVISRI S

513523104302 DHANA SHREE S


513523104303 GOKUL M

DEPARTMENT
ACADEMIC CALENDAR

LESSON PLAN

FIRST SERIES

TIMETABLE

QUESTION PAPER

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


FOUNDATIONS OF DATA SCIENCE
PART-A
1. What is Data Science?
2. What are the benefits of Data Science?
3. What is Exploratory Data Analysis (EDA)?
4. Define Data Mining.
5. What is the purpose of data preparation in the Data Science
Process?
6. What are Types of Variables?
7. What is Data Warehousing?
8. What is meant by 'Standard Deviation'?
9. What is a 'z-Score'?
10. What are the main types of data?
PART-B
11. Explain the Data Science Process with its stages.
12. Describe the importance of data preparation in the Data Science process.
13. What are the basic statistical descriptions of data? Explain the key measures.
14. What is the role of Data Mining in Data Science?
15. Explain the different types of data and their importance in data analysis.
PART-C

16. Discuss the process of describing data with averages, variability, and distributions. Include normal distributions and z-scores.

ANSWER KEY

PART-A

1. What is Data Science?


Data Science is an interdisciplinary field that combines methods from statistics,
computer science, and domain-specific knowledge to extract insights and make
data-driven decisions from structured and unstructured data.

2. What are the benefits of Data Science?


Data Science enables improved decision-making, optimization of business
processes, identification of trends, better customer experiences, and the ability
to innovate by analyzing large datasets for valuable insights.

3. What is Exploratory Data Analysis (EDA)?


EDA is the process of analyzing and visualizing data to summarize its main
characteristics, identify patterns, detect anomalies, and test assumptions, often
using graphs and summary statistics.

4. Define Data Mining.


Data Mining is the process of discovering patterns, correlations, and useful
information from large datasets using techniques like classification, clustering,
and association rule learning.

5. What is the purpose of data preparation in the Data Science Process?


Data preparation ensures that raw data is cleaned, transformed, and structured
properly, making it suitable for analysis by handling missing values,
inconsistencies, and outliers.

6. What are Types of Variables?


Variables are classified as either quantitative (numerical) or qualitative
(categorical). Quantitative variables are measurable (e.g., height, weight), while
qualitative variables are categorical (e.g., color, gender).

7. What is Data Warehousing?


Data Warehousing is the process of collecting, storing, and managing data from
different sources in a central repository for the purpose of reporting and data
analysis.

8. What is meant by 'Standard Deviation'?


Standard deviation is a measure of how much the data points deviate from the
mean. It quantifies the spread or dispersion of data in a dataset.
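As a quick illustration (a sketch with made-up sample values), the standard deviation can be computed with Python's standard library:

```python
import statistics

data = [4, 8, 6, 5, 3, 7]  # made-up sample values

mu = statistics.mean(data)       # central value: 5.5
sigma = statistics.pstdev(data)  # population standard deviation: spread around the mean

print(mu, round(sigma, 3))
```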

9. What is a 'z-Score'?
A z-score indicates how many standard deviations a data point is away from the
mean of a dataset. It is used to determine if a value is unusual in a given
distribution.

10. What are the main types of data?


The main types of data are:
 Quantitative (Numerical) Data: Data that can be measured and expressed numerically.
 Qualitative (Categorical) Data: Data that represents categories or labels.

PART-B

11. Explain the Data Science Process with its stages.

The Data Science process is a systematic approach that involves several key
stages, each crucial for extracting meaningful insights from data. These stages
include:

1. Defining Research Goals:
The first step in the Data Science process is to define the research problem or the business question. Clear objectives help guide the entire analysis and ensure that the right questions are being addressed.
2. Retrieving Data:
In this stage, relevant data is collected from various sources, which may
include databases, APIs, spreadsheets, or external datasets. This data can
be structured (e.g., tables) or unstructured (e.g., text, images).
3. Data Preparation:
This stage involves cleaning the data to ensure that it is accurate,
consistent, and ready for analysis. It includes handling missing values,
correcting errors, transforming variables, and filtering irrelevant data.
4. Exploratory Data Analysis (EDA):
EDA involves the initial analysis of the data to summarize its main
characteristics. This stage typically uses visualizations and basic
statistical methods to understand the distribution, identify patterns, and
detect outliers or anomalies.
5. Building the Model:
After understanding the data, appropriate machine learning or statistical
models are selected. These models are then trained using the data to
predict outcomes or discover patterns. Common algorithms include
regression, classification, and clustering.
6. Presenting Findings and Building Applications:
Once the models are built, the findings are presented to stakeholders
using visualizations, reports, and dashboards. If necessary, the models are
deployed into applications or integrated into decision-making processes
to improve business operations.
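The stages above can be sketched end-to-end in a few lines of pandas; this is only an illustration, and the data, column names, and research goal are all invented:

```python
import pandas as pd

# 1. Research goal (hypothetical): which product line sells better on average?
# 2. Retrieve data: built inline here; in practice it comes from files, DBs, or APIs
raw = pd.DataFrame({
    "line": ["A", "A", "B", "B", "B"],
    "sale": [100, None, 80, 120, 130],
})

# 3. Data preparation: drop the row with the missing sale value
clean = raw.dropna(subset=["sale"])

# 4. Exploratory analysis: a basic statistical description per group
summary = clean.groupby("line")["sale"].mean()

# 5./6. The "model" here is just a comparison; findings would then be presented
best = summary.idxmax()
print(best, summary.to_dict())
```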

12. Describe the importance of data preparation in the Data Science process.

Data preparation is a critical stage in the Data Science process as it directly impacts the quality and effectiveness of the analysis. The importance of data preparation can be understood through the following points:

1. Ensures Data Quality:


Data often contains errors, missing values, or inconsistencies. Cleaning
the data ensures that it is accurate and reliable, which is crucial for
building models that yield meaningful results.

2. Handling Missing Data:


Missing data can skew results and reduce the accuracy of models. Data
preparation involves addressing these gaps by techniques such as
imputation or removing incomplete data.
3. Data Transformation:
Raw data often needs to be transformed into a usable format. This
includes normalization, encoding categorical variables, and scaling
numerical data to ensure that it can be processed effectively by machine
learning algorithms.
4. Identifying Outliers:
Outliers can distort the analysis and lead to incorrect conclusions. Data
preparation helps identify and handle these outliers, either by removing or
adjusting them based on the context.
5. Ensuring Consistency:
Data collected from multiple sources may not be consistent in terms of
format or scale. Standardizing and normalizing the data ensures that all
variables are compatible and comparable.

Without proper data preparation, the accuracy and reliability of the models
would be compromised, leading to suboptimal results and misguided decisions.
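A minimal pandas sketch of the points above (the column and its values are hypothetical):

```python
import pandas as pd

# A toy column with one missing value and one implausible outlier
df = pd.DataFrame({"age": [25, None, 31, 29, 300]})

# Handle missing data: impute with the median, which is robust to the outlier
median_age = df["age"].median()
df["age"] = df["age"].fillna(median_age)

# Identify outliers with a simple domain rule: ages outside 0-120 are dropped
cleaned = df[df["age"].between(0, 120)]

print(median_age, cleaned["age"].tolist())
```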

13. What are the basic statistical descriptions of data? Explain the key
measures.

Statistical descriptions are used to summarize and describe the features of a dataset. The key statistical measures used to describe data include:

1. Measures of Central Tendency:


These measures indicate the center or typical value of a dataset:
o Mean: The average of all data points, calculated by summing all
values and dividing by the number of observations.
o Median: The middle value of the dataset when sorted in ascending
order. It is useful for skewed distributions.
o Mode: The most frequent value in the dataset. A dataset can have
multiple modes or none at all.
2. Measures of Dispersion:
These measures describe the spread or variability of the data:

o Range: The difference between the maximum and minimum values in the dataset.
o Variance: The average squared deviation of each data point from
the mean, indicating how spread out the data is.
o Standard Deviation: The square root of variance, it gives a more
interpretable measure of spread, representing the typical deviation
from the mean.
3. Shape of the Distribution:
o Skewness: Measures the asymmetry of the data. A positive skew
indicates a longer right tail, while a negative skew indicates a
longer left tail.
o Kurtosis: Measures the "peakedness" of the data. High kurtosis
indicates a sharp peak, while low kurtosis indicates a flatter
distribution.
4. Percentiles and Quartiles:
o These measures divide the data into parts. The median is the 50th
percentile, and the quartiles divide the data into four equal parts.

These statistical descriptions help summarize the data, providing insights into
its central tendency, variability, and overall distribution, making it easier to
understand and model.
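All of these measures can be computed directly with Python's statistics module; the exam scores below are invented for illustration:

```python
import statistics

scores = [55, 60, 60, 70, 85]  # invented sample

central = {
    "mean": statistics.mean(scores),      # (55+60+60+70+85)/5 = 66
    "median": statistics.median(scores),  # middle sorted value: 60
    "mode": statistics.mode(scores),      # most frequent value: 60
}
spread = {
    "range": max(scores) - min(scores),        # 85 - 55 = 30
    "variance": statistics.pvariance(scores),  # average squared deviation
    "stdev": statistics.pstdev(scores),        # square root of the variance
}
print(central, spread)
```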

14. What is the role of Data Mining in Data Science?

Data Mining plays a crucial role in Data Science by applying algorithms and
techniques to explore and analyze large datasets for hidden patterns,
correlations, and trends. The role of Data Mining can be described as:

1. Pattern Recognition:
Data mining helps identify patterns or relationships that are not
immediately obvious. For example, it can identify customer behavior
trends in e-commerce or detect fraud patterns in financial transactions.
2. Predictive Modeling:
Using historical data, data mining can build predictive models. These
models forecast future events based on existing data, such as predicting
customer churn or sales trends.
3. Classification and Clustering:

o Classification: Data mining classifies data into predefined categories, such as spam vs. non-spam emails.
o Clustering: It groups similar data points together, uncovering
natural clusters within the data, which is useful for customer
segmentation or anomaly detection.
4. Association Rule Learning:
This technique discovers relationships between variables. A common
application is market basket analysis, where data mining identifies which
products are often purchased together.
5. Improving Decision-Making:
Data mining supports data-driven decision-making by providing insights
into the underlying trends and patterns. This can help businesses optimize
operations, improve marketing strategies, and enhance customer
experience.

By applying data mining techniques, Data Science can extract valuable insights
from large datasets, improving the decision-making process and driving
business growth.
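As a tiny illustration of the association-rule idea (market basket analysis), pair co-occurrences can be counted in pure Python; the transactions below are invented:

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

# Count how often each pair of items appears in the same basket (the "support" count)
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

A real data-mining library would also compute confidence and lift for each rule; counting joint occurrences like this is only the first step.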

15. Explain the different types of data and their importance in data
analysis.

In Data Science, understanding the different types of data is crucial because it determines the methods and techniques used for analysis. The main types of data are:

1. Nominal Data (Categorical Data):


o Definition: Data that represents categories or labels without any
order or ranking (e.g., gender, color, country).
o Importance: Nominal data is important for grouping and counting,
and techniques like bar charts or frequency tables are used for
analysis.
2. Ordinal Data:
o Definition: Data that represents categories with a defined order but
without a specific distance between them (e.g., customer
satisfaction levels, ranking scales).

o Importance: Ordinal data helps identify relationships and trends, and techniques like median or mode are useful for summarizing this type of data.
3. Interval Data:
o Definition: Numerical data with equal intervals between values but
no true zero point (e.g., temperature in Celsius or Fahrenheit).
o Importance: Interval data allows for meaningful addition and
subtraction, and measures like mean and standard deviation can be
used for analysis.
4. Ratio Data:
o Definition: Numerical data with equal intervals and a true zero
point (e.g., height, weight, income).
o Importance: Ratio data allows for all mathematical operations
(addition, subtraction, multiplication, division) and is the most
flexible type of data for statistical analysis.

Understanding the type of data helps Data Scientists choose the correct
statistical techniques and ensures accurate and meaningful analysis. Each type
has specific tools and methods suited for its analysis, guiding the choice of
algorithms and models.
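A short sketch of which summary statistic suits which measurement level (all values invented):

```python
import statistics

nominal = ["red", "blue", "red", "green"]  # labels: only counting/mode make sense
ordinal = [1, 2, 2, 3, 5]                  # ordered ranks: the median is meaningful
ratio = [160.0, 172.5, 181.0, 168.0]       # heights in cm: full arithmetic is valid

summaries = {
    "nominal_mode": statistics.mode(nominal),
    "ordinal_median": statistics.median(ordinal),
    "ratio_mean": statistics.mean(ratio),
}
print(summaries)
```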

PART-C

16. Discuss the process of describing data with averages, variability, and distributions. Include normal distributions and z-scores.

Describing data is a fundamental aspect of data analysis, and it involves summarizing the data using various statistical measures. These measures help in understanding the characteristics of the data, such as its central tendency, spread, and distribution shape. Key components of describing data include averages, variability, and distributions, with normal distributions and z-scores playing essential roles in summarizing and interpreting data.

1. Averages (Measures of Central Tendency)

Averages represent the central point around which data points are distributed.
The primary measures of central tendency are:
 Mean: The mean is the arithmetic average of a dataset. It is calculated by summing all data points and dividing by the total number of points. The mean provides an overall idea of the dataset's central value but can be affected by extreme values (outliers).
   Mean (μ) = Σxᵢ / n
where xᵢ represents each data point and n is the total number of data points.
 Median: The median is the middle value of a dataset when arranged in
ascending or descending order. The median is less sensitive to outliers
than the mean and is particularly useful in skewed distributions.
 Mode: The mode represents the most frequently occurring value in the
dataset. A dataset may have one mode (unimodal), multiple modes
(multimodal), or no mode at all if all values are unique.
2. Variability (Measures of Spread)
While averages provide information about the central location of the data,
variability describes the extent to which the data points spread out from the
center. The key measures of variability include:
 Range: The range is the difference between the highest and lowest values in the dataset.
   Range = Maximum − Minimum
Although easy to calculate, the range is highly sensitive to outliers and may not be a reliable measure of spread in skewed datasets.

 Variance: Variance measures the average squared deviation from the mean. It quantifies how spread out the data is: a higher variance indicates that the data points are more spread out from the mean.
   Variance (σ²) = Σ(xᵢ − μ)² / n

 Standard Deviation: Standard deviation is the square root of the variance and provides a more interpretable measure of spread, as it is in the same unit as the original data.
   Standard Deviation (σ) = √Variance
A smaller standard deviation means the data points are closely clustered around the mean, while a larger standard deviation indicates a wider spread.
3. Distributions
A distribution describes how the data is spread or arranged across values. The
distribution provides insights into the general pattern or shape of the data.
 Normal Distribution:
The normal distribution, also known as the bell curve, is one of the
most important and common probability distributions in statistics. It is
symmetric, with the majority of the data points clustering around the
mean, and fewer points appearing as you move further away from the
mean.
In a normal distribution:
o The mean, median, and mode are all equal.
o The data is symmetric around the mean.
o About 68% of the data falls within one standard deviation from the
mean.
o About 95% of the data falls within two standard deviations.
o About 99.7% of the data falls within three standard deviations.
The formula for the normal distribution's probability density function (PDF) is:
   f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
where μ is the mean and σ is the standard deviation.
The normal distribution is widely used in various fields because many real-world variables (e.g., height, weight, test scores) tend to follow this distribution pattern.
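The 68-95-99.7 percentages can be checked empirically by sampling from a normal distribution with Python's standard library (a simulation sketch; the seed and sample size are arbitrary):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility
mu, sigma = 0.0, 1.0
sample = [random.gauss(mu, sigma) for _ in range(100_000)]

def within(k):
    """Fraction of sampled points within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in sample) / len(sample)

# Should land close to 0.68, 0.95, and 0.997 respectively
print(round(within(1), 2), round(within(2), 2), round(within(3), 3))
```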

 Skewness and Kurtosis:


While normal distributions are symmetric, real-world data may exhibit
skewness or kurtosis.
o Skewness refers to the asymmetry of the distribution. A positive
skew means the data has a long right tail (more values on the left),
and a negative skew means the data has a long left tail.
o Kurtosis measures the "tailedness" of the distribution. High
kurtosis indicates heavy tails (more outliers), while low kurtosis
indicates lighter tails.
4. Z-Scores
A z-score (also called the standard score) is a measure of how many standard
deviations a data point is from the mean of the dataset. Z-scores are used to
standardize data and make comparisons across different datasets or distributions
with different means and standard deviations.
The z-score for a data point x is calculated as:
   z = (x − μ) / σ
where x is the value, μ is the mean, and σ is the standard deviation.
 A z-score of 0 indicates that the data point is exactly at the mean.
 A z-score of 1 indicates that the data point is one standard deviation
above the mean.
 A z-score of -1 indicates that the data point is one standard deviation
below the mean.
Z-scores are valuable for comparing data from different distributions. For
instance, if you want to compare the test scores of students from different
schools, converting
both sets of scores to z-scores allows you to determine which student performed
better relative to their peers, regardless of the overall scores in each school.
Z-scores are also useful for identifying outliers. Typically, a z-score above 3 or
below -3 suggests that a data point is an outlier, as it falls far from the mean
(more than three standard deviations away).
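As a short illustration, z-scores can be computed and screened for outliers with NumPy. The scores below are made-up values for the example, not taken from any mark statement in this document.

```python
import numpy as np

# Hypothetical test scores (illustrative values only)
scores = np.array([55, 60, 62, 65, 70, 71, 75, 80, 95], dtype=float)
mu, sigma = scores.mean(), scores.std()

z = (scores - mu) / sigma          # z = (x - mu) / sigma for every value
outliers = scores[np.abs(z) > 3]   # points more than 3 SDs from the mean

print(np.round(z, 2))
print(outliers)                    # empty here: no score is that extreme
```

After standardization the z-scores have mean 0 and standard deviation 1, which is what makes scores from different distributions directly comparable.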
SAMPLE ANSWER SHEET
MARK STATEMENT & RESULT ANALYSIS
MARK STATEMENT

Reg No Name IAT 1 MARKS
513523104047 NADEESH A P 71
513523104048 NASREEN BANU M 0
513523104049 NAVINKUMARAN P 13
513523104050 NAVYA D 70
513523104051 NIRANJAN G 30
513523104052 NIWIN RAJ J 0
513523104053 POOMA R 75
513523104054 POORANI A 79
513523104055 PRATHAP T 83
513523104056 PRATHIPA K 92
513523104057 PREETHA ZEN S 76
513523104058 PRIYADHARSHINI A 71
513523104059 PRIYADHARSHINI R 81
513523104060 PRIYANKA S 68
513523104061 RAGHUL RAJ S 0
513523104062 RAKSHITHA S 62
513523104063 RASIGAPRIYA K 93
513523104064 ROOPAN K 0
513523104065 RUTHIRAN C 90
513523104066 SAARIYA KOWNEN K 94
513523104067 SAKTHIVEL K 83
513523104068 SALEEM AHMED A 90
513523104069 SANGEETHA P 62
513523104070 SANJEEVAN SURYA KUMAR 60
513523104071 SATHISH KUMAR D 13
513523104072 SATHYA BAMA S P 69
513523104073 SHALINI V 53
513523104074 SHARANYA M 86
513523104075 SHREE DEVI V G 66
513523104076 SIVA KUMAR T 38
513523104077 SONIYA S 85
513523104078 SUBASHINI T 80
513523104079 SUGAVANAN A 50
513523104080 SUJAN U 64
513523104081 SULTHAN BABU E A 0
513523104082 TAMILSELVAN P 58
513523104083 THARUN KUMAR M 59
513523104084 THULASI DEVI S 70
513523104085 VARUN MOHAN M D 41
513523104086 VENKATESAN S 2
513523104087 VIDHYA J 68
513523104088 VIGNESHWARAN B 30
513523104089 YUVASHRI E 72
513523104090 YUVASHRI P 64
513523104301 DEVISRI S 77
513523104302 DHANA SHREE S 71
513523104303 GOKUL M 71
RESULT ANALYSIS

NO. OF STUDENTS 47
NO. OF STUDENTS APPEARED 42
NO. OF STUDENTS PASS 35
NO. OF STUDENTS FAIL 07
PASS % 83%

FACULTY SIGN          HOD
SECOND SERIES
TIMETABLE
QUESTION PAPER
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

FOUNDATION OF DATA SCIENCE

PART-A
1. What is Correlation?
2. What is a Scatter Plot?
3. What is the correlation coefficient?
4. What is the computational formula for the correlation coefficient?
5. What is Regression in statistics?
6. What is the least squares regression line?
7. What is the standard error of estimate?
8. What does r² represent in regression analysis?
9. What is multiple regression?
10. What is regression towards the mean?
PART-B
11.Explain the concept of correlation and the role of the correlation
coefficient in analyzing data.
12.Describe how regression analysis works and the importance of the least
squares regression line in predicting outcomes.
13.Discuss the concept of multiple regression and how it differs from
simple linear regression.
14. Explain the significance of r² and the standard error of estimate in
regression analysis.
15.Describe the process of using Python libraries such as Numpy and
Pandas for data wrangling and manipulation.
PART-C
16. Discuss the process of data visualization, including the use of
Matplotlib, Seaborn, and other tools for creating effective plots.
ANSWER KEY
PART-A
1. What is Correlation?
Correlation is a statistical measure that describes the strength and direction of
the relationship between two variables.
2. What is a Scatter Plot?
A scatter plot is a graphical representation of two variables, showing the
relationship between them using points on a Cartesian plane.
3. What is the correlation coefficient?
The correlation coefficient is a numerical measure that indicates the degree of
linear relationship between two variables, ranging from -1 to +1.
4. What is the computational formula for the correlation coefficient?
The formula for the correlation coefficient r is:

r = [n Σxy − (Σx)(Σy)] / √([n Σx² − (Σx)²][n Σy² − (Σy)²])
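As a quick sanity check of the computational formula, the sketch below evaluates it term by term on made-up x and y values and confirms it agrees with NumPy's built-in np.corrcoef.

```python
import numpy as np

# Illustrative data (invented for the example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Computational formula for Pearson's r
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2) * (n * np.sum(y**2) - np.sum(y)**2))
r = num / den

print(round(r, 4))  # matches np.corrcoef(x, y)[0, 1]
```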
5. What is Regression in statistics?
Regression is a statistical technique used to model the relationship between a
dependent variable and one or more independent variables.
6. What is the least squares regression line?
The least squares regression line minimizes the sum of the squared differences
between the observed values and the predicted values from the regression line.
7. What is the standard error of estimate?
The standard error of estimate is a measure of the accuracy of predictions made
by the regression line, representing the typical distance between the observed
values and the regression line.
8. What does r² represent in regression analysis?
The r² (coefficient of determination) represents the proportion of the
variance in the dependent variable that is predictable from the independent
variable(s).
9. What is multiple regression?
Multiple regression is a statistical method used to model the relationship
between a dependent variable and two or more independent variables.
10. What is regression towards the mean?
Regression towards the mean refers to the phenomenon where extreme values
in a dataset tend to move closer to the average in subsequent measurements.
PART-B

1. Explain the concept of correlation and the role of the correlation coefficient in
analyzing data.

Correlation refers to the statistical relationship between two variables. It measures
how changes in one variable are associated with changes in another variable.
Correlation does not imply causation, but it indicates whether the two variables tend to
move together in some way.

The correlation coefficient is a numerical value that quantifies the degree and
direction of the relationship between two variables. The most common correlation
coefficient is the Pearson correlation coefficient, denoted by r. It can take values
between -1 and +1:

 r = 1: Perfect positive correlation (as one variable increases, the other
increases in exact proportion).
 r = −1: Perfect negative correlation (as one variable increases, the
other decreases in exact proportion).
 r = 0: No linear correlation (no predictable relationship).
 Between 0 and ±1: A value closer to 1 or -1 indicates a stronger linear
relationship between the two variables.

The role of the correlation coefficient in data analysis is crucial for identifying
relationships between variables, allowing for prediction, modeling, and hypothesis
testing. A high positive or negative correlation can provide insight into potential
causal relationships, while a correlation near 0 suggests little or no linear relationship.

2. Describe how regression analysis works and the importance of the least
squares regression line in predicting outcomes.

Regression analysis is a statistical technique used to model the relationship between a
dependent variable and one or more independent variables. The goal is to understand
how the dependent variable changes as the independent variables change and to make
predictions based on this relationship.
Simple Linear Regression involves a single independent variable and a dependent
variable. The regression model is represented by the equation:

y = β₀ + β₁x + ε

where:

 y is the dependent variable,
 x is the independent variable,
 β₀ is the intercept (the value of y when x = 0),
 β₁ is the slope (the change in y for a one-unit change in x),
 ε is the error term (accounting for variations not explained by the
model).

The least squares regression line is the line that minimizes the sum of the squared
differences (errors) between the observed data points and the predicted values from
the regression model. This method ensures the best fit by reducing the impact of large
errors and producing the most accurate predictions possible.

The importance of the least squares regression line lies in its ability to predict
outcomes based on the data. By understanding the relationship between variables,
businesses, researchers, and analysts can forecast future trends, optimize operations,
and make data-driven decisions.
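The least squares fit described above can be sketched in a few lines of NumPy. The x and y values are invented for illustration; np.polyfit with deg=1 performs the least squares fit for a straight line.

```python
import numpy as np

# Illustrative data (invented for the example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 5.9, 8.2, 9.9])

slope, intercept = np.polyfit(x, y, deg=1)  # least squares fit of degree 1
y_pred = intercept + slope * x              # predicted values on the fitted line

print(round(slope, 3), round(intercept, 3))
```

A property worth noting: for a least squares line with an intercept, the residuals (observed minus predicted) always sum to zero.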

3. Discuss the concept of multiple regression and how it differs from simple linear
regression.

Multiple regression is an extension of simple linear regression where more than one
independent variable is used to predict the dependent variable. It allows for a more
complex model that can capture the influence of multiple factors simultaneously. The
equation for multiple regression is:

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₖxₖ + ε

where:

 y is the dependent variable,
 x₁, x₂, …, xₖ are the independent variables,
 β₁, β₂, …, βₖ are the coefficients (representing the influence of each
independent variable),
 β₀ is the intercept,
 ε is the error term.

Difference from Simple Linear Regression:

 Simple Linear Regression involves only one independent variable (x) to
predict the dependent variable (y).
 Multiple Regression involves two or more independent variables to explain
the dependent variable (y).

A key advantage of multiple regression is that it accounts for the influence of multiple
variables simultaneously, making the predictions more accurate when several factors
influence the outcome. For example, predicting a house price might involve variables
like size, location, and number of rooms, all of which are captured in a multiple
regression model.
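The house-price example can be sketched with NumPy's least squares solver. All numbers below are invented for illustration; the design matrix gets a column of ones so that β₀ (the intercept) is estimated along with the other coefficients.

```python
import numpy as np

# Hypothetical data: price explained by size and number of rooms
size  = np.array([50, 60, 80, 100, 120], dtype=float)
rooms = np.array([1, 2, 2, 3, 4], dtype=float)
price = np.array([110, 135, 170, 215, 260], dtype=float)

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones_like(size), size, rooms])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)  # minimizes ||X beta - price||^2

print(np.round(beta, 3))  # [beta_0, beta_1 (size), beta_2 (rooms)]
```

The solution satisfies the normal equations, i.e. the residual vector is orthogonal to every column of X.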

4. Explain the significance of r² and the standard error of estimate in
regression analysis.

1. r² (Coefficient of Determination):
The r² value represents the proportion of the variance in the dependent
variable that is explained by the independent variables in the regression model.
It is a measure of the model's goodness of fit.
o r² ranges from 0 to 1:
 r² = 1 indicates that the regression model perfectly
explains the variation in the dependent variable.
 r² = 0 means the model does not explain any of the
variation in the dependent variable.
o A higher r² value suggests a better fit of the model to the data,
meaning that the independent variables explain more of the variation in
the dependent variable.

Significance: r² is important for assessing how well the regression model
performs. A high r² indicates that the model does a good job of predicting
the dependent variable, while a low r² suggests that the model may need
improvement.

2. Standard Error of Estimate:
The standard error of estimate is a measure of the accuracy of predictions
made by the regression model. It represents the typical distance between the
observed values and the values predicted by the regression model.
o A lower standard error indicates that the predictions are more accurate,
with smaller deviations from the actual data.
o The standard error is used to calculate confidence intervals and conduct
hypothesis testing about the regression coefficients.

Significance: The standard error of estimate is crucial for understanding the
precision of the model’s predictions. It helps in evaluating how much error is
expected in the predictions, making it essential for assessing the reliability of
the regression results.
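Both quantities can be computed directly from a fitted line. A minimal sketch with invented data, using r² = 1 − SS_res/SS_tot and the standard error of estimate √(SS_res/(n − 2)):

```python
import numpy as np

# Illustrative data (invented for the example)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.8, 8.3, 9.9, 12.2])

slope, intercept = np.polyfit(x, y, deg=1)
y_pred = intercept + slope * x

ss_res = np.sum((y - y_pred) ** 2)      # unexplained (residual) variation
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation around the mean
r_squared = 1 - ss_res / ss_tot

n = len(x)
see = np.sqrt(ss_res / (n - 2))         # standard error of estimate

print(round(r_squared, 4), round(see, 4))
```

Here the data is nearly linear, so r² is close to 1 and the standard error of estimate is small.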

5. Describe the process of using Python libraries such as Numpy and Pandas for
data wrangling and manipulation.

Data wrangling refers to the process of cleaning, transforming, and preparing raw
data for analysis. Python libraries like Numpy and Pandas are essential tools for this
task:

1. Numpy:
Numpy is a powerful library for numerical computing. It provides support for:
o Arrays: Multi-dimensional arrays that allow efficient storage and
manipulation of large datasets.
o Aggregations: Functions for calculating sums, means, standard
deviations, and other statistics across arrays.
o Computations: Mathematical operations like addition, multiplication,
matrix operations, and more are performed efficiently using Numpy
arrays.
o Boolean Logic: Numpy supports logical operations on arrays, allowing
for element-wise comparisons, filtering, and masking of data.

2. Pandas:
Pandas is a library built for data manipulation and analysis. It provides two
primary data structures:
o Series: A one-dimensional array, useful for representing individual
columns of data.
o DataFrame: A two-dimensional table, ideal for working with datasets
in a tabular format, like spreadsheets or SQL tables.

Key functionalities in Pandas include:

o Data Indexing and Selection: Efficient selection, slicing, and filtering
of rows and columns from data frames.
o Handling Missing Data: Functions like .isnull(), .dropna(), and .fillna()
help identify and manage missing or null values.
o Aggregation and Grouping: The .groupby() method allows for
aggregating data by specific categories, calculating sums, means, counts,
etc.
o Combining Datasets: Functions like .merge(), .concat(), and .join()
enable combining multiple datasets based on common keys or indexes.
o Hierarchical Indexing: Allows for multi-level indexing in data frames,
making it easier to work with complex datasets.

These libraries provide flexible, powerful tools for data manipulation, enabling
efficient data wrangling, which is essential for cleaning, preparing, and
transforming data into a usable format for analysis.
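A small end-to-end sketch of the Pandas operations listed above. The marks table is invented for illustration; it demonstrates filling a missing value and aggregating by category.

```python
import numpy as np
import pandas as pd

# Hypothetical marks table with one missing value
df = pd.DataFrame({
    "reg_no": [1, 2, 3, 4],
    "dept":   ["CSE", "CSE", "ECE", "ECE"],
    "marks":  [71.0, np.nan, 83.0, 65.0],
})

df["marks"] = df["marks"].fillna(df["marks"].mean())  # impute the missing mark
avg_by_dept = df.groupby("dept")["marks"].mean()      # aggregate by category

print(avg_by_dept)
```

The missing mark is replaced by the mean of the observed marks (73.0), after which the group means can be computed without dropping any rows.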

PART-C

Discuss the Process of Data Visualization, Including the Use of Matplotlib,
Seaborn, and Other Tools for Creating Effective Plots

Introduction to Data Visualization: Data visualization is the graphical
representation of data and information. By using visual elements like charts, graphs,
and maps, data visualization tools provide an accessible way to view and understand
trends, outliers, and patterns in data. It is an essential part of the data analysis process,
as it simplifies complex data and allows for easier interpretation, better decision-
making, and insightful storytelling.

Effective data visualization helps both analysts and non-technical stakeholders quickly
comprehend relationships in the data. In this process, Python libraries such as
Matplotlib, Seaborn, and other visualization tools like Plotly and Bokeh play a
critical role in creating various kinds of plots and interactive visualizations.
1. Matplotlib: A Core Library for Static Visualizations

Matplotlib is one of the most popular Python libraries for creating static, animated,
and interactive visualizations. It provides fine-grained control over plot elements and
is highly customizable. It is often considered the backbone of most Python-based
plotting libraries.

Key Features and Use Cases:

 Line Plots: Matplotlib is widely used for creating line plots, useful for
visualizing trends over time or continuous variables. For example, plotting
stock prices over several months.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]        # e.g., months
y = [10, 12, 9, 14, 16]    # e.g., prices
plt.plot(x, y)
plt.title("Line Plot Example")
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
plt.show()

 Bar Charts: Bar charts are often used to compare quantities across different
categories. In Matplotlib, bar charts are straightforward to create, helping to
compare data distributions across categories or groups.

categories = ["A", "B", "C"]
values = [5, 9, 3]
plt.bar(categories, values)
plt.title("Bar Chart Example")
plt.show()

 Histograms: Histograms allow the analysis of the distribution of a dataset,
revealing how data is spread across various bins.

plt.hist(data, bins=10)   # data: any 1-D sequence of numeric values
plt.title("Histogram Example")
plt.show()

 Customization and Control: Matplotlib provides extensive customization
options like changing plot styles, adding legends, grid lines, titles, and labels,
and even controlling the font size and colors.

 Subplots and Multi-Panel Plots: Matplotlib supports creating multiple
subplots in one figure, making it easier to compare different visualizations side
by side.

fig, ax = plt.subplots(1, 2)
ax[0].plot(x, y1)   # y1, y2: two series sharing the same x values
ax[1].plot(x, y2)
Conclusion: Matplotlib’s flexibility and wide range of features make it a fundamental
library for creating high-quality static visualizations. However, it can be verbose and
requires manual tweaking for complex visualizations.
2. Seaborn: High-Level API Built on Matplotlib for Statistical Plots

Seaborn is a higher-level Python library built on top of Matplotlib, which simplifies
the creation of statistical visualizations. Seaborn is specifically designed to work with
Pandas data structures (DataFrames) and provides a range of built-in themes to
improve the aesthetics of plots.

Key Features and Use Cases:

 Automatic Aesthetics: Seaborn automatically applies attractive color schemes
and plots with less code, enhancing the visual appeal of the visualizations.

import seaborn as sns
sns.set(style="darkgrid")

 Pair Plots and Heatmaps: Seaborn makes it easy to visualize relationships
between multiple variables with functions like pairplot, which creates a matrix
of scatter plots for all pairs of variables, and heatmap, which visualizes
correlation matrices or data tables.

sns.pairplot(df)   # df: a Pandas DataFrame
sns.heatmap(df.corr(), annot=True, cmap="coolwarm")

 Categorical Plots: Seaborn simplifies the creation of categorical plots, such as
box plots, violin plots, and strip plots, which allow for visualizing the
distribution and variation of data points across categories.

sns.boxplot(x="category", y="value", data=df)

 Regression Plots: Seaborn provides the ability to easily create regression plots
that fit a line (or curve) to the data, helping visualize linear relationships
between variables.

sns.regplot(x="x_var", y="y_var", data=df)

 Faceting: Seaborn allows for easy faceting, which means creating multiple
plots based on a categorical variable. This is useful for analyzing data subsets
across different categories.

sns.FacetGrid(df, col="category").map(plt.scatter, "x", "y")
Conclusion: Seaborn is an excellent tool for quickly creating high-quality,
informative statistical visualizations with minimal effort. Its simple API and powerful
plotting capabilities make it ideal for exploratory data analysis (EDA) and presenting
data patterns and relationships.
3. Plotly and Bokeh: Interactive Visualizations

While Matplotlib and Seaborn focus on static plots, Plotly and Bokeh are excellent
tools for creating interactive visualizations that allow users to explore the data
dynamically.

 Plotly: Plotly supports interactive plots with built-in functionalities like
zooming, hovering over points to display data, and exporting plots. It supports
a wide variety of charts, from basic plots to 3D visualizations.

Example:

import plotly.express as px
fig = px.scatter(df, x="x_var", y="y_var", color="category")  # df: a DataFrame
fig.show()

 Bokeh: Bokeh provides a platform for creating sophisticated web-based
interactive visualizations. It is particularly useful when creating dashboards or
embedding interactive visualizations into web applications.

Example:

from bokeh.plotting import figure, show

p = figure(title="Interactive Plot", x_axis_label='X', y_axis_label='Y')
p.scatter(x='x_var', y='y_var', source=df)
show(p)

Conclusion: For users who require interactivity in their visualizations—such as real-
time data exploration, zooming, or dynamic updates—Plotly and Bokeh are powerful
tools to consider. These libraries allow for a more engaging experience when working
with complex datasets or when presenting findings to stakeholders.
4. Geographic Data Visualization: Basemap and GeoPandas

For visualizing geospatial data, Python provides libraries like Basemap (an
extension of Matplotlib) and GeoPandas.

 Basemap: This library allows for the creation of maps and geospatial
visualizations. It is commonly used for plotting geographic data on various
types of projections (e.g., Mercator, Lambert).

Example:

from mpl_toolkits.basemap import Basemap

m = Basemap(projection='ortho', lat_0=50, lon_0=0)
m.drawcountries()

 GeoPandas: GeoPandas builds on Pandas to handle spatial data. It provides
support for reading and plotting shapefiles, GeoJSON, and other geospatial
formats, allowing for easier manipulation and visualization of geographic data.

Example:

import geopandas as gpd

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.plot()

Conclusion: When working with geographic datasets (such as map visualizations or
spatial data analysis), tools like Basemap and GeoPandas are invaluable for
visualizing locations, boundaries, and geographic features.
5. Customization and Best Practices

While creating effective visualizations, it’s important to follow best practices for
making charts clear and interpretable:
 Titles and Labels: Every plot should have a clear title and labeled axes.
 Legends: Legends should be used to differentiate data series or categories.
 Color Schemes: Use appropriate color schemes to enhance readability and not
mislead interpretation.
 Gridlines: Gridlines or background lines can help in reading the values more
accurately.
 Annotations: Use text annotations to highlight important data points or trends.
Conclusion: A good plot is one that tells a clear story and helps the audience
understand the underlying patterns in the data.

In summary, data visualization is a crucial step in the data analysis pipeline, enabling
analysts to communicate insights more effectively. Libraries such as Matplotlib
provide the foundational tools for static plots, while Seaborn enhances the aesthetics
and ease of use for statistical plots. For interactive visualizations, Plotly and Bokeh
are excellent options, and Basemap and GeoPandas serve the specialized needs of
geospatial data. By using these tools effectively and following best practices in
visualization, analysts can create impactful and insightful visualizations that facilitate
data understanding and decision-making.
MARK STATEMENT &
RESULT ANALYSIS

MARK STATEMENT

Reg No Name IAT 2 MARKS
513523104047 NADEESH A P 63
513523104048 NASREEN BANU M 18
513523104049 NAVINKUMARAN P 14
513523104050 NAVYA D 0
513523104051 NIRANJAN G 14
513523104052 NIWIN RAJ J 41
513523104053 POOMA R 65
513523104054 POORANI A 72
513523104055 PRATHAP T 70
513523104056 PRATHIPA K 0
513523104057 PREETHA ZEN S 79
513523104058 PRIYADHARSHINI A 80
513523104059 PRIYADHARSHINI R 76
513523104060 PRIYANKA S 0
513523104061 RAGHUL RAJ S 10
513523104062 RAKSHITHA S 63
513523104063 RASIGAPRIYA K 94
513523104064 ROOPAN K 3
513523104065 RUTHIRAN C 62
513523104066 SAARIYA KOWNEN K 94
513523104067 SAKTHIVEL K 58
513523104068 SALEEM AHMED A 92
513523104069 SANGEETHA P 66
513523104070 SANJEEVAN SURYA KUMAR 0
513523104071 SATHISH KUMAR D 15
513523104072 SATHYA BAMA S P 84
513523104073 SHALINI V 50
513523104074 SHARANYA M 90
513523104075 SHREE DEVI V G 77
513523104076 SIVA KUMAR T 67
513523104077 SONIYA S 95
513523104078 SUBASHINI T 75
513523104079 SUGAVANAN A 66
513523104080 SUJAN U 73
513523104081 SULTHAN BABU E A 52
513523104082 TAMILSELVAN P 65
513523104083 THARUN KUMAR M 68
513523104084 THULASI DEVI S 70
513523104085 VARUN MOHAN M D 39
513523104086 VENKATESAN S 15
513523104087 VIDHYA J 81
513523104088 VIGNESHWARAN B 34
513523104089 YUVASHRI E 82
513523104090 YUVASHRI P 81
513523104301 DEVISRI S 62
513523104302 DHANA SHREE S 52
513523104303 GOKUL M 63
RESULT ANALYSIS

NO. OF STUDENTS 47
NO. OF STUDENTS APPEARED 42
NO. OF STUDENTS PASS 33
NO. OF STUDENTS FAIL 09
PASS % 79%

FACULTY SIGN          HOD
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

FOUNDATION OF DATA SCIENCE

PART-A
1. What is Data Science?
2. Define the term "Data Mining".
3. What is the purpose of exploratory data analysis (EDA)?
4. What is meant by "Normal Distribution"?
5. What does the correlation coefficient represent?
6. What is hierarchical indexing in Pandas?
7. What is the significance of r² in regression analysis?
8. What is the use of scatter plots in data analysis?
9. Explain the term "missing data" in the context of data wrangling.
10. What is a line plot and when is it used?

PART-B
1. Discuss the various facets of data in Data Science and their significance in
the analysis process.
2. Explain the process of Data Science, including data retrieval, preparation,
and exploratory data analysis.
3. Describe the different types of data and variables used in data science, and
explain how data can be described using averages and measures of
variability.
4. Explain the concept of correlation and regression, and discuss how they are
used to describe relationships between variables in data analysis.
5. Discuss how data manipulation is performed using Pandas, including
indexing, selection, handling missing data, and combining datasets.

PART-C

1. Explain the complete Data Science process, from defining research goals to
building applications, including steps like data retrieval, preparation,
exploratory data analysis, and model building.

ANSWER KEY

PART-A
1. What is Data Science?

Data Science is an interdisciplinary field that uses scientific methods,
processes, algorithms, and systems to extract knowledge and insights from
structured and unstructured data. It combines elements from statistics, computer
science, and domain expertise to analyze and interpret complex data to help
make decisions or predictions.

2. Define the term "Data Mining".

Data Mining is the process of discovering patterns, correlations, and useful


information from large datasets using statistical, mathematical, and
computational techniques. It involves exploring and analyzing large blocks of
information to uncover meaningful patterns or relationships, which can be used
for decision-making, predictions, or problem-solving.

3. What is the purpose of exploratory data analysis (EDA)?


Exploratory Data Analysis (EDA) is a critical step in the data analysis process
where data scientists analyze and summarize the main characteristics of a
dataset, often with visual methods. The goal of EDA is to:
 Understand the underlying structure of the data.
 Identify patterns, trends, and relationships between variables.
 Detect anomalies, outliers, or errors in the data.
 Generate hypotheses for further analysis or model building.
4. What is meant by "Normal Distribution"?

Normal Distribution refers to a probability distribution that is symmetric about
the mean, showing that data near the mean are more frequent in occurrence than
data far from the mean. It is also known as the Gaussian distribution, and it
follows the bell curve shape. Many statistical tests and methods assume that
data follows a normal distribution.

5. What does the correlation coefficient represent?

The correlation coefficient measures the strength and direction of the linear
relationship between two variables. It is a value between -1 and 1:

 A correlation coefficient close to 1 indicates a strong positive
relationship.
 A correlation coefficient close to -1 indicates a strong negative
relationship.
 A coefficient near 0 indicates no linear relationship between the
variables.
6. What is hierarchical indexing in Pandas?

Hierarchical Indexing in Pandas is a technique that allows users to manage
and manipulate data with multiple levels of indexing. It enables data to be
stored in a multi-dimensional format, improving the ability to represent
complex data structures and perform operations such as aggregations and slicing
on multi-level index data.
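A minimal sketch of hierarchical indexing with a two-level MultiIndex; the department/year labels and pass percentages are invented for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (department, year)
idx = pd.MultiIndex.from_tuples(
    [("CSE", 2023), ("CSE", 2024), ("ECE", 2023), ("ECE", 2024)],
    names=["dept", "year"],
)
s = pd.Series([82, 88, 75, 79], index=idx, name="pass_pct")

print(s.loc["CSE"])                    # slice the outer level of the index
print(s.groupby(level="dept").mean())  # aggregate over the outer level
```

Partial indexing with one outer label returns the sub-Series for that level, and group-wise aggregation works directly on the index levels.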
7. What is the significance of r² in regression analysis?

The coefficient of determination, denoted as r², is a measure of how well
the regression model fits the data. It indicates the proportion of variance in the
dependent variable that can be explained by the independent variable(s).

 An r² value of 1 means that the model explains all the variance in
the data.
 An r² value of 0 means that the model does not explain any of the
variance.

8. What is the use of scatter plots in data analysis?

A scatter plot is used to visualize the relationship between two quantitative
variables. It displays data points on a two-dimensional graph, where each point
represents a pair of values. Scatter plots help to:

 Identify potential correlations or trends between the variables.
 Detect outliers or unusual patterns.
 Visualize the distribution and spread of data.

9. Explain the term "missing data" in the context of data wrangling.

Missing data refers to the absence of values in a dataset, where some observations are not recorded or are incomplete. In the context of data
wrangling, missing data must be handled carefully, as it can introduce bias or
errors in analysis. Common techniques for handling missing data include:

 Imputation (replacing missing values with estimates).
 Deletion (removing rows or columns with missing values).
 Interpolation (estimating missing values based on surrounding data
points).
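These three strategies can be sketched in Pandas (the temperature column below is hypothetical):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"temp": [21.0, np.nan, 23.5, np.nan, 25.0],
                   "city": ["A", "B", "C", "D", "E"]})

print(df["temp"].isna().sum())                  # detect: 2 missing values

imputed = df["temp"].fillna(df["temp"].mean())  # imputation with the mean
dropped = df.dropna()                           # deletion of incomplete rows
interp = df["temp"].interpolate()               # linear interpolation
print(interp.tolist())
```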

10. What is a line plot and when is it used?

A line plot (or line graph) is a type of chart used to visualize data points
connected by straight lines. It is typically used to show trends over time (time
series data) or the relationship between two continuous variables. Line plots are
useful for:

 Visualizing trends or changes over time.


 Identifying patterns or cycles.
 Comparing multiple variables across the same x-axis (e.g., time or categories).
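A minimal Matplotlib sketch (the monthly sales figures are hypothetical; the Agg backend is used so no display is needed):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend: render without a display
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 165, 160]   # hypothetical monthly figures

fig, ax = plt.subplots()
(line,) = ax.plot(months, sales, marker="o")  # points joined by straight lines
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly sales trend")
fig.savefig("line_plot.png")
```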

PART-B

1. Discuss the various facets of data in Data Science and their significance
in the analysis process.

In Data Science, data is central to all analyses, and it can be classified into
several facets based on its characteristics. Understanding these facets is crucial
as they determine how we process and interpret the data. The various facets of
data include:

 Types of Data:
o Structured Data: This is data that is organized in a fixed schema,
typically in tables (rows and columns), such as databases. It can be
easily analyzed using traditional data processing tools.
o Unstructured Data: This is data that does not have a predefined
structure, like text documents, images, videos, and social media
posts. Unstructured data often requires more complex methods
such as Natural Language Processing (NLP) or deep learning
models.
o Semi-Structured Data: This data has some organizational
structure but does not fit neatly into a traditional database, such as
JSON or XML files.

 Quantitative vs Qualitative Data:


o Quantitative Data: Data that can be measured and expressed
numerically, such as age, salary, or temperature. It is used for
statistical analysis.
o Qualitative Data: Also called categorical data, this data is
descriptive and cannot be measured numerically, such as names,
labels, or categories (e.g., gender, country).

 Continuous vs Discrete Data:



o Continuous Data: Data that can take any value within a given
range (e.g., height, weight).
o Discrete Data: Data that can only take specific values, typically
integers (e.g., the number of people in a room).

 Time-series Data: This type of data involves observations collected at regular time intervals, such as stock prices or temperature readings. Time-series analysis is critical in predictive analytics.
 Data Quality and Cleanliness:
o Missing Data: Missing values are a common issue in data analysis.
They need to be handled correctly to avoid bias.
o Outliers: Extreme values that differ significantly from other data
points. Identifying outliers is essential as they can distort analysis
results.

Significance in Analysis:

Understanding the facets of data allows analysts to choose appropriate methods for data cleaning, exploration, transformation, and modeling. For example,
quantitative data might require statistical models, while unstructured data may
need NLP techniques. The quality and type of data directly affect the reliability
of conclusions drawn from the analysis.

2. Explain the process of Data Science, including data retrieval, preparation, and exploratory data analysis.

The Data Science process is a systematic approach that helps in solving complex data-related problems. The key steps include:

1. Defining Research Goals:

 The first step in the data science process is to clearly define the problem
or question to be addressed. Research goals should be specific,
measurable, and aligned with business objectives. For example, if the
goal is to predict customer churn, the analysis should focus on
understanding the factors that influence churn.

2. Retrieving Data:

 Data Collection: Data can be retrieved from various sources, such as databases, APIs, web scraping, surveys, or sensors. Common data sources
include SQL databases, Excel files, cloud platforms, and publicly
available datasets.
 Data Acquisition Tools: Data can be acquired using programming
languages (like Python with libraries such as requests or BeautifulSoup
for web scraping), database query languages (SQL), or APIs from
external services (e.g., Google Analytics API).

3. Data Preparation:

 Data Cleaning: Raw data often contains errors, inconsistencies, and missing values. Data cleaning techniques include removing duplicates,
handling missing values (e.g., imputation), and correcting errors.
 Data Transformation: The data may need to be transformed into a
format suitable for analysis. This could include normalization,
standardization, encoding categorical variables, and converting dates into
a consistent format.
 Feature Engineering: Involves creating new features (columns) from
existing ones to improve model performance. This may involve extracting
new variables like the day of the week from a timestamp or calculating
ratios from other features.

4. Exploratory Data Analysis (EDA):

 Statistical Summaries: Summarize data using measures like mean, median, mode, and standard deviation to understand the central tendency
and variability.
 Visualization: Visual techniques like histograms, scatter plots, box plots,
and heatmaps are used to identify trends, distributions, and relationships
in the data.
 Hypothesis Testing: Conducting preliminary statistical tests (e.g., t-tests,
chi-squared tests) to understand the data better and identify relationships
or differences between variables.
 Outlier Detection: Identifying and handling outliers that could skew the
results of the analysis.
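The EDA steps above can be sketched briefly in Pandas (the two columns are randomly generated, hypothetical data), covering summary statistics and a simple IQR-based outlier check:

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(1)
df = pd.DataFrame({"age": rng.integers(18, 60, 100),
                   "income": rng.normal(50_000, 12_000, 100)})

# Statistical summary: count, mean, std, quartiles for each column
print(df.describe())

# Simple IQR-based outlier check on income
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]
print(len(outliers), "potential outliers")
```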

Significance:

 Data retrieval ensures that relevant and high-quality data is collected.
 Data preparation is crucial for transforming raw data into a clean,
structured format that can be analyzed effectively.
 EDA provides a deeper understanding of the data and helps in making
informed decisions regarding modeling techniques.

3. Describe the different types of data and variables used in data science,
and explain how data can be described using averages and measures of
variability.

Types of Data in Data Science:

1. Numerical (Quantitative) Data: Data that represents measurable quantities.
o Continuous Data: Data that can take any value in a given range
(e.g., height, temperature, sales).
o Discrete Data: Data that can take only specific integer values (e.g.,
number of customers, number of products sold).

2. Categorical (Qualitative) Data: Data that represents categories or labels.


o Nominal Data: Categories without any specific order (e.g., colors,
gender).
o Ordinal Data: Categories with a defined order or ranking (e.g.,
education levels: high school, bachelor’s, master’s).

3. Binary Data: A special case of categorical data that has only two
possible outcomes (e.g., True/False, Yes/No).

Variables in Data Science:

o Independent Variables: These are the input features or predictors used to explain the dependent variable (e.g., age, income,
education level).
o Dependent Variables: The output or target variable that is being
predicted or explained (e.g., sales, temperature).

Describing Data using Averages and Measures of Variability:



1. Measures of Central Tendency:


o Mean: The average of a set of values. It is the sum of all values
divided by the number of observations. It is useful when data is
symmetric but can be skewed by outliers.
o Median: The middle value when data is sorted in order. The
median is a better measure when the data contains outliers or is
skewed.
o Mode: The most frequent value in a dataset. It is useful for
categorical data.

2. Measures of Variability:
o Range: The difference between the maximum and minimum
values in a dataset. It gives a basic sense of the spread but is
sensitive to outliers.
o Variance: A measure of how spread out the values are around the
mean. It is calculated as the average of squared differences from
the mean.
o Standard Deviation: The square root of the variance, giving a
measure of spread in the same units as the data. A larger standard
deviation indicates more variability.
o Interquartile Range (IQR): The range between the 1st and 3rd
quartiles (Q1 and Q3), representing the middle 50% of the data. It
is useful for identifying outliers.
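These measures can be computed directly (the sample values below are hypothetical):

```python
import numpy as np
from statistics import mode

data = [4, 8, 6, 5, 3, 8, 9, 7, 8, 6]   # hypothetical sample

print(np.mean(data))          # mean: 6.4
print(np.median(data))        # median: 6.5
print(mode(data))             # mode: 8 (most frequent value)
print(np.ptp(data))           # range = max - min
print(np.var(data, ddof=1))   # sample variance
print(np.std(data, ddof=1))   # sample standard deviation
q1, q3 = np.percentile(data, [25, 75])
print(q3 - q1)                # interquartile range (IQR)
```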

Significance:

 Averages (mean, median, mode) provide an overall sense of central tendency.
 Measures of variability (range, variance, standard deviation) help
in understanding the spread or dispersion of the data, which is
essential for statistical modeling and risk analysis.

4. Explain the concept of correlation and regression, and discuss how they
are used to describe relationships between variables in data analysis.

Correlation:

 Definition: Correlation refers to the statistical relationship between two variables. It measures the strength and direction of the linear relationship between the variables.
 Types:
o Positive Correlation: As one variable increases, the other
also increases (e.g., height and weight).
o Negative Correlation: As one variable increases, the other
decreases (e.g., speed and time to reach a destination).
o No Correlation: No discernible relationship between
variables.
 Pearson Correlation Coefficient: It is a measure of the linear
correlation between two variables, ranging from -1 (perfect
negative correlation) to 1 (perfect positive correlation). A value of
0 indicates no linear correlation.

Regression:

 Definition: Regression is a statistical technique used to model and analyze the relationship between a dependent variable and one or
more independent variables. It is used to predict the value of the
dependent variable based on the values of the independent
variables.
 Linear Regression: In simple linear regression, the relationship between the dependent variable (Y) and the independent variable (X) is modeled as a straight line: Y = a + bX, where a is the intercept and b is the slope.
 Multiple Regression: This involves multiple independent variables and can be used to model more complex relationships.
PART-C

1. Data Manipulation in Pandas: Indexing, Selection, Handling Missing Data, and Combining Datasets

Data manipulation is a key aspect of data analysis, allowing us to clean, transform, and prepare data for more sophisticated analysis and modeling.
Pandas, a Python library, is widely used for this purpose due to its versatile
data structures, namely DataFrame and Series, which enable efficient handling

of data. This answer discusses the major data manipulation techniques in Pandas (indexing, selection, handling missing data, and combining datasets), emphasizing their concepts and significance.

1. Indexing in Pandas

Indexing refers to how data is accessed or identified in Pandas. It determines how efficiently you can select, retrieve, or modify data in a DataFrame. Pandas
allows different ways to index rows and columns, which makes it very flexible.

Key Concepts in Indexing:

 Default Index: By default, rows are labeled with integer indices starting
from 0, which is the most common way of identifying rows. Each column
can be accessed by its name, which can be thought of as the label for that
column.
 Custom Index: Pandas allows the use of custom indices for rows (e.g.,
using a column as an index). Custom indices make the dataset more
intuitive, especially when dealing with time-series data or datasets where
rows have meaningful identifiers.
 Label-based Indexing: Pandas provides .loc[], which allows you to
select rows and columns using labels. This is particularly useful when the
dataset has non-integer indices (e.g., strings or dates).
 Position-based Indexing: Using .iloc[], you can access data by its
integer position. This is similar to traditional arrays, where data is
accessed by its index position.

Significance:

 Effective indexing helps to quickly locate and retrieve subsets of data. For instance, with a well-chosen index, rows and columns can be accessed efficiently for analysis, making the process more streamlined, especially in large datasets.
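A short sketch of label-based versus position-based access (the fruit-price data is hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [10, 20, 30], "qty": [5, 3, 8]},
    index=["apple", "banana", "cherry"],   # custom row labels
)

print(df.loc["banana", "price"])   # label-based access with .loc
print(df.iloc[0, 1])               # position-based access: row 0, column 1
print(df.loc["apple":"banana"])    # label slices include both endpoints
```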

2. Selection in Pandas

Selection refers to extracting or filtering specific portions of data from a DataFrame or Series based on certain criteria, such as conditions or specific
row/column labels.

Key Selection Techniques:

 Selecting Columns: You can select a column (or multiple columns) by its label. If you want a single column, you can reference the column name
directly. To select multiple columns, you pass a list of column names.
 Selecting Rows: Rows can be selected by index label using label-based
indexing (.loc[]) or by position using position-based indexing (.iloc[]).
This allows for easy row extraction, whether by their label or their integer
position.
 Conditional Selection: You can filter rows based on conditions applied
to one or more columns. For example, selecting rows where a certain
column exceeds a specified threshold or where multiple conditions are
met.
 Multiple Condition Selection: Using logical operators like AND (&)
and OR (|), you can combine multiple conditions to filter data more
specifically.

Significance:

 Selection is a powerful way to isolate specific data from a large dataset. It helps analysts focus on the relevant portion of the data, whether it’s for
data cleaning, analysis, or building a model.
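The selection techniques above can be sketched as follows (the student records are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi", "Meena", "Kumar"],
                   "marks": [85, 42, 91, 58],
                   "dept": ["CSE", "ECE", "CSE", "CSE"]})

passed = df[df["marks"] >= 50]                            # conditional selection
cse_top = df[(df["dept"] == "CSE") & (df["marks"] > 80)]  # AND of two conditions
print(passed["name"].tolist())
print(cse_top["name"].tolist())
```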

3. Handling Missing Data in Pandas

Missing data is a common problem in real-world datasets. Handling missing data appropriately is essential for ensuring the integrity of any analysis or
model. Pandas provides several methods for detecting, handling, and imputing
missing values.

Key Techniques for Handling Missing Data:



 Detecting Missing Data: Missing values are typically represented by NaN (Not a Number). Pandas provides functions to detect such values
within a DataFrame or Series. This can be helpful for identifying gaps in
the data that need to be addressed.
 Removing Missing Data: In cases where the missing data is minimal or
irrelevant, rows or columns containing missing values can be removed
using functions that drop rows or columns with NaN values. This can
help maintain the quality and consistency of the dataset.
 Filling Missing Data: Rather than removing missing values, you can fill
them with estimated or default values. Common strategies for filling
include using the mean, median, or mode of the respective column, or
filling values using a forward fill or backward fill method, where missing
values are replaced by the last valid observation.
 Interpolation: For more advanced handling of missing values,
interpolation can be used. This technique estimates missing values based
on the surrounding data, particularly useful in time-series data where the
missing values can be predicted from nearby points.

Significance:

 Missing data can skew analysis and models, leading to biased or inaccurate results. Proper handling ensures the dataset remains robust and
representative of the underlying patterns. By either imputing or removing
missing data, the dataset becomes more reliable for further analysis or
modeling.
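For time-series data in particular, forward fill and interpolation can be sketched as follows (the daily readings are hypothetical):

```python
import pandas as pd
import numpy as np

ts = pd.Series([10.0, np.nan, np.nan, 16.0, 18.0],
               index=pd.date_range("2024-01-01", periods=5, freq="D"))

print(ts.ffill().tolist())         # forward fill: carry the last valid value
print(ts.interpolate().tolist())   # linear interpolation between neighbours
```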

4. Combining Datasets in Pandas

Often in data analysis, data is spread across multiple sources or tables that need
to be combined to form a comprehensive dataset. Pandas provides a range of
tools for merging, joining, and concatenating datasets.

Key Techniques for Combining Datasets:

 Concatenation: Concatenation allows for stacking multiple DataFrames either along rows (vertically) or columns (horizontally). This is useful
when the datasets have the same structure and need to be appended or
stacked together.

 Merging: The merge() function is used to combine DataFrames based on one or more common columns (keys). This operation is analogous to SQL
joins (inner join, left join, right join, outer join), where datasets are
combined based on matching values in shared columns. Merging is
particularly useful when datasets are related by a common attribute.
 Joining: The join() method is similar to merging, but it is more
convenient for joining DataFrames based on their index. This method
simplifies the combination of datasets that share the same index or
require less complex join conditions.
 Appending: When one dataset needs to be added to the end of another (adding rows), the append() method was traditionally used. Note that DataFrame.append() was deprecated and removed in Pandas 2.0, so pd.concat() is now the recommended way to add rows from one source to another.

Significance:

Combining datasets is essential in cases where data is spread across multiple tables or sources. By merging, concatenating, or joining, analysts can bring
together related data to form a complete dataset for analysis. This integration of
datasets is vital for ensuring that no critical information is omitted, and the data
is aligned for accurate interpretation.
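Merging and concatenation can be sketched as follows (the student and mark tables are hypothetical):

```python
import pandas as pd

students = pd.DataFrame({"reg_no": [1, 2, 3],
                         "name": ["Asha", "Ravi", "Meena"]})
marks = pd.DataFrame({"reg_no": [1, 2, 4],
                      "marks": [85, 42, 77]})

inner = pd.merge(students, marks, on="reg_no", how="inner")  # matching keys only
left = pd.merge(students, marks, on="reg_no", how="left")    # keep all students

more = pd.DataFrame({"reg_no": [5], "name": ["Kumar"]})
stacked = pd.concat([students, more], ignore_index=True)     # append rows
print(inner.shape, left.shape, stacked.shape)
```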

MARK STATEMENT & RESULT ANALYSIS

MARK STATEMENT

MODEL EXAM

Reg No Name Marks
513523104047 NADEESH A P 78
513523104048 NASREEN BANU M AB
513523104049 NAVINKUMARAN P 22
513523104050 NAVYA D 79
513523104051 NIRANJAN G 08
513523104052 NIWIN RAJ J 18
513523104053 POOMA R 79
513523104054 POORANI A AB
513523104055 PRATHAP T 62
513523104056 PRATHIPA K 95
513523104057 PREETHA ZEN S 85

513523104058 PRIYADHARSHINI A 66
513523104059 PRIYADHARSHINI R 68
513523104060 PRIYANKA S AB
513523104061 RAGHUL RAJ S AB
513523104062 RAKSHITHA S AB
513523104063 RASIGAPRIYA K 90
513523104064 ROOPAN K AB
513523104065 RUTHIRAN C 80
513523104066 SAARIYA KOWNEN K 93
513523104067 SAKTHIVEL K 66
513523104068 SALEEM AHMED A 94
513523104069 SANGEETHA P AB
513523104070 SANJEEVAN SURYA KUMAR 24
513523104071 SATHISH KUMAR D AB
513523104072 SATHYA BAMA S P 73
513523104073 SHALINI V 61
513523104074 SHARANYA M 87
513523104075 SHREE DEVI V G AB
513523104076 SIVA KUMAR T 66
513523104077 SONIYA S 93
513523104078 SUBASHINI T 72
513523104079 SUGAVANAN A 11
513523104080 SUJAN U 09
513523104081 SULTHAN BABU E A 56
513523104082 TAMILSELVAN P 62
513523104083 THARUN KUMAR M 55
513523104084 THULASI DEVI S AB
513523104085 VARUN MOHAN M D 42
513523104086 VENKATESAN S 03
513523104087 VIDHYA J 83
513523104088 VIGNESHWARAN B 15
513523104089 YUVASHRI E 69
513523104090 YUVASHRI P 73
513523104301 DEVISRI S AB
513523104302 DHANA SHREE S AB
513523104303 GOKUL M 75

RESULT ANALYSIS

NO. OF STUDENTS 47

NO. OF STUDENTS APPEARED 35

NO. OF STUDENTS PASS 26

NO. OF STUDENTS FAIL 09



PASS % 74%

FACULTY SIGN

HOD

SLOW LEARNERS IDENTIFICATION

SLOW LEARNERS

Reg No Name

513523104048 NASREEN BANU M

513523104052 NIWIN RAJ J

513523104061 RAGHUL RAJ S

513523104064 ROOPAN K

513523104081 SULTHAN BABU E A

513523104086 VENKATESAN S

513523104049 NAVINKUMARAN P

513523104071 SATHISH KUMAR D

513523104051 NIRANJAN G

513523104088 VIGNESHWARAN B

513523104076 SIVA KUMAR T

513523104085 VARUN MOHAN M D

ATTENDANCE FOR REMEDIAL CLASS

ATTENDANCE FOR REMEDIAL CLASS

Reg No Name Attendance

513523104048 NASREEN BANU M


513523104052 NIWIN RAJ J
513523104061 RAGHUL RAJ S
513523104064 ROOPAN K
513523104081 SULTHAN BABU E A
513523104086 VENKATESAN S
513523104049 NAVINKUMARAN P

513523104071 SATHISH KUMAR D


513523104051 NIRANJAN G
513523104088 VIGNESHWARAN B
513523104076 SIVA KUMAR T
513523104085 VARUN MOHAN M D

MARK STATEMENT SHOWING PROGRESSION OF SLOW LEARNERS

MARK STATEMENT SHOWING PROGRESSION OF SLOW LEARNERS

Reg No Name Marks (IAT-1, IAT-2, Model)


513523104048 NASREEN BANU M
513523104052 NIWIN RAJ J

513523104061 RAGHUL RAJ S


513523104064 ROOPAN K
513523104081 SULTHAN BABU E A
513523104086 VENKATESAN S
513523104049 NAVINKUMARAN P
513523104071 SATHISH KUMAR D
513523104051 NIRANJAN G
513523104088 VIGNESHWARAN B
513523104076 SIVA KUMAR T
513523104085 VARUN MOHAN M D

ASSIGNMENT TOPICS

ASSIGNMENT TOPICS

1. The Data Science Process: A Step-by-Step Guide

2. Types of Data and Variables: A Comprehensive Overview

3. Exploring the Relationship Between Two Quantitative Variables Using Correlation and Regression



4. Data Analysis and Manipulation with NumPy and Pandas

5. Data Visualization using Matplotlib and Seaborn



SAMPLE ASSIGNMENT

ASSIGNMENT MARKS STATEMENT

ASSIGNMENT MARK STATEMENT

Reg No Name MARKS AVERAGE


513523104047 NADEESH A P 9 9 8 9
513523104048 NASREEN BANU M 9 9 8 9
513523104049 NAVINKUMARAN P 8 9 9 9
513523104050 NAVYA D 6 8 7 7
513523104051 NIRANJAN G 8 7 6 7

513523104052 NIWIN RAJ J 7 7 8 7


513523104053 POOMA R 8 8 9 8
513523104054 POORANI A 8 8 8 8
513523104055 PRATHAP T 9 9 7 8
513523104056 PRATHIPA K 7 9 9 8
513523104057 PREETHA ZEN S 6 9 9 8
513523104058 PRIYADHARSHINI A 8 8 9 8
513523104059 PRIYADHARSHINI R 9 6 9 8
513523104060 PRIYANKA S 8 8 8 8
513523104061 RAGHUL RAJ S 7 7 7 7
513523104062 RAKSHITHA S 9 8 9 9
513523104063 RASIGAPRIYA K 9 8 9 9
513523104064 ROOPAN K 9 9 8 9
513523104065 RUTHIRAN C 9 7 6 7
513523104066 SAARIYA KOWNEN K 8 6 8 7
513523104067 SAKTHIVEL K 7 8 7 7
513523104068 SALEEM AHMED A 7 9 8 8
513523104069 SANGEETHA P 8 8 8 8
513523104070 SANJEEVAN SURYA KUMAR 8 7 8 8
513523104071 SATHISH KUMAR D 9 9 7 8
513523104072 SATHYA BAMA S P 8 9 8 8
513523104073 SHALINI V 9 8 9 9
513523104074 SHARANYA M 7 9 9 8
513523104075 SHREE DEVI V G 8 8 9 8
513523104076 SIVA KUMAR T 9 9 9 9
513523104077 SONIYA S 8 8 8 8
513523104078 SUBASHINI T 8 8 9 8
513523104079 SUGAVANAN A 9 9 7 8
513523104080 SUJAN U 8 9 9 9
513523104081 SULTHAN BABU E A 9 9 8 9
513523104082 TAMILSELVAN P 8 9 9 9
513523104083 THARUN KUMAR M 9 8 8 8
513523104084 THULASI DEVI S 8 9 9 9
513523104085 VARUN MOHAN M D 9 7 8 8
513523104086 VENKATESAN S 8 8 8 8

513523104087 VIDHYA J 9 9 9 9
513523104088 VIGNESHWARAN B 8 8 9 8
513523104089 YUVASHRI E 8 8 8 8
513523104090 YUVASHRI P 9 9 9 9
513523104301 DEVISRI S 9 8 8 8
513523104302 DHANA SHREE S 9 9 9 9
513523104303 GOKUL M 9 8 8 8

LOG NOTEBOOK

CO ATTAINMENT

LECTURE NOTES

ASSESSMENT EXAMINATION QUESTION PAPERS WITH SCHEME AND SAMPLE

HAND OUTS

TIMETABLE FOR
REMEDIAL CLASS
