
TOPIC WISE DSA QUESTIONS

Uploaded by Surabhi Raj
Copyright: © All Rights Reserved

Data Science and its Applications (21AD62)

MODULE 1

Data Visualization

1. What is data visualization? Explain bar charts and line charts. (8)
2. Explain data visualization and identify its uses. Sketch a Python code segment to
visualize a line chart and a scatterplot with an example. (6)
3. Using matplotlib, explain a simple line chart and a bar chart. (8)
4. Write a short note on data visualization. (6)
5. Describe the process of creating a bar chart using matplotlib. What information is
typically conveyed by a bar chart? (4)
6. Explain the concept of correlation and its significance in data analysis. Discuss
Simpson’s Paradox and other correlational caveats with examples. (8)
7. Explain the matplotlib library in Python with an example. (6)
8. Draw a scatterplot to illustrate the relationship between the number of friends and
the number of minutes spent each day. (4)
9. Develop a python program to plot a bar chart for the given data. Draw the bar chart
and label x and y axes. (8)
10. Develop a python program to plot a line chart for the given data. Explain the various
attributes of the line chart. Draw the line chart. (6)

Probability Theory & Bayes Theorem

1. Write a note on probability theory as applicable to data science. (8)
2. Describe Bayes’s Theorem in detail with an example. (8)
3. State and explain Bayes’s theorem. (6)
4. Describe Bayes’s Theorem and its significance in statistical inference. How can
Bayes’s Theorem be applied to improve classification models? (8)
5. Describe the following probability concepts: (8)
- Conditional Probability
- Bayes’s Theorem
- Central Limit Theorem
- Normal Distribution
- Random Variables
6. Find the probability of each of the following events: (8)
- A single letter is selected at random from the word ‘MACHINE LEARNING’. What is the
probability that it is a consonant?
- Two dice are rolled. What is the probability of getting a sum of 4 or 7?
- Lottery tokens are numbered from 1 to 25. What is the probability that a token
drawn is a multiple of 5 or 7?
- What is the probability of drawing a face card from a standard deck of 52 cards?
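Question 6 can be checked by direct counting. A minimal sketch using exact fractions, assuming the space in ‘MACHINE LEARNING’ is not a letter and that “face card” means the 12 jacks, queens and kings:

```python
from fractions import Fraction

# (a) Random letter from 'MACHINE LEARNING'; the space is not counted as a letter.
letters = [ch for ch in "MACHINE LEARNING" if ch.isalpha()]
vowels = set("AEIOU")
consonants = [ch for ch in letters if ch not in vowels]
p_consonant = Fraction(len(consonants), len(letters))

# (b) Sum of 4 or 7 with two fair dice: enumerate all 36 equally likely outcomes.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
favourable = [o for o in outcomes if sum(o) in (4, 7)]
p_sum_4_or_7 = Fraction(len(favourable), len(outcomes))

# (c) Token from 1..25 that is a multiple of 5 or 7; the `or` counts any
# common multiple only once (there is none up to 25 anyway).
tokens = range(1, 26)
hits = sum(1 for t in tokens if t % 5 == 0 or t % 7 == 0)
p_multiple_5_or_7 = Fraction(hits, len(tokens))

# (d) Face card: jack, queen, king in each of 4 suits.
p_face = Fraction(3 * 4, 52)

print(p_consonant, p_sum_4_or_7, p_multiple_5_or_7, p_face)  # 3/5 1/4 8/25 3/13
```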
Normal Distribution

1. Write a note on the normal distribution. (4)
2. Describe the normal distribution with a Python routine for its PDF and CDF. (7)
3. Describe the statement “Correlation is not Causation” with an example in detail. (6)
4. What is the Standard Normal distribution? Explain how to use the Z-score to
standardize a normal random variable. (7)
5. Illustrate the normal distribution and continuous distributions in detail. (10)
6. Illustrate the Central Limit Theorem with a neat diagram. (8)
7. State and illustrate the Central Limit Theorem with Python code using a suitable
example. (7)
8. Discuss the Central Limit Theorem and its significance in relation to the normal
distribution. How is the normal distribution used in hypothesis testing? (6)
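For questions 2, 4 and 7, a minimal sketch of the normal PDF, the CDF (written with `math.erf`) and the z-score, followed by a small Central Limit Theorem simulation. The uniform draws, sample sizes and seed are illustrative choices, not part of the questions:

```python
import math
import random

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), written with the error function."""
    return (1 + math.erf((x - mu) / (math.sqrt(2) * sigma))) / 2

def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

# CLT check: each sample mean averages 100 uniform(0, 1) draws. The means should
# cluster near 0.5 with standard deviation near sqrt(1/12) / sqrt(100).
random.seed(0)
sample_means = [sum(random.random() for _ in range(100)) / 100 for _ in range(2000)]
mean_of_means = sum(sample_means) / len(sample_means)
sd_of_means = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means) / (len(sample_means) - 1)
)

print(round(normal_cdf(1.96), 4))   # about 0.975
print(z_score(70, 60, 5))           # 2.0
print(round(mean_of_means, 3), round(sd_of_means, 4))
```

A histogram of `sample_means` (for example with matplotlib) would show the bell shape even though each draw is uniform.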

Vectors in Data Science

1. Explain the following: (8)
- Vector addition
- Vector sum
- Vector mean
- Vector multiplication
2. Describe vectors in data science and explain any three operations on vectors with a
Python routine for each operation. (6)
3. Write a Python program to add two vectors and multiply a vector by a scalar. (6)
4. Explain vectors with code to find the distance between two vectors. (6)
5. Discuss the concept of vectors and matrices in Linear Algebra. Provide examples of
how they are used in data manipulation and machine learning. (10)
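The vector operations in this section can be sketched with plain Python lists, without NumPy. Treating “vector multiplication” as the dot product is an assumption; multiplication by a scalar is shown separately:

```python
import math
from typing import List

Vector = List[float]

def add(v: Vector, w: Vector) -> Vector:
    """Component-wise sum of two vectors of equal length."""
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]

def scalar_multiply(c: float, v: Vector) -> Vector:
    """Multiply every component of v by the scalar c."""
    return [c * v_i for v_i in v]

def vector_sum(vectors: List[Vector]) -> Vector:
    """Component-wise sum of a non-empty list of vectors."""
    assert vectors, "no vectors provided"
    result = vectors[0]
    for v in vectors[1:]:
        result = add(result, v)
    return result

def vector_mean(vectors: List[Vector]) -> Vector:
    """Component-wise average of a list of vectors."""
    return scalar_multiply(1 / len(vectors), vector_sum(vectors))

def dot(v: Vector, w: Vector) -> float:
    """Sum of component-wise products."""
    return sum(v_i * w_i for v_i, w_i in zip(v, w))

def distance(v: Vector, w: Vector) -> float:
    """Euclidean distance: length of the difference vector v - w."""
    diff = [v_i - w_i for v_i, w_i in zip(v, w)]
    return math.sqrt(dot(diff, diff))

print(add([1, 2], [3, 4]))              # [4, 6]
print(scalar_multiply(2, [1, 2, 3]))    # [2, 4, 6]
print(vector_mean([[1, 2], [3, 4]]))    # [2.0, 3.0]
print(dot([1, 2, 3], [4, 5, 6]))        # 32
print(distance([0, 0], [3, 4]))         # 5.0
```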

Measures of Central Tendency & Dispersion

1. Explain the following statistical techniques: (8)
- Mean
- Median
- Mode
- Interquartile range
2. What are the main measures of central tendency? Describe each one. How do you
represent a vector in Python using libraries like NumPy? (7)
3. What are measures of dispersion, and why are they important? (6)
4. Summarize dispersion. Using a Python code snippet, explain the various measures of
dispersion. (7)
5. Describe dispersion and variance, and write Python code to compute the variance. (6)
6. Explain standard deviation and interquartile range, and write Python code to compute
standard deviation and interquartile range. (8)
7. Develop Python functions for computing the measures of central tendency, with
explanation. (6)
8. Consider the following employee data:
- Find the standard deviation of the salaries of employees in each department of a
company and identify the department with the highest standard deviation. (7)
- Find the mean and median salary of employees in each department of the
company. (7)

9. Write code to compute the standard deviation. (6)
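A minimal sketch of the measures of central tendency and dispersion from this section, on a small made-up data list. The quantile rule (the sorted value at index `int(p * n)`) is one simple convention; `statistics.quantiles` and NumPy interpolate and can give slightly different IQRs:

```python
import math
from collections import Counter
from typing import List

def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)

def median(xs: List[float]) -> float:
    """Middle of the sorted data; the average of the two middle values when n is even."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def mode(xs: List[float]) -> List[float]:
    """All most-frequent values, since data can have more than one mode."""
    counts = Counter(xs)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]

def data_range(xs: List[float]) -> float:
    return max(xs) - min(xs)

def quantile(xs: List[float], p: float) -> float:
    """Sorted value at index int(p * n): a simple, non-interpolating convention."""
    return sorted(xs)[int(p * len(xs))]

def interquartile_range(xs: List[float]) -> float:
    return quantile(xs, 0.75) - quantile(xs, 0.25)

def variance(xs: List[float]) -> float:
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def standard_deviation(xs: List[float]) -> float:
    return math.sqrt(variance(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up example values
print(mean(data), median(data), mode(data))          # 5.0 4.5 [4]
print(data_range(data), interquartile_range(data))   # 7 3
print(round(variance(data), 3), round(standard_deviation(data), 3))
```

For question 8, the same functions would be applied to each department's list of salaries separately.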

Simpson’s Paradox

1. Explain Simpson’s Paradox. (4)
2. Explain Simpson’s paradox with an example. (7)
3. Illustrate Simpson’s paradox with an example. (6)
4. Discuss Simpson’s Paradox and other correlational caveats with examples. (8)

Correlation vs Causation

1. Explain the difference between correlation and causation. Why is it incorrect to infer
causation from correlation alone? Describe an example where correlation does not
imply causation. (7)
2. Describe the statement “Correlation is not Causation” with an example in detail. (6)

Data Science

1. Define Data Science. Explain the Venn diagram of Data Science. (6)
2. What is Data Science? Write a short note on data visualization. (6)
3. What is Data Science? With example explain the role of a data scientist. (8)
4. Who is a Data scientist? Draw the data science life cycle in detail. (8)
Random Variables

1. Discuss random variables with an example in detail. (6)
2. Discuss random variables with an example in detail. (6)
3. What are random variables? State Bayes’s theorem in detail with an example. (8)
MODULE 2

Gradient Descent:

1. Explain the gradient descent approach in detail with a relevant example.
2. What is gradient descent, and why is it important in machine learning? Explain the
difference between gradient descent and stochastic gradient descent.
3. Explain how gradient descent is used to fit parameterized models.
4. Explain how gradient descent is used to fit parameterized models.
5. What is gradient descent? Explain the idea behind gradient descent and how it is used
to fit models. Discuss the differences between batch gradient descent, minibatch
gradient descent, and stochastic gradient descent.
6. Write code to estimate the gradient.
7. Summarize Stochastic and Minibatch Gradient Descent.
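A sketch for questions 6 and 7, under stated assumptions: the gradient is estimated with a forward difference, and one hypothetical `minibatch_fit` helper covers all three variants, since `batch_size=1` gives stochastic gradient descent and `batch_size=len(data)` gives batch gradient descent. The noise-free line y = 20x + 5 and the learning rate are illustrative:

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def partial_difference_quotient(f: Callable[[Vector], float], v: Vector, i: int, h: float) -> float:
    """Forward-difference estimate of the i-th partial derivative of f at v."""
    w = [v_j + (h if j == i else 0.0) for j, v_j in enumerate(v)]
    return (f(w) - f(v)) / h

def estimate_gradient(f: Callable[[Vector], float], v: Vector, h: float = 1e-4) -> Vector:
    return [partial_difference_quotient(f, v, i, h) for i in range(len(v))]

def sum_of_squares(v: Vector) -> float:
    return sum(v_i ** 2 for v_i in v)

def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Move step_size times the gradient from v (a negative step descends)."""
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]

def linear_gradient(x: float, y: float, theta: Vector) -> Vector:
    """Gradient of (slope * x + intercept - y) ** 2 with respect to (slope, intercept)."""
    slope, intercept = theta
    error = (slope * x + intercept) - y
    return [2 * error * x, 2 * error]

def minibatch_fit(data: Sequence[Tuple[float, float]], batch_size: int,
                  learning_rate: float = 0.01, epochs: int = 200, seed: int = 0) -> Vector:
    """Average the gradient over each shuffled batch, then take one step."""
    rng = random.Random(seed)
    data = list(data)
    theta = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grads = [linear_gradient(x, y, theta) for x, y in batch]
            mean_grad = [sum(g[k] for g in grads) / len(grads) for k in range(2)]
            theta = gradient_step(theta, mean_grad, -learning_rate)
    return theta

# Noise-free points on y = 20x + 5, with x between -5 and 4.9.
inputs = [(x / 10, 20 * (x / 10) + 5) for x in range(-50, 50)]
print(estimate_gradient(sum_of_squares, [1.0, 2.0, 3.0]))   # close to [2, 4, 6]
print(minibatch_fit(inputs, batch_size=10))                  # close to [20, 5]
print(minibatch_fit(inputs, batch_size=1))                   # stochastic version
```

Batch gradient descent takes only one step per epoch, so with these settings it would need more epochs to get as close as the other two.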

Hypothesis Testing & A/B Testing:

1. Explain hypothesis testing in detail with an example.
2. Interpret the importance of power and significance in statistical hypothesis testing
with a suitable Python routine.
3. What is an A/B test, and why is it used in data science? Describe the steps involved in
designing and running an A/B test.
4. Describe A/B test with an example.
5. Explain the null and alternative hypotheses using the example of flipping a coin.
Write a Python program to flip a coin 1000 times and count the number of heads and
tails. Based on the results, determine whether the coin is fair.
6. What is p-hacking? Describe an A/B test with an example.
7. Explain statistical hypothesis testing with examples.
8. Explain A/B testing with an example and a relevant equation.
9. Write a short note on the null and alternative hypotheses using the example of
flipping a coin.
10. Describe the process of statistical hypothesis testing. Using the example of flipping a
coin, explain how you would determine if a coin is fair or biased.
11. Illustrate A/B test with an example.
12. Illustrate p-values with an example.
13. What are p-values and confidence intervals in the context of hypothesis testing?
Discuss their significance and how they are used in making statistical inferences.
14. Explain confidence intervals with an example.
15. Write a note on confidence intervals in detail.
16. What is p-hacking? Write a short note on running an A/B test.
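For the coin-flipping questions, a minimal sketch of a two-sided test of H0: p = 0.5 using the normal approximation to the binomial, plus a 95% confidence interval for the heads probability. The simulated coin is fair, so a run will usually (but by design not always) fail to reject H0; the seed and the 0.05 significance level are assumptions:

```python
import math
import random

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    return (1 + math.erf((x - mu) / (math.sqrt(2) * sigma))) / 2

def two_sided_p_value(x: float, mu: float, sigma: float) -> float:
    """Chance, under H0, of a count at least as far from mu as x (normal approximation)."""
    return 2 * (1 - normal_cdf(abs(x - mu) / sigma))

random.seed(42)
n = 1000
heads = sum(1 for _ in range(n) if random.random() < 0.5)   # simulated fair coin
tails = n - heads

# H0: p = 0.5 (fair). H1: p != 0.5. Under H0, heads ~ Binomial(n, 0.5), which is
# approximately normal with mean n*p and standard deviation sqrt(n*p*(1-p)).
mu = n * 0.5
sigma = math.sqrt(n * 0.5 * 0.5)
p_value = two_sided_p_value(heads, mu, sigma)

alpha = 0.05                 # significance level, fixed before looking at the data
is_fair = p_value >= alpha   # means "fail to reject H0", not "proved fair"

# 95% confidence interval for the heads probability (normal approximation).
p_hat = heads / n
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print(heads, tails, round(p_value, 3), "consistent with fair" if is_fair else "looks biased")
print(ci)
```

Re-running with many different seeds and reporting only the "significant" runs would be a simple form of p-hacking.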

Data Cleaning & Munging:

1. Explain data cleaning, data munging, and manipulating data.
2. Explain cleaning and munging of data with an example.
3. Illustrate cleaning and munging with suitable code.

Web Scraping:

1. Explain the methodologies for extracting data through web scraping.
2. Articulate the role of BeautifulSoup in web scraping using a Python snippet.
3. Consider an HTML file. Write a Python program to scrape the page and extract values
associated with tags and properties.
4. Write Python code for scraping an HTML document with an example.
5. Consider an HTML file and build a Python program to scrape the page and extract values
associated with tags and properties.
6. Describe the steps involved in obtaining data from various sources such as stdin,
stdout, reading files, web scraping, and using APIs. Provide a detailed example of
using the Twitter API to gather data.
7. Write a short note on Beautiful Soup library.

Linear Regression & Error Detection:

1. What is Simple Linear Regression? How is error calculated in the Linear Regression
model? How would you detect overfitting in a linear model?
2. Explain the mathematical intuition of Multiple Linear Regression. Explain the steps.
3. Explain how gradient descent is used to fit parameterized models.

Data Handling with Python (CSV, Named Tuples, etc.):

1. Sketch the use of csv.reader, csv.DictReader, and csv.writer in processing delimited
files.
2. Briefly explain bootstrapping. Explain how data is manipulated, and briefly describe
named tuples.
3. Write a program that counts the lines it receives and then writes out the count.
4. Explain and write code using the “NamedTuple” class.
5. Illustrate the difference between named tuples and data classes with an example.
6. Write Python code to count the number of lines and find the 10 most repeated words
in a given file using stdin, stdout, and regular expressions.
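For the CSV and named-tuple questions, a sketch using only the standard library. The two-row stock data is made up, and `io.StringIO` stands in for a real delimited file on disk:

```python
import csv
import io
from dataclasses import dataclass
from typing import NamedTuple

class StockPrice(NamedTuple):
    """Immutable record: fields can be read by name or by position."""
    symbol: str
    date: str
    closing_price: float

@dataclass
class MutableStockPrice:
    """Same fields, but instances are mutable by default."""
    symbol: str
    date: str
    closing_price: float

# Hypothetical comma-delimited data standing in for a file.
raw = "symbol,date,closing_price\nAAA,2024-01-02,10.5\nBBB,2024-01-02,20.25\n"

# csv.DictReader: each row is a dict keyed by the header line.
rows = [StockPrice(r["symbol"], r["date"], float(r["closing_price"]))
        for r in csv.DictReader(io.StringIO(raw))]

# csv.reader: each row is a plain list of strings; the header is just the first row.
reader = csv.reader(io.StringIO(raw))
header = next(reader)
plain_rows = list(reader)

# csv.writer: write the records back out, tab-delimited this time.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
writer.writerow(header)
for row in rows:
    writer.writerow(row)   # a NamedTuple is a tuple, so it can be written directly

# Named tuple vs data class: only the data class allows field reassignment.
record = MutableStockPrice("AAA", "2024-01-02", 10.5)
record.closing_price = 11.0
try:
    rows[0].closing_price = 11.0
    namedtuple_is_immutable = False
except AttributeError:
    namedtuple_is_immutable = True

print(rows[0].symbol, rows[0][2], namedtuple_is_immutable)   # AAA 10.5 True
print(out.getvalue())
```

Named tuples suit fixed records that should not change; data classes suit records that are updated in place.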

Dimensionality Reduction:

1. Explain dimensionality reduction in detail.
2. Explain in detail dimensionality reduction with an example.

Miscellaneous:
1. Predict the genre of the ‘Barbie’ movie with IMDB=7.4 and duration 114 using KNN,
considering k=3.

2. Illustrate tqdm library functions with an example.
3. Illustrate the tqdm library by considering an example.
4. Write code to explain the beta distribution.
5. Explain the concept of rescaling with an example.
6. Illustrate 1D, 2D, and multi-dimensional data with examples.
7. Describe Bayesian inference in detail.
8. Write code to explain the beta distribution.
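For questions 4, 5 and 8, a sketch of the beta density built from `math.gamma`, and of rescaling a feature to mean 0 and standard deviation 1 (one common meaning of rescaling; min-max scaling to [0, 1] is another). The height values are hypothetical:

```python
import math
from typing import List

def beta_normalizer(alpha: float, beta: float) -> float:
    """B(alpha, beta) = Gamma(alpha) Gamma(beta) / Gamma(alpha + beta)."""
    return math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)

def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Beta(alpha, beta) density, taken as 0 outside the open interval (0, 1)."""
    if x <= 0 or x >= 1:
        return 0.0
    return x ** (alpha - 1) * (1 - x) ** (beta - 1) / beta_normalizer(alpha, beta)

def beta_mean(alpha: float, beta: float) -> float:
    """Mean alpha / (alpha + beta): a larger alpha pulls the mass toward 1."""
    return alpha / (alpha + beta)

def rescale(xs: List[float]) -> List[float]:
    """Standardize one feature to mean 0 and sample standard deviation 1."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
    return [(x - m) / sd if sd > 0 else 0.0 for x in xs]

for a, b in [(1, 1), (2, 2), (2, 5), (5, 2)]:
    print(a, b, round(beta_pdf(0.5, a, b), 3), round(beta_mean(a, b), 3))

# Hypothetical heights in centimetres: after rescaling, the feature is unit-free,
# so it no longer dominates distances just because its numbers are large.
print(rescale([150.0, 160.0, 170.0]))   # [-1.0, 0.0, 1.0]
```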
MODULE 3

Overfitting and Underfitting

1. Explain underfitting and overfitting in detail.
2. Summarize overfitting and underfitting with examples and explain how to resolve
them.
3. Compare overfitting and underfitting the training data in Machine Learning.
4. Explain overfitting and underfitting with examples.
5. Define machine learning and discuss the difference between overfitting and
underfitting. How can these issues be mitigated in model training?
6. Discuss the Bias-Variance tradeoff in detail.

Naive Bayes Algorithm

1. Explain Naive Bayes as a really dumb spam filter.
2. Explain Naïve Bayes Algorithm in the context of classification with functions.
3. Describe theoretically how Naive Bayes can model a sophisticated spam filter, and
write a Python program to classify whether a message is spam using Naive Bayes.
4. Describe theoretically how Naive Bayes can model a sophisticated spam filter.
5. What is the Naive Bayes algorithm? Illustrate its application with an example of a
spam filter.
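A compact sketch of a Naive Bayes spam filter in the spirit of these questions. It assumes equal prior probabilities for spam and non-spam, tracks only whether a word occurs, and uses a smoothing pseudo-count `k`; the four training messages are invented for illustration:

```python
import math
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def tokenize(text: str) -> Set[str]:
    """Lowercased words; only presence matters, not how often a word repeats."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

class NaiveBayesClassifier:
    def __init__(self, k: float = 0.5) -> None:
        self.k = k  # smoothing pseudo-count, so no word gets probability 0 or 1
        self.tokens: Set[str] = set()
        self.spam_counts: Dict[str, int] = defaultdict(int)
        self.ham_counts: Dict[str, int] = defaultdict(int)
        self.spam_messages = 0
        self.ham_messages = 0

    def train(self, messages: Iterable[Tuple[str, bool]]) -> None:
        for text, is_spam in messages:
            if is_spam:
                self.spam_messages += 1
            else:
                self.ham_messages += 1
            for token in tokenize(text):
                self.tokens.add(token)
                if is_spam:
                    self.spam_counts[token] += 1
                else:
                    self.ham_counts[token] += 1

    def _probabilities(self, token: str) -> Tuple[float, float]:
        """Smoothed P(token appears | spam) and P(token appears | not spam)."""
        p_spam = (self.spam_counts[token] + self.k) / (self.spam_messages + 2 * self.k)
        p_ham = (self.ham_counts[token] + self.k) / (self.ham_messages + 2 * self.k)
        return p_spam, p_ham

    def predict(self, text: str) -> float:
        """P(spam | text), treating each vocabulary word's presence as independent."""
        text_tokens = tokenize(text)
        log_prob_if_spam = log_prob_if_ham = 0.0
        for token in self.tokens:
            p_spam, p_ham = self._probabilities(token)
            if token in text_tokens:
                log_prob_if_spam += math.log(p_spam)
                log_prob_if_ham += math.log(p_ham)
            else:
                log_prob_if_spam += math.log(1.0 - p_spam)
                log_prob_if_ham += math.log(1.0 - p_ham)
        prob_if_spam = math.exp(log_prob_if_spam)
        prob_if_ham = math.exp(log_prob_if_ham)
        return prob_if_spam / (prob_if_spam + prob_if_ham)

model = NaiveBayesClassifier()
model.train([
    ("win money now", True),
    ("cheap money offer", True),
    ("meeting at noon", False),
    ("lunch at noon tomorrow", False),
])
print(round(model.predict("win money now"), 3))
print(round(model.predict("meeting tomorrow at noon"), 3))
```

Summing logarithms instead of multiplying probabilities avoids underflow when the vocabulary is large.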

Logistic Function and Logistic Regression

1. Explain the use of the logistic function in logistic regression in detail.
2. Explain the logistic function in detail.
3. Write a note on simple linear regression using gradient descent.
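For the logistic-function questions above, a minimal sketch of the logistic function, its derivative, and how logistic regression uses it to turn a linear score into a probability. The per-example negative log-likelihood is the usual fitting loss; the weights in the usage are arbitrary:

```python
import math
from typing import List

def logistic(x: float) -> float:
    """1 / (1 + e^(-x)), rearranged for negative x so math.exp cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logistic_prime(x: float) -> float:
    """The derivative has the convenient form f(x) * (1 - f(x))."""
    y = logistic(x)
    return y * (1 - y)

def predict_probability(x: List[float], beta: List[float]) -> float:
    """Logistic regression: squash the linear score x . beta into (0, 1)."""
    return logistic(sum(x_i * b_i for x_i, b_i in zip(x, beta)))

def negative_log_likelihood(x: List[float], y: int, beta: List[float]) -> float:
    """Loss for one example with label y in {0, 1}; fitting minimizes its sum."""
    p = predict_probability(x, beta)
    return -math.log(p) if y == 1 else -math.log(1 - p)

print(logistic(0), logistic_prime(0))                 # 0.5 0.25
print(predict_probability([1.0, 2.0], [0.5, -0.25]))  # 0.5
```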

Simple Linear Regression and Gradient Descent

1. Explain the simple linear regression model in detail and write a Python program to
illustrate gradient descent for a simple linear regression model.
2. Write a note on simple linear regression using gradient descent.
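A minimal sketch comparing the closed-form least-squares fit with gradient descent on the mean squared error. The noise-free data y = 3x + 2, the learning rate and the step count are illustrative assumptions, so both methods should recover alpha = 2 and beta = 3:

```python
from typing import List, Tuple

def least_squares_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Closed-form (alpha, beta) minimizing the squared errors of y ≈ beta * x + alpha."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
    alpha = y_bar - beta * x_bar
    return alpha, beta

def sum_of_squared_errors(alpha: float, beta: float, xs: List[float], ys: List[float]) -> float:
    """Square each residual y - (beta * x + alpha) and add them up."""
    return sum((y - (beta * x + alpha)) ** 2 for x, y in zip(xs, ys))

def gradient_descent_fit(xs: List[float], ys: List[float],
                         learning_rate: float = 0.01, steps: int = 5000) -> Tuple[float, float]:
    """Minimize the mean squared error by stepping against its gradient."""
    alpha, beta = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(beta * x + alpha) - y for x, y in zip(xs, ys)]   # prediction - actual
        grad_alpha = 2 * sum(errors) / n
        grad_beta = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        alpha -= learning_rate * grad_alpha
        beta -= learning_rate * grad_beta
    return alpha, beta

xs = [float(x) for x in range(10)]
ys = [3 * x + 2 for x in xs]          # noise-free toy data
print(least_squares_fit(xs, ys))      # (2.0, 3.0)
print(gradient_descent_fit(xs, ys))   # close to (2.0, 3.0)
```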

Feature Extraction and Feature Selection

1. What is feature extraction, and why is it important in machine learning? Explain the
difference between feature extraction and feature selection.
2. Write a short note on feature extraction and selection.
3. Illustrate the process of feature extraction and selection in machine learning. Why is
this step important, and what techniques are commonly used?
Support Vector Machines (SVM)

1. How is the support vector machine used to classify the data?
2. Write a program to train an SVM classifier on the iris dataset using sklearn. Try
different kernels and the associated hyperparameters. Train the model with the
following set of hyperparameters: RBF kernel, gamma=0.5, one-vs-rest classifier, no
feature normalization. Also try C=0.01, 1, 10. For the above set of hyperparameters,
find the best classification accuracy along with the total number of support vectors on
the test data.
3. Explain Support Vector Machines in detail.

Iris Dataset

1. Describe the Iris dataset and its significance in machine learning. What are the
features and target variables in the Iris dataset? How is the Iris dataset typically used
to demonstrate classification algorithms?
2. What is the Iris Dataset? Build a model that can predict the class from the first four
measurements.
3. Write a Python program to build a K-nearest neighbor model that can predict the class
from the Iris dataset.

Model and Model Fitting

1. What is a model in the context of machine learning? Explain the difference between
supervised and unsupervised learning models.
2. Discuss the need for fitting the model in Multiple Regression.

Maximum Likelihood Estimation

1. Write a short note on Maximum Likelihood Estimation.
2. Discuss the process of simple linear regression. Explain how gradient descent and
maximum likelihood estimation are used to fit the model.

Digression

1. Explain Digression in detail.
2. Write a short note on digression with code.

Regularization

1. Explain in detail the regularization technique in machine learning.
2. Illustrate regularization.
K-Nearest Neighbors (K-NN)

1. Explain the K-Nearest Neighbors Algorithm using the Iris dataset.
2. Explain the K-Nearest Neighbors (K-NN) algorithm with an example.
3. Write a Python program to build a K-nearest neighbor model that can predict the class
from the Iris dataset.
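A minimal K-NN sketch with majority voting. The Iris measurements are not reproduced here, so the two-dimensional points and labels below are made up; with the real dataset each point would hold the four measurements and the label would be the species:

```python
import math
from collections import Counter
from typing import List, NamedTuple

class LabeledPoint(NamedTuple):
    point: List[float]
    label: str

def majority_vote(labels: List[str]) -> str:
    """labels are ordered nearest first; on a tie, drop the farthest and vote again."""
    counts = Counter(labels)
    winner, winner_count = counts.most_common(1)[0]
    num_winners = sum(1 for c in counts.values() if c == winner_count)
    return winner if num_winners == 1 else majority_vote(labels[:-1])

def knn_classify(k: int, labeled_points: List[LabeledPoint], new_point: List[float]) -> str:
    """Label new_point by a vote among its k nearest training points (Euclidean distance)."""
    by_distance = sorted(labeled_points, key=lambda lp: math.dist(lp.point, new_point))
    k_nearest_labels = [lp.label for lp in by_distance[:k]]
    return majority_vote(k_nearest_labels)

# Made-up training data: two well-separated groups.
points = [
    LabeledPoint([1.0, 1.0], "A"), LabeledPoint([1.2, 0.8], "A"), LabeledPoint([0.9, 1.1], "A"),
    LabeledPoint([5.0, 5.0], "B"), LabeledPoint([5.2, 4.9], "B"), LabeledPoint([4.8, 5.1], "B"),
]
print(knn_classify(3, points, [1.1, 0.9]))   # A
print(knn_classify(3, points, [5.0, 5.1]))   # B
```

With real features on different scales, rescaling them first keeps one measurement from dominating the distance.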

Standard Errors and Regression Coefficients

1. Explain the standard errors of regression coefficients.
2. Illustrate standard errors of regression coefficients.
MODULE 4

Decision Trees

1. Illustrate the working of decision tree and explain the importance of entropy in
decision trees.
2. Can decision trees handle continuous data? If so, how is entropy used to handle
continuous data in decision trees? What are the limitations of decision trees?
3. Discuss decision trees in detail and provide a Python program to create a decision
tree.
4. Describe the decision tree process with Python and demonstrate the ID3 algorithm.
5. Consider the following dataset. Write a program to demonstrate the working of the
decision tree based ID3 algorithm.

6. Explain the role of entropy and the entropy of a partition in creating a decision tree,
with Python code.
7. Describe how entropy is used to create a decision tree and provide an example to
illustrate the process.
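A sketch of entropy, partition entropy and information gain (the quantity ID3 maximizes when choosing a split), plus one simple way to handle a continuous attribute: try midpoints between sorted distinct values as thresholds. The toy values and labels are invented:

```python
import math
from collections import Counter
from typing import Any, List, Optional, Tuple

def entropy(class_probabilities: List[float]) -> float:
    """H = -sum p_i log2 p_i, skipping zero probabilities."""
    return sum(-p * math.log2(p) for p in class_probabilities if p > 0)

def class_probabilities(labels: List[Any]) -> List[float]:
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def data_entropy(labels: List[Any]) -> float:
    return entropy(class_probabilities(labels))

def partition_entropy(subsets: List[List[Any]]) -> float:
    """Entropy of a split: each subset's entropy weighted by its share of the data."""
    total = sum(len(s) for s in subsets)
    return sum(data_entropy(s) * len(s) / total for s in subsets)

def information_gain(labels: List[Any], subsets: List[List[Any]]) -> float:
    return data_entropy(labels) - partition_entropy(subsets)

def best_threshold(values: List[float], labels: List[Any]) -> Tuple[Optional[float], float]:
    """(threshold, partition entropy) of the lowest-entropy binary split value <= t."""
    distinct = sorted(set(values))
    best: Tuple[Optional[float], float] = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        h = partition_entropy([left, right])
        if h < best[1]:
            best = (t, h)
    return best

print(entropy([0.5, 0.5]))                                                 # 1.0
print(information_gain(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]]))   # 1.0
print(best_threshold([1, 2, 3, 10, 11, 12], ["n", "n", "n", "y", "y", "y"]))   # (6.5, 0.0)
```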

Feedforward Neural Networks & Backpropagation

1. Define a feedforward neural network and explain the backpropagation method for
training neural networks.
2. Describe the basic architecture of a feedforward neural network and explain the
concept of a loss function.
3. Discuss the role of the backpropagation algorithm in training neural networks.
4. Explain layer abstraction in deep learning and provide a Python program to compute
loss and optimization in deep learning.
5. Illustrate K-Nearest Neighbors with code.
6. Define neural networks and explain how to implement the AND function using the
perceptron algorithm.
7. Illustrate the backpropagation algorithm, its importance in training neural networks,
and how gradients are computed and weights are updated.
Deep Learning vs. Machine Learning

1. Describe how deep learning differs from machine learning.
2. Explain deep learning and how it differs from traditional machine learning methods.
Include the general architecture of a deep learning model.
3. Define a loss function and discuss its importance in deep learning. Describe common
loss functions used for different applications.

Artificial Neural Networks

1. Illustrate the working of artificial neural networks.
2. Explain neural networks as a sequence of layers with functions.
3. Describe the basic structure and function of a perceptron and its role as a building
block in feedforward neural networks.
4. Illustrate the working of the perceptron using OR Gate and AND Gate as examples.
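A minimal perceptron sketch for the AND and OR gates, using a step activation and the perceptron learning rule. The integer learning rate of 1, zero initial weights and 20 epochs are assumptions chosen so the arithmetic stays exact:

```python
from typing import List, Tuple

Sample = Tuple[List[int], int]

def step(z: float) -> int:
    return 1 if z >= 0 else 0

def perceptron_output(weights: List[int], bias: int, x: List[int]) -> int:
    return step(sum(w * x_i for w, x_i in zip(weights, x)) + bias)

def train_perceptron(samples: List[Sample], learning_rate: int = 1, epochs: int = 20):
    """Perceptron rule: nudge weights by learning_rate * (target - output) * input."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - perceptron_output(weights, bias, x)
            weights = [w + learning_rate * error * x_i for w, x_i in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
OR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

and_weights, and_bias = train_perceptron(AND)
or_weights, or_bias = train_perceptron(OR)
for name, gate, w, b in [("AND", AND, and_weights, and_bias), ("OR", OR, or_weights, or_bias)]:
    print(name, w, b, [perceptron_output(w, b, x) for x, _ in gate])
```

Both gates are linearly separable, so the rule converges; XOR is not, which is why it needs a hidden layer.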

Clustering & K-Means

1. Define clustering and explain the K-means clustering algorithm in detail.
2. Describe the basic idea behind clustering algorithms using color quantization as an
example.
3. Consider a dataset with coordinates and cluster labels. Compute the Rand index for
various clustering methods, visualize the dataset, and determine which algorithm
recovers the true clusters.
4. Explain the bottom-up hierarchical clustering approach with examples.
5. Illustrate K-means clustering with examples.
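A minimal K-means sketch (Lloyd's algorithm: assign each point to its nearest mean, recompute the means, repeat until assignments stop changing). The six points are made up, and the initial means are drawn from the data with a fixed seed:

```python
import math
import random
from typing import List, Tuple

Vector = List[float]

def vector_mean(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(coords) / n for coords in zip(*vectors)]

def kmeans(points: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    rng = random.Random(seed)
    means = rng.sample(points, k)          # initial means: k distinct input points
    assignments: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest mean for every point.
        new_assignments = [min(range(k), key=lambda i: math.dist(p, means[i])) for p in points]
        if new_assignments == assignments:  # nothing moved, so the clustering has converged
            break
        assignments = new_assignments
        # Update step: each mean moves to the centroid of its points.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:                     # keep the old mean if a cluster empties
                means[i] = vector_mean(members)
    return means, assignments

points = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]]
means, assignments = kmeans(points, k=2)
print(assignments)
print(means)
```

K-means can settle in a poor local optimum on less tidy data, so it is common to run it from several random starts and keep the clustering with the lowest total squared distance.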

Optimization and Loss Functions

1. Define an optimization algorithm and explain its role in training deep learning
models. Describe gradient descent and its variants.
2. Define entropy and write code for entropy calculation.
3. Write a function to compute gradients for backpropagation.
4. Write code to train a network that computes XOR using a new framework.
5. Write code to generate any number of clusters by performing the appropriate number
of unmerges.
6. Explain the process of training a neural network on the MNIST dataset, including
architecture, input preprocessing, evaluation metrics, and a summary of the network's
performance.

Miscellaneous

1. Build and explain the Random Forests algorithm.
2. Compute tensors in deep learning by implementing concepts in Python.
3. Construct linear layers with implementation in Python.
4. Write a Python program to train a network that can compute XOR.
MODULE 5

Gibbs Sampling & Topic Modeling

1. Define and explain Gibbs Sampling with an example.
2. Describe Gibbs Sampling and its application in machine learning or statistical
modeling. Provide an example of using Gibbs Sampling to estimate parameters in a
Bayesian model.
3. Summarize topic modeling with reference to topic-word distribution and document-
topic distribution.
4. Build with relevant Python code and explain topic modeling for natural language
processing.

Recurrent Neural Networks (RNNs)

1. Write a note on Recurrent Neural Networks (RNNs).
2. What are Recurrent Neural Networks (RNNs), and how do they differ from
feedforward neural networks? What are Long Short-Term Memory (LSTM)
networks?
3. Explain the architecture and function of recurrent neural networks (RNNs). Provide
an example of using a character-level RNN in a text generation task.
4. Describe the architecture of a recurrent neural network (RNN) and its application in
sequential data modeling. Implement a simple character-level RNN using Python and
train it on a text dataset.
5. Explain Recurrent Neural Network in detail.

Word Clouds & n-Gram Language Models

1. Explain Word Clouds and n-Gram Language Models.
2. Discuss word clouds and write a Python program to generate word clouds.
3. Explain the word cloud approach to data visualization using a Python code snippet.
4. What is an n-gram in the context of language modeling? Explain the differences
between unigrams, bigrams, and trigrams.
5. Describe n-Gram language models in detail.
6. Discuss n-gram language models and their application in NLP. How do these models
help in understanding the context within a text? Provide an example.
7. Explain how grammars are used in modeling languages.
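A sketch of a bigram model that also covers generating sentences using bigrams. The tiny corpus is invented, and using `.` as both the start and end marker is a design choice; `random.choice` over a successor list that keeps repeats makes more frequent bigrams more likely:

```python
import random
from collections import defaultdict
from typing import Dict, List

def build_transitions(sentences: List[List[str]]) -> Dict[str, List[str]]:
    """Map each word to every word seen right after it (repeats kept).
    '.' marks both the start and the end of a sentence."""
    transitions: Dict[str, List[str]] = defaultdict(list)
    for words in sentences:
        tokens = ["."] + words + ["."]
        for prev, nxt in zip(tokens, tokens[1:]):
            transitions[prev].append(nxt)
    return transitions

def bigram_probability(transitions: Dict[str, List[str]], prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(nxt | prev)."""
    followers = transitions[prev]
    return followers.count(nxt) / len(followers) if followers else 0.0

def generate_using_bigrams(transitions: Dict[str, List[str]], rng: random.Random,
                           max_len: int = 50) -> str:
    current = "."                 # begin at a sentence start
    result: List[str] = []
    while len(result) < max_len:
        current = rng.choice(transitions[current])
        if current == ".":        # reached a sentence end
            break
        result.append(current)
    return " ".join(result)

corpus = [
    ["data", "science", "is", "fun"],
    ["data", "is", "everywhere"],
    ["science", "is", "fun"],
]
transitions = build_transitions(corpus)
print(bigram_probability(transitions, "is", "fun"))   # 2 of the 3 words after "is"
sentence = generate_using_bigrams(transitions, random.Random(1))
print(sentence)
```

A trigram model would key the table on the previous two words instead of one, capturing more context at the cost of sparser counts.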

Recommender Systems

1. Write a note on recommender systems.
2. Explain item-based collaborative filtering and matrix factorization.
3. Describe item-based collaborative filtering and how it differs from user-based
collaborative filtering.
4. How does item-based collaborative filtering generate recommendations in a
recommendation system?
5. Explain matrix factorization in the context of recommender systems. Discuss how it is
used to improve recommendation accuracy and provide an example.
6. Discuss the techniques used for recommender systems: (i) User-based collaborative
filtering (ii) Item-based collaborative filtering.
7. Write code to find the interests most similar to Big Data (interest 0) using
item-based collaborative filtering.
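A sketch of item-based collaborative filtering with cosine similarity between interests, each interest viewed as a vector over users. The user-interest lists are hypothetical toy data, ordered so that interest 0 is “Big Data”:

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

users_interests = [   # hypothetical toy data: one list of interests per user
    ["Big Data", "Hadoop", "Java"],
    ["Big Data", "Hadoop", "Python"],
    ["Python", "pandas", "R"],
    ["Java", "Hadoop"],
    ["R", "statistics", "pandas"],
]
unique_interests = sorted({i for interests in users_interests for i in interests})

# Row j of interest_user_matrix marks which users are interested in interest j.
user_interest_matrix = [[1 if i in interests else 0 for i in unique_interests]
                        for interests in users_interests]
interest_user_matrix = [list(column) for column in zip(*user_interest_matrix)]

def cosine_similarity(v: List[int], w: List[int]) -> float:
    dot = sum(a * b for a, b in zip(v, w))
    norms = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norms if norms else 0.0

interest_similarities = [[cosine_similarity(a, b) for b in interest_user_matrix]
                         for a in interest_user_matrix]

def most_similar_interests_to(interest_id: int) -> List[Tuple[str, float]]:
    sims = interest_similarities[interest_id]
    pairs = [(unique_interests[j], s) for j, s in enumerate(sims) if j != interest_id and s > 0]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

def item_based_suggestions(user_id: int) -> List[Tuple[str, float]]:
    """Score each interest the user lacks by summing its similarity to the user's interests."""
    owned = users_interests[user_id]
    scores: Dict[str, float] = defaultdict(float)
    for interest in owned:
        for other, sim in most_similar_interests_to(unique_interests.index(interest)):
            if other not in owned:
                scores[other] += sim
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

print(most_similar_interests_to(0))   # interests most similar to "Big Data"
print(item_based_suggestions(4))
```

User-based filtering would instead compare the rows of `user_interest_matrix` (users to users) and recommend what similar users like.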

Centrality Measures & Network Analysis

1. Write a note on betweenness centrality and eigenvector centrality.
2. Discuss the following metrics used for network analysis: (i) Betweenness centrality
(ii) Closeness centrality (iii) Eigenvector centrality.
3. With an example, explain the DataSciencester network sized by betweenness
centrality.
4. Write code to find an eigenvector using matrix_times_vector.
5. Write code to explain the DataSciencester network sized by PageRank.
6. Define network analysis and explain how two centrality measures are used to evaluate
node importance in a network. Calculate the degree centrality and betweenness centrality
of nodes in a small social network graph.
7. Describe the function of a recurrent layer in a recurrent neural network (RNN). (Note:
This question is related to RNNs but can be cross-referenced here for context.)
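For question 4, a sketch of power iteration with a `matrix_times_vector` helper, applied to the adjacency matrix of a made-up four-person friendship graph to get eigenvector centrality; degree centrality is shown alongside for comparison. Power iteration assumes one eigenvalue strictly dominates in magnitude, which holds for this connected, non-bipartite graph:

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]

def matrix_times_vector(m: Matrix, v: Vector) -> Vector:
    """Row-by-row dot products of m with v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def magnitude(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))

def find_eigenvector(m: Matrix, tolerance: float = 1e-8,
                     max_iter: int = 10_000) -> Tuple[Vector, float]:
    """Power iteration: multiply, rescale to unit length, stop once the vector settles.
    Returns (unit eigenvector, estimated eigenvalue)."""
    guess = [1.0 for _ in m]
    eigenvalue = 0.0
    for _ in range(max_iter):
        result = matrix_times_vector(m, guess)
        eigenvalue = magnitude(result)
        next_guess = [x / eigenvalue for x in result]
        if math.dist(guess, next_guess) < tolerance:
            return next_guess, eigenvalue
        guess = next_guess
    return guess, eigenvalue

# Made-up friendships: 0-1, 0-2, 1-2, 2-3 (a triangle with node 3 hanging off node 2).
friendships = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4
adjacency = [[0.0] * n for _ in range(n)]
for i, j in friendships:
    adjacency[i][j] = adjacency[j][i] = 1.0

degree_centrality = [sum(row) for row in adjacency]
centrality, eigenvalue = find_eigenvector(adjacency)
print(degree_centrality)
print([round(c, 3) for c in centrality], round(eigenvalue, 3))
```

Node 2 scores highest on both measures here; eigenvector centrality additionally rewards being connected to well-connected nodes.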

PageRank Algorithm & Graphs

1. Illustrate the PageRank algorithm and its application in directed graphs. How does it
work and what is its significance in network analysis?
2. Develop a Python function for the PageRank algorithm for a directed graph.

3. Explain PageRank and the Hyperlink-Induced Topic Search (HITS) algorithm in terms of
their underlying principles and use cases.
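For question 2, a sketch of iterative PageRank on a directed graph. The damping factor 0.85, the fixed iteration count, and spreading the rank of dangling nodes (nodes with no outgoing edges) evenly over all nodes are assumptions; the four-node edge list is made up:

```python
from typing import Dict, List, Tuple

def page_rank(nodes: List[int], edges: List[Tuple[int, int]],
              damping: float = 0.85, num_iters: int = 100) -> Dict[int, float]:
    """Each round, every node gets (1 - damping) / N, and each node passes
    damping * its rank equally along its outgoing edges. Ranks sum to 1."""
    n = len(nodes)
    outgoing = {v: [t for s, t in edges if s == v] for v in nodes}
    pr = {v: 1 / n for v in nodes}
    for _ in range(num_iters):
        # Rank held by nodes with no outgoing edges is shared evenly by everyone.
        dangling = sum(pr[v] for v in nodes if not outgoing[v])
        next_pr = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for t in outgoing[v]:
                next_pr[t] += damping * pr[v] / len(outgoing[v])
        pr = next_pr
    return pr

# Made-up directed graph: 0 -> 1 -> 2 -> 0, plus 3 -> 2 (nothing links to 3).
ranks = page_rank([0, 1, 2, 3], [(0, 1), (1, 2), (2, 0), (3, 2)])
print({v: round(r, 4) for v, r in ranks.items()})
```

Node 2 ends up highest because it is the only node with two incoming links, while node 3 keeps only the baseline (1 - damping) / N.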

Additional Topics

1. Compare singular value decomposition with probabilistic matrix factorization in
terms of their suitability for recommendation systems.
2. Write code to generate sentences using bigrams.
