**DSBDA Viva Answers**

**Assignment No. 1**

1. **Explain Data Frame with Suitable Example**
- A data frame is a two-dimensional, tabular data structure with labeled rows and columns, widely used in data analysis. It's a core object in the pandas library (Python) and in the R language.
- Example in Python:
```python
import pandas as pd
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
```

2. **What is the Limitation of the Label Encoding Method?**


- Label encoding assigns an arbitrary integer to each category, which can create a false ordinal relationship between categorical values; models that interpret the encoded integers numerically may learn orderings that don't exist.
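
A minimal sketch of the pitfall using scikit-learn's `LabelEncoder` (the city values are made up):
```python
from sklearn.preprocessing import LabelEncoder

cities = ['New York', 'Chicago', 'Los Angeles', 'Chicago']
encoded = LabelEncoder().fit_transform(cities)
print(encoded)  # [2 0 1 0] -- alphabetical: Chicago=0, Los Angeles=1, New York=2
# A numeric model would treat New York (2) as "greater than" Los Angeles (1),
# an ordering that does not exist; one-hot encoding avoids this.
```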

3. **What is the Need for Data Normalization?**


- Data normalization ensures uniformity in scale across features, so that variables with large ranges don't dominate the result; it reduces bias and improves performance in scale-sensitive algorithms like K-means and neural networks.
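
A minimal sketch using scikit-learn's `MinMaxScaler` to rescale two features with very different ranges (the numbers are made up):
```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two features on very different scales: age and income
X = np.array([[25, 50_000], [30, 120_000], [35, 80_000]], dtype=float)
X_scaled = MinMaxScaler().fit_transform(X)  # each column rescaled to [0, 1]
print(X_scaled)  # both features now contribute comparably to distances
```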

4. **What are the Different Techniques for Handling Missing Data?**


- Common techniques include (see the sketch after this list):
- Imputation with mean, median, or mode.
- Dropping rows or columns with missing values.
- Using algorithms that handle missing data natively.
- Forward/backward filling for time series data.
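
A short pandas sketch of the imputation, dropping, and filling techniques above (the small DataFrame is illustrative):
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temp': [21.0, np.nan, 23.5, np.nan], 'city': ['A', 'B', 'A', 'B']})

df_imputed = df.fillna({'temp': df['temp'].mean()})  # impute with the mean
df_dropped = df.dropna()                             # drop rows with missing values
df_ffilled = df.ffill()                              # forward fill (time series)
print(df_imputed, df_dropped, df_ffilled, sep='\n')
```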

5. **How Can We Read the Data through the Pandas Library?**


- Common functions to read data with pandas:
- `pd.read_csv()`: Reads CSV files.
- `pd.read_excel()`: Reads Excel files.
- `pd.read_sql()`: Reads SQL query results.
- `pd.read_json()`: Reads JSON data.

6. **Different Types of Library and Their Uses**


- Common data-related libraries and their uses:
- **Pandas**: Data manipulation and analysis.
- **NumPy**: Numerical operations with arrays.
- **Matplotlib/Seaborn**: Data visualization.
- **Scikit-learn**: Machine learning algorithms.
- **TensorFlow/PyTorch**: Deep learning frameworks.

7. **What is Meant by Data Preprocessing?**


- Data preprocessing involves transforming raw data into a suitable format for analysis or
machine learning, including tasks like cleaning, normalization, and feature engineering.

8. **What is Meant by Outliers? How Can We Work on It?**
- Outliers are data points that deviate significantly from other observations. Methods to handle them include (see the sketch after this list):
- Removing outliers.
- Replacing them with a central tendency measure (like mean).
- Using robust statistical techniques to reduce their impact.
- Applying transformations to mitigate their effects.
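
A minimal sketch of the removal and replacement options using the IQR rule (column and values are made up):
```python
import pandas as pd

df = pd.DataFrame({'value': [10, 12, 11, 13, 12, 95]})  # 95 is an outlier
q1, q3 = df['value'].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df['value'] < q1 - 1.5 * iqr) | (df['value'] > q3 + 1.5 * iqr)

df_removed = df[~mask]   # option 1: drop the outliers
df_replaced = df.copy()
df_replaced.loc[mask, 'value'] = df.loc[~mask, 'value'].mean()  # option 2: replace with mean
```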

---

**Assignment No. 2**


1. **How to Identify and Handle Null Values?**
- Identify null values with pandas functions like `isnull()` or `isna()` (often combined with `.sum()`), then handle them by dropping the affected rows/columns with `dropna()` or imputing them with a central value via `fillna()`.

2. **What is the Purpose of Data Transformation?**


- Data transformation standardizes or normalizes data, making it compatible with certain
algorithms, reducing skewness, and ensuring more reliable analysis.

3. **Explain the Methods to Detect Outliers**


- Common methods for detecting outliers (see the sketch after this list):
- Statistical methods like Z-score or Interquartile Range (IQR).
- Visual methods such as box plots, scatter plots, or histograms.
- Isolation Forest or clustering algorithms.
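
A sketch of the Z-score and Isolation Forest methods above (the series and thresholds are illustrative):
```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

s = pd.Series([10, 12, 11, 13, 12, 95])

# Z-score method: flag points more than 2 standard deviations from the mean
# (3 is another common threshold)
z = (s - s.mean()) / s.std()
print(s[np.abs(z) > 2])  # flags 95

# Isolation Forest: fit_predict returns -1 for predicted outliers
labels = IsolationForest(contamination=0.2, random_state=0).fit_predict(s.to_frame())
print(s[labels == -1])   # most likely flags 95 as well
```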

4. **Write the Algorithm to Display the Statistics of Null Values Present in the Dataset**
```python
import pandas as pd
df = pd.read_csv('file.csv')
null_counts = df.isnull().sum()
print("Null values in each column:")
print(null_counts)
```

5. **Write an Algorithm to Replace Outlier Value with the Mean of the Variable**
```python
import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
mean_value = df['Column_Name'].mean()
std_dev = df['Column_Name'].std()

# Set a threshold to define outliers, e.g., z-score > 3
z_scores = (df['Column_Name'] - mean_value) / std_dev
outlier_threshold = 3
df.loc[np.abs(z_scores) > outlier_threshold, 'Column_Name'] = mean_value
```

---

**Assignment No. 3**

1. **What are the Measures of Central Tendency?**
- Measures of central tendency describe the center of a dataset:
- **Mean**: The average of the data.
- **Median**: The middle value when data is sorted.
- **Mode**: The most frequent value in the data.

2. **What are the Measures of Dispersion?**


- Measures of dispersion describe the spread or variability of a dataset:
- **Variance**: Average of the squared deviations from the mean.
- **Standard Deviation**: The square root of variance.
- **Range**: The difference between the maximum and minimum values.
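
A quick pandas sketch computing the measures from questions 1 and 2 (the series is made up):
```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

# Central tendency
print(s.mean())           # 5.0
print(s.median())         # 4.5
print(s.mode().iloc[0])   # 4 (most frequent)

# Dispersion
print(s.var())            # sample variance
print(s.std())            # standard deviation
print(s.max() - s.min())  # range: 7
```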

3. **What is the Difference Between Range and Variance?**


- **Range**: A simple measure of dispersion; it is the difference between the highest and lowest
values in a dataset.
- **Variance**: A more complex measure of dispersion that considers how far each value is from
the mean.

4. **What is Meant by Hypothesis Testing?**


- Hypothesis testing is a statistical method to test assumptions or hypotheses about a population
parameter, often using significance tests and p-values.
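
A minimal sketch of a one-sample t-test with SciPy (the sample and the hypothesized mean of 5.0 are made up):
```python
from scipy import stats

sample = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.4]
# H0: the population mean equals 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(t_stat, p_value)  # reject H0 at the 5% level only if p_value < 0.05
```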

5. **What is Type 1 and Type 2 Error?**


- **Type 1 Error**: Incorrectly rejecting a true null hypothesis (false positive).
- **Type 2 Error**: Failing to reject a false null hypothesis (false negative).

---

**Assignment No. 4**


1. **What is Regression?**
- Regression is a statistical method for estimating the relationship between a dependent variable
and one or more independent variables, often used for prediction and analysis.

2. **Difference Between Linear and Logistic Regression**


- **Linear Regression**: Models the relationship between variables assuming a linear
relationship; used for continuous outcomes.
- **Logistic Regression**: Models the relationship between variables assuming a logistic
function; used for binary or categorical outcomes.
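
A side-by-side sketch with scikit-learn on toy data:
```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])

# Linear regression: continuous target, predicts a real-valued estimate
y_cont = np.array([1.2, 1.9, 3.1, 3.9, 5.2])
print(LinearRegression().fit(X, y_cont).predict([[6]]))

# Logistic regression: binary target, predicts class probabilities
y_bin = np.array([0, 0, 0, 1, 1])
print(LogisticRegression().fit(X, y_bin).predict_proba([[6]]))
```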

3. **What are the Different Types of Logistic Regression?**


- Types of logistic regression include:
- **Binary Logistic Regression**: Predicts binary outcomes.
- **Multinomial Logistic Regression**: Predicts categorical outcomes with more than two
classes.
- **Ordinal Logistic Regression**: Predicts ordered categorical outcomes.

4. **What are the Different Types of Linear Regression?**
- **Simple Linear Regression**: One dependent variable and one independent variable.
- **Multiple Linear Regression**: One dependent variable with multiple independent variables.
- **Polynomial Regression**: Uses polynomial functions to model relationships.

5. **How to Compute SST, SSE, SSR, MSE, RMSE, R Square**


- SST: Total sum of squares.
- SSE: Sum of squares due to error.
- SSR: Sum of squares due to regression.
- MSE: Mean squared error.
- RMSE: Root mean squared error.
- R-Square: Coefficient of determination, indicating the proportion of variance explained by the
regression model.
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
X = df[['feature1', 'feature2']]
y = df['target']

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

SST = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
SSE = np.sum((y - y_pred) ** 2)           # sum of squares due to error (residuals)
SSR = np.sum((y_pred - np.mean(y)) ** 2)  # sum of squares due to regression

MSE = mean_squared_error(y, y_pred)
RMSE = np.sqrt(MSE)
R_Square = model.score(X, y)  # equals SSR / SST (= 1 - SSE / SST) for OLS with an intercept
```

---

**Assignment No. 5**


1. **How to Evaluate a Classification Model?**
- Common methods for evaluating classification models include:
- Confusion matrix.
- Accuracy, precision, recall, F1-score.
- ROC curve and area under the ROC curve (AUC).

2. **How to Evaluate a Regression Model?**


- Evaluation metrics for regression models include:
- Mean squared error (MSE).
- Root mean squared error (RMSE).
- Mean absolute error (MAE).
- R-Square.

3. **What is Accuracy, Precision, Recall, and F1-Score?**
- **Accuracy**: The proportion of correct predictions among total predictions.
- **Precision**: The ratio of true positives to all predicted positives.
- **Recall**: The ratio of true positives to all actual positives.
- **F1-Score**: A harmonic mean of precision and recall.
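
A sketch computing the metrics from questions 1 and 3 with `sklearn.metrics` (the labels are made up):
```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))  # rows = actual, columns = predicted
print(accuracy_score(y_true, y_pred))    # (TP + TN) / total = 6/8 = 0.75
print(precision_score(y_true, y_pred))   # TP / (TP + FP) = 3/4
print(recall_score(y_true, y_pred))      # TP / (TP + FN) = 3/4
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```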

---

**Assignment No. 6**


1. **What is Bayes Theorem?**
- Bayes theorem calculates the probability of a hypothesis given observed evidence, updating a prior belief with the likelihood of that evidence.
- Formula: \( P(A|B) = \frac{P(B|A) \times P(A)}{P(B)} \).
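
A worked numeric example with assumed probabilities for a diagnostic test:
```python
# Assumed numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate
p_d = 0.01                # P(Disease), the prior
p_pos_given_d = 0.95      # P(Positive | Disease), the likelihood
p_pos_given_not_d = 0.05  # P(Positive | No Disease)

# Law of total probability: P(Positive)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes theorem: posterior P(Disease | Positive)
posterior = p_pos_given_d * p_d / p_pos
print(round(posterior, 3))  # ~0.161 -- low despite the accurate test
```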

2. **Different Key Terms of Bayes Theorem?**


- **Prior probability**: The probability of an event before new evidence.
- **Posterior probability**: The updated probability after new evidence.
- **Likelihood**: The probability of observing specific evidence given a particular hypothesis.

3. **What is Meant by Likelihood Probability?**


- Likelihood probability is the probability of observed data given certain assumptions or a model.
It’s often used to evaluate how well a model fits the data.

---

**Assignment No. 7**


1. **What is POS Tagging?**
- POS (Part of Speech) tagging is the process of assigning parts of speech, such as noun, verb,
adjective, etc., to words in a text.
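
A minimal sketch with NLTK (the required model downloads are noted in the comments; exact names can vary across NLTK versions):
```python
import nltk
# One-time model downloads (names vary by NLTK version):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog")
print(nltk.pos_tag(tokens))  # e.g. [('The', 'DT'), ('quick', 'JJ'), ...]
```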

2. **What is Meant by Lemmatization?**


- Lemmatization is the process of reducing words to their base or root form while considering
grammar and context. It aims to return words to their canonical form.

3. **What is Meant by Stemming?**


- Stemming involves reducing words to their basic form by removing affixes without considering
the specific context. It’s simpler than lemmatization.
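
A sketch contrasting the two with NLTK (requires the WordNet data; download names can vary by version):
```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
# One-time download for the lemmatizer: nltk.download('wordnet')

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("studies"))                  # 'studi' -- crude suffix stripping
print(lemmatizer.lemmatize("studies"))          # 'study' -- a real dictionary form
print(lemmatizer.lemmatize("better", pos="a"))  # 'good'  -- uses grammatical context
```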

4. **What is Meant by Tokenization?**


- Tokenization involves breaking text into smaller units, such as words or sentences, for analysis
or processing.

5. **Why Remove Stop Words in Text Analysis?**


- Stop words are common words that do not contribute significant meaning to text (like "the," "is,"
"and"). Removing them reduces noise and improves the focus on meaningful words.

6. **Advantages and Disadvantages of TF-IDF**
- **Advantages**:
- Emphasizes important words in a text by reducing the impact of common terms.
- **Disadvantages**:
- Doesn’t consider word semantics.
- May overemphasize less common terms.

7. **Steps of a Text Analysis Model Using TF-IDF**


- Tokenize the text.
- Remove stop words.
- Apply lemmatization or stemming.
- Calculate TF-IDF for each token.
- Use TF-IDF features for further analysis or modeling.
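
A sketch of these steps with scikit-learn's `TfidfVectorizer` (the documents are made up); it bundles tokenization and stop-word removal, while stemming or lemmatization would need a custom preprocessor:
```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

# Tokenization and English stop-word removal are built in; pass a custom
# tokenizer if stemming or lemmatization is also required.
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # vocabulary after stop-word removal
print(X.toarray())                         # TF-IDF matrix: one row per document
```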

---

**Assignment No. 8**


1. **Explain Scatter Plot**
- A scatter plot is a graph showing individual data points, typically used to visualize relationships
between two variables. Useful for identifying trends and outliers.
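
A minimal matplotlib example (the height/weight values are made up):
```python
import matplotlib.pyplot as plt

height = [150, 160, 165, 170, 175, 180]
weight = [50, 56, 61, 66, 72, 79]
plt.scatter(height, weight)  # each point is one (height, weight) observation
plt.xlabel('Height (cm)')
plt.ylabel('Weight (kg)')
plt.show()
```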

2. **What is Univariate, Bivariate, Multivariate Plot?**


- **Univariate Plot**: Analysis of one variable (e.g., histogram, box plot).
- **Bivariate Plot**: Analysis of two variables (e.g., scatter plot).
- **Multivariate Plot**: Analysis of more than two variables (e.g., 3D scatter plot, pair plot).

3. **What is CM?**
- CM typically stands for confusion matrix, used to evaluate classification models by showing
true positives, true negatives, false positives, and false negatives.

4. **What is a Heatmap? What Does 0 & 1 Mean in a Heatmap?**


- A heatmap is a visual representation of data using color gradients. It can represent different
types of data, such as correlation matrices or categorical relationships.
- In a correlation heatmap, 1 indicates perfect positive correlation (e.g., a variable with itself on the diagonal) and 0 indicates no linear correlation; in binary heatmaps, 0 and 1 simply encode the two conditions.
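
A common sketch: a correlation-matrix heatmap with seaborn (random data for illustration):
```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.rand(20, 3), columns=['a', 'b', 'c'])

# Correlation values run from -1 to 1; annot=True prints the value in each cell
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.show()
```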

5. **How to Handle Text Data?**


- Techniques for handling text data include:
- Tokenization.
- Removing stop words.
- Lemmatization or stemming.
- Feature extraction with methods like TF-IDF or bag-of-words.

---

**Assignment No. 9**

1. **What is the Use of Statistics in Data Science?**
- Statistics is used to understand and analyze data, make inferences, and validate models. It
provides foundational techniques for data science and machine learning.

2. **How to Analyze a Single Feature?**


- Different ways to analyze a single feature include:
- Descriptive statistics like mean, median, and mode.
- Visualization methods like histograms, box plots, or density plots.
- Normality tests such as Shapiro-Wilk or Kolmogorov-Smirnov.

3. **What is the Interquartile Range (IQR)?**


- The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1), used to measure data spread and identify outliers. Points more than 1.5 times the IQR below Q1 or above Q3 are commonly flagged as outliers.

4. **What is a Z-Score?**
- A Z-score represents the number of standard deviations a data point is from the mean. It’s used
to identify outliers and standardize data.

5. **Which Measure of Central Tendency Do Outliers Affect?**


- Outliers primarily affect the mean, since it's sensitive to extreme values. The median is more
robust to outliers, representing the central value in an ordered dataset.

---

**Assignment No. 10**


1. **How to Create a Histogram?**
- Creating a histogram with Python:
```python
import matplotlib.pyplot as plt
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
plt.hist(data, bins=5)
plt.show()
```

2. **Difference Between Histogram and Bar Graph?**
- **Histogram**: Used to represent continuous data, with bars representing frequency in specific
ranges (bins).
- **Bar Graph**: Used to represent categorical data, with bars representing each category.

3. **What is a Density Plot?**


- A density plot is a graph that represents the distribution of a variable, often as a smooth curve,
used to estimate and visualize data distributions.

4. **What Do You Understand by Density Plot?**


- A density plot shows the estimated distribution of a continuous variable, typically computed with kernel density estimation (KDE), providing a smooth alternative to a histogram.
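
A minimal seaborn sketch of a KDE-based density plot (the data are made up):
```python
import seaborn as sns
import matplotlib.pyplot as plt

data = [1.2, 1.9, 2.1, 2.4, 2.5, 2.7, 3.1, 3.3, 4.0, 4.8]
sns.kdeplot(data)  # kernel density estimate: a smoothed view of the distribution
plt.show()
```
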
---

**Assignment No. 11, 12**


1. **What is HDFS (Hadoop Distributed File System)?**
- HDFS is a distributed file system designed for large-scale data storage and processing,
offering scalability and redundancy across multiple nodes.

2. **What is MapReduce?**
- MapReduce is a programming model in Hadoop for distributed data processing. It consists of
"Map" tasks for parallel processing and "Reduce" tasks for aggregating results.

3. **What is the Purpose of Pig in HDFS Architecture?**


- Apache Pig provides a high-level platform for MapReduce programming with a scripting
language called Pig Latin, simplifying complex data processing tasks.

4. **Steps to Install HDFS**


- Download and install Hadoop.
- Configure core-site.xml and hdfs-site.xml for cluster setup.
- Format the NameNode.
- Start the NameNode and DataNodes.

5. **How Does MapReduce Work?**


- In the "Map" phase, input data is processed in parallel to generate intermediate key-value
pairs.
- In the "Reduce" phase, intermediate results are aggregated to produce the final output.

6. **Steps to Install Hadoop for Distributed Environment**


- Download and install Hadoop.
- Configure Hadoop environment variables.
- Set up cluster configurations in core-site.xml, hdfs-site.xml, mapred-site.xml.
- Start the Hadoop services for distributed processing.

7. **Steps to Install Scala**


- Download and install Scala from its official website.
- Configure Scala environment variables.
- Verify the installation with a simple Scala script or REPL.

---
