DSBDA Viva Answers
**Assignment No. 1**
1. **Explain Data Frame with Suitable Example**
- A data frame is a two-dimensional, tabular data structure commonly used in data analysis. It is a core data structure in the pandas library (Python) and in the R language.
- Example in Python:
```python
import pandas as pd
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})
```
---
4. **Write the Algorithm to Display the Statistics of Null Values Present in the Dataset**
```python
import pandas as pd
df = pd.read_csv('file.csv')
null_counts = df.isnull().sum()
print("Null values in each column:")
print(null_counts)
```
5. **Write an Algorithm to Replace Outlier Value with the Mean of the Variable**
```python
import numpy as np
import pandas as pd

df = pd.read_csv('file.csv')
mean_value = df['Column_Name'].mean()
std_dev = df['Column_Name'].std()
# Flag values more than 3 standard deviations from the mean as outliers (a common rule)
outliers = (df['Column_Name'] - mean_value).abs() > 3 * std_dev
# Replace the outliers with the column mean
df['Column_Name'] = np.where(outliers, mean_value, df['Column_Name'])
```
---
**Assignment No. 3**
1. **What are the Measures of Central Tendency?**
- Measures of central tendency describe the center of a dataset:
- **Mean**: The average of the data.
- **Median**: The middle value when data is sorted.
- **Mode**: The most frequent value in the data.
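- A minimal sketch with pandas, using a small hypothetical sample:
```python
import pandas as pd

data = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical sample values
print("Mean:", data.mean())      # average of the values
print("Median:", data.median())  # middle value of the sorted data
print("Mode:", data.mode()[0])   # most frequent value (first, if several)
```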
---
```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv('file.csv')
X = df[['feature1', 'feature2']]
y = df['target']
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)
```
---
3. **What is CM?**
- CM typically stands for confusion matrix, used to evaluate classification models by showing
true positives, true negatives, false positives, and false negatives.
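- A minimal sketch with scikit-learn, assuming hypothetical true and predicted labels:
```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1]  # hypothetical actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1]  # hypothetical model predictions
# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
```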
---
**Assignment No. 9**
1. **What is the Use of Statistics in Data Science?**
- Statistics is used to understand and analyze data, make inferences, and validate models. It
provides foundational techniques for data science and machine learning.
4. **What is a Z-Score?**
- A Z-score represents the number of standard deviations a data point is from the mean. It’s used
to identify outliers and standardize data.
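- A small sketch with pandas, reusing the hypothetical 'file.csv' and 'Column_Name' placeholders from earlier answers:
```python
import pandas as pd

df = pd.read_csv('file.csv')
# Z-score = (value - mean) / standard deviation
z_scores = (df['Column_Name'] - df['Column_Name'].mean()) / df['Column_Name'].std()
# Values with |z| > 3 are commonly treated as outliers
print(df[z_scores.abs() > 3])
```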
---
2. **What is MapReduce?**
- MapReduce is a programming model in Hadoop for distributed data processing. It consists of
"Map" tasks for parallel processing and "Reduce" tasks for aggregating results.
---