Untitled document (4)
a) Data storage optimization – Not the main goal, though efficient storage is important.
b) Pattern recognition for decision-making – Correct. Data science aims to discover patterns to
guide strategic actions.
d) Network security – A separate domain, though it can use data science techniques.
---
Answer: c) Structured
Explanation:
b) Semi-structured – Has some structure (e.g., XML, JSON) but not rigid.
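To illustrate the semi-structured case, here is a minimal Python sketch. The records and field names are made up for this example. Both objects are valid JSON, yet they do not share a fixed set of fields, which is what separates this kind of data from a rigid table.

```python
import json

# Hypothetical records: both are valid JSON, but their fields differ.
raw = '''
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Lin", "tags": ["admin", "staff"]}
]
'''

records = json.loads(raw)

# Collect the set of keys each record carries.
key_sets = [set(r.keys()) for r in records]

# Keys present in every record versus keys that only some records have.
shared_keys = set.intersection(*key_sets)
optional_keys = set.union(*key_sets) - shared_keys
```

In a structured (relational) table every row would carry the same columns, so `optional_keys` would be empty.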
---
3. The data science process step that involves handling outliers occurs during:
Answer: b) Pre-processing
Explanation:
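Outliers are usually dealt with while the data is being cleaned, before any modelling. As one illustration (not the only approach), the sketch below flags values with the interquartile-range rule. The helper name `iqr_bounds`, the multiplier 1.5, and the sample readings are assumptions for this example.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences; values outside them are treated as outliers."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up sample with one obviously extreme reading.
readings = [10, 12, 11, 13, 12, 11, 95]

low, high = iqr_bounds(readings)
outliers = [x for x in readings if x < low or x > high]
cleaned = [x for x in readings if low <= x <= high]
```

Whether flagged values are dropped, capped, or kept depends on the data and the goal; the sketch only shows the detection step.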
---
Answer: b) Hadoop
Explanation:
---
c) Rare event counts – Correct. For example, calls per hour or website hits.
Answer: b) Pre-processing
Explanation:
---
8. The term "ETL" in data management refers to:
---
23. Which of the following is a supervised learning task?
Answer: a) Classification
b) Clustering – Unsupervised.
---
26. Which is NOT a component of the data science Venn diagram (Drew Conway)?
a) Hacking skills
b) Math/stat knowledge
c) Substantive expertise
---
Answer: d) MongoDB
a) MySQL – SQL.
b) PostgreSQL – SQL.
c) Oracle – SQL.
e) SQLite – SQL-based.
---
Answer: b) JSON
d) MP3 – Audio.
e) PNG – Image.
---
Answer: c) Classification
---
Answer: d) RMSE
c) Recall – Classification.
e) F1 Score – Classification.
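RMSE, unlike recall or F1 score, is computed on continuous predictions: it is the square root of the mean squared difference between predicted and actual values. A minimal sketch, with a hypothetical helper named `rmse` and made-up numbers:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up regression targets and predictions.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

error = rmse(actual, predicted)
```

Because the errors are squared before averaging, a single large miss raises RMSE more than several small ones.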
---
Answer: b) Scalability
---
a) Height – Numerical.
b) Age – Numerical.
d) Temperature – Continuous.
e) Weight – Continuous.
---
a) Features – Columns.
---
Answer: b) Matplotlib
---
a) Supervised – Yes.
b) Unsupervised – Yes.
c) Reinforcement – Yes.
d) Semi-supervised – Yes.
---
Answer: b) AutoML
a) Excel – Manual.
d) Tableau – Visualization.
---
Answer: a) NumPy
b) Flask – Web.
c) Seaborn – Visualization.
d) Pandas – Dataframes.
---
b) Correct. Balance between underfitting (high bias) and overfitting (high variance).
---
Answer: a) SMOTE
---
c) Visualization – No.
d) Deployment – No.
---
Answer: b) XGBoost
a) KNN – No boosting.
c) K-means – Clustering.
---
Answer: b) Preprocessing
---
50. Which is NOT a characteristic of big data?
Answer: d) Validity
a) Volume – Yes.
b) Velocity – Yes.
c) Variety – Yes.
e) Veracity – Yes.
---
Answer: a) Docker
b) Tableau – Visualization.
c) Excel – Spreadsheet.
---
d) Compression – No.
e) Visualization – No.
---
Answer: b) K-means
c) SVM – Supervised.
---
Answer: c) TensorFlow
a) Pandas – Dataframes.
b) NumPy – Numerical.
d) Matplotlib – Visualization.
---
Answer: a) Bootstrapping
c) Normalization – Scaling.