FAM PTT2
Q6 What is a dataset?
A dataset is a structured collection of data used in machine learning and
statistical analysis. It comprises organized information, such as numbers,
text, or images, grouped together for research or analysis. Datasets provide
the foundation for training machine learning models, enabling algorithms to
learn patterns and make predictions based on the provided data.
Q7 Explain dataset in detail with an example
A dataset is a structured collection of data points used for analysis,
research, or machine learning. It can include various types of information,
such as numerical values, text, images, or any other structured format.
Datasets are essential in understanding trends, making predictions, or
training machine learning models, as they provide the raw material for
analysis and learning algorithms.
Example:
Square footage   No. of bedrooms   Neighborhood   Price ($)
1500             3                 Suburb A       250000
1200             2                 Downtown       320000
1800             4                 Suburb B       280000
2000             3                 Downtown       350000
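A minimal sketch of how the example table could be held in code, using only the Python standard library. The column names (square_footage, bedrooms, neighborhood, price) and the helper load_houses are illustrative assumptions, not part of the original notes. Treating price as the value to predict is also an assumption made for the sake of the example.

```python
import csv
import io

# The example table written as CSV text (values copied from the table;
# the column names are assumed for illustration).
RAW = """square_footage,bedrooms,neighborhood,price
1500,3,Suburb A,250000
1200,2,Downtown,320000
1800,4,Suburb B,280000
2000,3,Downtown,350000
"""

def load_houses(text):
    """Parse CSV text into a list of dicts, converting numeric fields to int."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "square_footage": int(rec["square_footage"]),
            "bedrooms": int(rec["bedrooms"]),
            "neighborhood": rec["neighborhood"],
            "price": int(rec["price"]),
        })
    return rows

houses = load_houses(RAW)

# Split each row into features (inputs) and a target (value to predict).
features = [(h["square_footage"], h["bedrooms"], h["neighborhood"]) for h in houses]
target = [h["price"] for h in houses]

average_price = sum(target) / len(target)
print(len(houses), average_price)  # 4 300000.0
```

Each row is one data point, and each column is one attribute of that data point.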
Q9 difference between
a. data analytics and data science
Data analytics                          Data science
Analyzing past data to understand       Utilizing advanced algorithms and
trends and make informed decisions.     statistical methods to predict future
                                        outcomes.
Uses descriptive analysis.              Uses predictive and descriptive analysis.
Relies on tools like Excel, SQL, and    Uses a broader range of tools, including
visualization tools.                    Python and ML libraries.
Used in business intelligence, market   Used in predictive modeling and
analysis, reporting, and dashboards.    recommendation systems.
Unsupervised learning:
Unsupervised learning is a machine learning approach where the algorithm
is trained on unlabeled data, meaning there are no explicit output labels
provided. The algorithm explores the inherent structure or patterns within
the data, clustering similar data points or reducing dimensionality without
specific guidance.
Key Characteristics:
No Labels: Unlike supervised learning, unsupervised learning does not
have labeled output to guide the learning process. The algorithm must
find patterns and structure within the input data without any
predefined categories.
Exploratory Analysis: Unsupervised learning is often used for
exploratory analysis, allowing data scientists to understand the data's
underlying structure, discover hidden patterns, or group similar data
points together.
Clustering: One of the main applications of unsupervised learning is
clustering, where similar data points are grouped into clusters based
on their similarities. Common algorithms for clustering include K-
Means, Hierarchical Clustering, and DBSCAN.
Dimensionality Reduction: Unsupervised learning techniques like
Principal Component Analysis (PCA) and t-SNE are used to reduce the
dimensionality of the data. This is especially valuable when dealing
with high-dimensional datasets, making visualization and analysis
more manageable.
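The clustering idea can be sketched with a small pure-Python K-Means. This is a simplified illustration, not a production implementation. The sample points and the hand-picked starting centroids are hypothetical and chosen only to keep the run deterministic. Real libraries usually pick starting centroids randomly or with a smarter scheme.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain K-Means: alternate assignment and centroid-update steps."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the algorithm has converged
        centroids = new_centroids
    return assignments, centroids

# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels were supplied. The grouping comes only from the distances between points.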
Q17 Supervised vs unsupervised
Supervised                              Unsupervised
Labeled data                            Unlabeled data
Used in email spam classification,      Used in clustering customer segments,
handwriting recognition.                anomaly detection.
The algorithm receives feedback         No feedback loop; the model explores
through labeled data to adjust and      patterns without supervision.
improve predictions.
Metrics: Accuracy, Precision, Recall,   Metrics: Silhouette Score, Inertia,
F1-score                                Davies-Bouldin Index
                      Predicted positive (P)   Predicted negative (N)
Actual positive (P)   True positive (TP)       False negative (FN)
Actual negative (N)   False positive (FP)      True negative (TN)
True Positive (TP): Instances that are actually positive and are
correctly predicted as positive.
False Positive (FP): Instances that are actually negative but are
incorrectly predicted as positive.
True Negative (TN): Instances that are actually negative and are
correctly predicted as negative.
False Negative (FN): Instances that are actually positive but are
incorrectly predicted as negative.
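The four definitions above can be turned into a short counting routine. This is a sketch for binary labels only. The function name confusion_counts, the positive=1 convention, and the sample label lists are assumptions made for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, TN, FN for a binary classification task."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # actually positive, predicted positive
        elif p == positive and a != positive:
            fp += 1  # actually negative, predicted positive
        elif p != positive and a != positive:
            tn += 1  # actually negative, predicted negative
        else:
            fn += 1  # actually positive, predicted negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical labels: 1 = positive class, 0 = negative class.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 5, 'FN': 1}
```

The four counts always add up to the total number of instances, which is a quick sanity check.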
c. Precision
Definition: Precision is a metric that measures the accuracy of positive
predictions made by a classification model. It calculates the ratio of true
positive predictions to the total positive predictions made by the model.
Formula: Precision=True Positives/(True Positives + False Positives)
d. Recall
Definition: Recall, also known as sensitivity or true positive rate, measures
the ability of a classification model to identify all relevant instances. It
calculates the ratio of true positive predictions to the total actual positive
instances in the dataset.
Formula: Recall=True Positives/(True Positives + False Negatives)
e. F1 score
Definition: F1 score is the harmonic mean of precision and recall. It
combines the two into a single number, so a model scores well only when both
are high. F1 score is especially useful when the class distribution is
imbalanced.
Formula: F1 Score=2×(Precision×Recall)/(Precision + Recall)
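A worked example may help tie precision, recall, and F1 score together. The counts below (TP = 8, FP = 2, FN = 4) are hypothetical. Returning 0.0 when a denominator is zero is one common convention, assumed here for simplicity.

```python
def precision(tp, fp):
    """True positives divided by all positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """True positives divided by all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * (p * r) / (p + r) if (p + r) else 0.0

# Hypothetical counts taken from a confusion matrix.
tp, fp, fn = 8, 2, 4

p = precision(tp, fp)   # 8 / (8 + 2) = 0.8
r = recall(tp, fn)      # 8 / (8 + 4) = 0.666...
f1 = f1_score(p, r)     # 2 * (0.8 * 0.666...) / (0.8 + 0.666...) = 0.727...

print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.667 0.727
```

The F1 score (0.727) falls between precision and recall but sits closer to the lower of the two, which is the characteristic behavior of a harmonic mean.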
Q25 Explain any 4 error measures used in machine learning