FAM PTT2

Uploaded by

Aariz Fakih
FAM PTT2

Q1 Define machine learning?


Machine learning is a subset of artificial intelligence where computer
systems learn patterns from data to make predictions or decisions without
explicit programming. It uses algorithms to analyze data, identify patterns,
and improve its performance over time, enabling machines to learn from
experience and improve tasks without human intervention.

Q2 Give applications of machine learning in detail

• Healthcare:
ML predicts diseases, aids drug discovery, and tailors treatments based on
patient data, enhancing early diagnosis and personalized medicine.
• Finance:
ML models assess credit risks, predict market trends, automate trades, and
detect fraudulent activities, helping to secure financial transactions.
• NLP and Chatbots:
ML-driven chatbots provide automated customer support, and sentiment
analysis helps businesses understand user feedback and market sentiment.
• Autonomous Vehicles:
ML algorithms process real-time data from sensors to navigate vehicles,
interpret road conditions, and ensure passenger safety in self-driving cars.
Q3 Explain the life cycle of machine learning with its diagram
1. Data Collection: Gather relevant data from various sources, ensuring it's
comprehensive and unbiased.
2. Data Preparation: Cleanse and preprocess the data, handling missing
values and outliers. Split the data into training and testing sets.
3. Feature Selection/Engineering: Identify significant features or create new
ones to enhance the model's performance.
4. Model Selection: Choose an appropriate machine learning algorithm
based on the problem (e.g., regression, classification) and data
characteristics.
5. Training: Feed the training data into the selected algorithm. The model
learns from the data to make predictions.
6. Evaluation: Test the model using the testing data to assess its accuracy,
precision, recall, or other relevant metrics.
7. Tuning: Fine-tune the model by adjusting hyperparameters for optimal
performance.
8. Deployment: Integrate the model into the application, allowing it to make
predictions on new, unseen data.
9. Monitoring and Maintenance: Continuously monitor the model's
performance in real-world scenarios. Retrain or update the model as
needed to maintain accuracy.
Q5 Explain types of data in machine learning
• Numerical Data:
Definition: Numerical data consists of numbers that represent measurable
quantities, such as price, age, or temperature.
Application: Predicting stock prices based on historical data.
• Categorical Data:
Definition: Categorical data represents discrete categories or labels without
a natural order. It can include variables like gender, color, or product types.
Application: Email classification (spam or not spam).
• Text Data:
Definition: Text data includes unstructured textual information, often found
in documents, articles, or social media posts.
Application: Sentiment analysis in customer reviews.
• Time Series Data:
Definition: Time series data comprises data points collected over a
time interval and ordered chronologically.
Application: Weather forecasting for upcoming days.

Q6 What is a dataset?
A dataset is a structured collection of data used in machine learning and
statistical analysis. It comprises organized information, such as numbers,
text, or images, grouped together for research or analysis. Datasets provide
the foundation for training machine learning models, enabling algorithms to
learn patterns and make predictions based on the provided data.
Q7 Explain a dataset in detail with an example
A dataset is a structured collection of data points used for analysis,
research, or machine learning. It can include various types of information,
such as numerical values, text, images, or any other structured format.
Datasets are essential in understanding trends, making predictions, or
training machine learning models, as they provide the raw material for
analysis and learning algorithms.
Example:
Square footage | No. of bedrooms | Neighborhood | Price ($)
1500           | 3               | Suburb A     | 250000
1200           | 2               | Downtown     | 320000
1800           | 4               | Suburb B     | 280000
2000           | 3               | Downtown     | 350000

In this example, each row represents a data point. Machine learning
algorithms can analyze this dataset to find patterns, allowing predictions of
house prices based on features like square footage and neighborhood. The
dataset's quality and relevance significantly impact the accuracy and
reliability of the insights or predictions derived from it.
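
As a rough illustration, the house-price example can be held in memory as a list of records, with the features separated from the target. The variable names here (`houses`, `features`, `targets`, `avg_price`) are illustrative assumptions, and the per-neighborhood average is just one simple pattern that can be read from the data.

```python
from collections import defaultdict

# The four rows from the example table, one dict per data point.
houses = [
    {"sqft": 1500, "bedrooms": 3, "neighborhood": "Suburb A", "price": 250000},
    {"sqft": 1200, "bedrooms": 2, "neighborhood": "Downtown", "price": 320000},
    {"sqft": 1800, "bedrooms": 4, "neighborhood": "Suburb B", "price": 280000},
    {"sqft": 2000, "bedrooms": 3, "neighborhood": "Downtown", "price": 350000},
]

# Separate the input features from the target value to be predicted.
features = [(h["sqft"], h["bedrooms"], h["neighborhood"]) for h in houses]
targets = [h["price"] for h in houses]

# One simple pattern: the average price per neighborhood.
prices_by_area = defaultdict(list)
for h in houses:
    prices_by_area[h["neighborhood"]].append(h["price"])
avg_price = {area: sum(p) / len(p) for area, p in prices_by_area.items()}

print(avg_price)
```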

Q8 Explain the data chaining process


"Data chaining" usually refers to the process of linking or connecting data
points or datasets in a meaningful way. This can involve creating
relationships between different datasets or organizing data in a sequential
manner to extract valuable insights.
For example, in business analytics, data chaining might involve linking
customer data with purchase history, allowing businesses to analyze
customer behavior over time. In the context of supply chain management,
data chaining could involve tracking products from manufacturing to
delivery, ensuring transparency and efficiency in the supply chain process.
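
The customer example above can be sketched as follows. The records, field names, and the `customer_id` link key are all hypothetical; the point is only to show two datasets being connected and ordered in time.

```python
# Hypothetical customer and purchase records, linked by customer_id.
customers = {
    1: {"name": "Asha"},
    2: {"name": "Ben"},
}
purchases = [
    {"customer_id": 1, "date": "2024-01-05", "amount": 40.0},
    {"customer_id": 2, "date": "2024-01-07", "amount": 15.5},
    {"customer_id": 1, "date": "2024-02-11", "amount": 60.0},
]

# Chain each customer to their purchases, ordered by date
# (ISO dates sort correctly as plain strings).
history = {cid: [] for cid in customers}
for p in sorted(purchases, key=lambda p: p["date"]):
    history[p["customer_id"]].append(p)

# Summarize behavior over time per customer.
for cid, items in history.items():
    total = sum(p["amount"] for p in items)
    print(customers[cid]["name"], len(items), total)
```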

Q9 Difference between
a. Data analytics and data science
Data analytics | Data science
Analyzes past data to understand trends and make informed decisions. | Uses advanced algorithms and statistical methods to predict future outcomes.
Uses descriptive analysis. | Uses predictive and descriptive analysis.
Relies on tools like Excel, SQL, and visualization tools. | Uses a broader range of tools, including Python and ML libraries.
Used in business intelligence, market analysis, reporting, and dashboards. | Used in predictive modeling and recommendation systems.

b. Statistics and data mining

Statistics | Data mining
Summarize and analyze data | Extract useful information
In-depth analysis, theoretical concepts | Large datasets, patterns, trends
Study of data patterns, probabilities, experiments | Automated discovery of patterns in large datasets
Mathematical formulas, statistical software | Machine learning, data mining software
Q10 Define classification in machine learning
Classification is a type of supervised learning in machine learning where
the algorithm learns to categorize input data into specific classes or labels.
The goal is to predict the categorical class labels of new, unseen data based
on past observations and their corresponding labels.

Q11 How does machine learning work? Explain in detail with its features
Machine learning is a subset of artificial intelligence that enables
computers to learn from data and make predictions or decisions without
being explicitly programmed. It works through a series of steps and
techniques, driven by algorithms, to develop models that can generalize
from data.
Some Key Features:
• Data-Driven: Machine learning relies on data as its primary source of
information. The quality and quantity of data greatly influence the
model's performance.
• Learning and Adaptation: ML models learn from data and adapt to
changing conditions, allowing them to make predictions or decisions
in real-time.
• Generalization: ML models aim to generalize patterns from data,
allowing them to make predictions on new, unseen data.
• Automation: Machine learning automates decision-making processes,
reducing the need for explicit programming.
• Pattern Recognition: ML models excel at recognizing complex
patterns and relationships in data, which may not be evident to
humans.
• Iterative Process: Machine learning is often an iterative process,
involving multiple cycles of model development, training, evaluation,
and refinement.
• Scalability: ML models can handle vast amounts of data and make
predictions at scale, making them suitable for big data applications.

Q12 Define supervised, unsupervised, and semi-supervised learning
• Supervised learning:
Supervised learning is a machine learning paradigm where the algorithm is
trained on labeled data, meaning the input data is paired with corresponding
output labels or target values. The algorithm learns to map input features
to the correct output by minimizing the error between its predictions and
the actual labels during training.

• Unsupervised learning:
Unsupervised learning is a machine learning approach where the algorithm
is trained on unlabeled data, meaning there are no explicit output labels
provided. The algorithm explores the inherent structure or patterns within
the data, clustering similar data points or reducing dimensionality without
specific guidance.

• Semi-supervised learning:
Semi-supervised learning is a hybrid approach that combines elements of
supervised and unsupervised learning. It leverages a small amount of
labeled data and a large amount of unlabeled data for training. The
algorithm learns from both labeled and unlabeled examples, utilizing the
unlabeled data to improve its performance.

List the different types of supervised machine learning algorithms
• Linear Algorithms
• Tree-Based Algorithms
• Neural Networks
• Time Series Forecasting
Q13 Classification vs regression

Aspect | Classification | Regression
Prediction | Predicts the class or category of an outcome variable based on input features. | Predicts a continuous numerical value based on input features.
Algorithms | Decision Trees, Random Forest | Linear Regression, Polynomial Regression
Metrics | Accuracy, Precision, Recall, F1-score | Mean Squared Error, R-squared
Applications | Email spam detection, image recognition | Sales forecasting, stock price prediction, weather forecasting

Q14 Advantages and disadvantages of supervised machine learning
Advantages           | Disadvantages
Accurate predictions | Dependency on labeled data
Automation           | Overfitting
Interpretability     | Limited generalization
                     | Need for feature engineering
Q15 Applications of supervised machine learning
• Image Segmentation: Supervised learning algorithms are used in
image segmentation. In this process, image classification is performed
on different image data with pre-defined labels.
• Medical Diagnosis: Supervised algorithms are also used in the medical
field for diagnosis. They are trained on medical images and past data
labeled with disease conditions, so that the model can identify a
disease in new patients.
• Fraud Detection: Supervised learning classification algorithms are
used to identify fraudulent transactions, fraudulent customers, etc.
Historical data is used to identify the patterns that can indicate
possible fraud.
• Spam Detection: In spam detection and filtering, classification
algorithms are used. These algorithms classify an email as spam or
not spam. The spam emails are sent to the spam folder.
• Speech Recognition: Supervised learning algorithms are also used in
speech recognition. The algorithm is trained with voice data and can
then support tasks such as voice-activated passwords and voice
commands.
Q16 Describe unsupervised machine learning in detail
Unsupervised machine learning is a type of machine learning where the
algorithm is trained on unlabeled data, meaning the input data does not have
corresponding output labels. The algorithm explores the inherent structure
or patterns within the data without specific guidance, allowing it to identify
hidden relationships, clusters, or patterns.

Key Characteristics:
• No Labels: Unlike supervised learning, unsupervised learning does not
have labeled output to guide the learning process. The algorithm must
find patterns and structure within the input data without any
predefined categories.
• Exploratory Analysis: Unsupervised learning is often used for
exploratory analysis, allowing data scientists to understand the data's
underlying structure, discover hidden patterns, or group similar data
points together.
• Clustering: One of the main applications of unsupervised learning is
clustering, where similar data points are grouped into clusters based
on their similarities. Common algorithms for clustering include
K-Means, Hierarchical Clustering, and DBSCAN.
• Dimensionality Reduction: Unsupervised learning techniques like
Principal Component Analysis (PCA) and t-SNE are used to reduce the
dimensionality of the data. This is especially valuable when dealing
with high-dimensional datasets, making visualization and analysis
more manageable.
Q17 Supervised vs unsupervised
Aspect | Supervised | Unsupervised
Data | Labeled data | Unlabeled data
Applications | Email spam classification, handwriting recognition | Clustering customer segments, anomaly detection
Feedback | The algorithm receives feedback through labeled data to adjust and improve predictions. | No feedback loop; the model explores patterns without supervision.
Metrics | Accuracy, Precision, Recall, F1-score | Silhouette Score, Inertia, Davies-Bouldin Index

Q18 Training sets vs test sets

Training sets | Test sets
Typically larger to allow for better model learning. | Smaller than the training set and held out from training, so that the evaluation of the model's performance is unbiased.
Quality of the training set directly affects the model's performance. | Quality of the test set determines how reliably it measures the model's ability to generalize to new, unseen data.
Essential for building the model. | Essential for validating the model's accuracy and ensuring it performs well on new data.
Algorithm learns from this data during the training phase. | Used to evaluate the model's accuracy and effectiveness after training.
Q19 Explain in detail the issues that can occur in machine learning models
• Overfitting:
Issue: Overfitting occurs when a model learns the training data too well,
capturing noise and specific patterns that don't generalize well to new,
unseen data. As a result, the model performs well on the training data but
poorly on test data.
Solution: Regularization techniques, cross-validation, and using more
training data can help prevent overfitting. Choosing simpler models and
avoiding excessively complex ones also mitigates this issue.
• Underfitting:
Issue: Underfitting happens when the model is too simplistic to capture the
underlying patterns in the data. It performs poorly on both the training and
test data.
Solution: Increasing the model's complexity, using more relevant features,
and employing more advanced algorithms can help address underfitting.
• Data Quality:
Issue: Poor-quality or inconsistent data, including missing values and
outliers, can significantly impact model performance. Models learn from the
data provided, and if the data is erroneous or biased, the predictions will be
flawed.
Solution: Careful data preprocessing, cleaning, and validation are crucial.
Handling missing values, outliers, and ensuring data consistency are
essential steps.
• Imbalanced Data:
Issue: In datasets where one class significantly outnumbers the others
(class imbalance), the model tends to favor the majority class, leading to
biased predictions for the minority class.
Solution: Techniques like oversampling the minority class, undersampling
the majority class, or generating synthetic minority samples (e.g., SMOTE)
can mitigate class imbalance issues.
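
As a minimal sketch of the oversampling idea for imbalanced data, the helper below duplicates randomly chosen minority-class rows until every class has as many rows as the largest one. The function name `random_oversample` and the toy data are assumptions for illustration. This is plain random oversampling, not SMOTE, which creates new synthetic points instead of copies.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class in the original data, used as the sampling pool.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data: four samples of class 0, one sample of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Resampling of this kind is normally applied to the training data only, so that the test set still reflects the real class distribution.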

Q20 Write the syntax to split the dataset
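
A minimal sketch of a dataset split, assuming an 80/20 ratio and a fixed random seed (both illustrative choices). The helper name `split_dataset` is hypothetical. With scikit-learn, the same step is commonly written as `train_test_split(X, y, test_size=0.2, random_state=42)` from `sklearn.model_selection`; the version below uses only the standard library so it runs anywhere.

```python
import random

def split_dataset(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(10))  # stand-in for ten data points
train, test = split_dataset(data, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```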


Q21 Explain how training and test data work in machine learning.
• Training Data:
Selection: You start with a dataset containing input features (variables)
and their corresponding target values (labels or outcomes).
Division: The dataset is typically divided into two parts: the training data
and the testing data. The training data is a subset (usually around 70-80%)
of the entire dataset.
Training the Model: The machine learning algorithm uses the training data
to learn the patterns and relationships between the input features and the
target values. The algorithm iteratively adjusts its internal parameters to
minimize the difference between predicted outcomes and actual outcomes.
Model Building: After training, the algorithm produces a model that can
make predictions based on new, unseen data. This model has learned from
the training data and can generalize patterns to predict outcomes for
similar, unseen data.
• Testing Data:
Selection: The remaining portion of the dataset (usually around 20-30%) is
kept aside and not used during the training phase. This is the testing data.
Evaluation: The trained model is evaluated using the testing data. The input
features from the testing dataset are fed into the model, which then
predicts the outcomes.
Comparison: The predicted outcomes are compared with the actual
outcomes (labels) from the testing data. Common evaluation metrics like
accuracy, precision, recall, or mean squared error are used to measure
how well the model performs on the testing data.
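
The train-then-test workflow above can be sketched with a one-feature linear model. The data is hypothetical (it follows y = 2x + 1 exactly), the helper `fit_line` is an illustrative name, and the split is taken in order without shuffling to keep the example short.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical dataset: input feature x and target y = 2x + 1.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

# Division: 80% for training, 20% held out for testing.
train_x, test_x = xs[:8], xs[8:]
train_y, test_y = ys[:8], ys[8:]

# Training: the parameters are learned from the training data only.
slope, intercept = fit_line(train_x, train_y)

# Evaluation: predict on the unseen test inputs and compare with the actual values.
preds = [slope * x + intercept for x in test_x]
mse = sum((p - a) ** 2 for p, a in zip(preds, test_y)) / len(test_y)
print(slope, intercept, mse)
```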

Q22 Explain confusion matrix


A confusion matrix is a table used in machine learning to evaluate the
performance of a classification algorithm. It is particularly useful for
algorithms that classify data into multiple classes. The matrix provides a
summary of the correct and incorrect predictions made by the classifier.

                    | Predicted positive (P) | Predicted negative (N)
Actual positive (P) | True positive (TP)     | False negative (FN)
Actual negative (N) | False positive (FP)    | True negative (TN)

• True Positive (TP): Instances that are actually positive and are
correctly predicted as positive.
• False Positive (FP): Instances that are actually negative but are
incorrectly predicted as positive.
• True Negative (TN): Instances that are actually negative and are
correctly predicted as negative.
• False Negative (FN): Instances that are actually positive but are
incorrectly predicted as negative.
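
The four counts can be tallied directly from the definitions above. This is a minimal sketch for the binary case; the function name `confusion_counts` and the label lists are illustrative assumptions, with 1 standing for the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # actually positive, predicted positive
        elif p == positive:
            fp += 1  # actually negative, predicted positive
        elif a == positive:
            fn += 1  # actually positive, predicted negative
        else:
            tn += 1  # actually negative, predicted negative
    return tp, fp, tn, fn

actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(actual, predicted)
print(tp, fp, tn, fn)  # 3 1 3 1
```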

Q23 Explain type 1 and type 2 error


Type 1 Error:
Type 1 error, also known as a false positive, occurs when a null hypothesis
that is actually true is rejected. In other words, it's the incorrect acceptance
of an effect or result that does not exist. For example, in a medical context,
a Type 1 error would happen if a diagnostic test incorrectly indicates the
presence of a disease when the patient is actually healthy.
Type 2 Error:
Type 2 error, also known as a false negative, happens when a null
hypothesis that is false is not rejected. This means failing to detect a real
effect or difference when it actually exists. In the medical example, a Type
2 error would occur if a diagnostic test fails to detect a disease that is
present in the patient.

Q24 Define terms:


a. Accuracy
Definition: Accuracy is a metric used to measure the overall correctness of
a model. It calculates the ratio of correctly predicted instances to the total
instances in the dataset.
Formula:
Accuracy=Number of Correct Predictions/Total Number of Predictions
b. Rate
Definition: Rate generally refers to the frequency or speed of an occurrence.
In various contexts, it can represent different measures, such as success
rate, error rate, or conversion rate, depending on the specific application.

c. Precision
Definition: Precision is a metric that measures the accuracy of positive
predictions made by a classification model. It calculates the ratio of true
positive predictions to the total positive predictions made by the model.
Formula: Precision=True Positives/(True Positives + False Positives)

d. Recall
Definition: Recall, also known as sensitivity or true positive rate, measures
the ability of a classification model to identify all relevant instances. It
calculates the ratio of true positive predictions to the total actual positive
instances in the dataset.
Formula: Recall=True Positives/(True Positives + False Negatives)

e. F1 score
Definition: F1 score is the harmonic mean of precision and recall. It provides
a single measure that balances the two, which is especially useful when the
class distribution is imbalanced.
Formula: F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
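
The formulas for accuracy, precision, recall, and F1 score can be checked with a short sketch that works from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than part of the definitions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 when a ratio is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1

# Hypothetical counts: 40 TP, 10 FP, 45 TN, 5 FN.
accuracy, precision, recall, f1 = classification_metrics(40, 10, 45, 5)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```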
Q25 Explain any 4 error measures used in machine learning

• Mean Absolute Error (MAE):
Definition: MAE is a regression metric that measures the average absolute
difference between the predicted values and the actual values.
Formula: MAE = (1/n) * Σ|Y_pred - Y_actual|
• Mean Squared Error (MSE):
Definition: MSE is another regression metric that measures the average
squared difference between the predicted values and the actual values.
Formula: MSE = (1/n) * Σ(Y_pred - Y_actual)^2
• Root Mean Squared Error (RMSE):
Definition: RMSE is a variation of MSE that takes the square root of the
average squared difference between predicted and actual values, so the
error is expressed in the same units as the target.
Formula: RMSE = √(MSE)
• Mean Absolute Percentage Error (MAPE):
Definition: MAPE is a percentage-based metric that assesses the relative
accuracy of predictive models by measuring the average percentage
difference between predicted and actual values. It is commonly applied in
finance, economics, and supply chain management.
Formula: MAPE = (1/n) * Σ(|(Y_actual - Y_pred) / Y_actual|) * 100
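
The four formulas can be computed side by side in a few lines. This is a sketch on made-up values; the function name `regression_errors` is hypothetical, and MAPE as written is undefined whenever an actual value is zero, so the example avoids that case.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE, MAPE) for paired actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by each actual value, so none of them may be zero.
    mape = sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n * 100
    return mae, mse, rmse, mape

# Made-up example values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
mae, mse, rmse, mape = regression_errors(actual, predicted)
print(round(mae, 3), round(mse, 3), round(rmse, 3), round(mape, 3))
```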
