ML Viva Q&A
For example, robots are programmed so that they can perform tasks based
on data they gather from sensors. Machine learning automatically learns
programs from data.
● Artificial Intelligence
● Rule based inference
● Genetic Programming
● Inductive Learning
● Model building
● Model testing
● Applying the model
Machine learning relates to the study, design, and development of
algorithms that give computers the capability to learn without being
explicitly programmed.
Data mining, by contrast, can be defined as the process of extracting
knowledge or previously unknown interesting patterns from unstructured
data. Machine learning algorithms are used during this process.
The possibility of overfitting exists because the criterion used for training
the model is not the same as the criterion used to judge its efficacy.
If you have a small dataset and are forced to build a model from it, you
can use a technique known as cross-validation.
In this method the dataset is split into two sections, a training dataset
and a testing dataset. The data points in the training dataset are used to
build the model, while the testing dataset is used only to test it.
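The split described above can be sketched in a few lines of Python. This is
a minimal illustration, not a definitive implementation: the helper name
train_test_split, the 25% test fraction, the fixed seed, and the toy data
are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # fixed seed for repeatability
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy (input, target) pairs, invented for illustration only
data = [(x, 2 * x + 1) for x in range(20)]
train, test = train_test_split(data)
print(len(train), len(test))  # 15 5
```

The model would be fitted on train only, and test would be touched just
once, to measure how well the fitted model generalises.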
Because underfitting can be detected from the training set alone, we can
work on establishing the dominant relationship between the input and
output variables from the outset.
● Decrease regularization
● Increase the duration of training
● Feature selection
Overfitting means the model has fitted the training data too well. In this
case, we need to resample the data and estimate the model accuracy using
techniques like k-fold cross-validation.
Underfitting means the model is not able to capture the patterns in the
data. In this case, we need to change the algorithm or feed more data
points to the model.
● Decision Trees
● Neural Networks (back propagation)
● Probabilistic networks
● Nearest Neighbor
● Support vector machines
20) What are the different Algorithm techniques in Machine
Learning?
● Supervised Learning
● Unsupervised Learning
● Semi-supervised Learning
● Reinforcement Learning
● Transduction
● Learning to Learn
Example: 01
Given the height and weight of a person, identify their gender. Below
are the popular supervised learning algorithms.
Example: 02
● Classifications
● Speech recognition
● Regression
● Predict time series
● Annotate strings
24) What are the two methods used for the calibration in
Supervised Learning?
● Platt Calibration
● Isotonic Regression
These methods are designed for binary classification; extending them to
multi-class problems is not trivial.
● Clustering
● Anomaly Detection
● Neural Networks
● Latent Variable Models
Example:
In the same example, clustering the T-shirts will group them into
categories such as “collar style and V-neck style”, “crew neck style”,
and “sleeve types”.
The training set is the set of examples given to the learner. The test set
is the set of examples held back from the learner, and it is used to test
the accuracy of the hypotheses the learner generates. The training set is
distinct from the test set.
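In k-fold cross-validation the dataset is divided into k subsets. The model
is trained on k-1 of them, and the last subset is held out for testing. This
is repeated so that each subset serves as the test set once. Finally, the
scores from all k folds are averaged to produce the final score.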
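The k-fold procedure can be sketched as follows. This is only an
illustration under stated assumptions: the "model" is a hypothetical
placeholder that predicts the mean of its training targets, the score is
negative mean squared error, the folds are contiguous (no shuffling), and
the six target values are invented.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs, one pair per fold."""
    # Spread any remainder over the first folds so sizes differ by at most 1
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def mean_predictor_score(train_targets, test_targets):
    """Hypothetical model: predict the training mean; score is negative MSE."""
    mean = sum(train_targets) / len(train_targets)
    return -sum((t - mean) ** 2 for t in test_targets) / len(test_targets)

targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # invented toy targets
scores = []
for train_idx, test_idx in k_fold_indices(len(targets), k=3):
    train = [targets[i] for i in train_idx]
    test = [targets[i] for i in test_idx]
    scores.append(mean_predictor_score(train, test))

final_score = sum(scores) / len(scores)    # average over the k folds
print(scores, final_score)
```

In practice the data would usually be shuffled before the folds are cut, so
that each fold is representative of the whole dataset.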
For example, to speed up the hiring process, a tech giant like Amazon
builds an engine that is given 100 resumes and spits out the top five,
and those candidates are hired.
38) What is the difference between heuristics for rule learning and
heuristics for decision trees?
The difference is that the heuristics for decision trees evaluate the average
quality of a number of disjoint sets, while rule learners only evaluate the
quality of the set of instances that is covered by the candidate rule.
A Support Vector Machine (SVM) is an algorithm that tries to fit a line (or
plane or hyperplane) between the different classes that maximizes the
distance from the line to the points of the classes.
In this way, it tries to find a robust separation between the classes. The
support vectors are the points closest to the dividing hyperplane, lying
on the edge of the margin.
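To make the idea of distance to the hyperplane concrete, here is a small
sketch. It does not train an SVM: the 2-D points and the separating line
x1 + x2 - 3 = 0 are hand-picked assumptions for illustration. The code only
measures each point's distance to that line and reports the closest points,
which play the role of support vectors.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D data: (point, class label)
points = [((0.0, 1.0), -1), ((1.0, 1.0), -1), ((2.0, 2.0), +1), ((3.0, 2.0), +1)]
w, b = (1.0, 1.0), -3.0   # hand-picked line, not the output of a solver

distances = [distance_to_hyperplane(w, b, x) for x, _ in points]
margin = min(distances)   # distance from the line to the nearest point
support_vectors = [x for (x, _), d in zip(points, distances)
                   if math.isclose(d, margin)]
print(margin, support_vectors)
```

An SVM solver searches over w and b for the separating hyperplane that makes
this smallest distance as large as possible.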
46) What are the two classification methods that SVM (Support
Vector Machine) can handle?
Polynomial kernel - When you have discrete data that has no natural
notion of smoothness.
Ensemble learning is used when you build component classifiers that are
more accurate and independent of each other.
51) What are the two paradigms of ensemble methods?
Boosting and Bagging both can reduce errors by reducing the variance
term.
The variance term measures how much the learning algorithm’s prediction
fluctuates for different training sets.
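The variance term can be estimated empirically by training the same learner
on many different training sets and measuring how much its prediction
moves. The sketch below assumes a deliberately trivial, hypothetical learner
(it predicts the mean of its training targets) and synthetic Gaussian data;
both are illustrative choices, not part of any particular method.

```python
import random
import statistics

def mean_learner(training_targets):
    """Hypothetical learner: always predicts the mean of its training targets."""
    return sum(training_targets) / len(training_targets)

rng = random.Random(0)                       # fixed seed for repeatability
population = [rng.gauss(10.0, 2.0) for _ in range(1000)]   # synthetic data

# Train the same learner on 200 different training sets of 20 examples each
predictions = [mean_learner(rng.sample(population, 20)) for _ in range(200)]

# Variance of the prediction across training sets
prediction_variance = statistics.pvariance(predictions)
print(prediction_variance)
```

A learner whose predictions barely change from one training set to the next
has low variance; one that changes a lot has high variance.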
● Data Acquisition
● Ground Truth Acquisition
● Cross Validation Technique
● Query Type
● Scoring Metric
● Significance Test
● Sliding-window methods
● Recurrent sliding windows
● Hidden Markov models
● Maximum entropy Markov models
● Conditional random fields
● Graph transformer networks
● Imitation Learning
● Structured prediction
● Model based reinforcement learning
64) Into what different categories can you categorise the
sequence learning process?
● Sequence prediction
● Sequence generation
● Sequence recognition
● Sequential decision
65) What are loss functions and cost functions? Explain the key
difference between them.
When we calculate the error for only a single data point, we use the term
loss function.
When we calculate the sum of the errors over multiple data points, we use
the term cost function. Apart from this, there is no major difference.
In other words, the loss function captures the difference between the
actual and predicted values for a single record, whereas the cost function
aggregates that difference over the entire training dataset.
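The distinction can be shown with squared error. This is a sketch under
assumptions: squared error is just one possible loss, averaging is just one
way to aggregate (a plain sum would also fit the description above), and the
values are invented.

```python
def squared_error_loss(actual, predicted):
    """Loss: the error for a single record."""
    return (actual - predicted) ** 2

def mean_squared_error_cost(actuals, predictions):
    """Cost: the per-record losses aggregated (here, averaged) over a dataset."""
    losses = [squared_error_loss(a, p) for a, p in zip(actuals, predictions)]
    return sum(losses) / len(losses)

actuals = [3.0, 5.0, 2.0]        # invented values
predictions = [2.0, 5.0, 4.0]

single_loss = squared_error_loss(actuals[0], predictions[0])   # one record
total_cost = mean_squared_error_cost(actuals, predictions)     # whole dataset
print(single_loss, total_cost)
```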
There are two kinds of methods: direct methods and statistical testing
methods.
The silhouette method is the one most frequently used when determining
the optimal value of k.
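As an illustration of how the silhouette can compare candidate values of k,
the sketch below scores two hand-assigned labelings of four 1-D points. The
points and labelings are assumptions for the example (a real workflow would
obtain the labels from a clustering algorithm such as k-means), and the
function expects at least two clusters.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)   # usual convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster
        b = min(sum(abs(p - q) for q in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [1.0, 2.0, 10.0, 11.0]                      # invented toy data
score_k2 = silhouette_score(points, [0, 0, 1, 1])    # two clusters
score_k3 = silhouette_score(points, [0, 0, 1, 2])    # three clusters
print(score_k2, score_k3)
```

The labeling with the higher mean silhouette (here k = 2) is the better
candidate for the number of clusters.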
Data required for recommender systems stems from explicit user ratings
after watching a film or listening to a song, from implicit search engine
queries and purchase histories, or from other knowledge about the
users/items themselves.
Correlation is used for measuring and estimating the quantitative
relationship between two variables, that is, how strongly the two
variables are related. Examples include income and expenditure, and
demand and supply.
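One common way to put a number on this relationship is the Pearson
correlation coefficient, which ranges from -1 to 1. The sketch below assumes
that measure and uses invented income and expenditure figures purely for
illustration.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical figures, not real data
income = [30.0, 40.0, 50.0, 60.0, 70.0]
expenditure = [25.0, 32.0, 41.0, 47.0, 56.0]

r = pearson_correlation(income, expenditure)
print(round(r, 4))  # close to 1: a strong positive relationship
```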
Parametric models have a limited number of parameters, and to predict
new data you only need to know the parameters of the model.
The sigmoid function is used for binary classification; the probabilities
of the two classes sum to 1. The softmax function is used for multi-class
classification; the probabilities across all classes sum to 1.
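Both functions are short enough to write out directly. In this sketch the
input scores are arbitrary illustrative values, and the softmax subtracts
the maximum score first, a common numerical-stability step that does not
change the result.

```python
import math

def sigmoid(z):
    """Map a single score to a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Map a list of scores to probabilities over all classes."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]   # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]

# Binary case: the two class probabilities sum to 1
p_positive = sigmoid(0.8)
p_negative = 1.0 - p_positive

# Multi-class case: the probabilities across all classes sum to 1
class_probs = softmax([2.0, 1.0, 0.1])
print(p_positive + p_negative, sum(class_probs))
```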