ML Reference-Material-I
Learning
What is Machine Learning?
Machine learning can be broadly defined as computational methods using
experience to improve performance or to make accurate predictions.
• Experience refers to the past information available to the learner, typically electronic data
collected and made available for analysis.
• Data could be in the form of digitized human-labeled training sets, or other types of
information obtained via interaction with the environment.
• The quality and size of the data collected are crucial to the success of the predictions made by
the learner.
What kind of problems can be tackled using machine learning?
• Text or document classification. This includes problems such as assigning a topic to a text or a document, or
determining automatically if the content of a web page is inappropriate or too explicit; it also includes spam
detection.
• Natural language processing (NLP). This includes named-entity recognition, context-free parsing, dependency parsing,
and spell checking.
• Speech processing applications. This includes speech recognition, speech synthesis, speaker verification, speaker
identification, as well as sub-problems such as language modeling and acoustic modeling.
• Computer vision applications. This includes object recognition, object identification, face detection, optical
character recognition (OCR), content-based image retrieval, or pose estimation.
• Computational biology applications. This includes protein function prediction, identification of key sites, or the
analysis of gene and protein networks.
• Many other problems are also tackled using machine learning techniques, such as fraud detection for credit card,
telephone, or insurance companies, network intrusion, learning to play games such as chess, unassisted control of
vehicles such as robots or cars, and medical diagnosis.
• This list is by no means comprehensive.
Some standard learning tasks
Classification: this is the problem of assigning a category to each item.
Examples of classification:
• Document classification consists of assigning a category such as politics, business, sports, or weather to each
document.
• Image classification consists of assigning to each image a category such as car, train, or plane.
The number of categories in such tasks is often less than a few hundred, but it can be much larger in some
difficult tasks and even unbounded as in OCR, text classification, or speech recognition.
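To make the idea of assigning a category to each item concrete, here is a minimal sketch of a nearest-centroid classifier. The technique, the two features (counts of the words "goal" and "market" in a document), and the toy data are all assumptions chosen for illustration; they do not come from the material above.

```python
import math
from collections import defaultdict

def fit_centroids(points, labels):
    """Average the feature vectors of each category."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(points, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def classify(centroids, x):
    """Assign x to the category whose centroid is closest."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))

# Hypothetical features: (count of "goal", count of "market") per document.
train_points = [(5, 0), (4, 1), (0, 6), (1, 5)]
train_labels = ["sports", "sports", "business", "business"]

centroids = fit_centroids(train_points, train_labels)
print(classify(centroids, (3, 1)))  # category predicted for a new document
```

With only two categories this is a very small instance of the task; the same structure applies when the set of categories is larger.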
Regression: this is the problem of predicting a real value for each item.
Examples of regression:
• Prediction of stock values or of variations in economic variables.
In regression, the penalty for an incorrect prediction depends on the magnitude of the difference between the
true and predicted values.
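The sketch below illustrates both points: predicting a real value and penalizing errors by their magnitude. It assumes a one-variable least-squares line fit and the squared loss, which are common choices but are not named in the material above; the data are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def squared_loss(y_true, y_pred):
    """Penalty grows with the size of the error, unlike a right/wrong count."""
    return (y_true - y_pred) ** 2

# Hypothetical data: x could be a day index, y an observed value.
xs = [1, 2, 3, 4]
ys = [2.0, 4.1, 5.9, 8.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept  # real-valued prediction for an unseen x

# A small miss is penalized far less than a large one.
print(squared_loss(10.0, 9.0), squared_loss(10.0, 5.0))
```

In classification, by contrast, a prediction is usually just right or wrong, so there is no notion of a near miss.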
Ranking: this is the problem of learning to order items according to some criterion.
Examples of ranking:
• Web search: returning the web pages relevant to a search query.
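One way to learn an ordering is from pairwise preferences ("this page should rank above that one"). The following is a minimal sketch using perceptron-style updates on a linear scoring function. The method, the page names, and the two features (query terms matched, number of ads) are assumptions for illustration only.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(preferences, n_features, epochs=20):
    """Learn weights so that each 'better' item scores above its 'worse' item."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in preferences:
            if dot(w, better) <= dot(w, worse):  # pair is currently mis-ordered
                w = [wi + b - c for wi, b, c in zip(w, better, worse)]
    return w

# Hypothetical features per page: (query terms matched, number of ads on page).
pages = {"page_a": (3, 1), "page_b": (0, 0), "page_c": (2, 4)}

# Training signal: page_a should rank above page_c, and page_c above page_b.
preferences = [
    (pages["page_a"], pages["page_c"]),
    (pages["page_c"], pages["page_b"]),
]

w = train_pairwise(preferences, n_features=2)
ranking = sorted(pages, key=lambda name: dot(w, pages[name]), reverse=True)
print(ranking)
```

The learned weights define a score for any page, so pages not seen during training can be ordered as well.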
Clustering: this is the problem of partitioning a set of items into homogeneous subsets. Clustering is often used to analyze
very large data sets.
For example, in the context of social network analysis, clustering algorithms attempt to identify natural communities within
large groups of people.
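As an illustration of partitioning items into homogeneous subsets, here is a minimal sketch of k-means, one common clustering algorithm (the material above does not name a specific one). The points are hypothetical; each could stand for a person described by two numeric features. Initializing the centers with the first k points is a simplification.

```python
import math

def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(points[:k])  # simplistic initialization: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]  # keep the old center if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centers

# Hypothetical data with two visibly separate groups.
points = [(1, 1), (8, 9), (1, 2), (9, 8), (2, 1), (9, 9)]
clusters, centers = k_means(points, k=2)
print(clusters)
```

No labels are supplied anywhere: the groups emerge from the distances between the points alone.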
Dimensionality reduction or manifold learning: this problem consists of transforming an initial representation of items into a
lower-dimensional representation while preserving some properties of the initial representation.
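A minimal sketch of this idea is to project 2-dimensional points onto their first principal direction, giving a 1-dimensional representation that keeps as much of the variance as possible. Principal component analysis and the power-iteration method used to find the direction are assumptions here, as is the toy data.

```python
def first_principal_direction(points, iterations=100):
    """Return the data mean and the unit direction of largest variance."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # d x d covariance matrix of the centered data.
    cov = [[sum(row[a] * row[b] for row in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # starting vector for power iteration
    for _ in range(iterations):
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return mean, v

def project(points, mean, v):
    """Map each point to a single coordinate along direction v."""
    return [sum((p[j] - mean[j]) * v[j] for j in range(len(v))) for p in points]

# Hypothetical points lying close to a straight line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
mean, direction = first_principal_direction(points)
reduced = project(points, mean, direction)
print(reduced)  # one number per point instead of two
```

The property preserved in this case is the spread of the points along their main axis: points that were far apart along that axis remain far apart after the reduction.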
Learning scenarios differ in the types of training data available to the learner, the order and method
by which training data is received, and the test data used to evaluate the learning
algorithm.
Supervised learning: The learner receives a set of labeled examples as training data and
makes predictions for all unseen points.
This is the most common scenario, associated with classification, regression, and ranking
problems. Spam detection is a typical example.
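The supervised workflow can be sketched with a toy spam detector: fit on labeled examples, then predict on messages that were not seen during training and measure accuracy. The word-count scoring rule and all messages below are made up for illustration and are not a recommended spam filter.

```python
from collections import Counter

def train(labeled_messages):
    """Count how often each word appears in spam and in non-spam ("ham")."""
    spam_words, ham_words = Counter(), Counter()
    for text, label in labeled_messages:
        target = spam_words if label == "spam" else ham_words
        target.update(text.lower().split())
    return spam_words, ham_words

def predict(model, text):
    """Label a message spam if its words were seen more often in spam."""
    spam_words, ham_words = model
    score = sum(spam_words[w] - ham_words[w] for w in text.lower().split())
    return "spam" if score > 0 else "ham"

# Labeled training data: (message, label) pairs.
training_data = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

# Held-out test data, unseen during training, used only for evaluation.
test_data = [
    ("claim your free prize", "spam"),
    ("team meeting on friday", "ham"),
]

model = train(training_data)
correct = sum(predict(model, text) == label for text, label in test_data)
accuracy = correct / len(test_data)
print(accuracy)
```

Keeping the test messages separate from the training messages is what makes the accuracy an estimate of performance on unseen points.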
Unsupervised learning: The learner exclusively receives unlabeled training data, and makes
predictions for all unseen points.
Since in general no labeled example is available in that setting, it can be difficult to
quantitatively evaluate the performance of a learner.
Clustering and dimensionality reduction are examples of unsupervised learning problems.
Supervised Learning