
Machine Learning

The document provides an overview of regression analysis in machine learning including definitions of key terms and descriptions of different types of regression models. It explains what regression analysis is used for and gives examples of linear, logistic, and other types of regressions.

Uploaded by R Muhammad
Copyright © All Rights Reserved

https://www.youtube.com/@Codanics/playlists

Turi Create documentation:


https://apple.github.io/turicreate/docs/api/
Install Turi Create using this guide:
https://medium.com/@malondireads/installing-turicreate-on-windows-10-534e147a4792

Week 1:

Getting started with Python, Jupyter Notebook, & Turi Create

The learning approach in this specialization is to start from use cases and then dig into
algorithms and methods, what we call a case-studies approach. The first course is focused on
understanding how ML can be used in various case studies, and the follow-on courses will dig
into the details of algorithms and methods for each of the main ML areas.

Python
Python is a simple scripting language that makes it easy to interact with data. It is widely
used in industry and is becoming the de facto language for data science.

Jupyter notebook
The Jupyter Notebook is a simple interactive environment for programming with Python, which
makes it really easy to share your results. Think about it as a combination of a Python terminal
and a wiki page.
How to install Turi Create on Windows:
https://www.geeksforgeeks.org/guide-to-install-turicreate-in-python3-x/

SFrame
SFrame is a scalable, tabular, column-mutable dataframe object. The data
in SFrame is stored column-wise, and is stored on persistent storage (e.g.
disk) to avoid being constrained by memory size.
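The idea of column-wise storage can be illustrated with a minimal pure-Python sketch. This is not how SFrame is implemented (SFrame also keeps its columns on disk), and the product names and ratings are made-up sample data. Storing a table as a dict of column lists means reading one column touches only that column, and adding a column (the "column-mutable" part) just adds one more list.

```python
# Row-wise layout: each record holds every field of one row (sample data is made up)
rows = [
    {"name": "mug", "rating": 4},
    {"name": "lamp", "rating": 5},
    {"name": "mug", "rating": 2},
]


def to_columns(records):
    """Convert a list of row dicts into a column-wise dict of lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# Reading one column only touches that column's list
ratings = columns["rating"]

# Adding a column appends one new list; existing columns are left untouched
columns["high_rating"] = [r >= 4 for r in ratings]

print(columns)
```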
Why SFrame & Turi Create
There are many excellent machine learning libraries in Python. One of the most popular ones
today is scikit-learn. Similarly, there are many tools for data manipulation in Python; a popular
example is Pandas. However, most of these tools do not scale to large
datasets, including some we will tackle in this specialization. In addition, in this
specialization, we will cover a wide range of ML models, feature engineering transformations,
and evaluation metrics. With most existing packages, you would have to install a combination of
packages to get the tools needed for the use cases in this course. This is possible,
but requires advanced knowledge of Python, which we feel would slow down most people's
learning of the core concepts.
Turi Create is a highly scalable machine learning library for Python, which also includes the
SFrame, a highly scalable library for data manipulation. A huge advantage
of SFrame over Pandas is that with SFrame, you are not limited to datasets that fit in
memory, which allows you to deal with large datasets, even on a laptop. (The SFrame API is
very similar to Pandas' API. Here is a doc showing the relationship between the two of them.)

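Install Turi Create:
https://github.com/apple/turicreate/issues/3198

Use these commands to create and activate a virtual environment (run once):

# create a virtual environment named venv in your home directory and activate it
cd $HOME
virtualenv venv
cd venv/
source bin/activate
# install the packages into the environment
pip install turicreate jupyter

Use these commands to start Jupyter Notebook (each time):

# activate the environment from your home directory, then launch Jupyter
cd $HOME
source venv/bin/activate
jupyter notebook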

Regression Analysis in Machine Learning
Use this link:
https://www.javatpoint.com/regression-analysis-in-machine-learning

Regression analysis is a statistical method for modeling the relationship between a dependent
(target) variable and one or more independent (predictor) variables. More specifically,
regression analysis helps us understand how the value of the dependent variable changes when
one independent variable changes while the other independent variables are held fixed. It
predicts continuous, real values such as temperature, age, salary, and price.

Regression is a supervised learning technique that captures the relationship (correlation)
between variables and lets us predict a continuous output variable from one or more predictor
variables. It is mainly used for prediction, forecasting, time series modeling, and exploring
cause-and-effect relationships between variables (although regression on its own shows
association, not causation).

In regression, we fit a line or curve to the datapoints on a target-vs-predictor graph, and the
machine learning model uses this fitted line or curve to make predictions. In simple words,
regression finds the line or curve that lies as close as possible to all the datapoints, in the
sense that the vertical distances between the datapoints and the regression line are as small as
possible overall. How close the datapoints lie to the line indicates whether the model has
captured a strong relationship.
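The "vertical distance" criterion can be made concrete with a small sketch. Ordinary least squares, the usual way to fit a linear regression, measures the fit by the sum of squared vertical distances (residuals) and picks the line with the smallest sum. The points and the two candidate lines below are made-up examples.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line y = a*x + b."""
    total = 0
    for x, y in zip(xs, ys):
        residual = y - (a * x + b)  # vertical distance from the point to the line
        total += residual ** 2
    return total


# Made-up sample points
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

line_1 = sum_squared_residuals(xs, ys, a=2, b=0)  # candidate y = 2x
line_2 = sum_squared_residuals(xs, ys, a=1, b=1)  # candidate y = x + 1

print(line_1, line_2)  # 1 11
```

The line y = 2x lies much closer to the points (a sum of 1 versus 11), so it is the better fit of the two; a least-squares solver effectively searches over all values of a and b for the minimum.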

Terminologies Related to Regression Analysis:
o Dependent Variable: The main factor in regression analysis that we want to predict or
understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors that affect the dependent variable, or that are used to
predict its values, are called independent variables. They are also called predictors.
o Outliers: An outlier is an observation with a very low or very high value compared with
the other observed values. Outliers can distort the results, so they should be identified
and handled carefully.
o Multicollinearity: If the independent variables are highly correlated with each other,
the condition is called multicollinearity. It should be avoided in the dataset, because it
makes it hard to rank which variable affects the target the most.
o Underfitting and Overfitting: If our algorithm works well on the training dataset but
not on the test dataset, the problem is called overfitting. If our algorithm
does not perform well even on the training dataset, the problem is
called underfitting.
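One common way to spot multicollinearity is to compute the Pearson correlation between each pair of independent variables. The sketch below is a plain-Python version; the predictor values are made up, and the 0.8 cut-off is only a rule-of-thumb assumption, not a fixed standard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # square roots of the sums of squared deviations (the 1/n factors cancel in the ratio)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up predictor columns
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]  # exactly 2 * x1, so perfectly correlated with x1
x3 = [2, 5, 1, 4, 3]   # only weakly related to x1

THRESHOLD = 0.8  # assumed rule-of-thumb cut-off

for name, other in [("x2", x2), ("x3", x3)]:
    r = pearson(x1, other)
    flag = "possible multicollinearity" if abs(r) > THRESHOLD else "ok"
    print(f"corr(x1, {name}) = {r:.2f} -> {flag}")
```

If two predictors are flagged, a common remedy is to drop or combine one of them before fitting the regression.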

There are various types of regression used in data science and machine
learning. Each type has its own importance in different scenarios, but at their core, all
regression methods analyze the effect of the independent variables on the dependent
variable. Here we discuss some important types of regression:

o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression

Linear Regression:
o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest regression algorithms, and it models the relationship between
continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis)
and the dependent variable (Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear regression is called simple linear
regression. And if there is more than one input variable, then such linear regression is
called multiple linear regression.
o For example, linear regression can model the relationship between an employee's years of
experience and their salary, and then predict the salary from the years of experience.

o Below is the mathematical equation for Linear regression:

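Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a = slope of the line (the change in Y for a one-unit increase in X),
b = intercept (the value of Y when X = 0); a and b are the linear coefficients.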
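For simple linear regression, the least-squares values of a and b have a closed form: a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and b = ȳ − a·x̄. The sketch below applies it to made-up salary data (years of experience versus salary in thousands), chosen to lie exactly on a line so the result is easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) for Y = aX + b using the least-squares closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: co-variation of X and Y divided by the variation of X (the 1/n factors cancel)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # intercept: the fitted line passes through the point (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


# Made-up data: years of experience -> salary in thousands
experience = [1, 2, 3, 4]
salary = [30, 35, 40, 45]

a, b = fit_simple_linear_regression(experience, salary)
print(a, b)       # 5.0 25.0
print(a * 5 + b)  # predicted salary for 5 years of experience: 50.0
```

With more than one input variable (multiple linear regression) the same least-squares idea applies, but the coefficients are usually found with matrix methods or a library solver instead of this two-variable formula.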

Some popular applications of linear regression are:

o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Estimating arrival times (ETAs) in traffic

turicreate.SFrame.groupby
SFrame.groupby(key_column_names, operations, *args)

import turicreate
# count how many rows share each product name
products.groupby('name', operations={'count': turicreate.aggregate.COUNT()})
https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.groupby.html#turicreate-sframe-groupby
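The groupby call above counts how many rows share each product name. Conceptually it computes the same thing as this plain-Python sketch with collections.Counter; the product names are made up, and this is not how Turi Create executes the query.

```python
from collections import Counter

# Made-up 'name' column: one entry per review row
names = ["mug", "lamp", "mug", "desk", "mug", "lamp"]

# Same result as groupby('name', operations={'count': COUNT()}): rows per name
counts = Counter(names)

for name, count in counts.items():
    print(name, count)
```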
# add a word_count column: a dictionary of word counts for each review
products['word_count'] = turicreate.text_analytics.count_words(products['review'])
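The word_count column holds, for each review, a dictionary mapping each word to the number of times it appears (a "bag of words"). A rough pure-Python approximation is sketched below; it assumes lowercasing and splitting on anything that is not a letter, digit, or apostrophe, which may not match Turi Create's exact tokenization.

```python
import re
from collections import Counter


def count_words(text):
    """Approximate bag of words: lowercase, split into word tokens, count them."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return dict(Counter(tokens))


review = "Great mug, great price. Great!"
print(count_words(review))  # {'great': 3, 'mug': 1, 'price': 1}
```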

turicreate.SFrame.sort
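SFrame.sort(key_column_names, ascending=True)

# sort the per-name counts from largest to smallest
products.groupby('name', operations={'count': turicreate.aggregate.COUNT()}).sort('count', ascending=False)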
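Sorting by 'count' with ascending=False puts the most frequent values first. In plain Python the same ordering can be sketched with sorted(); the counts are made-up values.

```python
# Made-up per-product counts, e.g. the result of a groupby COUNT
counts = {"mug": 3, "lamp": 2, "desk": 1, "chair": 5}

# Sort (name, count) pairs by count, largest first (like ascending=False)
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for name, count in ranked:
    print(name, count)
```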

turicreate.logistic_classifier.create
turicreate.logistic_classifier.create(dataset, target, features=None, l2_penalty=0.01,
    l1_penalty=0.0, solver='auto', feature_rescaling=True, convergence_threshold=0.01,
    step_size=1.0, lbfgs_memory_level=11, max_iterations=10, class_weights=None,
    validation_set='auto', verbose=True, seed=None)
# split the data: roughly 80% for training and 20% for testing (seed=0 makes the split reproducible)
train_data, test_data = products.random_split(.8, seed=0)

# 'sentiment' is the target column and 'word_count' is the feature column
sentiment_model = turicreate.logistic_classifier.create(train_data, target='sentiment',
                                                        features=['word_count'],
                                                        validation_set=test_data)
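Turi Create documents random_split as approximate: about 80% of the rows land in the first set, not exactly 80%, and fixing the seed makes the split reproducible. The sketch below mimics that idea with per-row sampling in plain Python; the random_split function here is a hypothetical stand-in, not Turi Create's implementation.

```python
import random


def random_split(rows, fraction, seed=None):
    """Hypothetical helper: put each row in train with probability `fraction`."""
    rng = random.Random(seed)  # seeded generator -> reproducible split
    train, test = [], []
    for row in rows:
        if rng.random() < fraction:
            train.append(row)
        else:
            test.append(row)
    return train, test


rows = list(range(1000))  # stand-in for 1000 data rows
train_data, test_data = random_split(rows, 0.8, seed=0)
print(len(train_data), len(test_data))  # the two sizes add up to 1000
```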

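turicreate.logistic_classifier.LogisticClassifier.predict
LogisticClassifier.predict(dataset, output_type='class', missing_value_action='auto')
(The linear-regression counterpart, turicreate.linear_regression.LinearRegression.predict(dataset, missing_value_action='auto'), has no output_type parameter.)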

# output_type='probability' returns a probability for each row instead of a class label
products['predicted_sentiment'] = sentiment_model.predict(products, output_type='probability')
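With output_type='probability', a logistic classifier returns a probability instead of a hard class label. For a bag-of-words model, that probability is the sigmoid of a weighted sum of the word counts plus a bias. The sketch below shows the calculation with made-up weights; a trained sentiment_model learns its own coefficients from train_data.

```python
import math


def sigmoid(score):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_probability(word_count, weights, bias=0.0):
    """Probability of the positive class for one review's word-count dict."""
    score = bias + sum(weights.get(word, 0.0) * count for word, count in word_count.items())
    return sigmoid(score)


# Made-up coefficients: positive words push the score up, negative words push it down
weights = {"great": 1.5, "awful": -2.0}

print(predict_probability({"great": 2, "mug": 1}, weights))  # about 0.95
print(predict_probability({"awful": 1, "mug": 1}, weights))  # about 0.12
```

Words the model has no weight for (like "mug" here) contribute nothing to the score; a probability above 0.5 corresponds to predicting the positive class.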
