100% found this document useful (1 vote)

54 views

Variosalgoritmos - Jupyter Notebook

The document compares several regression algorithms on a sales dataset. It finds that a polynomial regression model with degree 3 performs best with a root mean squared error of 0.5803. Decision tree regression also performs well with a RMSE of 1.0120. The document concludes that for this small dataset, polynomial regression and decision tree regression are better choices than other algorithms like KNN or random forests due to data limitations.

Uploaded by

PAULO CESAR CALDERON BERMUDO

Available Formats

Download as PDF, TXT or read online on Scribd

100% found this document useful (1 vote)

54 views

Variosalgoritmos - Jupyter Notebook

Uploaded by

PAULO CESAR CALDERON BERMUDO

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

Regresión con varios algoritmos

REGRESION LINEAL
In [28]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [29]: df = pd.read_csv("Advertising2.csv")

In [89]: def ejecutar_modelo(modelo, X_train, y_train, X_test, y_test):

#crea el modelo, lo entrena y evalua
modelo.fit(X_train, y_train)
preds = modelo.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test,preds))
print('RMSE: %.4f'%(rmse))
#graficar los resultados
#rango_senal = np.arange(0,101)
#salida = modelo.predict(rango_senal.reshape(-1,1))
#plt.figure(figsize=(12,6),dpi=200)
#sns.scatterplot(x='Publicidad',y='Ventas',data=df,color='black')

In [100]: df.tail()

Out[100]: TV radio newspaper sales

195 38.2 3.7 13.8 7.6

196 94.2 4.9 8.1 9.7

197 177.0 9.3 6.4 12.8

198 283.6 42.0 66.2 25.5

199 232.1 8.6 8.7 13.4

In [32]: fig,axes = plt.subplots(nrows=1,ncols=3,figsize=(16,6))

axes[0].plot(df['TV'],df['sales'],'o')
axes[0].set_ylabel("Ventas")
axes[0].set_title("Gasto en TV")

axes[1].plot(df['radio'],df['sales'],'o')
axes[1].set_title("Gasto en Radio")
axes[1].set_ylabel("Ventas")

axes[2].plot(df['newspaper'],df['sales'],'o')
axes[2].set_title("Gasto en Periódicos");
axes[2].set_ylabel("Ventas")
plt.tight_layout();
In [33]: # Relaciones entre característica
sns.pairplot(df,diag_kind='kde')

Out[33]: <seaborn.axisgrid.PairGrid at 0x2c9919aa2f0>

In [34]: X = df.drop('sales',axis=1)
y = df['sales']

In [35]: from sklearn.model_selection import train_test_split

In [36]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

In [37]: from sklearn.linear_model import LinearRegression

In [38]: modelo = LinearRegression()

In [39]: modelo.fit(X_train,y_train)

Out[39]: LinearRegression()
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

In [40]: modelo.intercept_

Out[40]: 3.1515267680706494

In [41]: modelo.coef_

Out[41]: array([ 0.04469599, 0.1875657 , -0.00032275])

In [42]: predicciones_test = modelo.predict(X_test)

In [43]: # Muestra las primeras 5 predicciones de prueba
predicciones_test[0:5]

Out[43]: array([15.74131332, 19.61062568, 11.44888935, 17.00819787, 9.17285676])

In [44]: # Muestra los 5 primeros valores verdaderos de prueba

y_test[:5]

Out[44]: 37 14.7
109 19.8
31 11.9
89 16.7
66 9.5
Name: sales, dtype: float64

In [45]: from sklearn.metrics import mean_absolute_error,mean_squared_error

In [46]: MAE = mean_absolute_error(y_test,predicciones_test)

MSE = mean_squared_error(y_test,predicciones_test)
RMSE = np.sqrt(MSE)

In [47]: MAE

Out[47]: 1.2137457736144808

In [48]: MSE

Out[48]: 2.298716697886378

In [49]: RMSE

Out[49]: 1.5161519375993877

In [50]: df['sales'].mean()

Out[50]: 14.0225
REGRESION POLINOMIAL
In [90]: ejecutar_modelo(canal, X_train, y_train, X_test, y_test )

RMSE: 1.2580

In [51]: modelPol=LinearRegression()

In [62]: from sklearn.pipeline import make_pipeline

from sklearn.preprocessing import PolynomialFeatures

In [69]: canal=make_pipeline(PolynomialFeatures(2), LinearRegression())

In [91]: ejecutar_modelo(canal, X_train, y_train, X_test, y_test )

RMSE: 1.2580

In [92]: canal=make_pipeline(PolynomialFeatures(3), LinearRegression())

In [93]: ejecutar_modelo(canal, X_train, y_train, X_test, y_test )

RMSE: 0.5803

In [94]: canal=make_pipeline(PolynomialFeatures(4), LinearRegression())

ejecutar_modelo(canal, X_train, y_train, X_test, y_test )

RMSE: 1.2580

KNN
In [95]: from sklearn.neighbors import KNeighborsRegressor
In [96]: valores_k = [1,5,10,15]
for n in valores_k:
modelo_knn = KNeighborsRegressor(n_neighbors=n)
ejecutar_modelo(modelo_knn,X_train, y_train, X_test, y_test)

RMSE: 1.6936
RMSE: 1.5412
RMSE: 1.9077
RMSE: 2.2057

ARBOLES DE DECISIÓN
In [97]: from sklearn.tree import DecisionTreeRegressor
modelo_arbolDeci=DecisionTreeRegressor()

In [98]: ejecutar_modelo(modelo_arbolDeci,X_train, y_train, X_test, y_test )

RMSE: 1.0120

In [99]: modelo_arbolDeci.get_n_leaves()

Out[99]: 132

Conclusiones
Debido a los parametros de evaluación se puede concluir que el mejor algoritmo de predicción para este caso es la regre
sión polinomial en grado 3, para el algoritmo de bosques aleatorios la escazes de datos hace insignificante su uso, ya
que con el arbol de decisión se obtienen mejores resultados para este dataset reducido

In [ ]:

George Smith's Identification of Karkemish
No ratings yet
George Smith's Identification of Karkemish
31 pages
Wedding Ceremony Program
100% (1)
Wedding Ceremony Program
8 pages
Service Manual Contents Notice: For Use in Service Manual Form SB2199E SB2200E00 J U N - 1 9 9 9
100% (3)
Service Manual Contents Notice: For Use in Service Manual Form SB2199E SB2200E00 J U N - 1 9 9 9
592 pages
Heart: Our "Goal" Predict The Presence of Heart Disease in The Patient
100% (1)
Heart: Our "Goal" Predict The Presence of Heart Disease in The Patient
73 pages
Oil Export Indonesia
100% (1)
Oil Export Indonesia
12 pages
Logistics Regression
100% (1)
Logistics Regression
5 pages
C2M2 - Assignment: 1 Risk Models Using Tree-Based Models
100% (1)
C2M2 - Assignment: 1 Risk Models Using Tree-Based Models
38 pages
### Data Exploration: 'Yes' 'No' 'Agency' 'Direct' 'Employee Referral' 'Yes' 'No'
100% (1)
### Data Exploration: 'Yes' 'No' 'Agency' 'Direct' 'Employee Referral' 'Yes' 'No'
6 pages
Assignment No - 6-1
100% (1)
Assignment No - 6-1
3 pages
0.1 Guilherme Marthe - Boston House Pricing Challenge
100% (1)
0.1 Guilherme Marthe - Boston House Pricing Challenge
15 pages
Chapter-3-Linear Models For Regression
100% (1)
Chapter-3-Linear Models For Regression
61 pages
Regression Anallysis Hands0n 1
100% (1)
Regression Anallysis Hands0n 1
3 pages
ML0101EN Clas K Nearest Neighbors CustCat Py v1
100% (1)
ML0101EN Clas K Nearest Neighbors CustCat Py v1
11 pages
Importing Libraries: Import As Import As Import As From Import As From Import From Import Import
100% (1)
Importing Libraries: Import As Import As Import As From Import As From Import From Import Import
11 pages
Sales Forecasting
100% (1)
Sales Forecasting
10 pages
Analysing Ad Budget
100% (1)
Analysing Ad Budget
4 pages
0.1 Stock Data
100% (1)
0.1 Stock Data
4 pages
Loading The Dataset: 'Churn - Modelling - CSV'
No ratings yet
Loading The Dataset: 'Churn - Modelling - CSV'
6 pages
Assignment 11
100% (1)
Assignment 11
7 pages
Cardio Screen RF
100% (1)
Cardio Screen RF
27 pages
Charmi Shah 20bcp299 Lab2
100% (1)
Charmi Shah 20bcp299 Lab2
7 pages
Regressao Linear Simples - Ipynb - Colaboratory
100% (1)
Regressao Linear Simples - Ipynb - Colaboratory
2 pages
A) What Is Motivation Behind Ensemble Methods? Give Your Answer in Probabilistic Terms
100% (1)
A) What Is Motivation Behind Ensemble Methods? Give Your Answer in Probabilistic Terms
6 pages
Churn Modeling
100% (1)
Churn Modeling
11 pages
Bagging, Boosting
100% (1)
Bagging, Boosting
32 pages
SQL Cheat Sheet
100% (1)
SQL Cheat Sheet
44 pages
CS550 Regression Aug12
100% (1)
CS550 Regression Aug12
63 pages
Bootstrap Powerpoint
100% (1)
Bootstrap Powerpoint
10 pages
Lab7.ipynb - Colaboratory
100% (1)
Lab7.ipynb - Colaboratory
5 pages
IRIS BPNN - Ipynb - Colaboratory
100% (1)
IRIS BPNN - Ipynb - Colaboratory
4 pages
Classification and Prediction
100% (1)
Classification and Prediction
31 pages
Assignment10 4
100% (1)
Assignment10 4
3 pages
SVM (Support Vector Machine) For Classification - by Aditya Kumar - Towards Data Science
100% (1)
SVM (Support Vector Machine) For Classification - by Aditya Kumar - Towards Data Science
28 pages
K-NN (Nearest Neighbor)
100% (1)
K-NN (Nearest Neighbor)
17 pages
Lab 3. Linear Regression 230223
100% (1)
Lab 3. Linear Regression 230223
7 pages
Book
100% (1)
Book
480 pages
HW1
100% (1)
HW1
8 pages
ML Lab Manual
100% (1)
ML Lab Manual
37 pages
Gradient Descent - Linear Regression
100% (1)
Gradient Descent - Linear Regression
47 pages
Outliers, Hypothesis and Natural Language Processing
100% (1)
Outliers, Hypothesis and Natural Language Processing
7 pages
Merging - Scaled - 1D - & - Trying - Different - CLassification - ML - Models - .Ipynb - Colaboratory
100% (1)
Merging - Scaled - 1D - & - Trying - Different - CLassification - ML - Models - .Ipynb - Colaboratory
16 pages
Vinee
100% (1)
Vinee
28 pages
Introduction to Boosting: Slides Adapted from Che Wanxiang (车万翔) at HIT, and Robin Dhamankar of Many thanks!
100% (1)
Introduction to Boosting: Slides Adapted from Che Wanxiang (车万翔) at HIT, and Robin Dhamankar of Many thanks!
41 pages
Xgboost in Online Transaction Fraud Detection
100% (1)
Xgboost in Online Transaction Fraud Detection
8 pages
Linear - Regression
100% (1)
Linear - Regression
39 pages
Cloud Motion Tracking (1) (Read-Only)
100% (1)
Cloud Motion Tracking (1) (Read-Only)
10 pages
Actividad Semana 4 - Jupyter Notebook
100% (1)
Actividad Semana 4 - Jupyter Notebook
7 pages
Csi 5155 ML Project Report
100% (1)
Csi 5155 ML Project Report
24 pages
Project 1 - Radio Link Failure Prediction
100% (1)
Project 1 - Radio Link Failure Prediction
8 pages
CS464 Ch9 LinearRegression
100% (1)
CS464 Ch9 LinearRegression
43 pages
Unit - 4 Machine Learning
100% (1)
Unit - 4 Machine Learning
84 pages
01-Introduction Machine Learning
100% (1)
01-Introduction Machine Learning
48 pages
Glass Classification
100% (2)
Glass Classification
3 pages
Multicollinearity Exercise
100% (1)
Multicollinearity Exercise
6 pages
Outlines: Statements of Problems Objectives Bagging Random Forest Boosting Adaboost
100% (1)
Outlines: Statements of Problems Objectives Bagging Random Forest Boosting Adaboost
14 pages
Neural Network Based Rainfall Prediction System
100% (1)
Neural Network Based Rainfall Prediction System
6 pages
9 Regression
100% (1)
9 Regression
14 pages
TP Regression
100% (1)
TP Regression
1 page
Code ExerciseModelSelection
100% (1)
Code ExerciseModelSelection
19 pages
XG Boost PDF
100% (1)
XG Boost PDF
3 pages
Presentation GPT 4
100% (1)
Presentation GPT 4
25 pages
Estiven - Hurtado.Santos - Regresión Con Varios Algoritmos
No ratings yet
Estiven - Hurtado.Santos - Regresión Con Varios Algoritmos
16 pages
Import As Import As Import As: "Default - CSV"
No ratings yet
Import As Import As Import As: "Default - CSV"
9 pages
Fx1s Fx1n Fx2n (C) Fx3u Beginner's Manual
No ratings yet
Fx1s Fx1n Fx2n (C) Fx3u Beginner's Manual
0 pages
Word 2016 Preparation Test MOS
100% (1)
Word 2016 Preparation Test MOS
55 pages
American Literature
No ratings yet
American Literature
3 pages
Junior Engineer MP Poorva
No ratings yet
Junior Engineer MP Poorva
25 pages
BSEE's Proposed Production Systems Rule
No ratings yet
BSEE's Proposed Production Systems Rule
149 pages
Stc-Steering Control System
No ratings yet
Stc-Steering Control System
494 pages
Theatrical Convention and Audience Response in Early Modern Drama (J Lopez)
100% (3)
Theatrical Convention and Audience Response in Early Modern Drama (J Lopez)
249 pages
Paper-2 Set-B Key
No ratings yet
Paper-2 Set-B Key
14 pages
H&MT Lab Manual.
No ratings yet
H&MT Lab Manual.
47 pages
Action Verbs - Definition, List & Examples
No ratings yet
Action Verbs - Definition, List & Examples
11 pages
Session 18
No ratings yet
Session 18
16 pages
pg194 Axi Bridge Pcie Gen3 en Us 3.0
No ratings yet
pg194 Axi Bridge Pcie Gen3 en Us 3.0
203 pages
BAMA 1101, CH3, Mark Up Price and Value Added Tax
No ratings yet
BAMA 1101, CH3, Mark Up Price and Value Added Tax
23 pages
Viral Hepatitis
100% (4)
Viral Hepatitis
40 pages
Chapter 26 Final
No ratings yet
Chapter 26 Final
3 pages
REN - General PLL Loop Filter Design Fine Tuning - APN - 20230228
No ratings yet
REN - General PLL Loop Filter Design Fine Tuning - APN - 20230228
18 pages
MITSDE Admission Form
No ratings yet
MITSDE Admission Form
3 pages
Basic B.SC Nursing - 060620
No ratings yet
Basic B.SC Nursing - 060620
6 pages
Nitin Jamboree Front
No ratings yet
Nitin Jamboree Front
10 pages
" Study On The Topic Dna Fingerprinting": A Biology Project
100% (1)
" Study On The Topic Dna Fingerprinting": A Biology Project
14 pages
Assignment 1 Data Mining
No ratings yet
Assignment 1 Data Mining
1 page
How-To Reverse Shipment & Delivery
No ratings yet
How-To Reverse Shipment & Delivery
10 pages
Dreams Listening 1 (Reading Pro)
No ratings yet
Dreams Listening 1 (Reading Pro)
11 pages
Focus4 2E Vocabulary Quiz Unit5 GroupA 1kol
No ratings yet
Focus4 2E Vocabulary Quiz Unit5 GroupA 1kol
2 pages
Galileo Basic Commands
No ratings yet
Galileo Basic Commands
9 pages
Anth 1103-001, 002 Course Outline (Summer 2024)
No ratings yet
Anth 1103-001, 002 Course Outline (Summer 2024)
7 pages
M1 Lesson 1 - Introduction To Biopharmaceutics and Pharmacokinetics
No ratings yet
M1 Lesson 1 - Introduction To Biopharmaceutics and Pharmacokinetics
115 pages