Estimating regression fits — seaborn 0.13.2 documentation

The document provides an overview of estimating regression fits using the seaborn library, focusing on visualizing relationships between quantitative variables through linear regression models. It explains the use of functions like regplot() and lmplot() for drawing regression plots, and discusses fitting different types of models, including polynomial and logistic regression. Additionally, it covers how to condition on other variables and integrate regression into more complex visualizations, emphasizing the importance of exploratory data analysis through visualization.

19/04/2024 Estimating regression fits — seaborn 0.13.2 documentation

Estimating regression fits

Many datasets contain multiple quantitative variables, and the goal of an analysis is often to relate those variables to each other. We previously
discussed functions that can accomplish this by showing the joint distribution of two variables. It can be very helpful, though, to use statistical models to
estimate a simple relationship between two noisy sets of observations. The functions discussed in this chapter will do so through the common
framework of linear regression.

In the spirit of Tukey, the regression plots in seaborn are primarily intended to add a visual guide that helps to emphasize patterns in a dataset during
exploratory data analyses. That is to say, seaborn is not itself a package for statistical analysis. To obtain quantitative measures related to the fit of
regression models, you should use statsmodels. The goal of seaborn, however, is to make exploring a dataset through visualization quick and easy, as
doing so is just as important as (if not more important than) exploring a dataset through tables of statistics.
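As a rough illustration of the division of labor described above, the sketch below fits the same kind of y ~ x model with statsmodels and reads off numeric summaries. The data are synthetic stand-ins (the column names total_bill and tip are borrowed from the examples in this tutorial, but the values are made up), and statsmodels must be installed separately.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (hypothetical values, not the real "tips" dataset).
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=200)
tip = 1.0 + 0.1 * total_bill + rng.normal(0, 0.5, size=200)
df = pd.DataFrame({"total_bill": total_bill, "tip": tip})

# Fit tip ~ total_bill by ordinary least squares and keep the numbers.
result = smf.ols("tip ~ total_bill", data=df).fit()

slope = result.params["total_bill"]
intercept = result.params["Intercept"]
r_squared = result.rsquared
conf_int = result.conf_int(alpha=0.05)  # columns 0 and 1 hold lower and upper bounds

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

The plot answers "is there a pattern?"; the fitted result object answers "how strong is it, and how uncertain?".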

Functions for drawing linear regression models

The two functions that can be used to visualize a linear fit are regplot() and lmplot().

In the simplest invocation, both functions draw a scatterplot of two variables, x and y, and then fit the regression model y ~ x and plot the resulting
regression line and a 95% confidence interval for that regression:

import numpy as np
import seaborn as sns
tips = sns.load_dataset("tips")
sns.regplot(x="total_bill", y="tip", data=tips);

sns.lmplot(x="total_bill", y="tip", data=tips);
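To make the "regression line and 95% confidence interval" concrete, here is a minimal NumPy sketch of one way such a band can be computed: fit y ~ x by least squares, then refit on bootstrap resamples and take pointwise percentiles. This is an illustration under assumptions (synthetic data, 1000 resamples, a percentile interval), not a copy of seaborn's internal code.

```python
import numpy as np

# Synthetic data with a known linear trend (hypothetical values).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=100)

# Least-squares fit of y ~ x; for deg=1, polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fit on a grid, then bootstrap the fit to get a band around it.
grid = np.linspace(x.min(), x.max(), 50)
n_boot = 1000
boot_lines = np.empty((n_boot, grid.size))
for i in range(n_boot):
    idx = rng.integers(0, x.size, size=x.size)  # resample rows with replacement
    b_slope, b_intercept = np.polyfit(x[idx], y[idx], deg=1)
    boot_lines[i] = b_intercept + b_slope * grid

# Pointwise 95% interval: 2.5th and 97.5th percentiles across resamples.
lower, upper = np.percentile(boot_lines, [2.5, 97.5], axis=0)
line = intercept + slope * grid
```

The band is typically narrowest near the middle of the x range and widens toward the edges, where the fitted line is less constrained by data.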

https://seaborn.pydata.org/tutorial/regression.html

These functions draw similar plots, but regplot() is an axes-level function, and lmplot() is a figure-level function. Additionally, regplot() accepts
the x and y variables in a variety of formats including simple numpy arrays, pandas.Series objects, or as references to variables in a
pandas.DataFrame object passed to data. In contrast, lmplot() has data as a required parameter and the x and y variables must be specified as
strings. Finally, only lmplot() has hue as a parameter.
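The difference in call signatures and return values can be seen in a short sketch. The data here are synthetic, the non-interactive "Agg" backend is chosen only so the snippet runs without a display, and n_boot is lowered just to keep it fast.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=60)
y = 1.0 + 0.3 * x + rng.normal(0, 0.5, size=60)
df = pd.DataFrame({"x": x, "y": y})

# Axes-level: accepts plain arrays, draws onto an existing Axes, returns that Axes.
fig, ax = plt.subplots()
returned_ax = sns.regplot(x=x, y=y, ax=ax, n_boot=200)

# Figure-level: needs a DataFrame plus column names given as strings,
# creates its own figure, and returns a FacetGrid.
grid = sns.lmplot(data=df, x="x", y="y", n_boot=200)
```

Because regplot() draws onto an Axes you supply, it composes with other matplotlib subplots; lmplot() manages the whole figure itself.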

The core functionality is otherwise similar, though, so this tutorial will focus on lmplot().

It’s possible to fit a linear regression when one of the variables takes discrete values; however, the simple scatterplot produced by this kind of dataset is
often not optimal:

sns.lmplot(x="size", y="tip", data=tips);

One option is to add some random noise (“jitter”) to the discrete values to make the distribution of those values clearer. Note that jitter is applied
only to the scatterplot data and does not influence the regression line fit itself:

sns.lmplot(x="size", y="tip", data=tips, x_jitter=.05);

A second option is to collapse over the observations in each discrete bin to plot an estimate of central tendency along with a confidence interval:

sns.lmplot(x="size", y="tip", data=tips, x_estimator=np.mean);
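The two options above can be sketched numerically. The snippet assumes uniform jitter of half-width x_jitter and a bootstrap percentile interval for each bin mean; the data are synthetic and the variable names are illustrative only.

```python
import numpy as np

# Synthetic data with a discrete x variable taking values 1..6 (hypothetical).
rng = np.random.default_rng(0)
size = rng.integers(1, 7, size=200).astype(float)
tip = 1.0 + 0.8 * size + rng.normal(0, 1.0, size=200)

# Option 1: jitter changes only the coordinates used to draw the points.
x_jitter = 0.05
size_jittered = size + rng.uniform(-x_jitter, x_jitter, size=size.size)
# The regression itself is still fit on the original, unjittered values.
slope, intercept = np.polyfit(size, tip, deg=1)

# Option 2: collapse each discrete bin to its mean, with a bootstrap interval.
levels = np.unique(size)
means = np.array([tip[size == lv].mean() for lv in levels])
intervals = np.array([
    np.percentile(
        rng.choice(tip[size == lv], size=(1000, int(np.sum(size == lv)))).mean(axis=1),
        [2.5, 97.5],
    )
    for lv in levels
])
```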


Fitting different kinds of models

The simple linear regression model used above is very simple to fit; however, it is not appropriate for some kinds of datasets. The Anscombe’s quartet
dataset shows a few examples where simple linear regression provides an identical estimate of a relationship where simple visual inspection clearly
shows differences. For example, in the first case, the linear regression is a good model:

anscombe = sns.load_dataset("anscombe")

sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
           ci=None, scatter_kws={"s": 80});

The linear relationship in the second dataset is the same, but the plot clearly shows that this is not a good model:

sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           ci=None, scatter_kws={"s": 80});


In the presence of this kind of higher-order relationship, lmplot() and regplot() can fit a polynomial regression model to explore simple kinds of
nonlinear trends in the dataset:

sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
           order=2, ci=None, scatter_kws={"s": 80});
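What order=2 buys can be checked numerically. The sketch below uses a made-up, exactly parabolic dataset (similar in spirit to the second Anscombe dataset, but not the real values) and compares the sum of squared residuals for a first-order and a second-order polynomial fit using np.polyfit.

```python
import numpy as np

# Hypothetical curved data: an exact parabola sampled at integer x values.
x = np.arange(4.0, 15.0)
y = -0.1 * (x - 11.0) ** 2 + 9.0

def sum_squared_residuals(order):
    """Fit a polynomial of the given order and return its squared error."""
    coefs = np.polyfit(x, y, deg=order)
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

ssr_linear = sum_squared_residuals(1)     # a straight line misses the curvature
ssr_quadratic = sum_squared_residuals(2)  # a parabola fits these points essentially exactly
```

A large drop in residual error when moving from order 1 to order 2 is the numeric counterpart of what the plot shows visually.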

A different problem is posed by “outlier” observations that deviate for some reason other than the main relationship under study:

sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           ci=None, scatter_kws={"s": 80});


In the presence of outliers, it can be useful to fit a robust regression, which uses a different loss function to downweight relatively large residuals:

sns.lmplot(x="x", y="y", data=anscombe.query("dataset == 'III'"),
           robust=True, ci=None, scatter_kws={"s": 80});
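The idea of downweighting large residuals can be illustrated with a small iteratively reweighted least squares loop. This is a sketch under assumptions, not the routine seaborn calls: it assumes Huber-style weights with the conventional tuning constant 1.345 and a MAD-based scale estimate, and the function names are hypothetical.

```python
import numpy as np

def weighted_least_squares(X, y, w):
    """Solve min over beta of sum_i w_i * (y_i - X_i @ beta)**2."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def huber_irls(X, y, c=1.345, n_iter=50):
    """Huber-weighted regression via iteratively reweighted least squares."""
    beta = weighted_least_squares(X, y, np.ones(len(y)))  # start from OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust scale estimate from the median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale < 1e-12:  # fit is already (numerically) exact for most points
            break
        u = np.abs(resid) / scale
        w = c / np.maximum(u, c)  # weight 1 for small residuals, c/u for large ones
        beta = weighted_least_squares(X, y, w)
    return beta

# Points on the line y = 1 + 2x, plus one gross outlier.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[7] += 15.0
X = np.column_stack([np.ones_like(x), x])

ols_beta = weighted_least_squares(X, y, np.ones(len(y)))  # [intercept, slope]
robust_beta = huber_irls(X, y)                            # [intercept, slope]
```

The ordinary least-squares slope is pulled toward the outlier, while the reweighted fit stays close to the slope of the remaining points.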

When the y variable is binary, simple linear regression also “works” but provides implausible predictions:

tips["big_tip"] = (tips.tip / tips.total_bill) > .15

sns.lmplot(x="total_bill", y="big_tip", data=tips,
           y_jitter=.03);


The solution in this case is to fit a logistic regression, such that the regression line shows the estimated probability of y = 1 for a given value of x:

sns.lmplot(x="total_bill", y="big_tip", data=tips,
           logistic=True, y_jitter=.03);
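The contrast between the two fits on a binary outcome can be sketched with NumPy alone. The data are synthetic, the logistic model is fit with a plain Newton-Raphson loop (an assumption chosen for brevity, not necessarily what seaborn uses), and the prediction grid deliberately extends beyond the observed x range to show where the straight line breaks down.

```python
import numpy as np

# Synthetic binary outcome whose probability falls as x grows (hypothetical).
rng = np.random.default_rng(0)
x = rng.uniform(0, 50, size=300)
true_p = 1.0 / (1.0 + np.exp(-(2.0 - 0.1 * x)))
y = (rng.uniform(size=300) < true_p).astype(float)

grid = np.linspace(0, 100, 101)  # extends past the data on purpose

# Straight-line fit: nothing keeps its predictions inside [0, 1].
lin_slope, lin_intercept = np.polyfit(x, y, deg=1)
linear_pred = lin_intercept + lin_slope * grid

# Logistic fit by Newton-Raphson on the log-likelihood.
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    beta = beta + np.linalg.solve(hessian, gradient)

# The fitted curve is a probability, so it stays strictly between 0 and 1.
logistic_pred = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * grid)))
```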

Note that the logistic regression estimate is considerably more computationally intensive (this is true of robust regression as well). As the confidence
interval around the regression line is computed using a bootstrap procedure, you may wish to turn this off for faster iteration (using ci=None).

An altogether different approach is to fit a nonparametric regression using a lowess smoother. This approach has the fewest assumptions, although it is
computationally intensive and so currently confidence intervals are not computed at all:

sns.lmplot(x="total_bill", y="tip", data=tips,
           lowess=True, line_kws={"color": "C1"});
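For intuition about what a lowess smoother does, the following is a deliberately simplified sketch: at each point it fits a straight line to the nearest fraction of the data, weighted by a tricube kernel. It omits the robustness iterations of full lowess, and local_linear_smooth is a hypothetical helper name, not a seaborn or statsmodels function.

```python
import numpy as np

def local_linear_smooth(x, y, frac=2 / 3):
    """Tricube-weighted local linear fit at each x (no robustness iterations)."""
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # number of neighbours in each local fit
    design = np.column_stack([np.ones(n), x])
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]  # distance to the k-th nearest point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0] + beta[1] * x[i]
    return fitted

# Noisy nonlinear data (synthetic).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=80))
y = np.sin(x) + rng.normal(0, 0.3, size=80)
smoothed = local_linear_smooth(x, y, frac=0.3)
```

A smaller frac follows local wiggles more closely; a larger one gives a smoother curve. Each point requires its own weighted fit, which is part of why this approach is computationally heavier than a single global regression.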


The residplot() function can be a useful tool for checking whether the simple regression model is appropriate for a dataset. It fits and removes a
simple linear regression and then plots the residual values for each observation. Ideally, these values should be randomly scattered around y = 0:

sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'I'"),
              scatter_kws={"s": 80});

If there is structure in the residuals, it suggests that simple linear regression is not appropriate:

sns.residplot(x="x", y="y", data=anscombe.query("dataset == 'II'"),
              scatter_kws={"s": 80});
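The "fit, remove, inspect" procedure can also be carried out by hand. In the sketch below (synthetic data; the curvature check is just one illustrative way to quantify "structure", not something residplot() reports), residuals from a straight-line fit are compared for a genuinely linear dataset and a curved one.

```python
import numpy as np

def linear_residuals(x, y):
    """Fit y ~ x by least squares and return what the line leaves unexplained."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)

rng = np.random.default_rng(0)
x = np.linspace(4, 14, 50)
y_linear = 3.0 + 0.5 * x + rng.normal(0, 0.5, size=50)
y_curved = -0.1 * (x - 11.0) ** 2 + 9.0 + rng.normal(0, 0.1, size=50)

resid_linear = linear_residuals(x, y_linear)
resid_curved = linear_residuals(x, y_curved)

# One crude check for structure: do the residuals track a curvature term?
curvature = (x - x.mean()) ** 2
corr_linear = np.corrcoef(curvature, resid_linear)[0, 1]
corr_curved = np.corrcoef(curvature, resid_curved)[0, 1]
```

Residuals from a least-squares line always average to zero, so the mean alone says nothing; it is the leftover pattern against x that signals a poor model.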


Conditioning on other variables

The plots above show many ways to explore the relationship between a pair of variables. Often, however, a more interesting question is “how does the
relationship between these two variables change as a function of a third variable?” This is where the main differences between regplot() and
lmplot() appear. While regplot() always shows a single relationship, lmplot() combines regplot() with FacetGrid to show multiple fits using
hue mapping or faceting.

The best way to separate out a relationship is to plot both levels on the same axes and to use color to distinguish them:

sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips);
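Numerically, mapping a variable to hue amounts to fitting a separate regression within each level of that variable. A minimal sketch with synthetic data (the level names "Yes" and "No" echo the smoker column, but the values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
smoker = rng.choice(["Yes", "No"], size=200)

# Give each level a different true slope so the fits should differ.
true_slope = np.where(smoker == "Yes", 0.2, 0.8)
y = 1.0 + true_slope * x + rng.normal(0, 0.5, size=200)

# One least-squares line per level of the conditioning variable.
fits = {}
for level in np.unique(smoker):
    mask = smoker == level
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    fits[str(level)] = (slope, intercept)
```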

Unlike relplot(), it’s not possible to map a distinct variable to the style properties of the scatter plot, but you can redundantly code the hue variable
with marker shape:

sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips,
           markers=["o", "x"], palette="Set1");

To add another variable, you can draw multiple “facets” with each level of the variable appearing in the rows or columns of the grid:

sns.lmplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips);


sns.lmplot(x="total_bill", y="tip", hue="smoker",
           col="time", row="sex", data=tips, height=3);
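The object returned by a faceted lmplot() call can be inspected to see how the grid is laid out. The sketch uses a synthetic DataFrame with hypothetical column names, the "Agg" backend so it runs without a display, and a reduced n_boot purely for speed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 160
df = pd.DataFrame({
    "x": rng.uniform(0, 10, size=n),
    "group": rng.choice(["a", "b"], size=n),
    "col_var": rng.choice(["left", "right"], size=n),
    "row_var": rng.choice(["top", "bottom"], size=n),
})
df["y"] = 1.0 + np.where(df["group"] == "a", 0.5, 1.5) * df["x"] + rng.normal(0, 1.0, size=n)

# hue splits fits within each panel; col and row split panels across the grid.
g = sns.lmplot(data=df, x="x", y="y", hue="group",
               col="col_var", row="row_var", height=3, n_boot=100)

n_rows, n_cols = g.axes.shape  # one Axes per (row level, column level) pair
```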

Plotting a regression in other contexts

A few other seaborn functions use regplot() in the context of a larger, more complex plot. The first is the jointplot() function that we introduced in
the distributions tutorial. In addition to the plot styles previously discussed, jointplot() can use regplot() to show the linear regression fit on the
joint axes by passing kind="reg":

sns.jointplot(x="total_bill", y="tip", data=tips, kind="reg");


Using the pairplot() function with kind="reg" combines regplot() and PairGrid to show the linear relationship between variables in a dataset.
Take care to note how this is different from lmplot(). In the figure below, the two axes don’t show the same relationship conditioned on two levels of a
third variable; rather, PairGrid() is used to show multiple relationships between different pairings of the variables in a dataset:

sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             height=5, aspect=.8, kind="reg");

Conditioning on an additional categorical variable is built into both of these functions using the hue parameter:

sns.pairplot(tips, x_vars=["total_bill", "size"], y_vars=["tip"],
             hue="smoker", height=5, aspect=.8, kind="reg");


© Copyright 2012-2024, Michael Waskom.
