Python Charts in 10 Minutes | Getting Started with seaborn (Part 4): Regression Models with lmplot

01-23

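Introduction to Seaborn

Official link: Seaborn: statistical data visualization

Seaborn is a Python data visualization library built on matplotlib. It provides a high-level interface that makes it easy to draw attractive statistical graphics.

Seaborn wraps matplotlib in a higher-level API, which makes plotting easier. In most cases seaborn alone is enough to produce attractive figures, while matplotlib lets you build more heavily customized ones. Seaborn should be seen as a complement to matplotlib, not a replacement. It also works well with numpy and pandas data structures and with the statistical routines in scipy and statsmodels. Learning seaborn helps us examine data and charts more efficiently and understand them more deeply.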

Installing seaborn

1. Install with pip

pip install seaborn

2. In an Anaconda environment, open the prompt and run

conda install seaborn
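After installing, it can help to confirm which version is present, because parameter names differ between releases (for example, seaborn 0.9 renamed the `size` parameter used later in this post to `height`). The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of seaborn.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the seaborn version if it is installed, otherwise None
print(installed_version("seaborn"))
```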

lmplot

seaborn.lmplot - seaborn 0.7.1 documentation

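lmplot combines basic plotting with fitting a regression model to the data. It is intended as a convenient way to fit regression models across subsets of a dataset, with the "hue", "col", and "row" parameters controlling which variables define those subsets.

The model parameters order, logistic, lowess, robust, and logx select the kind of model that is fitted.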

seaborn.lmplot(x, y, data, hue=None, col=None, row=None, palette=None, col_wrap=None, size=5, aspect=1, markers="o", sharex=True, sharey=True, hue_order=None, col_order=None, row_order=None, legend=True, legend_out=True, x_estimator=None, x_bins=None, x_ci="ci", scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=False, x_jitter=None, y_jitter=None, scatter_kws=None, line_kws=None)
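To make the roles of hue and col concrete, here is a small sketch on synthetic data. The column names `x`, `y`, `group`, and `panel` are made up for illustration, and the non-interactive Agg backend is chosen only so the snippet runs without a display. It avoids the size/height argument so that it should not depend on the seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: a noisy linear relationship plus two categorical columns
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "x": rng.uniform(0, 10, n),
    "group": rng.choice(["a", "b"], n),         # mapped to hue (colour)
    "panel": rng.choice(["left", "right"], n),  # mapped to col (one facet each)
})
df["y"] = 1.5 * df["x"] + rng.normal(0, 2, n)

# One facet per level of "panel"; inside each facet, one fit per level of "group"
g = sns.lmplot(x="x", y="y", hue="group", col="panel", data=df)
print(g.axes.shape)
```

With two levels in `panel` and no row variable, the grid should have one row and two columns of facets.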

Common Parameters:

hue, col, row : strings # variables that define subsets of the data, each drawn on its own facet

Variables that define subsets of the data, which will be drawn on separate facets in the grid. See the *_order parameters to control the order of levels of this variable.

size : scalar, optional # height of each facet

Height (in inches) of each facet. See also: aspect.

markers : matplotlib marker code or list of marker codes, optional # marker(s) used for the scatter points

Markers for the scatterplot. If a list, each marker in the list will be used for each level of the hue variable.

col_wrap : int, optional # number of facets per row

"Wrap" the column variable at this width, so that the column facets span multiple rows. Incompatible with a row facet.
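The wrapping arithmetic can be sketched in a few lines. `facet_grid_shape` is a hypothetical helper written for this illustration, not a seaborn function; it only counts how many rows and occupied columns result when a given number of facets wraps at col_wrap.

```python
import math


def facet_grid_shape(n_levels, col_wrap):
    """Rows and occupied columns when n_levels facets wrap at width col_wrap."""
    n_cols = min(n_levels, col_wrap)
    n_rows = math.ceil(n_levels / col_wrap)
    return n_rows, n_cols


print(facet_grid_shape(4, 2))  # (2, 2)
print(facet_grid_shape(5, 3))  # (2, 3)
```

The first call matches the tips example later in this post: the day column has four levels, so col_wrap=2 gives two rows of two facets.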

order : int, optional # polynomial regression; sets the order of the polynomial

If order is greater than 1, use numpy.polyfit to estimate a polynomial regression.
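Since the documentation names numpy.polyfit, a direct call shows what a higher order means. This is a minimal sketch on a noise-free quadratic chosen for illustration, not seaborn's internal code.

```python
import numpy as np

# Noise-free quadratic: y = 2x^2 - 3x + 1
x = np.linspace(-3, 3, 25)
y = 2 * x**2 - 3 * x + 1

# Degree-2 least-squares fit; coefficients come back highest power first
coeffs = np.polyfit(x, y, 2)

# Evaluate the fitted polynomial at the same x values
fitted = np.polyval(coeffs, x)
print(np.round(coeffs, 6))
```

Because the data are exactly quadratic, the fit should recover the coefficients 2, -3, and 1 up to floating-point error.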

logistic : bool, optional # logistic regression

If True, assume that y is a binary variable and use statsmodels to estimate a logistic regression model. Note that this is substantially more computationally intensive than linear regression, so you may wish to decrease the number of bootstrap resamples (n_boot) or set ci to None.
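The note about n_boot is easier to see with a sketch of the bootstrap idea: every resample requires a full refit, so cost grows with the number of resamples. The code below is an assumption-laden illustration rather than seaborn's implementation. It uses a straight-line fit with numpy.polyfit instead of a logistic model to stay dependency-light, and `bootstrap_slope_ci` is a hypothetical helper name.

```python
import numpy as np


def bootstrap_slope_ci(x, y, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap interval for the slope of a straight-line fit."""
    rng = np.random.default_rng(seed)
    n = len(x)
    slopes = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample rows with replacement
        slopes[i] = np.polyfit(x[idx], y[idx], 1)[0]  # one refit per resample
    tail = (100 - ci) / 2
    return np.percentile(slopes, [tail, 100 - tail])


# Synthetic data with a true slope of 2
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + rng.normal(0, 1, 200)

slope_hat = np.polyfit(x, y, 1)[0]
low, high = bootstrap_slope_ci(x, y, n_boot=200)
print(slope_hat, low, high)
```

Lowering n_boot shortens the loop proportionally, which is the trade-off the parameter description refers to.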

logx : bool, optional # regress y on log(x)

If True, estimate a linear regression of the form y ~ log(x), but plot the scatterplot and regression model in the input space. Note that x must be positive for this to work.
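The y ~ log(x) form can be reproduced by hand: fit a straight line against log(x), then evaluate the curve on the original x scale. This is a sketch on exact synthetic data, not seaborn's own code.

```python
import numpy as np

# Exact relationship y = 3*log(x) + 1; x must be positive for log(x) to exist
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 * np.log(x) + 1.0

# Linear regression of y on log(x)
slope, intercept = np.polyfit(np.log(x), y, 1)

# Evaluate the fitted curve on a grid in the original (untransformed) x units
grid = np.linspace(x.min(), x.max(), 50)
curve = slope * np.log(grid) + intercept
print(slope, intercept)
```

The fit is linear in log(x), yet the curve drawn against the original x values bends, which is what plotting "in the input space" means.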

Advanced Example I for Practice

import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")  # load the built-in tips dataset
# Study how tip relates to total_bill for smokers versus non-smokers
g = sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips, palette="Set1")

According to the fitted regression lines, total_bill = 20 is roughly the dividing point: above it, non-smokers tip more than smokers.

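# Regress tip on total_bill separately for each day of the week.
# col and hue both map to "day", col_wrap sets the number of facets per row,
# and size sets the facet height (seaborn 0.9 and later name this parameter height).
g = sns.lmplot(x="total_bill", y="tip", col="day", hue="day", data=tips, col_wrap=2, size=3)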

# Next, explore the Pokemon dataset
import pandas as pd
import seaborn as sns
pokemon = pd.read_csv("H:/zhihu/Pokemon.csv")
pokemon.head()

# Look at Attack versus Defense within each generation, approximated by a second-order polynomial
sns.lmplot(x="Defense", y="Attack", data=pokemon, col="Generation", hue="Generation", col_wrap=3, size=3, order=2)

# Compare the linear relationship between Sp. Atk and Sp. Def across generations in a single plot
sns.lmplot(x="Sp. Atk", y="Sp. Def", data=pokemon, hue="Generation", size=5, order=1)

The linear correlation between Sp. Atk and Sp. Def is weak, so the resulting plot looks somewhat cluttered and blurry.
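One way to check how strong a linear relationship is before fitting is to compute the Pearson correlation coefficient. The sketch below uses synthetic data (not the Pokemon dataset) to contrast a strong and a weak relationship. All variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)

# Strong linear relationship: y follows x with little noise
y_strong = x + rng.normal(scale=0.1, size=500)
# Weak relationship: y is generated independently of x
y_weak = rng.normal(size=500)

# corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r
r_strong = np.corrcoef(x, y_strong)[0, 1]
r_weak = np.corrcoef(x, y_weak)[0, 1]
print(r_strong, r_weak)
```

On the actual DataFrame, something like pokemon["Sp. Atk"].corr(pokemon["Sp. Def"]) should give the corresponding number, assuming those column names match the CSV.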

[download: Pokemon dataset] Password: 4zma

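For more on Python data analysis and data mining, follow my column: 數與碼
Or follow my Zhihu account: 知行
I still have much to learn, so comments and suggestions are very welcome. Thank you.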