I. linear_model
1. Overview:
This module implements a variety of "linear models".
2. Linear classifiers:
"逻辑回归分类器"(Logistic Regression classifier):class sklearn.linear_model.LogisticRegression([penalty='l2',dual=False,tol=0.0001,C=1.0,fit_intercept=True,intercept_scaling=1,class_weight=None,random_state=None,solver='lbfgs',max_iter=100,multi_class='auto',verbose=0,warm_start=False,n_jobs=None,l1_ratio=None])
#参数说明:
penalty:指定使用的范数惩罚正则项;为"L1"/"L2"/"elasticnet"/"none"
dual:指定是否进行对偶化;为bool
tol:指定最小误差(若误差小于该值,则停止);为float
C:指定"正则化强度"(regularization strength)的倒数;为float>0
#即范数惩罚正则化项前系数的倒数
fit_intercept:指定是否估计截距;为bool
intercept_scaling:为float
Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes
[x, self.intercept_scaling], i.e. a 'synthetic' feature with constant value equal to intercept_scaling is
appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight
#注意:the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect
# of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be
# increased
class_weight:指定各个类别的权重;为dict/"balanced"
random_state:指定使用的随机数;为int/RandomState instance/None
solver:指定用于优化的算法;为"newton-cg"/"lbfgs"/"liblinear"/"sag"/"saga"
max_iter:指定最大迭代次数;为int
multi_class:指定如何处理多类别分类问题;为"auto"/"ovr"/"multinomial"
verbose:指定输出信息的冗余度;为int/bool
warm_start:指定是否启用热启动;为bool
n_jobs:指定用于并行计算的CPU核心数;为int
l1_ratio:指定"弹性网络混合参数"(Elastic-Net mixing parameter);为float
#用于控制L1/L2惩罚项的占比,l1_ratio=0相当于penalty="l2",而l1_ratio=1相当于penalty="l1"
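#Example (a minimal sketch; the synthetic data from sklearn.datasets.make_classification and the hyperparameter values are illustrative assumptions, not recommendations). It shows the elastic-net penalty, which requires solver="saga" together with l1_ratio:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)  # toy binary problem

# l1_ratio=0.5 mixes the L1 and L2 penalties equally; C is the inverse regularization strength
clf = LogisticRegression(penalty="elasticnet", solver="saga", C=1.0,
                         l1_ratio=0.5, max_iter=10000, random_state=0)
clf.fit(X, y)
print(clf.coef_.shape)           # (1, 20) for a binary problem
print(clf.predict_proba(X[:3]))  # class-membership probabilities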
######################################################################################################################
进行了"交叉验证"(cross-validation)的逻辑回归分类器:class sklearn.linear_model.LogisticRegressionCV([Cs=10,fit_intercept=True,cv=None,dual=False,penalty='l2',scoring=None,solver='lbfgs',tol=0.0001,max_iter=100,class_weight=None,n_jobs=None,verbose=0,refit=True,intercept_scaling=1.0,multi_class='auto',random_state=None,l1_ratios=None])
#参数说明:其他参数同class sklearn.linear_model.LogisticRegression()
Cs:功能同class sklearn.linear_model.LogisticRegression()的参数C;为int/float list
cv:指定交叉验证的拆分策略;为int/cross-validation generator
scoring:指定如何打分;为str/callable
refit:If set to True,the scores are averaged across all folds,and the coefs and the C that corresponds to the best score is taken,and a final refit is done using these parameters
Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged
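#Example (a minimal sketch with synthetic data; the grid size, fold count and scoring choice are illustrative assumptions). C is selected by 5-fold cross-validation over 10 log-spaced candidates:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Cs=10 generates 10 candidate C values on a log scale; refit=True refits on the whole training set
clf = LogisticRegressionCV(Cs=10, cv=5, penalty="l2", solver="lbfgs",
                           scoring="accuracy", refit=True, max_iter=5000)
clf.fit(X, y)
print(clf.C_)                               # the selected C (one entry per class)
print(list(clf.scores_.values())[0].shape)  # per-class score grid of shape (n_folds, n_Cs)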
######################################################################################################################
"被动攻击性分类器"(Passive Aggressive Classifier;PA Classifier):class sklearn.linear_model.PassiveAggressiveClassifier([C=1.0,fit_intercept=True,max_iter=1000,tol=0.001,early_stopping=False,validation_fraction=0.1,n_iter_no_change=5,shuffle=True,verbose=0,loss='hinge',n_jobs=None,random_state=None,warm_start=False,class_weight=None,average=False])
#参数说明:其他参数同class sklearn.linear_model.LogisticRegression()
C:指定最大步长(正则化);为float
early_stopping:指定是否使用验证提前停止终止训练;为bool
validation_fraction:指定预留的作为提前停止的验证集的数据比例;为0<float<1
n_iter_no_change:指定提前停止前精度没有提升的连续迭代次数;为int
shuffle:指定在每次迭代后是否重新打乱数据;为bool
loss:指定损失函数;为"hinge"/"squared_hinge"
average:When set to True,computes the averaged SGD weights and stores the result in the coef_ attribute
If set to an int greater than 1,averaging will begin once the total number of samples seen reaches average
So average=10 will begin averaging after seeing 10 samples
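#Example (a minimal sketch with synthetic data; the early-stopping settings simply restate the defaults listed above):
from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out 10% of the data and stop after 5 consecutive epochs without improvement on it
clf = PassiveAggressiveClassifier(C=1.0, loss="hinge", early_stopping=True,
                                  validation_fraction=0.1, n_iter_no_change=5,
                                  random_state=0)
clf.fit(X, y)
print(clf.score(X, y))  # mean accuracy on the training data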
######################################################################################################################
"感知机"(Perceptron):class sklearn.linear_model.Perceptron([penalty=None,alpha=0.0001,l1_ratio=0.15,fit_intercept=True,max_iter=1000,tol=0.001,shuffle=True,verbose=0,eta0=1.0,n_jobs=None,random_state=0,early_stopping=False,validation_fraction=0.1,n_iter_no_change=5,class_weight=None,warm_start=False])
#参数说明:其他参数同class sklearn.linear_model.PassiveAggressiveClassifier()
alpha:指定正则化强度;为float
#即正则化项的系数
eta0:指定更新时乘以的常数;为float
######################################################################################################################
"岭回归分类器"(Ridge regression Classifier):class sklearn.linear_model.RidgeClassifier([alpha=1.0,fit_intercept=True,normalize=False,copy_X=True,max_iter=None,tol=0.001,class_weight=None,solver='auto',random_state=None])
#参数说明:solver同同class sklearn.linear_model.LogisticRegression()
# 其他参数同class sklearn.linear_model.Perceptron()
normalize:指定是否先对数据进行归一化;为bool
#若为True,将进行如下变换:(X-mean)/l2-norm
#当fit_intercept=false时忽略该参数
copy_X:指定是否复制数据;为bool
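#Example (a minimal sketch; the class imbalance built into the synthetic data is an illustrative assumption). It shows class_weight="balanced" reweighting classes inversely to their frequency:
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

# Imbalanced two-class problem (roughly 90% / 10%)
X, y = make_classification(n_samples=400, n_features=20, weights=[0.9, 0.1], random_state=0)

clf = RidgeClassifier(alpha=1.0, class_weight="balanced", solver="auto")
clf.fit(X, y)
print(clf.score(X, y))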
######################################################################################################################
Ridge regression classifier with built-in cross-validation: class sklearn.linear_model.RidgeClassifierCV([alphas=(0.1,1.0,10.0),fit_intercept=True,normalize=False,scoring=None,cv=None,class_weight=None,store_cv_values=False])
# Parameter notes: scoring/cv are the same as in class sklearn.linear_model.LogisticRegressionCV()
#                  the remaining parameters are the same as in class sklearn.linear_model.RidgeClassifier()
alphas: specifies the candidate coefficients of the regularization term; ndarray of n_alphas values
store_cv_values: specifies whether the cross-validation values for each alpha are stored; bool
#only compatible with cv=None
######################################################################################################################
基于"随机梯度下降"(stochastic gradient descent;SGD)的线性分类器:class sklearn.linear_model.SGDClassifier([loss='hinge',penalty='l2',alpha=0.0001,l1_ratio=0.15,fit_intercept=True,max_iter=1000,tol=0.001,shuffle=True,verbose=0,epsilon=0.1,n_jobs=None,random_state=None,learning_rate='optimal',eta0=0.0,power_t=0.5,early_stopping=False,validation_fraction=0.1,n_iter_no_change=5,class_weight=None,warm_start=False,average=False])
#参数说明:alpha/eta0同class sklearn.linear_model.Perceptron()
# 其他参数同class sklearn.linear_model.LogisticRegression()/PassiveAggressiveClassifier()
loss:指定损失函数;为"hinge"/"log"/"modified_huber"/"squared_hinge"/"perceptron"/"squared_loss"/"huber"/"epsilon_insensitive"/"squared_epsilon_insensitive"
epsilon:指定"ε-不敏感损失函数"(epsilon-insensitive loss functions)中的阈值ε;为float
#当预测和正确值间的差异小于该值,那么该差异将被忽略
learning_rate:指定学习率;为"constant"/"optimal"/"invscaling"/"adaptive"
eta0:指定初始学习率;为float
power_t:指定"逆标度学习率"(inverse scaling learning rate)的指数;为float
3. Linear regressors
(1) Classical linear regressors:
"Ordinary least squares Linear Regression": class sklearn.linear_model.LinearRegression([fit_intercept=True,normalize=False,copy_X=True,n_jobs=None,positive=False])
# Parameter notes: n_jobs is the same as in class sklearn.linear_model.LogisticRegression()
#                  the remaining parameters are the same as in class sklearn.linear_model.RidgeClassifier()
positive: specifies whether all coefficients are forced to be positive; bool
#this option is only supported for dense arrays
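#Example (a minimal sketch; the toy data generated with numpy is an illustrative assumption). It recovers known coefficients and shows the positive=True constraint:
import numpy as np
from sklearn.linear_model import LinearRegression

# y = 1 + 2*x1 + 3*x2 plus a little noise
rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = 1.0 + X @ np.array([2.0, 3.0]) + 0.01 * rng.randn(100)

reg = LinearRegression(fit_intercept=True).fit(X, y)
print(reg.intercept_, reg.coef_)  # roughly 1.0 and [2.0, 3.0]

reg_pos = LinearRegression(positive=True).fit(X, y)  # all coefficients constrained to be >= 0
print(reg_pos.coef_)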
######################################################################################################################
带有"L2规范化"(l2 regularization)的"线性最小二乘"(Linear least squares):class sklearn.linear_model.Ridge([alpha=1.0,fit_intercept=True,normalize=False,copy_X=True,max_iter=None,tol=0.001,solver='auto',random_state=None])
#参数说明:同class sklearn.linear_model.RidgeClassifier()
######################################################################################################################
带有"交叉验证"(cross-validation)的"岭回归"(Ridge regression):class sklearn.linear_model.RidgeCV([alphas=(0.1,1.0,10.0),fit_intercept=True,normalize=False,scoring=None,cv=None,gcv_mode=None,store_cv_values=False,alpha_per_target=False])
#参数说明:其他参数同class sklearn.linear_model.RidgeClassifierCV()
gcv_mode:指定执行留一法交叉验证时使用哪种策略;为"auto"/"svd"/"eigen"
alpha_per_target:指定是否对每个目标分别优化alpha;为bool
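#Example (a minimal sketch with synthetic data; the alpha grid is an illustrative assumption). With cv=None the efficient leave-one-out (generalized) cross-validation scheme is used:
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=30, noise=5.0, random_state=0)

reg = RidgeCV(alphas=(0.1, 1.0, 10.0))  # cv=None -> leave-one-out CV (gcv_mode controls the strategy)
reg.fit(X, y)
print(reg.alpha_)       # the selected alpha
print(reg.best_score_)  # score of the best alpha under the leave-one-out scheme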
######################################################################################################################
Linear model that minimizes a "regularized empirical loss" with SGD: class sklearn.linear_model.SGDRegressor([loss='squared_loss',penalty='l2',alpha=0.0001,l1_ratio=0.15,fit_intercept=True,max_iter=1000,tol=0.001,shuffle=True,verbose=0,epsilon=0.1,random_state=None,learning_rate='invscaling',eta0=0.01,power_t=0.25,early_stopping=False,validation_fraction=0.1,n_iter_no_change=5,warm_start=False,average=False])
# Parameter notes: same as class sklearn.linear_model.SGDClassifier()
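#Example (a minimal sketch; the StandardScaler pipeline and hyperparameter values are illustrative assumptions). SGD is sensitive to feature scale, and "invscaling" decays the learning rate as eta0 / t**power_t:
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

reg = make_pipeline(
    StandardScaler(),  # standardize features before SGD
    SGDRegressor(penalty="l2", alpha=1e-4, learning_rate="invscaling",
                 eta0=0.01, power_t=0.25, max_iter=1000, random_state=0),
)
reg.fit(X, y)
print(reg.score(X, y))  # coefficient of determination R^2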
(2) Linear regressors with variable selection:
"Elastic Net Regression model" with combined L1 and L2 priors as "regularizer": class sklearn.linear_model.ElasticNet([alpha=1.0,l1_ratio=0.5,fit_intercept=True,normalize=False,precompute=False,max_iter=1000,copy_X=True,tol=0.0001,warm_start=False,positive=False,random_state=None,selection='cyclic'])
# Parameter notes: alpha/normalize/copy_X are the same as in class sklearn.linear_model.RidgeClassifier()
#                  positive is the same as in class sklearn.linear_model.LinearRegression()
#                  the remaining parameters are the same as in class sklearn.linear_model.LogisticRegression()
precompute: specifies whether a precomputed "Gram matrix" is used to speed up calculations; bool/array-like of shape n_features×n_features
selection: if set to "random", a random coefficient is updated at every iteration
If set to "cyclic", features are looped over sequentially
#setting it to "random" often leads to significantly faster convergence, especially when tol>1e-4
######################################################################################################################
沿"正则化路径"(regularization path)进行迭代拟合的弹性网络回归模型:class sklearn.linear_model.ElasticNetCV([l1_ratio=0.5,eps=0.001,n_alphas=100,alphas=None,fit_intercept=True,normalize=False,precompute='auto',max_iter=1000,tol=0.0001,cv=None,copy_X=True,verbose=0,n_jobs=None,positive=False,random_state=None,selection='cyclic'])
#参数说明:alphas同class sklearn.linear_model.RidgeClassifierCV()
# cv/verbose/n_jobs同class sklearn.linear_model.LogisticRegressionCV()
# 其他参数同class sklearn.linear_model.class sklearn.linear_model.ElasticNet()
eps:指定路径长度;为float
n_alphas:指定沿正则化路径的alpha个数(用于每个l1_ratio);为int
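#Example (a minimal sketch with synthetic data; the l1_ratio grid and fold count are illustrative assumptions). Both alpha and l1_ratio are chosen by cross-validation:
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=1.0, random_state=0)

# For each candidate l1_ratio, n_alphas=100 alphas are generated automatically
reg = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], eps=1e-3, n_alphas=100,
                   cv=5, max_iter=10000, random_state=0)
reg.fit(X, y)
print(reg.l1_ratio_, reg.alpha_)  # the cross-validated choices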
######################################################################################################################
"最小角回归模型"(Least Angle Regression model):class sklearn.linear_model.Lars([fit_intercept=True,verbose=False,normalize=True,precompute='auto',n_nonzero_coefs=500,eps=2.220446049250313e-16,copy_X=True,fit_path=True,jitter=None,random_state=None])
#参数说明:其他参数同class sklearn.linear_model.ElasticNetCV()
n_nonzero_coefs:指定非零系数的目标数量;为int/np.inf
eps:指定"乔列斯基对角线因子"(Cholesky diagonal factors)的计算中的"机器精度正则化"(machine-precision regularizatio);为float
#对"病态系统"(ill-conditioned systems)应增加该值
fit_path:指定是否存储完整路径;为bool
#如果要求1个大问题或多个目标的解决方案,设置fit_path=False会导致加速,尤其是对于较小的alpha
jitter:指定要添加到y的"均匀噪声参数"(uniform noise parameter)的上限(以满足模型的"assumption of one-at-a-time computations");为float
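#Example (a minimal sketch with synthetic data; stopping at 5 non-zero coefficients matches the number of informative features and is an illustrative assumption):
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars

X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=1.0, random_state=0)

# Stop once 5 coefficients are non-zero; fit_path=True keeps the whole path in coef_path_
reg = Lars(n_nonzero_coefs=5, fit_path=True)
reg.fit(X, y)
print(reg.coef_path_.shape)         # (n_features, number of steps along the path)
print(int((reg.coef_ != 0).sum()))  # at most n_nonzero_coefs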
######################################################################################################################
带有"交叉验证"(cross-validation)的最小角回归模型:class sklearn.linear_model.LarsCV([fit_intercept=True,verbose=False,max_iter=500,normalize=True,precompute='auto',cv=None,max_n_alphas=1000,n_jobs=None,eps=2.220446049250313e-16,copy_X=True])
#参数说明:eps同class sklearn.linear_model.Lars()
# 其他参数同class sklearn.linear_model.ElasticNetCV()
max_n_alphas:指定路径上用于计算交叉验证中残差的最大点数;为int
######################################################################################################################
"Least absolute shrinkage and selection operator model" (LASSO model), with the L1 norm as regularizer: class sklearn.linear_model.Lasso([alpha=1.0,fit_intercept=True,normalize=False,precompute=False,copy_X=True,max_iter=1000,tol=0.0001,warm_start=False,positive=False,random_state=None,selection='cyclic'])
# Parameter notes: same as class sklearn.linear_model.ElasticNet()
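#Example (a minimal sketch with synthetic data; the alpha grid is an illustrative assumption). It shows that a larger alpha drives more coefficients to exactly zero:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=1.0, random_state=0)

for alpha in (0.01, 0.1, 1.0, 10.0):
    reg = Lasso(alpha=alpha, max_iter=10000).fit(X, y)
    print(alpha, int(np.sum(reg.coef_ != 0)), "non-zero coefficients")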
######################################################################################################################
Lasso model with iterative fitting along a regularization path: class sklearn.linear_model.LassoCV([eps=0.001,n_alphas=100,alphas=None,fit_intercept=True,normalize=False,precompute='auto',max_iter=1000,tol=0.0001,copy_X=True,cv=None,verbose=False,n_jobs=None,positive=False,random_state=None,selection='cyclic'])
# Parameter notes: same as class sklearn.linear_model.ElasticNetCV()
######################################################################################################################
Lasso model fitted with Least Angle Regression: class sklearn.linear_model.LassoLars([alpha=1.0,fit_intercept=True,verbose=False,normalize=True,precompute='auto',max_iter=500,eps=2.220446049250313e-16,copy_X=True,fit_path=True,positive=False,jitter=None,random_state=None])
# Parameter notes: alpha is the same as in class sklearn.linear_model.Perceptron()
#                  eps/fit_path/jitter are the same as in class sklearn.linear_model.Lars()
#                  positive is the same as in class sklearn.linear_model.LinearRegression()
#                  the remaining parameters are the same as in class sklearn.linear_model.Lars()
######################################################################################################################
Lasso model fitted with Least Angle Regression, with built-in cross-validation: class sklearn.linear_model.LassoLarsCV([fit_intercept=True,verbose=False,max_iter=500,normalize=True,precompute='auto',cv=None,max_n_alphas=1000,n_jobs=None,eps=2.220446049250313e-16,copy_X=True,positive=False])
# Parameter notes: cv/max_n_alphas/n_jobs are the same as in class sklearn.linear_model.LarsCV()
#                  the remaining parameters are the same as in class sklearn.linear_model.LassoLars()
######################################################################################################################
使用"BIC"/"AIC"进行模型选择的基于最小角回归的套索模型:class sklearn.linear_model.LassoLarsIC([criterion='aic',fit_intercept=True,verbose=False,normalize=True,precompute='auto',max_iter=500,eps=2.220446049250313e-16,copy_X=True,positive=False])
#参数说明:其他参数同class sklearn.linear_model.LassoLars()
criterion:指定使用的标准类型;为"bic"/"aic"
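#Example (a minimal sketch with synthetic data). No cross-validation is run: alpha is chosen by minimizing the information criterion directly on the training data:
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoLarsIC

X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=1.0, random_state=0)

for criterion in ("aic", "bic"):
    reg = LassoLarsIC(criterion=criterion).fit(X, y)
    print(criterion, reg.alpha_, int((reg.coef_ != 0).sum()))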
######################################################################################################################
"正交匹配追踪模型"(Orthogonal Matching Pursuit model;OMP model):class sklearn.linear_model.OrthogonalMatchingPursuit([n_nonzero_coefs=None,tol=None,fit_intercept=True,normalize=True,precompute='auto'])
#参数说明:其他参数同class sklearn.linear_model.Lars()
# n_nonzero_coefs同class sklearn.linear_model.ElasticNet()
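#Example (a minimal sketch with synthetic data; the limit of 5 selected features is an illustrative assumption matching the number of informative features):
from sklearn.datasets import make_regression
from sklearn.linear_model import OrthogonalMatchingPursuit

X, y = make_regression(n_samples=100, n_features=40, n_informative=5,
                       noise=0.5, random_state=0)

# Greedily select at most 5 features; alternatively set tol to stop once the residual norm is small enough
reg = OrthogonalMatchingPursuit(n_nonzero_coefs=5)
reg.fit(X, y)
print(int((reg.coef_ != 0).sum()), reg.score(X, y))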
######################################################################################################################
Orthogonal Matching Pursuit model with built-in cross-validation: class sklearn.linear_model.OrthogonalMatchingPursuitCV([copy=True,fit_intercept=True,normalize=True,max_iter=None,cv=None,n_jobs=None,verbose=False])
# Parameter notes: n_jobs is the same as in class sklearn.linear_model.LogisticRegression()
#                  the remaining parameters are the same as in class sklearn.linear_model.ElasticNetCV()
copy: specifies whether X must be copied; bool
#False is only useful when X is already Fortran-ordered; otherwise a copy is made anyway