Determining the convexity of a function of two variables:
Let $f(x,y)$ have continuous second-order partial derivatives on a (convex) region $D$, and for each point $(x,y)\in D$ write $A=f_{xx}^{''}(x,y),\ B=f_{xy}^{''}(x,y),\ C=f_{yy}^{''}(x,y)$. Then:
$\qquad$(1) if $A>0$ and $AC-B^2\geq0$ everywhere on $D$ $\Longrightarrow$ $f$ is a convex function
$\qquad$(2) if $A<0$ and $AC-B^2\geq0$ everywhere on $D$ $\Longrightarrow$ $f$ is a concave function
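Note: "convex" here means convex downward (bowl-shaped). Many Chinese calculus textbooks call this shape a "concave function" (凹函数); machine learning instead follows the international convention, since the field's terminology originated abroad.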
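As a small illustration of the test above, here is a minimal Python sketch. The helper name `classify` and the three example functions are assumptions made for this illustration, not part of the original text. The examples are quadratics, whose second derivatives are constant, so checking one triple $(A,B,C)$ covers all of $D$.

```python
def classify(A, B, C):
    """Apply the sufficient condition to constant second derivatives A, B, C."""
    det = A * C - B * B
    if A > 0 and det >= 0:
        return "convex"
    if A < 0 and det >= 0:
        return "concave"
    # The condition does not apply; the function may be neither.
    return "inconclusive"


print(classify(2, 1, 2))    # f = x^2 + x*y + y^2  -> convex
print(classify(-2, 0, -2))  # f = -x^2 - y^2       -> concave
print(classify(2, 0, -2))   # f = x^2 - y^2        -> inconclusive
```

For a function whose second derivatives vary with $(x,y)$, the same check would have to hold at every point of $D$, not just at one sample.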
Finding the extremum of a convex or concave function of two variables:
Let $f(x,y)$ be a convex (or concave) function with continuous partial derivatives on an open region $D$, and let $(x_0,y_0)\in D$ satisfy $f_{x}^{'}(x_0,y_0)=0,\ f_{y}^{'}(x_0,y_0)=0$. Then $f(x_0,y_0)$ is the minimum (or maximum) of $f(x,y)$ on $D$.
The function in question here is:

$$
E(w,b)=\sum_{i=1}^{m}(y_i-wx_i-b)^2\tag{Eq. 1}
$$
Taking the partial derivatives of $E(w,b)$ with respect to $w$ and $b$ gives:

$$
\cfrac{\partial{E(w,b)}}{\partial{w}}=2\left(w\cdot\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\tag{Eq. 2}
$$

$$
\cfrac{\partial{E(w,b)}}{\partial{b}}=2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\tag{Eq. 3}
$$
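The two first-order partial derivatives can be checked numerically against central finite differences. The sketch below is only an illustration: the function names, the four data points, and the evaluation point $(w,b)=(0.5,-0.3)$ are all made up for this purpose.

```python
def E(w, b, xs, ys):
    """Sum of squared errors, as in the definition of E(w, b)."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def dE_dw(w, b, xs, ys):
    """Analytic partial derivative with respect to w."""
    sum_xx = sum(x * x for x in xs)
    sum_res_x = sum((y - b) * x for x, y in zip(xs, ys))
    return 2 * (w * sum_xx - sum_res_x)


def dE_db(w, b, xs, ys):
    """Analytic partial derivative with respect to b."""
    m = len(xs)
    return 2 * (m * b - sum(y - w * x for x, y in zip(xs, ys)))


# Toy data, invented for this check only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w, b, h = 0.5, -0.3, 1e-6

# Central differences approximate the partial derivatives.
num_dw = (E(w + h, b, xs, ys) - E(w - h, b, xs, ys)) / (2 * h)
num_db = (E(w, b + h, xs, ys) - E(w, b - h, xs, ys)) / (2 * h)

print(abs(num_dw - dE_dw(w, b, xs, ys)) < 1e-5)
print(abs(num_db - dE_db(w, b, xs, ys)) < 1e-5)
```

Because $E$ is quadratic in $w$ and $b$, the central difference agrees with the analytic value up to floating-point rounding.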
Starting from (Eq. 2):

$$
\begin{aligned}
\cfrac{\partial^{2}E(w,b)}{\partial{w^2}}
&=\cfrac{\partial}{\partial{w}}\left(\cfrac{\partial{E(w,b)}}{\partial{w}}\right)
=\cfrac{\partial}{\partial{w}}\left(2\left(w\cdot\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right)\\
&=\cfrac{\partial}{\partial{w}}\left(2w\cdot\sum_{i=1}^{m}x_i^2\right)=2\sum_{i=1}^{m}x_i^2
\end{aligned}
\tag{Eq. 4}
$$

With $(w,b)$ playing the role of $(x,y)$: $\Longrightarrow A=f_{xx}^{''}(x,y)=2\sum_{i=1}^{m}x_i^2$
$$
\begin{aligned}
\cfrac{\partial^{2}E(w,b)}{\partial{w}\partial{b}}
&=\cfrac{\partial}{\partial{b}}\left(\cfrac{\partial{E(w,b)}}{\partial{w}}\right)
=\cfrac{\partial}{\partial{b}}\left(2\left(w\cdot\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)\right)\\
&=\cfrac{\partial}{\partial{b}}\left(-2\sum_{i=1}^{m}(y_i-b)x_i\right)=2\sum_{i=1}^{m}x_i
\end{aligned}
\tag{Eq. 5}
$$

$\Longrightarrow B=f_{xy}^{''}(x,y)=2\sum_{i=1}^{m}x_i$
Starting from (Eq. 3):

$$
\cfrac{\partial^2E{(w,b)}}{\partial{b^2}}=\cfrac{\partial}{\partial{b}}\left(\cfrac{\partial{E(w,b)}}{\partial{b}}\right)=\cfrac{\partial}{\partial{b}}\left(2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)\right)=2m\tag{Eq. 6}
$$

$\Longrightarrow C=f_{yy}^{''}(x,y)=2m$
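The three second derivatives depend only on the inputs $x_i$ and the sample count $m$, not on $w$, $b$, or the $y_i$. A short Python sketch makes this concrete; the helper name `hessian_entries` and the sample data are assumptions made for illustration.

```python
def hessian_entries(xs):
    """Return (A, B, C) for E(w, b), computed from the inputs x_i alone."""
    m = len(xs)
    A = 2 * sum(x * x for x in xs)  # second derivative in w
    B = 2 * sum(xs)                 # mixed second derivative
    C = 2 * m                       # second derivative in b
    return A, B, C


# Toy inputs, invented for this example.
xs = [1.0, 2.0, 3.0, 4.0]
A, B, C = hessian_entries(xs)
print(A, B, C)  # 60.0 20.0 8
```

Since $A$, $B$ and $C$ are constants, the condition "everywhere on $D$" in the convexity test reduces to a single check.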
$$
\begin{aligned}
AC-B^2
&=4m\sum_{i=1}^{m}x_i^2-\left[2\sum_{i=1}^{m}x_i\right]^2
=4m\sum_{i=1}^{m}x_i^2-4m\cdot\cfrac{1}{m}\sum_{i=1}^{m}x_i\cdot\sum_{i=1}^{m}x_i
=4m\left(\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}x_i\bar{x}\right)\\
&=4m\sum_{i=1}^{m}\left(x_i^2-x_i\bar{x}-x_i\bar{x}+x_i\bar{x}\right)
=4m\sum_{i=1}^{m}\left(x_i^2-2x_i\bar{x}+\bar{x}^2\right)
=4m\sum_{i=1}^{m}(x_i-\bar{x})^2\geq0
\end{aligned}
\tag{Eq. 7}
$$
Note: the derivation above uses the substitution $\sum_{i=1}^{m}x_i\bar{x}=\bar{x}\cdot{m}\cdot\cfrac{1}{m}\sum_{i=1}^{m}x_i=m\bar{x}^2=\sum_{i=1}^{m}\bar{x}^2$,
together with the definition of the mean, $\cfrac{1}{m}\sum_{i=1}^{m}x_i=\bar{x}$.
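The identity behind (Eq. 7) can be sanity-checked numerically: $AC-B^2$ should equal $4m\sum_{i=1}^{m}(x_i-\bar{x})^2$, which is the final sum $\sum(x_i^2-2x_i\bar{x}+\bar{x}^2)$ written as a square. The data below is made up for this check.

```python
# Toy inputs, invented for this check.
xs = [1.0, 2.0, 3.0, 4.0]
m = len(xs)
x_bar = sum(xs) / m

A = 2 * sum(x * x for x in xs)
B = 2 * sum(xs)
C = 2 * m

lhs = A * C - B * B                               # AC - B^2
rhs = 4 * m * sum((x - x_bar) ** 2 for x in xs)   # 4m * sum of squared deviations

print(lhs, rhs)  # 80.0 80.0
```

The right-hand side is a sum of squares, so it is zero only when every $x_i$ equals the mean, that is, when all inputs are identical.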
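This shows that $E(w,b)$ is convex: $A=2\sum_{i=1}^{m}x_i^2>0$ as long as not every $x_i$ is zero, and $AC-B^2\geq0$. Convex optimization methods can therefore be applied to it.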
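Because $E(w,b)$ is convex, a stationary point is a global minimum, so one way to minimize it is to set both first-order partial derivatives to zero and solve the resulting $2\times2$ linear system. The sketch below does this by direct elimination. It is one possible approach under stated assumptions, not necessarily the method the text goes on to use; the name `fit_line` and the data are hypothetical.

```python
def E(w, b, xs, ys):
    """Sum of squared errors E(w, b)."""
    return sum((y - w * x - b) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Solve dE/dw = 0 and dE/db = 0 for (w, b)."""
    m = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = m * sxx - sx * sx  # equals (AC - B^2) / 4
    if det == 0:
        # All x_i are equal, so the minimizer is not unique.
        raise ValueError("all x values are identical")
    w = (m * sxy - sx * sy) / det
    b = (sy - w * sx) / m
    return w, b


# Toy data, invented for this example.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w_hat, b_hat = fit_line(xs, ys)

# Moving away from the stationary point should never decrease E.
best = E(w_hat, b_hat, xs, ys)
print(all(best <= E(w_hat + dw, b_hat + db, xs, ys)
          for dw in (-0.1, 0.0, 0.1) for db in (-0.1, 0.0, 0.1)))
```

The `det == 0` guard corresponds to the degenerate case $AC-B^2=0$, where the function is still convex but has a whole line of minimizers.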