Logistic Regression cost function and Maximum likelihood estimate

Logistic Regression cost function

The original form is ŷ; here we write y' instead because of LaTeX rendering issues.

If $y = 1$: $p(y|x) = y'$
If $y = 0$: $p(y|x) = 1 - y'$

These two cases can be summarized in a single equation:

$$p(y|x) = y'^{\,y}(1-y')^{\,1-y}$$

This one equation expresses both cases: if $y = 1$, then $p(y|x) = y'$; if $y = 0$, then $p(y|x) = 1 - y'$. The log function is strictly monotonically increasing, so maximizing $\log p(y|x)$ gives the same result as maximizing $p(y|x)$. Computing the log:

$$\log p(y|x) = \log\!\left(y'^{\,y}(1-y')^{\,1-y}\right) = y\log y' + (1-y)\log(1-y') = -l(y', y)$$

Note: $l$ denotes the loss function here. Minimizing the loss function corresponds to maximizing the log of the probability. This is what the loss function on a single example looks like.
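To make the single-example loss concrete, here is a minimal NumPy sketch (the function name and the sample values are illustrative, not from the course; `y_hat` plays the role of y'):

```python
import numpy as np

def single_example_loss(y_hat, y):
    # l(y', y) = -(y*log(y') + (1-y)*log(1-y'))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# If y = 1 the loss is -log(y_hat): small when the prediction is near 1.
print(single_example_loss(0.9, 1))  # ~0.105
# If y = 0 the loss is -log(1 - y_hat): large when the prediction is near 1.
print(single_example_loss(0.9, 0))  # ~2.303
```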

Cost on m examples

Assuming the training examples are drawn independently, the probability of all the labels in the training set is the product of the individual probabilities, so:

$$\log p(\text{labels in training set}) = \log \prod_{i=1}^{m} p(y^i|x^i)$$

$$\log p(\ldots) = \sum_{i=1}^{m} \log p(y^i|x^i) = -\sum_{i=1}^{m} l(y'^i, y^i)$$
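As a quick sketch of the same sum in code (hypothetical names and data; `y_hat` stands for the predictions y'):

```python
import numpy as np

def log_likelihood(y_hat, y):
    # log p(labels) = sum_i log p(y^i | x^i) = -sum_i l(y'^i, y^i)
    return np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y     = np.array([1, 0, 1])        # labels y^i
y_hat = np.array([0.8, 0.2, 0.7])  # predictions y'^i
print(log_likelihood(y_hat, y))    # equals minus the summed loss
```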

Maximum likelihood estimation

And so in statistics, there is a principle called the principle of maximum likelihood estimation, which just means: choose the parameters that maximize this quantity (refer to above).

Cost function: because we want to minimize the cost instead of maximizing the likelihood, we get rid of the negative sign. And then finally, for convenience, to make sure our quantities are better scaled, we add an extra $\frac{1}{m}$ scaling factor:

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} l(y'^i, y^i)$$
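Here is a sketch of $J(w,b)$ under the usual logistic regression assumption that $y' = \sigma(w^T x + b)$; the data layout (features in rows, one column per example) and the sample values are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(w, b, X, y):
    # J(w, b) = (1/m) * sum_i l(y'^i, y^i), the average single-example loss
    m = X.shape[1]              # X has shape (n_features, m)
    y_hat = sigmoid(w @ X + b)  # y'^i for every example at once
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m

X = np.array([[0.5, 1.5, -1.0],
              [2.0, -0.3, 0.8]])     # 2 features, m = 3 examples
y = np.array([1, 0, 1])
print(cost(np.zeros(2), 0.0, X, y))  # log(2) ≈ 0.693 when w = 0, b = 0
```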

But to summarize: by minimizing this cost function $J(w,b)$, we are really carrying out maximum likelihood estimation, under the assumption that our training examples are IID (independent and identically distributed).
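A small numerical check of this correspondence (the dataset and parameter values are made up for illustration): the parameter setting with the lower cost $J$ also has the higher likelihood.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Tiny 1-feature dataset, assumed IID; values are illustrative only.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])

def cost_and_likelihood(w, b):
    y_hat = sigmoid(w * x + b)
    J = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    likelihood = np.prod(np.where(y == 1, y_hat, 1 - y_hat))
    return J, likelihood

for w in (0.5, 2.0):
    J, L = cost_and_likelihood(w, 0.0)
    print(f"w={w}: J={J:.4f}, likelihood={L:.4f}")
# w=0.5: J≈0.3937, likelihood≈0.2071
# w=2.0: J≈0.0725, likelihood≈0.7481  -> lower cost, higher likelihood
```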

Reference

https://mooc.study.163.com/learn/2001281002?tid=2001392029#/learn/content?type=detail&id=2001702014
Maximum Likelihood Estimate: https://blog.****.net/zengxiantao1994/article/details/72787849