
Why the log-likelihood is negative

When y = 0, the cost function becomes


Cost(h_\theta(x), y) = -(1-y)\log(1-h_\theta(x))

If h_\theta(x) approaches 0, e.g. h_\theta(x) = 0.05, the result becomes

Cost(h_\theta(x), y) = -(1-y)\log(1-h_\theta(x)) = -1 \cdot \log(1-0.05) = -\log(0.95) \approx 0.05


Notice that \log(0.95) is a tiny negative number (about -0.05): the logarithm of anything between 0 and 1 is negative. That is the reason the log-likelihood cost function needs the negative sign, so that the cost comes out non-negative.
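As a quick numeric check, the following minimal Python snippet evaluates the expression above (using the natural logarithm) and shows how the leading minus sign turns the negative logarithm into a small positive cost.

```python
import math

# Plug in the example above: true label y = 0, prediction h_theta(x) = 0.05.
y = 0
h = 0.05

log_term = math.log(1 - h)        # natural log: log(0.95) ~ -0.0513, a tiny negative number
cost = -(1 - y) * log_term        # the leading minus sign flips it into a small positive cost

print(f"log(1 - h) = {log_term:.4f}")  # -0.0513
print(f"cost       = {cost:.4f}")      #  0.0513
```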

See the following figure to find out why we use the log-likelihood this way.

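The original figures are not preserved here. Assuming they plotted the two branches of the cost, -\log(h_\theta(x)) for y = 1 and -\log(1-h_\theta(x)) for y = 0, the matplotlib sketch below reproduces that kind of plot: the cost is near zero when the prediction agrees with the label and blows up as the prediction approaches the wrong extreme.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed content of the missing figures: the two branches of the cross-entropy cost.
h = np.linspace(0.001, 0.999, 500)   # hypothesis output, kept strictly inside (0, 1)

plt.plot(h, -np.log(h), label=r"$y=1$: $-\log(h_\theta(x))$")
plt.plot(h, -np.log(1 - h), label=r"$y=0$: $-\log(1-h_\theta(x))$")
plt.xlabel(r"$h_\theta(x)$")
plt.ylabel("cost")
plt.legend()
plt.title("Cost is near 0 for a correct prediction and explodes for a confident wrong one")
plt.show()
```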

Conclusion

  • Taking the logarithm of the likelihood simplifies the computation (products of probabilities become sums), and because the logarithm is monotonically increasing, the likelihood and the log-likelihood are maximized by the same parameters
  • Since h_\theta(x) lies in the range (0, 1), its logarithm is always negative; putting a negative sign in front of the log-likelihood reverses this, giving the non-negative cross-entropy form, which we can then minimize as a cost function
  • Because we use the negative log-likelihood as our cost function and we want to find the parameter \theta, what we really need is a way to determine the value of \theta that reaches the lowest point of the cost as quickly as possible. This can be done with quasi-Newton methods, or with gradient descent, which is what we most often use in ML (a minimal sketch follows after the quote below)

    Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The "full" Newton's method requires the Jacobian in order to search for zeros, or the Hessian for finding extrema.
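
To tie these points together, here is a minimal gradient-descent sketch for logistic regression that minimizes the negative log-likelihood (cross-entropy) cost. The toy data, learning rate, and step count are illustrative assumptions, not taken from the original post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(theta, X, y):
    """Negative log-likelihood (cross-entropy) averaged over the training set."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, lr=0.1, steps=1000):
    """Plain batch gradient descent on the cross-entropy cost."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / len(y)   # gradient of the averaged cross-entropy
        theta -= lr * grad              # step downhill toward the minimum
    return theta

# Hypothetical toy data: one feature plus a bias column (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = (x + 0.2 * rng.normal(size=100) > 0).astype(float)
X = np.column_stack([np.ones_like(x), x])

theta = gradient_descent(X, y)
print("theta:", theta, " final cost:", cross_entropy(theta, X, y))
```

Gradient descent only needs the first derivative of the cost, which is why it is cheaper per iteration than the Newton-type methods described in the quote above, which also require (an approximation of) the Hessian.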