Likelihood and log-likelihood
References
https://stats.stackexchange.com/questions/289190/theoretical-motivation-for-using-log-likelihood-vs-likelihood
https://www.mathsisfun.com/algebra/logarithms.html
https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
https://towardsdatascience.com/probability-concepts-explained-maximum-likelihood-estimation-c7b4342fdbb1
https://www.quora.com/What-is-the-advantage-of-using-the-log-likelihood-function-versus-the-likelihood-function-for-maximum-likelihood-estimation
https://machinelearningmastery.com/what-is-maximum-likelihood-estimation-in-machine-learning/
https://stats.stackexchange.com/questions/2641/what-is-the-difference-between-likelihood-and-probability
https://math.stackexchange.com/questions/892832/why-we-consider-log-likelihood-instead-of-likelihood-in-gaussian-distribution
https://towardsdatascience.com/whats-a-logarithm-cca50d031241
https://mathbitsnotebook.com/Algebra2/Exponential/EXExpMoreFunctions.html
https://towardsdatascience.com/log-loss-function-math-explained-5b83cd8d9c83
Maximum likelihood estimation and log-likelihood
- MLE (maximum likelihood estimation)
MLE, like pretty much every statistical approach, assumes that observations are independent or at least conditionally independent. Thus, every likelihood can be written as a product over observations:

$$\mathcal{L}(\theta \mid x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
Here $f(x_i \mid \theta)$ gives the probability of observing $x_i$ conditional on some parameter(s) $\theta$, and we pick $\theta$ to maximize this likelihood. The exact shape of $f$ may be pretty funky, depending on how complicated your model is.
Since the log is a monotonic transformation, the argument that maximizes the log of a function is the same as the one that maximizes the original function. Thus, using the basic property $\log(ab) = \log a + \log b$, the log-likelihood becomes a sum:

$$\log \mathcal{L}(\theta \mid x_1, \dots, x_n) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$
Since each term is a separate summand, this is a lot easier to maximize. The log-likelihood is also typically more concave, since the logarithm is a concave function, which makes Newton-type optimization methods work better. Numerical precision errors are reduced as well, because we add log-probabilities instead of multiplying many numbers close to zero (see the sketch below). And if you're dealing with a simple model, it's a lot easier to take an analytical derivative and find a closed-form solution: taking the derivative of a sum is easy, while taking the derivative of a lot of terms multiplied together gets messy.
Natural log has nice properties when combined with probability models from the exponential family, but you would still want to use a log-likelihood even if your probability model is not in the exponential family of distributions. The fact that it can help cancel some exponential terms is just a bonus.
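Below is a minimal sketch of the numerical point above. It assumes a Bernoulli model with synthetic data (the seed, sample size, and true parameter are arbitrary illustrative choices, not taken from the references): the raw likelihood product underflows to zero, while the log-likelihood stays finite and points to essentially the same estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=10_000)   # 10,000 Bernoulli(0.3) observations

def likelihood(p, x):
    # Product of per-observation probabilities: underflows to 0.0 for large samples
    return np.prod(np.where(x == 1, p, 1 - p))

def log_likelihood(p, x):
    # Sum of per-observation log-probabilities: stays finite
    return np.sum(np.where(x == 1, np.log(p), np.log(1 - p)))

grid = np.linspace(0.01, 0.99, 99)
print(max(likelihood(p, x) for p in grid))                     # 0.0 -- the product underflows
print(grid[np.argmax([log_likelihood(p, x) for p in grid])])   # ~0.3, i.e. close to x.mean()
```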
- Quasi-Newton method
Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton's method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The "full" Newton's method requires the Jacobian in order to search for zeros or the Hessian for finding extrema.
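As an illustration (an assumed example, not part of the quoted description), scipy's BFGS implementation is a quasi-Newton method: it never forms the Hessian, only builds an approximation to it from successive gradients. The sketch below uses it to fit a normal distribution by minimizing the negative log-likelihood; the data and starting values are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=500)   # synthetic sample

def neg_log_likelihood(params, x):
    mu, log_sigma = params                        # optimize log(sigma) so that sigma stays positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

# BFGS approximates the Hessian from gradient differences instead of computing it
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(data,), method="BFGS")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)   # close to the sample mean and sample standard deviation
```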
Difference between likelihood and probability
https://tinyheero.github.io/2016/03/17/prob-distr.html
"Probability mass functions (pmf) are used to describe discrete probability distributions.
Probability density functions (pdf) are used to describe continuous probability distributions."
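A small illustration of the distinction (an assumed example, not from the quoted source): a pmf assigns an actual probability to each discrete outcome, while a pdf assigns a density, and probabilities only come from integrating the density over an interval.

```python
from scipy.stats import binom, norm

print(binom.pmf(7, 10, 0.5))            # P(exactly 7 heads in 10 fair tosses) ~ 0.117, a probability
print(norm.pdf(0.0))                    # standard normal density at 0, ~0.399 -- not a probability
print(norm.cdf(1.0) - norm.cdf(-1.0))   # P(-1 < X < 1) ~ 0.683, obtained by integrating the pdf
```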
The answer depends on whether you are dealing with discrete or continuous random variables, so the two cases are treated separately below.
- Discrete Random Variables
Suppose that you have a stochastic process that takes discrete values (e.g., outcomes of tossing a coin 10 times, number of customers who arrive at a store in 10 minutes, etc.). In such cases, we can calculate the probability of observing a particular set of outcomes by making suitable assumptions about the underlying stochastic process (e.g., probability of coin landing heads is p and that coin tosses are independent).
Denote the observed outcomes by O and the set of parameters that describe the stochastic process as θ. Thus, when we speak of probability we want to calculate P(O|θ). In other words, given specific values for θ, P(O|θ) is the probability that we would observe the outcomes represented by O.
However, when we model a real-life stochastic process, we often do not know θ. We simply observe O, and the goal then is to arrive at an estimate for θ that would be a plausible choice given the observed outcomes O. We know that given a value of θ, the probability of observing O is P(O|θ). Thus, a 'natural' estimation process is to choose the value of θ that maximizes the probability that we would actually observe O. In other words, we find the parameter values θ that maximize the following function:

L(θ|O) = P(O|θ)
L(θ|O) is called the likelihood function. Notice that by definition the likelihood function is conditioned on the observed O and that it is a function of the unknown parameters θ.
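A minimal sketch of this idea (an assumed example; the data and grid are illustrative). For 10 coin tosses with 7 heads observed, P(O|θ) is evaluated over a grid of candidate θ values and read as the likelihood L(θ|O); its maximizer is the MLE.

```python
import numpy as np
from scipy.stats import binom

heads, tosses = 7, 10                           # the observed outcomes O
thetas = np.linspace(0.01, 0.99, 99)            # candidate parameter values
likelihood = binom.pmf(heads, tosses, thetas)   # P(O|theta), viewed as a function of theta

theta_hat = thetas[np.argmax(likelihood)]
print(theta_hat)   # ~0.7 = heads / tosses, the maximum likelihood estimate
```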
- Continuous Random Variables
In the continuous case the situation is similar, with one important difference: we can no longer talk about the probability that we observed O given θ, because in the continuous case P(O|θ) = 0.

Denote the probability density function (pdf) associated with the outcomes O as f(O|θ). Then, in the continuous case we estimate θ given the observed outcomes O by maximizing the function

L(θ|O) = f(O|θ)

In this situation, we cannot technically assert that we are finding the parameter value that maximizes the probability of observing O; rather, we maximize the pdf associated with the observed outcomes O.
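A minimal sketch of the continuous case (an assumed example; the observations and the known scale are illustrative). The likelihood of a candidate mean μ is the product of normal densities at the observed points; its maximizer is the sample mean, even though the density values themselves are not probabilities.

```python
import numpy as np
from scipy.stats import norm

observed = np.array([4.2, 5.1, 4.8, 5.6, 4.9])   # the observed outcomes O
mus = np.linspace(3.0, 7.0, 401)                 # candidate values of the unknown mean

# log L(mu|O) = sum of per-observation log-densities, with sigma assumed known (= 1)
log_lik = np.array([norm.logpdf(observed, loc=mu, scale=1.0).sum() for mu in mus])
print(mus[np.argmax(log_lik)])   # ~4.92, the sample mean
```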