
How to understand and compute confusion-matrix metrics (sensitivity, specificity, AUC, recall, F1) in R

最编程 2024-07-22 16:02:16

Confusion matrix:

                     Actual Positive (P)   Actual Negative (N)
Predicted Positive           TP                    FP
Predicted Negative           FN                    TN

The following metrics can be computed from the table above:

Accuracy = (TP+TN)/(P+N)
Error Rate = 1 – Accuracy = (FP+FN)/(P+N)
False Positive Rate = Fallout = FP/N
True Positive Rate = Recall = Sensitivity = TP/P
False Negative Rate = Miss = FN/P
True Negative Rate = Specificity = TN/N
Positive Predictive Value = Precision = TP/(TP+FP)
Negative Predictive Value = TN/(TN+FN)
Prediction-conditioned Fallout = FP/(TP+FP)
Prediction-conditioned Miss = FN/(TN+FN)
Rate of Positive Predictions = Detection Prevalence = (TP+FP)/(P+N)
Rate of Negative Predictions = (TN+FN)/(P+N)
Prevalence = (TP+FN)/(P+N)
Detection Rate = TP/(P+N)
Balanced Accuracy = (Sensitivity+Specificity)/2
MCC (Matthews correlation coefficient) = (TP*TN - FP*FN)/[(TP+FP)(TP+FN)(TN+FP)(TN+FN)]^(1/2)
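As a compact sketch, the formulas above can be bundled into one helper function (the name cm_metrics and its argument names are illustrative, not from any package):

```r
# Sketch: compute the main confusion-matrix metrics from the four cell counts.
cm_metrics <- function(tp, tn, fp, fn) {
  p <- tp + fn  # actual positives
  n <- tn + fp  # actual negatives
  list(
    accuracy          = (tp + tn) / (p + n),
    sensitivity       = tp / p,          # recall / true positive rate
    specificity       = tn / n,          # true negative rate
    precision         = tp / (tp + fp),  # positive predictive value
    npv               = tn / (tn + fn),  # negative predictive value
    balanced_accuracy = (tp / p + tn / n) / 2,
    mcc = (tp * tn - fp * fn) /
      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  )
}

# Cell counts taken from the worked example below:
cm_metrics(tp = 79, tn = 91, fp = 16, fn = 14)
```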

As an example, take the ROCR.simple dataset that ships with the ROCR package:
require(ROCR)
data(ROCR.simple)
str(ROCR.simple)
## List of 2
##  $ predictions: num [1:200] 0.613 0.364 0.432 0.14 0.385 ...
##  $ labels     : num [1:200] 1 1 0 0 0 1 1 1 1 0 ...

pred.class <- as.integer(ROCR.simple$predictions > 0.5) # classify predictions > 0.5 as 1, the rest as 0
print(cft <- table(pred.class, ROCR.simple$labels)) # rows: predicted class; columns: true labels
          
## pred.class  0  1
##          0 91 14
##          1 16 79
Method 1: manual
tp <- cft[2, 2]  # predicted 1, actual 1
tn <- cft[1, 1]  # predicted 0, actual 0
fp <- cft[2, 1]  # predicted 1, actual 0
fn <- cft[1, 2]  # predicted 0, actual 1
print(accuracy <- (tp + tn)/(tp + tn + fp + fn))
## [1] 0.85
print(sensitivity <- tp/(tp + fn))
## [1] 0.8494624
print(specificity <- tn/(tn + fp))
## [1] 0.8504673
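Precision, F1, and MCC follow the same pattern. The lines below assume tp, tn, fp, fn, and sensitivity from the steps above are still in the workspace:

```r
print(precision <- tp/(tp + fp))
## [1] 0.8315789
print(f1 <- 2 * precision * sensitivity/(precision + sensitivity))
## [1] 0.8404255
print(mcc <- (tp * tn - fp * fn)/sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
```

These match the Precision and F1 values that caret reports below.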
Method 2: caret
require(caret)
confusionMatrix(cft, positive = "1") # using the table built above
## Confusion Matrix and Statistics
## 
##           
## pred.class  0  1
##          0 91 14
##          1 16 79
##                                           
##                Accuracy : 0.85            
##                  95% CI : (0.7928, 0.8965)
##     No Information Rate : 0.535           
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6989          
##                                           
##  Mcnemar's Test P-Value : 0.8551          
##                                           
##             Sensitivity : 0.8495          
##             Specificity : 0.8505          
##          Pos Pred Value : 0.8316          
##          Neg Pred Value : 0.8667          
##              Prevalence : 0.4650          
##          Detection Rate : 0.3950          
##    Detection Prevalence : 0.4750          
##       Balanced Accuracy : 0.8500          
##                                           
##        'Positive' Class : 1               
confusionMatrix(as.factor(pred.class), as.factor(ROCR.simple$labels),
                positive = "1", mode = "everything")
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 91 14
##          1 16 79
##                                           
##                Accuracy : 0.85            
##                  95% CI : (0.7928, 0.8965)
##     No Information Rate : 0.535           
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6989          
##                                           
##  Mcnemar's Test P-Value : 0.8551          
##                                           
##             Sensitivity : 0.8495          
##             Specificity : 0.8505          
##          Pos Pred Value : 0.8316          
##          Neg Pred Value : 0.8667          
##               Precision : 0.8316          
##                  Recall : 0.8495          
##                      F1 : 0.8404          
##              Prevalence : 0.4650          
##          Detection Rate : 0.3950          
##    Detection Prevalence : 0.4750          
##       Balanced Accuracy : 0.8500          
##                                           
##        'Positive' Class : 1               
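The AUC mentioned in the title is not a function of the 0.5 cutoff; it is computed from the raw prediction scores. A sketch using ROCR's prediction()/performance() interface:

```r
require(ROCR)
data(ROCR.simple)

# Build a prediction object from the continuous scores and true labels
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)

# Area under the ROC curve
auc <- performance(pred, measure = "auc")@y.values[[1]]
print(auc)

# ROC curve: true positive rate vs. false positive rate
perf <- performance(pred, "tpr", "fpr")
plot(perf)
```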
