PASCAL VOC 2012 and SBD (the augmented dataset): A Summary

最编程 2024-07-29 20:10:57

...

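While reading the DeepLab paper, I noticed that it first introduces the PASCAL VOC 2012 dataset and then says that training uses an augmented version of it. The paper puts it this way: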

The proposed models are evaluated on the PASCAL VOC 2012 semantic segmentation benchmark [1] which contains 20 foreground object classes and one background class. The original dataset contains 1,464 (train), 1,449 (val), and 1,456 (test) pixel-level annotated images. We augment the dataset by the extra annotations provided by [76], resulting in 10,582 (trainaug) training images. The performance is measured in terms of pixel intersection-over-union averaged across the 21 classes (mIOU).
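To make the metric in the quote concrete, here is a minimal pure-Python sketch of mean IoU over flattened label lists. The function name `mean_iou` is hypothetical. Two things are assumptions of this sketch rather than statements from the paper: pixels labelled 255 (the VOC void label) are skipped, and classes that appear in neither the prediction nor the ground truth are left out of the average. The benchmark itself accumulates the counts over the whole evaluation set, not per image.

```python
def mean_iou(pred, target, num_classes=21, ignore_label=255):
    """Mean intersection-over-union for two flat sequences of class labels."""
    inter = [0] * num_classes  # pixels where prediction and ground truth agree on class c
    union = [0] * num_classes  # pixels where either prediction or ground truth is class c
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # assumption: void pixels do not count
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1
            union[t] += 1
    # assumption: classes absent from both pred and target are skipped
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and one void pixel.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 255], num_classes=2)
print(score)
```

In the toy example each class has one correctly labelled pixel out of a union of two, so both per-class IoUs are 0.5.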

Next, let's look at the difference between the original dataset and the augmented dataset.

1. PASCAL VOC 2012 segmentation

The official VOC 2012 site states the numbers clearly: 1,464 (train), 1,449 (val), and 1,456 (test).
[Figure: detailed distribution of the VOC 2012 segmentation data]

2. SBD dataset

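The so-called augmented VOC dataset is also known as SBD, with 8,498 (train) and 2,857 (val) images. It comes from the paper "Semantic Contours from Inverse Detectors": http://home.bharathh.info/pubs/pdfs/BharathICCV2011.pdf
The authors also provide a website, http://home.bharathh.info/pubs/codes/SBD/download.html, which describes the dataset in more detail.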

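Points to note:

The SBD images come from VOC 2011 (11,355 images). According to this comparison, VOC 2012 and VOC 2011 use the same images and differ only in the number of annotations.
The SBD train and val sets are different from the VOC ones. The authors say so on the website linked above: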

Please note that the train and val splits included with this dataset are different from the splits in the PASCAL VOC dataset. In particular some “train” images might be part of VOC 2012 val.

In other words, the SBD training set contains some images from the VOC validation set.
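This overlap can be checked by intersecting the image-ID lists of the two splits. The sketch below is only an illustration: the helper name `split_overlap` is hypothetical, and the IDs are made up, not taken from the real split files.

```python
def split_overlap(ids_a, ids_b):
    """Return the sorted image IDs that appear in both splits."""
    return sorted(set(ids_a) & set(ids_b))


# Made-up IDs for illustration; the real lists come from the split .txt files.
sbd_train_ids = ["img_0001", "img_0002", "img_0003"]
voc_val_ids = ["img_0002", "img_0009"]

leaked = split_overlap(sbd_train_ids, voc_val_ids)
print(leaked)  # ['img_0002']
```

Any ID returned here is an image that would leak from the VOC validation set into training if the SBD train split were used unchanged.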

3. The 10,582 trainaug set

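Where does the 10,582-image trainaug set used in DeepLab come from?
See this link: https://www.sun11.me/blog/2018/how-to-use-10582-trainaug-images-on-DeeplabV3-code/
That blog post gives a complete walkthrough of training DeepLabv3 on the 10,582 trainaug images in TensorFlow: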

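Download VOC 2012 and SegmentationClassAug; the latter contains the extra annotations provided by SBD.
Save the list of trainaug file names from the linked address into a new file named trainaug.txt.
Open the file in VS Code and jump to the last line: it does indeed contain 10,582 entries.
Finally, run the script provided by the author of the post linked above to produce the tfrecord files for the 10,582 trainaug images, which can then be used for training in TensorFlow.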
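Instead of scrolling to the last line in an editor, the entry count can be checked with a few lines of Python. This is a minimal sketch, assuming trainaug.txt holds one image ID per line; the helper names `count_image_ids` and `count_file` are hypothetical.

```python
def count_image_ids(lines):
    """Return (total, unique) counts of non-blank entries in an iterable of lines."""
    ids = [line.strip() for line in lines if line.strip()]
    return len(ids), len(set(ids))


def count_file(path):
    """Count the entries in a split file such as trainaug.txt (path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return count_image_ids(f)


# Small in-memory example; a real check would call count_file("trainaug.txt").
total, unique = count_image_ids(["img_0001\n", "img_0002\n", "\n", "img_0002\n"])
print(total, unique)  # 3 2
```

For a correctly assembled list, both numbers should be 10582; a smaller unique count would point to duplicated entries.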

4. How is 10,582 obtained?

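Finally, let's see where the number 10,582 comes from. First, some counts.
VOC annotations:
voc_trainval: 2913
voc_train: 1464
voc_val: 1449
SBD annotations:
sbd_train: 8498
sbd_val: 2857
Since we have the file-name .txt lists for all of the splits above, comparing the image file names across them shows:
sbd_train (8498) = images also in voc_train (1133) + images also in voc_val (545) + images genuinely new in sbd_train (6820)
sbd_val (2857) = images also in voc_train (1) + images also in voc_val (558) + images genuinely new in sbd_val (2298)
So the largest possible augmented dataset is:
voc_train (1464) + voc_val (1449) + new images from sbd_train (6820) + new images from sbd_val (2298) = 12031 annotated images
Keeping the original voc_val (1449) as the validation set, the remaining 12031 - voc_val (1449) = 10582 images can all be used for training; this is trainaug (10582).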
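The same derivation can be written as set operations on the image-ID lists, followed by an arithmetic check of the counts quoted above. This is a sketch under assumptions: `build_trainaug` is a hypothetical helper, and the single-letter IDs in the toy example are made up.

```python
def build_trainaug(voc_train, voc_val, sbd_train, sbd_val):
    """All annotated images from VOC train and both SBD splits, minus VOC val."""
    return (set(voc_train) | set(sbd_train) | set(sbd_val)) - set(voc_val)


# Toy example with made-up IDs: "c" and "d" belong to VOC val and are excluded.
toy = build_trainaug(
    voc_train=["a", "b"],
    voc_val=["c", "d"],
    sbd_train=["a", "c", "e"],
    sbd_val=["d", "f"],
)
print(sorted(toy))  # ['a', 'b', 'e', 'f']

# Arithmetic check using the counts reported in the text.
sbd_train_new = 8498 - 1133 - 545  # sbd_train images in neither VOC split
sbd_val_new = 2857 - 1 - 558       # sbd_val images in neither VOC split
total = 1464 + 1449 + sbd_train_new + sbd_val_new
trainaug = total - 1449            # hold out voc_val for validation
print(sbd_train_new, sbd_val_new, total, trainaug)  # 6820 2298 12031 10582
```

Subtracting voc_val last is what keeps the validation set clean: any SBD image that is also a VOC val image is removed from training, whichever SBD split it came from.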

