All baselines were trained on 8 GPUs with a total batch size of 8 (1 image per GPU), using the linear scaling rule to scale the learning rate.
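As a rough illustration of the linear scaling rule mentioned above, the learning rate is scaled in proportion to the total batch size. The sketch below is not taken from this repository: the function name `scale_learning_rate` and the reference setting (learning rate 0.02 at a total batch size of 16) are assumptions chosen for the example.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, batch_size: int) -> float:
    """Linear scaling rule: the learning rate grows proportionally with batch size."""
    if base_batch_size <= 0 or batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * batch_size / base_batch_size


# Assumed reference setting (hypothetical, for illustration only).
REFERENCE_LR = 0.02
REFERENCE_BATCH_SIZE = 16

# Setup described in the notes: 8 GPUs, 1 image per GPU.
num_gpus = 8
images_per_gpu = 1
total_batch_size = num_gpus * images_per_gpu

lr = scale_learning_rate(REFERENCE_LR, REFERENCE_BATCH_SIZE, total_batch_size)
print(f"total batch size: {total_batch_size}, scaled learning rate: {lr}")
```

Under these assumed reference values, halving the batch size from 16 to 8 halves the learning rate. If your reference setting differs, only the two constants need to change.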
All models were trained on cityscapes_train, and tested on cityscapes_val.
The 1x training schedule means 64 epochs, which corresponds to slightly fewer than the 24k iterations of the original schedule in the Mask R-CNN paper.
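The relation between epochs and iterations can be checked with a short calculation. This is a sketch under one stated assumption: the training split contains 2975 images (the size of the standard Cityscapes fine-annotation train split; the effective count may differ slightly if images without instances are filtered out). The helper name `total_iterations` is hypothetical.

```python
import math


def total_iterations(num_images: int, batch_size: int, epochs: int) -> int:
    """Number of optimizer steps, counting a final partial batch in each epoch."""
    iters_per_epoch = math.ceil(num_images / batch_size)
    return iters_per_epoch * epochs


# Assumption: 2975 training images (standard Cityscapes fine train split).
NUM_TRAIN_IMAGES = 2975
BATCH_SIZE = 8   # 8 GPUs x 1 image per GPU, as stated in the notes
EPOCHS = 64      # the 1x schedule

iters = total_iterations(NUM_TRAIN_IMAGES, BATCH_SIZE, EPOCHS)
print(f"{EPOCHS} epochs at batch size {BATCH_SIZE}: {iters} iterations")
```

With the assumed 2975 images this gives 372 iterations per epoch and 23808 iterations in total, which is slightly below 24k.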
COCO pre-trained weights are used to initialize the models.
A conversion script is provided to convert Cityscapes into COCO format. Please refer to install.md for details.
CityscapesDataset implements three evaluation methods. bbox and segm are the standard COCO bbox/mask AP. cityscapes is the official Cityscapes evaluation, which may report slightly higher numbers than the COCO metrics.