DETR: End-to-End Object Detection with Transformers
PyTorch training code and pretrained models for DETR (DEtection TRansformer). We replace the full complex hand-crafted object detection pipeline with a Transformer, and match Faster R-CNN with a ResNet-50, obtaining 42 AP on COCO using half the computation power (FLOPs) and the same number of parameters. Inference in 50 lines of PyTorch.
Usage - Object detection
There are no extra compiled components in DETR and package dependencies are minimal, so the code is very simple to use. We provide instructions how to install dependencies via conda. First, clone the repository locally:
git clone -b DETR https://gitee.com/eason-hw/ModelZoo-PyTorch.git
Install pycocotools (for evaluation on COCO) and scipy (for training):
pip3 install -r requirements.txt
Note: pillow recommends installing a newer version. If the corresponding torchvision version cannot be installed directly, you can use the source code to install the corresponding version. The source code reference link: Suggestion the pillow is 9.1.0 and the torchvision is 0.6.0 That's it, should be good to train and evaluate detection models.
Data preparation
Download and extract COCO 2017 train and val images with annotations from http://cocodataset.org. We expect the directory structure to be the following:
opt/npu/coco/
annotations/ # annotation json files
train2017/ # train images
val2017/ # val images
Training
To train baseline DETR on a single node with 8 gpus for 300 epochs run:
# env
cd ModelZoo-PyTorch/PyTorch/contrib/cv/detection/DETR
dos2unix ./test/*.sh
# training 1p performance
bash test/train_performance_1p.sh --data_path=YourDataPath
# training 8p performance
bash test/train_performance_8p.sh --data_path=YourDataPat
# training 8p accuracy
bash test/train_full_8p.sh --data_path=YourDataPath
We train DETR with AdamW setting learning rate in the transformer to 1e-4 and 1e-5 in the backbone. Horizontal flips, scales and crops are used for augmentation. Images are rescaled to have min size 800 and max size 1333. The transformer is trained with dropout of 0.1, and the whole model is trained with grad clip of 0.1.
Performance (opt_level = O0)
DEVICE | Epochs/steps | FPS | LOSS
GPU(1P) | 1000 steps | 12.7 |
NPU(1P) | 1000 steps | 0.2 |
GPU(8P) | 2 Epochs | 73.5 | 24.4104
NPU(8P) | 2 Epochs | 0.489 | 24.9557