LV-ViT

LV-ViT is a Transformer-based model for image classification, introduced in "All Tokens Matter: Token Labeling for Training Better Vision Transformers". See the paper (arxiv) for details.

Requirements

torch>=1.4.0
torchvision>=0.5.0
pyyaml
scipy
timm==0.4.5

Note: Installing a newer version of pillow is recommended. If the matching torchvision version cannot be installed directly, you can build and install it from source. Source code reference link:
Suggested versions: pillow 9.1.0 and torchvision 0.6.0.

Data preparation: ImageNet with the following folder structure

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
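Before starting a long run, it can help to confirm that the dataset actually follows this layout. The following is a minimal sketch, not part of the repository: the function names `summarize_imagenet_split` and `check_imagenet_layout` are hypothetical, and the check only assumes one sub-directory per class containing `.JPEG`/`.jpg` files, as in the tree above.

```python
from pathlib import Path


def summarize_imagenet_split(split_dir):
    """Return (num_classes, num_images) for a split laid out as <split>/<class>/<image>."""
    split = Path(split_dir)
    # Every sub-directory of the split is treated as one class folder.
    class_dirs = sorted(p for p in split.iterdir() if p.is_dir())
    num_images = sum(
        1
        for class_dir in class_dirs
        for f in class_dir.iterdir()
        if f.suffix.lower() in {".jpeg", ".jpg"}
    )
    return len(class_dirs), num_images


def check_imagenet_layout(root):
    """Check that root contains train/ and val/ and summarize each split."""
    root = Path(root)
    report = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        report[split] = summarize_imagenet_split(split_dir)
    return report
```

For example, `check_imagenet_layout('/opt/npu/imagenet/')` returns a dictionary mapping `train` and `val` to their class and image counts, which can be compared against the counts you expect for your copy of ImageNet.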

Label generation

To generate token label data for training:

python3 generate_label.py /path/to/imagenet/train /path/to/save/label_top5_train_nfnet --model dm_nfnet_f6 --pretrained --img-size 576 -b 32 --crop-pct 1.0

Pre-generated token label data is also provided on Baidu Yun (password: y6j2).

Model Train

Train the LV-ViT-S:

1: Train on 1 NPU
bash test/train_full_1p.sh '/Path_to_Imagenet' 'Path_to_Token-label-data'
Example: bash test/train_full_1p.sh '/opt/npu/imagenet/' './label_top5_train_nfnet'

2: Train on 8 NPUs
bash test/train_full_8p.sh '/Path_to_Imagenet' 'Path_to_Token-label-data'
Example: bash test/train_full_8p.sh '/opt/npu/imagenet/' './label_top5_train_nfnet'

Get model performance

1: Test performance on 1 NPU
bash test/train_performance_1p.sh '/Path_to_Imagenet/' '/Path_to_Token-label-data/'
Example: bash test/train_performance_1p.sh '/opt/npu/imagenet/' './label_top5_train_nfnet'

2: Test performance on 8 NPUs
bash test/train_performance_8p.sh '/Path_to_Imagenet/' '/Path_to_Token-label-data/'
Example: bash test/train_performance_8p.sh '/opt/npu/imagenet/' './label_top5_train_nfnet'

Validation

Pass your ImageNet validation set path as the first argument and the checkpoint path as the second:

bash test/train_eval_8p.sh '/PATHTO/imagenet/val' '/PATHTO/LVVIT/eval_pth'
Example: bash test/train_eval_8p.sh '/opt/npu/imagenet/val' '/trained/model.pth.tar'

Fine-tuning

To fine-tune the pre-trained LV-ViT-S:

bash test/train_finetune_1p.sh '/Path_to_Imagenet/' '/Path_to_Token-label-data/' '/Path_to_Trained_pth/'
Example: bash test/train_finetune_1p.sh '/opt/npu/imagenet/' './label_top5_train_nfnet' './finetune/lvvit_s-26m-224-83.3.pth.tar'

About Train FPS

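Example log: Train: 257 [ 150/625 ( 24%)]  Loss:  9.841134 (10.1421)  Time: 1.941s, 1054.88/s  (2.048s, 1000.09/s)  LR: 4.609e-04  Data: 0.029 (0.062)
In the log above, the FPS is the per-second rate in the Time field: 1054.88 images/s for the current step. The value in parentheses (1000.09/s) is the running average.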
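To read the FPS from many log lines rather than by eye, the Time field can be extracted with a regular expression. This is a minimal sketch that assumes every training log line follows the format of the example above; the names `TIME_PATTERN` and `parse_fps` are illustrative and not part of the repository.

```python
import re

# Matches the "Time: <step time>s, <fps>/s  (<avg step time>s, <avg fps>/s)" fragment.
TIME_PATTERN = re.compile(
    r"Time:\s*([\d.]+)s,\s*([\d.]+)/s\s*\(\s*([\d.]+)s,\s*([\d.]+)/s\)"
)


def parse_fps(line):
    """Return (current_fps, average_fps) from one log line, or None if no Time field is found."""
    match = TIME_PATTERN.search(line)
    if match is None:
        return None
    return float(match.group(2)), float(match.group(4))


log_line = (
    "Train: 257 [ 150/625 ( 24%)]  Loss:  9.841134 (10.1421)  "
    "Time: 1.941s, 1054.88/s  (2.048s, 1000.09/s)  LR: 4.609e-04  Data: 0.029 (0.062)"
)
print(parse_fps(log_line))  # (1054.88, 1000.09)
```

Lines without a Time field return `None`, so the function can be applied to a whole log file and the non-matching lines filtered out.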

Public Network Address Statement

For the public network addresses referenced in the code, see public_address_statement.md