Applying PoolFormer to Object Detection

For details see MetaFormer is Actually What You Need for Vision (CVPR 2022 Oral).

Note

Please note that we simply follow the hyper-parameters of PVT, which may not be optimal for PoolFormer. Feel free to tune the hyper-parameters to get better performance.

Environment Setup

Install MMDetection v2.19.0 from source code,

or

pip install mmdet==2.19.0 --user
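Pinning the exact release matters because the configs in this directory were written against the MMDetection v2.19.0 API. As a minimal sketch (the helper below is ours, not part of MMDetection), you could verify the pin like this:

```python
# Hypothetical helper (not part of MMDetection) that checks whether the
# installed version matches the release this guide was tested against.
def matches_pinned_version(installed: str, expected: str = "2.19.0") -> bool:
    """Compare dotted version strings component by component."""
    return tuple(int(p) for p in installed.split(".")) == \
           tuple(int(p) for p in expected.split("."))

# After `import mmdet`, pass mmdet.__version__ as `installed`.
print(matches_pinned_version("2.19.0"))  # True
print(matches_pinned_version("2.20.0"))  # False
```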

Apex (optional):

git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cpp_ext --cuda_ext --user

If you would like to disable Apex, change the runner type to EpochBasedRunner and comment out the following code block in the configuration files:

fp16 = None
optimizer_config = dict(
    type="DistOptimizerHook",
    update_interval=1,
    grad_clip=None,
    coalesce=True,
    bucket_size_mb=-1,
    use_fp16=True,
)
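For reference, a sketch of how the edited configuration might look with Apex disabled (hedged: the field values below follow common MMDetection defaults for a 1x schedule; adapt them to your actual config file):

```python
# Sketch of a config with Apex disabled: the DistOptimizerHook block is
# commented out, the runner is switched to EpochBasedRunner, and the plain
# MMCV optimizer hook is used instead. max_epochs=12 matches the 1x
# schedule; adjust as needed.
runner = dict(type="EpochBasedRunner", max_epochs=12)
fp16 = None
optimizer_config = dict(grad_clip=None)
```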

Note: We write the PoolFormer backbone code for detection and segmentation in the same file, which requires both MMDetection v2.19.0 and MMSegmentation v0.19.0 to be installed. Please install MMSegmentation as well, or modify the backbone code.

Dockerfile_mmdetseg is the Dockerfile we use to set up the environment for detection and segmentation; you can refer to it as well.

Data preparation

Prepare COCO according to the guidelines in MMDetection v2.19.0.
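MMDetection's COCO configs assume the default `data_root` of `data/coco/` containing `annotations/`, `train2017/`, and `val2017/`. A small hypothetical sanity check (our helper, not part of MMDetection) that the layout is in place:

```python
# Hypothetical check (not part of MMDetection) for the default COCO
# layout that the detection configs expect under data/coco/.
from pathlib import Path

def missing_coco_dirs(root: str = "data/coco") -> list:
    """Return the expected COCO sub-directories that are absent."""
    expected = ["annotations", "train2017", "val2017"]
    return [d for d in expected if not (Path(root) / d).is_dir()]

# An empty list means the layout looks complete.
print(missing_coco_dirs())
```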

Results and models on COCO

| Method | Backbone | Pretrain | Lr schd | Aug | box AP | mask AP | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | PoolFormer-S12 | ImageNet-1K | 1x | No | 36.2 | - | config | log & model |
| RetinaNet | PoolFormer-S24 | ImageNet-1K | 1x | No | 38.9 | - | config | log & model |
| RetinaNet | PoolFormer-S36 | ImageNet-1K | 1x | No | 39.5 | - | config | log & model |
| Mask R-CNN | PoolFormer-S12 | ImageNet-1K | 1x | No | 37.3 | 34.6 | config | log & model |
| Mask R-CNN | PoolFormer-S24 | ImageNet-1K | 1x | No | 40.1 | 37.0 | config | log & model |
| Mask R-CNN | PoolFormer-S36 | ImageNet-1K | 1x | No | 41.0 | 37.7 | config | log & model |

All the models can also be downloaded from Baidu Yun (password: esac).

Evaluation

To evaluate PoolFormer-S12 + RetinaNet on COCO val2017 on a single node with 8 GPUs, run:

FORK_LAST3=1 dist_test.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox

To evaluate PoolFormer-S12 + Mask R-CNN on COCO val2017, run:

dist_test.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py /path/to/checkpoint_file 8 --out results.pkl --eval bbox segm
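The `--out results.pkl` file is a pickled list with one entry per validation image (for Mask R-CNN, each entry pairs bbox and mask results). A hedged sketch for inspecting it (the helper name is ours, not from the repo):

```python
# Hedged sketch (not from the repo): load the results.pkl written by
# dist_test.sh and report how many images it covers. MMDetection saves
# one entry per image; for Mask R-CNN each entry holds (bbox, segm) results.
import pickle

def count_result_images(path: str = "results.pkl") -> int:
    """Return the number of per-image entries in a results pickle."""
    with open(path, "rb") as f:
        results = pickle.load(f)
    return len(results)
```

COCO val2017 contains 5,000 images, so a complete evaluation should yield 5,000 entries.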

Training

To train PoolFormer-S12 + RetinaNet on COCO train2017 on a single node with 8 GPUs for 12 epochs, run:

FORK_LAST3=1 dist_train.sh configs/retinanet_poolformer_s12_fpn_1x_coco.py 8

To train PoolFormer-S12 + Mask R-CNN on COCO train2017:

dist_train.sh configs/mask_rcnn_poolformer_s12_fpn_1x_coco.py 8

Bibtex

@article{yu2021metaformer,
  title={MetaFormer is Actually What You Need for Vision},
  author={Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2111.11418},
  year={2021}
}

Acknowledgment

Our implementation is mainly based on the following codebases. We sincerely thank the authors for their wonderful work.

mmdetection, PVT detection.