[Shunta Saito, chainer] @mingxiao.huang Are you sure you applied all of the inference-time techniques described in the original ResNet50 paper? Quoting the paper:
In testing, for comparison studies we adopt the standard 10-crop testing [21]. For best results, we adopt the fully convolutional form as in [41, 13], and average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}).
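For reference, here is a minimal NumPy sketch of what 10-crop testing is usually taken to mean: the four corner crops and the center crop, each with its horizontal flip, with the scores averaged over the 10 crops. The names `ten_crop`, `predict_ten_crop`, and `predict_fn` are hypothetical and not from Chainer or the paper. It assumes the image is already resized and preprocessed into a (C, H, W) array, and it does not cover the multi-scale fully convolutional evaluation.

```python
import numpy as np


def ten_crop(image, crop_size=224):
    """Return 10 crops of a (C, H, W) image as a (10, C, crop, crop) array.

    Crops 0-4 are top-left, top-right, bottom-left, bottom-right, center;
    crops 5-9 are their horizontal flips in the same order.
    """
    _, h, w = image.shape
    if h < crop_size or w < crop_size:
        raise ValueError("image is smaller than the crop size")
    tops = [0, 0, h - crop_size, h - crop_size, (h - crop_size) // 2]
    lefts = [0, w - crop_size, 0, w - crop_size, (w - crop_size) // 2]
    crops = [image[:, t:t + crop_size, l:l + crop_size]
             for t, l in zip(tops, lefts)]
    # Flip along the width axis to get the mirrored crops.
    flipped = [c[:, :, ::-1] for c in crops]
    return np.stack(crops + flipped)


def predict_ten_crop(predict_fn, image, crop_size=224):
    """Average the class scores over the 10 crops.

    predict_fn is an assumed callable mapping a (N, C, H, W) batch
    to an (N, num_classes) array of scores.
    """
    scores = predict_fn(ten_crop(image, crop_size))
    return scores.mean(axis=0)
```

If the reported accuracy was measured with a single center crop instead, a gap relative to the paper's 10-crop numbers would be expected.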