    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] Yeah, @mingxiao.huang Have you already read this paper? https://arxiv.org/abs/1711.04325
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul Yes, we have read it. However, to make the problem simpler, we are trying the single-GPU case now. It would be nice if you have such a training script for a single node and could send it to us. Thanks.
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] What is the global minibatch size? I think the techniques written in the paper are for "extremely large minibatch" training, as you can see from the title. If the global minibatch size is not so large because you are using just a single GPU, the techniques required to achieve good results will be different. What do you think @kfukuda?
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul Yes, that is also my concern. For a single node, we used batchsize=128. Since the techniques required to achieve good results will be different for single-node training, it would be nice if you have such a training script for a single node and could send it to us. Thanks.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul Any more comments? Do you have such a training script for a single node?
    Sameroom
    @sameroom-bot

    [Shunta Saito, chainer] @mingxiao.huang Could you answer his question?

    Did you achieve the SOTA with the original ResNet-50 (with batchsize 64)?

    [Shunta Saito, chainer] Regarding a training script, I think it's OK to use the normal one for ResNet50 training on ImageNet-1K with a minibatch size of 128. You can find the example code in the Chainer repository. Have you already tried that?
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul The example code in the Chainer repository is rather simple. Actually, we would like to reproduce your result with the detailed hyperparameters you used, since we could not get SOTA accuracy for ResNet50 on Chainer at all; with the same parameters, however, we can achieve SOTA on Caffe.
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] @mingxiao.huang What we intended to say was that you need to use an "extremely large batch size" to reproduce the result in the paper. If you want to achieve over 74% accuracy using only 1 GPU, you need a different technique for that setting. I think we have never tested such a small environment with only 1 GPU, because it takes too much time to finish 90 epochs. Is this correct? @kfukuda
    Sameroom
    @sameroom-bot
    [Keisuke Fukuda, chainer] Yes, even the original ResNet author (Kaiming He) used 8 GPUs (if I remember correctly). The techniques written in the paper are for extremely large minibatch sizes (>8K), and it does not make much sense to use them (RMSProp warmup etc.) for such a small batch size.
    [Keisuke Fukuda, chainer] Before trying our techniques, I recommend trying to reproduce Facebook’s result with a 4K batch size.
    [Keisuke Fukuda, chainer] Techniques for large batch sizes are not necessarily good for small batch sizes.
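For context, the Facebook paper referred to here (Goyal et al., "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour") scales the base learning rate linearly with the batch size and ramps it up gradually over the first few epochs. Below is a minimal sketch of that schedule, assuming the paper's reference values (LR 0.1 at batch size 256, 5 warmup epochs, step decay at epochs 30/60/80); it is an illustration, not code from Chainer or Caffe.

```python
# Sketch of the linear-scaling rule with gradual warmup from Goyal et al.
# Reference values (LR 0.1 for batch 256, 5 warmup epochs, decay at 30/60/80)
# follow that paper; everything else is illustrative.

def learning_rate(epoch, batch_size, base_lr=0.1, ref_batch=256, warmup_epochs=5):
    """Return the learning rate for a given (possibly fractional) epoch."""
    target_lr = base_lr * batch_size / ref_batch    # linear scaling rule
    if epoch < warmup_epochs:
        # Ramp linearly from the small-batch LR up to the scaled LR.
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    # Step decay: divide by 10 at epochs 30, 60, 80.
    for milestone, factor in [(80, 1e-3), (60, 1e-2), (30, 1e-1)]:
        if epoch >= milestone:
            return target_lr * factor
    return target_lr

# Example: batch size 4096 -> peak LR 1.6 after warmup, 0.16 after epoch 30.
print(learning_rate(5, 4096))    # 1.6
print(learning_rate(35, 4096))   # 0.16
```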
    Sameroom
    @sameroom-bot
    [Keisuke Fukuda, chainer] So, it depends on what you are trying to do @mingxiao.huang. If you want to reproduce our results, first you need to prepare >512 GPUs and try a 32K batch size. Trying to reproduce our results with a small batch size makes no sense. And, again, I recommend starting from Facebook’s result.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul @kfukuda Thanks. Theoretically speaking, I guess we can try one GPU with minibatch size 128 and the same learning rate as yours, but without the "RMSProp warmup". Correct?
    Sameroom
    @sameroom-bot
    [Keisuke Fukuda, chainer] The learning rate is just one of the parameters. What is your goal?
    [Keisuke Fukuda, chainer] (1) To achieve the SOTA with BS=128?
    [Keisuke Fukuda, chainer] (2) Or, to reproduce our “extremely large minibatch” paper?
    Sameroom
    @sameroom-bot
    [Keisuke Fukuda, chainer] If (1), forget our “extremely large…” paper and go back to the original ResNet hyperparameters. If you’ve done it on Caffe, you should be able to port your code to Chainer (but very carefully). It’s a totally different piece of work.
    [Keisuke Fukuda, chainer] If (2), you need to implement everything in the paper, including RMSProp warmup and the rest, and you should try a batch size of 32K. You can simulate a 32K batch size with a small number of GPUs by accumulating gradients over several iterations and skipping the parameter update until then.
    [Keisuke Fukuda, chainer] Again, I recommend starting from Facebook’s paper with a 4K batch size, because our paper is based on it.
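A minimal sketch of the gradient-accumulation idea mentioned above, written with standard Chainer calls; the stand-in model, the data source, and the accumulation factor are placeholders, not the setup used for the paper.

```python
# Sketch: simulate a large effective batch by accumulating gradients over
# several small forward/backward passes before a single optimizer update.
import chainer.links as L
from chainer import optimizers

# Stand-in classifier; in the real setting this would be a ResNet-50 model.
model = L.Classifier(L.Linear(None, 1000))
optimizer = optimizers.MomentumSGD(lr=0.1)
optimizer.setup(model)

accum_steps = 256   # 256 x 128 = effective batch of 32768 (illustrative)

def accumulated_update(batches):
    """One optimizer step over `accum_steps` minibatches of (x, t) arrays."""
    model.cleargrads()                      # clear once per *effective* batch
    for _, (x, t) in zip(range(accum_steps), batches):
        loss = model(x, t) / accum_steps    # average over the effective batch
        loss.backward()                     # gradients accumulate in-place
    optimizer.update()                      # single update for the whole batch
```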
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] And, could you share the code you are using now for ResNet50 training on ImageNet-1K with 1 GPU? I'd like to try the code to confirm that it cannot achieve > 74% accuracy on the validation dataset.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul please kindly check https://github.com/mingxiaoh/chainer-training-script for the code we used. Thanks.
    [Shunta Saito, chainer] Thanks! I'll take a look into it.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @nishino @mitmul hello, could you please help merge chainer/chainer#4933 and chainer/chainer#5033? They have been pending for a long time. Thanks.
    Sameroom
    @sameroom-bot
    [nishino, chainer] I'm sorry for the inconvenience. @kmaehashi Could you take a look?
    Sameroom
    @sameroom-bot
    [Kenichi Maehashi, chainer] Sorry for the delay. I’ll proceed to merge shortly.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @nishino @kmaehashi hello, could you please help merge chainer/chainer#4933 and chainer/chainer#5033? They have been pending for a long time. Thanks.
    [mingxiao huang, chainer] @mitmul hello, how is your ResNet50 training test going?
    Sameroom
    @sameroom-bot
    [nishino, chainer] @mingxiao.huang Sorry, @kmaehashi is on leave this week. We will arrange another reviewer soon.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] thanks
    Sameroom
    @sameroom-bot
    [Kenichi Maehashi, chainer] Could you check my comment?
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] hello, @kmaehashi @nishino when will chainer/chainer#5033 be merged?
    Sameroom
    @sameroom-bot
    [Kenichi Maehashi, chainer] Sorry for the delay, I’ll proceed to merge.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul @nishino hello, how is the ResNet50 GPU training going on your side?
    [Shunta Saito, chainer] Sorry, I was on a business trip and too busy with another project preparing for the CEATEC event next week, so I couldn't find enough time to work on it...
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] @mingxiao.huang Did you perform multi-scale cropping at *inference time*?
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] @mingxiao.huang Could you give us the evaluation script? I could find only the training scripts in this repository: https://github.com/mingxiaoh/chainer-training-script
    Sameroom
    @sameroom-bot

    [Shunta Saito, chainer] @mingxiao.huang Did you really apply all of the inference-time techniques described in the original ResNet50 paper?

    In testing, for comparison studies we adopt the standard 10-crop testing [21]. For best results, we adopt the fully convolutional form as in [41, 13], and average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}).
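For reference, the standard 10-crop procedure quoted above averages scores over the four corner crops plus the center crop and their horizontal flips. A rough sketch of that step, where `predict` and the 224-pixel crop size are assumptions for illustration:

```python
# Sketch of standard 10-crop testing: four corner crops + the center crop,
# each with its horizontal flip, scores averaged. This mirrors the procedure
# quoted above, not any particular codebase.
import numpy as np

def ten_crop_scores(img, predict, crop=224):
    """img: HxWxC array (H, W >= crop); predict: batch of crops -> class scores."""
    h, w = img.shape[:2]
    offsets = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),
               ((h - crop) // 2, (w - crop) // 2)]
    crops = []
    for top, left in offsets:
        patch = img[top:top + crop, left:left + crop]
        crops.append(patch)
        crops.append(patch[:, ::-1])           # horizontal flip
    scores = predict(np.stack(crops))          # shape: (10, num_classes)
    return scores.mean(axis=0)
```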

    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul For your first question: in that script you can see that the extensions.Evaluator function is used for evaluation.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul For your second question: as you can see in the scripts we sent you, we used raw data images and adopted the same data augmentation policy as Caffe (https://github.com/intel/caffe/blob/master/models/intel_optimized_models/multinode/default_resnet50_8nodes/train_val.prototxt). With the same hyperparameters, Caffe can achieve SOTA accuracy. How about sending us the script you used for GPU training directly? Or could you please send us the trained model, so that we can check where the gap is? Thanks.
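For comparison, a typical Caffe-style training transform (random 224 crop, random mirror, per-channel mean subtraction) can be expressed on the Chainer side roughly as below; the exact parameters in the linked prototxt may differ, so treat the numbers as placeholders.

```python
# Sketch of a Caffe-style training transform, usable with
# chainer.datasets.TransformDataset(base_dataset, train_transform).
# Crop size and mean values are placeholders, not the linked prototxt's settings.
import numpy as np

MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)  # common ImageNet mean

def train_transform(in_data, crop=224):
    img, label = in_data                       # img: CxHxW float32, BGR
    _, h, w = img.shape
    top = np.random.randint(0, h - crop + 1)   # random 224x224 crop
    left = np.random.randint(0, w - crop + 1)
    img = img[:, top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:                 # random horizontal mirror
        img = img[:, :, ::-1]
    img = img - MEAN_BGR[:, None, None]        # per-channel mean subtraction
    return img, label
```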
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] @mingxiao.huang I personally haven't run the full training of ResNet50 on the ImageNet dataset so far, but I think someone has done that with the official Chainer example code found in https://github.com/chainer/chainer/tree/master/examples/imagenet . I'm trying to find the result, but haven't found it yet...
    [Shunta Saito, chainer] @mingxiao.huang Also, could you share the log file of the Caffe training that achieved "SOTA accuracy"? And what do you mean by "SOTA"? I think ResNet50 is no longer the state of the art in image classification.
    Sameroom
    @sameroom-bot
    [mingxiao huang, chainer] @mitmul We have tried the official example code on GPU before; it only got 73+% accuracy. For the log file, I am figuring out how to send a file on Slack now...
    [Shunta Saito, chainer] @mingxiao.huang Thanks. Well, how about the second question? What do you mean by “SOTA” with Caffe… do you have a specific number in mind?
    [mingxiao huang, chainer] IntelCaffe_16nodes_Tests_Train.log
    [mingxiao huang, chainer] @mitmul I just sent you the log file; you can see that Caffe achieved "top-1 = 0.75596, top-5 = 0.926322".
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] @mingxiao.huang
    From this part, Caffe does inference-time augmentation. You should do the same augmentation in a custom Evaluator class in the Chainer script.
    https://github.com/intel/caffe/blob/master/models/intel_optimized_models/multinode/default_resnet50_8nodes/train_val.prototxt#L35-L63
    [Shunta Saito, chainer] Doesn’t “random_resize_param” mean the random resizing augmentation?
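One way to apply the same inference-time preprocessing on the Chainer side is to transform the validation dataset before it reaches the stock extensions.Evaluator (a lighter-weight alternative to subclassing Evaluator as suggested above). A sketch, assuming chainercv is available for the resize/crop helpers and with placeholder sizes and mean values:

```python
# Sketch: resize the short side to 256, take the center 224 crop, subtract the
# mean, and feed the transformed validation set to the stock Evaluator.
import numpy as np
import chainer
from chainer.training import extensions
from chainercv import transforms   # assumed available; any resize/crop helper works

MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)  # placeholder mean

def val_transform(in_data):
    img, label = in_data                           # img: CxHxW float32, BGR
    img = transforms.scale(img, 256)               # shorter side -> 256
    img = transforms.center_crop(img, (224, 224))  # center 224x224 crop
    return img - MEAN_BGR[:, None, None], label

def make_evaluator(val_dataset, model, device=0):
    """val_dataset and model are placeholders for the real validation set / net."""
    val = chainer.datasets.TransformDataset(val_dataset, val_transform)
    it = chainer.iterators.SerialIterator(val, 128, repeat=False, shuffle=False)
    return extensions.Evaluator(it, model, device=device)

# trainer.extend(make_evaluator(val_dataset, model))   # hypothetical trainer
```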
    Sameroom
    @sameroom-bot
    [Shunta Saito, chainer] Is resizing with “CUBIC”, as found in the Caffe prototxt, the same as what skimage.transform.resize does? A different resizing method may cause different inference results.
    [Shunta Saito, chainer] I think we cannot compare the results without unifying the preprocessing.
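A quick way to check this concern: resize the same image with OpenCV's bicubic interpolation (Caffe's image I/O typically goes through OpenCV, so "CUBIC" likely maps to cv2.INTER_CUBIC) and with skimage.transform.resize, then compare. Note that skimage defaults to bilinear interpolation, anti-aliasing on downscale, and float output in [0, 1], so the two pipelines can differ before the model ever sees the pixels. A minimal diagnostic sketch, using a random stand-in image:

```python
# Diagnostic: compare OpenCV bicubic resizing with skimage.transform.resize.
# A nonzero difference means the two preprocessing pipelines are not identical.
import numpy as np
import cv2
from skimage.transform import resize as sk_resize

img = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)   # stand-in image

a = cv2.resize(img, (256, 256), interpolation=cv2.INTER_CUBIC).astype(np.float32)
b = sk_resize(img, (256, 256), order=3, preserve_range=True,
              anti_aliasing=False).astype(np.float32)

print(np.abs(a - b).mean())   # nonzero -> resizing methods differ
```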