    [ikei, chainer] :+1:
    [Kenichi Maehashi, chainer] Do you have a plan for when iDeep 2.0 will be released?
    [Kenichi Maehashi, chainer] I also hope this fix is included in iDeep4py 2.0. https://chainer.slack.com/archives/C4MQ9RMNG/p1517983472000240
    Currently users are forced to install NumPy==1.13.0 and cannot use any other version of NumPy.
    Setting the requirement to numpy>=1.13.0 would be nice.
    [Cao Zhong, chainer] It is nearly done and in the master branch. We are deciding when to release it and what will be in it. @kmaehashi
    [Cao Zhong, chainer] Yes, we’ll remove that constraint.
    [Kenichi Maehashi, chainer] Nice, thank you!
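    As a packaging sketch, the change requested above would look roughly like this in a setup.py (ideep4py's real setup script is not shown in this conversation, so the surrounding details are assumptions):

        from setuptools import setup

        setup(
            name='ideep4py',
            # Before (the pin users complained about):
            #   install_requires=['numpy==1.13.0'],
            # After: allow any NumPy version from 1.13.0 onward.
            install_requires=['numpy>=1.13.0'],
        )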
    [Kenichi Maehashi, chainer] I’d like to copy data from a NumPy array to an iDeep array, and found ideep4py.basic_copyto(dst, src), which seems to be the equivalent of numpy.copyto(dst, src). Is basic_copyto considered stable? Can we use this interface? (context: chainer/chainer#5009)
    [Cao Zhong, chainer] @feng1.yuan
    [fengyuan, chainer] Yes, that is a stable API of the iDeep Python package (ideep4py).
    The usage would be:
    ideep4py.basic_copyto(x_ideep, x_np)
    [Kenichi Maehashi, chainer] OK, thanks for the clarification!
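    A minimal sketch of the copy described above (only basic_copyto itself is confirmed in this exchange; constructing the destination array via ideep4py.mdarray is an assumption):

        import numpy as np
        import ideep4py

        x_np = np.arange(6, dtype=np.float32).reshape(2, 3)
        # Assumed construction of an iDeep array from a NumPy array.
        x_ideep = ideep4py.mdarray(np.zeros_like(x_np))

        # Copy NumPy data into the existing iDeep array,
        # analogous to numpy.copyto(dst, src).
        ideep4py.basic_copyto(x_ideep, x_np)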
    [mingxiao huang, chainer] @mitmul Thanks. So, any findings on your side?
    [Shunta Saito, chainer] We haven't managed to find time to work on it yet. Please give us until this weekend. Sorry about that.
    [mingxiao huang, chainer] @mitmul OK. ChainerMN's batch normalization implementation is heavily based on an old version of Chainer's batch normalization, so I am trying to sync it with the latest version in Chainer. I will keep you informed of the progress. Thanks.
    [mingxiao huang, chainer] @mitmul We have found something improper in our script: we did not shuffle the input training set during scatter. After shuffling the data, we no longer see any validation gap compared to a single node after 10000 iterations. Thanks.
    [Shunta Saito, chainer] Great! Thanks for letting us know.
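    For reference, the fix described above corresponds to something like the following ChainerMN call (a sketch; the communicator type and the train dataset variable are placeholders):

        import chainermn

        comm = chainermn.create_communicator('hierarchical')

        # shuffle=True shuffles the dataset before splitting it across workers,
        # so each node gets a random slice rather than a class-ordered chunk.
        train = chainermn.scatter_dataset(train, comm, shuffle=True)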
    [mingxiao huang, chainer] @nishino Hello, recently we trained a ResNet50 GPU model on a single node and only got 73+% validation accuracy on ILSVRC2012; there is a 2% gap compared to SOTA. We used a poly learning rate policy. It is said that you got SOTA for ResNet50 on your site; could you please share the hyperparameters with us? Thanks.
    [Shunta Saito, chainer] Which site are you referring to? I'll ask the people who conducted that experiment once I know the exact page URL.
    [mingxiao huang, chainer] @mitmul From http://on-demand.gputechconf.com/gtc/2018/presentation/s8889-training-imagenet-in-15-minutes-with-chainermn-a-scalable-distributed-dl-framework.pdf we can see that you got a comparable accuracy of 74.9% on multi-node. Could you please send us the training script you used? If you have ever trained ResNet50 on a single node, it would be even better to also send us the script you used for that. Thanks.
    [Shunta Saito, chainer] @mingxiao.huang Is the batch size you are using for your experiment also 32000?
    [Shunta Saito, chainer] @kfukuda Hi Fukuda-san, he is referring to the slides you presented at GTC and asking about the training settings used to achieve 74.9% accuracy on ImageNet with ResNet50. Can we share the training script with him?
    [Keisuke Fukuda, chainer] I think we need to consult Akiba-san.
    [Keisuke Fukuda, chainer] However, basically, everything is written in the paper.
    [Shunta Saito, chainer] Yeah, @mingxiao.huang Have you already read this paper? https://arxiv.org/abs/1711.04325
    [mingxiao huang, chainer] @mitmul Yes, we have read it. However, to make the problem simpler, we are trying the single-GPU case now; it would be nice if you have a training script for a single node and could send it to us. Thanks.
    [Shunta Saito, chainer] What is the global minibatch size? I think the techniques written in the paper are for an "extremely large minibatch," as you can see in the title. If the global minibatch size is not that large because you are using just a single GPU, the techniques required to achieve good results will be different. What do you think, @kfukuda?
    [mingxiao huang, chainer] @mitmul Yes, that is also my concern. For a single node, we used batchsize=128. The techniques required to achieve good results will be different for single-node training, so it would be nice if you have such a training script for a single node and could send it to us. Thanks.
    [mingxiao huang, chainer] @mitmul Any more comments? Do you have such a training script for a single node?
    [Shunta Saito, chainer] @mingxiao.huang Could you answer his question?

    Did you achieve the SOTA with the original ResNet-50 (with batch size 64)?

    [Shunta Saito, chainer] Regarding a training script, I think it's fine to use the normal one for ResNet50 training on ImageNet-1K with a minibatch size of 128. You can find the example code in the Chainer repository. Have you already tried that?
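    For a single node with minibatch size 128, the usual setup looks roughly like the sketch below (MyResNet50 and load_imagenet are placeholders; the actual model and data pipeline are in Chainer's ImageNet example, and the schedule here is only the standard step decay, not the paper's recipe):

        import chainer
        from chainer import training
        from chainer.training import extensions

        model = chainer.links.Classifier(MyResNet50())   # placeholder model
        train, val = load_imagenet()                      # placeholder datasets

        optimizer = chainer.optimizers.MomentumSGD(lr=0.1, momentum=0.9)
        optimizer.setup(model)
        optimizer.add_hook(chainer.optimizer.WeightDecay(1e-4))

        train_iter = chainer.iterators.MultiprocessIterator(train, batch_size=128)
        updater = training.StandardUpdater(train_iter, optimizer, device=0)
        trainer = training.Trainer(updater, (90, 'epoch'))

        # Standard step-wise decay: divide the learning rate by 10 every 30 epochs.
        trainer.extend(extensions.ExponentialShift('lr', 0.1), trigger=(30, 'epoch'))
        trainer.run()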
    [mingxiao huang, chainer] @mitmul The example code in the Chainer repository is rather simple. Actually, we would like to reproduce your result with the detailed hyperparameters you used, since we could not get SOTA accuracy for ResNet50 with Chainer at all; however, with the same parameters, we can achieve SOTA with Caffe.
    [Shunta Saito, chainer] @mingxiao.huang What we intended to say was that you need to use an "extremely large batch size" to reproduce the result in the paper. If you want to achieve over 74% accuracy using only 1 GPU, you need a different technique for that setting. I think we have never tested such a small environment with only 1 GPU because it takes too much time to finish 90 epochs. Is this correct, @kfukuda?
    [Keisuke Fukuda, chainer] Yes, even the original ResNet author (Kaiming He) used 8 GPUs (if I remember correctly). The techniques written in the paper are for extremely large minibatch sizes (>8K), and it does not make much sense to use them (RMSProp warmup, etc.) for such a small batch size.
    [Keisuke Fukuda, chainer] Before trying our techniques, I recommend trying Facebook’s result with a 4K batch size.
    [Keisuke Fukuda, chainer] Techniques for a large batch size are not necessarily good for a small batch size.
    [Keisuke Fukuda, chainer] So, it depends on what you are trying to do, @mingxiao.huang. If you want to reproduce our results, you first need to prepare >512 GPUs and try a 32K batch size. Trying to reproduce our results with a small batch size does not make sense. And, again, I recommend starting from Facebook’s result.
    [mingxiao huang, chainer] @mitmul @kfukuda Thanks. Theoretically speaking, I guess we can try one GPU with minibatch size 128 and use the same learning rate as yours, but without the "RMSProp warmup". Correct?
    [Keisuke Fukuda, chainer] The learning rate is just one of the parameters. What is your goal?
    [Keisuke Fukuda, chainer] (1) To achieve the SOTA with BS=128?
    [Keisuke Fukuda, chainer] (2) Or to reproduce our “extremely large minibatch” paper?
    [Keisuke Fukuda, chainer] If (1), forget our “extremely large…” paper and go back to the original ResNet hyperparameters. If you’ve done it on Caffe, you should be able to port your code to Chainer (but very carefully). It’s totally different work.
    [Keisuke Fukuda, chainer] If (2), you need to implement everything in the paper, including RMSProp warmup and all the rest, and you should try a batch size of 32K. You can simulate a 32K batch size with a small number of GPUs by skipping updates and accumulating gradients.
    [Keisuke Fukuda, chainer] Again, I recommend starting from Facebook’s paper with a 4K batch size, because our paper is based on it.
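    A minimal sketch of the accumulation idea mentioned above (model and make_subbatch are hypothetical placeholders, not the paper's script): run several small sub-batches, let their gradients add up, and apply a single update so the effective batch size is their sum.

        import chainer

        accum_steps = 32              # e.g. 32 sub-batches of 1K = 32K effective batch

        optimizer = chainer.optimizers.MomentumSGD(lr=0.1)
        optimizer.setup(model)        # `model` is a placeholder Link

        model.cleargrads()
        for step in range(accum_steps):
            x, t = make_subbatch()                 # placeholder sub-batch loader
            loss = model(x, t) / accum_steps       # scale so gradients average
            loss.backward()                        # gradients accumulate across calls
        optimizer.update()                         # one update for the whole large batch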
    [Shunta Saito, chainer] Also, could you share the code you are using now for ResNet50 training on ImageNet-1K with 1 GPU? I'd like to try the code to confirm that it cannot achieve >74% accuracy on the validation dataset.
    [mingxiao huang, chainer] @mitmul Please check https://github.com/mingxiaoh/chainer-training-script for the code we used. Thanks.
    [Shunta Saito, chainer] Thanks! I'll take a look at it.
    [mingxiao huang, chainer] @nishino @mitmul Hello, could you please help merge chainer/chainer#4933 and chainer/chainer#5033? They have been pending for a long time. Thanks.
    [nishino, chainer] I'm sorry for the inconvenience. @kmaehashi Could you take a look?
    [Kenichi Maehashi, chainer] Sorry for the delay. I’ll proceed to merge shortly.
    [mingxiao huang, chainer] @nishino @kmaehashi Hello, could you please help merge chainer/chainer#4933 and chainer/chainer#5033? They have been pending for a long time. Thanks.
    [mingxiao huang, chainer] @mitmul Hello, how is your ResNet50 training test going?