    Sameroom
    @sameroom-bot
    [Kenichi Maehashi, chainer] Sorry for the delay. I’ll proceed to merge shortly.
    [mingxiao huang, chainer] @nishino @kmaehashi hello, could you please help merge chainer/chainer#4933 and chainer/chainer#5033? They have been pending for a long time. Thanks.
    [mingxiao huang, chainer] @mitmul hello, how is your resnet50 training test going?
    [nishino, chainer] @mingxiao.huang Sorry, @kmaehashi is on leave this week. We will arrange another reviewer soon.
    [mingxiao huang, chainer] thanks
    [Kenichi Maehashi, chainer] Could you check my comment?
    [mingxiao huang, chainer] hello, @kmaehashi @nishino when will chainer/chainer#5033 be merged?
    [Kenichi Maehashi, chainer] Sorry for the delay, I’ll proceed to merge.
    [mingxiao huang, chainer] @mitmul @nishino hello, how is the resnet50 GPU training going on your side?
    [Shunta Saito, chainer] Sorry, I was on a business trip and too busy with another project, preparing for the CEATEC event next week, so I couldn't find enough time to work on it...
    [Shunta Saito, chainer] @mingxiao.huang Did you perform multi-scale cropping at *inference time*?
    [Shunta Saito, chainer] @mingxiao.huang Could you give us the evaluation script? I could find only the training scripts in this repository: https://github.com/mingxiaoh/chainer-training-script

    [Shunta Saito, chainer] @mingxiao.huang Are you sure you performed all of the inference-time techniques described in the original ResNet50 paper?

    In testing, for comparison studies we adopt the standard 10-crop testing [21]. For best results, we adopt the fully convolutional form as in [41, 13], and average the scores at multiple scales (images are resized such that the shorter side is in {224, 256, 384, 480, 640}).
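The standard 10-crop evaluation quoted above (4 corners plus center, each with its horizontal flip, scores averaged) can be sketched in plain NumPy; the function names here are illustrative, not taken from any script discussed in the thread:

```python
import numpy as np

def ten_crop(img, size=224):
    """Extract the standard 10 crops from an HWC image array:
    4 corners + center, plus a horizontal mirror of each."""
    h, w = img.shape[:2]
    assert h >= size and w >= size
    # Top-left offsets of the five base crops.
    offsets = [
        (0, 0), (0, w - size),                # top-left, top-right
        (h - size, 0), (h - size, w - size),  # bottom-left, bottom-right
        ((h - size) // 2, (w - size) // 2),   # center
    ]
    crops = [img[y:y + size, x:x + size] for y, x in offsets]
    crops += [c[:, ::-1] for c in crops]      # horizontal flips
    return np.stack(crops)                    # (10, size, size, C)

def ten_crop_scores(img, predict):
    """Average a model's class scores over the 10 crops."""
    return np.stack([predict(c) for c in ten_crop(img)]).mean(axis=0)
```

`predict` stands in for a forward pass of the trained network; the multi-scale part of the quote would additionally repeat this at several resized copies of the image and average those scores too.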

    [mingxiao huang, chainer] @mitmul For your first question, in that script you can see that the extensions.Evaluator function is used for evaluation.
    [mingxiao huang, chainer] @mitmul For your second question, as you can see in the scripts we sent you, we used raw data images and adopted the same data augmentation policy as Caffe (https://github.com/intel/caffe/blob/master/models/intel_optimized_models/multinode/default_resnet50_8nodes/train_val.prototxt). With the same hyperparameters, Caffe can achieve SOTA accuracy. How about sending us the script you used for GPU training directly? Or could you please send us the trained model, so that we can check where the gap is? Thanks.
    [Shunta Saito, chainer] @mingxiao.huang I personally haven't run the full training of ResNet50 on the ImageNet dataset so far. But I think someone has done that with the official example code of Chainer found in https://github.com/chainer/chainer/tree/master/examples/imagenet . I'm trying to find the result, but haven't found it yet...
    [Shunta Saito, chainer] @mingxiao.huang Well, could you share the log file of the training with Caffe that achieved "SOTA accuracy"? Also, what do you mean by "SOTA"? I think ResNet50 is no longer the state of the art in the image classification task.
    [mingxiao huang, chainer] @mitmul We have tried the official example code on GPU before, but it only reached 73+% accuracy. For the log file, I am figuring out how to send a file on Slack now...
    [Shunta Saito, chainer] @mingxiao.huang Thanks. Well, how about the second question? What do you mean by "SOTA" with Caffe... do you have any specific number in mind?
    [mingxiao huang, chainer] IntelCaffe_16nodes_Tests_Train.log
    [mingxiao huang, chainer] @mitmul I just sent you the log file; you can see that Caffe achieved "top-1 = 0.75596, top-5 = 0.926322".
    [Shunta Saito, chainer] @mingxiao.huang
    From this part, Caffe does inference-time augmentation. You should do the same augmentation in a custom Evaluator class in the Chainer script.
    https://github.com/intel/caffe/blob/master/models/intel_optimized_models/multinode/default_resnet50_8nodes/train_val.prototxt#L35-L63
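As a reference for what such a custom Evaluator would have to replicate, here is a dependency-free sketch of a typical Caffe-style inference transform (resize the shorter side, center-crop, subtract the mean); the exact sizes and interpolation in the prototxt may differ, and all names here are illustrative:

```python
import numpy as np

def resize_shorter_side(img, target=256):
    """Nearest-neighbour resize so the shorter side equals `target`.
    (A real pipeline would use bicubic or bilinear interpolation;
    nearest keeps this sketch dependency-free.)"""
    h, w = img.shape[:2]
    scale = target / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    return img[ys][:, xs]

def center_crop(img, size=224):
    h, w = img.shape[:2]
    y, x = (h - size) // 2, (w - size) // 2
    return img[y:y + size, x:x + size]

def preprocess(img, mean):
    """Deterministic inference-time transform:
    resize -> center crop -> mean subtraction."""
    return center_crop(resize_shorter_side(img)) - mean
```

A custom Evaluator would apply `preprocess` to every validation image before the forward pass, so both frameworks score the network on identically prepared inputs.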
    [Shunta Saito, chainer] Doesn’t “random_resize_param” mean the random resizing augmentation?
    [Shunta Saito, chainer] Is resizing with "CUBIC" as found in the Caffe prototxt the same as what skimage.transform.resize does? A different resizing method may cause different inference results.
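The point about resize methods is easy to demonstrate even in 1-D: nearest-neighbour and linear resampling of the same signal disagree at non-integer sample positions, so two resize implementations (e.g. bicubic in Caffe vs. skimage's default) can feed the network genuinely different pixels. A toy NumPy comparison, illustrative only:

```python
import numpy as np

# Upsample a 4-sample signal to 7 samples with two different methods.
src = np.array([0.0, 10.0, 20.0, 30.0])
xs = np.linspace(0, len(src) - 1, 7)   # target sample positions

nearest = src[np.round(xs).astype(int)]                 # nearest-neighbour
linear = np.interp(xs, np.arange(len(src)), src)        # linear

# The two methods agree at integer positions but differ between them,
# which is exactly where most pixels of a resized image land.
diff = np.abs(nearest - linear).max()
```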
    [Shunta Saito, chainer] I think we cannot compare the results without unifying the preprocessing.
    [Shunta Saito, chainer] Thank you for sending the log. I'll share it with Chainer team!
    [mingxiao huang, chainer] @mitmul any progress for this issue so far?
    [Shunta Saito, chainer] @mingxiao.huang Oh, I'm waiting for your answer about "random_resize_param" in the Caffe prototxt. If it means random resizing augmentation during inference, we need to fix the Evaluator used in the Chainer script first.
    [mingxiao huang, chainer] @mitmul It means random resizing augmentation during training (not inference). Besides, according to our test (https://github.com/mingxiaoh/chainer-training-script/blob/master/train_imagenet_resnet50_blockmode_augmentation.py) on Chainer, it seems there is not much difference whether we do augmentation or not.
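For reference, training-time scale augmentation of the kind `random_resize_param` suggests (resize the shorter side to a random length, then take a random crop, as in the ResNet paper's training procedure) can be sketched in plain NumPy; the nearest-neighbour resize and all names here are illustrative simplifications, not the code from either framework:

```python
import numpy as np

def random_resize_crop(img, rng, size=224, lo=256, hi=480):
    """Scale augmentation: resize the shorter side of an HWC image to a
    random length in [lo, hi], then take a random size x size crop."""
    h, w = img.shape[:2]
    target = rng.integers(lo, hi + 1)
    scale = target / min(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize via index arrays (dependency-free sketch).
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs]
    # Random crop position.
    y = rng.integers(0, nh - size + 1)
    x = rng.integers(0, nw - size + 1)
    return resized[y:y + size, x:x + size]
```

Whether this kind of augmentation is applied during training (as stated here) or also at inference is exactly the discrepancy being debugged in this thread.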
    [Shunta Saito, chainer] @mingxiao.huang OK, thanks. I finally managed to allocate my time and the computational resources, so I started the training script with 4 V100 GPUs. Please wait until it finishes. Thanks
    [mingxiao huang, chainer] thanks
    [Shunta Saito, chainer] (The job has been terminated by an internal issue of the cluster, but I re-launched the training job. It says it will take 6 days to finish.)
    [mingxiao huang, chainer] ok, thanks for the update
    [Shunta Saito, chainer] Hmm... there seem to be some corrupted images in the dataset on our system... the job somehow stopped due to a failure loading images, sad. I'll check all the images first and try again. Sorry it's taking so long to reproduce your work.
    [Shunta Saito, chainer] Fixed a bug!
    [Shunta Saito, chainer] The training is running…
    [Cao Zhong, chainer] @mitmul About ChainerX, if there is anything we can help with or do, please let us know. :$
    [mingxiao huang, chainer] @mitmul how is the progress of the resnet50 training?
    [Shunta Saito, chainer] It's done up to epoch 54. Still training... (I restarted the training after the process was killed for an internal reason.) Sorry it's taking so long.
    [Shunta Saito, chainer] 20 hours remain to finish the training. It shows 70.24% validation accuracy at 80 epochs. We'll see the result at 90 epochs in 20 hours.
    [Shunta Saito, chainer] I confirmed that the final validation accuracy at iteration 900000 (epoch 89) was 0.7012. That's actually not enough to say this achieves the same accuracy as Caffe's script. I'll investigate the reason.
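For reference, top-1/top-5 validation figures like these are computed by checking whether the true label appears among the k highest-scoring classes; a minimal NumPy sketch (not the evaluation code used by either side of this thread):

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k
    highest-scoring classes.  `scores` is (N, n_classes),
    `labels` is (N,) integer class indices."""
    topk = np.argsort(-scores, axis=1)[:, :k]       # indices of top-k classes
    return float((topk == labels[:, None]).any(axis=1).mean())
```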
    [mingxiao huang, chainer] @mitmul thanks, I trained it on GPU with Chainer 4.0.0b2 a while ago and achieved 0.7313 accuracy. Looking forward to your findings.
    [mingxiao huang, chainer] @mitmul any progress on the resnet50 training accuracy investigation on GPU?
    [Cao Zhong, chainer] @nishino @mitmul Does chainer support INT8 inference?
    [Yong Wu, chainer] @nishino @mitmul Hi, we are working on a plan to support INT8/VNNI in Chainer/iDeep to accelerate inference on IA. We'd like to hear your thoughts/plans on INT8 inference. Will it be part of the Chainer 5.x roadmap? Or do you think it should take another path?
    [mingxiao huang, chainer] @nishino With regard to the docker images, according to my understanding, latest-intel-python2 and latest-intel-python3 are generated by https://github.com/chainer/chainer/tree/master/docker/intel/python2 and https://github.com/chainer/chainer/tree/master/docker/intel/python3, respectively. But where is the Dockerfile for latest-intel? We downloaded the latest-intel and latest-intel-python2 images and found that latest-intel is also based on Ubuntu 16.04.5 and Python 2.7.12, like latest-intel-python2. What is the difference between latest-intel and latest-intel-python2? Thanks.
    [nishino, chainer] @4pao @uyong
    We are aware that there's demand for INT8 inference. However, we have no plan so far, and it is not yet clear how Chainer should support it. One possibility might be to delegate that to ONNX, for example.
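For context, the arithmetic core of INT8 inference is representing tensors as 8-bit integers plus a floating-point scale. A minimal symmetric-quantization sketch in NumPy (this is not a Chainer/iDeep API, just an illustration of the idea):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: map [-max|x|, max|x|]
    onto [-127, 127] with a single scale factor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original float tensor."""
    return q.astype(np.float32) * scale
```

Real INT8 backends (and VNNI) do the matrix products in the integer domain and only rescale at the end, but the scale/zero-point bookkeeping above is the part a framework's graph representation has to carry.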
    [Yong Wu, chainer] @nishino Thanks for the reply. As for ONNX, do you mean going from Chainer to ONNX and then to other frameworks that support INT8/ONNX? Do you have any recommendations?
    [mingxiao huang, chainer] @mitmul Not meaning to push, but is there any progress on the resnet50 training accuracy investigation on GPU?