    Sameroom
    @sameroom-bot
    [nishino, chainer] I think this behavior is inconsistent:
    chainer.Parameter(3.)  # broadcasted
    chainer.Parameter(np.array(3.))  # NOT broadcasted
    chainer.Parameter(initializers.Constant(3.))  # broadcasted
    chainer.Parameter(initializers.Constant(np.array(3.)))  # broadcasted
    Sameroom
    @sameroom-bot

    [nishino, chainer] Or perhaps we could change the behavior like this:

    chainer.Parameter(3.)  # broadcasted
    chainer.Parameter(np.array(3.))  # NOT broadcasted
    chainer.Parameter(initializers.Constant(3.))  # broadcasted
    chainer.Parameter(initializers.Constant(np.array(3.)))  # NOT broadcasted

    In order to do this, we could check the shape attribute of the initializer: if it has a non-None shape, the parameter is initialized immediately (without broadcasting); otherwise initialization is delayed (with broadcasting). We need to fix initializers.Constant to have an appropriate shape attribute.
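    A minimal sketch of this rule, using hypothetical stand-in names (ConstantSketch, make_parameter) rather than the actual Chainer classes:
    ```
    import numpy as np

    class ConstantSketch:
        # Toy stand-in for initializers.Constant: an ndarray fill value
        # carries a fixed shape; a scalar fill value does not.
        def __init__(self, fill_value):
            self.fill_value = fill_value
            self.shape = getattr(fill_value, 'shape', None)

        def __call__(self, array):
            array[...] = self.fill_value

    def make_parameter(initializer):
        # Toy stand-in for Parameter.__init__'s decision.
        if getattr(initializer, 'shape', None) is not None:
            # Shape is known: initialize immediately, no broadcasting.
            data = np.empty(initializer.shape)
            initializer(data)
            return data
        # Shape unknown: initialization is delayed until a shape is given,
        # at which point the scalar fill value is broadcast.
        return None

    print(make_parameter(ConstantSketch(3.)))            # None (delayed, broadcast later)
    print(make_parameter(ConstantSketch(np.array(3.))))  # 3.0 (immediate, not broadcast)
    ```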

    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] or let the initializer check the shape
    [Seiya Tokui, chainer] I mean just fixing the Constant initializer to check the shape when the constant value to fill is an ndarray, and to skip the check when the constant value is a scalar.
    [Seiya Tokui, chainer] It looks reasonable to me to only broadcast scalars and not broadcast ndarrays.
    Sameroom
    @sameroom-bot
    [nishino, chainer] The problem is, Parameter.__init__ has to know the initializer's decision.
    [nishino, chainer] shape could be the interface to do that.
    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] Ah, ok. Then that sounds good.
    Sameroom
    @sameroom-bot
    [Yuta Okamoto, chainer] I wrote this patch chainer/chainer#6072 but I'm wondering how the setter of array should work for the specified array:
    1. For example, it seems reasonable that Variable only accepts an array on the same device as the array it currently holds. If the user wants to assign a new array from a different device, they can simply call to_device() explicitly before assigning it.
    2. How should grad be dealt with if the array is updated?
      Any ideas?
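    A rough sketch of option 1, using a hypothetical VariableSketch class (not the actual patch in chainer/chainer#6072): the setter only accepts an array of the same kind as the one currently held, leaving device moves to an explicit to_device() call.
    ```
    import numpy as np

    class VariableSketch:
        def __init__(self, array):
            self._array = array

        @property
        def array(self):
            return self._array

        @array.setter
        def array(self, new_array):
            # Reject arrays of a different module/device; the user should
            # call to_device() explicitly before assigning.
            if type(new_array) is not type(self._array):
                raise TypeError('assign an array on the same device; '
                                'call to_device() first')
            if new_array.shape != self._array.shape:
                raise ValueError('shape mismatch')
            self._array = new_array
            # Open question 2 from the chat: what to do with grad here
            # (keep it, reset it to None, or check its consistency)?

    v = VariableSketch(np.zeros((2, 3), dtype=np.float32))
    v.array = np.ones((2, 3), dtype=np.float32)  # OK: same array type and shape
    ```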
    Sameroom
    @sameroom-bot
    [nishino, chainer] With the directional differentiation of check_backward, it's difficult to investigate the cause of large numeric errors. I wonder if there is anything that can be done to make debugging easier.
    Sameroom
    @sameroom-bot
    [nishino, chainer] Sampling a one-hot direction vector should be enough. (Also, sampling unit output grads helps further.)
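    A small NumPy illustration (not Chainer's check_backward itself) of why a one-hot direction helps: the directional numerical gradient then probes a single input element, so a large error points directly at that element.
    ```
    import numpy as np

    def numerical_directional_grad(f, x, direction, eps=1e-3):
        # Central difference of f along `direction`.
        return (f(x + eps * direction) - f(x - eps * direction)) / (2 * eps)

    def f(x):
        return np.sum(x ** 2)

    x = np.array([1.0, 2.0, 3.0])

    # Random direction: contributions of all elements are mixed together.
    d_random = np.random.randn(3)
    d_random /= np.linalg.norm(d_random)
    print(numerical_directional_grad(f, x, d_random))

    # One-hot direction: only x[1] is probed; the result should be
    # d f / d x[1] = 2 * x[1] = 4.
    d_onehot = np.array([0.0, 1.0, 0.0])
    print(numerical_directional_grad(f, x, d_onehot))  # ~4.0
    ```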
    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] Currently, chainer.testing.FunctionTestCase.generate_inputs is supposed to return a tuple of arrays only, due to _check_array_types. However, this makes generate_inputs and the corresponding unpacking in some tests (connection_tests) more complex, like [this](https://github.com/crcrpar/chainer/blob/rewrite-linear-function-test/tests/chainer_tests/functions_tests/connection_tests/test_linear.py).
    So I think it would be better if _check_array_types accepted None.
    What do you think?
    Sameroom
    @sameroom-bot
    [nishino, chainer] I personally don't want to accept None. Although FunctionNode currently accepts None as an input, it's not recommended (we are even considering forbidding it). Similarly, None should not be included in the values of generate_inputs. As F.linear accepts a varying number of inputs, I think it's natural that its test generates a varying number of inputs.
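    A rough sketch of that approach (hypothetical test class, not the actual connection_tests code): parameterize on nobias and return only the arrays F.linear will actually receive, so no None ever appears in generate_inputs.
    ```
    import numpy as np

    class LinearTestSketch:
        nobias = False  # would normally come from testing.parameterize

        def generate_inputs(self):
            x = np.random.uniform(-1, 1, (3, 4)).astype(np.float32)
            W = np.random.uniform(-1, 1, (2, 4)).astype(np.float32)
            if self.nobias:
                return x, W       # two inputs
            b = np.random.uniform(-1, 1, (2,)).astype(np.float32)
            return x, W, b        # three inputs; still no None anywhere
    ```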
    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] That makes sense to me. Thank you!
    Sameroom
    @sameroom-bot

    [nishino, chainer] ```
    a = [1, 2, 3]
    a.insert(100, 4)
    a
    [1, 2, 3, 4]
    ```
    Should `chainer.Sequential` replicate this `list`'s behavior? Currently it does not.
    Personally I don't like it because it would be difficult to detect logic errors.

    [nishino, chainer] But if chainer.Sequential's goal is to mimic list's semantics, it should.
    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] I think being stricter than list is acceptable here.
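    A minimal sketch of the "stricter than list" behavior (toy class, not chainer.Sequential): out-of-range indices raise instead of being silently clamped the way list.insert clamps them.
    ```
    class StrictSequentialSketch:
        def __init__(self, *layers):
            self._layers = list(layers)

        def insert(self, index, layer):
            # list.insert(100, x) on a short list silently appends; here we
            # reject indices outside [-len, len] to surface logic errors early.
            if not -len(self._layers) <= index <= len(self._layers):
                raise IndexError('index {} out of range'.format(index))
            self._layers.insert(index, layer)

    seq = StrictSequentialSketch(abs)
    seq.insert(1, min)      # OK: index 1 is within range, appends at the end
    # seq.insert(100, max)  # would raise IndexError instead of appending
    ```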
    Sameroom
    @sameroom-bot

    [Yuki Hashimoto, chainer] Regarding cupy/cupy#2042, I have a question about cupy implementation.

    In the code below, the input argument s is specified as unsigned long long (uint64, if I understand correctly), but it is used to initialize state->xor128 (unsigned int [4], i.e. uint32[4]), where 128 means 32 * 4.
    https://github.com/cupy/cupy/blob/44023c524851d39558cab56cfd1f0c8c52ded4d2/cupy/random/_kernels.py#L26-L35

    If there is a reason for this implementation, I'd like to know about it.

    Sameroom
    @sameroom-bot
    [tos, chainer] To generate uniform, normal, etc., cuRAND host APIs are used.
    To generate some other distributions, random numbers have to be generated on demand. It doesn't seem easy to use cuRAND device APIs (https://github.com/cupy/cupy/issues/1431#issuecomment-417685336).
    Sameroom
    @sameroom-bot
    [nishino, chainer] This was not accurate. It does not work for links, but does work for other callables.
    # error
    seq = Sequential(L.Linear(3))
    seq.insert(1000, L.Linear(4))
    
    # works
    seq = Sequential(F.sin)
    seq.insert(100, F.cos)
    [nishino, chainer] Does Sequential have to inherit from ChainList? It looks to me like that's the cause of bugs (including #6053).
    Sameroom
    @sameroom-bot

    [nishino, chainer] To me, this comment looks worth reconsidering.
    https://github.com/chainer/chainer/pull/2918#pullrequestreview-104821928

    For this point, we can just override namedlinks() and skip dummy links.
    https://github.com/chainer/chainer/pull/2918#issuecomment-375222516
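    A rough sketch of that idea, assuming a hypothetical _DummyLink placeholder class (the real dummy-link mechanism lives in the PR above): override namedlinks() so the placeholders never show up when users enumerate links.
    ```
    import chainer

    class _DummyLink(chainer.Link):
        # Hypothetical placeholder occupying a slot without parameters.
        pass

    class SequentialSketch(chainer.ChainList):
        def namedlinks(self, skipself=False):
            for name, link in super(SequentialSketch, self).namedlinks(skipself):
                if isinstance(link, _DummyLink):
                    continue  # hide internal placeholders from users
                yield name, link
    ```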

    Sameroom
    @sameroom-bot
    [himkt, chainer] I'd like to know what the dev team discussed about chainer/chainer#6351 in detail, if possible. :eyes:
    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] We are discussing introducing a new type that combines a padded ndarray (sorted by length) and the lengths information. cuDNN is changing its main RNN APIs in this direction, so we want to follow that. We are currently considering NStepRNN and some basic attention mechanism as example usages, where some reduction and elementwise operations should be supported for this new type (e.g. softmax, sum, multiply, etc.). There is no concrete design plan yet.
    Sameroom
    @sameroom-bot
    [himkt, chainer] Thanks for giving comprehensive and detailed information! So does it indicate that NStepRNN may become like PyTorch's RNN?
    (Personally, I really love the current interface of NStepRNN which takes a list of variable-length sequences)
    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] We do not have a conclusion on whether to reuse the existing functions/links to overload the interface or to create new functions/links for that new interface.
    Sameroom
    @sameroom-bot

    [Masaki Kozuki, chainer] Question about how to use attr.cudnn in links tests.

    Current tests of L.BatchNormalization related to GPU/cuDNN (https://github.com/chainer/chainer/blob/master/tests/chainer_tests/links_tests/normalization_tests/test_batch_normalization.py#L119-L122) seem to contradict those of L.Convolution2D (https://github.com/chainer/chainer/blob/master/tests/chainer_tests/links_tests/connection_tests/test_convolution_2d.py#L166-L169).
    The former seem to check whether link.forward invokes the CuPy implementation when cuDNN is available, while the latter seem to check whether link.forward correctly invokes the cuDNN implementation if cuDNN is available.

    I know there will be LinkTestCase that will ease writing tests for links in a unified way, though, *which is more appropriate?*

    Sameroom
    @sameroom-bot

    [Masaki Kozuki, chainer] Currently, F.concat, F.dstack, F.hstack, F.stack, and F.vstack can take an ndarray as input even though they assume xs to be a list of Variables or ndarrays. This is because they don't do isinstance(xs, list).
    Is this expected?

    e.g.

    In [12]: x = [np.random.randn(1, 2) for _ in range(3)]

    In [13]: F.dstack(x)
    Out[13]:
    variable([[[ 1.06418134, -1.1030954 , -1.77550052],
               [ 0.91533154,  1.22747268, -0.84523645]]])

    In [14]: F.dstack(np.asarray(x))
    Out[14]:
    variable([[[ 1.06418134, -1.1030954 , -1.77550052],
               [ 0.91533154,  1.22747268, -0.84523645]]])
    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] Not sure about the intention of the current code, but maybe it's safe to just assume that xs is a "sequence" of Variables or ndarrays. In this case, ndarray with ndim>0 is valid as xs.
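    A quick check of why an ndarray with ndim > 0 qualifies as a "sequence" here: iterating over it yields its sub-arrays along the first axis, just like iterating over the original list of arrays.
    ```
    import numpy as np

    x_list = [np.random.randn(1, 2) for _ in range(3)]
    x_arr = np.asarray(x_list)              # shape (3, 1, 2)

    print(len(x_arr))                       # 3 elements, like the list
    print([item.shape for item in x_arr])   # [(1, 2), (1, 2), (1, 2)]
    ```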
    Sameroom
    @sameroom-bot

    [Kshiteej K, chainer] Hi,
    Going through the code base I saw,

    kMaxNdim = 10
    I am curious to know about this choice.
    Thank you.

    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] @tos As to the link below, I wonder whether there's a consensus about eps values for each dtype. This also applies to some methods like l2normalize, I think.
    https://github.com/chainer/chainer/pull/6655#issuecomment-480648016
    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] (untitled)
    [Masaki Kozuki, chainer] Is this expected behavior?
    In [1]: import numpy as np, chainer, chainer.links as L
    
    In [2]: chainer.print_runtime_info()
    Platform: Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid
    Chainer: 6.0.0rc1
    NumPy: 1.15.4
    CuPy:
      CuPy Version          : 6.0.0rc1
      CUDA Root             : /usr/local/cuda-9.2
      CUDA Build Version    : 9020
      CUDA Driver Version   : 10000
      CUDA Runtime Version  : 9020
      cuDNN Build Version   : 7004
      cuDNN Version         : 7004
      NCCL Build Version    : None
      NCCL Runtime Version  : None
    iDeep: 2.0.0.post3
    
    In [3]: D = chainer.mixed16
    
    In [4]: initialW, initial_bias = np.random.uniform(-1, 1, (20, 10)).astype(D), np.random.uniform(-1, 1, (20,)).astype(D)
    
    In [5]: linear = L.Linear(10, 20, initialW=initialW, initial_bias=initial_bias)
    
    In [6]: linear.W.dtype, linear.b.dtype
    Out[6]: (dtype('float32'), dtype('float32'))
    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] I guess the cause is that chainer.initializers.Constant ignores dtype [here](https://github.com/chainer/chainer/blob/master/chainer/initializers/init.py#L96-L104) and [here](https://github.com/chainer/chainer/blob/master/chainer/initializers/constant.py#L57).
    Also, I'm wondering about the priority order of CHAINER_DTYPE and initializer.dtype.
    Sameroom
    @sameroom-bot
    [tos, chainer] It's unexpected to me.
    Sameroom
    @sameroom-bot
    [tos, chainer] There was an offline discussion on chainer/chainer#6116 on Feb 14, but dtype wasn't discussed.
    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] Hmm
    [Seiya Tokui, chainer] I originally intended that the default dtype is used whenever there is no information to decide the dtype of a new array (thus the name "default dtype").
    [Seiya Tokui, chainer] So I personally think the above case should be float16 instead of float32 (because the given initial array carries dtype information). I'm not sure if there is a subtle case where this behavior would not be natural...
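    A minimal sketch of that precedence, using a hypothetical helper (not the actual Chainer logic): use the dtype carried by the initial value when there is one, and fall back to the default dtype only when there is no such information.
    ```
    import numpy as np
    import chainer

    def resolve_param_dtype(initial_value):
        dtype = getattr(initial_value, 'dtype', None)
        if dtype is not None:
            return np.dtype(dtype)       # e.g. a float16 array -> float16 parameter
        return chainer.config.dtype      # no dtype information -> default dtype

    print(resolve_param_dtype(np.zeros(3, dtype=np.float16)))  # float16
    print(resolve_param_dtype(3.0))                            # the default dtype
    ```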
    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] I will consider this more later, but one corner case that comes to mind is BatchNormalization (this always puzzles us).
    So, if the default dtype is float32 (untouched by the user) and the initializer arrays for gamma and beta are half precision, then it's almost impossible to correctly infer the dtype the user wants to use.
    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] I expect that the user intentionally passed fp16 arrays in that case (thus it's ok to me that these parameters are initialized in fp16).
    Sameroom
    @sameroom-bot

    [Masaki Kozuki, chainer] Agree.

    Aside from that, I'm curious why L.BatchNormalization accepts a dtype argument in its constructor.

    Sameroom
    @sameroom-bot
    [Seiya Tokui, chainer] I think it's a historical argument.
    [Seiya Tokui, chainer] It was added before the following options (use_gamma/beta, initial_***, ...).
    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] As to parameter initializers with dtype, I have some thoughts.
    1. I prefer specifying the dtype for the whole network over a layer-wise manner. I mean, it would be much more intuitive and user-friendly if there were Link methods such as to_half() and cast(mixed16).
    2. Current initializers support too many patterns. So, if the fill_value can be a parameter as is, we should treat it separately. One example is implementing a class method like L.Convolution2D.from_params(cls, W, b=None, stride=1, pad=0, groups=1, dilate=1), because if a fill_value can be a parameter as is, in_size, out_size, and equivalent arguments are redundant (a rough sketch follows below).
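    A rough sketch of idea 2 as a free function (hypothetical helper, not an existing Chainer API): build the link directly from parameter arrays so in_size/out_size need not be repeated.
    ```
    import numpy as np
    import chainer.links as L

    def linear_from_params(W, b=None):
        # Create an L.Linear whose W (and b) are initialized from the given
        # arrays; the sizes are read off W's shape instead of being passed in.
        out_size, in_size = W.shape
        return L.Linear(in_size, out_size,
                        initialW=W,
                        initial_bias=b,
                        nobias=(b is None))

    W = np.random.uniform(-1, 1, (20, 10)).astype(np.float32)
    b = np.random.uniform(-1, 1, (20,)).astype(np.float32)
    linear = linear_from_params(W, b)
    ```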
    Sameroom
    @sameroom-bot

    [Seiya Tokui, chainer] 1. That sounds good for single-device models. I'm not sure what the desirable behavior is for multi-device models (model parallelism). One idea is to let the user write a mapping of devices (e.g. for moving a CPU/GPU hybrid model to multi-GPU, write a mapping like cpu->gpu1, gpu0->gpu0). I think such a case is currently rare, so just starting from cast(dtype) for single-device models is also OK.

    2. The simplified creation function sounds interesting. We can start from major links like Linear and Convolution*D.

    Could you make issues for them? I think both ideas are reasonable to implement.

    Sameroom
    @sameroom-bot
    [Masaki Kozuki, chainer] Sure!
    Sameroom
    @sameroom-bot

    [Masaki Kozuki, chainer] So, I tentatively filed an issue that describes the background of the changes I want to add.
    chainer/chainer#7040

    I’ll add detailed issues for my 2 ideas.