`Parameter` with a given ndarray. My opinion could change if a "scalar" `Variable` is supported in the sense of `np.isscalar`. The following behavior is natural to me:
```
>>> chainer.Parameter(3.)
variable(None)
>>> chainer.Parameter(np.array(3.))
variable(3.)
```
[nishino, chainer] Or perhaps we could change the behavior like this:
```python
chainer.Parameter(3.)                                    # broadcasted
chainer.Parameter(np.array(3.))                          # NOT broadcasted
chainer.Parameter(initializers.Constant(3.))             # broadcasted
chainer.Parameter(initializers.Constant(np.array(3.)))   # NOT broadcasted
```
In order to do this, we could check the `shape` attribute of the initializer: if it has a non-None `shape`, the parameter is initialized immediately (without broadcast); otherwise initialization is delayed (broadcasted). We need to fix `initializers.Constant` to have an appropriate `shape`, and make the `Constant` initializer check the shape when the constant value to fill is an ndarray, skipping the check when the constant value is a scalar. `shape` could be the interface to do that.
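A minimal sketch of that idea (the attribute name `shape` follows the discussion; the exact check is an assumption, not the current implementation):

```python
import numpy as np

import chainer


class Constant(chainer.Initializer):
    """Sketch: expose `shape` so Parameter can decide whether to initialize
    immediately (non-None shape, no broadcast) or delay and broadcast."""

    def __init__(self, fill_value, dtype=None):
        super().__init__(dtype)
        self.fill_value = fill_value
        # Only ndarray fill values carry a shape; Python scalars do not.
        self.shape = getattr(fill_value, 'shape', None)

    def __call__(self, array):
        if self.shape is not None and array.shape != self.shape:
            raise ValueError(
                'shape mismatch: {} != {}'.format(array.shape, self.shape))
        array[...] = self.fill_value
```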
`array` should work for the specified array: `Variable` only accepts an array from the same device as the one already held by the variable. If the user wants to assign a new array from a different device, they can simply call `to_device()` explicitly before assigning it.
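A minimal usage sketch of that rule, assuming a CuPy-capable environment (`cuda.to_gpu` is used here just as one way to move the array to the variable's device first):

```python
import numpy as np

import chainer
from chainer.backends import cuda

# The variable holds a GPU array, so a new array must live on the same
# device before it can be assigned to `array`.
v = chainer.Variable(cuda.to_gpu(np.zeros((2, 3), dtype=np.float32)))

new_cpu = np.ones((2, 3), dtype=np.float32)
# v.array = new_cpu           # mixes devices; under this rule it is rejected
v.array = cuda.to_gpu(new_cpu)  # move to the variable's device, then assign
```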
`chainer.testing.FunctionTestCase.generate_inputs` is supposed to return a tuple of arrays only, due to `_check_array_types`. However, this will make `generate_inputs` and the corresponding unpacking in some tests (`connection_tests`) more complex, like [this](https://github.com/crcrpar/chainer/blob/rewrite-linear-function-test/tests/chainer_tests/functions_tests/connection_tests/test_linear.py).
Passing `None` as an input is not recommended (we are even considering forbidding it). Similarly, `None` should not be included in the values returned by `generate_inputs`. Since `F.linear` accepts a varied number of inputs, I think it's natural that its test generates a varied number of inputs.
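For example, a minimal sketch of such a test (the parameterization and shapes are illustrative; it relies on `FunctionTestCase` accepting tuples of different lengths from `generate_inputs`):

```python
import numpy as np

import chainer.functions as F
from chainer import testing


@testing.parameterize(*testing.product({'nobias': [True, False]}))
@testing.inject_backend_tests(None, [{}])  # CPU only, for brevity
class TestLinear(testing.FunctionTestCase):
    # generate_inputs returns a tuple containing only arrays; the *number*
    # of inputs varies with the parameterization instead of padding with None.

    def generate_inputs(self):
        x = np.random.uniform(-1, 1, (3, 4)).astype(np.float64)
        W = np.random.uniform(-1, 1, (5, 4)).astype(np.float64)
        if self.nobias:
            return x, W
        b = np.random.uniform(-1, 1, (5,)).astype(np.float64)
        return x, W, b

    def forward(self, inputs, device):
        return F.linear(*inputs),

    def forward_expected(self, inputs):
        if self.nobias:
            x, W = inputs
            return x.dot(W.T),
        x, W, b = inputs
        return x.dot(W.T) + b,
```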
Since `chainer.Sequential`'s goal is to mimic `list`'s semantics, it should.
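For reference, plain `list.insert` clamps out-of-range indices instead of raising, which is the `list` semantics relevant to the `insert` example below:

```python
xs = [1, 2, 3]
xs.insert(100, 4)   # no error: index is clamped -> [1, 2, 3, 4]
xs.insert(-100, 0)  # clamped at the front       -> [0, 1, 2, 3, 4]
```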
[Yuki Hashimoto, chainer] Regarding cupy/cupy#2042, I have a question about the CuPy implementation. In the code below, the input argument `s` is specified as `unsigned long long` (`uint64`, if correct), but it is used to initialize `unsigned int` (32 * 4). If there is a reason for this implementation, I'd like to know about it.
Does `Sequential` have to inherit from `ChainList`? It looks to me like that's the cause of bugs (including #6053).
```python
# error
seq = Sequential(L.Linear(3))
seq.insert(100, L.Linear(4))

# works
seq = Sequential(F.sin)
seq.insert(100, F.cos)
```
[nishino, chainer] To me, this comment looks worth reconsidering.
For this point, we can just override `namedlinks()` and skip dummy links.
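A rough sketch of that suggestion (the `_is_dummy` flag is hypothetical; it stands for however the placeholder links would be marked):

```python
import chainer


class Sequential(chainer.ChainList):
    """Sketch: hide placeholder links from namedlinks()."""

    def namedlinks(self, skipself=False):
        for name, link in super().namedlinks(skipself=skipself):
            # Skip dummy links registered only to keep indices aligned
            # with non-link callables (hypothetical `_is_dummy` marker).
            if getattr(link, '_is_dummy', False):
                continue
            yield name, link
```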
[Masaki Kozuki, chainer] Question about how to use `attr.cudnn` in link tests.
The current tests of `L.BatchNormalization` related to GPU/cuDNN (https://github.com/chainer/chainer/blob/master/tests/chainer_tests/links_tests/normalization_tests/test_batch_normalization.py#L119-L122) seem to contradict those of some other tests. The former seem to check whether `link.forward` kicks the CuPy implementation when cuDNN is available, while the latter seem to check whether `link.forward` correctly kicks the cuDNN implementation if cuDNN is available.
I know there will be a `LinkTestCase` that will ease writing tests for links in a unified way; still, *which is more appropriate?*
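For concreteness, a minimal sketch of the first style under `attr.cudnn` (assuming a cuDNN-enabled environment; the setup and shapes are illustrative, not taken from the existing tests):

```python
import unittest

import numpy as np

import chainer
import chainer.links as L
from chainer import testing
from chainer.backends import cuda
from chainer.testing import attr


class TestBatchNormalizationForwardCudnn(unittest.TestCase):
    # First style: run the link's forward on GPU with use_cudnn enabled and
    # only check the numerical result; whether the cuDNN kernel was actually
    # invoked is not verified.

    def setUp(self):
        self.link = L.BatchNormalization(3)
        self.x = np.random.uniform(-1, 1, (2, 3, 4, 4)).astype(np.float32)

    @attr.cudnn
    def test_forward_gpu(self):
        y_cpu = self.link(self.x).array
        self.link.to_gpu()
        with chainer.using_config('use_cudnn', 'always'):
            y_gpu = self.link(cuda.to_gpu(self.x)).array
        testing.assert_allclose(y_cpu, cuda.to_cpu(y_gpu))
```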
[Masaki Kozuki, chainer] Currently, `F.vstack` (and similar stack functions such as `F.dstack`) can take an `ndarray` as input, while they assume `xs` to be a list of `ndarray`s. This is because they don't check the type of `xs`. Is this expected?
```
In : x = [np.random.randn(1, 2) for _ in range(3)]

In : F.dstack(x)
Out:
variable([[[ 1.06418134, -1.1030954 , -1.77550052],
           [ 0.91533154,  1.22747268, -0.84523645]]])

In : F.dstack(np.asarray(x))
Out:
variable([[[ 1.06418134, -1.1030954 , -1.77550052],
           [ 0.91533154,  1.22747268, -0.84523645]]])
```
```
In : import numpy as np, chainer, chainer.links as L

In : chainer.print_runtime_info()
Platform: Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid
Chainer: 6.0.0rc1
NumPy: 1.15.4
CuPy:
  CuPy Version          : 6.0.0rc1
  CUDA Root             : /usr/local/cuda-9.2
  CUDA Build Version    : 9020
  CUDA Driver Version   : 10000
  CUDA Runtime Version  : 9020
  cuDNN Build Version   : 7004
  cuDNN Version         : 7004
  NCCL Build Version    : None
  NCCL Runtime Version  : None
iDeep: 2.0.0.post3

In : D = chainer.mixed16

In : initialW, initial_bias = np.random.uniform(-1, 1, (20, 10)).astype(D), np.random.uniform(-1, 1, (20,)).astype(D)

In : linear = L.Linear(10, 20, initialW=initialW, initial_bias=initial_bias)

In : linear.W.dtype, linear.b.dtype
Out: (dtype('float32'), dtype('float32'))
```
`chainer.initializers.Constant` ignores dtype [here](https://github.com/chainer/chainer/blob/master/chainer/initializers/init.py#L96-L104) and [here](https://github.com/chainer/chainer/blob/master/chainer/initializers/constant.py#L57).
Initializers support too many patterns. So, if the fill value can be used as a parameter as-is, we should treat it separately. One example is implementing a class method like `L.Convolution2D.from_params(cls, W, b=None, stride=1, pad=0, groups=1, dilate=1)`, because if a fill value can be a parameter as-is, `in_size`, `out_size`, and equivalent arguments are redundant.
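A hypothetical sketch of the `from_params` idea for `L.Linear` (not part of the current API; the constructor is used only to build the link, and the given arrays are then assigned as-is so their dtype and shape are preserved):

```python
import numpy as np

import chainer.links as L


def linear_from_params(W, b=None):
    # in_size / out_size are derived from W, so they need not be passed.
    out_size, in_size = W.shape
    link = L.Linear(in_size, out_size, nobias=(b is None))
    link.W.array = W            # use the arrays as-is: dtype is preserved
    if b is not None:
        link.b.array = b
    return link


W = np.random.uniform(-1, 1, (20, 10)).astype(np.float16)
b = np.random.uniform(-1, 1, (20,)).astype(np.float16)
linear = linear_from_params(W, b)
# linear.W.dtype and linear.b.dtype stay float16 here, unlike the
# initialW / initial_bias path above, which ends up float32.
```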
[Seiya Tokui, chainer] 1. That sounds good for single-device models. I'm not sure what the desirable behavior is for multi-device models (model parallelism). One idea is to let the user write a mapping of devices (e.g., for moving a CPU/GPU hybrid model to multi-GPU, write a mapping like cpu->gpu1, gpu0->gpu0). I think such a case is currently rare, so just starting with `cast(dtype)` for single-device models is also OK.
Could you make issues for them? I think both ideas are reasonable to implement.
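A minimal sketch of what `cast(dtype)` could look like for single-device models (a hypothetical helper, not part of the current API; persistent values and the multi-device mapping are left out):

```python
import numpy as np

import chainer.links as L


def cast_params(link, dtype):
    # Convert every initialized parameter of the link in place.
    for param in link.params(include_uninit=False):
        param.array = param.array.astype(dtype, copy=False)
    return link


linear = L.Linear(10, 20)
cast_params(linear, np.float16)
assert linear.W.dtype == np.float16 and linear.b.dtype == np.float16
```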