[nishino, chainer] To me, this comment looks worth reconsidering.
https://github.com/chainer/chainer/pull/2918#pullrequestreview-104821928
For this point, we can just override namedlinks()
and skip dummy links.
https://github.com/chainer/chainer/pull/2918#issuecomment-375222516
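For reference, a minimal sketch of that idea, assuming the dummy links can be recognized somehow; the _is_dummy marker below is hypothetical and would depend on how the dummy links are actually created:

import chainer

class MyChain(chainer.Chain):

    def namedlinks(self, skipself=False):
        # Delegate to the normal traversal, then filter out dummy links.
        for name, link in super(MyChain, self).namedlinks(skipself):
            if getattr(link, '_is_dummy', False):  # hypothetical marker
                continue
            yield name, link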
[Masaki Kozuki, chainer] Question about how to use attr.cudnn in links tests.
The current tests of L.BatchNormalization related to GPU/cuDNN (https://github.com/chainer/chainer/blob/master/tests/chainer_tests/links_tests/normalization_tests/test_batch_normalization.py#L119-L122) seem to contradict those of L.Convolution2D (https://github.com/chainer/chainer/blob/master/tests/chainer_tests/links_tests/connection_tests/test_convolution_2d.py#L166-L169). The former tests seem to check whether link.forward invokes the CuPy implementation when cuDNN is available, while the latter tests seem to check whether link.forward correctly invokes the cuDNN implementation if cuDNN is available.
I know there will be a LinkTestCase that will ease writing tests for links in a unified way, though. *Which is more appropriate?*
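For concreteness, the two styles look roughly like this (paraphrased, not copied from the tests; the patch target cupy.cudnn.convolution_forward is my assumption about how F.convolution_2d dispatches and may not match what the real tests patch):

import mock  # or: from unittest import mock
import numpy as np
import chainer
import chainer.links as L
from chainer import cuda

link = L.Convolution2D(3, 8, 3).to_gpu()
x = cuda.to_gpu(np.random.randn(1, 3, 10, 10).astype(np.float32))

# Convolution2D style: assert that the cuDNN code path is actually taken.
with chainer.using_config('use_cudnn', 'always'):
    with mock.patch('cupy.cudnn.convolution_forward') as func:  # assumed target
        link(x)
assert func.called

# BatchNormalization style: just run forward on GPU with cuDNN enabled and
# check the numerical result, without asserting which kernel was used.
with chainer.using_config('use_cudnn', 'always'):
    y = link(x)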
[Masaki Kozuki, chainer] Currently, F.concat, F.dstack, F.hstack, F.stack, and F.vstack can take an ndarray as input, while they assume xs to be a list of Variables or ndarrays. This is because they don't check isinstance(xs, list).
Is this expected?
e.g.
In [12]: x = [np.random.randn(1, 2) for _ in range(3)]
In [13]: F.dstack(x)
Out[13]:
variable([[[ 1.06418134, -1.1030954 , -1.77550052],
[ 0.91533154, 1.22747268, -0.84523645]]])
In [14]: F.dstack(np.asarray(x))
Out[14]:
variable([[[ 1.06418134, -1.1030954 , -1.77550052],
[ 0.91533154, 1.22747268, -0.84523645]]])
In [1]: import numpy as np, chainer, chainer.links as L
In [2]: chainer.print_runtime_info()
Platform: Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid
Chainer: 6.0.0rc1
NumPy: 1.15.4
CuPy:
CuPy Version : 6.0.0rc1
CUDA Root : /usr/local/cuda-9.2
CUDA Build Version : 9020
CUDA Driver Version : 10000
CUDA Runtime Version : 9020
cuDNN Build Version : 7004
cuDNN Version : 7004
NCCL Build Version : None
NCCL Runtime Version : None
iDeep: 2.0.0.post3
In [3]: D = chainer.mixed16
In [4]: initialW, initial_bias = np.random.uniform(-1, 1, (20, 10)).astype(D), np.random.uniform(-1, 1, (20,)).astype(D)
In [5]: linear = L.Linear(10, 20, initialW=initialW, initial_bias=initial_bias)
In [6]: linear.W.dtype, linear.b.dtype
Out[6]: (dtype('float32'), dtype('float32'))
chainer.initializers.Constant ignores dtype here: https://github.com/chainer/chainer/blob/master/chainer/initializers/init.py#L96-L104 and https://github.com/chainer/chainer/blob/master/chainer/initializers/constant.py#L57. There is also the question of how CHAINER_DTYPE and initializer.dtype should interact.
Initializers support too many patterns. So, if the fill_value can be a parameter as is, we should treat it separately. One example is implementing a class method like L.Convolution2D.from_params(cls, W, b=None, stride=1, pad=0, groups=1, dilate=1), because if a fill_value can be a parameter as is, in_size, out_size, and equivalent arguments are redundant.
[Seiya Tokui, chainer] 1. That sounds good for single-device models. I'm not sure what the desirable behavior is for multi-device models (model parallelism). One idea is to let the user write the mapping of devices (e.g., for moving a CPU/GPU hybrid model to multi-GPU, write a mapping like cpu->gpu1, gpu0->gpu0). I think such a case is currently rare, so just starting from cast(dtype) for single-device models is also OK.
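Returning to the from_params idea above, here is a minimal sketch of one possible reading of it, assuming the intent is to infer the size arguments from W.shape and keep the given arrays (and their dtype) as-is; this is an illustration, not an agreed design:

import chainer.links as L

class Convolution2D(L.Convolution2D):

    @classmethod
    def from_params(cls, W, b=None, stride=1, pad=0, groups=1, dilate=1):
        # Infer the size arguments from W, which is why in_size/out_size-like
        # arguments become redundant.
        out_channels, in_channels_per_group, kh, kw = W.shape
        link = cls(in_channels_per_group * groups, out_channels, (kh, kw),
                   stride=stride, pad=pad, nobias=(b is None),
                   groups=groups, dilate=dilate)
        # Assign the arrays as-is so their dtype is preserved, sidestepping
        # the initializer path that casts to float32 as shown above.
        link.W.array = W
        if b is not None:
            link.b.array = b
        return link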
Could you make issues for them? I think both ideas are reasonable to implement.
[Masaki Kozuki, chainer] So, I tentatively filed an issue that describes the background of the changes I want to add.
chainer/chainer#7040
I'll add detailed issues for my two ideas.
[Masaki Kozuki, chainer] 1. As to CIs: I got your point that the details keep changing. IMO, though, GitHub's wiki is a more appropriate place to write such frequently changing information, as long as the README links to the wiki, since updating the wiki does not require filing PRs.
2. It is marked experimental since CuPy does not support multi-head attention yet. But the original PR got no reaction for a while even though it has an assignee. I initially submitted the PR as WIP (a draft PR), and the assignee may not have noticed that I set it ready for review. Since GitHub officially supports draft PRs, we should have some rules about them; I mean, whether to mention the assignee when one marks a PR as ready for review. Thanks.
n_{plural of noun}, because it seems to be the most frequently used at a glance.
It would be nice to have model.summary() like that of Keras, which prints a summary representation of the model, or simply print(model) like that of PyTorch, which gives some idea of the different layers involved and their specifications. There is graphviz, but it involves a number of steps to be performed. Something like a quick summary would definitely make it easier to analyze the network.
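A quick sketch of what such a helper could look like on top of the existing Link API; namedparams() is real Chainer API, while the function name and output format here are just one possible design:

import chainer

def summary(model):
    # Walk every registered parameter and report shape/dtype, plus a total.
    total = 0
    for name, param in sorted(model.namedparams()):
        if param.array is None:
            print('{}: (uninitialized)'.format(name))
            continue
        print('{}: shape={}, dtype={}'.format(
            name, param.array.shape, param.array.dtype))
        total += param.array.size
    print('total parameters: {}'.format(total))

For example, summary(L.Linear(10, 20)) would print the shapes and dtypes of /W and /b followed by the total parameter count.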