for example, the linear layer has:
impl<B: IBackend + LayerOps<f32>> ComputeInputGradient<f32, B> for Linear {
    fn compute_input_gradient(&self,
                              backend: &B,
                              weights_data: &[&SharedTensor<f32>],
                              output_data: &[&SharedTensor<f32>],
                              output_gradients: &[&SharedTensor<f32>],
                              input_data: &[&SharedTensor<f32>],
                              input_gradients: &mut [&mut SharedTensor<f32>]) {
        // Gradient with respect to input data
        backend.gemm(&self.one,
                     Transpose::NoTrans,
                     output_gradients[0],
                     Transpose::NoTrans,
                     weights_data[0],
                     &self.zero,
                     input_gradients[0])
               .unwrap();
    }
}

impl<B: IBackend + LayerOps<f32>> ComputeParametersGradient<f32, B> for Linear {
    fn compute_parameters_gradient(&self,
                                   backend: &B,
                                   output_data: &[&SharedTensor<f32>],
                                   output_gradients: &[&SharedTensor<f32>],
                                   input_data: &[&SharedTensor<f32>],
                                   parameters_gradients: &mut [&mut SharedTensor<f32>]) {
        // gradient w.r.t. weights
        backend.gemm(&self.one,
                     Transpose::Trans,
                     output_gradients[0],
                     Transpose::NoTrans,
                     input_data[0],
                     &self.zero,
                     parameters_gradients[0])
               .unwrap();
    }
}
I am confused about computing the gradients here, since there are no derivatives of the activation functions.
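For reference, a sketch of what the two gemm calls above compute, assuming the forward pass is Y = X·Wᵀ with W stored as [output_size, input_size] (an assumption read off the Transpose arguments, not stated in the snippet):

% first gemm (NoTrans, NoTrans): gradient w.r.t. the input
\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y}\, W

% second gemm (Trans, NoTrans): gradient w.r.t. the weights
\frac{\partial L}{\partial W} = \left(\frac{\partial L}{\partial Y}\right)^{T} X

Neither expression contains an activation derivative, because the bare linear map Y = X·Wᵀ only has Jacobians W (w.r.t. X) and X (w.r.t. W); if an activation follows the Linear layer, its derivative is presumably already folded into output_gradients[0] by that layer's own backward pass.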
Tensor2<Const<2>, Dynamic>