Louis Jean
@Bycob
Do you have a DD call? (service creation + predict call)
cchadowitz-pf
@cchadowitz-pf
curl -X PUT "http://localhost:8081/services/openimages" -d '{
  "mllib":"torch",
  "description":"test",
  "type":"supervised",
  "parameters":{
    "input":{"connector":"image"},
    "mllib":{"nclasses":6012}
  },
  "model":{"repository":"/opt/models/"}
}' | jq
no predict call since it errors at the service creation
Louis Jean
@Bycob
ok
cchadowitz-pf
@cchadowitz-pf
let me rebuild once more to make sure there isn't something weird happening
cchadowitz-pf
@cchadowitz-pf
yeah same error
Louis Jean
@Bycob
It looks like it's an exception from torch::jit::load, so maybe you can reproduce it with your minimal example
DD should not be crashing though, I will check this
cchadowitz-pf
@cchadowitz-pf
what version of torch does DD use? perhaps it's a version mismatch
i used 1.12.0+cu113 from google colab to save the scripted model

It looks like it's an exception from torch::jit::load, so maybe you can reproduce it with your minimal example

my minimal example doesn't have any problems, but i'm using libtorch-1.12.0+cpu in that example

cchadowitz-pf
@cchadowitz-pf
okay i'll try saving the scripted model with v1.11 and see what happens
cchadowitz-pf
@cchadowitz-pf
definitely a version mismatch issue. the model loads without a problem when it was scripted+saved with torch==1.11.0 and torchvision==0.12.0 (instead of torch==1.12.0 and torchvision==0.13.0)
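A hedged way to think about the failure: a TorchScript file is tied to the torch release that produced it, and loading it in an older libtorch can fail. The helper below encodes that as a simple saved <= runtime rule of thumb (an illustrative assumption, not an official torch compatibility guarantee; the version strings come from the messages above):

```python
# Illustrative rule of thumb: a TorchScript file saved with a newer torch
# release may fail to load in an older libtorch runtime, so require
# saved_version <= runtime_version. Not an official torch guarantee.
def jit_load_compatible(saved: str, runtime: str) -> bool:
    # Drop local build tags like "+cu113" and compare numeric components.
    parse = lambda v: tuple(int(x) for x in v.split("+")[0].split("."))
    return parse(saved) <= parse(runtime)

print(jit_load_compatible("1.12.0+cu113", "1.11.0"))  # False: 1.12 model, 1.11 runtime
print(jit_load_compatible("1.11.0", "1.11.0"))        # True
```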
cchadowitz-pf
@cchadowitz-pf
sorry @Bycob but I'm running into a different error now :sweat_smile: I've now gotten the v1.11.0 scripted model loaded successfully but the predict call is producing this error
[2022-07-18 13:34:12.916] [openimages] [error] mllib internal error: Libtorch error:Dimension out of range (expected to be in range of [-1, 0], but got 1)
Exception raised from maybe_wrap_dim at /deepdetect/build/pytorch/src/pytorch/c10/core/WrapDimMinimal.h:25 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f55a6186dfc in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc6b58a (0x7f55a6e3158a in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #2: at::meta::structured__softmax::meta(at::Tensor const&, long, bool) + 0x37 (0x7f55a79ccbf7 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x206dde5 (0x7f55a8233de5 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x206de6c (0x7f55a8233e6c in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #5: at::_ops::_softmax::redispatch(c10::DispatchKeySet, at::Tensor const&, long, bool) + 0xd4 (0x7f55a8069de4 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x3cb3dfe (0x7f55a9e79dfe in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x3cb42cf (0x7f55a9e7a2cf in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::_softmax::call(at::Tensor const&, long, bool) + 0x144 (0x7f55a80d8114 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #9: at::native::softmax(at::Tensor const&, long, c10::optional<c10::ScalarType>) + 0xa6 (0x7f55a79cd5f6 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x227db2b (0x7f55a8443b2b in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #11: at::_ops::softmax_int::call(at::Tensor const&, long, c10::optional<c10::ScalarType>) + 0x14d (0x7f55a80c712d in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x382006 (0x55c89956e006 in ./main/dede)
frame #13: <unknown function> + 0x25567e (0x55c89944167e in ./main/dede)
frame #14: <unknown function> + 0x255a08 (0x55c899441a08 in ./main/dede)
frame #15: <unknown function> + 0x255d28 (0x55c899441d28 in ./main/dede)
frame #16: <unknown function> + 0x256048 (0x55c899442048 in ./main/dede)
frame #17: <unknown function> + 0x256368 (0x55c899442368 in ./main/dede)
frame #18: <unknown function> + 0x256688 (0x55c899442688 in ./main/dede)
frame #19: <unknown function> + 0x2569a8 (0x55c8994429a8 in ./main/dede)
frame #20: <unknown function> + 0x256cc8 (0x55c899442cc8 in ./main/dede)
frame #21: <unknown function> + 0x256fe8 (0x55c899442fe8 in ./main/dede)
frame #22: <unknown function> + 0x257308 (0x55c899443308 in ./main/dede)
frame #23: <unknown function> + 0x257628 (0x55c899443628 in ./main/dede)
frame #24: <unknown function> + 0x257d66 (0x55c899443d66 in ./main/dede)
frame #25: <unknown function> + 0x513418 (0x55c8996ff418 in ./main/dede)
frame #26: <unknown function> + 0x203994 (0x55c8993ef994 in ./main/dede)
frame #27: <unknown function> + 0x1b41e2 (0x55c8993a01e2 in ./main/dede)
frame #28: <unknown function> + 0x874f66 (0x55c899a60f66 in ./main/dede)
frame #29: <unknown function> + 0x875ae2 (0x55c899a61ae2 in ./main/dede)
frame #30: <unknown function> + 0x879ca0 (0x55c899a65ca0 in ./main/dede)
frame #31: <unknown function> + 0xd6de4 (0x7f55a4e35de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #32: <unknown function> + 0x8609 (0x7f55a4b96609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #33: clone + 0x43 (0x7f55a4abb133 in /lib/x86_64-linux-gnu/libc.so.6)
this is happening after the _module.forward() call, so progress!
oh i wonder if the output of my model doesn't match what DD expects....
cchadowitz-pf
@cchadowitz-pf
is there any documentation about that for the DD/Torch integration - what the output format should be like from the model?
cchadowitz-pf
@cchadowitz-pf
ok i can confirm that error is coming from https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1501, which supports my theory that my model output is different from what DD expects
cchadowitz-pf
@cchadowitz-pf
in my model's forward() method, I'm now returning torch.reshape(y_pred, (1, -1)) instead of simply y_pred, and that resolved the error. It seems that DD expects the output tensor to have dimensions (1, N) instead of simply a length-N vector.
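The shape requirement can be sketched without torch — numpy stands in for the tensor ops here, and a reduction over dim 1 stands in for the softmax applied in post-processing (illustrative, not DD's actual code):

```python
import numpy as np

y = np.array([0.2, 0.5, 0.3])   # flat length-N output, shape (3,)
try:
    y.max(axis=1)               # any reduction over dim 1, as a softmax(dim=1) would do
except IndexError:
    print("dim 1 is out of range for a 1-D output")

y2 = y.reshape(1, -1)           # (1, N): dim 1 now exists
print(y2.shape)                 # (1, 3)
```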
Emmanuel Benazera
@beniz
What's your model, a single-label classifier, or maybe multi-label?
cchadowitz-pf
@cchadowitz-pf
it's actually the old OpenImagesInceptionV3 model, so multi-label
I seem to have it working except that the output probabilities are not equal between my minimal example and DD. When I use torch::ones({1, 3,299,299}) as input they're identical, so I wonder if there's something different between how I'm reading/preprocessing the image vs how it's done in DD
Louis Jean
@Bycob
I think DD expects the model to return the logits, which are post-processed afterward
So the dimensions of the output tensor should be batch_size x num_classes
Louis Jean
@Bycob
Hmm, if your model is multi-label you may not want the softmax on top
I think that's something to be added to DD, can we get the model to try it on our side?
cchadowitz-pf
@cchadowitz-pf
sure, at what stage do you want the model? It's the original OpenImagesInceptionV3 for TensorFlow (from here https://www.deepdetect.com/models/tf/) that I then converted with MMDNN to PyTorch. Then I scripted it to use with LibTorch/DD
at least, I believe the OpenImagesInceptionV3 model is multi-label
the last few layers look like this:
        InceptionV3_Logits_Conv2d_1c_1x1_convolution = self.InceptionV3_Logits_Conv2d_1c_1x1_convolution(InceptionV3_Logits_AvgPool_1a_8x8_AvgPool)
        InceptionV3_Logits_SpatialSqueeze = torch.squeeze(InceptionV3_Logits_Conv2d_1c_1x1_convolution)
        multi_predictions = F.sigmoid(InceptionV3_Logits_SpatialSqueeze)
        return multi_predictions
so that seems to suggest you're correct - it's multi-label and it's already applying a sigmoid at the end so a softmax on top of that probably isn't ideal
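Why stacking a softmax on sigmoid outputs is problematic can be sketched in plain Python (the sigmoid values below are hypothetical):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a plain list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Independent per-label probabilities out of a sigmoid (hypothetical values):
sig = [0.9, 0.8, 0.1]
sm = softmax(sig)
# softmax renormalizes so the scores sum to 1, which destroys the independent
# per-label interpretation: two labels can no longer both be "very likely".
print(round(sum(sm), 6))   # 1.0
print(max(sm) < max(sig))  # True
```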
cchadowitz-pf
@cchadowitz-pf
also I tried removing the softmax from DD to compare the output of my minimal example vs DD (without softmax) and the values are still off for some reason. trying to dig deeper to see if they diverge before any forward() (i.e. b/c of image preprocessing) or otherwise
cchadowitz-pf
@cchadowitz-pf

in my minimal example, i see this in the input tensor just before calling forward() (just looking at the first 10 values as a quick comparison):

Sizes: [1, 3, 299, 299]
before forward() inputs[0].toTensor()[0][0][0].slice(0, 0, 10):  
 20
 20
 21
 21
 20
 20
 15
  8
  8
  8

in DD, I see this:

in_vals[0].toTensor().size(): [1, 3, 299, 299]
before forward() in_vals[0].toTensor()[0][0][0].slice(0, 0, 10):  
 20
 20
 21
 21
 21
 21
 16
  9
  8
  8
My minimal code (excluding the model loading)
        cv::Mat img = cv::imread(argv[2]);
        cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
        cv::resize(img, img, cv::Size(299, 299), 0, 0, cv::INTER_CUBIC);
        at::Tensor tensor_image = torch::from_blob(img.data, {  img.rows, img.cols, img.channels() }, at::kByte);
        tensor_image = tensor_image.to(at::kFloat);
        tensor_image = tensor_image.permute({ 2, 0, 1 });
        std::vector<torch::jit::IValue> inputs;
        tensor_image = torch::unsqueeze(tensor_image, 0);
        inputs.push_back({tensor_image});
        std::cout << "Sizes: " << inputs[0].toTensor().sizes() << std::endl;
        std::cout << "before forward() inputs[0].toTensor()[0][1][0].slice(0, 0, 10): " << inputs[0].toTensor()[0][1][0].slice(0, 0, 10) << std::endl;
        at::Tensor output = module.forward(inputs).toTensor();
Emmanuel Benazera
@beniz
@cchadowitz-pf hi! (I'm on EDT this week while @Bycob is CET), if you can share the weights (since this is a public model), I believe we can add the functionality. We haven't played with multi-label models with the torch backend since we have had no need for it yet. This is why the torch backend is not always on par with caffe/tf.
cchadowitz-pf
@cchadowitz-pf
hi @beniz - welcome to EDT! :grinning: would it be more helpful to share the .pth for pytorch or the .pt for libtorch? (or both)
Emmanuel Benazera
@beniz
both ? thanks! (Baltimore!)
cchadowitz-pf
@cchadowitz-pf
also i see that in torch::from_blob i'm using at::kByte whereas DD uses at::ScalarType::Byte - any chance that's causing any differences?
Emmanuel Benazera
@beniz
I don't know, @Bycob may know. It's not immediately clear from libtorch's doc.
cchadowitz-pf
@cchadowitz-pf
doesn't seem like it made any difference in my example
cchadowitz-pf
@cchadowitz-pf
aha
when i remove cv::cvtColor(img, img, cv::COLOR_BGR2RGB); from my minimal example it appears i get the same exact output from my example vs DD
is the rgb param in DD new? I'm surprised to see in the API that it defaults to false as I thought only OpenCV uses BGR and so caffe, tf, torch, and everything else in DD would need RGB anyways 🤔
Emmanuel Benazera
@beniz
yes rgb is new because most torch models are RGB whereas caffe defaulted to BGR due to opencv
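The BGR/RGB difference boils down to reversing the channel axis; a small numpy sketch of the swap that cv::cvtColor(..., COLOR_BGR2RGB) performs in the minimal example above:

```python
import numpy as np

# OpenCV decodes images in BGR channel order; most torch models expect RGB.
# Reversing the last axis of an HWC image swaps B and R.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255            # fill the blue channel
rgb = bgr[..., ::-1]
print(bgr[0, 0].tolist())    # [255, 0, 0]
print(rgb[0, 0].tolist())    # [0, 0, 255]
```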
cchadowitz-pf
@cchadowitz-pf
ahh interesting okay - and in this case it's a tensorflow model converted to torch and i believe tensorflow is also RGB normally, right?
now i'm trying to work out if this model expects it as BGR or RGB 😅
Emmanuel Benazera
@beniz
usually it only changes the labels a bit, not drastically
the initial model may have been rgb
cchadowitz-pf
@cchadowitz-pf
it's the model I got from here years ago: https://www.deepdetect.com/models/tf/openimages_inception_v3/
I don't know if you happen to recall at this point if it was rgb or bgr..... :)
Emmanuel Benazera
@beniz
you may deduce it from the DD API call...
cchadowitz-pf
@cchadowitz-pf
which call?