Louis Jean
@Bycob
yes
cchadowitz-pf
@cchadowitz-pf
traced module forward - source[0].toTensor().dim(): 4
traced module forward - source[0].toTensor().sizes(): [1, 3, 299, 299]
so for some reason there are 4 dims for the first (and only, in this case) image in the source std::vector
Louis Jean
@Bycob
The first dimension is the batch size
cchadowitz-pf
@cchadowitz-pf
and i imagine the error is because then source has dim 5?
i thought source was the batch, not source[0]?
Louis Jean
@Bycob
It can be a bit confusing: sometimes models take multiple tensors as input, so source contains each argument as a batched tensor
For example, when training detection models you need to pass both the images and the labels, so source[0] contains all the images of the batch and source[1] contains all the labels
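To illustrate, source in that case would look roughly like this (a sketch, shapes made up):

    import torch

    batch_size = 16
    images = torch.rand(batch_size, 3, 299, 299)   # source[0]: all images of the batch
    labels = torch.randint(0, 10, (batch_size,))   # source[1]: all labels of the batch
    source = [images, labels]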
cchadowitz-pf
@cchadowitz-pf
ah right
so in my minimal working example, i'm doing something like at::Tensor output = module.forward({tensor_image}).toTensor();, where tensor_image has size [3, 299, 299], so .forward() is being passed a vector whose first and only element has size [3, 299, 299]
(i think)

in python my model had this forward() method:

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            x = torch.unsqueeze(self.transforms(x), 0)
            y_pred = self.model(x)
            return y_pred

so do I need to rewrite that to accept a batch of Tensors as input?

Louis Jean
@Bycob
If you can do that, I think this is the way to go. When torch models take tensors as input, it's a general convention that the first dimension is the batch size. So we kept that convention in dd as well
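Concretely, something like this (a minimal sketch):

    import torch

    x = torch.rand(3, 299, 299)   # a single channels-first image, no batch dimension
    batch = x.unsqueeze(0)        # shape [1, 3, 299, 299]: a batch of one
    # the model is then called on `batch`, and the first dim of its output is also the batch size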
cchadowitz-pf
@cchadowitz-pf
that makes sense - it appears that this line x = torch.unsqueeze(self.transforms(x), 0) is causing the issue. At this point I'm not sure why I have it in there. Perhaps I was incorrectly passing a single tensor in when I was first experimenting, and "solved" it by manually adding a dimension inside .forward()
Louis Jean
@Bycob
It makes sense!
cchadowitz-pf
@cchadowitz-pf
hmm i'm getting an error on loading the (new) traced model now
full error log:
[2022-07-18 11:46:29.437] [openimages] [error] unable to load /opt/models/traced_openimages_model.pt
terminate called after throwing an instance of 'c10::Error'
  what():  isTuple()INTERNAL ASSERT FAILED at "/deepdetect/build/pytorch/src/pytorch/aten/src/ATen/core/ivalue_inl.h":1916, please report a bug to PyTorch. Expected Tuple but got String
Exception raised from toTupleRef at /deepdetect/build/pytorch/src/pytorch/aten/src/ATen/core/ivalue_inl.h:1916 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f9c08499dfc in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x8f (0x7f9c0844de2c in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7f9c084976b3 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #3: <unknown function> + 0x4644d61 (0x7f9c0cb1dd61 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x4644e79 (0x7f9c0cb1de79 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7f9c09a4c778 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7f9c09a326e3 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x1bc24d (0x563fb116f24d in ./main/dede)
frame #8: <unknown function> + 0x11fedd (0x563fb10d2edd in ./main/dede)
frame #9: <unknown function> + 0x26912d (0x563fb121c12d in ./main/dede)
frame #10: <unknown function> + 0x51930b (0x563fb14cc30b in ./main/dede)
frame #11: <unknown function> + 0x203684 (0x563fb11b6684 in ./main/dede)
frame #12: <unknown function> + 0x1b41e2 (0x563fb11671e2 in ./main/dede)
frame #13: <unknown function> + 0x874fd6 (0x563fb1827fd6 in ./main/dede)
frame #14: <unknown function> + 0x875b52 (0x563fb1828b52 in ./main/dede)
frame #15: <unknown function> + 0x879d10 (0x563fb182cd10 in ./main/dede)
frame #16: <unknown function> + 0xd6de4 (0x7f9c07148de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #17: <unknown function> + 0x8609 (0x7f9c06ea9609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #18: clone + 0x43 (0x7f9c06dce133 in /lib/x86_64-linux-gnu/libc.so.6)

 0# dd::OatppJsonAPI::abort(int) at /deepdetect/src/oatppjsonapi.cc:325
 1# 0x00007F9C06CF2090 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise at ../sysdeps/unix/sysv/linux/raise.c:51
 3# abort at /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81
 4# 0x00007F9C07110911 in /lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F9C0711C38C in /lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F9C0711B369 in /lib/x86_64-linux-gnu/libstdc++.so.6
 7# __gxx_personality_v0 in /lib/x86_64-linux-gnu/libstdc++.so.6
 8# 0x00007F9C06ED4BEF in /lib/x86_64-linux-gnu/libgcc_s.so.1
 9# _Unwind_Resume in /lib/x86_64-linux-gnu/libgcc_s.so.1
10# torch::jit::ConcreteSourceRangeUnpickler::unpickle() [clone .cold] in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
11# torch::jit::ConcreteSourceRangeUnpickler::findSourceRangeThatGenerated(torch::jit::SourceRange const&) in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
12# torch::jit::SourceRange::highlight(std::ostream&) const [clone .localalias] in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
13# torch::jit::ErrorReport::what() const in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
14# boost::exception_detail::diagnostic_information_impl[abi:cxx11](boost::exception const*, std::exception const*, bool, bool) at /usr/include/boost/exception/diagnostic_information.hpp:131
15# boost::current_exception_diagnostic_information[abi:cxx11](bool) at /usr/include/boost/exception/diagnostic_information.hpp:47
16# dd::Services::add_service(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mapbox::util::variant<dd::MLService<dd::CaffeLib, dd::ImgCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVTSCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::TxtCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::SVMCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::ImgCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVTSCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::TxtCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::SVMCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::TorchLib, dd::ImgTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::VideoTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::TxtTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::CSVTSTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel> >&&, dd::APIData const&) at /deepdetect/src/services.h:429
17# dd::JsonAPI::service_create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /deepdetect/src/jsonapi.cc:718
18# DedeController::Z__PROXY_METHOD_update_service(std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&) at /deepdetect/src/http/controller.hpp:132
19# oatpp::web::server::api::ApiController::Handler<DedeController>::handle(std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&) at /deepdetect/build/oatpp/src/oatpp/src/oatpp/web/server/api/ApiController.hpp:300
20# oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&, std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&, oatpp::web::protocol::http::utils::CommunicationUtils::ConnectionState&) in ./main/dede
21# oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&) in ./main/dede
22# oatpp::web::server::HttpProcessor::Task::run() in ./main/dede
23# 0x00007F9C07148DE4 in /lib/x86_64-linux-gnu/libstdc++.so.6
24# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
25# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Aborted (core dumped)
Louis Jean
@Bycob
ouch
what's the new model's code?
cchadowitz-pf
@cchadowitz-pf
class Predictor(nn.Module):
    def __init__(self):
        super().__init__()
        MainModel = imp.load_source('MainModel', '/content/drive/MyDrive/TFtoTorchConversion/OpenImages/openimages.py')
        self.model = torch.load('/content/drive/MyDrive/TFtoTorchConversion/OpenImages/openimages.pth')
        self.model.eval()
        self.transforms = nn.Sequential(
            T.Resize([299, 299]),
            T.ConvertImageDtype(torch.float)
        )
        with open('/content/drive/MyDrive/TFtoTorchConversion/OpenImages/corresp-utf8.txt') as f:
            self.labels = [' '.join(l.strip().split(' ')[1:]) for l in f.readlines()]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            x = self.transforms(x)
            y_pred = self.model(x)
            return y_pred
this is the only thing I changed: x = torch.unsqueeze(self.transforms(x), 0) is now x = self.transforms(x)
and when I trace it I'm doing something like:
import os, torch, torchvision

example = torchvision.io.read_image(os.path.join(data_path, exampleFile)).to('cpu')
predictor = Predictor().to('cpu')
traced_script_module = torch.jit.trace(predictor, torch.unsqueeze(example, 0))
(the torch.unsqueeze(example, 0) is also new, to account for the fact that I'm now passing a batch of 1 instead of a single image)
Louis Jean
@Bycob
Do you have a dd call? service creation + predict call
cchadowitz-pf
@cchadowitz-pf
curl -X PUT "http://localhost:8081/services/openimages" -d '{
  "mllib":"torch",
  "description":"test",
  "type":"supervised",
  "parameters":{
    "input":{"connector":"image"},
    "mllib":{ "nclasses":6012}
  },
  "model":{"repository":"/opt/models/"}
}'|jq
no predict call since it errors at the service creation
Louis Jean
@Bycob
ok
cchadowitz-pf
@cchadowitz-pf
let me rebuild once more to make sure there isn't something weird happening
cchadowitz-pf
@cchadowitz-pf
yeah same error
Louis Jean
@Bycob
It looks like it's an exception from torch::jit::load, so maybe you can reproduce it with your minimal example
DD should not be crashing though, I will check this
cchadowitz-pf
@cchadowitz-pf
what version of torch does DD use? perhaps it's a version mismatch
i used 1.12.0+cu113 from google colab to save the scripted model

> It looks like it's an exception from torch::jit::load, so maybe you can reproduce it with your minimal example

my minimal example doesn't have any problems, but i'm using libtorch-1.12.0+cpu in that example

cchadowitz-pf
@cchadowitz-pf
okay i'll try saving the scripted model with v1.11 and see what happens
cchadowitz-pf
@cchadowitz-pf
definitely a version mismatch issue. model loads without a problem when it was scripted+saved with torch==1.11.0 and torchvision==0.12.0 (instead of torch==1.12.0 and torchvision==0.13.0)
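for anyone following along, a quick sanity check before scripting is to confirm the scripting environment matches the libtorch that will load the model (a rough sketch, with the versions from my case):

    import torch, torchvision

    # script/save with the same version as the loading libtorch (1.11 here)
    assert torch.__version__.startswith('1.11'), torch.__version__
    assert torchvision.__version__.startswith('0.12'), torchvision.__version__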
cchadowitz-pf
@cchadowitz-pf
sorry @Bycob but I'm running into a different error now :sweat_smile: I've now gotten the v1.11.0 scripted model loaded successfully but the predict call is producing this error
[2022-07-18 13:34:12.916] [openimages] [error] mllib internal error: Libtorch error:Dimension out of range (expected to be in range of [-1, 0], but got 1)
Exception raised from maybe_wrap_dim at /deepdetect/build/pytorch/src/pytorch/c10/core/WrapDimMinimal.h:25 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f55a6186dfc in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc6b58a (0x7f55a6e3158a in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #2: at::meta::structured__softmax::meta(at::Tensor const&, long, bool) + 0x37 (0x7f55a79ccbf7 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x206dde5 (0x7f55a8233de5 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x206de6c (0x7f55a8233e6c in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #5: at::_ops::_softmax::redispatch(c10::DispatchKeySet, at::Tensor const&, long, bool) + 0xd4 (0x7f55a8069de4 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x3cb3dfe (0x7f55a9e79dfe in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x3cb42cf (0x7f55a9e7a2cf in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::_softmax::call(at::Tensor const&, long, bool) + 0x144 (0x7f55a80d8114 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #9: at::native::softmax(at::Tensor const&, long, c10::optional<c10::ScalarType>) + 0xa6 (0x7f55a79cd5f6 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x227db2b (0x7f55a8443b2b in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #11: at::_ops::softmax_int::call(at::Tensor const&, long, c10::optional<c10::ScalarType>) + 0x14d (0x7f55a80c712d in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x382006 (0x55c89956e006 in ./main/dede)
frame #13: <unknown function> + 0x25567e (0x55c89944167e in ./main/dede)
frame #14: <unknown function> + 0x255a08 (0x55c899441a08 in ./main/dede)
frame #15: <unknown function> + 0x255d28 (0x55c899441d28 in ./main/dede)
frame #16: <unknown function> + 0x256048 (0x55c899442048 in ./main/dede)
frame #17: <unknown function> + 0x256368 (0x55c899442368 in ./main/dede)
frame #18: <unknown function> + 0x256688 (0x55c899442688 in ./main/dede)
frame #19: <unknown function> + 0x2569a8 (0x55c8994429a8 in ./main/dede)
frame #20: <unknown function> + 0x256cc8 (0x55c899442cc8 in ./main/dede)
frame #21: <unknown function> + 0x256fe8 (0x55c899442fe8 in ./main/dede)
frame #22: <unknown function> + 0x257308 (0x55c899443308 in ./main/dede)
frame #23: <unknown function> + 0x257628 (0x55c899443628 in ./main/dede)
frame #24: <unknown function> + 0x257d66 (0x55c899443d66 in ./main/dede)
frame #25: <unknown function> + 0x513418 (0x55c8996ff418 in ./main/dede)
frame #26: <unknown function> + 0x203994 (0x55c8993ef994 in ./main/dede)
frame #27: <unknown function> + 0x1b41e2 (0x55c8993a01e2 in ./main/dede)
frame #28: <unknown function> + 0x874f66 (0x55c899a60f66 in ./main/dede)
frame #29: <unknown function> + 0x875ae2 (0x55c899a61ae2 in ./main/dede)
frame #30: <unknown function> + 0x879ca0 (0x55c899a65ca0 in ./main/dede)
frame #31: <unknown function> + 0xd6de4 (0x7f55a4e35de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #32: <unknown function> + 0x8609 (0x7f55a4b96609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #33: clone + 0x43 (0x7f55a4abb133 in /lib/x86_64-linux-gnu/libc.so.6)
this is happening after the _module.forward() call, so progress!
oh i wonder if the output of my model doesn't match what DD expects....
cchadowitz-pf
@cchadowitz-pf
is there any documentation about that for the DD/Torch integration - what the output format should be like from the model?
cchadowitz-pf
@cchadowitz-pf
ok i can confirm that error is coming from https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1501 which supports my theory that my model output is different from what DD expects
cchadowitz-pf
@cchadowitz-pf
in my model's forward() method, I'm now returning return torch.reshape(y_pred, (1, -1)) instead of simply return y_pred, and that resolved the error. It seems that DD expects the output Tensor to have dimensions (1, N) instead of a flat length-N vector.
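so the forward() now looks roughly like this:

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            x = self.transforms(x)
            y_pred = self.model(x)
            # DD expects (batch_size, num_classes), not a flat length-N vector
            return torch.reshape(y_pred, (1, -1))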
Emmanuel Benazera
@beniz
What's your model, a single-label classifier, or maybe multi-label?
cchadowitz-pf
@cchadowitz-pf
it's actually the old OpenImagesInceptionV3 model, so multi-label
I seem to have it working, except that the output probabilities differ between my minimal example and DD. When I use torch::ones({1, 3, 299, 299}) as input they're identical, so I wonder if there's a difference between how I'm reading/preprocessing the image and how it's done in DD
Louis Jean
@Bycob
I think DD expects the model to return the logits, which are post-processed afterward
So the dimensions of the output tensor should be batch_size x num_classes
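i.e. something like this (a sketch, using your 6012 classes):

    import torch

    logits = torch.rand(1, 6012)          # batch_size x num_classes
    probs = torch.softmax(logits, dim=1)  # the kind of post-processing applied on top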
Louis Jean
@Bycob
Hmm, if your model is multi-label you may not want the softmax on top
I think that's something to be added to DD; can we get the model to try it on our side?
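For multi-label, the usual choice would be an element-wise sigmoid instead of the softmax, e.g. (a rough sketch):

    import torch

    logits = torch.rand(1, 6012)   # batch_size x num_classes
    probs = torch.sigmoid(logits)  # independent per-class probabilities for multi-label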
cchadowitz-pf
@cchadowitz-pf
sure, at what stage do you want the model? It's the original OpenImagesInceptionV3 for TensorFlow (from here https://www.deepdetect.com/models/tf/) that I then converted to PyTorch with MMDNN. Then I scripted it to use with LibTorch/DD
at least, I believe the OpenImagesInceptionV3 model is multi-label