cchadowitz-pf
@cchadowitz-pf
for comparison, in my small libtorch test environment where I successfully used this model, I used the following:
        // load the TorchScript module exported from Python
        torch::jit::script::Module module = torch::jit::load(argv[1]);

        // read the image (OpenCV loads BGR) and convert to RGB
        cv::Mat img = cv::imread(argv[2]);
        cv::cvtColor(img, img, cv::COLOR_BGR2RGB);

        // wrap the HxWxC uint8 pixel buffer as a tensor, cast to float, reorder to CxHxW
        at::Tensor tensor_image = torch::from_blob(img.data, { img.rows, img.cols, img.channels() }, at::kByte);
        tensor_image = tensor_image.to(at::kFloat);
        tensor_image = tensor_image.permute({ 2, 0, 1 });

        // note: no explicit batch dimension is added here
        at::Tensor output = module.forward({tensor_image}).toTensor();
Emmanuel Benazera
@beniz
Certainly the permutation, cc @Bycob
cchadowitz-pf
@cchadowitz-pf
I've been trying to build DD with torchlib.cc changed to match how my minimal working example (outside DD) passes a tensor to the .forward() method, but I haven't yet been able to reconcile the two and identify what the issue is on the DD side...
Louis Jean
@Bycob
Images are loaded here in DD https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchdataset.cc#L890
It looks like we do the same operations as you
Maybe you can try to log the dimensions of the input tensor to see if it's similar to the one in your example?
cchadowitz-pf
@cchadowitz-pf
would it make sense to do that around here? https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1462
something like just outputting tensor.dim()?
Louis Jean
@Bycob
Yes, that way you will see exactly what the model is taking as input
cchadowitz-pf
@cchadowitz-pf
so at that point I'm getting dim 4; I'm also going to output in_vals[0].dim() at https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1472
Louis Jean
@Bycob
The traced module is called here https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchmodule.cc#L315
.dim() gives you the number of dimensions, you can call .sizes() to get all the dimensions
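(in the Python API the equivalent checks are .dim() and .shape; a minimal illustration with a dummy tensor:)

import torch

t = torch.rand(1, 3, 299, 299)  # a batch of one 3x299x299 image
print(t.dim())   # 4, the number of dimensions
print(t.shape)   # torch.Size([1, 3, 299, 299]), the size of each dimension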
cchadowitz-pf
@cchadowitz-pf
ah okay. and since source there is a vector of IValues, if I'm only testing with a single input image it still makes sense to do something like source[0].toTensor().sizes(), right?
Louis Jean
@Bycob
yes
cchadowitz-pf
@cchadowitz-pf
traced module forward - source[0].toTensor().dim(): 4
traced module forward - source[0].toTensor().sizes(): [1, 3, 299, 299]
so for some reason there are 4 dims for the first (and only, in this case) image in the source std::vector
Louis Jean
@Bycob
The first dimension is the batch size
cchadowitz-pf
@cchadowitz-pf
and i imagine the error is because then source has dim 5?
i thought source was the batch, not source[0]?
Louis Jean
@Bycob
It can be a bit confusing: sometimes models take multiple tensors as input, so source contains each argument as a batched tensor
For example, when training detection models you need to pass the images and the labels, so source[0] contains all the images of the batch and source[1] contains all the labels
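(a rough Python sketch of that layout; the shapes and label format here are made up for illustration:)

import torch

# each element of `source` corresponds to one forward() argument,
# already batched along dim 0
images = torch.rand(8, 3, 299, 299)    # source[0]: the 8 images of the batch
labels = torch.randint(0, 6012, (8,))  # source[1]: the 8 labels (simplified)
source = [images, labels]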
cchadowitz-pf
@cchadowitz-pf
ah right
so in my minimal working example, I'm doing something like at::Tensor output = module.forward({tensor_image}).toTensor(); where tensor_image has size [3, 299, 299], so .forward() is being passed a vector whose first and only element is [3, 299, 299]
(i think)

in python my model had this forward() method:

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            x = torch.unsqueeze(self.transforms(x), 0)
            y_pred = self.model(x)
            return y_pred

so do I need to rewrite that to accept a batch of Tensors as input?

Louis Jean
@Bycob
If you can do that I think this is the way to go. When torch models take tensors as input, it's a general convention that the first dimension is the batch size. So we kept that convention in dd as well
cchadowitz-pf
@cchadowitz-pf
that makes sense - it appears that this line x = torch.unsqueeze(self.transforms(x), 0) is causing the issue. at this point I'm not sure why I have that in there. perhaps I was incorrectly passing a single (unbatched) tensor in when I was first experimenting and incorrectly "solved" it by manually adding a dimension inside .forward()
Louis Jean
@Bycob
It makes sense!
cchadowitz-pf
@cchadowitz-pf
hmm i'm getting an error on loading the (new) traced model now
full error log:
[2022-07-18 11:46:29.437] [openimages] [error] unable to load /opt/models/traced_openimages_model.pt
terminate called after throwing an instance of 'c10::Error'
  what():  isTuple()INTERNAL ASSERT FAILED at "/deepdetect/build/pytorch/src/pytorch/aten/src/ATen/core/ivalue_inl.h":1916, please report a bug to PyTorch. Expected Tuple but got String
Exception raised from toTupleRef at /deepdetect/build/pytorch/src/pytorch/aten/src/ATen/core/ivalue_inl.h:1916 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f9c08499dfc in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x8f (0x7f9c0844de2c in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7f9c084976b3 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #3: <unknown function> + 0x4644d61 (0x7f9c0cb1dd61 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x4644e79 (0x7f9c0cb1de79 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7f9c09a4c778 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7f9c09a326e3 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x1bc24d (0x563fb116f24d in ./main/dede)
frame #8: <unknown function> + 0x11fedd (0x563fb10d2edd in ./main/dede)
frame #9: <unknown function> + 0x26912d (0x563fb121c12d in ./main/dede)
frame #10: <unknown function> + 0x51930b (0x563fb14cc30b in ./main/dede)
frame #11: <unknown function> + 0x203684 (0x563fb11b6684 in ./main/dede)
frame #12: <unknown function> + 0x1b41e2 (0x563fb11671e2 in ./main/dede)
frame #13: <unknown function> + 0x874fd6 (0x563fb1827fd6 in ./main/dede)
frame #14: <unknown function> + 0x875b52 (0x563fb1828b52 in ./main/dede)
frame #15: <unknown function> + 0x879d10 (0x563fb182cd10 in ./main/dede)
frame #16: <unknown function> + 0xd6de4 (0x7f9c07148de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #17: <unknown function> + 0x8609 (0x7f9c06ea9609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #18: clone + 0x43 (0x7f9c06dce133 in /lib/x86_64-linux-gnu/libc.so.6)

 0# dd::OatppJsonAPI::abort(int) at /deepdetect/src/oatppjsonapi.cc:325
 1# 0x00007F9C06CF2090 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise at ../sysdeps/unix/sysv/linux/raise.c:51
 3# abort at /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81
 4# 0x00007F9C07110911 in /lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F9C0711C38C in /lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F9C0711B369 in /lib/x86_64-linux-gnu/libstdc++.so.6
 7# __gxx_personality_v0 in /lib/x86_64-linux-gnu/libstdc++.so.6
 8# 0x00007F9C06ED4BEF in /lib/x86_64-linux-gnu/libgcc_s.so.1
 9# _Unwind_Resume in /lib/x86_64-linux-gnu/libgcc_s.so.1
10# torch::jit::ConcreteSourceRangeUnpickler::unpickle() [clone .cold] in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
11# torch::jit::ConcreteSourceRangeUnpickler::findSourceRangeThatGenerated(torch::jit::SourceRange const&) in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
12# torch::jit::SourceRange::highlight(std::ostream&) const [clone .localalias] in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
13# torch::jit::ErrorReport::what() const in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
14# boost::exception_detail::diagnostic_information_impl[abi:cxx11](boost::exception const*, std::exception const*, bool, bool) at /usr/include/boost/exception/diagnostic_information.hpp:131
15# boost::current_exception_diagnostic_information[abi:cxx11](bool) at /usr/include/boost/exception/diagnostic_information.hpp:47
16# dd::Services::add_service(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mapbox::util::variant<dd::MLService<dd::CaffeLib, dd::ImgCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVTSCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::TxtCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::SVMCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::ImgCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVTSCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::TxtCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::SVMCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::TorchLib, dd::ImgTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::VideoTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::TxtTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::CSVTSTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel> >&&, dd::APIData const&) at /deepdetect/src/services.h:429
17# dd::JsonAPI::service_create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /deepdetect/src/jsonapi.cc:718
18# DedeController::Z__PROXY_METHOD_update_service(std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&) at /deepdetect/src/http/controller.hpp:132
19# oatpp::web::server::api::ApiController::Handler<DedeController>::handle(std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&) at /deepdetect/build/oatpp/src/oatpp/src/oatpp/web/server/api/ApiController.hpp:300
20# oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&, std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&, oatpp::web::protocol::http::utils::CommunicationUtils::ConnectionState&) in ./main/dede
21# oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&) in ./main/dede
22# oatpp::web::server::HttpProcessor::Task::run() in ./main/dede
23# 0x00007F9C07148DE4 in /lib/x86_64-linux-gnu/libstdc++.so.6
24# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
25# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Aborted (core dumped)
Louis Jean
@Bycob
ouch
what's the new model's code?
cchadowitz-pf
@cchadowitz-pf
class Predictor(nn.Module):
    def __init__(self):
        super().__init__()
        MainModel = imp.load_source('MainModel', '/content/drive/MyDrive/TFtoTorchConversion/OpenImages/openimages.py')
        self.model = torch.load('/content/drive/MyDrive/TFtoTorchConversion/OpenImages/openimages.pth')
        self.model.eval()
        self.transforms = nn.Sequential(
            T.Resize([299, 299]),
            T.ConvertImageDtype(torch.float)
        )
        with open('/content/drive/MyDrive/TFtoTorchConversion/OpenImages/corresp-utf8.txt') as f:
            self.labels = [' '.join(l.strip().split(' ')[1:]) for l in f.readlines()]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            x = self.transforms(x)
            y_pred = self.model(x)
            return y_pred
this is the only thing I changed: x = torch.unsqueeze(self.transforms(x), 0) is now x = self.transforms(x)
and when I trace it I'm doing something like:
example = torchvision.io.read_image(os.path.join(data_path, exampleFile)).to('cpu')
predictor = Predictor().to('cpu')
traced_script_module = torch.jit.trace(predictor, torch.unsqueeze(example, 0))
(the torch.unsqueeze(example, 0) is also new to account for where I'm turning a single image input into a batch of 1)
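(a quick sanity check one could run right after tracing, reusing example and traced_script_module from the snippet above; the expected shape assumes the model now returns batched predictions:)

batch = torch.unsqueeze(example, 0)  # [1, 3, H, W]
out = traced_script_module(batch)
print(out.dim(), out.shape)          # expecting 2 and [1, nclasses], e.g. [1, 6012]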
Louis Jean
@Bycob
Do you have a dd call? service creation + predict call
cchadowitz-pf
@cchadowitz-pf
curl -X PUT "http://localhost:8081/services/openimages" -d '{
  "mllib":"torch",
  "description":"test",
  "type":"supervised",
  "parameters":{
    "input":{"connector":"image"},
    "mllib":{ "nclasses":6012}
  },
  "model":{"repository":"/opt/models/"}
}'|jq
no predict call since it errors at the service creation
Louis Jean
@Bycob
ok
cchadowitz-pf
@cchadowitz-pf
let me rebuild once more to make sure there isn't something weird happening
cchadowitz-pf
@cchadowitz-pf
yeah same error
Louis Jean
@Bycob
It looks like it's an exception from torch::jit::load, so maybe you can reproduce it with your minimal example
DD should not be crashing though, I will check this
cchadowitz-pf
@cchadowitz-pf
what version of torch does DD use? perhaps it's a version mismatch
i used 1.12.0+cu113 from google colab to save the scripted model

It looks like it's an exception from torch::jit::load, so maybe you can reproduce it with your minimal example

my minimal example doesn't have any problems, but i'm using libtorch-1.12.0+cpu in that example

cchadowitz-pf
@cchadowitz-pf
okay i'll try saving the scripted model with v1.11 and see what happens
cchadowitz-pf
@cchadowitz-pf
definitely a version mismatch issue. the model loads without a problem when it was scripted+saved with torch==1.11.0 and torchvision==0.12.0 (instead of torch==1.12.0 and torchvision==0.13.0)
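(a sketch of an export-side guard against this kind of mismatch, assuming the Colab notebook defines predictor and example as in the earlier snippets:)

import torch
import torchvision

# pin the export environment to the libtorch version DD was built against
assert torch.__version__.startswith('1.11'), torch.__version__
assert torchvision.__version__.startswith('0.12'), torchvision.__version__

traced = torch.jit.trace(predictor, torch.unsqueeze(example, 0))
traced.save('traced_openimages_model.pt')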
cchadowitz-pf
@cchadowitz-pf
sorry @Bycob but I'm running into a different error now :sweat_smile: I've now gotten the v1.11.0 scripted model loaded successfully but the predict call is producing this error
[2022-07-18 13:34:12.916] [openimages] [error] mllib internal error: Libtorch error:Dimension out of range (expected to be in range of [-1, 0], but got 1)
Exception raised from maybe_wrap_dim at /deepdetect/build/pytorch/src/pytorch/c10/core/WrapDimMinimal.h:25 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f55a6186dfc in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #1: <unknown function> + 0xc6b58a (0x7f55a6e3158a in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #2: at::meta::structured__softmax::meta(at::Tensor const&, long, bool) + 0x37 (0x7f55a79ccbf7 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x206dde5 (0x7f55a8233de5 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x206de6c (0x7f55a8233e6c in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #5: at::_ops::_softmax::redispatch(c10::DispatchKeySet, at::Tensor const&, long, bool) + 0xd4 (0x7f55a8069de4 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x3cb3dfe (0x7f55a9e79dfe in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x3cb42cf (0x7f55a9e7a2cf in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #8: at::_ops::_softmax::call(at::Tensor const&, long, bool) + 0x144 (0x7f55a80d8114 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #9: at::native::softmax(at::Tensor const&, long, c10::optional<c10::ScalarType>) + 0xa6 (0x7f55a79cd5f6 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x227db2b (0x7f55a8443b2b in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #11: at::_ops::softmax_int::call(at::Tensor const&, long, c10::optional<c10::ScalarType>) + 0x14d (0x7f55a80c712d in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x382006 (0x55c89956e006 in ./main/dede)
frame #13: <unknown function> + 0x25567e (0x55c89944167e in ./main/dede)
frame #14: <unknown function> + 0x255a08 (0x55c899441a08 in ./main/dede)
frame #15: <unknown function> + 0x255d28 (0x55c899441d28 in ./main/dede)
frame #16: <unknown function> + 0x256048 (0x55c899442048 in ./main/dede)
frame #17: <unknown function> + 0x256368 (0x55c899442368 in ./main/dede)
frame #18: <unknown function> + 0x256688 (0x55c899442688 in ./main/dede)
frame #19: <unknown function> + 0x2569a8 (0x55c8994429a8 in ./main/dede)
frame #20: <unknown function> + 0x256cc8 (0x55c899442cc8 in ./main/dede)
frame #21: <unknown function> + 0x256fe8 (0x55c899442fe8 in ./main/dede)
frame #22: <unknown function> + 0x257308 (0x55c899443308 in ./main/dede)
frame #23: <unknown function> + 0x257628 (0x55c899443628 in ./main/dede)
frame #24: <unknown function> + 0x257d66 (0x55c899443d66 in ./main/dede)
frame #25: <unknown function> + 0x513418 (0x55c8996ff418 in ./main/dede)
frame #26: <unknown function> + 0x203994 (0x55c8993ef994 in ./main/dede)
frame #27: <unknown function> + 0x1b41e2 (0x55c8993a01e2 in ./main/dede)
frame #28: <unknown function> + 0x874f66 (0x55c899a60f66 in ./main/dede)
frame #29: <unknown function> + 0x875ae2 (0x55c899a61ae2 in ./main/dede)
frame #30: <unknown function> + 0x879ca0 (0x55c899a65ca0 in ./main/dede)
frame #31: <unknown function> + 0xd6de4 (0x7f55a4e35de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #32: <unknown function> + 0x8609 (0x7f55a4b96609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #33: clone + 0x43 (0x7f55a4abb133 in /lib/x86_64-linux-gnu/libc.so.6)
this is happening after the _module.forward() call, so progress!
oh i wonder if the output of my model doesn't match what DD expects....
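(reading the trace: softmax over dim 1 failing with valid range [-1, 0] means the tensor being softmaxed is 1-D, so the model is likely returning [nclasses] instead of [batch, nclasses]; a quick check, reusing the traced module and example image from earlier:)

out = traced_script_module(torch.unsqueeze(example, 0))
print(out.dim(), out.shape)  # a softmax over dim 1 needs a 2-D [batch, nclasses] output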
cchadowitz-pf
@cchadowitz-pf
is there any documentation about that for the DD/Torch integration - i.e. what the output format from the model should look like?