dgtlmoon
@dgtlmoon
I tried using a single file of images to train on, and added "parameters": {"input": {"test_split": 0.10}} to both the service setup and service train calls, but I still get that same segfault
Emmanuel Benazera
@beniz
object detector training with caffe has no test_split
dgtlmoon
@dgtlmoon
x)
Emmanuel Benazera
@beniz
it's going to be deprecated soon in favor of the torch backend for object detection. At the moment, split by hand and provide a test.txt file; the fix has been merged.
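(For reference, a hand split can be as simple as the following Python sketch. test.txt is the file name mentioned above; the train.txt listing name, the one-sample-per-line format, and the 10% ratio are illustrative assumptions, not something specified in this conversation:)

import random

# Sketch: hold out 10% of a one-sample-per-line listing as test.txt.
# File names, listing format, and split ratio are assumptions.
random.seed(0)
with open('train.txt') as f:
    lines = f.readlines()
random.shuffle(lines)
n_test = int(len(lines) * 0.10)
with open('test.txt', 'w') as f:
    f.writelines(lines[:n_test])
with open('train.txt', 'w') as f:
    f.writelines(lines[n_test:])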
dgtlmoon
@dgtlmoon
ok I usually use caffe for absolutely no reason other than that's what's in the examples, maybe I should try torch? (trying a squeezenet object detector here)
yes, I have a python script I wrote to split stuff for me
that's funny, so actually the bug I found was completely unrelated to what I was doing :D
dgtlmoon
@dgtlmoon
I wonder if you can run a classifier with just one class. I'll try; I think so.
Nur
@nurkbts
Hi
How can I use the code for an AI server? Is there an easy setup?
Marc Reichman
@marcreichman
Hello @beniz or anyone. Quick question - for a pretrained scenario with just classifications, is it possible to run in read-only docker? I am seeing a message couldn't write model.json file in model repository... when loading, but classifications seem to work fine. Can this message be disregarded in this case?
Emmanuel Benazera
@beniz
Hey @marcreichman, yes, I believe you can ignore it.
Marc Reichman
@marcreichman
Thanks @beniz !
cchadowitz-pf
@cchadowitz-pf
Hi @beniz, long time no chat! I'm working on integrating some torch models into our DD configuration and ran into the following error. This is a traced pytorch model that I've successfully used (in a test environment) with LibTorch, using OpenCV to load the image file into a cv::Mat, which then required permuting the color channels as well as the dimensions to match what libtorch required. I imagine the error I'm getting from DD is related to something similar, where the dimensions aren't lining up between what the torch dataloader provides the model and what the model expects. Does that seem like the case to you? If so, do you have any suggestions on how to tweak things to work around it?
mllib internal error: Libtorch error:The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 12, in forward
    model = self.model
    transforms = self.transforms
    input = torch.unsqueeze((transforms).forward(x, ), 0)
                             ~~~~~~~~~~~~~~~~~~~ <--- HERE
    return (model).forward(input, )
  File "code/__torch__/torch/nn/modules/container.py", line 12, in forward
    _1 = getattr(self, "1")
    _0 = getattr(self, "0")
    return (_1).forward((_0).forward(x, ), )
                         ~~~~~~~~~~~ <--- HERE
  File "code/__torch__/torchvision/transforms/transforms.py", line 10, in forward
    img = torch.unsqueeze(x, 0)
    input = torch.to(img, 6)
    img0 = torch.upsample_bilinear2d(input, [299, 299], False, None)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    img1 = torch.squeeze(img0, 0)
    img2 = torch.round(img1)

Traceback of TorchScript, original code (most recent call last):
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py(3919): interpolate
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py(490): resize
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py(438): resize
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py(349): forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1098): _slow_forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1110): _call_impl
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py(141): forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1098): _slow_forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1110): _call_impl
<ipython-input-3-4afab89cb121>(19): forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1098): _slow_forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1110): _call_impl
/usr/local/lib/python3.7/dist-packages/torch/jit/_trace.py(965): trace_module
/usr/local/lib/python3.7/dist-packages/torch/jit/_trace.py(750): trace
<ipython-input-10-1646f9ee17ed>(5): <module>
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py(2882): run_code
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py(2822): run_ast_nodes
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py(2718): run_cell
/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py(537): run_cell
/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py(208): do_execute
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py(399): execute_request
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py(233): dispatch_shell
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py(283): dispatcher
/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py(300): null_wrapper
/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py(556): _run_callback
/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py(606): _handle_recv
/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py(577): _handle_events
/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py(300): null_wrapper
/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py(122): _handle_events
/usr/lib/python3.7/asyncio/events.py(88): _run
/usr/lib/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib/python3.7/asyncio/base_events.py(541): run_forever
/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py(132): start
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py(499): start
/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py(846): launch_instance
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py(16): <module>
/usr/lib/python3.7/runpy.py(85): _run_code
/usr/lib/python3.7/runpy.py(193): _run_module_as_main
RuntimeError: It is expected input_size equals to 4, but got size 5
for comparison, in my small libtorch test environment where I successfully used this model, I did the following:
        // (this snippet assumes the usual includes: <torch/script.h> and <opencv2/opencv.hpp>)
        torch::jit::script::Module module;
        module = torch::jit::load(argv[1]);

        // load the image and convert OpenCV's BGR channel order to RGB
        cv::Mat img = cv::imread(argv[2]);
        cv::cvtColor(img, img, cv::COLOR_BGR2RGB);

        // wrap the pixel data as an HWC byte tensor, cast to float, permute to CHW
        at::Tensor tensor_image = torch::from_blob(img.data, { img.rows, img.cols, img.channels() }, at::kByte);
        tensor_image = tensor_image.to(at::kFloat);
        tensor_image = tensor_image.permute({ 2, 0, 1 });

        at::Tensor output = module.forward({tensor_image}).toTensor();
Emmanuel Benazera
@beniz
Certainly the permutation, cc @Bycob
cchadowitz-pf
@cchadowitz-pf
I've been trying to build DD with torchlib.cc changed to pass a tensor to the .forward() method the same way my minimal working example (outside DD) does, but I haven't yet been able to reconcile the two and identify what the issue is on the DD side...
Louis Jean
@Bycob
Images are loaded here in DD https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchdataset.cc#L890
It looks like we do the same operations as you
Maybe you can try to log the dimensions of the input tensor to see if it's similar to the one in your example?
cchadowitz-pf
@cchadowitz-pf
would it make sense to do that around here? https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1462
something like just outputting tensor.dim()?
Louis Jean
@Bycob
Yes, that way you will see exactly what the model is taking as input
cchadowitz-pf
@cchadowitz-pf
so at that point i'm getting dim 4, i'm going to also try to output in_vals[0].dim() at https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1472
Louis Jean
@Bycob
The traced module is called here https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchmodule.cc#L315
.dim() gives you the number of dimensions, you can call .sizes() to get all the dimensions
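(The same distinction exists in the Python API, which may be easier to experiment with: .dim() is the number of axes, .shape lists them all. A minimal illustration:)

import torch

t = torch.zeros(1, 3, 299, 299)  # a batch of one 3-channel 299x299 image
print(t.dim())    # 4  -> number of dimensions
print(t.shape)    # torch.Size([1, 3, 299, 299])  -> all the dimensions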
cchadowitz-pf
@cchadowitz-pf
ah okay. and since source there is a vector of IValues, if I'm only testing with a single input image it still makes sense to do something like source[0].toTensor().sizes(), right?
Louis Jean
@Bycob
yes
cchadowitz-pf
@cchadowitz-pf
traced module forward - source[0].toTensor().dim(): 4
traced module forward - source[0].toTensor().sizes(): [1, 3, 299, 299]
so for some reason there are 4 dims for the first (and only, in this case) image in the source std::vector
Louis Jean
@Bycob
The first dimension is the batch size
cchadowitz-pf
@cchadowitz-pf
and i imagine the error is because then source has dim 5?
i thought source was the batch, not source[0]?
Louis Jean
@Bycob
It can be a bit confusing, sometimes models have multiple tensors as input, so source contains each argument as a batched tensor
For example with detection models training you need to pass the images and the labels, so source[0] contains all the images of the batch and source[1] contains all the labels
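(A rough Python analogue of that convention, just to make the layout concrete; the shapes and batch size here are hypothetical:)

import torch

# Each model argument is batched separately; shapes are illustrative.
images = torch.stack([torch.zeros(3, 299, 299) for _ in range(8)])  # [8, 3, 299, 299]
labels = torch.zeros(8, dtype=torch.long)                           # [8]
source = [images, labels]  # source[0]: batched images, source[1]: batched labels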
cchadowitz-pf
@cchadowitz-pf
ah right
so in my minimal working example, i'm doing something like at::Tensor output = module.forward({tensor_image}).toTensor(); where tensor_image has size [3, 299, 299] and so .forward() is being passed a vector where the first and only element is [3,299,299]
(i think)

in python my model had this forward() method:

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            x = torch.unsqueeze(self.transforms(x), 0)
            y_pred = self.model(x)
            return y_pred

so do I need to rewrite that to accept a batch of Tensors as input?

Louis Jean
@Bycob
If you can do that I think this is the way to go. When torch models take tensors as input, it's a general convention that the first dimension is the batch size. So we kept that convention in dd as well
cchadowitz-pf
@cchadowitz-pf
that makes sense - it appears that this line, x = torch.unsqueeze(self.transforms(x), 0), is causing the issue. At this point I'm not sure why I have it in there; perhaps I was incorrectly passing in a single tensor when I was first experimenting, and "solved" it incorrectly by manually adding a dimension inside .forward()
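(That diagnosis is consistent with the original error: unsqueezing an input that is already batched produces a 5-D tensor, while the traced resize only accepts 4-D, hence "It is expected input_size equals to 4, but got size 5". A two-line reproduction of the shape problem:)

import torch

x = torch.zeros(1, 3, 299, 299)  # already batched: 4 dimensions
x = torch.unsqueeze(x, 0)        # the extra unsqueeze from forward()
print(x.shape)                   # torch.Size([1, 1, 3, 299, 299]) -> 5 dims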
Louis Jean
@Bycob
It makes sense!
cchadowitz-pf
@cchadowitz-pf
hmm i'm getting an error on loading the (new) traced model now
full error log:
[2022-07-18 11:46:29.437] [openimages] [error] unable to load /opt/models/traced_openimages_model.pt
terminate called after throwing an instance of 'c10::Error'
  what():  isTuple()INTERNAL ASSERT FAILED at "/deepdetect/build/pytorch/src/pytorch/aten/src/ATen/core/ivalue_inl.h":1916, please report a bug to PyTorch. Expected Tuple but got String
Exception raised from toTupleRef at /deepdetect/build/pytorch/src/pytorch/aten/src/ATen/core/ivalue_inl.h:1916 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f9c08499dfc in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x8f (0x7f9c0844de2c in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7f9c084976b3 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libc10.so)
frame #3: <unknown function> + 0x4644d61 (0x7f9c0cb1dd61 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x4644e79 (0x7f9c0cb1de79 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7f9c09a4c778 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7f9c09a326e3 in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x1bc24d (0x563fb116f24d in ./main/dede)
frame #8: <unknown function> + 0x11fedd (0x563fb10d2edd in ./main/dede)
frame #9: <unknown function> + 0x26912d (0x563fb121c12d in ./main/dede)
frame #10: <unknown function> + 0x51930b (0x563fb14cc30b in ./main/dede)
frame #11: <unknown function> + 0x203684 (0x563fb11b6684 in ./main/dede)
frame #12: <unknown function> + 0x1b41e2 (0x563fb11671e2 in ./main/dede)
frame #13: <unknown function> + 0x874fd6 (0x563fb1827fd6 in ./main/dede)
frame #14: <unknown function> + 0x875b52 (0x563fb1828b52 in ./main/dede)
frame #15: <unknown function> + 0x879d10 (0x563fb182cd10 in ./main/dede)
frame #16: <unknown function> + 0xd6de4 (0x7f9c07148de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #17: <unknown function> + 0x8609 (0x7f9c06ea9609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #18: clone + 0x43 (0x7f9c06dce133 in /lib/x86_64-linux-gnu/libc.so.6)

 0# dd::OatppJsonAPI::abort(int) at /deepdetect/src/oatppjsonapi.cc:325
 1# 0x00007F9C06CF2090 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise at ../sysdeps/unix/sysv/linux/raise.c:51
 3# abort at /build/glibc-SzIz7B/glibc-2.31/stdlib/abort.c:81
 4# 0x00007F9C07110911 in /lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F9C0711C38C in /lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F9C0711B369 in /lib/x86_64-linux-gnu/libstdc++.so.6
 7# __gxx_personality_v0 in /lib/x86_64-linux-gnu/libstdc++.so.6
 8# 0x00007F9C06ED4BEF in /lib/x86_64-linux-gnu/libgcc_s.so.1
 9# _Unwind_Resume in /lib/x86_64-linux-gnu/libgcc_s.so.1
10# torch::jit::ConcreteSourceRangeUnpickler::unpickle() [clone .cold] in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
11# torch::jit::ConcreteSourceRangeUnpickler::findSourceRangeThatGenerated(torch::jit::SourceRange const&) in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
12# torch::jit::SourceRange::highlight(std::ostream&) const [clone .localalias] in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
13# torch::jit::ErrorReport::what() const in /deepdetect/build/pytorch/src/pytorch/torch/lib/libtorch_cpu.so
14# boost::exception_detail::diagnostic_information_impl[abi:cxx11](boost::exception const*, std::exception const*, bool, bool) at /usr/include/boost/exception/diagnostic_information.hpp:131
15# boost::current_exception_diagnostic_information[abi:cxx11](bool) at /usr/include/boost/exception/diagnostic_information.hpp:47
16# dd::Services::add_service(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mapbox::util::variant<dd::MLService<dd::CaffeLib, dd::ImgCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVTSCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::TxtCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::SVMCaffeInputFileConn, dd::SupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::ImgCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::CSVTSCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::TxtCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::CaffeLib, dd::SVMCaffeInputFileConn, dd::UnsupervisedOutput, dd::CaffeModel>, dd::MLService<dd::TorchLib, dd::ImgTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::VideoTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::TxtTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel>, dd::MLService<dd::TorchLib, dd::CSVTSTorchInputFileConn, dd::SupervisedOutput, dd::TorchModel> >&&, dd::APIData const&) at /deepdetect/src/services.h:429
17# dd::JsonAPI::service_create(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /deepdetect/src/jsonapi.cc:718
18# DedeController::Z__PROXY_METHOD_update_service(std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&) at /deepdetect/src/http/controller.hpp:132
19# oatpp::web::server::api::ApiController::Handler<DedeController>::handle(std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&) at /deepdetect/build/oatpp/src/oatpp/src/oatpp/web/server/api/ApiController.hpp:300
20# oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&, std::shared_ptr<oatpp::web::protocol::http::incoming::Request> const&, oatpp::web::protocol::http::utils::CommunicationUtils::ConnectionState&) in ./main/dede
21# oatpp::web::server::HttpProcessor::processNextRequest(oatpp::web::server::HttpProcessor::ProcessingResources&) in ./main/dede
22# oatpp::web::server::HttpProcessor::Task::run() in ./main/dede
23# 0x00007F9C07148DE4 in /lib/x86_64-linux-gnu/libstdc++.so.6
24# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
25# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Aborted (core dumped)
Louis Jean
@Bycob
ouch
what's the new model's code?
cchadowitz-pf
@cchadowitz-pf
import imp
import torch
import torch.nn as nn
import torchvision.transforms as T

class Predictor(nn.Module):
    def __init__(self):
        super().__init__()
        MainModel = imp.load_source('MainModel', '/content/drive/MyDrive/TFtoTorchConversion/OpenImages/openimages.py')
        self.model = torch.load('/content/drive/MyDrive/TFtoTorchConversion/OpenImages/openimages.pth')
        self.model.eval()
        self.transforms = nn.Sequential(
            T.Resize([299, 299]),
            T.ConvertImageDtype(torch.float)
        )
        with open('/content/drive/MyDrive/TFtoTorchConversion/OpenImages/corresp-utf8.txt') as f:
            self.labels = [' '.join(l.strip().split(' ')[1:]) for l in f.readlines()]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            x = self.transforms(x)
            y_pred = self.model(x)
            return y_pred
this is the only thing I changed: x = torch.unsqueeze(self.transforms(x), 0) is now x = self.transforms(x)
and when I trace it I'm doing something like:
example = torchvision.io.read_image(os.path.join(data_path, exampleFile)).to('cpu')
predictor = Predictor().to('cpu')
traced_script_module = torch.jit.trace(predictor, torch.unsqueeze(example, 0))
(the torch.unsqueeze(example, 0) is also new to account for where I'm turning a single image input into a batch of 1)
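(Continuing that snippet, one quick sanity check before handing the traced file to DD is to run the traced module on a batch of one and confirm the output is batch-first too; the output file name matches the path in the error log above:)

out = traced_script_module(torch.unsqueeze(example, 0))
print(out.shape)  # first dimension should be the batch size, here 1
traced_script_module.save('traced_openimages_model.pt')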
Louis Jean
@Bycob
Do you have a dd call? service creation + predict call
cchadowitz-pf
@cchadowitz-pf
curl -X PUT "http://localhost:8081/services/openimages" -d '{
  "mllib":"torch",
  "description":"test",
  "type":"supervised",
  "parameters":{
    "input":{"connector":"image"},
    "mllib":{ "nclasses":6012}
  },
  "model":{"repository":"/opt/models/"}
}'|jq
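(For completeness, a matching predict call against DD's JSON API might look like the following Python sketch; the image path is a placeholder, not taken from this conversation:)

import requests

# Sketch of a predict call against the "openimages" service created above.
r = requests.post('http://localhost:8081/predict', json={
    'service': 'openimages',
    'parameters': {'output': {'best': 5}},
    'data': ['/opt/models/example.jpg'],  # placeholder image path
})
print(r.json())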