Emmanuel Benazera
@beniz
Sad indeed... You may have enough data to train your own if the results have been recorded.
cchadowitz-pf
@cchadowitz-pf
:+1:
dgtlmoon
@dgtlmoon

@beniz hey, happy new year 😊 simsearch question, I have the object detector working nicely for the tshirt artwork, and you said that training the ResNet classifier model with more categories (say 200 or so different bands' tshirt artwork) can improve the search results. However I've found that training the classifier on only 10 or so band names seems to give the simsearch better results.. what are your thoughts? note - a single band might have many different designs, so maybe this is the problem..

so trained on 10 categories, the ResNet simsearch model is kinda OK, but not brilliant
on 200 categories, it's sometimes workable but not great

Do you have any other tips for tuning simsearch? fortunately the 'domain' is all kinda similar, printed artwork on cloth.. maybe some image filter or tuning option?

maybe it's a FAISS index tuning issue too
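For illustration of what FAISS index tuning can mean here, a standalone Python sketch (not DD's internal simsearch setup; embeddings, sizes and parameters are placeholders): an exact flat index gives the reference result quality, while an IVF index trades recall for speed via nlist/nprobe.

import faiss
import numpy as np

d = 2048                                            # embedding size, e.g. a ResNet pooling layer
xb = np.random.rand(10000, d).astype("float32")     # indexed embeddings (placeholder data)
xq = np.random.rand(5, d).astype("float32")         # query embeddings (placeholder data)

flat = faiss.IndexFlatL2(d)                         # exact search: the quality reference
flat.add(xb)

quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, 100)         # 100 = nlist (number of coarse cells)
ivf.train(xb)
ivf.add(xb)
ivf.nprobe = 16                                     # raise nprobe to trade speed for recall

D_ref, I_ref = flat.search(xq, 10)
D_ivf, I_ivf = ivf.search(xq, 10)
print((I_ref == I_ivf).mean())                      # fraction of positions matching the exact search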
dgtlmoon
@dgtlmoon
hmm I need to work on a pipeline that builds one huge image with lots of example simsearch results, so that I can better visualise whether I'm improving or not, then try different layers too
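A minimal sketch of that kind of contact-sheet report with Pillow, assuming the simsearch results have already been collected as lists of image paths (function name, paths and layout are illustrative):

from PIL import Image

THUMB = 128  # thumbnail edge length in pixels

def contact_sheet(rows, out_path="simsearch_report.jpg"):
    """rows: one list per query, e.g. [query_path, hit1_path, hit2_path, ...]"""
    cols = max(len(r) for r in rows)
    sheet = Image.new("RGB", (cols * THUMB, len(rows) * THUMB), "white")
    for y, row in enumerate(rows):
        for x, path in enumerate(row):
            thumb = Image.open(path).convert("RGB")
            thumb.thumbnail((THUMB, THUMB))                  # resize in place, keeping aspect ratio
            sheet.paste(thumb, (x * THUMB, y * THUMB))
    sheet.save(out_path, quality=85)

# contact_sheet([["query1.jpg", "hit1.jpg", "hit2.jpg"],
#                ["query2.jpg", "hit3.jpg", "hit4.jpg"]])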
Emmanuel Benazera
@beniz
Hi @dgtlmoon, by looking at results it's easier to "debug", diagnose and improve
dgtlmoon
@dgtlmoon
hey :) yeah.. ok, I'll focus on making (automated) some giant JPEG with result comparisons
Emmanuel Benazera
@beniz
Also, a clear definition of what "similar" should mean in your context could be useful, since you are looking to optimize further.
dgtlmoon
@dgtlmoon
yup for sure :) I'll get back to you with something more concrete :) some nice big reports or something
dgtlmoon
@dgtlmoon
@beniz training object detector question - when I'm training multiple classes, and those classes/objects generally always appear in the same image, should I always be sure to train on images that contain all of the objects? Or does it make no difference if I have a dataset of Object A (bbox txts and imgs) and a dataset of Object B (bbox txts and imgs), but in practice the images I'm detecting on always contain both A and B? Should there be no difference in accuracy, or?
dgtlmoon
@dgtlmoon
congrats on v0.20.0 :D
dgtlmoon
@dgtlmoon
[2022-02-04 16:06:58.042] [torchlib] [info] Initializing net from parameters: 
[2022-02-04 16:06:58.042] [torchlib] [info] Creating layer / name=tdata / type=AnnotatedData
[2022-02-04 16:06:58.042] [torchlib] [info] Creating Layer tdata
[2022-02-04 16:06:58.042] [torchlib] [info] tdata -> data
[2022-02-04 16:06:58.042] [torchlib] [info] tdata -> label
terminate called after throwing an instance of 'CaffeErrorException'
  what():  ./include/caffe/util/db_lmdb.hpp:15 / Check failed (custom): (mdb_status) == (0)
 0# dd::OatppJsonAPI::abort(int) at /opt/deepdetect/src/oatppjsonapi.cc:255
 1# 0x00007F24AC053210 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise in /lib/x86_64-linux-gnu/libc.so.6
 3# abort in /lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007F24AC477911 in /lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F24AC48338C in /lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F24AC4833F7 in /lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F24AC4836A9 in /lib/x86_64-linux-gnu/libstdc++.so.6
 8# caffe::db::MDB_CHECK(int) at ./include/caffe/util/db_lmdb.hpp:15
 9# caffe::db::LMDB::Open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, caffe::db::Mode) at src/caffe/util/db_lmdb.cpp:40
10# caffe::DataReader<caffe::AnnotatedDatum>::Body::InternalThreadEntry() at src/caffe/data_reader.cpp:95
11# 0x00007F24B0DC343B in /lib/x86_64-linux-gnu/libboost_thread.so.1.71.0
12# start_thread in /lib/x86_64-linux-gnu/libpthread.so.0
13# __clone in /lib/x86_64-linux-gnu/libc.so.6

Aborted (core dumped)

current jolibrain/deepdetect_cpu with https://www.deepdetect.com/downloads/platform/pretrained/ssd_300/VGG_rotate_generic_detect_v2_SSD_rotate_300x300_iter_115000.caffemodel

used to work before I think.. checking GPU version

hmm maybe related to a permissions issue in the model dir? src/caffe/util/db_lmdb.cpp LMDB etc related.. maybe.. checking
x/detection$ ls -al model/train.lmdb/
total 29932
drwxr--r-- 2 dgtlmoon dgtlmoon     4096 Feb  4 17:10 .
drwxrwxrwx 3 dgtlmoon dgtlmoon     4096 Feb  4 17:10 ..
-rw-r--r-- 1 dgtlmoon dgtlmoon 30638080 Feb  4 17:10 data.mdb
-rw-r--r-- 1 dgtlmoon dgtlmoon     8192 Feb  4 17:10 lock.mdb
looks at least like it's writing; I'm nuking that dir when I create the service, so these are always fresh
Emmanuel Benazera
@beniz
@dgtlmoon it doesn't matter if some objects are only on some images, if this is your question
make sure you have write permissions overall, since this is being accessed by the docker image.
dgtlmoon
@dgtlmoon
@beniz I've been doing docker stuff for many years now, that was the first thing I checked :)
dd@5c643a8f4b39:/tags_dataset/bottom/model$ mkdir foobar
dd@5c643a8f4b39:/tags_dataset/bottom/model$ touch foobar/ok.txt
dd@5c643a8f4b39:/tags_dataset/bottom/model$
I can see that train.lmdb/data.mdb is created without problem, brand new every time, and THEN the segfault happens
the setup of the service works fine, then I call the train method, I see the MDB is created every time, and I get the segfault

curl -X POST "http://localhost:8080/train" -d '
{
  "service": "location",
  "async": true,
  "parameters": {
    "input": {
      "db": true,
      "db_width": 512,
      "db_height": 512,
           "width": 300,
           "height": 300

    },
    "mllib": {
      "resume": false,
      "net": {
        "batch_size": 20,
        "test_batch_size": 12
      },
      "solver": {
        "iterations": 50000,
        "test_interval": 500,
        "snapshot": 1000,
        "base_lr": 0.0001
      },
      "bbox": true
    },
    "output": {
      "measure": [
        "map"
      ]
    }
  },
  "data": [ "/tags_dataset/bottom/bottom-images.txt" ]
}
'
dgtlmoon
@dgtlmoon
I even chmod 777'ed the model dir just to be safe
dgtlmoon
@dgtlmoon
hmm ok, tried some earlier tags like v0.15.0 v0.17.0 which I remember worked, same error, so I'm doing something weird here somehow
dgtlmoon
@dgtlmoon

got it.. needs some check put in the code I think

I forgot to add both the test AND train lists; providing only

  "data": [ "/tags_dataset/bottom/train.txt" ]

will cause the crash

Emmanuel Benazera
@beniz
ah thanks, this should be caught by unit tests, and an error returned instead.
dgtlmoon
@dgtlmoon
done :)
dgtlmoon
@dgtlmoon

I love it when you get simple problems to solve that go to amazing accuracy in a few iterations

"map_hist": [ 0.5956036100784937, 0.8226678265879551, 0.9506028971324364, 0.9687120678524176 ] :)

dgtlmoon
@dgtlmoon
I tried using a single file of images to train on, and added "parameters": { "input": { "test_split": 0.10 to both service setup and service train calls, but I still get that same segfault
Emmanuel Benazera
@beniz
object detector training with caffe has no test_split
dgtlmoon
@dgtlmoon
x)
Emmanuel Benazera
@beniz
it's going to be deprecated soon, in favour of the torch backend for object detection. For the moment, split by hand and provide a test.txt file; the fix has been merged.
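A sketch of such a manual split in Python (paths and ratio are illustrative; it assumes one image entry per line in the existing list file), after which both files go into the train call's "data" array:

import random

src = "/tags_dataset/bottom/bottom-images.txt"   # the single list used above
test_ratio = 0.10

with open(src) as f:
    lines = [l for l in f if l.strip()]

random.seed(0)                                   # reproducible split
random.shuffle(lines)
n_test = max(1, int(len(lines) * test_ratio))

with open("/tags_dataset/bottom/test.txt", "w") as f:
    f.writelines(lines[:n_test])
with open("/tags_dataset/bottom/train.txt", "w") as f:
    f.writelines(lines[n_test:])

# then roughly: "data": [ "/tags_dataset/bottom/train.txt", "/tags_dataset/bottom/test.txt" ]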
dgtlmoon
@dgtlmoon
ok I usually use caffe for no reason other than that's what's in the examples, maybe I should try torch? (trying a squeezenet object detector here)
yes I have a python script I wrote to split stuff for me
that's funny, so actually the bug I found was completely unrelated to what I was doing :D
dgtlmoon
@dgtlmoon
I wonder if you can run a classifier with just one class, I'll try, I think so
Nur
@nurkbts
Hi
How can I use the code for an AI server? Is there an easy setup?
Marc Reichman
@marcreichman
Hello @beniz or anyone. Quick question - for a pretrained scenario with just classifications, is it possible to run in a read-only docker? I am seeing a message "couldn't write model.json file in model repository..." when loading, but classifications seem to work fine. Can this message be disregarded in this case?
Emmanuel Benazera
@beniz
Hey @marcreichman yes, I believe you can ignore it.
Marc Reichman
@marcreichman
Thanks @beniz !
cchadowitz-pf
@cchadowitz-pf
Hi @beniz, long time no chat! I'm working on integrating some torch models into our DD configuration and ran into the following error. This is a traced pytorch model that I've successfully used (in a test environment) with LibTorch, using OpenCV to load the image file into a cv::Mat, which then required the color channels permuted as well as the dimensions permuted to match what libtorch required. I imagine the error I'm getting from DD is probably related to something similar, where the dimensions aren't lining up for some reason between what the torch dataloader is providing the model and what the model expects. Does that seem like the case to you? If so, do you have any suggestions on how to tweak things to work around it?
mllib internal error: Libtorch error:The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 12, in forward
    model = self.model
    transforms = self.transforms
    input = torch.unsqueeze((transforms).forward(x, ), 0)
                             ~~~~~~~~~~~~~~~~~~~ <--- HERE
    return (model).forward(input, )
  File "code/__torch__/torch/nn/modules/container.py", line 12, in forward
    _1 = getattr(self, "1")
    _0 = getattr(self, "0")
    return (_1).forward((_0).forward(x, ), )
                         ~~~~~~~~~~~ <--- HERE
  File "code/__torch__/torchvision/transforms/transforms.py", line 10, in forward
    img = torch.unsqueeze(x, 0)
    input = torch.to(img, 6)
    img0 = torch.upsample_bilinear2d(input, [299, 299], False, None)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    img1 = torch.squeeze(img0, 0)
    img2 = torch.round(img1)

Traceback of TorchScript, original code (most recent call last):
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py(3919): interpolate
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py(490): resize
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py(438): resize
/usr/local/lib/python3.7/dist-packages/torchvision/transforms/transforms.py(349): forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1098): _slow_forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1110): _call_impl
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py(141): forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1098): _slow_forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1110): _call_impl
<ipython-input-3-4afab89cb121>(19): forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1098): _slow_forward
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py(1110): _call_impl
/usr/local/lib/python3.7/dist-packages/torch/jit/_trace.py(965): trace_module
/usr/local/lib/python3.7/dist-packages/torch/jit/_trace.py(750): trace
<ipython-input-10-1646f9ee17ed>(5): <module>
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py(2882): run_code
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py(2822): run_ast_nodes
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py(2718): run_cell
/usr/local/lib/python3.7/dist-packages/ipykernel/zmqshell.py(537): run_cell
/usr/local/lib/python3.7/dist-packages/ipykernel/ipkernel.py(208): do_execute
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py(399): execute_request
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py(233): dispatch_shell
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelbase.py(283): dispatcher
/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py(300): null_wrapper
/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py(556): _run_callback
/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py(606): _handle_recv
/usr/local/lib/python3.7/dist-packages/zmq/eventloop/zmqstream.py(577): _handle_events
/usr/local/lib/python3.7/dist-packages/tornado/stack_context.py(300): null_wrapper
/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py(122): _handle_events
/usr/lib/python3.7/asyncio/events.py(88): _run
/usr/lib/python3.7/asyncio/base_events.py(1786): _run_once
/usr/lib/python3.7/asyncio/base_events.py(541): run_forever
/usr/local/lib/python3.7/dist-packages/tornado/platform/asyncio.py(132): start
/usr/local/lib/python3.7/dist-packages/ipykernel/kernelapp.py(499): start
/usr/local/lib/python3.7/dist-packages/traitlets/config/application.py(846): launch_instance
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py(16): <module>
/usr/lib/python3.7/runpy.py(85): _run_code
/usr/lib/python3.7/runpy.py(193): _run_module_as_main
RuntimeError: It is expected input_size equals to 4, but got size 5
for comparison, in my small libtorch test environment where I successfully used this model, I used the following:
        torch::jit::script::Module module;
        module = torch::jit::load(argv[1]);   // load the traced model

        cv::Mat img = cv::imread(argv[2]);    // HxWxC, BGR, 8-bit
        cv::cvtColor(img, img, cv::COLOR_BGR2RGB);
        at::Tensor tensor_image = torch::from_blob(img.data, {  img.rows, img.cols, img.channels() }, at::kByte);
        tensor_image = tensor_image.to(at::kFloat);
        tensor_image = tensor_image.permute({ 2, 0, 1 });   // HWC -> CHW, no batch dimension added

        at::Tensor output = module.forward({tensor_image}).toTensor();
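Reading the serialized trace above, one possible culprit (besides the channel order) is the input rank: the traced wrapper unsqueezes internally, so it appears to expect an unbatched CHW tensor, and a batched NCHW tensor would become 5-D inside the resize. A quick standalone probe along these lines could confirm that (model path is hypothetical):

import torch

m = torch.jit.load("traced_model.pt")   # hypothetical path to the traced model discussed above

chw = torch.rand(3, 640, 480)           # unbatched CHW float image, as in the libtorch test
print(m(chw).shape)                     # should work: the trace adds the batch dimension itself

nchw = chw.unsqueeze(0)                 # batched NCHW, as a dataloader would typically provide
try:
    m(nchw)
except Exception as e:
    print(e)                            # expected to fail with the 4-vs-5 size error seen above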
Emmanuel Benazera
@beniz
Certainly the permutation, cc @Bycob
cchadowitz-pf
@cchadowitz-pf
I've been trying to build DD with torchlib.cc changed to match the way my minimal working example (outside DD) passes a tensor to the .forward() method, but I haven't yet been able to reconcile the two to identify what the issue is on the DD side....
Louis Jean
@Bycob
Images are loaded here in DD https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchdataset.cc#L890
It looks like we do the same operations as you
Maybe you can try to log the dimensions of the input tensor to see if it's similar to the one in your example?
cchadowitz-pf
@cchadowitz-pf
would it make sense to do that around here? https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1462
something like just outputting tensor.dim()?
Louis Jean
@Bycob
Yes, that way you will see exactly what the model is taking as input
cchadowitz-pf
@cchadowitz-pf
so at that point I'm getting dim 4, I'm going to also try to output in_vals[0].dim() at https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchlib.cc#L1472
Louis Jean
@Bycob
The traced module is called here https://github.com/jolibrain/deepdetect/blob/master/src/backends/torch/torchmodule.cc#L315
.dim() gives you the number of dimensions; you can call .sizes() to get all the dimensions