tasibalint
@tasibalint
great, I'm gonna do that
tasibalint
@tasibalint

(attached screenshot: image.png)

I'm getting the same error as shown here

tasibalint
@tasibalint
I forgot to change 300 to 512 in the training call, and I also had two train.txt files in the train data; changed one to test.txt. Now I don't get the error at least :D
Emmanuel Benazera
@beniz
good :)
tasibalint
@tasibalint
how do you guys use a graphics card with docker?
Emmanuel Benazera
@beniz
Hi, we have special docker builds and they need to be used with the nvidia docker runtime. They are public.
dgtlmoon
@dgtlmoon
@tasibalint I'm using DD with a very high-end graphics card over at paperspace.com, works great
Danai Brilli
@danaibrilli
@dgtlmoon happy new year! Could you provide info on how to do it? I'm new to these tools.
Danai Brilli
@danaibrilli
Hello guys! I'm new to DeepDetect and I'm trying to use the platform and the pre-trained models to predict from a local file, but when I give the path as a data URL I get an error: "Service Input Error: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)"
Any idea as to what I should do?
On the other hand, when I use an online image (its URL), the API returns the predictions I wanted
Emmanuel Benazera
@beniz
Hi, the path to a file is relative to the filesystem of the docker platform, e.g. /opt/platform/data
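To illustrate the point above, here is a minimal Python sketch of building such a predict call. The service name `myclassifier` and the image path are hypothetical; what matters is that `data` carries the path as seen inside the container (e.g. under /opt/platform/data), not the host path.

```python
import json

# Hypothetical predict payload: the service name "myclassifier" is made up,
# and the file path is the path *inside* the platform container
# (e.g. under /opt/platform/data), not the path on the host machine.
payload = {
    "service": "myclassifier",
    "parameters": {"output": {"best": 3}},
    "data": ["/opt/platform/data/examples/cat.jpg"],
}
body = json.dumps(payload)
# POST this body to http://localhost:8080/predict (with curl, urllib, ...)
print(json.loads(body)["data"][0])
```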
cchadowitz-pf
@cchadowitz-pf

hey @beniz - happy new year! long time since I've been looking through this stuff - tons of updates!! was just wondering if you're still building docker images with TF included. I'm trying to update our build process and I'm running into quite a number of issues with changes in DD affecting the build as well as changes in floopz/tensorflow_cc (and tensorflow itself).

Anyway, I was just wondering if an automated build (even CPU-only, for now) is still happening with tensorflow that I can compare my build process to. Thanks in advance!

Emmanuel Benazera
@beniz
Hi @cchadowitz-pf thanks, happy solar roundabout to you. TF is completely unused and unmaintained on our side. Sorry about that, but we ended up having no use case for it. Libtorch is the production backend now.
cchadowitz-pf
@cchadowitz-pf
I see, that's kind of what I figured :+1: are you still using caffe for anything? I know it's the default backend for builds, but curious if you're actually using it.
Also, do you have a reliable pipeline for converting models? I'd love to convert some of our models but there seems to be quite a bit of manual effort involved (not to mention validating/testing afterwards). If you have any sort of pipeline I'd be very interested :)
Emmanuel Benazera
@beniz
@cchadowitz-pf we don't train new models with caffe; the 'legacy' models still running with our customers are in practice running the tensorrt backend (which got a nice upgrade recently in DD), thus abstracting away the initial backend. In your case it's a bit different since you are running on CPU, correct?
I believe the best pipeline for CPU would be to convert to ONNX and use onnxruntime. You could use onnxruntime directly, or we could add it to DD. It's actually on our optional todo list, but as we are driven by our customers and they never asked for CPU, we didn't get it into DD. Not difficult for inference though.
Let me know your thoughts.
cchadowitz-pf
@cchadowitz-pf
I see :+1: we definitely still have CPU in some cases unfortunately. Any reason you suggest using onnxruntime instead of libtorch? Is it just more efficient without a GPU?
In practice, it sounds like you're more often training models and then deploying, rather than converting models - is that accurate? It sounds like I'll need to put together a methodology to compare and validate models before/after converting them. I'm hoping to not have to do it manually but it will probably come down to that....
and yes! many new and awesome things lately in DD - that's part of why I'm trying to get our build up to date again hah. Have yet to find a good way to continue to use the TF openimages pretrained model outside of TF, however.
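On the validate-before/after-conversion question above, the core comparison can be sketched in a library-agnostic way: run the same preprocessed input through both the original and the converted model and compare the output tensors. The tolerance and the stand-in logits below are assumptions for illustration.

```python
import numpy as np

def compare_outputs(a, b, atol=1e-3):
    # Compare two models' outputs on the same input: worst-case absolute
    # difference plus cosine similarity as a scale-insensitive sanity check.
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    max_abs = float(np.max(np.abs(a - b)))
    cosine = float(np.dot(a.ravel(), b.ravel())
                   / (np.linalg.norm(a) * np.linalg.norm(b)))
    return {"max_abs_diff": max_abs, "cosine": cosine, "ok": max_abs <= atol}

# Stand-in logits from the original and converted model (dummy values)
orig = np.array([0.12, 0.83, 0.05])
conv = np.array([0.1201, 0.8299, 0.0500])
report = compare_outputs(orig, conv)
print(report["ok"])  # True: max abs diff ~1e-4, within tolerance
```

Running this over a held-out sample of images would give a first automated pass before any manual spot-checking.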
Emmanuel Benazera
@beniz
Hi @cchadowitz-pf, suggesting onnxruntime since you are asking about converting your tf models: converting to ONNX seems more appropriate than TF to pytorch, but you may want to double-check. For simple models such as openimage, this seems like a no-brainer, but in reality... who knows?!
onnxruntime reports good performance too.
You are correct we are always training custom models, that's actually our business. If the openimage model is your sole TF dependency, you may be able to convert it or find a more recent and better one for torch.
cchadowitz-pf
@cchadowitz-pf
:+1: awesome thanks! I had actually found that same page, but saw this note: These are unpruned models with just the feature extractor weights, and may not be used without retraining to deploy in a classification application.
I'll keep looking around but may pop back in here for more thoughts :)
Emmanuel Benazera
@beniz
Sad indeed... You may have enough data to train your own if the results have been recorded.
cchadowitz-pf
@cchadowitz-pf
:+1:
dgtlmoon
@dgtlmoon

@beniz hey, happy new year 😊 simsearch question, I have the object detector working nicely for the tshirt artwork, and you said that training the ResNet classifier model with more categories (say 200 or so different bands' tshirt artwork) can improve the search results. However, I've found that training the classifier on only 10 or so band names seems to give the simsearch better results.. what are your thoughts? Note: a single band might have many different designs, so maybe this is the problem..

so trained on 10 categories, the resnet simsearch model is kinda OK, but not brilliant
on 200 categories, it's sometimes workable but not great

Do you have any other tips for tuning simsearch? fortunately the 'domain' is all kinda similar, printed artwork on cloth.. maybe some image filter or tuning option?

maybe it's a FAISS index tuning issue too
dgtlmoon
@dgtlmoon
hmm I need to work on a pipeline that builds one huge image with lots of example simsearch results, so that I can better visualise whether I'm improving or not, then try different layers too
Emmanuel Benazera
@beniz
Hi @dgtlmoon, by looking at results it's easier to "debug", diagnose and improve
dgtlmoon
@dgtlmoon
hey :) yeah.. ok, I'll focus on automating some giant JPEG with result comparisons
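The giant comparison image idea can be sketched with plain numpy: tile equally-sized result thumbnails into one grid array, then save it as a JPEG with your image library of choice. Thumbnail size, grid width and the dummy images below are assumptions.

```python
import numpy as np

def contact_sheet(thumbs, cols=4):
    # Tile equally-sized HxWxC thumbnails into one grid image,
    # padding any unfilled cells in the last row with black.
    h, w, c = thumbs[0].shape
    rows = -(-len(thumbs) // cols)  # ceiling division
    sheet = np.zeros((rows * h, cols * w, c), dtype=thumbs[0].dtype)
    for i, thumb in enumerate(thumbs):
        r, col = divmod(i, cols)
        sheet[r * h:(r + 1) * h, col * w:(col + 1) * w] = thumb
    return sheet

# Ten dummy 64x64 RGB "result" thumbnails standing in for simsearch hits
thumbs = [np.full((64, 64, 3), i * 20, dtype=np.uint8) for i in range(10)]
sheet = contact_sheet(thumbs, cols=4)
print(sheet.shape)  # (192, 256, 3): 3 rows of 4 cells
```

One row per query with its top-k neighbours makes before/after index or layer changes easy to eyeball.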
Emmanuel Benazera
@beniz
Also a clear definition of what similar should mean in your context could be useful since you are looking to optimize further.
dgtlmoon
@dgtlmoon
yup for sure :) i'll get back to you with something more concrete :) some nice big reports or something
dgtlmoon
@dgtlmoon
@beniz training object detector question: when I'm training multiple classes, and those classes/objects generally always appear in the same image, should I always make sure to train on images that contain all of the objects? Or does it make no difference if I have a dataset of Object A (bbox txts and imgs) and a dataset of Object B (bbox txts and imgs), even though in practice the images I'm detecting on always contain both A and B? Should there be no difference in accuracy?
dgtlmoon
@dgtlmoon
congrats on v0.20.0 :D
dgtlmoon
@dgtlmoon
[2022-02-04 16:06:58.042] [torchlib] [info] Initializing net from parameters: 
[2022-02-04 16:06:58.042] [torchlib] [info] Creating layer / name=tdata / type=AnnotatedData
[2022-02-04 16:06:58.042] [torchlib] [info] Creating Layer tdata
[2022-02-04 16:06:58.042] [torchlib] [info] tdata -> data
[2022-02-04 16:06:58.042] [torchlib] [info] tdata -> label
terminate called after throwing an instance of 'CaffeErrorException'
  what():  ./include/caffe/util/db_lmdb.hpp:15 / Check failed (custom): (mdb_status) == (0)
 0# dd::OatppJsonAPI::abort(int) at /opt/deepdetect/src/oatppjsonapi.cc:255
 1# 0x00007F24AC053210 in /lib/x86_64-linux-gnu/libc.so.6
 2# raise in /lib/x86_64-linux-gnu/libc.so.6
 3# abort in /lib/x86_64-linux-gnu/libc.so.6
 4# 0x00007F24AC477911 in /lib/x86_64-linux-gnu/libstdc++.so.6
 5# 0x00007F24AC48338C in /lib/x86_64-linux-gnu/libstdc++.so.6
 6# 0x00007F24AC4833F7 in /lib/x86_64-linux-gnu/libstdc++.so.6
 7# 0x00007F24AC4836A9 in /lib/x86_64-linux-gnu/libstdc++.so.6
 8# caffe::db::MDB_CHECK(int) at ./include/caffe/util/db_lmdb.hpp:15
 9# caffe::db::LMDB::Open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, caffe::db::Mode) at src/caffe/util/db_lmdb.cpp:40
10# caffe::DataReader<caffe::AnnotatedDatum>::Body::InternalThreadEntry() at src/caffe/data_reader.cpp:95
11# 0x00007F24B0DC343B in /lib/x86_64-linux-gnu/libboost_thread.so.1.71.0
12# start_thread in /lib/x86_64-linux-gnu/libpthread.so.0
13# __clone in /lib/x86_64-linux-gnu/libc.so.6

Aborted (core dumped)

current jolibrain/deepdetect_cpu with https://www.deepdetect.com/downloads/platform/pretrained/ssd_300/VGG_rotate_generic_detect_v2_SSD_rotate_300x300_iter_115000.caffemodel

used to work before, I think.. checking GPU version

hmm maybe related to a permissions issue in the model dir? src/caffe/util/db_lmdb.cpp LMDB etc related.. maybe.. checking
x/detection$ ls -al model/train.lmdb/
total 29932
drwxr--r-- 2 dgtlmoon dgtlmoon     4096 Feb  4 17:10 .
drwxrwxrwx 3 dgtlmoon dgtlmoon     4096 Feb  4 17:10 ..
-rw-r--r-- 1 dgtlmoon dgtlmoon 30638080 Feb  4 17:10 data.mdb
-rw-r--r-- 1 dgtlmoon dgtlmoon     8192 Feb  4 17:10 lock.mdb
Looks at least like it's writing. I'm nuking that dir when I create the service, so these are always fresh
Emmanuel Benazera
@beniz
@dgtlmoon it doesn't matter if some objects appear only on some images, if that's your question
make sure you have write permissions overall, since this is being accessed by the docker image.
dgtlmoon
@dgtlmoon
@beniz I've been doing docker stuff for many years now, that was the first thing I checked :)
dd@5c643a8f4b39:/tags_dataset/bottom/model$ mkdir foobar
dd@5c643a8f4b39:/tags_dataset/bottom/model$ touch foobar/ok.txt
dd@5c643a8f4b39:/tags_dataset/bottom/model$
I can see that train.lmdb/data.mdb is created without problem brand new every time, and THEN the segfault happens
the setup of the service works fine; then I call the train method, I see the MDB is created every time, and I get a segfault

curl -X POST "http://localhost:8080/train" -d '
{
  "service": "location",
  "async": true,
  "parameters": {
    "input": {
      "db": true,
      "db_width": 512,
      "db_height": 512,
      "width": 300,
      "height": 300
    },
    "mllib": {
      "resume": false,
      "net": {
        "batch_size": 20,
        "test_batch_size": 12
      },
      "solver": {
        "iterations": 50000,
        "test_interval": 500,
        "snapshot": 1000,
        "base_lr": 0.0001
      },
      "bbox": true
    },
    "output": {
      "measure": [
        "map"
      ]
    }
  },
  "data": [ "/tags_dataset/bottom/bottom-images.txt" ]
}
'
dgtlmoon
@dgtlmoon
I even chmod 777'ed the model dir just to be safe
dgtlmoon
@dgtlmoon
hmm ok, tried some earlier tags like v0.15.0 v0.17.0 which I remember worked, same error, so I'm doing something weird here somehow
dgtlmoon
@dgtlmoon

got it.. I think a check needs to be added in the code

Forgot to add test AND train lists

  "data": [ "/tags_dataset/bottom/train.txt" ]

will cause the crash
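Until that check exists server-side, a hypothetical client-side guard can catch the missing-list case before calling /train. The function name and the two-list requirement for this detection setup are assumptions drawn from the crash described above.

```python
import os
import tempfile

def check_detection_data(data_files):
    # Hypothetical guard: this detection training setup needs both a train
    # and a test image list in "data"; passing only one triggered the
    # LMDB crash instead of a clean API error.
    if len(data_files) < 2:
        raise ValueError("pass both the train and test list files in 'data'")
    for path in data_files:
        if not os.path.isfile(path):
            raise FileNotFoundError(path)

# Demo with a throwaway directory containing only a train list
tmp = tempfile.mkdtemp()
train = os.path.join(tmp, "train.txt")
with open(train, "w") as f:
    f.write("img0.jpg 0\n")

try:
    check_detection_data([train])  # only train.txt -> rejected before /train
except ValueError as e:
    print("rejected:", e)
```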

Emmanuel Benazera
@beniz
ah thanks, this should be caught by unit tests, and an error returned instead.
dgtlmoon
@dgtlmoon
done :)
dgtlmoon
@dgtlmoon

I love when you get simple problems to solve that go to amazing accuracy in a few iterations

"map_hist": [ 0.5956036100784937, 0.8226678265879551, 0.9506028971324364, 0.9687120678524176 ] :)