Emmanuel Benazera
@beniz
we have cached versions of the torch build
you can tarball the pre-built torch as needed, same for caffe
YaYaB
@YaYaB
Haha I'll ask for more resources :p
On my side it seems to be stuck for quite a long time on faiss, with the following warnings:
ptxas /tmp/tmpxft_00006a63_00000000-12_PQScanMultiPassNoPrecomputed.compute_50.ptx, line 111; warning : ld
...
...
ptxas /tmp/tmpxft_0000753a_00000000-5_PQScanMultiPassPrecomputed.compute_75.ptx, line 11479; warning : ld
Emmanuel Benazera
@beniz
you are probably building for too many architectures
they get detected by cmake, and you can also force them by hand for caffe with CUDA_ARCH, not sure about pytorch builds, I can ask those who deal with it.
YaYaB
@YaYaB
Yeah I thought they were detected by cmake but it does not seem to be the case here
Emmanuel Benazera
@beniz
they are not passed to torch, so torch builds for multiple architectures
there'll be an internal card for passing native gpu arch
YaYaB
@YaYaB
OK, good to know, I'll try setting CUDA_ARCH for caffe to see if that reduces the build time
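For context on the build time: nvcc compiles each kernel once per `-gencode` pair, and the ptxas lines above show at least compute_50 and compute_75 being built. Below is a minimal sketch of generating the flags for a single target architecture. The helper name `gencode_flags` is hypothetical, and whether CUDA_ARCH in this build expects exactly this string is an assumption to check against the build documentation.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode flags from compute capabilities such as "7.5"."""
    flags = []
    for cap in capabilities:
        major, minor = cap.split(".")
        arch = f"{major}{minor}"  # "7.5" -> "75"
        flags.append(f"-gencode arch=compute_{arch},code=sm_{arch}")
    return " ".join(flags)


# One architecture only, matching the local GPU (7.5 is just an example value).
print(gencode_flags(["7.5"]))  # -gencode arch=compute_75,code=sm_75
```

Restricting the list to the compute capability of the local GPU is what cuts the number of compilation passes.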
cchadowitz-pf
@cchadowitz-pf

hey @beniz - continuing to experiment with NCNN, I was trying to convert the word_detect_v2, crop action, multiword_ocr chain from Caffe to NCNN and ran into an internal error:

[2021-01-15 22:07:29.135] [ocr-d21cfe6c-2e8c-40c7-94df-b4740bcfb44a-0] [info] number of calls=3
[2021-01-15 22:07:29.135] [ocr-d21cfe6c-2e8c-40c7-94df-b4740bcfb44a-0] [info] [0] / executing predict on service word_detect_v2_ncnn
[2021-01-15 22:07:35.575] [ocr-d21cfe6c-2e8c-40c7-94df-b4740bcfb44a-0] [info] [1] / executing action crop
[2021-01-15 22:07:35.576] [api] [error] 10.10.10.32 "PUT /chain/ocr-d21cfe6c-2e8c-40c7-94df-b4740bcfb44a-0" 500 6441ms

and the returned error was:

{
  "status": {
    "code": 500,
    "dd_code": 1007,
    "dd_msg": "in get<T>()",
    "msg": "InternalError"
  }
}

Any idea what's going on?

Emmanuel Benazera
@beniz
@cchadowitz-pf Hi, best if you can provide the full API call and image for reproducing. This looks like either an API parameter with wrong type or an internal value to the chain with wrong type.
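When reporting this kind of failure it helps to pull the `status` fields out of the response body. The sketch below does that using the exact JSON returned above; the function name `describe_error` is hypothetical and the code assumes nothing about the response beyond the fields shown.

```python
import json

# Response body copied from the failing chain call above.
response_body = """
{
  "status": {
    "code": 500,
    "dd_code": 1007,
    "dd_msg": "in get<T>()",
    "msg": "InternalError"
  }
}
"""


def describe_error(body):
    """Return a one-line summary of an error response, or None on success."""
    status = json.loads(body).get("status", {})
    if status.get("code", 200) < 400:
        return None
    return (
        f'{status.get("msg")} (HTTP {status.get("code")}, '
        f'dd_code {status.get("dd_code")}): {status.get("dd_msg")}'
    )


print(describe_error(response_body))
# InternalError (HTTP 500, dd_code 1007): in get<T>()
```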
Emmanuel Benazera
@beniz
@cchadowitz-pf OK thanks for the report, got it, see PR #1137. There should be a 0.12.1 release next Wednesday anyway, as we fixed a few things.
However, there's a remaining issue with NCNN since it only supports batch size of 1, and thus on chains, downstream NCNN models can only process the first sample.
I'll add larger batch size support for NCNN, but their doc shows how to do this by simply using omp to parallelize a for loop, meaning it's parallel on the CPU, not aggregated into batches at GPU level.
Nevertheless, since NCNN is mostly supposed to be used on CPU, this should not harm too much. Maybe I'll force the number of threads to the number of local CPU cores.
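A rough sketch of the idea described above: run a batch-size-1 model once per sample, spread over CPU threads, instead of aggregating samples into one batch. This is not NCNN code. `predict_one` is a hypothetical stand-in for a single-sample inference call, and in C++ the loop would be an OpenMP `parallel for` rather than a thread pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def predict_one(sample):
    """Hypothetical stand-in for a model that only accepts one sample."""
    return sum(sample) / len(sample)


def predict_batch(samples, num_threads=None):
    """Process a batch as independent single-sample calls, in input order."""
    # Default to the number of local CPU cores, as suggested above.
    num_threads = num_threads or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(predict_one, samples))


print(predict_batch([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [2.0, 5.0, 8.0]
```

Note that Python threads only illustrate the structure here; a pure-Python `predict_one` would not actually run in parallel because of the interpreter lock.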
Emmanuel Benazera
@beniz
OK, so there's now PR #1138 that adds support for batches to NCNN with image models. Chains do appear to work correctly for me. We're lacking proper tests on chains, but they should make it in soon.
cchadowitz-pf
@cchadowitz-pf
oh fantastic! (sorry for the delay, i was away for a bit)
I wasn't aware that NCNN was even using the GPU in DeepDetect, since I thought it relies on Vulkan for that?
Emmanuel Benazera
@beniz
yes, it's vulkan based, I use it with the Apple silicon M1 chip at the moment
cchadowitz-pf
@cchadowitz-pf
:+1: i didn't realize DeepDetect supported NCNN on gpu already though, cool!
Emmanuel Benazera
@beniz
that was my xmas project yes, we may port it to NVidia GPUs, but at least the Vulkan part is there (still in a PR)
cchadowitz-pf
@cchadowitz-pf
:+1: it doesn't natively support CUDA, right?
Emmanuel Benazera
@beniz
not that I know
but nvidia gpus via vulkan I believe
dgtlmoon
@dgtlmoon
./dede: error while loading shared libraries: libprotobuf.so.3.11.4.0: cannot open shared object file: No such file or directory
from jolibrain/deepdetect_cpu
not sure if it's me just yet
Emmanuel Benazera
@beniz
docker?
dgtlmoon
@dgtlmoon
dd@afe456179096:/opt/deepdetect/build/main$ ./dede
./dede: error while loading shared libraries: libprotobuf.so.3.11.4.0: cannot open shared object file: No such file or directory
in the dede docker is
dd@afe456179096:/opt/deepdetect/build/main$ dpkg -l|grep libprotobuf
ii  libprotobuf10:amd64             3.0.0-9.1ubuntu1                    amd64        protocol buffers C++ library

dd@afe456179096:/opt/deepdetect/build/main$ cat /var/lib/dpkg/info/libprotobuf10\:amd64.list 
/usr/lib/x86_64-linux-gnu/libprotobuf.so.10.0.0
dgtlmoon
@dgtlmoon
dd@afe456179096:/opt/deepdetect/build/main$ find /|grep libprotobuf            
/opt/deepdetect/build/lib/libprotobuf.so
/opt/deepdetect/build/lib/libprotobuf.so.3.11.4.0
/opt/deepdetect/build/lib/libprotobuf-lite.so
/opt/deepdetect/build/lib/libprotobuf-lite.so.3.11.4.0
your ldpath config is broken perhaps
Emmanuel Benazera
@beniz
hi @dgtlmoon, protobuf is built internally to avoid conflicts, which docker image are you using?
dgtlmoon
@dgtlmoon
good morning :) I'm using the current jolibrain/deepdetect_cpu, just ran a docker-compose pull
dd@afe456179096:/opt/deepdetect/build/main$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/local/lib:
/lib/x86_64-linux-gnu:
/usr/lib/x86_64-linux-gnu:
/lib:
/usr/lib:
I'm using my own start-up script by the way.. ahhh
Emmanuel Benazera
@beniz
I believe the docker containers are now tested by our CI so this should not happen. How can I reproduce it?
dgtlmoon
@dgtlmoon
I think it's because I'm starting the container this way....
  deepdetect:
    image: jolibrain/deepdetect_cpu
    command: bash -c 'LD_LIBRARY_PATH=/opt/deepdetect/build/lib/; export LD_LIBRARY_PATH; ./dede -host 0.0.0.0 & sleep 3; /init.sh; wait;'
    container_name: tss_dd
    volumes:
      # curl doesn't exist in the image and we aren't root :(
      - ./deepdetect/curl:/usr/bin/curl
      - ./deepdetect/init.sh:/init.sh
      - ./deepdetect/models:/models
      - ./deepdetect/models-classifier:/models-classifier
      - ./deepdetect/models-tag-classifier:/models-tag-classifier

    expose:
      - 8080
    networks:
      - tssnet
    restart: always
but the weird thing is, this only started happening in the last couple of weeks
I added that LD_LIBRARY_PATH config just now and everything is fine; without it, it won't start
My init.sh creates a few services from within the container, but thinking about it, I don't need to run that from inside the container.... could be my weird brain at work
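A small sketch of the check done by hand above: find which directories hold the library, then see whether they appear in LD_LIBRARY_PATH. Both function names are hypothetical. The dynamic loader also consults the ldconfig cache and any rpath in the binary, so this only covers the environment variable.

```python
import os


def find_library_dirs(root, filename):
    """Return the directories under root that contain filename."""
    return sorted(d for d, _, files in os.walk(root) if filename in files)


def dirs_missing_from_path(lib_dirs, ld_library_path):
    """Return the directories that are not listed in an LD_LIBRARY_PATH value."""
    # normpath() makes "/opt/x/lib/" and "/opt/x/lib" compare equal.
    entries = {os.path.normpath(p) for p in ld_library_path.split(":") if p}
    return [d for d in lib_dirs if os.path.normpath(d) not in entries]


if __name__ == "__main__":
    # Paths taken from the find output above.
    dirs = find_library_dirs("/opt/deepdetect/build", "libprotobuf.so.3.11.4.0")
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for d in dirs_missing_from_path(dirs, current):
        print(f"not on LD_LIBRARY_PATH: {d}")
```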
Emmanuel Benazera
@beniz
sure, no worries, if you see something that would ease usage and that we should add on dd side, let me know
dgtlmoon
@dgtlmoon
thanks :)
YaYaB
@YaYaB
:+1:
cchadowitz-pf
@cchadowitz-pf
:fireworks:
cchadowitz-pf
@cchadowitz-pf
@beniz any insight on if the ci-master docker image is up to date with the v0.13.0 tag?
Mehdi ABAAKOUK
@sileht
We just released the 0.13.0 tag. Docker images will be built tonight (CEST); ci-master was built last night and is one commit behind 0.13.0
cchadowitz-pf
@cchadowitz-pf
by any chance is this the commit that ci-master is missing? jolibrain/deepdetect@b85d79e
it's the one i was hoping for :sweat_smile:
Mehdi ABAAKOUK
@sileht
that's the one unfortunately
cchadowitz-pf
@cchadowitz-pf
ah well :smile: i'll look forward to the v0.13.0 and/or next ci-master builds! thanks!
cchadowitz-pf
@cchadowitz-pf
just testing v0.13.0 release (i built a docker image locally) - ran into the error I described in #1151
I don't believe I had that error in v0.12.0 (or even some of the ci-master builds between v0.12.0 and now), so I'm guessing something was introduced, or changed in the NCNN master branch upstream?