These are chat archives for beniz/deepdetect

19th
Nov 2017
jubeenshah
@jubeenshah
Nov 19 2017 17:16
I tried setting up another service again, but I'm getting the same error:
```
INFO - 22:38:12 - This network produces output loss3/top-1
INFO - 22:38:12 - This network produces output probt
INFO - 22:38:12 - Network initialization done.
E1119 22:38:12.653645 7960 caffelib.cc:785] exception while forward/backward pass through the network
E1119 22:40:46.902273 7601 caffelib.cc:1109] Error creating model for prediction
INFO - 22:38:12 - Solver scaffolding done.
ERROR - 22:40:46 - service cosmos_image_classification mllib internal error: no model in /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository for initializing the net
```
I'm sorry for asking noob questions, but I'm new to this environment and might need some help getting started... Sorry :(
:worried:
```
INFO - 22:37:28 - Processed 1000 files.
INFO - 22:37:32 - Processed 2000 files.
INFO - 22:37:35 - Processed 3000 files.
INFO - 22:37:38 - Processed 4000 files.
INFO - 22:37:41 - Processed 5000 files.
INFO - 22:37:44 - Processed 6000 files.
INFO - 22:37:47 - Processed 7000 files.
INFO - 22:37:50 - Processed 8000 files.
INFO - 22:37:53 - Processed 9000 files.
INFO - 22:37:56 - Processed 10000 files.
INFO - 22:37:59 - Processed 11000 files.
INFO - 22:37:59 - Processed 11012 files.
```
Emmanuel Benazera
@beniz
Nov 19 2017 17:26
the error says it all, there's no model in your directory
if your training call has failed, you'd need to post the error message
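Because the training call in this thread uses `"async":true`, `POST /train` returns right away, and a failure only shows up in the server log or when the job status is queried. A minimal sketch for polling the job, assuming the server at `localhost:8080` and job id `1` (the first training job of a service); the `DD_HOST` variable and both helper names are hypothetical:

```shell
# Hypothetical base URL; defaults to the address used in this thread.
DD_HOST="${DD_HOST:-http://localhost:8080}"

# Build the status URL for a given service name ($1) and job id ($2).
train_status_url() {
  printf '%s/train?service=%s&job=%s\n' "$DD_HOST" "$1" "$2"
}

# Query the server; the JSON reply reports the job status and,
# if the job failed, the error that stopped it.
train_status() {
  curl -s -X GET "$(train_status_url "$1" "$2")"
}
```

Usage: `train_status cosmos_image_classification_3 1`.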
jubeenshah
@jubeenshah
Nov 19 2017 17:27
```
E1119 22:40:46.902273 7601 caffelib.cc:1109] Error creating model for prediction
INFO - 22:38:12 - Solver scaffolding done.
E1119 22:38:12.653645 7960 caffelib.cc:785] exception while forward/backward pass through the network
```
does that help?
Emmanuel Benazera
@beniz
Nov 19 2017 17:28
here you go, most likely a memory error. If you are training on GPU, lower the batch size, and tell us what GPU that is. If you are on CPU, same thing, lower the batch size.
jubeenshah
@jubeenshah
Nov 19 2017 17:28
I use a GTX 1080 Ti
okay, I'll try and let you know
jubeenshah
@jubeenshah
Nov 19 2017 17:35
I was using a batch size of 32, now I'm using a batch size of 8
Emmanuel Benazera
@beniz
Nov 19 2017 17:36
what's your neural net template / architecture ?
jubeenshah
@jubeenshah
Nov 19 2017 17:36
Caffe
googlenet
Emmanuel Benazera
@beniz
Nov 19 2017 17:37
OK, 32 should work, unless you have something else running on the GPU. Use nvidia-smi to look at it
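To check how much GPU memory is actually free, `nvidia-smi` can print just the memory columns as CSV. A small sketch, assuming the `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` output shape of one `used, total` line (in MiB) per GPU; `gpu_free_mib` is a hypothetical helper name:

```shell
# Read "used, total" lines (MiB) on stdin and print "total - used" per GPU.
gpu_free_mib() {
  awk -F', *' '{ print $2 - $1 }'
}

# On a machine with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | gpu_free_mib
```

With the figures in the `nvidia-smi` output in this thread (589 MiB used out of 11169 MiB), this prints 10580, i.e. roughly 10 GiB left for training.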
jubeenshah
@jubeenshah
Nov 19 2017 17:37
nothing is running, it's completely free
```
Sun Nov 19 23:07:43 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:41:00.0  On |                  N/A |
|  0%   50C    P8    17W / 250W |    589MiB / 11169MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1305      G   /usr/lib/xorg/Xorg                           195MiB |
|    0      2343      G   compiz                                       175MiB |
|    0      7569      C   ./main/dede                                  215MiB |
+-----------------------------------------------------------------------------+
```
Please help me with the basics here. I have an empty repository folder, where the model is supposed to get created.
I have another folder with the images, and I'm using absolute paths to reference all of them.
Emmanuel Benazera
@beniz
Nov 19 2017 17:40
you'd need to post the whole server output as a gist, plus all the exact calls you are making to the API
jubeenshah
@jubeenshah
Nov 19 2017 17:41
okay, I'll do that. Give me two minutes, I'm trying something which I think could have been the cause.
jubeenshah
@jubeenshah
Nov 19 2017 17:48

Okay, so it's still not working. I was not setting the number of classes properly, but even after correcting that I didn't get a correct response.
Here is the curl call for creating the service:

```
curl -X PUT "http://localhost:8080/services/cosmos_image_classification_3" -d '{
       "mllib":"caffe",
       "description":"COSMOS - Image Classifier_2",
       "type":"supervised",
       "parameters":{
         "input":{
           "connector":"image",
           "width":224,
           "height":224
         },
         "mllib":{
           "template":"googlenet",
           "nclasses":12
         }
       },
       "model":{
         "templates":"/home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/templates/caffe",
         "repository":"/home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository"
       }
     }'
```

Here is the server output:

```
INFO - 23:11:42 - Sun Nov 19 23:11:42 2017 IST - 127.0.0.1 "PUT /services/cosmos_image_classification_3" 201 19
```

Here is the training call:

```
curl -X POST "http://localhost:8080/train" -d '{
       "service":"cosmos_image_classification_3",
       "async":true,
       "parameters":{
         "mllib":{
           "gpu":true,
           "net":{
             "batch_size":32
           },
           "solver":{
             "test_interval":500,
             "iterations":30000,
             "base_lr":0.001,
             "stepsize":1000,
             "gamma":0.9
           }
         },
         "input":{
           "connector":"image",
           "test_split":0.1,
           "shuffle":true,
           "width":224,
           "height":224
         },
         "output":{
           "measure":["acc","mcll","f1"]
         }
       },
       "data":["/home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/imgnet"]
     }'
```

Here is the training output I'm getting:

```
INFO - 23:12:33 - Sun Nov 19 23:12:33 2017 IST - 127.0.0.1 "POST /train" 201 0

INFO - 23:12:33 - A total of 11012 images.
INFO - 23:12:33 - Opened lmdb /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/train.lmdb
INFO - 23:12:36 - Processed 1000 files.
INFO - 23:12:39 - Processed 2000 files.
INFO - 23:12:42 - Processed 3000 files.
INFO - 23:12:45 - Processed 4000 files.
INFO - 23:12:48 - Processed 5000 files.
INFO - 23:12:51 - Processed 6000 files.
INFO - 23:12:55 - Processed 7000 files.
INFO - 23:12:58 - Processed 8000 files.
INFO - 23:13:01 - Processed 9000 files.
INFO - 23:13:04 - Processed 10000 files.
INFO - 23:13:07 - Processed 11000 files.
INFO - 23:13:07 - Processed 11012 files.
INFO - 23:13:07 - Opened lmdb /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/test.lmdb
INFO - 23:13:10 - Processed 1000 files.
INFO - 23:13:10 - Processed 1224 files.
INFO - 23:13:10 - Opened lmdb /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/train.lmdb
INFO - 23:13:10 - Decoding Datum
INFO - 23:13:19 - Processed 10000 files.
INFO - 23:13:20 - Processed 11012 files.
INFO - 23:13:20 - Write to /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/mean.binaryproto
INFO - 23:13:20 - Number of channels: 3
INFO - 23:13:20 - mean_value channel [0]:128.108
INFO - 23:13:20 - mean_value channel [1]:135.752
INFO - 23:13:20 - mean_value channel [2]:123.139
```

Please note that the imgnet folder contains the 12 folders of images.

The repository folder, where the models are supposed to get created, is empty.

Emmanuel Benazera
@beniz
Nov 19 2017 17:49
please post the full output from the server
jubeenshah
@jubeenshah
Nov 19 2017 17:49
Here is the full output
Emmanuel Benazera
@beniz
Nov 19 2017 17:49
put it into a gist if it is long
jubeenshah
@jubeenshah
Nov 19 2017 17:50
```
INFO - 23:11:42 - Sun Nov 19 23:11:42 2017 IST - 127.0.0.1 "PUT /services/cosmos_image_classification_3" 201 19

INFO - 23:12:33 - Sun Nov 19 23:12:33 2017 IST - 127.0.0.1 "POST /train" 201 0

INFO - 23:12:33 - A total of 11012 images.
INFO - 23:12:33 - Opened lmdb /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/train.lmdb
INFO - 23:12:36 - Processed 1000 files.
INFO - 23:12:39 - Processed 2000 files.
INFO - 23:12:42 - Processed 3000 files.
INFO - 23:12:45 - Processed 4000 files.
INFO - 23:12:48 - Processed 5000 files.
INFO - 23:12:51 - Processed 6000 files.
INFO - 23:12:55 - Processed 7000 files.
INFO - 23:12:58 - Processed 8000 files.
INFO - 23:13:01 - Processed 9000 files.
INFO - 23:13:04 - Processed 10000 files.
INFO - 23:13:07 - Processed 11000 files.
INFO - 23:13:07 - Processed 11012 files.
INFO - 23:13:07 - Opened lmdb /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/test.lmdb
INFO - 23:13:10 - Processed 1000 files.
INFO - 23:13:10 - Processed 1224 files.
INFO - 23:13:10 - Opened lmdb /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/train.lmdb
INFO - 23:13:10 - Decoding Datum
INFO - 23:13:19 - Processed 10000 files.
INFO - 23:13:20 - Processed 11012 files.
INFO - 23:13:20 - Write to /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/mean.binaryproto
INFO - 23:13:20 - Number of channels: 3
INFO - 23:13:20 - mean_value channel [0]:128.108
INFO - 23:13:20 - mean_value channel [1]:135.752
INFO - 23:13:20 - mean_value channel [2]:123.139
```
Emmanuel Benazera
@beniz
Nov 19 2017 17:52
Look into /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository and list its content here please
jubeenshah
@jubeenshah
Nov 19 2017 17:53
/home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository
sorry
corresp.txt  mean.binaryproto  model.json  test.lmdb  train.lmdb
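One failure mode mentioned earlier in this thread was a wrong `nclasses`. With the image connector, the training data here is organized as one sub-folder per class (12 folders under `imgnet`), so a quick pre-flight check is to count those folders and compare the result with `nclasses` before creating the service. A sketch under that assumption; `count_class_dirs` and `check_nclasses` are hypothetical helpers, and `find -mindepth/-maxdepth` are GNU/BSD extensions rather than POSIX:

```shell
# Count the immediate sub-folders of a data directory (one per class assumed).
count_class_dirs() {
  find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Fail early when the expected class count ($2) differs from the folder count in $1.
check_nclasses() {
  found=$(count_class_dirs "$1")
  if [ "$found" -ne "$2" ]; then
    echo "nclasses=$2 but $1 has $found class folders" >&2
    return 1
  fi
}
```

Usage: `check_nclasses /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/imgnet 12`.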
Emmanuel Benazera
@beniz
Nov 19 2017 17:54
the googlenet templates are missing
check what's in /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/templates/caffe
jubeenshah
@jubeenshah
Nov 19 2017 17:55
alexnet  deeplab_vgg16  lregression  pspnet_vgg16  resnet_152  resnet_50   squeezenet  vdcnn_17
cifar    googlenet      mlp          resnet        resnet_18   segnet      ssd_300     vdcnn_9
convnet  googlenet_bn   nin          resnet_101    resnet_32   shufflenet  unet        vgg_16
ubuntu@ubuntu:~/Documents/Jubeen/deepdetect/deepdetect/templates/caffe$ cd googlenet
ubuntu@ubuntu:~/Documents/Jubeen/deepdetect/deepdetect/templates/caffe/googlenet$ ls
deploy.prototxt  googlenet.prototxt  googlenet_solver.prototxt  readme.md
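The template files exist on the source side; the question is whether they were copied into the model repository. The server log in this thread shows `googlenet.prototxt` being instantiated into the repository, and the later listing shows `googlenet.prototxt`, `googlenet_solver.prototxt` and `deploy.prototxt` there. A sketch that checks for those three files, assuming the `<template>.prototxt` / `<template>_solver.prototxt` naming seen here carries over to other templates; `check_model_repo` is a hypothetical helper:

```shell
# Check that repository $1 holds the network files copied from template $2.
# Prints each missing file on stderr; returns 1 if any is missing.
check_model_repo() {
  repo=$1
  tmpl=$2
  missing=0
  for f in "$tmpl.prototxt" "${tmpl}_solver.prototxt" deploy.prototxt; do
    if [ ! -f "$repo/$f" ]; then
      echo "missing: $repo/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_model_repo /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository googlenet`.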
Emmanuel Benazera
@beniz
Nov 19 2017 17:57
are you absolutely certain you are copying the full server output? there should be an error report at the end of it
jubeenshah
@jubeenshah
Nov 19 2017 17:58
Yes, I'm absolutely sure
Emmanuel Benazera
@beniz
Nov 19 2017 17:58
what's the head of the server log, the first few lines?
jubeenshah
@jubeenshah
Nov 19 2017 18:00
oh damn, I misinterpreted the question I guess.
./main/dede
DeepDetect [ commit ab7bae2b9a0b994efeb95be4aa836385beccd6f8 ]

INFO - 22:16:57 - Running DeepDetect HTTP server on localhost:8080
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1119 22:37:08.036111  7594 caffelib.cc:131] instantiating model template googlenet
I1119 22:37:08.036164  7594 caffelib.cc:135] source=/home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/templates/caffe/googlenet/
I1119 22:37:08.036170  7594 caffelib.cc:136] dest=/home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/googlenet.prototxt
I'm so sorry
Emmanuel Benazera
@beniz
Nov 19 2017 18:01
can you just post the full output at once please?
jubeenshah
@jubeenshah
Nov 19 2017 18:01
How do I create and post a gist file?
Emmanuel Benazera
@beniz
Nov 19 2017 18:01
google it
jubeenshah
@jubeenshah
Nov 19 2017 18:02
okay doing it
Emmanuel Benazera
@beniz
Nov 19 2017 18:22
Not sure what is wrong or what you are doing; you should see this file: /home/ubuntu/Documents/Jubeen/deepdetect/deepdetect/model/repository/googlenet.prototxt. It seems this file is removed when the training starts, maybe you are removing it somehow?
need to go, bbl.
jubeenshah
@jubeenshah
Nov 19 2017 18:22
yeah I have it now, I started the server again
ubuntu@ubuntu:~/Documents/Jubeen/deepdetect/deepdetect/model/repository$ ls
corresp.txt      googlenet.prototxt         mean.binaryproto  test.lmdb
deploy.prototxt  googlenet_solver.prototxt  model.json        train.lmdb
but I'm still getting the
INFO - 23:49:07 - googlenet does not need backward computation.
INFO - 23:49:07 - This network produces output loss1/loss1
INFO - 23:49:07 - This network produces output loss1/top-1
INFO - 23:49:07 - This network produces output loss1/top-5
INFO - 23:49:07 - This network produces output loss2/loss1
INFO - 23:49:07 - This network produces output loss2/top-1
INFO - 23:49:07 - This network produces output loss2/top-5
INFO - 23:49:07 - This network produces output loss3/loss3
INFO - 23:49:07 - This network produces output loss3/top-1
INFO - 23:49:07 - This network produces output probt
INFO - 23:49:07 - Network initialization done.
E1119 23:49:07.404664  8740 caffelib.cc:785] exception while forward/backward pass through the network
error
Emmanuel Benazera
@beniz
Nov 19 2017 18:27
memory or GPU error probably. Look with dmesg at your system report
With high certainty you haven't compiled for your GPU
Look at the DeepDetect README and the CUDA arch setting at build time
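Filtering the kernel log narrows down what to look for. A minimal sketch, assuming the usual markers: `NVRM`/`Xid` lines are emitted by the NVIDIA kernel driver, and `Out of memory` lines by the kernel OOM killer; the helper name is hypothetical and the patterns are not exhaustive:

```shell
# Keep only kernel-log lines (read on stdin) that usually point to GPU or memory trouble.
gpu_kernel_errors() {
  grep -i -E 'NVRM|Xid|out of memory'
}

# Usage (reading the kernel log may require root):
#   dmesg | gpu_kernel_errors
```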
jubeenshah
@jubeenshah
Nov 19 2017 18:30
I did pull your GPU image and followed the steps
Emmanuel Benazera
@beniz
Nov 19 2017 18:30
Use the CUDA compute code for a Pascal card
If you are using docker, use the Pascal version I guess
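A GTX 1080 Ti is a Pascal card with CUDA compute capability 6.1. The sketch below turns a compute capability into the matching `nvcc` `-gencode` flag; how that flag is passed to the DeepDetect build (shown here as a `CUDA_ARCH` CMake option) is an assumption to check against the README, and `gencode_for` is a hypothetical helper:

```shell
# Turn a compute capability such as 6.1 into the nvcc -gencode flag for it.
gencode_for() {
  cc=$(printf '%s' "$1" | tr -d '.')
  printf '%s\n' "-gencode arch=compute_${cc},code=sm_${cc}"
}

# Hypothetical build invocation; the CUDA_ARCH option name is an assumption:
#   cmake .. -DCUDA_ARCH="$(gencode_for 6.1)"
```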
jubeenshah
@jubeenshah
Nov 19 2017 18:31
I'm using nvidia-docker
does that create an issue?
Emmanuel Benazera
@beniz
Nov 19 2017 18:35
I believe you need to use https://hub.docker.com/r/beniz/deepdetect_gpu_pascal/ and then you need to share volumes, etc. It's in the docker/readme file
good luck
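Sharing volumes matters because paths sent to the API are resolved inside the container, not on the host. A minimal sketch of starting the Pascal image with a host folder mounted; the container path `/opt/models` is an arbitrary choice for illustration, port 8080 is assumed from the calls in this thread, and `run_dd_pascal` and its `DOCKER` override are hypothetical (see docker/README for the documented invocation):

```shell
# Start the Pascal GPU image with host folder $1 mounted at container path $2.
# DOCKER can be overridden (e.g. DOCKER=echo) to print the command instead of running it.
run_dd_pascal() {
  host_dir=$1
  container_dir=$2
  ${DOCKER:-nvidia-docker} run -d -p 8080:8080 \
    -v "$host_dir:$container_dir" \
    beniz/deepdetect_gpu_pascal
}
```

Once mounted, the `repository` and `data` values in the service and training calls would use the container path, e.g. `/opt/models/repository`, rather than the host path under `/home/ubuntu/Documents/Jubeen`.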
jubeenshah
@jubeenshah
Nov 19 2017 18:36
okay thanks a lot for the help
Emmanuel Benazera
@beniz
Nov 19 2017 18:37
np, just make sure you report everything when you are looking for help from Open Source project maintainers. our time is scarce, make sure you gather all info and report the problem. This will help you in the future as well ^^
jubeenshah
@jubeenshah
Nov 19 2017 18:41
okay, sure. I'm sorry for the delay I caused you!
Emmanuel Benazera
@beniz
Nov 19 2017 18:42
no worries, just if I hadn't missed the fact you were using docker, this would have been solved days ago.