These are chat archives for beniz/deepdetect

31st
May 2017
roysG
@roysG
May 31 2017 09:59
Guys, please tell me how I can improve prediction speed.
For now I manage to process 100 images in 35 sec.
My computer info:
i7, 16GB RAM, graphics card: GTX 1070
That result was with gpu:true
Emmanuel Benazera
@beniz
May 31 2017 10:04
send multiple images at once.
roysG
@roysG
May 31 2017 10:04
That result was already with multiple images sent at once.
Emmanuel Benazera
@beniz
May 31 2017 10:05
then try gpu:false and see whether it is much slower, indicating that your GPU is already working fine.
roysG
@roysG
May 31 2017 10:05
Yes, I already tested it, and it is slower on CPU.
I am using a Caffe model from this URL:
https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

Maybe it will be faster if I convert the Caffe model to TensorFlow?

Or maybe you have another suggestion?

Emmanuel Benazera
@beniz
May 31 2017 10:08
it should run in about 140ms per image on a K40, probably a bit faster on a GTX 1070.
and a bit faster when sending multiple images at once.
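To put the numbers above side by side: 100 images in 35 s is 350 ms per image, against the roughly 140 ms quoted for a K40. Below is a minimal Python sketch of that arithmetic, together with a batched predict body that carries several images in one `data` array. The service name `age` and the image paths are hypothetical, and placing `gpu` under `parameters.mllib` is an assumption about the API rather than something confirmed in this chat.

```python
import json

def per_image_ms(total_seconds: float, n_images: int) -> float:
    """Average latency per image in milliseconds for one batched run."""
    return total_seconds * 1000.0 / n_images

def build_predict_body(service: str, images: list, gpu: bool = True) -> str:
    """JSON body for one predict request carrying every image in `data`."""
    body = {
        "service": service,
        # Assumption: the gpu switch lives under parameters.mllib.
        "parameters": {"mllib": {"gpu": gpu}},
        "data": list(images),
    }
    return json.dumps(body)

# Hypothetical service name and image paths, for illustration only.
images = ["/data/faces/img_%03d.jpg" % i for i in range(100)]
body = build_predict_body("age", images)

measured = per_image_ms(35, 100)  # the run reported above
reference = 140.0                 # K40 figure quoted above
print(len(json.loads(body)["data"]), measured, measured / reference)
```

The ratio printed last is how many times slower the reported run is than the quoted reference, which is the gap the rest of this conversation tries to close.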
roysG
@roysG
May 31 2017 10:09
Is the K40 stronger than the GTX 1070?
Ohh, the K40 is the Tesla with 12GB
roysG
@roysG
May 31 2017 10:15
The time in the response is 355 ms per image with the GTX 1070, is that normal?
@beniz ?
Emmanuel Benazera
@beniz
May 31 2017 10:15
I've already answered you on this. Check that you are using cudnn, etc... good luck!
roysG
@roysG
May 31 2017 10:16
Thank you, and yes, I installed cuDNN. Is there something else that may help me?
Emmanuel Benazera
@beniz
May 31 2017 10:17
making sure you are using it ;)
roysG
@roysG
May 31 2017 10:17
But I cannot verify that cuDNN is working with DeepDetect.
Emmanuel Benazera
@beniz
May 31 2017 10:17
ldd dede | grep cudnn
roysG
@roysG
May 31 2017 10:18
Maybe this is exactly my problem. Is there a flag that I need to add?
I tried your command and this is the result:
roy@roy:~/deepdetect/build$ ldd dede | grep cudnn
ldd: ./dede: No such file or directory
I also tried it here:
roy@roy:~/deepdetect/build/main$ ldd dede | grep cudnn
but no output is returned
Emmanuel Benazera
@beniz
May 31 2017 10:20
ldd dede | grep cudnn
libcudnn.so.6 => /usr/local/cuda/lib64/libcudnn.so.6 (0x00007f2510592000)
read the README carefully again and you'll solve this, good luck.
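The same check can be scripted. This is a minimal sketch, assuming a Linux host with `ldd` on the PATH; the helper names `links_cudnn` and `check_binary` are made up for illustration, and the sample line is the one quoted above.

```python
import subprocess

def links_cudnn(ldd_output: str) -> bool:
    """Return True if any line of `ldd` output names a libcudnn shared object."""
    return any("libcudnn" in line for line in ldd_output.splitlines())

def check_binary(path: str) -> bool:
    """Run `ldd` on the binary at `path` and report whether cuDNN is linked."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return links_cudnn(result.stdout)

# The line shown above, and an empty result like the one from a build without cuDNN.
sample = "\tlibcudnn.so.6 => /usr/local/cuda/lib64/libcudnn.so.6 (0x00007f2510592000)\n"
print(links_cudnn(sample))
print(links_cudnn(""))
```

Calling `check_binary` on the path of the built `dede` binary gives the same answer as the `ldd dede | grep cudnn` pipeline: an empty grep result corresponds to `False`.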
roysG
@roysG
May 31 2017 10:22
When you say to read the README, do you mean on the DeepDetect site?
Or your GitHub?
roysG
@roysG
May 31 2017 10:39

@beniz, I found it in the README as you told me :)

I ran the command:

cmake .. -DUSE_CUDNN=ON

and then I ran:

make

The build finished. I tested it and the response times are the same.
I also ran the command you gave me before:
ldd dede | grep cudnn, but I still get an empty result.

Please tell me if I did it correctly, thank you.

Emmanuel Benazera
@beniz
May 31 2017 10:40
the cmake command tells you whether cudnn is detected on your system.
roysG
@roysG
May 31 2017 10:40
I did not see it in the cmake output, maybe I need to restart my server?
This is the result from cmake .
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- filesystem
-- thread
-- system
-- iostreams
-- chrono
-- date_time
-- atomic
-- regex
-- CUDA detected: 8.0
-- Found cuDNN (include: /usr/local/cuda-8.0/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Added CUDA NVCC flags for: sm_61
-- OpenCV 2 (2.4.9.1) found (/usr/share/OpenCV)
-- Configuring customized caffe
-- Build Tests : OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /home/roy/deepdetect/build
rdoume
@rdoume
May 31 2017 10:42
Well, then you have cuDNN
roysG
@roysG
May 31 2017 10:42
Yes, I saw it just now when I pasted it.
So how can I verify that dede is working with cuDNN?

I tried the command again:
ldd deepdetect/build/main/dede | grep cudnn

but I get no result

Emmanuel Benazera
@beniz
May 31 2017 10:43
you need to delete everything in your build directory and rebuild with cmake .. -DUSE_CUDNN=ON
roysG
@roysG
May 31 2017 10:44
Oh OK, is it fine if I do make clean
to remove everything?

When it finishes I will run the command:
ldd deepdetect/build/main/dede | grep cudnn

and then I am supposed to see the result:
libcudnn.so.6 => /usr/local/cuda/lib64/libcudnn.so.6

right?

roysG
@roysG
May 31 2017 11:01
OK, I see it! @beniz thank you for your help :)
Is there any additional library that may help me?
roysG
@roysG
May 31 2017 12:03
Can I use multiple models in one request?
Emmanuel Benazera
@beniz
May 31 2017 12:03
no
roysG
@roysG
May 31 2017 14:40

Now that I run the command:
ldd deepdetect/build/main/dede | grep cudnn

and I get:
libcudnn.so.6 => /usr/local/cuda/lib64/libcudnn.so.6

does cuDNN start automatically when the dede process runs?

@beniz
rdoume
@rdoume
May 31 2017 16:07
It should; that means dede loads the cuDNN lib
roysG
@roysG
May 31 2017 16:07
How can i verify it?
rdoume
@rdoume
May 31 2017 16:08
Well, you do with this command:
it tells you dede uses cuDNN
roysG
@roysG
May 31 2017 16:08
I found that there is a Caffe2 library. Does it make the predict call much faster than Caffe 1?
rdoume
@rdoume
May 31 2017 16:09
You should look at caffe2 doc for that. Caffe2 is not yet implemented in DD as far as I know
roysG
@roysG
May 31 2017 16:10
Yes I know, I am just curious whether I would see any improvement if I used my model with Caffe2
rdoume
@rdoume
May 31 2017 16:10
Well look at caffe2 documentation then.
roysG
@roysG
May 31 2017 16:11
Maybe you have a suggestion for how I can speed up the response time per image?
I have an i7, 16GB RAM, GTX 1070
rdoume
@rdoume
May 31 2017 16:11
No idea.
roysG
@roysG
May 31 2017 16:11
and one image took 300 milliseconds
rdoume
@rdoume
May 31 2017 16:12
You will have to explore yourself for that
roysG
@roysG
May 31 2017 16:12
Yeah... tell me about it, I have been digging for a few days
roysG
@roysG
May 31 2017 16:18

I checked nvidia-smi to see how much of the GPU is used, and I found that it does not reach its maximum

roy@roy:~$ nvidia-smi
Wed May 31 12:17:09 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 0000:01:00.0      On |                  N/A |
| 28%   35C    P2    36W / 151W |   6024MiB /  8110MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       962    G   /usr/lib/xorg/Xorg                             174MiB |
|    0      1381    G   compiz                                         214MiB |
|    0     13665    G   /usr/lib/firefox/firefox                         1MiB |
|    0     16380    C   deepdetect/build/main/dede                    5629MiB |
+-----------------------------------------------------------------------------+

" 0 16380 C deepdetect/build/main/dede 5629MiB", why just 5GB?
roysG
@roysG
May 31 2017 17:38
@beniz Please note that GPU-Util is at just 1%; it looks like it is not putting any pressure on the GPU.
On the other hand, when I ran something graphical, this is the result:

roy@roy:~$ nvidia-smi
Wed May 31 13:28:48 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 0000:01:00.0      On |                  N/A |
| 28%   37C    P8    14W / 151W |    439MiB /  8110MiB |     21%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       962    G   /usr/lib/xorg/Xorg                             200MiB |
|    0      1381    G   compiz                                         214MiB |
|    0     13665    G   /usr/lib/firefox/firefox                         1MiB |
|    0     24271    G   ./GpuTest                                       18MiB |
+-----------------------------------------------------------------------------+

Emmanuel Benazera
@beniz
May 31 2017 17:45
If you max out your GPU with batches you'll get to 99% or so. Read the documentation available on GPUs, deep learning and Caffe, and you'll become knowledgeable enough to debug this yourself. Good luck.
roysG
@roysG
May 31 2017 18:41
@beniz, of course I am using batches: I send 100 images in one request, but it still does not take my GPU to 99%.
Just tell me whether this is what you mean by "batches", thanks
roysG
@roysG
May 31 2017 18:56
I also started reading about Caffe. The GPU percentage is supposed to be high during both training and prediction, right?
Emmanuel Benazera
@beniz
May 31 2017 18:58
nvidia-smi is not very precise, it gives you the usage per second, so if your call lasts a few seconds you should see heavy usage. Now we can't
roysG
@roysG
May 31 2017 18:59
Very strange, I see that it takes time, but there is no high CPU usage.
I am wondering what I am missing.
I also tried a fast refresh of nvidia-smi
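One way to watch utilization while a request is in flight is to poll nvidia-smi's query interface and keep the highest value seen. This is a minimal sketch: the helper names and the 0.2 s interval are arbitrary choices, and polling faster than the driver's own sampling window may not add resolution.

```python
import subprocess
import time

QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]

def parse_utilization(output: str) -> list:
    """Turn the CSV output (one integer per GPU, one per line) into ints.

    Lines that are not plain integers are skipped.
    """
    return [int(line.strip()) for line in output.splitlines() if line.strip().isdigit()]

def peak_utilization(duration_s: float, interval_s: float = 0.2) -> int:
    """Poll nvidia-smi for `duration_s` seconds and return the highest GPU-Util seen."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        peak = max([peak] + parse_utilization(out))
        time.sleep(interval_s)
    return peak

# The parsing step can be tried without a GPU:
print(parse_utilization("1\n"))
print(parse_utilization("21\n99\n"))
```

Running `peak_utilization(10)` in one terminal while sending the 100-image request from another would show whether the GPU is ever busy during the call.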
Emmanuel Benazera
@beniz
May 31 2017 19:00
say more... Maybe try using some of the reference models so that other users can tell you what performance they get? I already told you that you should get around 140ms per image with one image per batch.
If not much lower, depending on where you are fetching the image from
Tunlrcom
@tunlrcom_twitter
May 31 2017 19:24

Hi, we are trying a DeepDetect XGBoost regression service. Here's the create-service call:
curl -X PUT "http://localhost:8080/services/ypanel" -d '{
  "mllib":"xgboost",
  "description":"ypanel service",
  "type":"supervised",
  "parameters":{
    "input":{
      "categoricals":["account_id","site_id","template_id","redirect_id","mini_id","browser","general_os","general_device"],
      "connector":"csv"
    },
    "mllib":{
      "ntargets":1
    }
  },
  "model":{
    "repository":"/home/ubuntu/models/ypanel"
  }
}'

but we got error in the response:
{"status":{"code":400,"msg":"BadRequest","dd_code":1006,"dd_msg":"Service Bad Request Error"}}

here's a little more details in the syslog:
May 31 19:23:01 ip-172-31-25-103 dede_start.sh[1851]: E /media/dd1/ami/builds/deepdetect/src/services.h:266] service creation mllib bad param: number of classes is unknown (nclasses == 0)
May 31 19:23:01 ip-172-31-25-103 dede_start.sh[1851]: E /media/dd1/ami/builds/deepdetect/src/httpjsonapi.cc:357] Wed May 31 19:23:01 2017 UTC - 127.0.0.1 "PUT /services/ypanel" 400 0

does anyone know what I have done wrong? thanks

Emmanuel Benazera
@beniz
May 31 2017 19:30
@tunlrcom_twitter check that you are specifying the regression: true parameter to the mllib object
Tunlrcom
@tunlrcom_twitter
May 31 2017 19:39
Oh, thank you so much. I added regression: true, and now the service is created.
Just wondering why it's not in the API doc for XGBoost; the Caffe one does mention the regression property
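For reference, the corrected create-service call could be sketched in Python as below. The body mirrors the curl call from this thread with `"regression": true` added to the `mllib` parameters, as suggested above; the helper name `build_create_request` is made up for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request

def build_create_request(host, service, repository, categoricals):
    """Build (but do not send) the PUT request that creates the regression service."""
    body = {
        "mllib": "xgboost",
        "description": service + " service",
        "type": "supervised",
        "parameters": {
            "input": {"connector": "csv", "categoricals": list(categoricals)},
            # regression:true is the parameter that was missing in the first attempt.
            "mllib": {"ntargets": 1, "regression": True},
        },
        "model": {"repository": repository},
    }
    return urllib.request.Request(
        host + "/services/" + service,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )

req = build_create_request(
    "http://localhost:8080",
    "ypanel",
    "/home/ubuntu/models/ypanel",
    ["account_id", "site_id", "template_id", "redirect_id",
     "mini_id", "browser", "general_os", "general_device"],
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running server.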
Tunlrcom
@tunlrcom_twitter
May 31 2017 19:45

again, thanks for the quick response and we got further.

but we now encounter another issue:
here's our train api call:

curl -X POST "http://localhost:8080/train" -d '{
  "service":"ypanel",
  "async":true,
  "parameters":{
    "mllib":{
      "iterations":100,
      "test_interval":10,
      "objective":"reg:linear"
    },
    "input":{
      "categoricals":["account_id","site_id","template_id","redirect_id","mini_id","browser","general_os","general_device"],
      "label":"RPM",
      "separator":",",
      "shuffle":true,
      "test_split":0.1
    },
    "output":{
      "measure":["acc","mcll","f1"]
    }
  },
  "data":["/home/ubuntu/models/ypanel/ypanel_3.csv"]
}'

The response is: {"status":{"code":201,"msg":"Created"},"head":{"method":"/train","job":1,"status":"running"}}
and the syslog seems fine, it shows https://gyazo.com/c71cbdb3ee234f196250664b494247d6

but when I check the status:
curl -X GET "http://localhost:8080/train?service=ypanel&job=1"
{"status":{"code":200,"msg":"OK"},"head":{"method":"/train","job":1,"status":"error"},"body":{}}

the syslog shows:
May 31 19:36:59 ip-172-31-25-103 dede_start.sh[1851]: E /media/dd1/ami/builds/deepdetect/src/services.h:425] service ypanel training status call failed
May 31 19:36:59 ip-172-31-25-103 dede_start.sh[1851]: E /media/dd1/ami/builds/deepdetect/src/jsonapi.cc:776] {"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"target class has id 1268.292725 is higher than the number of classes 1 (e.g. wrong number of classes specified with nclasses"}

here's what the training file looks like:
date,account_id,site_id,template_id,redirect_id,mini_id,browser,general_os,general_device,Visitors,RPM
-2,4222,2362799885,635129,14939,nr.ytsurvey.mini,Mobile Safari,1,0,4791,76.927572
-1,555,87990184,541047,491,nr.redirect.mini,Mobile Safari,1,0,421549,0.755119
-2,4222,2362799885,611763,14939,nr.ytsurvey.mini,Chrome Mobile,2,0,3884,72.605561
-1,4222,2362799885,634037,14939,nr.diet.mini,Chrome Mobile,2,0,4133,57.464311
-2,4222,2362799885,611763,14939,nr.ytsurvey.mini,Chrome Mobile,2,0,3687,61.838893
-1,555,87990184,541047,491,nr.redirect.mini,Mobile Safari,1,0,211159,0.883552
-2,10791,529877394,635127,35817,nr.ytsurvey.mini,IE,3,1,2412,44.112769
-1,2348,298275049,635129,50677,nr.ytsurvey.mini,Mobile Safari,1,0,82,1268.292682
-2,3848,2698452049,503562,13129,nr.skin.mini,Firefox,3,1,1,93000
-1,10791,529877394,635127,35817,nr.ytsurvey.mini,Chrome,3,1,2763,29.388346
-1,10791,509059194,635127,35817,nr.ytsurvey.mini,Chrome,3,1,2832,26.836158

Could you tell me what I am missing?

here's the easy way to view the train data: https://gyazo.com/35940c9729a123f931bb75d270542c8a
please let me know if my api calls make sense, i'm new to this.
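Note that the status call above returns HTTP 200 even though the job itself failed; the job state sits in `head.status`. A minimal Python sketch of reading it, using the two responses quoted above (the helper name `job_status` is made up for illustration):

```python
import json

def job_status(response_text: str) -> str:
    """Return the job status from the `head` of a /train response, or 'unknown'."""
    doc = json.loads(response_text)
    return doc.get("head", {}).get("status", "unknown")

started = '{"status":{"code":201,"msg":"Created"},"head":{"method":"/train","job":1,"status":"running"}}'
checked = '{"status":{"code":200,"msg":"OK"},"head":{"method":"/train","job":1,"status":"error"},"body":{}}'

for text in (started, checked):
    status = job_status(text)
    # A 200 here only means the status call worked; the job itself may have failed.
    hint = "-> check the server log" if status == "error" else ""
    print(status, hint)
```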
Emmanuel Benazera
@beniz
May 31 2017 20:01
yes I think the option is missing in the xgboost section
you can't compute acc, f1 etc on a regression objective
try eucll instead
Tunlrcom
@tunlrcom_twitter
May 31 2017 20:17
ah, eucll works! thank you so much!
again, i can't find 'eucll' option for measure in the doc.
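The adjusted train call could be sketched in Python as follows. The body is the one from the curl call earlier in this thread with the measure list changed to `eucll`; the helper `build_train_body` and its guard against classification measures are illustrative additions, not part of DeepDetect.

```python
import json

# The measures used in the failing call; rejected here for a regression objective.
CLASSIFICATION_MEASURES = {"acc", "mcll", "f1"}

def build_train_body(service, data_files, label, categoricals, measures):
    """Build the /train body for a regression job, refusing classification measures."""
    bad = CLASSIFICATION_MEASURES.intersection(measures)
    if bad:
        raise ValueError("classification measures on a regression objective: %s" % sorted(bad))
    return {
        "service": service,
        "async": True,
        "parameters": {
            "mllib": {"iterations": 100, "test_interval": 10, "objective": "reg:linear"},
            "input": {
                "categoricals": list(categoricals),
                "label": label,
                "separator": ",",
                "shuffle": True,
                "test_split": 0.1,
            },
            "output": {"measure": list(measures)},
        },
        "data": list(data_files),
    }

body = build_train_body(
    "ypanel",
    ["/home/ubuntu/models/ypanel/ypanel_3.csv"],
    "RPM",
    ["account_id", "site_id", "template_id", "redirect_id",
     "mini_id", "browser", "general_os", "general_device"],
    ["eucll"],
)
print(json.dumps(body, indent=2))
```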
Emmanuel Benazera
@beniz
May 31 2017 20:18
right, thanks for finding this out, it's in the examples but is missing in the API. Fixing it we must.
Tunlrcom
@tunlrcom_twitter
May 31 2017 20:24
cool, thanks. could you tell me the meaning of 'eucll' ?
Emmanuel Benazera
@beniz
May 31 2017 20:25
euclidean distance
Tunlrcom
@tunlrcom_twitter
May 31 2017 20:36
ok, thanks. are there any other options for regression measure?
Emmanuel Benazera
@beniz
May 31 2017 20:37
no. Note that this is the measure, not the loss, so it's only indicative of what your loss may lead to.
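As a reminder of what a Euclidean distance between predictions and targets is, here is the textbook definition in Python. How DeepDetect aggregates or normalizes `eucll` over the test set is not stated in this chat, so this is only the plain formula, not the library's implementation.

```python
import math

def euclidean_distance(predicted, target):
    """Square root of the summed squared differences between two equal-length sequences."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)))

# Differences of 3 and 4 give the classic 3-4-5 triangle.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```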
Tunlrcom
@tunlrcom_twitter
May 31 2017 20:49

oh, ok, thx.
last step, my predict call:
curl -X POST "http://localhost:8080/predict" -d '{
  "service":"ypanel",
  "parameters":{
    "input":{
      "separator":","
    }
  },
  "data":["/home/ubuntu/models/ypanel/ypanel_predict.csv"]
}'

{"status":{"code":400,"msg":"BadRequest","dd_code":1005,"dd_msg":"Service Input Error"}}

syslog says:
May 31 20:39:20 ip-172-31-25-103 dede_start.sh[1851]: E /media/dd1/ami/builds/deepdetect/src/services.h:478] service ypanel mllib bad param: no data could be found
May 31 20:39:20 ip-172-31-25-103 dede_start.sh[1851]: E /media/dd1/ami/builds/deepdetect/src/httpjsonapi.cc:357] Wed May 31 20:39:20 2017 UTC - 127.0.0.1 "POST /predict" 400 0

I checked the predict file:

-rwxrwxrwx 1 ubuntu ubuntu 807 May 31 19:09 ypanel_predict.csv

cat ypanel_predict.csv
date,account_id,site_id,template_id,redirect_id,mini_id,browser,general_os,general_device,Visitors
0,4222,2362799885,635129,14939,nr.ytsurvey.mini,Mobile Safari,1,0,1
0,555,87990184,541047,491,nr.redirect.mini,Mobile Safari,1,0,1
0,4222,2362799885,611763,14939,nr.ytsurvey.mini,Chrome Mobile,2,0,1
0,4222,2362799885,634037,14939,nr.diet.mini,Chrome Mobile,2,0,1
0,4222,2362799885,611763,14939,nr.ytsurvey.mini,Chrome Mobile,2,0,1
0,555,87990184,541047,491,nr.redirect.mini,Mobile Safari,1,0,1
0,10791,529877394,635127,35817,nr.ytsurvey.mini,IE,3,1,1
0,2348,298275049,635129,50677,nr.ytsurvey.mini,Mobile Safari,1,0,1
0,3848,2698452049,503562,13129,nr.skin.mini,Firefox,3,1,1
0,10791,529877394,635127,35817,nr.ytsurvey.mini,Chrome,3,1,1
0,10791,509059194,635127,35817,nr.ytsurvey.mini,Chrome,3,1,1

can you help me out?

Tunlrcom
@tunlrcom_twitter
May 31 2017 21:02
Even if I pass in-memory data, I still get the same error:
curl -X POST "http://localhost:8080/predict" -d '{
   "service":"ypanel",
   "parameters":{
     "input":{
       "separator":","
     }
   },
   "data":["0,4222,2362799885,635129,14939,nr.ytsurvey.mini,Mobile Safari,1,0,1"]
 }'
Emmanuel Benazera
@beniz
May 31 2017 21:03
Look at categoricals_mapping, and you may want to specify the id etc...
Tunlrcom
@tunlrcom_twitter
May 31 2017 21:04
but we don't have an id
If you don't have an id it is set automatically
Tunlrcom
@tunlrcom_twitter
May 31 2017 21:15
I added categoricals_mapping, still same error
curl -X POST "http://localhost:8080/predict" -d '{
  "service":"ypanel",
  "parameters":{
    "input":{
      "categoricals_mapping":{"redirect_id":{"50677":3,"13129":4,"14939":0,"35817":2,"491":1},"account_id":{"2348":3,"3848":4,"555":1,"4222":0,"10791":2},"mini_id":{"nr.ytsurvey.mini":0,"nr.redirect.mini":1,"nr.skin.mini":3,"nr.diet.mini":2},"general_os":{"1":0,"2":1,"3":2},"site_id":{"529877394":2,"298275049":3,"509059194":5,"2698452049":4,"2362799885":0,"87990184":1},"general_device":{"0":0,"1":1},"browser":{"Firefox":3,"Mobile Safari":0,"IE":2,"Chrome":4,"Chrome Mobile":1},"template_id":{"634037":3,"611763":2,"635127":4,"635129":0,"503562":5,"541047":1}},
      "separator":","
    }
  },
  "data":["0,4222,2362799885,635129,14939,nr.ytsurvey.mini,Mobile Safari,1,0,1"]
}'
do i need scale, min_vals and max_vals? they seem optional, plus i don't have min_vals and max_vals returned from the call
Emmanuel Benazera
@beniz
May 31 2017 21:21
you don't need the min/max and scale
Tunlrcom
@tunlrcom_twitter
May 31 2017 21:23
but why isn't it working?
Emmanuel Benazera
@beniz
May 31 2017 21:25
not sure, the unit tests on regression + prediction are passing, so... could be either the call configuration or an untested bug somewhere
Tunlrcom
@tunlrcom_twitter
May 31 2017 21:27
Could you find out for me?
By the way, which lib is better for regression: Caffe or XGBoost?
Emmanuel Benazera
@beniz
May 31 2017 21:34
we'll look at it
Emmanuel Benazera
@beniz
May 31 2017 22:33
So I think it's due to bugs, possibly introduced recently, though the regression unit tests are still passing through with categoricals etc... which is not the least astounding.
Emmanuel Benazera
@beniz
May 31 2017 22:40
so @tunlrcom_twitter see PR #320. At the moment you'll need to either merge it or switch to branch csv_cat_fix. This should do the trick. Thanks for spotting this one, let us know how it goes.