These are chat archives for beniz/deepdetect

Jun 2017
Jun 02 2017 14:27

Thanks for your help.
We rebuilt the source with your fix.
curl -X GET "http://localhost:8080/info"
{"status":{"code":200,"msg":"OK"},"head":{"method":"/info","version":"0.1","branch":"master","commit":"f27d637166bf95ccbf57e7458147036d27ad82cc","services":[{"mllib":"xgboost","description":"ypanel service","name":"ypanel"}]}}

When I get to the predict section, it's still not working:

curl -X POST "http://localhost:8080/predict" -d '{
"categoricals_mapping":{"redirect_id":{"50677":3,"13129":4,"14939":0,"35817":2,"491":1},"account_id":{"2348":3,"3848":4,"555":1,"4222":0,"10791":2},"mini_id":{"":0,"":1,"":3,"":2},"general_os":{"1":0,"2":1,"3":2},"site_id":{"529877394":2,"298275049":3,"509059194":5,"2698452049":4,"2362799885":0,"87990184":1},"general_device":{"0":0,"1":1},"browser":{"Firefox":3,"Mobile Safari":0,"IE":2,"Chrome":4,"Chrome Mobile":1},"template_id":{"634037":3,"611763":2,"635127":4,"635129":0,"503562":5,"541047":1}},
"data":["-1,4222,2362799885,635129,14939,,Mobile Safari,1,0,1"]
The response is:
{"status":{"code":400,"msg":"BadRequest","dd_code":1005,"dd_msg":"Service Input Error"}}

The log says:
ERROR - 14:22:28 - service ypanel mllib bad param: no data could be found

ERROR - 14:22:28 - Fri Jun 2 14:22:28 2017 UTC - "POST /predict" 400 0

I'm predicting with in-memory data, so why am I getting 'no data could be found'?
Jun 02 2017 14:41
And why is it complaining about "service ypanel mllib bad param"? According to the API doc, predict requires no params for xgboost.
Jun 02 2017 14:49
Ah, I changed the data to include the header and it worked:
"data":["date,account_id,site_id,template_id,redirect_id,mini_id,browser,general_os,general_device,Visitors","-1,4222,2362799885,635129,14939,,Mobile Safari,1,0,1"]
Emmanuel Benazera
Jun 02 2017 14:50
Yes, you need the header because you are using categorical variables. Some automation comes at the cost of a few requirements.
Jun 02 2017 14:52
ok, thanks
Emmanuel Benazera
Jun 02 2017 14:53
Keeping the header is not a big overhead, since in general prediction is faster in batches.
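To illustrate the header requirement, here is a minimal Python sketch that builds the in-memory "data" list with the header line first, followed by one CSV line per row. The helper name build_predict_data is hypothetical (it is not part of DeepDetect); the column names and the sample row are the ones from the messages above. For a batch, further rows would simply be appended after the same single header.

```python
import json

# Column names taken from the working request above.
HEADER = ["date", "account_id", "site_id", "template_id", "redirect_id",
          "mini_id", "browser", "general_os", "general_device", "Visitors"]


def build_predict_data(header, rows):
    """Hypothetical helper: return the "data" list for a predict call,
    i.e. one header line followed by one comma-separated line per row."""
    lines = [",".join(header)]
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row has %d fields, header has %d"
                             % (len(row), len(header)))
        lines.append(",".join(str(value) for value in row))
    return lines


# The single row from the conversation; more rows share the same header.
rows = [["-1", "4222", "2362799885", "635129", "14939", "",
         "Mobile Safari", "1", "0", "1"]]

data = build_predict_data(HEADER, rows)
print(json.dumps({"data": data}, indent=2))
```

The header is sent once per call, so its relative cost shrinks as more rows are batched into the same request.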
Jun 02 2017 18:41

I have another question: why is "categoricals_mapping" required in the predict call? The train call returns "categoricals_mapping", and I think your system logs it in the model.json file.

Currently I'm running on a much bigger data set, which has lots of different site_id, redirect_id, etc. values, so the "categoricals_mapping" is huge to pass to the predict call.

If I don't make them categoricals, the results seem incorrect.

can you give me a good solution?

Emmanuel Benazera
Jun 02 2017 19:47
Look 'machine learning categorical data' up.
DD is stateless but for the model, thus you need to pass the arguments when needed. You may try recreating the service after training and setting the categorical mappings only once at service creation.
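As a rough sketch of that suggestion, the Python below assembles a service-creation body that carries the mapping once, so later predict calls would not have to resend it. This is an untested assumption: placing "categoricals_mapping" under parameters.input at creation time follows the "you may try" hint above and is not confirmed here. The helper name build_service_body and the repository path are hypothetical, and only two of the mapping entries from the earlier request are shown.

```python
import json


def build_service_body(description, repository, categoricals_mapping):
    """Hypothetical helper: build a JSON body for re-creating the service
    after training, with the categorical mappings set once at creation.
    ASSUMPTION: the csv input connector accepts "categoricals_mapping"
    here; this is not verified against the DeepDetect server."""
    return {
        "mllib": "xgboost",
        "description": description,
        "type": "supervised",
        "parameters": {
            "input": {
                "connector": "csv",
                "categoricals_mapping": categoricals_mapping,
            },
            "mllib": {},  # model-specific parameters omitted in this sketch
        },
        "model": {"repository": repository},
    }


# Subset of the mapping returned by the train call (values from the chat).
mapping = {
    "general_os": {"1": 0, "2": 1, "3": 2},
    "general_device": {"0": 0, "1": 1},
}

# "/path/to/model" is a placeholder, not the user's real repository.
body = build_service_body("ypanel service", "/path/to/model", mapping)
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of the service-creation request; whether the server then reuses the mapping at predict time has to be checked against the running version.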