These are chat archives for beniz/deepdetect

10th
Nov 2017
rperdon
@rperdon
Nov 10 2017 19:58
{"status":{"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"src/caffe/util/io.cpp:48 / Check failed (custom): (fd) != (-1)"}}c
I wanted to train an alexnet model
so I changed the template to alexnet and did a 227x227 resize
got the error above;
I cleared the training folder and reran the server
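That io.cpp check failed (fd) != (-1) fires when Caffe cannot open a file it expects, so a quick sanity check (a minimal sketch; paths assumed from the calls below, run from the server's working directory inside the container) is to confirm the data and template paths are readable:

ls -l /source
ls -l ../templates/caffe/alexnet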
curl -X PUT "http://localhost:9999/services/ddanimemodel" -d '{
  "mllib":"caffe",
  "description":"anime classifier",
  "type":"supervised",
  "parameters":{
    "input":{
      "connector":"image",
      "width":227,
      "height":227
    },
    "mllib":{
      "template":"alexnet",
      "nclasses":2
    }
  },
  "model":{
    "templates":"../templates/caffe/",
    "repository":"/source"
  }
}'
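Before training, the service definition can be read back as a sanity check (a minimal sketch using the GET /services route with the service name from the call above):

curl -X GET "http://localhost:9999/services/ddanimemodel"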
curl -X POST "http://localhost:9999/train" -d '{
  "service":"ddanimemodel",
  "async":false,
  "parameters":{
    "mllib":{
      "gpu":true,
      "net":{
        "batch_size":32
      },
      "solver":{
        "test_interval":500,
        "iterations":30000,
        "base_lr":0.025
      }
    },
    "input":{
      "connector":"image",
      "test_split":0.1,
      "shuffle":true,
      "width":227,
      "height":227
    },
    "output":{
      "measure":["acc","mcll","f1"]
    }
  },
  "data":["/source"]
}'
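With "async":false this call blocks until training finishes; a hedged alternative is to set "async":true and poll the running job for status and measures (job id 1 assumed for the first job on the service):

curl -X GET "http://localhost:9999/train?service=ddanimemodel&job=1"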
rperdon
@rperdon
Nov 10 2017 20:16
I also noticed something weird on the first attempt of a training run:
{"status":{"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"src/caffe/syncedmem.cpp:56 / Check failed (custom): (error) == (cudaSuccess)"}}
If I rerun it, training starts up.
resnet_50 seems to be working, but I'm not sure about alexnet.
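That syncedmem.cpp check failed (error) == (cudaSuccess) means a CUDA call failed, often under GPU memory pressure; a quick way to watch for that during a run (assuming nvidia-smi is available on the host) is:

watch -n 1 nvidia-smi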
rperdon
@rperdon
Nov 10 2017 21:09
I received a random {"status":{"code":500,"msg":"InternalError","dd_code":1007,"dd_msg":"src/caffe/syncedmem.cpp:56 / Check failed (custom): (error) == (cudaSuccess)"}}
dd_code 1007 indicates an internal ML library error.
I can restart resnet, but it'll happen again randomly

INFO - 21:11:55 - Iteration 480, lr = 0.025
[21:11:56] /opt/deepdetect/src/caffelib.cc:812: smoothed_loss=0.189263

INFO - 21:12:11 - Ignoring source layer prob
INFO - 21:12:11 - Opened lmdb /source/test.lmdb
[21:12:11] /opt/deepdetect/src/caffelib.cc:1016: Error while proceeding with test forward pass

ERROR - 21:12:11 - service ddanimemodel training call failed

ERROR - 21:12:11 - Fri Nov 10 21:12:11 2017 UTC - 172.17.0.1 "POST /train" 500 383238

Of the 3 templates I have tested, only googlenet has run to completion, but I am not satisfied with its 49% accuracy rate.
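If the test forward pass is failing on GPU memory, one thing to try (a sketch; assuming the net object in the train call also accepts test_batch_size, per the API docs) is shrinking both batch sizes:

"net":{
  "batch_size":16,
  "test_batch_size":16
}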
rperdon
@rperdon
Nov 10 2017 21:21
Testing a 4th template, nin; so far it's working OK. I followed the API description for templates to select the correct image resize values.
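For reference, a nin service creation sketch (assuming nin takes 224x224 input per the templates table in the API docs; all other fields copied from the alexnet call above):

curl -X PUT "http://localhost:9999/services/ddanimemodel" -d '{
  "mllib":"caffe",
  "description":"anime classifier",
  "type":"supervised",
  "parameters":{
    "input":{"connector":"image","width":224,"height":224},
    "mllib":{"template":"nin","nclasses":2}
  },
  "model":{"templates":"../templates/caffe/","repository":"/source"}
}'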