These are chat archives for beniz/deepdetect

24th
Jun 2016
Emmanuel Benazera
@beniz
Jun 24 2016 14:37
@Isaacpm Thanks, I believe this is a good catch, there's a bug in the cmfull matrix generation
Emmanuel Benazera
@beniz
Jun 24 2016 14:48
fixed
Isaacpm
@Isaacpm
Jun 24 2016 15:46
Hi!
oh, ok, so we need to pull the code again to get the fix?
;-(
:worried:
it seems I'm going to have to repeat a few hundred runs....
Emmanuel Benazera
@beniz
Jun 24 2016 16:43
the zeros are not affected, and the other numbers roughly keep their ratios if your dataset is not too imbalanced
you can still see where your misclassifications go en masse
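As an aside, here is a minimal Python sketch of row-normalizing a confusion matrix to see where each class's misclassifications concentrate; the labels and counts are made-up placeholders, not DeepDetect's cmfull output format:

# Row-normalize a confusion matrix so each row sums to 1,
# which makes the relative flow of misclassifications visible
# even if absolute counts are slightly off.
labels = ["cat", "dog", "bird"]
cm = [
    [50, 8, 2],   # true "cat"
    [5, 40, 5],   # true "dog"
    [1, 3, 46],   # true "bird"
]

for label, row in zip(labels, cm):
    total = sum(row) or 1
    ratios = [count / total for count in row]
    print(label, ["%.2f" % r for r in ratios])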
danielgollas
@danielgollas
Jun 24 2016 19:17

Hey @beniz ! Quick question... I'm doing GoogLeNet fine-tuning and it's working like a champ, but I'm having trouble knowing whether the training ended successfully, since the last call to get the training status sends back an error (showing the last two status requests below):

[2016-06-24 03:17:07,827: DEBUG/Worker-2] {
    "status": {
        "msg": "OK", 
        "code": 200
    }, 
    "body": {
        "measure_hist": {}, 
        "measure": {
            "train_loss": 6.30756950378418, 
            "iteration": 9.0
        }
    }, 
    "head": {
        "status": "running", 
        "job": 1, 
        "method": "/train", 
        "time": 960.0
    }
}
[2016-06-24 03:17:12,909: DEBUG/Worker-2] {
    "status": {
        "msg": "OK", 
        "code": 200
    }, 
    "body": {}, 
    "head": {
        "status": "error", 
        "job": 1, 
        "method": "/train"
    }
}

Neither the .caffemodel nor the .solverstate files get written, so I imagine the training did not in fact complete (but only because of some problem with the snapshotting). Here is the tail of the dede log that corresponds to those last moments (I've configured it to snapshot every 10 iterations):

INFO - 20:01:31 - Network initialization done.
I0623 20:03:07.721875  6015 caffelib.cc:1556] smoothed_loss=13.5272
INFO - 20:01:31 - Solver scaffolding done.
INFO - 20:17:07 - Snapshotting to binary proto file 46__iter_10.caffemodel
INFO - 20:17:08 - Snapshotting solver state to binary proto file 46__iter_10.solverstate
ERROR - 20:17:12 - service srv_46 training status call failed

ERROR - 20:17:12 - {"code":500,"msg":"InternalError"}

Any ideas on what might be going on or how I could make the logs more detailed?
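For reference, a minimal Python polling sketch against DeepDetect's GET /train status endpoint, assuming a server at http://localhost:8080, a placeholder service name and job id, and that any head.status other than "running" (e.g. "finished" or "error") means the job has stopped:

# Poll a DeepDetect training job until it stops reporting "running".
import time
import requests

BASE = "http://localhost:8080"
SERVICE = "myserv"   # placeholder: letters only, no leading digit or underscore
JOB = 1

while True:
    r = requests.get(BASE + "/train", params={"service": SERVICE, "job": JOB})
    resp = r.json()
    status = resp.get("head", {}).get("status")
    print(status, resp.get("body", {}).get("measure"))
    if status != "running":
        # On error the body comes back empty, as in the log above,
        # so the dede server log is where the details will be.
        break
    time.sleep(10)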

Emmanuel Benazera
@beniz
Jun 24 2016 20:36
@danielgollas try removing the _ in your service name, there may be an issue with it.
danielgollas
@danielgollas
Jun 24 2016 20:42
Thanks @beniz, it does seem like the problem was related to the snapshot_prefix setting.
I'll do some more testing, but I think it has to do with the prefix starting with a number instead of a letter; the same thing happened when I used numbers as service names (/info and /train raised BadRequest errors)
Emmanuel Benazera
@beniz
Jun 24 2016 20:55
I believe this has been reported before, or at least mentioned. Opening an issue would be useful, since numbering services appears to be both common and useful.
danielgollas
@danielgollas
Jun 24 2016 21:41
Ok, I'll open an issue. Upon further testing I found that using prefixes that start with numbers did not break things, provided an absolute path was given.
(which I guess technically means that the prefix does not start with a number)
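For reference, a sketch of a /train call that passes an absolute snapshot_prefix through the solver block, following the workaround above; the service name, paths and solver values are placeholders, and snapshot_prefix support should be checked against the API docs for your DeepDetect version:

# Launch an async training job with an absolute snapshot_prefix,
# so the effective prefix does not start with a digit.
import requests

train_request = {
    "service": "imageserv",   # placeholder name: letters only
    "async": True,
    "parameters": {
        "mllib": {
            "gpu": True,
            "solver": {
                "iterations": 1000,
                "snapshot": 10,
                "snapshot_prefix": "/path/to/models/imageserv/run46"
            }
        }
    },
    "data": ["/path/to/train.txt"]
}

r = requests.post("http://localhost:8080/train", json=train_request)
print(r.json())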
danielgollas
@danielgollas
Jun 24 2016 21:49
issue opened: beniz/deepdetect#152
Emmanuel Benazera
@beniz
Jun 24 2016 22:03
Thanks
danielgollas
@danielgollas
Jun 24 2016 22:30
No, thank you! Really! You make coffee bearable to work with!
*caffe