These are chat archives for beniz/deepdetect

6th
Jun 2017
Emmanuel Benazera
@beniz
Jun 06 2017 05:59
Sounds like a feature engineering problem, and we can't really help you here. That being said, categorical variables are one-hot encoded in DD, and this can lead to a (too) high number of variables. You can try to preprocess some fields with a simpler scheme, e.g. mapping discrete values to integers. Some more complex schemes exist, from embeddings to grouping based on your underlying application.
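To illustrate the difference in the number of variables, here is a minimal sketch in plain Python. It is not DeepDetect's internal encoder; the `country` field and the helper names `one_hot` and `integer_map` are hypothetical and only there for the example.

```python
# Illustrative only: plain Python, not DeepDetect's internal encoder.
def one_hot(values):
    """Give each distinct value its own 0/1 column."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

def integer_map(values):
    """Replace each distinct value with a single integer code."""
    codes = {c: i for i, c in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

# Hypothetical 'country' field with three distinct values.
country = ["fr", "us", "fr", "de"]
one_hot_rows, categories = one_hot(country)
int_codes, codes = integer_map(country)

print(len(one_hot_rows[0]))  # number of columns grows with distinct values
print(int_codes)             # always a single column
```

Keep in mind that an integer mapping imposes an arbitrary order on the values, which may or may not suit the model.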
Tunlrcom
@tunlrcom_twitter
Jun 06 2017 13:29
thanks for your help
Emmanuel Benazera
@beniz
Jun 06 2017 13:29
did it help?
Tunlrcom
@tunlrcom_twitter
Jun 06 2017 18:29
Hmm... the fields we had put in the categoricals list are already integers, so we took them out of that list. We also adjusted another column, 'Visitors', by setting it to 1 in the training data, just like in the predict file, where all Visitors are set to 1. Now we are happy with the prediction results.
I think we can just get rid of the Visitors column since it's always being set to 1.
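A rough sketch of that cleanup, assuming the data is a CSV file: drop every column whose value never changes before training. The sample rows and the `Day` and `Clicks` columns are made up for illustration; only `Visitors` comes from our data.

```python
import csv
import io

def drop_constant_columns(rows):
    """Return the rows without columns whose value is identical in every row."""
    if not rows:
        return rows
    constant = {k for k in rows[0] if len({r[k] for r in rows}) == 1}
    return [{k: v for k, v in r.items() if k not in constant} for r in rows]

# Hypothetical training data; 'Visitors' is always 1, as in the discussion.
sample = "Day,Visitors,Clicks\n1,1,10\n2,1,12\n3,1,9\n"
rows = list(csv.DictReader(io.StringIO(sample)))
cleaned = drop_constant_columns(rows)

print(list(cleaned[0].keys()))  # 'Visitors' is no longer present
```

The same columns would have to be removed from the predict file so that both files keep the same layout.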