##### Activity
newtiopon
@newtiopon
Hi there. Could someone please tell me whether the mini samples were removed online and, if so, why?
SP Mohanty
@spMohanty
@newtiopon : Yes, they were. They will be added back soon, most likely later today.
@brianbrost will follow up with more details. :)
yassphi
@yassphi
Hello,
I have a question about the AP calculation.
When you say that we have to predict the second half of a session,
does that mean that, for example, in a session of 20 tracks,
the first 10 tracks are not evaluated
but the last 10 need to be evaluated?
In my example, for AP:
T is the number of tracks to be predicted for the given session, so this would be 10.
But P and L are not clear.
yassphi
@yassphi
@spMohanty Any details?
brianbrost
@brianbrost
Hi @newtiopon, just to let you know that we've added the mini version of the training set and track features again.
brianbrost
@brianbrost

@yassphi You are correct. For a session of length 20, you would need to evaluate your predictions on the last 10 tracks. T would be 10. P(i) is the precision at position i of your predictions, so if at position 5, you had 2 correct predictions, P(5) = 2/5. L(i) indicates if your i'th prediction was correct, so if your prediction at position 5 was correct, L(5) = 1, and otherwise, L(5) = 0. It may be helpful to look at the local evaluation file in the starter kit: https://github.com/crowdAI/skip-prediction-challenge-starter-kit/blob/master/local_evaluation.ipynb. See also https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision.

Hope this helps!
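
A minimal sketch of the calculation described above, assuming AP is the sum of P(i)·L(i) over the T predicted positions, divided by T. That normalisation is an assumption here, and the local evaluation notebook in the starter kit remains the authoritative version. The function name and the 0/1 labels are hypothetical.

```python
def average_precision(predictions, ground_truth):
    """AP for one session's second half, under the assumptions stated above."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    T = len(ground_truth)  # number of tracks to be predicted
    if T == 0:
        return 0.0
    correct_so_far = 0
    total = 0.0
    for i, (pred, truth) in enumerate(zip(predictions, ground_truth), start=1):
        L_i = 1 if pred == truth else 0  # L(i): was the i'th prediction correct?
        correct_so_far += L_i
        P_i = correct_so_far / i         # P(i): precision at position i
        total += P_i * L_i
    return total / T                     # assumed normalisation by T


# Five predictions where only positions 1 and 5 are correct,
# so P(5) = 2/5 and L(5) = 1, matching the example in the message above.
example_ap = average_precision([1, 0, 0, 0, 1], [1, 1, 1, 1, 1])
```

Only correct positions contribute to the sum, so early mistakes lower P(i) for every later correct prediction.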

yassphi
@yassphi
@brianbrost Thank you, Brian. So if I understand correctly, I will do a classification on all tracks, but the evaluation is only on the last 10, right?
brianbrost
@brianbrost
@yassphi No, for the test set the first half of the interactions in a session
is already given with the labelled ground truth, which you can use to inform your predictions for the second half.
So you'll only be making predictions for the last 10 tracks in a session of length 20.
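A small sketch of that split, assuming a session is simply an ordered list of tracks. How sessions of odd length are divided is not stated here, so the sketch assumes the extra track falls into the second half; the function name is hypothetical.

```python
def split_session(tracks):
    """Split a session into (observed first half, second half to predict)."""
    half = len(tracks) // 2  # assumption: odd lengths put the extra track in the second half
    return tracks[:half], tracks[half:]


# A session of length 20: positions 1-10 are given, positions 11-20 are predicted.
session = list(range(1, 21))
observed, to_predict = split_session(session)
```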
newtiopon
@newtiopon
Hello, I have an issue with downloading. The 56 GB training set does not seem to support resuming an interrupted download. I tried downloading it twice, and both times I could only extract part of the compressed files. How can I resolve this? Looking for help.
SP Mohanty
@spMohanty
@newtiopon : We understand. We are considering alternative options for delivering the data, and will hopefully send out an official communication about the possible approaches.
What's the bandwidth of your internet connection?
newtiopon
@newtiopon
100M/s, fyi
brianbrost
@brianbrost
@newtiopon in the meantime there are some suggestions on the discussion board (https://www.crowdai.org/topics/56g-is-the-training-set/discussion) which might help.
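Independently of the suggestions linked above, one common way to continue an interrupted download is an HTTP `Range` request. The sketch below uses only the Python standard library and only helps if the server honours `Range`, which the report above suggests it may not. The function names, the URL and the path are hypothetical.

```python
import os
import urllib.request


def resume_headers(path):
    """Return a Range header asking only for the bytes not yet on disk."""
    have = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def download_with_resume(url, path, chunk_size=1 << 20):
    """Download url to path, appending to a partial file when the server allows it."""
    request = urllib.request.Request(url, headers=resume_headers(path))
    # Note: if the file is already complete, a server may answer 416,
    # which urlopen raises as urllib.error.HTTPError.
    with urllib.request.urlopen(request) as response:
        # 206 means the Range header was honoured; any other success status
        # means the server is sending the whole file again, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `download_with_resume(url, path)` again after a failure picks up from the current file size instead of restarting, provided the server replies with status 206.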
newtiopon
@newtiopon
Thanks, Brian. I have a question about the relation between the training set file names and the "date" field. If I understand correctly, the date field refers to the date on which the session happened. What confuses me is that the file 'log_0_20180726_000000000000.csv' contains sessions whose date field varies from 2012 to 2018. So my questions are: 1) What does the date stamp in the file name refer to? 2) If the date stamp in the file name means something, how are sessions from different dates assigned to this file? Looking for further clarification.
brianbrost
@brianbrost
@newtiopon, there are some occasional logging errors that can occur for a number of reasons, causing a mismatch between the date in the file name and the date when the session occurred according to the field. These mismatches should be relatively rare, and it's up to you exactly how you handle them.
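One possible way to spot such mismatches is to compare the date stamp in the file name with each row's date field. This is a sketch under two assumptions: the file name follows the pattern of the example quoted above, and the date field holds an ISO `YYYY-MM-DD` string. The function names are hypothetical, and rows are assumed to be dicts such as those produced by `csv.DictReader`.

```python
import re
from datetime import date, datetime

# Pattern inferred from the single example 'log_0_20180726_000000000000.csv'.
FILENAME_DATE = re.compile(r"^log_\d+_(\d{8})_\d+\.csv$")


def date_from_filename(filename):
    """Extract the YYYYMMDD stamp from a log file name as a date."""
    match = FILENAME_DATE.match(filename.strip())
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def mismatched_rows(filename, rows, date_field="date"):
    """Return the rows whose date field differs from the file name's date."""
    file_date = date_from_filename(filename)
    return [
        row for row in rows
        if date.fromisoformat(row[date_field]) != file_date  # assumes ISO dates
    ]


rows = [{"date": "2018-07-26"}, {"date": "2012-03-01"}]
suspect = mismatched_rows("log_0_20180726_000000000000.csv", rows)
```

Whether to drop, keep, or re-date the flagged rows is left open, as the reply above says.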
hjh1011
@hjh1011
Are we able to extract artist and album info through the web API for all 3M tracks?
@brianbrost
brianbrost
@brianbrost
Hi @hjh1011, this would have been very interesting. Unfortunately, it is no longer possible, since we were required to remove the link between our internal track IDs for the dataset and the actual track IDs.