yassphi
@yassphi
Hello,
I have a question about the AP calculation.
When you say that we have to predict the second half of a session,
does that mean that, for example, in a session of 20 tracks,
the first ten tracks are not evaluated,
but the last 10 are?
In my example, for AP:
T, the number of tracks to be predicted for the given session, would be 10.
But P & L are not clear to me.
yassphi
@yassphi
@spMohanty Any details ?
brianbrost
@brianbrost
Hi @newtiopon, just to let you know that we've added the mini version of the training set and track features again.
brianbrost
@brianbrost

@yassphi You are correct. For a session of length 20, you would need to evaluate your predictions on the last 10 tracks. T would be 10. P(i) is the precision at position i of your predictions, so if at position 5, you had 2 correct predictions, P(5) = 2/5. L(i) indicates if your i'th prediction was correct, so if your prediction at position 5 was correct, L(5) = 1, and otherwise, L(5) = 0. It may be helpful to look at the local evaluation file in the starter kit: https://github.com/crowdAI/skip-prediction-challenge-starter-kit/blob/master/local_evaluation.ipynb. See also https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision.

Hope this helps!
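Brian's description of the metric can be sketched in a few lines of Python. This is a minimal illustration of AA = (1/T) · Σᵢ L(i)·P(i) as he defines it above, not the challenge's official evaluation code (that lives in the starter kit's `local_evaluation.ipynb`):

```python
def average_accuracy(preds, truth):
    """Average accuracy for one session's second half, as described above:
    AA = (1/T) * sum over i of L(i) * P(i), where L(i) is 1 if the i-th
    prediction is correct and P(i) is the precision at position i."""
    T = len(truth)
    correct = 0      # running count of correct predictions so far
    total = 0.0
    for i, (p, t) in enumerate(zip(preds, truth), start=1):
        hit = 1 if p == t else 0      # L(i)
        correct += hit
        total += hit * correct / i    # L(i) * P(i)
    return total / T
```

For example, with Brian's numbers: if 2 of your first 5 predictions are correct, the loop's `correct / i` term at `i = 5` is `2/5`, matching P(5) = 2/5.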

yassphi
@yassphi
@brianbrost Thank you Brian. So if I understand correctly, I will do classification on all tracks, but the evaluation is on the last 10, right?
brianbrost
@brianbrost
@yassphi No. For the test set, the first half of the interactions in a session are already given with the labelled ground truth, which you can use to inform your predictions for the second half.
So you'll only be making predictions for the last 10 tracks in a session of length 20.
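As a concrete illustration of the setup Brian describes (assuming a session is simply an ordered list of interaction records), the split looks like this:

```python
def split_session(session):
    """Split a session into the given first half (with ground truth)
    and the second half whose skips must be predicted."""
    half = len(session) // 2
    return session[:half], session[half:]

# A session of length 20: positions 0-9 are given, 10-19 are predicted.
given, to_predict = split_session(list(range(20)))
```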
newtiopon
@newtiopon
hello, I have an issue while downloading. The host of the 56G training set doesn't seem to support resuming interrupted transfers. I tried downloading it twice, and each time I could only extract part of the compressed files. How can I resolve this? Looking for help.
SP Mohanty
@spMohanty
@newtiopon: we understand. We are considering alternate options for delivering the data, and will hopefully send out official communication about possible alternate approaches.
What's the bandwidth of your internet connection?
newtiopon
@newtiopon
100M/s, fyi
brianbrost
@brianbrost
@newtiopon in the meantime there are some suggestions on the discussion board (https://www.crowdai.org/topics/56g-is-the-training-set/discussion) which might help.
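One common workaround for interrupted large downloads is to resume with an HTTP Range request. The original host reportedly does not support resuming, so this sketch assumes you have a mirror or alternate host (hypothetical `url`) that honours `Range`:

```python
import os
import urllib.request

def range_header(offset):
    """HTTP header asking the server to send bytes from `offset` onward."""
    return {"Range": f"bytes={offset}-"}

def resume_download(url, dest, chunk=1 << 20):
    """Append to a partially downloaded file using an HTTP Range request.
    Only works if the server honours Range; otherwise it restarts from 0."""
    start = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(start))
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while True:
            block = resp.read(chunk)
            if not block:
                break
            out.write(block)
```

Command-line tools like `wget -c` or `curl -C -` do the same thing with less code, and are what the discussion-board suggestions point to.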
newtiopon
@newtiopon
Thanks Brian. I have a question about the relation between the train_set file names and the "date" field. If I understand correctly, the date field refers to the date the session happened. But what's confusing is that the file 'log_0_20180726_000000000000.csv' contains sessions whose date field varies from 2012 to 2018. So my questions are: 1) what does the datestamp in the file name refer to? 2) if the datestamp in the file name means something, how are sessions from different dates assigned to that file? Looking for further clarification.
brianbrost
@brianbrost
@newtiopon, there are some occasional logging errors that can occur for a number of reasons, causing a mismatch between the date in the file name and the date when the session occurred according to the field. These mismatches should be relatively rare, and it's up to you exactly how you handle them.
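Since it's up to participants how to handle these mismatches, one option is simply to flag them while loading. A small sketch, assuming the file-name pattern from the question above and that the session's "date" field is formatted `YYYY-MM-DD` (an assumption; check your own files):

```python
import re

def filename_date(name):
    """Extract the YYYYMMDD stamp from a log file name like
    'log_0_20180726_000000000000.csv'. Returns None if absent."""
    m = re.search(r"_(\d{8})_", name)
    return m.group(1) if m else None

def mismatched(name, session_date):
    """True if the session's date field (assumed 'YYYY-MM-DD')
    disagrees with the datestamp in the file name."""
    return session_date.replace("-", "") != filename_date(name)
```

You could then drop mismatched sessions, or keep them; Brian notes they should be relatively rare either way.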
hjh1011
@hjh1011
Are we able to extract artists and album info through the web api for all 3M tracks?
@brianbrost
brianbrost
@brianbrost
Hi @hjh1011, this would have been very interesting; unfortunately it is no longer possible, since we were required to remove the link between our internal track IDs for the dataset and the actual track IDs.
hjh1011
@hjh1011
@brianbrost I just realized that the data was updated on 20 Nov, and that artist, album, and some other information were removed from the track features. I just wonder: people who downloaded the data before then would still have those columns, right? If that is the case, how is this a fair contest?
brianbrost
@brianbrost
@hjh1011 As noted on the discussion board, participants will be required to open source their code, so we will be able to see if participants have used those features.
RStudent
@rstudent_gitlab
@brianbrost, @spMohanty, or anyone else, please, I could use some help here. I'm a very late entrant to the contest, but still want to give it a sincere try. I'm having trouble downloading the training set file, even with the splits. I keep getting the "gzip: stdin: invalid compressed data--format violated" error when running tar -xzf. If anyone else had similar issues and resolved them, I could use some help. Please note that I am able to successfully download and extract the test set. I am on Ubuntu 18.04. Thanks a lot in anticipation.
SP Mohanty
@spMohanty
@brianbrost: Can you upload md5sums of all the files, so that participants can at least verify that the files they downloaded were not corrupted in any way during transmission?
RStudent
@rstudent_gitlab
@spMohanty: Thanks a lot for the response. For reference, this is what I get for the file throwing the error: ~/workspace/music/train$ md5sum 20181113_training_set.tar.gz
1dd55ea738937c5a3c23f6d18eb9804c 20181113_training_set.tar.gz
and the size is 56G, as expected
SP Mohanty
@spMohanty
Well, for me the md5sum is this:
f0b818a7cffd355d6ddeb368d2b244c0  20181113_training_set.tar.gz
so something definitely seems off.
Are you sure the download completed cleanly?
brianbrost
@brianbrost
Hi @rstudent_gitlab, I'll be home in about 10 minutes and I'll verify the md5sum, but have you tried downloading the split version of the training set? See the Training_Set_Split_Download.txt file in the Dataset tab. It contains links to download the training set split into 10 files, making it easier to download.
brianbrost
@brianbrost
Ah sorry, I just saw that you tried the split version as well.
@rstudent_gitlab, do you get the exact same error message for the split version of the training set?
brianbrost
@brianbrost
For what it's worth, here are the md5sums I get:
f0b818a7cffd355d6ddeb368d2b244c0 20181113_training_set.tar.gz
will list the ones for the split version of the training set when I finish re-downloading it
brianbrost
@brianbrost
9bef4b0ed6ec4754c91d43fa0058213c training_set_0.tar.gz
cb1a443f9613f11388c1c1aac703f7f6 training_set_1.tar.gz
c0508e75ea300fd0e04b385d83a4ff04 training_set_2.tar.gz
brianbrost
@brianbrost
66773b8a1f6d7a3034414afa223fe617 training_set_3.tar.gz
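The sums posted above can be checked locally with `md5sum`, or with a short script like this one (the streaming read keeps the 56G archive from being loaded into memory; the `EXPECTED` table just transcribes two of the sums listed in this thread):

```python
import hashlib

def file_md5(path, chunk=1 << 20):
    """Stream-hash a file in 1 MiB chunks so very large archives
    can be verified in constant memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

# Expected checksums as posted in this thread.
EXPECTED = {
    "20181113_training_set.tar.gz": "f0b818a7cffd355d6ddeb368d2b244c0",
    "training_set_0.tar.gz": "9bef4b0ed6ec4754c91d43fa0058213c",
}

def verify(path, name):
    """Compare a downloaded file against its posted checksum."""
    return file_md5(path) == EXPECTED[name]
```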