yassphi
@yassphi
But the last 10 need to be evaluated?
In my example, for AP:
T is the number of tracks to be predicted for the given session; this would be 10.
But P & L are not clear.
yassphi
@yassphi
@spMohanty Any details?
brianbrost
@brianbrost
Hi @newtiopon, just to let you know that we've added the mini version of the training set and track features again.
brianbrost
@brianbrost

@yassphi You are correct. For a session of length 20, you would need to evaluate your predictions on the last 10 tracks. T would be 10. P(i) is the precision at position i of your predictions, so if at position 5, you had 2 correct predictions, P(5) = 2/5. L(i) indicates if your i'th prediction was correct, so if your prediction at position 5 was correct, L(5) = 1, and otherwise, L(5) = 0. It may be helpful to look at the local evaluation file in the starter kit: https://github.com/crowdAI/skip-prediction-challenge-starter-kit/blob/master/local_evaluation.ipynb. See also https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision.

Hope this helps!

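The metric Brian describes, AP = (1/T) Σ P(i)·L(i) over the T predicted tracks, can be sketched as follows. This is an illustrative sketch only; the `local_evaluation.ipynb` notebook in the starter kit is the authoritative implementation.

```python
def average_precision(labels):
    """Average precision over an ordered list of 0/1 correctness labels.

    labels[i - 1] is L(i): 1 if the i'th predicted track was correct.
    P(i) is the precision over the first i predictions, and only
    positions with L(i) = 1 contribute to the sum.
    """
    T = len(labels)
    hits = 0
    total = 0.0
    for i, correct in enumerate(labels, start=1):
        hits += correct
        if correct:
            total += hits / i  # P(i) = correct predictions so far / i
    return total / T

# Matching the example above: a correct prediction at position 1 gives
# P(1) = 1/1, and 2 correct in the first 5 gives P(5) = 2/5 = 0.4.
ap = average_precision([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])  # -> 0.14
```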
yassphi
@yassphi
@brianbrost Thank you Brian, so if I understand, I will do a classification on all tracks but the evaluation is on the last 10, right?
brianbrost
@brianbrost
@yassphi No, for the test set the first half of the interactions in a session are already given with the labelled ground truth, which you can use to inform your predictions for the second half.
So you'll only be making predictions for the last 10 tracks in a session of length 20.
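The test-set setup described above (first half given with ground truth, second half to predict) can be sketched like this. The function name is hypothetical, not part of the starter kit:

```python
def split_session(tracks):
    """Split one test session into the given (labelled) first half and
    the second half to be predicted, per the setup described above."""
    half = len(tracks) // 2
    return tracks[:half], tracks[half:]

# For a session of length 20: 10 given tracks, 10 tracks to predict.
given, to_predict = split_session(list(range(20)))
```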
newtiopon
@newtiopon
Hello, I have an issue during downloading. The 56G training set does not seem to support resuming broken transfers. I tried downloading it twice and could only extract part of the compressed files. How can I resolve this? Looking for help.
SP Mohanty
@spMohanty
@newtiopon: we understand. We are considering alternative options to deliver the data, and will hopefully send out official communication about possible alternative approaches.
What's the bandwidth of your internet connection?
newtiopon
@newtiopon
100M/s, fyi
brianbrost
@brianbrost
@newtiopon in the meantime there are some suggestions on the discussion board (https://www.crowdai.org/topics/56g-is-the-training-set/discussion) which might help.
newtiopon
@newtiopon
Thanks Brian. I have a question about the relation between the train_set file name and the "date" field. If I understand correctly, the date field refers to the date the session happened. But something confusing is that the file 'log_0_20180726_000000000000.csv' contains sessions whose date field varies from 2012 to 2018. So my questions are: 1) what does the datestamp in the file name refer to? 2) if the datestamp in the file name means something, how are sessions from different dates assigned to this file? Looking for further clarification.
brianbrost
@brianbrost
@newtiopon, there are some occasional logging errors that can occur for a number of reasons, causing a mismatch between the date in the file name and the date when the session occurred according to the field. These mismatches should be relatively rare, and it's up to you exactly how you handle them.
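One way to find those rare mismatches is to compare each row's date field against the datestamp embedded in the file name. A minimal sketch, assuming file names like `log_0_20180726_000000000000.csv` and a `date` column formatted `YYYY-MM-DD`; adjust to the actual schema if it differs:

```python
import csv
import os
import re

def mismatched_rows(csv_path):
    """Yield rows whose 'date' field disagrees with the 8-digit
    datestamp in the log file name (e.g. '..._20180726_...')."""
    stamp = re.search(r"_(\d{8})_", os.path.basename(csv_path)).group(1)
    file_date = f"{stamp[:4]}-{stamp[4:6]}-{stamp[6:]}"  # YYYY-MM-DD
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["date"] != file_date:
                yield row
```

Whether you drop such rows, keep them, or reassign them is up to you, as noted above.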
hjh1011
@hjh1011
Are we able to extract artists and album info through the web api for all 3M tracks?
@brianbrost
brianbrost
@brianbrost
Hi @hjh1011, this would have been very interesting; unfortunately it is no longer possible, since we were required to remove the link between our internal track IDs for the dataset and the actual track IDs.
hjh1011
@hjh1011
@brianbrost I just realized that the data was updated on 20 Nov and that artists, albums, and some other information were removed from the track features. I just wonder: people who downloaded the data before that would still have those columns, right? If that is the case, how is this a fair contest?
brianbrost
@brianbrost
@hjh1011 As noted on the discussion board, participants will be required to open source their code, so we will be able to see if participants have used those features.
RStudent
@rstudent_gitlab
@brianbrost, @spMohanty, anyone else: please, I could use some help here. Very late entrant to the contest, but I still want to give it a sincere try. I'm having trouble downloading the training set file, even with the splits. I keep getting the "gzip: stdin: invalid compressed data--format violated" error when doing tar -xzf. If anyone else had similar issues and resolved them, I could use some help. Please note that I am able to successfully download and extract the test set. I am on Ubuntu 18.04. Thanks a lot in anticipation.
SP Mohanty
@spMohanty
@brianbrost: Can you upload md5sums of all the files, so that participants can at least verify that the file they downloaded was not corrupted during transmission?
RStudent
@rstudent_gitlab
@spMohanty: Thanks a lot for the response. For reference, this is what I get for the file throwing an error:
~/workspace/music/train$ md5sum 20181113_training_set.tar.gz
1dd55ea738937c5a3c23f6d18eb9804c  20181113_training_set.tar.gz
and the size, as expected, is 56G.
SP Mohanty
@spMohanty
Well, for me the md5sum is this: f0b818a7cffd355d6ddeb368d2b244c0  20181113_training_set.tar.gz. So something definitely seems off. Are you sure the download completed cleanly?
brianbrost
@brianbrost
Hi @rstudent_gitlab, I'll be home in about 10 minutes and I'll verify the md5sum, but have you tried downloading the split version of the training set? See the Training_Set_Split_Download.txt file in the Dataset tab. It contains links to download the training set split into 10 files, making it easier to download.
brianbrost
@brianbrost
Ah sorry, just saw that you tried the split version as well. @rstudent_gitlab, do you get the exact same error message for the split version of the training set?
brianbrost
@brianbrost
For what it's worth, here are the md5sums I get:
f0b818a7cffd355d6ddeb368d2b244c0  20181113_training_set.tar.gz
I will list the ones for the split version of the training set when I finish re-downloading it.
brianbrost
@brianbrost
9bef4b0ed6ec4754c91d43fa0058213c  training_set_0.tar.gz
cb1a443f9613f11388c1c1aac703f7f6  training_set_1.tar.gz
c0508e75ea300fd0e04b385d83a4ff04  training_set_2.tar.gz
brianbrost
@brianbrost
66773b8a1f6d7a3034414afa223fe617  training_set_3.tar.gz
99a88fa87ffadc40d1777d002e830805  training_set_4.tar.gz
brianbrost
@brianbrost
a7193e27165ab849fb8e70156d9aa265  training_set_5.tar.gz
brianbrost
@brianbrost
65d3b5731f1f735ccb8f7de1128c3354  training_set_6.tar.gz
b2e6e6c0989b9995672cc92219ac4bd8  training_set_7.tar.gz
1d716c77bcc64ca197372a89c7963d3d  training_set_8.tar.gz
58f1e0b1e3d2c91edef3903199f15e9a  training_set_9.tar.gz
@rstudent_gitlab please let me know if your checksums are different, and whether you can extract any of the split files?
RStudent
@rstudent_gitlab
I know I am being a bother. Could someone else who has downloaded the file successfully please confirm the md5sum? At least I will know the corruption error is valid.
brianbrost
@brianbrost
I just downloaded those from the competition website, so the md5sums are the ones I would expect anyone else to get too. Are some of your checksums different, or are all of them different? @rstudent_gitlab
RStudent
@rstudent_gitlab
Thank you so much for helping me out on this, @brianbrost, @spMohanty. Well, my md5sums are different for both the main file and the splits. For example:
~/workspace/music/train$ md5sum training_set_1.tar.gz
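For anyone verifying the downloads, the checksum comparison above can be done without loading the 56G archive into memory by streaming it through MD5 in chunks. A minimal sketch (the file names and expected digests come from the messages above):

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 in 1 MiB chunks and return the hex
    digest, so even a 56G archive never has to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. md5_of("20181113_training_set.tar.gz") should equal
# "f0b818a7cffd355d6ddeb368d2b244c0" per the thread above.
```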