    newtiopon
    @newtiopon
    Hi there. Could someone please tell me if the mini samples were removed online, and why?
    SP Mohanty
    @spMohanty
    @newtiopon : Yes they were. They will be added back soon, most likely later today.
    @brianbrost will follow up with more details. :)
    yassphi
    @yassphi
    Hello,
    I have some questions about the AP calculation.
    When you say that we have to predict the second half of a session,
    does that mean that, for example, in a session of 20 tracks,
    the first ten tracks are not evaluated,
    but the last 10 need to be evaluated?
    In my example, for AP:
    T, the number of tracks to be predicted for the given session, would be 10,
    but P and L are not clear to me.
    yassphi
    @yassphi
    @spMohanty Any details?
    brianbrost
    @brianbrost
    Hi @newtiopon, just to let you know that we've added the mini version of the training set and track features again.
    brianbrost
    @brianbrost

    @yassphi You are correct. For a session of length 20, you would need to evaluate your predictions on the last 10 tracks. T would be 10. P(i) is the precision at position i of your predictions, so if at position 5, you had 2 correct predictions, P(5) = 2/5. L(i) indicates if your i'th prediction was correct, so if your prediction at position 5 was correct, L(5) = 1, and otherwise, L(5) = 0. It may be helpful to look at the local evaluation file in the starter kit: https://github.com/crowdAI/skip-prediction-challenge-starter-kit/blob/master/local_evaluation.ipynb. See also https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision.

    Hope this helps!
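
As a minimal sketch of the per-session metric described above (assuming the T, P(i), and L(i) definitions Brian gives; the function name and toy data are placeholders, and the local_evaluation.ipynb notebook in the starter kit remains the authoritative implementation):

    def average_precision(predictions, ground_truth):
        # predictions  : skip predictions for the second half of a session
        # ground_truth : true skip labels for those same tracks (length T)
        T = len(ground_truth)
        correct = 0
        total = 0.0
        for i, (pred, truth) in enumerate(zip(predictions, ground_truth), start=1):
            L_i = 1 if pred == truth else 0   # L(i): was the i-th prediction correct?
            correct += L_i
            P_i = correct / i                 # P(i): precision at position i
            total += P_i * L_i
        return total / T

    # e.g. a session of length 20: only the predictions for the last 10 tracks are scored
    print(average_precision([1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
                            [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]))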

    yassphi
    @yassphi
    @brianbrost Thank you Brian. So if I understand correctly, I will do a classification on all tracks, but the evaluation is only on the last 10, right?
    brianbrost
    @brianbrost
    @yassphi No, for the test set, the first half of the interactions in a session
    are already given with the labelled ground truth, which you can use to inform your predictions for the second half.
    So you'll only be making predictions for the last 10 tracks in a session of length 20.
    newtiopon
    @newtiopon
    Hello, I have an issue with downloading. The 56G training set does not seem to support resuming interrupted transfers. I tried downloading it twice and could only extract part of the compressed files. How could I resolve this? Looking for help.
    SP Mohanty
    @spMohanty
    @newtiopon : We understand. We are considering alternate options for delivering the data, and hopefully we will send out official communication about the possible alternate approaches.
    What's the bandwidth of your internet connection?
    newtiopon
    @newtiopon
    100M/s, fyi
    brianbrost
    @brianbrost
    @newtiopon in the meantime there are some suggestions on the discussion board (https://www.crowdai.org/topics/56g-is-the-training-set/discussion) which might help.
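
For anyone scripting the download, here is a rough sketch of resuming an interrupted transfer with an HTTP Range request; the URL argument and function name are placeholders, and it only helps if the file host honours Range headers, which may well be the limitation being discussed here.

    import os
    import requests

    def resume_download(url, dest, chunk_size=1 << 20):
        # pick up where a partial file left off, if the server allows it
        start = os.path.getsize(dest) if os.path.exists(dest) else 0
        headers = {"Range": "bytes=%d-" % start} if start else {}
        with requests.get(url, headers=headers, stream=True, timeout=60) as r:
            r.raise_for_status()
            # 206 = Partial Content; anything else means we have to start over
            mode = "ab" if start and r.status_code == 206 else "wb"
            with open(dest, mode) as f:
                for chunk in r.iter_content(chunk_size=chunk_size):
                    f.write(chunk)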
    newtiopon
    @newtiopon
    Thanks, Brian. I have a question about the relation between the train_set file name and the "date" field. If I understand correctly, the date field refers to the date the session happened. But what is confusing is that the file 'log_0_20180726_000000000000.csv' contains sessions whose date field varies from 2012 to 2018. So my questions are: 1) What does the datestamp in the file name refer to? 2) If the datestamp in the file name means something, how are sessions from different dates assigned to this file? Looking for further clarification.
    brianbrost
    @brianbrost
    @newtiopon, there are some occasional logging errors that can occur for a number of reasons, causing a mismatch between the date in the file name and the date when the session occurred according to the field. These mismatches should be relatively rare, and it's up to you exactly how you handle them.
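
Handling those mismatches is, as noted, up to each participant. One possible policy is simply to drop rows whose date field disagrees with the datestamp in the file name; a rough sketch, where the "date" column name is taken from the discussion above and everything else is a placeholder:

    import re
    import pandas as pd

    def drop_mismatched_sessions(path):
        # e.g. 'log_0_20180726_000000000000.csv' -> 2018-07-26
        file_date = pd.Timestamp(re.search(r"_(\d{8})_", path).group(1)).date()
        df = pd.read_csv(path, parse_dates=["date"])
        return df[df["date"].dt.date == file_date]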
    hjh1011
    @hjh1011
    Are we able to extract artists and album info through the web api for all 3M tracks?
    @brianbrost
    brianbrost
    @brianbrost
    Hi @hjh1011, this would have been very interesting; unfortunately, it is no longer allowed, since we were required to remove the link between our internal track IDs for the dataset and the actual track IDs.
    hjh1011
    @hjh1011
    @brianbrost I just realized that the data was updated on 20 Nov and that artist, album, and some other information was removed from the track features. I just wonder: people who downloaded the data before that would still have those columns, right? If that is the case, how is this a fair contest?
    brianbrost
    @brianbrost
    @hjh1011 As noted on the discussion board, participants will be required to open source their code, so we will be able to see if participants have used those features.
    RStudent
    @rstudent_gitlab
    @brianbrost, @spMohanty, or anyone else, please, I could use some help here. I am a very late entrant to the contest, but I still want to give it a sincere try. I am having trouble downloading the training set file, even with the splits. I keep getting the "gzip: stdin: invalid compressed data--format violated" error when running tar -xzf. If anyone else has had similar issues and resolved them, I could use some help. Please note that I am able to successfully download and extract the test set. I am on Ubuntu 18.04. Thanks a lot in anticipation.
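
One quick way to check whether the archive itself is damaged, rather than the extraction step, is to walk it without extracting anything; a small sketch, with the file name taken from the messages below:

    import tarfile

    # iterating over the members reads the whole gzip stream, so a corrupted
    # download fails here before any extraction is attempted
    try:
        with tarfile.open("20181113_training_set.tar.gz", "r:gz") as tf:
            for member in tf:
                pass
        print("archive looks intact")
    except (tarfile.TarError, EOFError, OSError) as err:
        print("archive appears corrupted:", err)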
    SP Mohanty
    @spMohanty
    @brianbrost : Can you upload md5sums of all the files, so that participants can at least verify that the file they downloaded was not corrupted during transmission?
    RStudent
    @rstudent_gitlab
    @spMohanty : Thanks a lot for the response. For reference, this is what I get for the file that throws the error. ~/workspace/music/train$ md5sum 20181113_training_set.tar.gz
    1dd55ea738937c5a3c23f6d18eb9804c 20181113_training_set.tar.gz
    and the size as expected is 56G
    SP Mohanty
    @spMohanty
    Well, for me the md5sum is this:
    f0b818a7cffd355d6ddeb368d2b244c0  20181113_training_set.tar.gz
    So something definitely seems off.
    Are you sure the download completed cleanly?
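
Equivalent to running md5sum on the command line, here is a small Python sketch for checksumming the large archive in chunks and comparing it against the value reported in this thread (the function name is a placeholder):

    import hashlib

    def md5sum(path, chunk_size=1 << 20):
        # hash the file one chunk at a time so the 56G archive never has to fit in memory
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    expected = "f0b818a7cffd355d6ddeb368d2b244c0"  # value reported for 20181113_training_set.tar.gz
    if md5sum("20181113_training_set.tar.gz") != expected:
        print("checksum mismatch: the download is likely corrupted, try again")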
    brianbrost
    @brianbrost
    Hi @rstudent_gitlab, I'll be home in about 10 minutes and I'll verify the md5sum, but have you tried downloading the split version of the training set? See the Training_Set_Split_Download.txt file in the Dataset tab. It contains links to download the training set split into 10 files, making it easier to download.
    brianbrost
    @brianbrost
    Ah sorry, just saw that you tried the split version as well.
    @rstudent_gitlab, do you get the exact same error message for the split version of the training set?
    brianbrost
    @brianbrost
    For what it's worth, here are the md5sums I get:
    f0b818a7cffd355d6ddeb368d2b244c0 20181113_training_set.tar.gz
    I will list the ones for the split version of the training set when I finish re-downloading it.
    brianbrost
    @brianbrost
    9bef4b0ed6ec4754c91d43fa0058213c training_set_0.tar.gz
    cb1a443f9613f11388c1c1aac703f7f6 training_set_1.tar.gz
    c0508e75ea300fd0e04b385d83a4ff04 training_set_2.tar.gz
    brianbrost
    @brianbrost
    66773b8a1f6d7a3034414afa223fe617 training_set_3.tar.gz
    99a88fa87ffadc40d1777d002e830805 training_set_4.tar.gz
    brianbrost
    @brianbrost
    a7193e27165ab849fb8e70156d9aa265 training_set_5.tar.gz
    brianbrost
    @brianbrost
    65d3b5731f1f735ccb8f7de1128c3354 training_set_6.tar.gz
    b2e6e6c0989b9995672cc92219ac4bd8 training_set_7.tar.gz