Hi, I just joined the competition. Sorry if these are dumb questions; I'm not quite sure what the rules are. We are given a sample MIDI dataset. Are we constrained to that dataset, given that the 'dataset' section says it is just a sample? And is the winner determined by the best-sounding music after it passes through their overfitting checker and other checks?
Kalani Murakami
@khmurakami
Also, are the MIDI files constrained to piano, or is any instrument okay?
SP Mohanty
@spMohanty
Hey, you are welcome to use any publicly available dataset of MIDI files.
And regarding the piano constraint, we will indeed be mapping the first track of the MIDI file to an acoustic grand piano soundfont.
But in principle, if you trained only on MIDI files of music mapped to, say, flute, it will still be played by the evaluator; it will just sound like an acoustic grand piano.
Kalani Murakami
@khmurakami
So can we make our own dataset, say by scraping a bunch of MIDI files?
SP Mohanty
@spMohanty
You can, as long as you make the dataset available to the other participants too!
(Well, technically I believe you can also just release the dataset after the challenge.)
(I understand that collecting a particular dataset might be an important winning aspect of a solution in this challenge.)
So if you do collect your own dataset, release it along with your music modelling approach at the end of the challenge (or even before, if you feel generous).
Qin Yongliang
@ctmakro
Lots of familiar faces.
My current approach: convert MIDI files into a stream of events, where each event is a (note, velocity, delay) tuple, then do sequence modeling with a GRU (a sketch follows below).
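A minimal sketch of the idea, assuming pretty_midi for parsing and a PyTorch GRU (illustrative choices on my part, not necessarily the exact stack):

```python
import pretty_midi
import torch.nn as nn

def midi_to_events(path):
    """Flatten a MIDI file into (pitch, velocity, delay) tuples."""
    pm = pretty_midi.PrettyMIDI(path)
    # collect notes from all instruments, ordered by onset time
    notes = sorted(
        (n for inst in pm.instruments for n in inst.notes),
        key=lambda n: n.start,
    )
    events, prev = [], 0.0
    for n in notes:
        events.append((n.pitch, n.velocity, n.start - prev))
        prev = n.start
    return events

class EventGRU(nn.Module):
    """GRU over quantized event tokens, predicting the next token."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):             # tokens: (batch, seq_len)
        h, _ = self.gru(self.embed(tokens))
        return self.head(h)                # (batch, seq_len, vocab_size)
```

In practice the pitch, velocity, and delay values would each need quantizing into a shared token vocabulary (e.g. 128 pitches plus velocity and delay bins) before feeding the GRU; that bookkeeping is omitted here.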
There are a few problems with this approach.
First, the RNN does not know which notes are more important than others. Even in classical piano music, the pad (accompaniment) usually consists of more notes than the melody, so the RNN will try to model the pad better.
Qin Yongliang
@ctmakro
This means that participants who train on the melody alone will sound better than those who train on both the melody and the pads.
Second, given a bunch of data, sequence modeling with an RNN tries to match the dataset's conditional distribution, but not all samples in the dataset sound equally good; some authors' pieces sound worse than others when judged by average listeners.
Therefore, for high evaluation performance, you will have to pick out those pieces by hand and discard them, or assign them a penalty while training (sketched below).
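One way to implement the penalty, as a minimal sketch assuming PyTorch and a hand-assigned per-piece quality weight (the weighting scheme here is my own illustration):

```python
import torch.nn.functional as F

def weighted_nll(logits, targets, piece_weight):
    # logits: (batch, seq_len, vocab); targets: (batch, seq_len)
    # piece_weight: (batch,) in [0, 1] -- 1.0 for a trusted piece,
    # smaller values penalize pieces judged to sound worse
    per_token = F.cross_entropy(logits.transpose(1, 2), targets,
                                reduction="none")   # (batch, seq_len)
    per_piece = per_token.mean(dim=1)               # (batch,)
    return (piece_weight * per_piece).mean()

# usage inside a training loop:
#   loss = weighted_nll(model(tokens), next_tokens, weights)
```

A weight of 1.0 keeps a piece at full strength; pieces you would otherwise discard get a small weight instead of being dropped outright.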
Qin Yongliang
@ctmakro
Or better, if you owned CrowdAI, you could do that automatically via the TrueSkill system...
Currently I have implemented an interface to judge each of the MIDI files: by pressing a key I can choose to keep or reject a file for training (roughly the shape sketched below).
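A sketch of such a keep/reject interface, under my own assumptions (pygame for MIDI playback, a keep/reject folder pair); not the actual code:

```python
import glob, os, shutil
import pygame  # pygame.mixer.music can play MIDI files on most platforms

def judge(midi_dir, keep_dir, reject_dir):
    pygame.mixer.init()
    os.makedirs(keep_dir, exist_ok=True)
    os.makedirs(reject_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(midi_dir, "*.mid"))):
        pygame.mixer.music.load(path)
        pygame.mixer.music.play()
        answer = input(f"{os.path.basename(path)} -- [k]eep / [r]eject? ")
        pygame.mixer.music.stop()
        # move the file into the folder matching the verdict
        dest = keep_dir if answer.strip().lower() == "k" else reject_dir
        shutil.move(path, dest)
```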
Qin Yongliang
@ctmakro
However, it takes quite a while to judge 2000 MIDI files.
Nilabha Bhattacharya
@nilabha666_twitter
Does anyone have code to split MIDI files? I have generated files which are slightly longer than the 3610-second limit and would like to trim the last few sections.
Nilabha Bhattacharya
@nilabha666_twitter
I have managed to manually trim the last portions of the track.
Nilabha Bhattacharya
@nilabha666_twitter
But that is rather hackish, since I manually decide how many events to trim from the end (a less manual sketch follows below).
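For a less manual version, something like this could work; a sketch assuming pretty_midi, where notes starting after the cutoff are dropped and notes spanning it are clipped:

```python
import pretty_midi

def trim_midi(in_path, out_path, max_seconds=3610.0):
    pm = pretty_midi.PrettyMIDI(in_path)
    for inst in pm.instruments:
        kept = []
        for note in inst.notes:
            if note.start >= max_seconds:
                continue  # drop notes that begin past the limit
            note.end = min(note.end, max_seconds)  # clip spanning notes
            kept.append(note)
        inst.notes = kept
        # also discard control changes / pitch bends past the cutoff
        inst.control_changes = [c for c in inst.control_changes
                                if c.time < max_seconds]
        inst.pitch_bends = [b for b in inst.pitch_bends
                            if b.time < max_seconds]
    pm.write(out_path)

# usage: trim_midi("generated.mid", "generated_trimmed.mid")
```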