    Alexander Veysov
    @snakers4
    Also I did not quite get the idea - the open solution seems to be legit in terms of ML and loss weighting, but this proprietary product - neptune.ml - is referenced there all over the place
    Is this some form of advertising?
    Konstantin Maksimov
    @maksimovkonstantin
    @jakubczakon where are the fair play principles? Like @snakers4, I also don't understand the purpose of posting a 0.95 baseline
    @spMohanty What do you think about this situation?
    Alexander Veysov
    @snakers4
    Well, it is kind of possible that this baseline is posted by a third party willing to advertise its product
    I am not a good engineer, but I can say that the complexity of the code in the solution seems to be artificial in some places
    But the ML is legit, and I would say impeccable. Loss weighting and result presentation are stellar
    Even though the same can be achieved with 3x less code
    I.e. you can just put all of this in the Dataset class and ditch all the neptune.ai sugar ...
    SP Mohanty
    @spMohanty
    @snakers4 @maksimovkonstantin
    The solution by @jakubczakon and team is NOT the baseline. They are one of the top teams who are sharing their code with the rest of the participants, and we at crowdai in particular celebrate that spirit
    Collaboration between participants usually leads to much better solutions. Most participants in the top few ranks try to keep all their trade secrets to themselves, so if one of them decides to share with everyone else, isn't that better for the rest of the participants, who are trying to learn and compete at the same time?
    In fact, that's one of the reasons we prefer to use the term “challenges” instead of “competitions”: we want the community to collaborate as much as they can
    SP Mohanty
    @spMohanty
    Regarding the actual scores: the dataset in round 2 will be very different, and all the participants will be required to submit code which both trains on the new dataset and then makes the predictions, so everyone can benefit from what @jakubczakon and team have made available openly
    Jan Philip Göpfert
    @jangop_gitlab
    I for one am super happy with what minerva.ml are doing, and if this is how they chose to advertise their product then I think they went with one of the best, most honest options available to machine learning companies at present. Sure, it's not going to be easy to beat them, but that is not the point anyway.
    taraspiotr
    @taraspiotr
    @snakers4 It is only partially true that there is some artificial code. In the context of only this one challenge it could be true in some places, but we try to make as many parts of our code as possible reusable, for others and for ourselves, in this but also many other competitions; for example, a lot of the code was also used in this year's Data Science Bowl on Kaggle (also as an open solution). As far as Neptune goes, it would be quite hard to track hundreds of experiments without it, and everyone can track them as well and see our raw results ;)
    taraspiotr
    @taraspiotr
    And I think @spMohanty pretty much summed up the idea of the open solution. It allows everyone to either take parts from our solution that will improve theirs, or simply improve ours; participants can get more ideas and a working solution to compare with, or to use and achieve better results; organizers get better results and models; we love to contribute to open source, which is also a way to give back to the ML community (and I believe open source is one of the reasons for such tremendous growth in ML and data science); and we are able to show a product, which is in my opinion pretty cool and truly useful. I mean, it is a win-win-win imo. :D
    Jan Philip Göpfert
    @jangop_gitlab
    @taraspiotr word
    Jakub
    @jakubczakon
    @maksimovkonstantin We've been posting our solution (and improvements) on the forum since the very beginning, so it started as a baseline which turned into a really good standalone solution. I am sure that there is still a lot to improve in this solution, and people like yourself could take what we built and test the boundaries of what is possible here.
    Jakub
    @jakubczakon
    @snakers4 Yes, we are a part of neptune.ml and we can build (and post) solutions like this because of it. That being said, the solution runs on pure Python and there is nothing in the code that makes anyone use the Neptune product. However, we do see a lot of benefit in using neptune.ml for tracking experiments and managing the entire process, and we want to share that knowledge with the community too. Which, as @taraspiotr said, is in our minds a win-win-win for everyone.
    Jakub
    @jakubczakon
    Also @snakers4 could you please drop an issue/feature request on our repo about where you see artificial code? I would love to hear your thoughts and improve it
    Alexander Veysov
    @snakers4

    @jakubczakon
    (1)
    Any references to pipeline management done via proprietary tools and any sugar whatsoever related to this
    I kind of understand that the code posted is your boilerplate related to MANY ML tasks, but it is not relevant for the public (and obviously custom built to work with neptune.ml)
    To find the weighting function you have to dig 3-4 levels into the code's dependencies, whereas the ndimage call which actually calculates such weights / distances (and which is the core idea) takes one line - and would amount to a single flag in the Dataset class and 2-3 lines of code, as sketched below
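
    I.e. roughly (a sketch assuming scipy; the helper name and flag are illustrative, not from the actual repo):
    import numpy as np
    from scipy import ndimage

    def distance_weights(mask, w0=10.0, sigma=5.0):
        # one-line core idea: distance from each background pixel to the nearest object
        d = ndimage.distance_transform_edt(mask == 0)
        # squash into roughly [1, 1 + w0], UNet-style
        return 1.0 + w0 * np.exp(-(d ** 2) / (sigma ** 2))

    # inside a Dataset class, gated by a single flag:
    # if self.use_weights:
    #     weight = distance_weights(mask)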

    (2)
    The fact that you are using 3-4 levels of abstraction, whereas CNN training in PyTorch can be handled with

    • Dataset and dataloader
    • Separate Model, Loss and Augs classes
    • 30 line imperative train loop
    • 1 CLI script with explicit params
      is kind of detrimental to the general ML community: it promotes code bulk and over-engineering and puts wrong ideas into immature minds (a minimal sketch of such a loop follows below)
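
    For reference, the kind of loop meant here (a generic PyTorch sketch; model, train_dataset and criterion stand for whatever you plug in):
    import torch
    from torch.utils.data import DataLoader

    def train(model, train_dataset, criterion, epochs=50, lr=1e-4,
              batch_size=16, device='cuda'):
        # plain imperative training loop: no framework, no callbacks
        loader = DataLoader(train_dataset, batch_size=batch_size,
                            shuffle=True, num_workers=8)
        model = model.to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

        for epoch in range(epochs):
            model.train()
            running_loss = 0.0
            for images, targets in loader:
                images, targets = images.to(device), targets.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), targets)
                loss.backward()
                optimizer.step()
                running_loss += loss.item()
            print(f'epoch {epoch}: loss {running_loss / len(loader):.4f}')
        return model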

    (3)
    Despite the fact that your write-up is amazing (probably only lacking ablation experiments), the general public will most likely see your solution as blatant advertising of your platform, even though, I repeat, the ML in your solution is legit

    (4)
    Also, regarding handling 100+ experiments: just using TensorBoard for training-curve logging + putting explicit flags into one CLI script takes 3-4x less code, does not promote code bulk, is 100% transparent and is much more accessible and reusable
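
    E.g. something as simple as this (tensorboardX here; the run name and values are placeholders):
    from tensorboardX import SummaryWriter

    # one writer per experiment; the run name encodes the explicit CLI flags,
    # so `tensorboard --logdir runs/` compares experiments side by side
    writer = SummaryWriter('runs/unet_resnet34_w0_10_sigma_5')

    for step in range(100):
        train_loss = 1.0 / (step + 1)          # placeholder for the real loss
        writer.add_scalar('train/loss', train_loss, step)

    writer.add_scalar('val/f1', 0.82, 0)        # placeholder validation metric
    writer.close()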

    Jakub
    @jakubczakon

    Thanks @snakers4 !

    1) We tried to have that in the generator but it was very slow, so we decided to prepare it before training.

    2) We do have datasets and dataloaders, and separating model/loss/augs and the 30-line imperative train loop was meant to help with reusability. As a matter of fact, we wanted to put all of that in a Python lib so that people wouldn't have to care about it. We have already released it as steppy and steppy-toolkit on pip. However, as of now, it is missing things from this challenge.
    Could you please elaborate on the 1 CLI script? I would really like to hear your thoughts on that.

    3) I respectfully disagree. We provide a clean solution with no strings attached. You can use it however you like (MIT). There is nothing "blatant" about it in my mind. I think that such activity is actually a good thing. Correct me if I am wrong, as I would really like your feedback on that.

    4) Have you actually checked what Neptune does? I am quite confident that you cannot do all of that experiment management in TensorBoard.

    I am really happy and thankful for your feedback @snakers4, as that is exactly why we want to make our work public: to learn from the community and people like you.

    Alexander Veysov
    @snakers4

    @jakubczakon

    we tried to have that in the generator but it was very slow so we decided to prepare that before training

    It can be done quite easily (at least it worked for me on the DS Bowl) since the batch sizes are quite small even for 2-3 GPUs, even at 300px resolution
    The key idea is tracking how much time each morphological operation takes and optimizing where possible - e.g. performing some operations only on separate "small" masks
    Also ofc this requires using 5-10 workers in the PyTorch dataloader class
    I.e. you have to make sure that a single image is processed faster than roughly batch processing time * num_workers / batch_size, or something like this
    In practice this time is usually under 0.5s for small images, which is reasonable
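
    As a back-of-the-envelope check (all numbers below are placeholders, not measurements):
    # keep the GPU fed: workers must jointly produce a batch at least as fast
    # as the GPU consumes one
    batch_size = 16
    num_workers = 8
    per_image_time = 0.4       # s, morphology + weighting for one image (placeholder)
    gpu_batch_time = 1.0       # s, forward + backward for one batch (placeholder)

    time_to_build_batch = per_image_time * batch_size / num_workers
    assert time_to_build_batch <= gpu_batch_time, "preprocessing is the bottleneck"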

    Could you please elaborate on the 1 CLI script .. ? I would really like to hear your thoughts on that.

    Basically, for a simple task like this you just put all the params related to ablation tests / features into the params of one Python script
    It works really well with a dynamic runtime dataloader
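
    Roughly like this (flag names are just examples, not from any actual script):
    import argparse

    parser = argparse.ArgumentParser(description='train a semseg model')
    # every ablation / feature toggle is an explicit, visible flag
    parser.add_argument('--encoder', default='resnet34')
    parser.add_argument('--lr', type=float, default=1e-4)
    parser.add_argument('--epochs', type=int, default=50)
    parser.add_argument('--use-loss-weighting', action='store_true')
    parser.add_argument('--w0', type=float, default=10.0)
    parser.add_argument('--sigma', type=float, default=5.0)
    args = parser.parse_args()

    print(args)  # the full experiment config is visible in one place
    # train(args) would consume these params directly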

    Have you actually checked what neptune does

    Looks like recording your experiment results. TB + CLI scripts + a Google spreadsheet seem to do the job fine w/o introducing third-party software and exposing code to the public (or you have to pay, I guess)
    There are so many tools. The majority of them will die with 95% probability. Why use one more?

    Jakub
    @jakubczakon

    @snakers4

    1. Thanks, I will rethink (try) using it in the generator then! I guess we abandoned it too early.

    2. I do like to have config params separate, but I see your point. Could you please point me to some discussion/blog post where this is discussed? I need to give it some more thought, but again thanks for pointing that out.

    3. TB + CLI scripts + Google spreadsheet (+ GitHub) is my point exactly. In my experience it saves time to have all that in one place. In my opinion, not having to write those CLI scripts, spreadsheets etc. and having all of that with a nice UX adds value (also there is easy cloud compute, but that's another story). I do feel, however, that whatever makes your work most efficient is what you should choose. Also, you can easily set the --exclude option to not show any code to the public if you don't want to. Private projects are another option, just like on GitHub.

    Regarding:

    "I kind of understand that the code posted is your boilerplate related to MANY ML tasks, but it is not relevant for the public (and obviously custom built to work with neptune.ml)"

    It is indeed related to many ML tasks, and I think it is again a good thing to share something like that with the community, but more importantly it is NOT "custom built to work with neptune.ml". You can plug in Neptune via PyTorch/Keras/LightGBM callbacks or send metrics to Neptune instead of printing to the console, but it is entirely your decision

    Alexander Veysov
    @snakers4

    You can plug in Neptune via PyTorch/Keras/LightGBM callbacks or send metrics to Neptune instead of printing to the console, but it is entirely your decision
    Well, my instincts and Occam's razor tell me to do otherwise, unfortunately

    To truly be a good ML practitioner you have to own every line of code you are using (or trust the black boxes you use). When you see a lot of abstractions you kind of realize that they took a long path to get there (and are probably justified), but since packaging them into a decent framework may be prohibitively expensive (and require the effort of many people), such code may become obsolete faster than it is packaged and/or re-engineered

    So when you are just unleashing your code on the public - the fewer layers, the better.

    Anyway, you can find an example of such a CLI imperative script here
    https://github.com/snakers4/ds_bowl_2018 (the code was just posted as is, w/o being cleaned or made nice)
    Funnily enough, I forgot to recreate the optimizer after unfreezing the encoder, so I believe that the solo ~150th place can actually be improved significantly just by doing that
    This illustrates the downside of this approach, i.e. simplicity requires really knowing what you are doing

    Jakub
    @jakubczakon
    @snakers4 thanks!
    Alexander Veysov
    @snakers4

    Btw, for the community's sake, I believe that UNet weighting is the key difference.
    It's kind of funny - because it is a seminal paper, I have always glanced over the weighting part.
    The top-1 solution in the DS Bowl used cell-separating borders, which are kind of similar, but come straight out of morphological operations (a sketch below).
    I used a similar approach on 2D medical dental images - it really boosts tooth separation.
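
    Something along these lines (a generic scipy sketch of such separating borders, not the DS Bowl winner's actual code):
    import numpy as np
    from scipy import ndimage

    def touching_borders(instance_masks, iterations=2):
        # mark pixels where dilations of two different instances overlap;
        # these are the "cell-separating" borders between close objects
        dilated = [ndimage.binary_dilation(m, iterations=iterations)
                   for m in instance_masks]
        overlap_count = np.sum(dilated, axis=0)
        return overlap_count >= 2

    # usage: instance_masks is a list of H x W binary masks, one per object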
    _

    I have almost all the tricks in my pipeline (extensive augs, LR decay, transfer learning, "fat" encoders, clever decoders, clever augs etc) except for loss weighting.
    And the result is somewhere around .82 w/o any post-processing tricks (I believe they are irrelevant here, unlike in the DS Bowl).
    Ofc filtering small instances and / or adding erosion levels or borders may give some boost like .05, but it seems like the weighting is the killer feature.
    _
    In a nutshell, in my case F1 score and hard dice (dice at 0.5 threshold, essentially a % of guessed pixels) are for various models:

    [image: table of F1 / hard dice scores for the various models]
    Yeah, and note that fat ResNet-based LinkNets provide roughly the same performance as heavy ResNet-based UNets ... but they are 2x faster and lighter ...
    Jakub
    @jakubczakon

    Mhm I'll be sure to try that in the next challenge.

    I do think the weighting is key and the size-weighted component (which btw I have never seen before) makes a difference.

    However, it is important to see that the competition metric relies heavily on the per-object score, and hence assigning a proper score can give huge boosts.
    In our case second level model > area weighted probability > probability >> constant value for all objects.

    What is your strategy?

    Alexander Veysov
    @snakers4

    What is your strategy?

    Since there are no real valuable prizes, I just want to get .9+ (maybe .95+) on local pixel-based F1 score (I do not take object scores into consideration) and then decide
    Now the inception-based nets remain to be tested on the baseline (just plain semseg nets + labeling, w/o any watershed or the like)

    Also given the training curves, it would take 10-30 hours training time tops for the best model (supposing I take LinkNet152 + 50-100 epochs) - ~ 1 day, so there is ample time
    Adding weighting into my generator may take a couple of days, because I am kind of lazy

    Also, it is just interesting whether the complicated weighting procedure from UNet is worth the time - or whether just the distance transform + sizes is enough

    The second part + submission will be done by my friend; I do not really want to go there - investigating the weighting is much more interesting

    Jakub
    @jakubczakon

    Got it. I am interested to see what you find.

    I think it would be interesting to investigate whether adding the contextual layer explained in this paper
    improves the UNet results. I am planning on investigating it in future competitions/challenges.

    On a separate note, one interesting architecture (that didn't work for us, though I had high hopes) is the one fully based on dilated convolutions described in this paper. The authors claim to outperform SOTA on small objects in aerial imaging. Not sure if it is worth investing your time, but it is definitely worth knowing/monitoring for the future.

    Alexander Veysov
    @snakers4

    As for the dilated convolutions - the hive-mind opinion from ODS.ai is that they do not work

    I think I tried doing the following test on one of the competitions

    • just take LinkNet
    • replace all convolutions in the ResNet with dilated ones (had to do some tricks to keep the resolution stable; see the sketch below)
    • the result was ~the same
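
    For context, the trick to keep the resolution is simply to grow the padding together with the dilation for 3x3 convs (a generic sketch, not the code from that test):
    import torch
    import torch.nn as nn

    # for a 3x3 conv with stride 1, the output size stays the same
    # as long as padding == dilation
    x = torch.randn(1, 64, 56, 56)
    plain = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)
    dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)
    print(plain(x).shape, dilated(x).shape)  # both torch.Size([1, 64, 56, 56])
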
    Alexander Veysov
    @snakers4

    @jakubczakon

    overlay_masks_from_annotations
    update_distances
    clean_distances

    Btw, I am not quite getting the logic of this
    In the UNet paper they need some sort of averaged distance to the border of the nearest cell and to the second nearest cell

    In your code essentially you:

    • take each instance object in the loop from the coco annotation
    • add to mask m
    • on each step you calculate the distance_transform_edt for {1 - the current state of the mask} (i.e. with 1 object, 2 objects, 3 objects, etc)
    • then do this
    def clean_distances(distances):
        # distances: H x W x n_objects stack, one EDT slice per annotation
        if len(distances.shape) < 3:
            # single object: duplicate the slice so d1 == d2
            distances = np.dstack([distances, distances])
        else:
            # sort per pixel along the object axis and keep the two smallest
            # values: distance to the nearest and to the second nearest object
            distances.sort(axis=2)
            distances = distances[:, :, :2]
        second_nearest_distances = distances[:, :, 1]
        distances_clean = np.sum(distances, axis=2)  # d1 + d2, as in the UNet weighting
        return distances_clean.astype(np.float16), second_nearest_distances

    I am not quite sure what distances.sort(axis=2) will do in this case, but it looks like you get a set of distances between object borders and then just take the 2 largest ones
    I am also not quite clear what the logic of distances_clean = np.sum(distances, axis=2) is in this case

    apyskir
    @apyskir
    Hi all, hi @snakers4, I actually implemented the weighting function, so let me answer this question. distances is an array whose third dimension equals the number of annotations on the image. First we simply fill each successive slice with the distance to a single object (in update_distances()), and then we sort along the third dimension, so the first slice contains the distance from each pixel to the closest object, and the second slice contains the distance from each pixel to the second closest object. Hope it helps!
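
    For illustration, the same logic as a standalone sketch (a hypothetical re-implementation based on this description, not the actual repo code; it assumes at least two annotations per image):
    import numpy as np
    from scipy import ndimage

    def build_distance_stack(instance_masks):
        # one full-image EDT per object: distance from every pixel to that object
        slices = [ndimage.distance_transform_edt(m == 0) for m in instance_masks]
        return np.dstack(slices)                      # H x W x n_objects

    def two_nearest(distances):
        # sort along the object axis: slice 0 = distance to the nearest object,
        # slice 1 = distance to the second nearest one
        distances = np.sort(distances, axis=2)[:, :, :2]
        return distances.sum(axis=2), distances[:, :, 1]   # d1 + d2, d2
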
    Alexander Veysov
    @snakers4
    I see, this call contains m, not mask
            if distances is not None:
                distances = update_distances(distances, m)
    Now this makes sense. I thought that distances was being calculated cumulatively
    apyskir
    @apyskir
    No, that wouldn't let us take the sum of the distances to only the 2 closest objects.
    Alexander Veysov
    @snakers4
    The downside being that you actually calculate a separate full-image distance transform for each of the n objects, which is kind of slow
    I believe the rational thing to test is just running one big distance transform on the summed mask (see the sketch below) - did you try it?
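
    I.e. something like this (an untested sketch of the shortcut; names are illustrative):
    import numpy as np
    from scipy import ndimage

    def single_pass_distance(instance_masks):
        # merge all instances into one binary mask and run a single EDT:
        # distance from every background pixel to the nearest object of any kind
        merged = np.clip(np.sum(instance_masks, axis=0), 0, 1)
        return ndimage.distance_transform_edt(merged == 0)
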
    apyskir
    @apyskir
    Yes, it is slow, but we have to calculate it only once - while preparing data, before training. So here performance is not crucial for us.
    No, I didn't try a single distance transform; it would focus more on all borders, not on the places between two very close objects
    But I believe it could give really good results, too
    Alexander Veysov
    @snakers4
    I will return in a couple of days with a couple of additional tests
    It's kind of interesting to add this feature to the best model on the baseline and measure local F1 / HDICE
    apyskir
    @apyskir
    Looking forward to seeing your results!
    Alexander Veysov
    @snakers4
    So, I have investigated the loss weighting.
    Interesting take-aways:
    Alexander Veysov
    @snakers4

    (0)
    I believe that the exact way of computing the distances does not matter - because we have the 1 + w0 * exp(-d^2 / sigma^2) term, which really squeezes any values roughly into [1; 1 + w0]
    hence I hope you can safely go ahead with just using one distance transform

    (1)
    Since the values are squeezed, we just have to make sure that w0 does not explode and that sigma^2 is roughly similar to the distances' std^2 (a quick numerical check below)
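
    A quick numerical sanity check of that squeezing (the distance range and w0 are placeholders):
    import numpy as np

    w0, d = 10.0, np.linspace(0, 100, 1000)       # any plausible distance range
    sigma = d.std()                                # heuristic: sigma ~ distance std
    weights = 1.0 + w0 * np.exp(-d ** 2 / sigma ** 2)
    print(weights.min(), weights.max())            # ~1.0 ... 11.0, i.e. [1, 1 + w0]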

    (2)
    In PyTorch you have to be careful

    • BCE loss and the BCE loss functional actually take a per-pixel weight matrix, i.e. WxH
    • categorical cross entropy requires manual fiddling with reduce=False

    E.g. my loss is BCE-based, but the presented solution uses categorical cross entropy (see the sketch below)
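
    A sketch of both variants (shapes are illustrative; newer PyTorch uses reduction='none' where older versions had reduce=False):
    import torch
    import torch.nn.functional as F

    logits = torch.randn(4, 1, 64, 64)             # raw model output
    targets = torch.randint(0, 2, (4, 1, 64, 64)).float()
    weights = torch.ones(4, 1, 64, 64)             # per-pixel weight map

    # BCE accepts the per-pixel weight matrix directly
    bce = F.binary_cross_entropy_with_logits(logits, targets, weight=weights)

    # categorical cross entropy needs the unreduced loss, weighted manually
    class_logits = torch.randn(4, 2, 64, 64)       # 2-class variant
    class_targets = targets.squeeze(1).long()
    ce = F.cross_entropy(class_logits, class_targets, reduction='none')
    ce = (ce * weights.squeeze(1)).mean()
    print(bce.item(), ce.item())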

    Hope this is useful for somebody

    Alexander Veysov
    @snakers4

    Finished encoder tests on the baseline (plain labeling w/o watershed or morphology, plain BCE/DICE loss, no loss weighting).

    As I expected

    • Fat ResNet/LinkNet ~ fat ResNet/UNet - but the former is even 3-4x faster (!)
    • Inception/InceptionResNet work, but a bit worse. They are probably harder to train, and in my experience on some tasks they required setting a different LR for each layer, which is tricky
      (there was a paper saying that ResNets transfer best / are easiest and fastest to transfer, but on average the better an architecture does on ImageNet, the better it transfers)
    • VGG-based models work worst
    [image: table of encoder test results]