    Sang-Hyeok Yang
    @yy-ssang
    Hello. I'm glad to join this chat.
    Could I ask about the configuration of the datasets?
    Ah... I forgot to introduce myself. I'm an MS-Ph.D. integrated course student at Sungkyunkwan University (SKKU) in South Korea, and a member of the SKKU-STEM group (Young-Min Kim's group).
    Sang-Hyeok Yang
    @yy-ssang
    I'm wondering how to configure the training datasets. I mostly use simulated STEM images, but in some of the AtomAI example notebooks the datasets look like real experimental images. Is there any difference between training on simulated and experimental data?
    Maxim Ziatdinov
    @ziatdinovmax
    @yy-ssang The performance of ML/DL models is known to degrade significantly when they are tested outside the domain of the training examples. Hence, it is generally better to have the training and "test" sets come from the same distribution. And since our "test" set is usually experimental data, ideally we would also want to use experimental data for training our models. Now, in some cases, we can get away with using only simulated data, especially if our data augmentation procedures account for experimental non-idealities. The rule of thumb is that if you have high-quality experimental data for which you know the ground truth, you should use it to train your models; if no such data is available, the only other option is to use simulated data and then try to minimize the out-of-distribution (OOD) effects.
    Generally, we found that when doing simulation-to-real-world transition, it is better to train an ensemble of models instead of a single model. See e.g. https://arxiv.org/ftp/arxiv/papers/2101/2101.08449.pdf
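    One generic way to have augmentation "account for experimental non-idealities" is to corrupt the simulated images on the fly during training. The sketch below is a plain NumPy/SciPy illustration, not AtomAI's own augmentation pipeline; the helper name and the noise/blur levels are placeholder assumptions.
```python
# Illustrative corruption of simulated STEM-like images; not AtomAI code.
# Noise/blur magnitudes below are placeholders, not recommended values.
import numpy as np
from scipy.ndimage import gaussian_filter

def augment_simulated(img, rng=None):
    """Add a few experiment-like non-idealities to a simulated image scaled to [0, 1]."""
    rng = rng or np.random.default_rng()
    out = gaussian_filter(img, sigma=rng.uniform(0.5, 2.0))        # finite probe size / blur
    out = rng.poisson(out * 200.0) / 200.0                         # counting (shot) noise
    out = out + rng.normal(0.0, 0.05, size=out.shape)              # detector / scan noise
    out = out + rng.uniform(0.0, 0.2) * np.linspace(0, 1, out.shape[-1])  # uneven background
    return np.clip(out, 0.0, 1.0)
```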
    Sang-Hyeok Yang
    @yy-ssang
    @ziatdinovmax Thank you for your comment and the paper recommendation! I will read it and come back when I have a question.
    Maxim Ziatdinov
    @ziatdinovmax
    It all boils down to the fact that there are certain things in experiments that are very hard to simulate, and they lead to degraded performance. Say, for example, you train your model on some nice DFT/MD simulations. Then, in the experiment, you get some "junk" on your sample because of how it was prepared and/or transferred onto a grid. Most models trained on simulated data will likely show poor performance on such regions since they were not included in the training examples. Of course, performing larger-scale simulations could help. Another option is to have a simple binary classifier before your main DL model. This classifier filters out all the "bad" images and passes only the "good" images to your main DL model. This option is obviously much cheaper (and I use it quite a lot) than performing larger-scale simulations.
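    A minimal sketch of that "filter first, segment later" idea, assuming a generic PyTorch setup (this is not AtomAI's API; the tiny CNN and the 0.5 threshold are just placeholders):
```python
# Illustrative "gatekeeper" classifier: pass only frames predicted as "good"
# to the main segmentation model. Generic PyTorch sketch; not AtomAI's API.
import torch
import torch.nn as nn

class FrameFilter(nn.Module):
    """Tiny CNN that scores an image as good (1) vs bad (0)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))  # raw logit

def filter_then_segment(frames, gate, segmenter, threshold=0.5):
    """Run the segmenter only on the frames the gate classifies as 'good'."""
    with torch.no_grad():
        keep = torch.sigmoid(gate(frames)).squeeze(1) > threshold
    return segmenter(frames[keep]), keep
```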
    @yy-ssang you are welcome!
    Sang-Hyeok Yang
    @yy-ssang
    @ziatdinovmax So, ensemble learning is the method of segmenting the STEM image by voting among many pretrained models, right? In that case, are the hyperparameters fixed, or slightly tuned?
    Maxim Ziatdinov
    @ziatdinovmax
    Yes, with the caveat that you do not want to include in the voting those models that produce unphysical results. There are several ways to train an ensemble. See this paper for an overview: https://arxiv.org/pdf/1912.02757.pdf. The best one (but also the most expensive one) is simply to train multiple models from scratch with different weight initializations and different shuffling of the training data, and to perform stochastic weight averaging at the end of each training run (roughly equivalent to the multi-SWA method described in https://arxiv.org/pdf/2002.08791.pdf). Everything else, including the hyperparameters, remains the same; yet the individual models produce surprisingly different outcomes when applied to OOD data.
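    A rough sketch of that recipe (multiple models trained from scratch with different seeds and data shuffling, each finished with stochastic weight averaging), written with generic PyTorch; the model, data loader, optimizer settings, and epoch counts are placeholder assumptions:
```python
# Sketch of a multi-SWA-style ensemble: several members trained from scratch with
# different seeds, each finished with stochastic weight averaging (SWA).
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

def train_one_member(make_model, train_loader, seed, epochs=50, swa_start=40):
    torch.manual_seed(seed)                     # different weight initialization per member
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    swa_model = AveragedModel(model)
    swa_sched = SWALR(opt, swa_lr=5e-3)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        for x, y in train_loader:               # loader should reshuffle differently per member
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        if epoch >= swa_start:                  # average weights near the end of training
            swa_model.update_parameters(model)
            swa_sched.step()
    update_bn(train_loader, swa_model)          # recompute BatchNorm statistics for the averaged model
    return swa_model

def ensemble_predict(members, x):
    """Average softmax outputs; members producing unphysical results should be dropped first."""
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in members])
    return probs.mean(0)
```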
    Sang-Hyeok Yang
    @yy-ssang
    Ah, I got it. Of course, the best method would be to try training the model across all the hyperparameter settings, as is done in various fields using deep learning, so I had just been tuning the hyperparameters for more accurate results. But in this case, ensemble learning can be the more efficient way, and that breaks my preconceptions. Thank you for the discussion; I will study the papers you recommended!
    Maxim Ziatdinov
    @ziatdinovmax
    AtomAI is now equipped with deep kernel learning for predicting functionality from structural data! More details about the latest release -> https://github.com/pycroscopy/atomai/releases/tag/v0.7.0
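    For context, a hedged sketch of what a deep kernel learning workflow with the new dklGPR model might look like; the argument names and training settings below are assumptions recalled from the AtomAI docs and should be checked against the v0.7.0 release notes:
```python
# Hedged sketch of deep kernel learning (structure -> functionality) with AtomAI.
# Argument names and settings are assumptions; verify against the v0.7.0 docs.
import numpy as np
from atomai.models import dklGPR

# Placeholder data: structural descriptors (e.g., flattened local patches) and a scalar property
X_train = np.random.randn(500, 64)
y_train = np.random.randn(500)
X_test = np.random.randn(100, 64)

dklgp = dklGPR(X_train.shape[-1], embedim=2)       # NN maps 64-D inputs into a 2-D GP feature space
dklgp.fit(X_train, y_train, training_cycles=200)   # jointly trains the NN and the GP hyperparameters
mean, var = dklgp.predict(X_test)                  # predictive mean and variance
```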
    ChenYuancy123
    @ChenYuancy123
    I'm very glad to join this chat. I'm a student from China. Recently, I've been trying to reproduce the atom-finding code. However, when I try to train on my dataset, I get "AssertionError: labels should start from 0" from the preproc.py file in atomai. Is there a problem with the shape of the "label" data in my training set? The shape of the "label" data is (n_images, image_height, image_width). Any correction would be appreciated.
    Junmian Zhu
    @zhujunmian
    Hi, may I ask whether there is any recommended way of labeling microscopy images for an image segmentation task?
    Maxim Ziatdinov
    @ziatdinovmax
    @ChenYuancy123 Sorry for the late response. Were you able to resolve the issue? If not, can you please run np.unique(your_labeled_data) and let me know what the output is? (There is also a small label-remapping sketch below.)
    @zhujunmian Hi Junmian, this depends on the system. We have some (simple) tools for preparing labeled data for atom-resolved images. For other types of systems, it probably needs to be done manually or semi-manually.
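    On the "labels should start from 0" error above: a common fix is to remap whatever class values the masks contain onto consecutive integers starting at 0. This is a generic NumPy sketch, not AtomAI code:
```python
# Remap segmentation masks so class labels are consecutive integers starting at 0.
# Generic NumPy sketch; masks follow the (n_images, height, width) convention above.
import numpy as np

def remap_labels(labels):
    """Map arbitrary class values (e.g. [1, 2, 255]) onto [0, 1, 2]."""
    uniq, remapped = np.unique(labels, return_inverse=True)
    print("original classes:", uniq)   # same output np.unique(your_labeled_data) would show
    return remapped.reshape(labels.shape).astype(np.int64)

# labels = remap_labels(labels)  # then pass the remapped masks to training
```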