Tom Furnival
@tjof2
Warning: mlpca can be very slow for large datasets
You can certainly try centering. But as previously discussed, you cannot do it as well as normalizing Poisson noise
Francisco de la Peña
@francisco-dlp
@Mingquan_Xu_twitter, if you try centring EELS data you will see that it leads to one more significant component in the scree plot compared to standard SVD without Poisson scaling. That extra component is needed to account for the unnecessary centring step and is therefore deleterious for the purposes of denoising and blind source separation. Unfortunately there is a lot of confusion about this in the EELS literature.
Thomas Aarholt
@thomasaarholt
@TheFermiSea do you need lazy loading? Currently I'm assuming you don't.
(lazy loading is for loading really big files, larger than ram)
Oh, and what is the third dimension of the navigator? (the file you sent me has shape (4, 4, 8 | 180))
jeinsle
@jeinsle

@francisco-dlp and @tjof2

@Mingquan_Xu_twitter, if you try centring EELS data you will see that it leads to one more significant component in the scree plot compared to standard SVD without Poisson scaling. That extra component is needed to account for the unnecessary centring step and is therefore deleterious for the purposes of denoising and blind source separation. Unfortunately there is a lot of confusion about this in the EELS literature.

So I have done just this (Poisson = True and centre = 'trials') on some EDS data. I am getting significantly better PCA results than without this step. So a) I wonder if you could elucidate more on this incompatibility (feel free to just give me a citation to chase), and b) if I am seeing this work on EDS, why would it not work on EELS? Really, I just want to get more at the statement:

plain SVD works better for our application.

What does this mean? How does it work better? It is not clear from the hyperspy documentation why this choice is being made.

In my case, I have significant electronic noise of some kind on the EDS data (no, I cannot go and recollect; the sample has been lost), and with standard non-centred SVD the first component ends up being this noise. Further, without the Poissonian normalisation, while I can get two components to describe the messy system, the results are still underdetermined and I am unable to locate all the phases I am actually interested in.

Francisco de la Peña
@francisco-dlp
@jeinsle, that's a good point. The reason why we don't need whitening and centring in EDS and EELS is that in those two fields there is a meaningful baseline and scale. This is not the case in most applications of PCA, e.g. in social sciences. However, if your data suffers from electronic noise, then your baseline may not be zero as it should be, and it may actually vary across the dataset. As EDX data tends to be sparse, my guess is that centring is more or less subtracting the electronic noise component, which leads to a better decomposition in your case. So I bet that yours is the exception that proves the rule.
TheFermiSea
@TheFermiSea
@thomasaarholt I will need lazy loading in the future. The first two axes are the image, the third is wavelength, and the signal is of dimension 180. I forgot to truncate the wavelength axis, but it is important.
Thomas Aarholt
@thomasaarholt
If you follow the installation instructions for ipympl, you'll have a nice time with the function I've written, I think
Mingquan Xu
@Mingquan_Xu_twitter
@jeinsle and @francisco-dlp , thanks for sharing. Because several papers say that weighting (scaling) and centering are useful pre-processing steps when doing PCA, even in the EELS field (e.g. https://doi.org/10.1016/j.ultramic.2006.04.016), I am often confused about this. I have not processed EDX data yet and will have a try. Thanks.
Francisco de la Peña
@francisco-dlp
@Mingquan_Xu_twitter, as noted by @tjof2, weighting and centering are incompatible. The paper that you mention is 14 years old and since then the community has advanced a little on this topic, albeit unfortunately reaching consensus on this seems to be taking longer than one would expect. You are certainly right to question our statements. What I suggest is that you try it for yourself: perform PCA (actually SVD) with and without centring on your dataset and compare the results. Of course you should not perform weighting when you don't centre, otherwise the comparison wouldn't be fair.
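For reference, the comparison Francisco suggests can be sketched with NumPy alone, using synthetic Poisson-noised data as a stand-in for a real spectrum image (on real data you would use HyperSpy's s.decomposition with and without its centre option; everything below is illustrative, not HyperSpy's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spectrum image": 2 non-negative source spectra mixed over 200 pixels,
# with Poisson counting noise (a stand-in for EELS/EDS data).
spectra = np.abs(rng.normal(size=(2, 64)))   # 2 components x 64 channels
maps = np.abs(rng.normal(size=(200, 2)))     # mixing weights per pixel
counts = rng.poisson(maps @ spectra * 50.0).astype(float)

def scree(data, centre=False):
    """Normalised singular-value spectrum, optionally mean-centring first."""
    if centre:
        data = data - data.mean(axis=0)
    s = np.linalg.svd(data, compute_uv=False)
    return s / s.sum()

plain = scree(counts)
centred = scree(counts, centre=True)
print("first 4 singular values, plain SVD:  ", np.round(plain[:4], 3))
print("first 4 singular values, centred SVD:", np.round(centred[:4], 3))
```

Comparing where each curve flattens out shows how many components each variant treats as significant, which is the scree-plot comparison being discussed.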
Thomas Aarholt
@thomasaarholt
@TheFermiSea what do you think of this?
image.png
TheFermiSea
@TheFermiSea
@thomasaarholt This is great! Would you be willing to put together a very basic jupyter notebook explaining how you did this? Even just a course outline would be helpful. I would like to learn how to use this package more completely.
Thomas Aarholt
@thomasaarholt
@TheFermiSea try this function. Just paste the whole thing into a cell in jupyter notebook / jupyter lab
I'll add a bunch of comments, if you can try the above out
Mingquan Xu
@Mingquan_Xu_twitter
@francisco-dlp , thanks very much for your suggestions. I will compare them on my own dataset.
Thomas Aarholt
@thomasaarholt
@TheFermiSea I realised I made one breaking change before I posted it. Updating now.
TheFermiSea
@TheFermiSea
@thomasaarholt Ok, I was having some issues. I thought it was because of my matplotlib backend, but maybe not.
Thomas Aarholt
@thomasaarholt
@TheFermiSea How big (what is the shape) of your actual datasets?
Tom Furnival
@tjof2

@francisco-dlp is there a chance you could take a quick look at the robust NMF PR I've had open for a while (#2035)? It's only got a little bit of stuff left to review I think.

The reason I ask is I'd like to tackle the decomposition documentation to answer many of the issues raised here and in #1159, but since the PR above also makes changes to the documentation, I want to tackle them one-after-the-other to make life simpler.

Thomas Aarholt
@thomasaarholt
@TheFermiSea I've updated the gist with lots of comments. Hope that helps!
Let me know if you have any trouble with ipywidgets or ipympl. Unfortunately, these do not work in Spyder, but they should be working in the next version of VSCode.
Francisco de la Peña
@francisco-dlp
@thomasaarholt, turning your work into a PR would be great!
@tjof2, I'll have a look at your PR later in the day
Alexander Skorikov
@askorikov

@thomasaarholt

Is the following a bug? Or is there some other dictionary expected by Signal2D? I'd quite like to use this sort of implementation in #1243.

s = hs.signals.Signal1D([1,2,3])
hs.signals.Signal1D(data=s.data, axes=s.axes_manager.as_dictionary())

----
...
TypeError: _append_axis() argument after ** must be a mapping, not str

I think the axes parameter expects a list of axis dictionaries. You can unpack axes from the axes_manager.as_dictionary() result in the following way:
hs.signals.Signal1D(data=s.data, axes=s.axes_manager.as_dictionary().values())
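The reason the first call fails can be seen without HyperSpy at all: iterating over a dict yields its string keys, which ** cannot unpack, while .values() yields the axis dictionaries themselves. A minimal sketch (the axes_dict below is a hypothetical stand-in for what as_dictionary() returns):

```python
# Hypothetical stand-in for axes_manager.as_dictionary(): keys like 'axis-0',
# values are the per-axis dictionaries.
axes_dict = {"axis-0": {"name": "x", "size": 3, "scale": 1.0, "offset": 0.0}}

# Iterating over the outer dict yields its *keys* (strings), so the constructor
# effectively calls _append_axis(**"axis-0") and raises
# "argument after ** must be a mapping, not str".
first = next(iter(axes_dict))          # a str, not a mapping

# .values() hands over the axis dictionaries themselves, which ** can unpack.
axes_list = list(axes_dict.values())
print(type(first).__name__, type(axes_list[0]).__name__)
```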

Tom Furnival
@tjof2
Thanks very much @francisco-dlp!
Thomas Aarholt
@thomasaarholt
@askorikov that's an elegant way of doing it!
gjdevos
@gjdevos
Hi, I'm working on a Phenom desktop SEM elid file reader for hyperspy. Would you be interested in this?
Thomas Aarholt
@thomasaarholt
Definitely! I don't use Phenom myself, but we very much approve of supporting new formats!
gjdevos
@gjdevos
Great! Can you explain the difference between plain metadata attributes and mapped attributes? Can I for instance store the acceleration voltage value in metadata.Acquisition_instrument.SEM.beam_energy directly, or should I provide a mapping for it?
Eric Prestat
@ericpre
@gjdevos: the idea is to store all the metadata in original_metadata, and map some of it into metadata to match a specific structure
some readers use the mapping dictionary to map metadata from original_metadata to metadata when the file is loaded
gjdevos
@gjdevos
I understand. I just wondered why I should provide a mapping function from original_metadata to metadata attributes, rather than just storing the metadata attributes directly, where appropriate.
Eric Prestat
@ericpre
the mapping dictionary is optional, and quite often it is more readable to have all the metadata parsing in one place
gjdevos
@gjdevos
OK, thank you.
Tom Furnival
@tjof2
Thanks for taking a look at #2035 @francisco-dlp . I've updated the test coverage as requested. It looks like there may be a lack of coverage for the lazy versions of each that @to266 implemented, but I'll create a separate issue for that now.
Francisco de la Peña
@francisco-dlp
Great, thanks @tjof2.
Tom Furnival
@tjof2
No problem :) created #2346
TheFermiSea
@TheFermiSea
@thomasaarholt Hi Thomas. I was able to get this running smoothly in a jupyter notebook. I made some changes to the signal plot, so that it does not autoscale (lines 107 and 131). I would like to do the same for the navigation plane. I tried just commenting out line 123, but then I get saturation at points. Is there a way to force the color bar scale to be defined?
Thomas Aarholt
@thomasaarholt
Yes, you can use the im.set_clim(low, high) method
If you set that after line 69, and comment out 123, then you'll get what you want.
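set_clim is standard Matplotlib and works on any image artist; a minimal standalone sketch of pinning the colour limits (the data and limits here are arbitrary):

```python
import matplotlib
matplotlib.use("Agg")          # headless backend so the example runs anywhere
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
im = ax.imshow(np.arange(64).reshape(8, 8))
im.set_clim(0, 100)            # pin the colour scale instead of autoscaling
print(im.get_clim())
```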
TheFermiSea
@TheFermiSea
Awesome, Thanks!
Tom Furnival
@tjof2
I've left some questions on #2336 regarding print() vs _logger.info() for the decomposition information, particularly given the default logging level of WARNING.