Tim Snow
@timsnow
Hello!
Andrew Nelson
@andyfaff
Tim, do you have any resources to explain the NXcanSAS format?
Tim Snow
@timsnow
Not directly, what kind of resources are you looking for?
Are probably worth checking out
Somewhere Pete Jemian made a NXcanSAS validator tool as well...
adrianrennie
@adrianrennie
@timsnow There is a link to a validation tool for NXcanSAS on the page https://github.com/canSAS-org/NXcanSAS_examples . The link is https://punx.readthedocs.io/en/latest/
As background information, there are some examples and brief descriptions of various file formats that are presently used for reflectivity data at http://www.reflectometry.net/refdata.htm . Unfortunately very few are based on definitions but rather just interpretation of examples.
jochenstahn
@jochenstahn
Salut everyone. I prepared some text with ideas and questions to start / continue the discussion. You can access it at https://drive.switch.ch/index.php/s/4Sa6Q6EfB5xQnrr ( I am new to gitter and the like and thus do not know its possibilities to attach or share files.... ) Please feel free to comment on it.
Andrew Nelson
@andyfaff
• I don't think we should worry immediately what the file will look like (sect 1.2); e.g. whether it is binary, ASCII, HDF, SQL database. That can be thought about later. Thus I don't think we need to think about a column order at this time (sect 3.1).

• if a time series was obtained within a single acquisition (single file) then I think the reduced file for that dataset should contain the reduced datasets for the time series. I currently do this by using separate files, e.g.
PLP0000425.nx.hdf gets reduced to PLP0000425_0, PLP0000425_1, etc.

• sect 3.2. Should a reduced dataset contain data expressed as a histogram? What is the rationale for doing so?
Monochromatic instruments typically wouldn't use that because of their mode of operation (except if they use an area detector to encode angle of incidence). Energy dispersive instruments do have wavelength histograms, and may or may not use angular histograms.
Every analysis program that I've come across uses a pointwise value for Q and R; any histogram widths get folded into a pointwise resolution value dQ. A given analysis program would have no idea of how to use a histogram width to amend a resolution value. Neither Motofit nor refnx can handle histogrammed data. I don't think refl1d does either.
About the only reason I can think of for retaining bins is to make it easier to display offspecular data.
I would be tempted to change 3.2.1 to "The reflectometry data set contains pointwise values for Q/R/dR/dQ; the use of histogrammed values for Q is strongly discouraged."

• I like the start of the definitions in sect4.1. The local coordinate system will reduce to the macroscopic coordinate system when the sample is flat. My preference would be to use the macroscopic coordinate system. This is because we know the location of instrument components such as direct beam paths, reflected beam paths, detectors. alpha_i and alpha_f are then calculated from the macroscopic location of the instrument (they have to be, that's the purpose of a sample alignment).
If there are any corrections for non-flat surfaces they are then applied on top of the values derived from the macroscopic coordinate system.

• sect4.1 is it necessary to define specular scattering angle, or specular angle of incidence? I'm happy with alpha_i, alpha_f. The specular angles can be worked out from those, and are therefore redundant.

• sect 4.1 we need to define what kind of radiation, e.g. 'probe' = {'neutron', 'X-ray'}

• sect 4.2. I was surprised that this engendered so much debate. We could probably just deal with the difference between the two by using an attribute associated with the data:
e.g. let's call the data 'I' (for intensity), then have a 'normalised' attribute that is either {True, False}. I don't mind if we use 'R' instead of 'I'.
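
A rough sketch of the attribute idea in the last bullet. The class name `ReflectivityData`, the field names, and the reading of "normalised" as "divided by the incident intensity" are all assumptions for illustration, not part of any agreed format:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReflectivityData:
    """Hypothetical container: one intensity column plus a 'normalised' flag."""
    q: List[float]          # momentum transfer values
    intensity: List[float]  # 'I' (or 'R'); its meaning depends on the flag
    normalised: bool        # assumed: True means already divided by incident intensity


def normalise(data: ReflectivityData, incident: float) -> ReflectivityData:
    """Return a normalised copy; data that is already normalised is returned unchanged."""
    if data.normalised:
        return data
    scaled = [value / incident for value in data.intensity]
    return ReflectivityData(q=list(data.q), intensity=scaled, normalised=True)


raw = ReflectivityData(q=[0.01, 0.02], intensity=[500.0, 50.0], normalised=False)
refl = normalise(raw, incident=1000.0)
print(refl.normalised, refl.intensity)
```

With a flag like this, a reader never has to guess from the column name alone which of the two quantities it has been given.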

Andrew Nelson
@andyfaff
This paper might be an interesting resource for those interested in data formats: https://datascience.codata.org/articles/10.5334/dsj-2016-012/
Andrew McCluskey
@arm61
Hey all, I was trying to work on a logo for ORSO. I have made a repo to keep ideas for the logo (if anyone else fancies making one) and eventually all of the logo assets. All comments/design suggestions on mine are welcome. https://github.com/reflectivity/logo/tree/master/arm
Andrew Nelson
@andyfaff
Logo looks nice. Do the stars have too fine detail to reproduce across a range of circumstances?
jochenstahn
@jochenstahn
Dear all, I improved my document based on suggestions by Artur and Andrew: https://drive.switch.ch/index.php/s/4Sa6Q6EfB5xQnrr Please have a look. AND COMMENT ALSO.
Tim Snow
@timsnow
Noted! (McCluskey has been prodding me...)
Brian Benjamin Maranville
@bmaranville

Here is a brief response to the questions raised in your document, Jochen:

Short Term Aims

• What do we want to achieve? I think the establishment of any standard (schema, or just a list of definitions) that facilitates the cross-comparison of datasets from different facilities or instruments will help the situation a great deal
• The most flexible but complete definition will help the most (why not allow Angstroms and nm? As long as they're both well-defined)
• I don't think there's any problem with redundant entries: most users will appreciate being able to plot vs. $Q_z$ even if they need $\alpha_i$ for doing real reduction/fitting.
• some shared vocabulary will immediately benefit - I think we can probably all agree on a notation to indicate whether the resolution for a point is a) a gaussian of width sigma, b) some other function (defined in the header) c) a measured function with provided data points, etc...

Medium Term Aims

• Comprehensive list of definitions: I think this is the core, most important thing. (machine-readable and accessible by URL as well)

• Fixing the content of the data file: also important, and greatly aided by having the definitions in place!
• The data format: This will be tricky. People love ASCII, for good reason, but we push it to the limit.

Long Term Aims

• Providing Python modules to import/export: This is my bias, but I strongly disagree with this goal. If the format is sufficiently self-documented, the value added by providing a bespoke library is much less than the value of a few code examples that show how to quickly get the values you need (in various languages). I think we can do a lot by leveraging existing technologies and libraries to get what we want.

• Publication: this seems like a necessary but tedious thing.
• Keeping discussions going: Yes!!

Ok, not so brief...

1.4 principles
"make it right": I agree with JS's comment that old notations can be difficult to leave behind, but with a proper reference to the standard definitions, we don't have to!

2.3 header: specific information: I'm not sure if there should be a strong distinction between data in the header and data in the "columns"; e.g. an angular resolution that is constant can be in the header but one that changes will be a column. Our schema (and programs that consume these files) should seamlessly be able to refer to data in either place.
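
A minimal sketch of what "seamlessly refer to data in either place" could look like for a consuming program. The function name `get_quantity` and the dictionary layout are hypothetical, chosen only to show the lookup order:

```python
from typing import Dict, List


def get_quantity(name: str,
                 header: Dict[str, float],
                 columns: Dict[str, List[float]],
                 n_points: int) -> List[float]:
    """Return one value per data point, whether `name` is a column or a header constant."""
    if name in columns:
        # Quantity varies from point to point: use the column as it is.
        return list(columns[name])
    if name in header:
        # Quantity is constant: repeat the header value for every point.
        return [header[name]] * n_points
    raise KeyError(f"{name!r} found in neither header nor columns")


header = {"angular_resolution": 0.02}
columns = {"Qz": [0.01, 0.02, 0.03], "R": [1.0, 0.9, 0.1]}
print(get_quantity("angular_resolution", header, columns, n_points=3))
print(get_quantity("R", header, columns, n_points=3))
```

The caller asks for a quantity by name and does not need to know whether the file stored it once or per point.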

2.4.1 data set, columns
I agree mostly with AN that the exact structure of the data file (e.g. column order) is not the most important concern right now. I'd like to think any software package in active development could quickly be modified to allow column order to be specified in the header (making the order less relevant). Is there analysis software out there that is used by the reflectometry community that is not maintained/written by active members of the reflectometry community?
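
To illustrate how little code is needed once the column order is declared in the header, here is a sketch of a reader. The `# columns:` header syntax and the column names are invented for this example and are not a proposal for the actual format:

```python
from typing import Dict, List

# Invented example file: the columns are deliberately not in Q, R, dR, dQ order.
EXAMPLE = """\
# columns: R dR Qz dQz
1.0 0.01 0.010 0.0005
0.8 0.02 0.020 0.0010
"""


def read_columns(text: str) -> Dict[str, List[float]]:
    """Parse whitespace-separated data, naming columns from a '# columns:' header line."""
    names: List[str] = []
    rows: List[List[float]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.startswith("columns:"):
                names = body[len("columns:"):].split()
            continue  # other header lines are ignored in this sketch
        rows.append([float(item) for item in line.split()])
    if not names:
        raise ValueError("no '# columns:' header line found")
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row length does not match the declared columns")
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


data = read_columns(EXAMPLE)
print(data["Qz"], data["R"])
```

Because values are looked up by name, the same reader works whichever order a facility writes its columns in.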

2.4.2 bin labelling
Agreed that histogram widths currently get folded into instrumental resolution, which can be less than optimal, since the resolution function is then an implicit convolution of the two quantities. I think down the road we will want to have these bin widths explicitly specified so that the next version of the analysis program can make use of this information properly. Probably a lower priority item, though, given that most experiments/instruments are designed so that this is not a major issue. But let's define the vocabulary for specifying this!
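
As a sketch of the folding step, assume the instrumental resolution is Gaussian with standard deviation `sigma_instr` and the bin is a uniform distribution of width `bin_width`. A uniform distribution of width w has standard deviation w/sqrt(12), and variances add under convolution. The function name is hypothetical, and this is one common way of folding the two, not necessarily what any given reduction program does:

```python
import math


def folded_dq(sigma_instr: float, bin_width: float) -> float:
    """Standard deviation of a Gaussian (sigma_instr) convolved with a uniform bin."""
    sigma_bin = bin_width / math.sqrt(12.0)  # std. dev. of a uniform distribution
    return math.sqrt(sigma_instr ** 2 + sigma_bin ** 2)  # variances add under convolution


print(folded_dq(sigma_instr=0.001, bin_width=0.002))
```

Only the width survives this step: the convolved shape is no longer Gaussian, and the single number dQ cannot be split back into its two contributions, which is why storing the bin widths explicitly keeps more information.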

3 Format

People seem to love ASCII columns, but there are real advantages in other structured formats. An HDF file can have a folder in it which tells the software what to plot by default, for instance... (the NeXus scheme has this)

As a case study, at NIST we moved from ASCII to HDF for raw data in about 2016 (reflectometry). Part of the reason we didn't have a revolt is that we have an in-house software package that allows quick viewing/plotting of the data from the HDF files, but the plotting (reduction) framework is highly specific to our HDF schema and not generally useful to the reflectometry community.

I think there is a real need for a NeXus data plotter that is as easy to work with as the (text formats) + (gnuplot/Origin/IGOR/Excel/...). I have done some work on a web-based HDF viewer, but it does not fulfill the NeXus intent of plotting the data automatically (partially because our implementation of the NeXus spec in our data writers is not complete).

4 Software Modules

If there was a generally-available library for reading our recommended schema for ASCII files, (resolving all the references to standard definitions in the headers, and identifying all the columns) I think this would help drive adoption of our definitions and schema.

A.1 coordinate systems

JS - when you refer to a non-flat sample, are you talking about macroscopically curved samples, or just local roughness? For local variations I would argue that the average z is the only one that matters, but for a curved sample we might need another coordinate system definition (rather than complicating the default coordinate system, why not define a new one?)

A.2 names, symbols

I would tolerate as many units for a quantity as are in common use. There's no cost in providing definitions that cover all the use cases.

For the ORSO standard names, should we stick with the ASCII character set? Is a name like 'alpha_i' going to rub people the wrong way? If we use the more descriptive labels as the ORSO standard identifier, e.g. "incident_angle", we could provide a list of common aliases or "display labels" which could include extended characters or LaTeX or ...?
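
A small sketch of the identifier/alias/display-label idea. Only "incident_angle" and 'alpha_i' come from the discussion; the identifier "final_angle", the dictionary layout, and the function names are assumptions made for illustration:

```python
from typing import Dict, List

# Hypothetical registry: plain-ASCII identifier -> common aliases.
ALIASES: Dict[str, List[str]] = {
    "incident_angle": ["alpha_i"],
    "final_angle": ["alpha_f"],
}

# Hypothetical display labels, free to use LaTeX or extended characters.
DISPLAY: Dict[str, str] = {
    "incident_angle": r"$\alpha_i$",
    "final_angle": r"$\alpha_f$",
}

# Reverse lookup built once: every identifier and every alias maps to the identifier.
_LOOKUP: Dict[str, str] = {name: name for name in ALIASES}
for name, aliases in ALIASES.items():
    for alias in aliases:
        _LOOKUP[alias] = name


def canonical_name(label: str) -> str:
    """Resolve an identifier or alias to the standard identifier."""
    return _LOOKUP[label]


def display_label(label: str) -> str:
    """Return the display label for an identifier or any of its aliases."""
    return DISPLAY[canonical_name(label)]


print(canonical_name("alpha_i"), display_label("incident_angle"))
```

Files would store only the ASCII identifier, while plotting software picks whichever display label it can render.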

Agreed with AN that we should provide a way of specifying the probe (neutron vs X-ray). Should polarization state of the probe be included in this definition?

C Examples

Reductus: the output text files have a JSON representation (single line, no spaces) of the reduction steps in the header (see for example Pt15nm23552.refl)

Each function is identified as a "module" in that JSON, e.g. "ncnr.refl.mask_points", and the connections between the functions (a directed graph) are indicated by the "wires". Parameters for the functions are identified in the "config" for each module.

The .refl file above can be re-loaded into our reduction server at https://reductus.nist.gov (menu->data->reload exported) and the graph can be inspected and altered. The program reads the first line to extract the reduction protocol (including files to load)
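
A sketch of reading such a first line back into a graph. The JSON below is invented for illustration and is not copied from a real .refl file: only the key names "module", "config" and "wires" and the module name "ncnr.refl.mask_points" come from the description above, while the leading "#", the "modules" list, the "source"/"target" layout of a wire, and the module "example.load" are assumptions:

```python
import json
from typing import Any, Dict, List, Tuple

# Invented first line; the real Reductus header layout may differ.
FIRST_LINE = (
    '# {"modules":[{"module":"example.load","config":{"files":["a.nxs"]}},'
    '{"module":"ncnr.refl.mask_points","config":{"mask_indices":[0,1]}}],'
    '"wires":[{"source":[0,"output"],"target":[1,"data"]}]}'
)


def parse_protocol(first_line: str) -> Dict[str, Any]:
    """Strip an optional leading '#' and decode the JSON reduction protocol."""
    return json.loads(first_line.lstrip("#").strip())


def edges(protocol: Dict[str, Any]) -> List[Tuple[str, str]]:
    """List (source module, target module) name pairs, one per wire."""
    names = [module["module"] for module in protocol["modules"]]
    return [(names[wire["source"][0]], names[wire["target"][0]])
            for wire in protocol["wires"]]


protocol = parse_protocol(FIRST_LINE)
print(edges(protocol))
print(protocol["modules"][1]["config"])
```

Nothing beyond a standard JSON parser is needed to recover the reduction steps, which is the appeal of putting the protocol in the header.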

After the first line, the other headers include the wavelength and wavelength resolution (though not the units or functional form of the resolution function). The functional form (which is Gaussian, where the reported number is 1-sigma) is also missing from the column identifiers for the resolution and R (which we are calling "Intensity").

After the headers it is a pretty standard 4-column text file.

Andrew Nelson
@andyfaff
One aspect that's been around for a while, but that I've only recently realised is important in various new situations (mostly when using low-resolution NR measurements), is the ability to use a non-Gaussian resolution kernel. Here we need to store Q and p(Q), the probability distribution function. This probability distribution is different for each Q point. I think it's important to stuff this information into a data file as well.
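
A sketch of how an analysis program might use a stored kernel for one data point: the smeared value is the integral of p(Q') R(Q') dQ' divided by the integral of p(Q') dQ'. The function names and the trapezoidal integration are choices made for this illustration, not how any particular package does it:

```python
from typing import Callable, Sequence


def trapezoid(x: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal-rule integral of y over x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def smear_point(q_kernel: Sequence[float],
                p_kernel: Sequence[float],
                model: Callable[[float], float]) -> float:
    """Resolution-smeared model value for one data point with its own kernel p(Q')."""
    norm = trapezoid(q_kernel, p_kernel)  # normalise in case p is not unit area
    weighted = [p * model(q) for q, p in zip(q_kernel, p_kernel)]
    return trapezoid(q_kernel, weighted) / norm


# A triangular (non-Gaussian) kernel centred on Q = 1.0, applied to a linear model.
q_kernel = [0.9, 1.0, 1.1]
p_kernel = [0.0, 1.0, 0.0]
print(smear_point(q_kernel, p_kernel, model=lambda q: q))
```

A file would need one `(q_kernel, p_kernel)` pair per data point, which is what makes the storage question non-trivial.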
Andrew Nelson
@andyfaff

I just set up a https://github.com/reflectivity/file_format repository that we can use to store progress. The point made by Brian:

Software Modules. If there was a generally-available library for reading our recommended schema for ASCII files, (resolving all the references to standard definitions in the headers, and identifying all the columns) I think this would help drive adoption of our definitions and schema.

is relevant here. Hopefully with the upcoming meeting and the good start discussing various items in this thread + the document created by Jochen we can start to progress to something a little more concrete. I'd envisage that we could use the repo as a sandbox to start creating code.

Andrew Nelson
@andyfaff
I'm toying around with the idea of writing a prototype HDF file similar to NXcanSAS, but for reflectometry. This might give us a sandbox to play with as we refine the language and definitions that get developed by this group.
What do people think? Would it be preempting things too much?
Brian Benjamin Maranville
@bmaranville
I am writing such files in sandbox mode for our CANDOR instrument right now... so the development is already underway.
It might be nice to compare notes on our early attempts at NeXus refl files - I can put some of our (bad, incomplete) files in that repo.
Andrew Nelson
@andyfaff
Great. The layout of NXcanSAS looks as if it lends itself to an OOP representation (NXorso, NXdata, NXentry classes, etc.). I was going to go down the path of writing something along those lines, but if you've already started it'd be good to collaborate on that sort of thing.
jochenstahn
@jochenstahn
I like the idea to just start (based on some concept of course) to have a basis for discussions. I started this for the ASCII file.