@cholmes @m-mohr During development of the Collection Spec, was there ever consideration of a Collection also being consumable as a GeoJSON (e.g. having `geometry`, `feature` and `properties` fields)?
I'm working through a scenario at the moment with a number of stand-alone collections that would greatly benefit from having their footprint/extent represented in a simple and consumable manner. The data is mainly linear captures with film / digital aerial photography, where a bounding box is a very poor representation of project extent. In a large number of instances we also don't have any mosaic assets that show the project's full extent. While we've tried to represent the data in the summaries object under `proj:geometry`, the resulting file is certainly not as usable to rapidly query or visualise as a GeoJSON.
While I can't find any mention in the STAC spec and OGC API - Features spec, I notice that in PySTAC there is support for a Properties object in the collections class. Is this from earlier iterations of the spec, or for collection level properties to be represented?
As always, your support and insights are greatly appreciated. I know our use cases can be a bit left of field sometimes.
@JohnBTasker Yes, but we had the desire to align with OGC API - Features and they don't use GeoJSON for Collections.
If you use data and collections in a sentence, there's likely something wrong. I think what you want is an Item, not a standalone collection. The addition of collection assets was not meant to replace single items, but to allow linking to assets that are common across all items. Also, your standalone collections would not be searchable via STAC API /search. So I'd recommend making an item, with a corresponding collection (optional, but I'd recommend it).
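As a rough illustration of the Item route suggested above, here is a minimal sketch of a project footprint expressed as a STAC Item, which is also a GeoJSON Feature. The ids, coordinates, datetime and link href are all hypothetical placeholders, not taken from any real dataset, and only the Python standard library is used.

```python
import json


def bbox_of(ring):
    """Return [min_x, min_y, max_x, max_y] for a list of (x, y) positions."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return [min(xs), min(ys), max(xs), max(ys)]


# Hypothetical footprint of a narrow, diagonal aerial survey strip (lon, lat).
# The ring is closed: the first and last positions are identical.
ring = [
    (147.00, -42.00), (147.50, -41.50), (148.00, -41.00),
    (147.98, -40.98), (147.48, -41.48), (146.98, -41.98),
    (147.00, -42.00),
]

project_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-project-footprint",   # hypothetical id
    "collection": "example-project",     # hypothetical collection id
    "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
    "bbox": bbox_of(ring),
    "properties": {"datetime": "2020-01-01T00:00:00Z"},  # placeholder date
    "links": [{"rel": "collection", "href": "./collection.json"}],
    "assets": {},
}

print(json.dumps(project_item, indent=2))
```

Because the result is a plain GeoJSON Feature, it can be opened directly in most GIS tools, and an Item like this would be discoverable through /search in a way a standalone collection would not.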
Properties in Collections is indeed from an earlier iteration of STAC. That will likely go away in implementations in the future. Collection level properties go in summaries (for standalone collections) or top-level, if the extension supports it.
@cholmes The general data model that we have established has four object types:
Only frames are directly related to data files, the remaining objects are semantic groupings of common search/discovery attributes. The main interest for proposing improved geometry details in collections is that more often than not for aerial datasets you will initially search for a project rather than individual data items, particularly over large areas. Once you've narrowed down the project(s), finding the specific data item / format is then the focus. A good example of this type of logic is the LINZ Data Service (https://data.linz.govt.nz/) where data is organised based on projects, with individual tiles subsequently downloadable.
`geometry` of a STAC Item should be resolved as due to small turbulences the sensor is constantly shaking and thus the footprint has a wobbly shape. Using only the bounding box would likely cover way too much of an area, but providing all the fine details would probably be far too much of nonsensical metadata.
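To put a number on how much a bounding box can overstate a linear capture, here is a small sketch comparing the area of a footprint with the area of its bounding box. The strip coordinates are made up and treated as planar units, so this is only an illustration of the effect, not a geodesic calculation.

```python
def ring_area(ring):
    """Absolute planar area of a closed ring of (x, y) via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bbox_area(ring):
    """Area of the axis-aligned bounding box around the ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical diagonal strip (closed ring), in arbitrary planar units.
strip = [(0, 0), (10, 10), (9, 11), (-1, 1), (0, 0)]

footprint = ring_area(strip)   # 20.0
box = bbox_area(strip)         # 121
print(f"footprint covers {footprint / box:.1%} of its bounding box")
```

For this made-up strip the footprint fills only about a sixth of the box, so a bbox-only search would match mostly empty space.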
`layer` extension in `stac4s` currently (https://github.com/azavea/stac4s/tree/master/docs) and as we explore it / show uses we'll PR it back to the main STAC spec repo eventually
`scale` parameter was me trying out generating the nodata mask from overviews instead of the full data file, as I thought it would speed it up (a scale factor of 2 effectively would be using the half size overview if there was one), but I found the results to not be great.
Thanks @lossyrob and @matthewhanson for the advice; that seems to be a sensible choice. Most probably I'll go the route of rasterization -> vectorization -> simplification on my datasets as well.
So for the balancing act, I see that it is neither desirable to make the boundaries too large (users get too many unusable results) nor to make them too small (users get too few results). And generally, having as few points as possible is good. If I do simplification, I'll have to define a tolerance in terms of distance, a maximum number of points that may be returned, or both. Are there any established good numbers for that? I could imagine rules like (but maybe there are more options):
Each of those may have its problems, and maybe there is no good general advice. But as a good choice depends not only on the dataset creator's capabilities but also on the user of the dataset, I have the feeling that a general guideline could be helpful.
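To make the two kinds of rule concrete (a distance tolerance and a cap on the number of points), here is a minimal sketch using a plain Douglas-Peucker simplification from the standard library only. It is not what any of the tools mentioned in this thread do internally. The function names, the `growth` factor and the sample coordinates are my own assumptions, and it works on an open line (e.g. a flight line); a closed polygon ring would need to be split into open parts first.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop points closer than `tolerance` to the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # the split point appears in both halves


def simplify_to_max_points(points, tolerance, max_points, growth=2.0):
    """Grow the tolerance (hypothetical strategy) until at most max_points remain."""
    if tolerance <= 0 or max_points < 2 or growth <= 1:
        raise ValueError("need tolerance > 0, max_points >= 2 and growth > 1")
    result = simplify(points, tolerance)
    while len(result) > max_points:
        tolerance *= growth
        result = simplify(points, tolerance)
    return result, tolerance


# Hypothetical wobbly line: small jitter along y = 0, then a real turn at (4, 0).
wobbly = [(0, 0), (1, 0.05), (2, -0.05), (3, 0.05), (4, 0), (5, 2), (6, 4)]

print(simplify(wobbly, 0.1))  # [(0, 0), (4, 0), (6, 4)]
capped, used_tolerance = simplify_to_max_points(wobbly, 0.01, max_points=3)
print(len(capped), used_tolerance)
```

With a tolerance larger than the jitter the wobble disappears while the real turn is kept. Combining both rules means the tolerance actually used varies per dataset, so it may be worth recording alongside the geometry.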