`geometry` of a STAC Item should be resolved, as due to small turbulences the sensor is constantly shaking and thus the footprint has a wobbly shape. Using only the bounding box would likely cover far too much area, but providing all the fine details would probably add far too much nonsensical metadata.
`layer` extension in stac4s currently (https://github.com/azavea/stac4s/tree/master/docs), and as we explore it and show uses, we'll PR it back to the main STAC spec repo eventually.
`scale` parameter was me trying out generating the nodata mask from overviews instead of the full data file, as I thought it would speed things up (a scale factor of 2 would effectively use the half-size overview, if there was one), but I found the results to not be great.
Thanks @lossyrob and @matthewhanson for the advice; that seems to be a sensible choice. Most probably I'll go the rasterization -> vectorization -> simplification route on my datasets as well.
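To make that route concrete, here is a minimal, dependency-free sketch of the vectorization and a first, lossless simplification step. It is only an illustration under stated assumptions, not what any of the tools mentioned in this thread actually do: the valid-data mask is a plain list of rows, the valid area is a single 4-connected blob without holes or diagonal pinch points, and the function names `mask_to_ring` and `drop_collinear` are made up for this example. Coordinates are pixel corners, so a real pipeline would still have to apply the raster's geotransform.

```python
def mask_to_ring(mask):
    """Trace the outline of the valid (truthy) cells as a ring of pixel corners.

    Assumes one 4-connected blob with no holes and no diagonal pinch points.
    The returned ring is not explicitly closed (first point is not repeated).
    """
    edges = set()
    for r, row in enumerate(mask):
        for c, valid in enumerate(row):
            if not valid:
                continue
            corners = [(c, r), (c + 1, r), (c + 1, r + 1), (c, r + 1)]
            for i in range(4):
                edge = (corners[i], corners[(i + 1) % 4])
                reverse = (edge[1], edge[0])
                if reverse in edges:
                    # Edge shared by two valid cells: interior, so drop it.
                    edges.remove(reverse)
                else:
                    edges.add(edge)
    if not edges:
        return []
    next_vertex = dict(edges)  # start corner -> end corner of each boundary edge
    start = min(next_vertex)
    ring = [start]
    current = next_vertex[start]
    while current != start:
        ring.append(current)
        current = next_vertex[current]
    return ring


def drop_collinear(ring):
    """Remove vertices that lie on a straight line between their neighbours."""
    kept = []
    n = len(ring)
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = ring[i - 1], ring[i], ring[(i + 1) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross != 0:
            kept.append(ring[i])
    return kept


# Hypothetical 3x4 valid-data mask (1 = data, 0 = nodata), shaped like a plus.
mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
ring = mask_to_ring(mask)
footprint = drop_collinear(ring)
```

Dropping collinear vertices does not change the shape at all; the lossy, tolerance-based simplification is a separate step applied to the resulting ring.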
So for the balancing act, I see that it is desirable neither to make the boundaries too large (users get too many unusable results) nor to make them too small (users get too few results). And generally, having as few points as possible is good. If I do simplification, I'll have to define either a tolerance in terms of distance, or a maximum number of points that may be returned, or both. Are there any established good numbers for that? I could imagine rules like (but maybe there are more options):
Each of those may have its problems, and maybe there is no good general advice. But as a good choice depends not only on the dataset creator's capabilities but also on the users of the dataset, I have the feeling that a general guideline could be helpful.
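To illustrate how a distance tolerance and a maximum point count could be combined, here is a small sketch using the Douglas-Peucker algorithm in plain Python. This is my own illustration, not an established guideline: the function names, the growth factor and all numbers in the example are arbitrary assumptions, and it works on an open polyline (for a closed footprint ring, one would have to split the ring or use a geometry library's simplify routine instead).

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_line(points, tolerance):
    """Douglas-Peucker: drop points closer than `tolerance` to the simplified line."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify_line(points[: index + 1], tolerance)
    right = simplify_line(points[index:], tolerance)
    return left[:-1] + right  # the split point appears in both halves


def simplify_to_max_points(points, max_points, tolerance, growth=2.0):
    """Start at `tolerance` and grow it until at most `max_points` remain."""
    if tolerance <= 0 or growth <= 1 or max_points < 2:
        raise ValueError("need tolerance > 0, growth > 1 and max_points >= 2")
    simplified = simplify_line(points, tolerance)
    while len(simplified) > max_points:
        tolerance *= growth
        simplified = simplify_line(points, tolerance)
    return simplified, tolerance


# Hypothetical wobbly footprint edge: y jitters by +/- 0.05 around a straight line.
wobbly = [(x, 0.05 * (-1) ** x) for x in range(11)]
smooth = simplify_line(wobbly, 0.2)
capped, used_tolerance = simplify_to_max_points(wobbly, 4, 0.01)
```

The second function makes the trade-off explicit: the point cap is always honoured, but the tolerance that was actually needed is returned as well, so the dataset creator can see how far the simplified boundary may deviate from the original one.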