These are chat archives for frictionlessdata/chat

3rd
Aug 2017
georgiana-b
@georgiana-b
Aug 03 2017 15:01
Hello! I'm having a hard time understanding the specs on profile in DataResource.
From this fragment: "The namespace for the profile is the type of descriptor, so, 'default' for a Package descriptor is not the same as 'default' for a Resource descriptor."
It sounds like we should allow a resource in a tabular Package to have a default profile, because they are independent. Is this correct?
roll
@roll
Aug 03 2017 15:06
This wording is out of date: the default Package profile is data-package and the default Resource profile is data-resource
georgiana-b
@georgiana-b
Aug 03 2017 15:08
@roll Even the specs say the default profile should be called default: https://specs.frictionlessdata.io/profiles/
This might also be a problem with our implementations but my question is why a Resource even has a profile in the first place.
You mean why a resource has to have a profile property?
georgiana-b
@georgiana-b
Aug 03 2017 15:20
Yes.
I don't understand why Resource needs a separate profile from the Package profile.
roll
@roll
Aug 03 2017 15:23
To distinguish tabular and non-tabular resources. Consider a non-tabular data package that has both tabular and non-tabular resources. The publisher can mark the tabular resources with profile=tabular-data-resource to ensure that implementations provide resource.table (resource.tabular == True), validate the Table Schema, etc.
There is one namespace for profile ids: data-package, tabular-data-package, data-resource, tabular-data-resource, table-schema etc (partially listed in https://specs.frictionlessdata.io/schemas/registry.json)
The tabular-data-package profile requires all resources to be tabular-data-resource, and the data-package profile allows resources to have any profile (in the example above, data-resource and tabular-data-resource)
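A minimal sketch of the mixed case described above, written as a plain Python dict. The resource names, paths, and the `is_tabular` helper are made up for illustration; only the profile ids come from the discussion.

```python
# Hypothetical descriptor: a non-tabular package holding both kinds of resource.
package = {
    "profile": "data-package",
    "resources": [
        {"name": "readme", "path": "README.md", "profile": "data-resource"},
        {
            "name": "prices",
            "path": "prices.csv",
            "profile": "tabular-data-resource",
            "schema": {"fields": [{"name": "price", "type": "number"}]},
        },
    ],
}


def is_tabular(resource):
    """True when the publisher explicitly marked the resource as tabular.

    Assumes an omitted profile falls back to "data-resource", which is the
    default for a resource inside a plain data-package.
    """
    return resource.get("profile", "data-resource") == "tabular-data-resource"


tabular_names = [r["name"] for r in package["resources"] if is_tabular(r)]
print(tabular_names)  # ['prices']
```

An implementation could use a check like this to decide which resources get table access and Table Schema validation.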
georgiana-b
@georgiana-b
Aug 03 2017 15:36
@roll A tabular-data-package doesn't logically allow a default resource because the tabular-data-resource profile is "built into" the tabular-data-package profile, so the whole package would fail validation if it contained a default resource.
However, that looks more like a bug than intended behaviour. If a tabular-data-package really wants only tabular Resources then it should have an enum constraint on the profile property of its resources, right?
roll
@roll
Aug 03 2017 15:42

However, that looks more like a bug than intended behaviour. If a tabular-data-package really wants only tabular Resources then it should have an enum constraint on the profile property of its resources, right?

Yes, for tabular-data-package all resource profiles must be one of:

  • omitted
  • tabular-data-resource
And as I said, profile=default is not a correct value. That is spec wording that needs fixing.
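The rule above can be sketched as a small check. This is an illustration only, not how any Frictionless Data library implements it; the `offending_resources` helper and the sample descriptor are hypothetical.

```python
# Inside a tabular-data-package, a resource profile must be omitted (None here)
# or equal to "tabular-data-resource".
ALLOWED_RESOURCE_PROFILES = (None, "tabular-data-resource")


def offending_resources(package):
    """Return names of resources a tabular-data-package would reject by profile."""
    if package.get("profile") != "tabular-data-package":
        return []  # a plain data-package accepts any resource profile
    return [
        resource.get("name")
        for resource in package.get("resources", [])
        if resource.get("profile") not in ALLOWED_RESOURCE_PROFILES
    ]


# Hypothetical descriptor used only to exercise the check.
package = {
    "profile": "tabular-data-package",
    "resources": [
        {"name": "a", "path": "a.csv"},  # profile omitted: allowed
        {"name": "b", "path": "b.csv", "profile": "tabular-data-resource"},
        {"name": "c", "path": "c.csv", "profile": "default"},  # not a valid value
    ],
}
print(offending_resources(package))  # ['c']
```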
georgiana-b
@georgiana-b
Aug 03 2017 15:53
@roll But that's not how the tabular-data-package profile looks now. This touches on the broader issue: package profiles describe resource profiles instead of referencing existing resource profiles.
This leads to several issues, for example:
  • the tabular-data-package profile doesn't contain a link to the tabular-data-resource profile, so people might not even know it exists
  • the resources section in tabular-data-package overwrites the tabular-data-resource profile, which makes the tabular-data-resource profile useless
roll
@roll
Aug 03 2017 16:02
@georgiana-b do you mean that the jsonschema for tabular-data-package includes the jsonschema for tabular-data-resource? It's deliberately dereferenced to simplify validation in implementations.
georgiana-b
@georgiana-b
Aug 03 2017 16:06
@roll Ok, but that makes it confusing. I think tabular-data-resource and data-resource and fiscal-data-resource should be present in the registry so that people can use them on their own. For example if I wanted to make one of the resources in my normal datapackage a tabular resource, I wouldn't know what profile I should use.
roll
@roll
Aug 03 2017 16:07
@georgiana-b yes of course - I'll mention the registry incompleteness in the issue
There's a long story behind this package/resource separation and the profiles registry, so things just need to be polished presentation-wise. That is already underway; the last tweaks are coming.
georgiana-b
@georgiana-b
Aug 03 2017 16:18
@roll The separation also has several implications for the code:
  • a Resource profile can either be a registry id, a URL or a Hash (from the resources section of the package profile)
  • when a package is validated against its profile, we should validate without resources and then validate each resource against its own profile
Btw, if you haven't started writing the issue, I'd prefer to write it instead because I already have all the rants ready :smile:
roll
@roll
Aug 03 2017 16:26

a Resource profile can either be a registry id, a URL or a Hash (from the resources section of the package profile)

Hash from the resources section of the package profile?

when a package is validated against its profile, we should validate without resources and then validate each resource against its own profile

We should validate the package with its resources against the package profile, and validate each resource against its own profile. That's possible because e.g. a valid tabular-data-resource is also a valid data-resource, etc.

@georgiana-b about the issue - it would be great if you wrote it. Thanks
georgiana-b
@georgiana-b
Aug 03 2017 16:30
@roll Sorry, by Hash I mean dict. Since anybody can include the resource profile i.e. json-schema in the package profile, it means a resource profile can also be a dict. For example now we have the tabular-data-resource profile included in the resources property of the tabular-data-package profile.
roll
@roll
Aug 03 2017 16:43
descriptor.profile can only be a string (id or URL): https://specs.frictionlessdata.io/schemas/data-package.json
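A rough sketch of what "only a string (id or URL)" could mean in code. The `classify_profile` helper is hypothetical, and treating anything with an http/https scheme as a URL is a simplifying assumption, not something the specs state.

```python
from urllib.parse import urlparse


def classify_profile(profile):
    """Classify a profile value as a registry "id" or a "url".

    Anything that is not a string (for example a dict holding an inline
    JSON Schema) is rejected.
    """
    if not isinstance(profile, str):
        raise TypeError("profile must be a string (registry id or URL)")
    # Assumption: only http(s) values count as URLs; everything else is an id.
    scheme = urlparse(profile).scheme
    return "url" if scheme in ("http", "https") else "id"


print(classify_profile("tabular-data-resource"))
print(classify_profile("https://specs.frictionlessdata.io/schemas/data-package.json"))
```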
georgiana-b
@georgiana-b
Aug 03 2017 16:47
@roll Yes, but if the package profile contains a json-schema for a resource in its resources property, then the resource will be validated against 2 json-schemas: one from the package profile resources and one from the resource profile.
This can lead to conflicts and confusion and I think we should avoid it programmatically.
roll
@roll
Aug 03 2017 17:06
@georgiana-b Yes, but it's OK. It's the same as checking interface compatibility in programming: assert isinstance(resource, Resource) and then assert isinstance(resource, TabularResource). That's redundant but costs nothing and can't fail, because if e.g. a resource is a valid tabular-data-resource then it is also a valid data-resource. It's a contract.
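The analogy can be spelled out as a tiny class hierarchy. The `Resource` and `TabularResource` classes here are stand-ins written for this sketch, not the classes of any particular library.

```python
class Resource:
    """Stand-in for anything that satisfies the data-resource profile."""

    def __init__(self, descriptor):
        self.descriptor = descriptor


class TabularResource(Resource):
    """Stand-in for the tabular-data-resource profile.

    Subclassing mirrors the contract: every tabular resource is also a resource.
    """


# Hypothetical descriptor, for illustration only.
resource = TabularResource({"name": "prices", "profile": "tabular-data-resource"})

# The general check is redundant once the specific one holds, but it is harmless.
checks = [isinstance(resource, Resource), isinstance(resource, TabularResource)]
assert all(checks)
```

The reverse does not hold: a plain `Resource` instance fails the `TabularResource` check, just as a generic data-resource need not be a valid tabular-data-resource.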