These are chat archives for rOpenGov/eurostat

21st
Aug 2017
Leo Lahti
@antagomir
Aug 21 2017 12:35
We got some feedback by email, I think from Eurostat data people, but I'm not sure: "Why you not used directly the shapefiles and relied now on RDATA on your github account ? Was ESTAT infrastructure not efficient enough? Do you need another format ? We also provide GeoJSON/TopoJSON and PBF format at http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/. "
Starting to think about this... I do not remember why we ended up hosting our own copy in RData format. Do you remember, @muuankarski
or @jlehtoma?
Markus Kainu
@muuankarski
Aug 21 2017 12:53
@jlehtoma could GeoJSON/TopoJSON as input, with sf conversion on the fly, be somehow better? Again, we have only one shapefile with 5 different resolutions that changes every few years.
Leo Lahti
@antagomir
Aug 21 2017 12:54
if there is no remarkable difference in size and speed, then fetching the data straight from Eurostat would be preferable to keeping our own secondary copies
it even seems that they might be ready to add other output formats if we ask, though I'm not sure
Leo Lahti
@antagomir
Aug 21 2017 14:58
By the way, they also reported this (I'll fix it right away, but FYI): Small typo on page 388 for the journal article,
Instead of Merge Eurostat data with geodata from Cisco
should be Merge Eurostat data with geodata from Gisco
Plus in source code get_eurostat_geospatial.R
Instead of @title Download Geospatial Data from GISCO
It should be @title Download Geospatial Data from GISCO
Markus Kainu
@muuankarski
Aug 21 2017 15:11
OK, good fix.
Leo Lahti
@antagomir
Aug 21 2017 15:12
It came from a representative of the GISCO team, so they have at least taken note of the package and thanked us. Good stuff.
Markus Kainu
@muuankarski
Aug 21 2017 15:21
As for data formats, considering the current speed of development within sf and R geospatial in general, I am a bit doubtful whether they can provide such files currently. In the long term, definitely. We should ask!
Leo Lahti
@antagomir
Aug 21 2017 15:28
hmm. I could ask if they can provide the RData format as well. In fact, I do not see shapefiles on their website, only the GeoJSON files etc.
(unless these are now considered equivalent)
Markus Kainu
@muuankarski
Aug 21 2017 15:50
r-spatial/sf#185
will check later today
Leo Lahti
@antagomir
Aug 21 2017 15:53
+5
Joona Lehtomäki
@jlehtoma
Aug 21 2017 16:20
The only reason I can think of for having RData files is if some sort of pre-processing is done on an R object read from a spatial file (e.g. a shapefile), OR if size is an issue.
As for shapefile vs GeoJSON, again the only reason for using the former is probably size (GeoJSON is basically an uncompressed text file). More generally, shapefiles should be avoided, but unfortunately that is still very hard.
Joona Lehtomäki
@jlehtoma
Aug 21 2017 16:25
I think sf (or rather GDAL) can process GeoJSON quite well out of the box. GDAL also supports reading (but I think not writing) TopoJSON, which can encode data much more efficiently than GeoJSON (i.e. smaller file size).
Leo Lahti
@antagomir
Aug 21 2017 16:37
So, to reply to the GISCO people, I would say that we are moving away from shapefiles, and GeoJSON (which they already provide) is otherwise a good option, but its larger size and the immature (R) tools form a sort of bottleneck which we (and others) are working on. Therefore we have preprocessed and compressed the data into RData files. We could provide the code and propose that they share RData files as well, in which case we can switch to those. Would you agree with such a reply?
Markus Kainu
@muuankarski
Aug 21 2017 16:39
I will test once I'm home! In an hour.
Leo Lahti
@antagomir
Aug 21 2017 16:39
perfect!!
Joona Lehtomäki
@jlehtoma
Aug 21 2017 16:47
If we do some pre-processing, then using RData seems reasonable. If not, I wouldn't bother. Also, I think it makes fairly little sense for them to provide RData files, unless they want to facilitate R access without relying on specific packages (such as rgdal/sf) for reading.
Personally, I think it makes sense to provide the data in commonly accessible spatial data formats (e.g. GeoJSON), since reading the data is a fairly small overhead for us (unless we want to get rid of the spatial dependencies).
Leo Lahti
@antagomir
Aug 21 2017 16:58
yep. Perhaps the main question is indeed whether they want to support R so that we do not need to host copies of the data on GitHub.
Markus Kainu
@muuankarski
Aug 21 2017 18:55

Just a quick look here:
http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/ref-nuts-2013.html makes it clear that we would need to rethink a few things in the package.

The main thing is that the geofile we have used contains all the different NUTS levels, and with an inner_join you have been able to subset the geodata to the same levels as your Eurostat attribute data. Here, each level is in its own file, which will require the user to specify the NUTS level explicitly. It would certainly be clearer to have them separate, but I kind of like the current behaviour, where a single geodata file always matches your primary Eurostat data.

With get_eurostat you don't need to specify the NUTS level, and the available levels vary between datasets. However, an experienced user should be aware of this and be able to download the right geodata, I think.
I will try reading the files properly tomorrow
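A rough sketch of the current single-file behaviour, written in Python for illustration rather than as the package's R code: because the geodata table holds every NUTS level, an inner join on the region code keeps only the regions (and therefore the levels) that appear in the Eurostat attribute data. The table contents, the placeholder geometry strings and the function name are hypothetical.

```python
# Hypothetical geodata table covering ALL NUTS levels, keyed by NUTS code.
# Real geometries are replaced by placeholder strings.
geodata = {
    "FI": "geom_FI",        # NUTS 0 (country)
    "FI1": "geom_FI1",      # NUTS 1
    "FI19": "geom_FI19",    # NUTS 2
    "FI197": "geom_FI197",  # NUTS 3
}


def inner_join(attributes, geodata):
    """Keep only codes present in both tables, pairing value with geometry.

    Mirrors the idea of dplyr::inner_join on the geo code: regions (and
    levels) missing from the attribute data are dropped from the result.
    """
    return {
        code: (value, geodata[code])
        for code, value in attributes.items()
        if code in geodata
    }


# Attribute data from a (hypothetical) NUTS 2 dataset: only level 2 survives.
print(inner_join({"FI19": 10.0}, geodata))
```

The user never states a level here; the join itself does the subsetting, which is the behaviour that separate per-level files would lose.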
Leo Lahti
@antagomir
Aug 21 2017 20:27
Identification of the level can be automated, so I think we could still maintain the current default behaviour from the user's perspective, while also adding an option to handle the levels separately.
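One way the level identification could be automated, sketched in Python and assuming the geo column holds only NUTS codes. A NUTS code is the two-letter country code plus one character per level, so the level equals the code length minus two. Aggregates such as EU28 are also four characters long and would need to be filtered out first. The function names are hypothetical.

```python
def nuts_level(code):
    """Return the NUTS level of a code: 2 chars -> 0, 3 -> 1, 4 -> 2, 5 -> 3."""
    level = len(code) - 2
    if not 0 <= level <= 3:
        raise ValueError(f"not a NUTS code: {code!r}")
    return level


def levels_needed(codes):
    """Set of NUTS levels present in the attribute data's geo column."""
    return {nuts_level(code) for code in codes}


# With this, only the per-level geodata files actually needed would be fetched.
print(levels_needed(["FI1", "FI19", "FI197"]))
```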