These are chat archives for rOpenGov/eurostat

22 Aug 2017
Markus Kainu
@muuankarski
Aug 22 2017 07:20

Right, we can do that by downloading all the levels, row-binding, and merging, I suppose.

So, this works fine at NUTS2-level:

library(eurostat)
library(dplyr)
# 1. Download the data
sp_data <- get_eurostat("tgs00026", time_format = "raw", stringsAsFactors = FALSE) %>% 
  # filter to year 2014 and the NUTS-2 level (number of characters == 4), e.g. FI02
  dplyr::filter(time == 2014, nchar(as.character(geo)) == 4)

# 2. Download the geodata at NUTS2 level (RAW CODE)
library(sf)
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)

# 3. merge
map <- left_join(nuts2, sp_data, by = c("NUTS_ID" = "geo"))

# 4. draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values", 
                    title = "Disposable household\nincomes in 2010",  
                    palette = "Oranges")
I am reading in a geojson file, not the topojson. The topojson file is marginally smaller, but it contains neither an epsg (SRID) nor a proj4string field when read with st_read()
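If we did want the topojson files anyway, a workaround could be to assign the CRS manually after reading. A minimal sketch, assuming the coordinates really are ETRS89 (EPSG:4258) as the "4258" in the GISCO file names suggests (the local file name here is hypothetical):

library(sf)
# read the topojson; st_read() may come back without CRS info
topo <- sf::st_read("NUTS_RG_60M_2013_4258_LEVL_2.json", stringsAsFactors = FALSE)
# if no epsg/proj4string was found, set the CRS explicitly
if (is.na(sf::st_crs(topo))) {
  topo <- sf::st_set_crs(topo, 4258)
}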
Markus Kainu
@muuankarski
Aug 22 2017 07:25

@jlehtoma can perhaps shed some light on that?

At 1:60 million resolution (the most common for such thematic maps) the file size is ~800 kB, whereas at 1:1 million it is ~5 MB. Implementing a cache similar to the one we currently have would make this pretty smooth.
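Something like this could be a minimal sketch of such a cache; get_gisco_geojson() and the cache directory are made-up names, not part of the package:

library(sf)
# download a GISCO NUTS geojson once, then reuse the local copy
get_gisco_geojson <- function(level = 2, resolution = "60M",
                              cache_dir = file.path(tempdir(), "eurostat_gisco")) {
  if (!dir.exists(cache_dir)) dir.create(cache_dir, recursive = TRUE)
  file_name <- paste0("NUTS_RG_", resolution, "_2013_4258_LEVL_", level, ".geojson")
  local_path <- file.path(cache_dir, file_name)
  if (!file.exists(local_path)) {
    url <- paste0("http://ec.europa.eu/eurostat/cache/GISCO/distribution/",
                  "v1/geojson/nuts-2013/", file_name)
    download.file(url, local_path)
  }
  sf::st_read(local_path, stringsAsFactors = FALSE)
}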

Leo Lahti
@antagomir
Aug 22 2017 07:28
ok sounds feasible
are you thinking we should switch from our own RData files to this?
I ran into an error with eurostat_geodata, so I have not yet checked how long the processing takes, and hence how necessary the ready-made RData files are. Regarding file size, we could ask GISCO to share compressed geojson files if that would help with transfer speed.
Markus Kainu
@muuankarski
Aug 22 2017 07:39
To keep the current behaviour we can just download and rbind all the levels, as in this example:
library(eurostat)
library(dplyr)
# 1. Download the data
sp_data <- get_eurostat("ilc_li01", time_format = "raw", stringsAsFactors = FALSE) %>% 
  # filter to year 2016 and a single household type, currency, and indicator
  dplyr::filter(time == 2016, hhtyp == "A1", currency == "EUR", indic_il == "LI_C_M40")

# 2. Download the geodata at ALL NUTS levels
library(sf)
# NUTS0
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_0.geojson",
              jsontemp)
nuts0 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS1
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_1.geojson",
              jsontemp)
nuts1 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS2
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS3
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_3.geojson",
              jsontemp)
nuts3 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
nuts <- rbind(nuts0,nuts1,nuts2,nuts3)

# 3. merge
map <- inner_join(nuts, sp_data, by = c("NUTS_ID" = "geo"))

# 4. draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values", 
              title = "Poverty thresholds",  
              palette = "Oranges")
Yes, I think we should. A good thing about this is also that the data comes from the same domain (http://ec.europa.eu) as the rest of the eurostat package's data, so it requires no new domain to be whitelisted by IT..
Leo Lahti
@antagomir
Aug 22 2017 10:06
Do you know by heart what the difference in file size is between non-compressed and compressed geojson?
Ok, I will reply to the GISCO guys, I think this is clear. Once they share compressed files, we can (and perhaps should, if we ask for them..) switch to using those.
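If GISCO did end up serving gzip-compressed copies, reading them would only add a decompression step. A sketch, assuming a hypothetical .gz version of the same URL (GISCO does not serve this yet):

library(sf)
gztemp <- tempfile(fileext = ".geojson.gz")
# hypothetical compressed URL
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson.gz",
              gztemp, mode = "wb")
# decompress to a plain geojson file and read as usual
jsontemp <- tempfile(fileext = ".geojson")
writeLines(readLines(gzfile(gztemp)), jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)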
Leo Lahti
@antagomir
Aug 22 2017 10:23
Was it so that the processing of the files can be done on the fly? So we do not need preprocessed RData files because of this?
@muuankarski
Leo Lahti
@antagomir
Aug 22 2017 10:35
OK, at least the above example is fast, so processing time is not a reason for having our own RData files.
In fact, downloading these geojson files is already fast as it is. Do we really need compressed versions?
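For the record, one quick way to check that is to time a single download + read, e.g.:

library(sf)
jsontemp <- tempfile(fileext = ".geojson")
# time the download and parse of one 1:60M level
system.time({
  download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
                jsontemp)
  nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
})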
It also comes to mind that we could keep readily processed versions in the R package data/ folder to avoid downloads entirely..
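In base R terms that would amount to something like the following, run once when building the package (the object and file names are hypothetical):

# store the combined sf object in the package data/ folder so users could
# load it with data(eurostat_geodata) instead of downloading;
# "xz" gives the smallest .rda files
eurostat_geodata <- nuts  # hypothetical dataset name
save(eurostat_geodata, file = "data/eurostat_geodata.rda", compress = "xz")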
Markus Kainu
@muuankarski
Aug 22 2017 10:45
I think we can manage with what they currently provide! No need for compressed files!
Leo Lahti
@antagomir
Aug 22 2017 10:50
Yes I thought so too.
Ok, so I can tell them that this was due to historical reasons and that we are just planning to switch when time allows. I may also mention that we are still considering, at some point, having copies of the most common files in the R package in order to avoid the download.
Markus Kainu
@muuankarski
Aug 22 2017 11:02
That is something worth considering.
Joona Lehtomäki
@jlehtoma
Aug 22 2017 16:10
@muuankarski it could be an issue with GDAL reading TopoJSON, or something funky has been going on in producing the TopoJSON files
No personal experience with the CRSs / TopoJSON tho
But: everything seems to be in order, so carry on :)