Markus Kainu
@muuankarski
I will try reading the files properly tomorrow
Leo Lahti
@antagomir
Identification of the level can be automated so I think we could still maintain the current default behavior from user perspective, and at the same time add the option for separate treatments.
Markus Kainu
@muuankarski

Right, we can do that by downloading all the levels, row-binding and merging, I suppose.

So, this works fine at the NUTS2 level:

library(eurostat)
library(sf)
library(dplyr)
# 1. Download the data
sp_data <- get_eurostat("tgs00026", time_format = "raw", stringsAsFactors = FALSE) %>% 
  # filter to year 2014 and the NUTS-2 level (geo code is 4 characters, e.g. FI02)
  dplyr::filter(time == 2014, nchar(as.character(geo)) == 4)

# 2. Download the geodata at NUTS2 level (rough draft code)
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)

# 3. Join
map <- left_join(nuts2, sp_data, by = c("NUTS_ID" = "geo"))

# 4. Draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values", 
              title = "Disposable household\nincomes in 2010",  
              palette = "Oranges")
I am reading in a geojson file, not the topojson. The topojson file size is marginally smaller, but it contains neither an epsg (SRID) nor a proj4string field when read with st_read()
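For reference, the missing-CRS symptom can be reproduced without the files. This is a minimal sketch (the in-memory point is made up; EPSG:4258 / ETRS89 is the code appearing in the GISCO file names):

```r
library(sf)

# A bare geometry starts with a missing CRS, just like the topojson read;
# EPSG:4258 can then be assigned explicitly with st_set_crs().
pt <- sf::st_sfc(sf::st_point(c(24.94, 60.17)))
is.na(sf::st_crs(pt))            # TRUE: no epsg / proj4string field
pt <- sf::st_set_crs(pt, 4258)
sf::st_crs(pt)$epsg              # 4258
```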
Markus Kainu
@muuankarski

@jlehtoma can perhaps shed some light on that?

At 1:60 million resolution (most common for such thematic maps) the file size is ~800kb, whereas 1:1 million is ~5Mb. Implementing a cache similar to the one we currently have would make this pretty smooth

Leo Lahti
@antagomir
ok sounds feasible
are you thinking we should switch from our own RData files to this?
I ran into an error with eurostat_geodata, so I did not check yet how long the processing takes and thus how necessary the ready-made RData files are. Regarding file size, we could ask GISCO to share compressed geojson files if that would help with transfer speed
Markus Kainu
@muuankarski
To keep the current behaviour we can just download and rbind all the levels, as in this example
library(eurostat)
library(sf)
library(dplyr)
# 1. Download the data
sp_data <- get_eurostat("ilc_li01", time_format = "raw", stringsAsFactors = FALSE) %>% 
  # filter to year 2016 and the chosen household type, currency and indicator
  dplyr::filter(time == 2016, hhtyp == "A1", currency == "EUR", indic_il == "LI_C_M40")

# 2. Download the geodata at ALL NUTS levels
# NUTS0
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_0.geojson",
              jsontemp)
nuts0 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS1
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_1.geojson",
              jsontemp)
nuts1 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS2
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS3
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_3.geojson",
              jsontemp)
nuts3 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
nuts <- rbind(nuts0, nuts1, nuts2, nuts3)

# 3. Join
map <- inner_join(nuts, sp_data, by = c("NUTS_ID" = "geo"))

# 4. Draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values", 
              title = "Poverty thresholds",  
              palette = "Oranges")
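As an aside, the four near-identical download blocks above could be collapsed into a loop. A sketch, where `gisco_url()` and `get_nuts_all()` are hypothetical helper names and the base URL is the same one used above:

```r
library(sf)

# gisco_url() builds the per-level GISCO file URL used above;
# get_nuts_all() downloads every NUTS level and row-binds the results.
gisco_url <- function(level, resolution = "60") {
  paste0("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/",
         "nuts-2013/NUTS_RG_", resolution, "M_2013_4258_LEVL_", level, ".geojson")
}

get_nuts_all <- function(resolution = "60") {
  do.call(rbind, lapply(0:3, function(lvl) {
    jsontemp <- tempfile(fileext = ".geojson")
    download.file(gisco_url(lvl, resolution), jsontemp)
    sf::st_read(jsontemp, stringsAsFactors = FALSE)
  }))
}

# nuts <- get_nuts_all()   # equivalent to rbind(nuts0, nuts1, nuts2, nuts3)
```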
Yes, I think we should. A good thing about this is also that the data comes from the same domain (http://ec.europa.eu) as the Eurostat package uses, so it requires no new domain to be whitelisted by IT..
Leo Lahti
@antagomir
Do you know by heart what the difference in file size is, geojson non-compressed vs. compressed?
Ok, I will reply to the GISCO guys, I think this is clear. Once they share compressed files, we can (and perhaps should, if we ask..) switch to using those.
Leo Lahti
@antagomir
Was it so that processing of the files can be done on the fly? So we do not need preprocessed RData files because of this?
@muuankarski
Leo Lahti
@antagomir
OK, at least the above example is fast, so processing time is not a reason for keeping our own RData files.
In fact, downloading these geojson files is already fast. Do we really need compressed versions?
It also comes to mind that we could ship readily processed versions in the R package data/ folder to avoid downloads entirely..
Markus Kainu
@muuankarski
I think we can manage with what they currently provide! No need for compressed files!
Leo Lahti
@antagomir
Yes I thought so too.
Ok, so I can tell them that this was due to historical reasons & we are just planning to switch when time allows. I may also mention that we still consider, at some point, having copies of the most common files in the R package in order to avoid downloads.
Markus Kainu
@muuankarski
That is something worth considering
Joona Lehtomäki
@jlehtoma
@muuankarski it could be an issue with GDAL reading TopoJSON, or then something funky has been going on in producing the TopoJSON files
No personal experience on the CRSs / TopoJSON tho
But: everything seems to be in order, so carry on :)
Markus Kainu
@muuankarski

One issue still prevails: in the current implementation of get_eurostat_geospatial the user can opt for SpatialPolygonsDataFrame, fortified data.frame, or sf output. We could provide those conversions on the fly if we rely on the json files from Eurostat (now they come preprocessed using download.file()). Providing all three would require the following on-the-fly steps.

# =======================================================
# If the user passes output_class = "sf" OR does not specify it (default behaviour)
## Download and return an sf object
# =======================================================
library(sf)
library(dplyr)
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_0.geojson",
              jsontemp)
shape <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
return(shape)

# =======================================================
# If the user passes output_class = "sp", this is done in addition to the default behaviour
## Convert the sf object into an sp object (SpatialPolygonsDataFrame)
# =======================================================
shape_sp <- as(shape, "Spatial")
return(shape_sp)

# =======================================================
# If the user passes output_class = "data.frame", this is done in addition to the steps above
## Convert the SpatialPolygonsDataFrame into a "fortified" regular data.frame to be plotted with ggplot2::geom_polygon
# =======================================================
shape_sp$id <- row.names(shape_sp)
fortified <- broom::tidy(shape_sp)
fortified <- left_join(fortified, shape_sp@data, by = "id")
return(fortified)

@jlehtoma what do you think, is it feasible to do that on the fly, OR should we provide just sf output and nothing else..?
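The three branches could also be wrapped into one dispatcher. A sketch under the same assumptions (`read_gisco()` is a hypothetical helper name; the sp and broom conversions mirror the steps listed above):

```r
library(sf)
library(dplyr)

# Download once, then convert according to output_class.
read_gisco <- function(url, output_class = c("sf", "sp", "data.frame")) {
  output_class <- match.arg(output_class)
  jsontemp <- tempfile(fileext = ".geojson")
  download.file(url, jsontemp)
  shape <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
  if (output_class == "sf") return(shape)
  shape_sp <- as(shape, "Spatial")            # requires the sp package
  if (output_class == "sp") return(shape_sp)
  shape_sp$id <- row.names(shape_sp)
  fortified <- broom::tidy(shape_sp)          # requires the broom package
  dplyr::left_join(fortified, shape_sp@data, by = "id")
}
```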

Leo Lahti
@antagomir
sf could be default and others optional ?
Markus Kainu
@muuankarski

Yep, that is the current behavior (in the sf branch), but providing the other options would require adding at least a broom dependency.

I could try to preserve the exact same behavior as currently, but change the source and processing. A new argument would be nuts_level, where the user could pass 0, 1, 2, 3 or all. all would be the default, allowing the current behavior of subsetting with inner_join only.

Leo Lahti
@antagomir
Sounds very good to me. I think we can import one more package, but it is true that we are starting to have quite many imported packages. We could investigate at some point whether these can be reduced.. or splitting the data retrieval and geovisualization components into separate packages is also an option (though perhaps a bit complicated)
Markus Kainu
@muuankarski
rOpenGov/eurostat@57d1686
Now the basic idea is implemented. At least the cache needs to be revised. Can test with:
library(eurostat)
library(dplyr)
library(sf)
# sf
shape_sf <- get_eurostat_geospatial(nuts_level = "0", output_class = "sf")
shape_sf %>% select(NUTS_ID) %>% plot()
# data.frame
shape_df <- get_eurostat_geospatial(nuts_level = "0", output_class = "df")
shape_df %>% ggplot2::ggplot(.) + ggplot2::geom_polygon(ggplot2::aes(x = long, y = lat, group = group, fill = NUTS_ID))
# spdf
shape_spdf <- get_eurostat_geospatial(nuts_level = "0", output_class = "spdf")
sp::spplot(obj = shape_spdf, "NUTS_ID")
Leo Lahti
@antagomir
is this in master now ?
Markus Kainu
@muuankarski
no no, in simplefeatures branch
Leo Lahti
@antagomir
yees
i saw wrong
Markus Kainu
@muuankarski
i double checked..
Leo Lahti
@antagomir
yes it works ! (after adding library(ggplot2) in the beginning)
Markus Kainu
@muuankarski
\o/
Perhaps there is a better method for fetching this json data http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_3.geojson than download.file()...
Leo Lahti
@antagomir
might be
Markus Kainu
@muuankarski
Instead of download.file() I implemented it using httr::GET; the two options are listed below:
resolution <- "60"
# option 1
resp <- httr::GET(paste0("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_",resolution,"M_2013_4258_LEVL_1.geojson"))
nuts1 <- sf::st_read(httr::content(resp, as="text"), stringsAsFactors = FALSE)
# option 2
jsontemp <- tempfile()
download.file(paste0("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_",resolution,"M_2013_4258_LEVL_1.geojson"), jsontemp)
nuts1 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
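A third option, if streaming the response straight to disk is preferred, would be httr::write_disk(). A sketch (`fetch_geojson()` is a hypothetical helper name; the URL is the same one used in the options above):

```r
library(httr)
library(sf)

# Option 3 (sketch): write the response body straight to a temp file
# with httr::write_disk(), then read it with sf as in option 2.
resolution <- "60"
url <- paste0("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/",
              "nuts-2013/NUTS_RG_", resolution, "M_2013_4258_LEVL_1.geojson")

fetch_geojson <- function(url) {
  jsontemp <- tempfile(fileext = ".geojson")
  resp <- httr::GET(url, httr::write_disk(jsontemp, overwrite = TRUE))
  httr::stop_for_status(resp)          # fail early on HTTP errors
  sf::st_read(jsontemp, stringsAsFactors = FALSE)
}

# nuts1 <- fetch_geojson(url)
```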
Leo Lahti
@antagomir
you are a piece of gold
Markus Kainu
@muuankarski
I had a similar feeling for a second! Now have to hurry back to normal life and being a piece of shit!
Leo Lahti
@antagomir
!
Markus Kainu
@muuankarski
rOpenGov/eurostat@ff6defa here is the fix. Commits do show up nicely in the right side listing ->
Leo Lahti
@antagomir
nicee
Joona Lehtomäki
@jlehtoma
Very nice!
Leo Lahti
@antagomir
FedData package has functionality to download GIS data in US: https://ropensci.org/blog/technotes/2017/08/24/FedData-release
Joona Lehtomäki
@jlehtoma
Was just linking the same here, seems worth exploring
Leo Lahti
@antagomir
!