Leo Lahti
@antagomir
if there is no remarkable difference in size & speed, then fetching the data straight from Eurostat would be preferable over our own secondary copies
it even seems that they might be ready to add other output formats if we ask, though I'm not sure
Leo Lahti
@antagomir
By the way, they also reported this (I'll fix it right away, but FYI): a small typo on page 388 of the journal article,
Instead of: Merge Eurostat data with geodata from Cisco
it should be: Merge Eurostat data with geodata from Gisco
Plus, in the source code get_eurostat_geospatial.R,
Instead of: @title Download Geospatial Data from Cisco
it should be: @title Download Geospatial Data from GISCO
Markus Kainu
@muuankarski
ok, good fix.
Leo Lahti
@antagomir
That was from a GISCO team representative, so they have at least taken note and sent their thanks, so good work.
Markus Kainu
@muuankarski
As for data formats, considering the current speed of development within sf and R geospatial in general, I am a bit doubtful whether they can provide such files currently. In the long term, definitely. Should ask!
Leo Lahti
@antagomir
hmm. I could ask if they can provide the RData format as well. In fact I do not see shapefiles on their website, only the GeoJSON files etc.
(unless these are now considered equivalent)
Markus Kainu
@muuankarski
r-spatial/sf#185
will check later today
Leo Lahti
@antagomir
+5
Joona Lehtomäki
@jlehtoma
The only reason I can think of for having RData files is if some sort of pre-processing is done on an R object read from a spatial file (e.g. a shapefile), OR if size is an issue.
As for shapefile vs GeoJSON, again the only reason for using the former is probably size (GeoJSON is basically an uncompressed text file). More generally, shapefiles should be avoided, but unfortunately that is still very hard.
Joona Lehtomäki
@jlehtoma
I think sf (or rather GDAL) can process GeoJSON quite well out of the box. GDAL also supports reading (but I think not writing) TopoJSON, which can encode data much more efficiently than GeoJSON (i.e. smaller file size).
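(For context: reading these files with sf is a one-liner. A minimal sketch, mirroring the download pattern used later in this thread; the URL is one of the GISCO files discussed below.)

library(sf)
jsontemp <- tempfile(fileext = ".geojson")
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# GeoJSON carries its CRS; as noted below, the TopoJSON files may not
sf::st_crs(nuts2)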
Leo Lahti
@antagomir
So to reply to the GISCO people, I would say that we are moving away from shapefiles, and GeoJSON (which they already provide) is otherwise a good option, but the larger size and immature (R) tools form a sort of bottleneck which we (and others) are working on. Therefore we have preprocessed and compressed the data into RData files. We could provide the code and propose that they share RData files as well, in which case we can switch to that. Would you agree with such a reply?
Markus Kainu
@muuankarski
I will test once I'm home! In an hour.
Leo Lahti
@antagomir
perfect !!
Joona Lehtomäki
@jlehtoma
In case we do some pre-processing, then using RData seems reasonable. If not, I wouldn't bother. Also, I think it makes fairly little sense for them to provide RData files, unless they want to facilitate R access without relying on specific packages (such as rgdal/sf) for reading.
Personally, I think it makes sense to provide the data in commonly accessible spatial data formats (e.g. GeoJSON), since reading the data is a fairly small overhead for us (unless we want to get rid of the spatial deps)
Leo Lahti
@antagomir
yep. perhaps the main question is indeed whether they want to support R such that we do not need to host the data copies on GitHub
Markus Kainu
@muuankarski

just a quick look here:
http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/ref-nuts-2013.html makes it clear that we would need to rethink a few things with the package.

The main thing being that the geofile we have used contains all the different NUTS levels, and with an inner_join you have been able to subset the geodata to the same levels as your Eurostat attribute data. Here each level is separated into its own file, which will require the user to specify the NUTS level explicitly. Certainly it would be clearer to have them separate, but I kind of like the current behaviour, where a single geodata file always matches your primary Eurostat data.

With get_eurostat you don't need to specify the NUTS level, and the levels available vary between datasets. However, an experienced user should be aware of this and able to download the right geodata, I think.
I will try reading the files properly tomorrow
Leo Lahti
@antagomir
Identification of the level can be automated, so I think we could still maintain the current default behavior from the user's perspective, and at the same time add the option for separate treatment.
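(A hypothetical helper along those lines; the function name is made up, and the logic comes from the nchar() filtering trick used in the example below. NUTS ids are a two-letter country code plus one digit per level, so the level can be read off the code length.)

infer_nuts_level <- function(geo) {
  nchar(as.character(geo)) - 2
}
infer_nuts_level(c("FI", "FI1", "FI19", "FI197"))
#> [1] 0 1 2 3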
Markus Kainu
@muuankarski

Right, we can do that by downloading all the levels, row-binding and merging, I suppose.

So, this works fine at the NUTS-2 level:

library(eurostat)
library(dplyr)
library(sf)

# 1. Download the data
sp_data <- get_eurostat("tgs00026", time_format = "raw", stringsAsFactors = FALSE) %>% 
  # filter to year 2014 and the NUTS-2 level (code length == 4, e.g. FI02)
  dplyr::filter(time == 2014, nchar(as.character(geo)) == 4)

# 2. Download the geodata at the NUTS-2 level (ROUGH CODE)
jsontemp <- tempfile(fileext = ".geojson")
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)

# 3. Join the attribute data to the geodata
map <- left_join(nuts2, sp_data, by = c("NUTS_ID" = "geo"))

# 4. Draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values", 
              title = "Disposable household\nincomes in 2014",  
              palette = "Oranges")
I am reading in the GeoJSON file, not the TopoJSON. The file size in TopoJSON is marginally smaller, but it contains neither an epsg (SRID) nor a proj4string field when read with st_read().
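(If the TopoJSON route were still wanted, the missing CRS could presumably be set by hand after reading. A sketch, assuming the file has been downloaded to topotemp, and assuming EPSG:4258, as in the GeoJSON file names, is the intended datum:)

nuts2_topo <- sf::st_read(topotemp, stringsAsFactors = FALSE)
# st_read() returns no CRS for these TopoJSON files, so assign one manually
nuts2_topo <- sf::st_set_crs(nuts2_topo, 4258)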
Markus Kainu
@muuankarski

@jlehtoma can perhaps shed some light on that?

At 1:60 million resolution (the most common for such thematic maps) the file size is ~800 kB, whereas at 1:1 million it is ~5 MB. Implementing a similar cache to what we currently have would make this pretty smooth.
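(A minimal sketch of the kind of cache meant here; all names are hypothetical, and tempdir() is used in the same spirit as the eurostat package's own table cache:)

cache_gisco_file <- function(url, cache_dir = file.path(tempdir(), "eurostat")) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(cache_dir, basename(url))
  # download only once per session; later calls reuse the cached copy
  if (!file.exists(dest)) download.file(url, dest)
  dest
}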

Leo Lahti
@antagomir
ok sounds feasible
are you thinking we should switch from our own RData files to this?
I ran into an error with eurostat_geodata, so I did not yet check how long the processing takes, and thus how necessary the ready-made RData files are. Regarding file size, we could ask GISCO to share compressed GeoJSON files if that would help with transfer speed
Markus Kainu
@muuankarski
To keep the current behaviour we can just download and rbind all the levels, as in this example:
library(eurostat)
library(dplyr)
library(sf)

# 1. Download the data
sp_data <- get_eurostat("ilc_li01", time_format = "raw", stringsAsFactors = FALSE) %>% 
  # filter to year 2016 and a single household type / currency / indicator
  dplyr::filter(time == 2016, hhtyp == "A1", currency == "EUR", indic_il == "LI_C_M40")

# 2. Download the geodata for ALL NUTS levels
# NUTS0
jsontemp <- tempfile(fileext = ".geojson")
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_0.geojson",
              jsontemp)
nuts0 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS1
jsontemp <- tempfile(fileext = ".geojson")
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_1.geojson",
              jsontemp)
nuts1 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS2
jsontemp <- tempfile(fileext = ".geojson")
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS3
jsontemp <- tempfile(fileext = ".geojson")
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_3.geojson",
              jsontemp)
nuts3 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
nuts <- rbind(nuts0, nuts1, nuts2, nuts3)

# 3. Join the attribute data to the geodata
map <- inner_join(nuts, sp_data, by = c("NUTS_ID" = "geo"))

# 4. Draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values", 
              title = "Poverty thresholds",  
              palette = "Oranges")
Yes, I think we should. A good thing about this is also that the data comes from the same domain, http://ec.europa.eu, as with the eurostat package, and requires no new domain to be whitelisted by IT.
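(As a side note, the four near-identical download steps above could be collapsed into a loop over the levels; a sketch, reusing the same URL pattern:)

base_url <- "http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_%s.geojson"
# download each level once and row-bind the resulting sf objects
nuts <- do.call(rbind, lapply(0:3, function(lvl) {
  jsontemp <- tempfile(fileext = ".geojson")
  download.file(sprintf(base_url, lvl), jsontemp)
  sf::st_read(jsontemp, stringsAsFactors = FALSE)
}))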
Leo Lahti
@antagomir
Do you know by heart what the difference in file size is, GeoJSON non-compressed vs. compressed?
Ok, I will reply to the GISCO guys, I think this is clear. Once they share compressed files, we can (and perhaps should, if we ask..) switch to using those.
Leo Lahti
@antagomir
Was it the case that processing of the files can be done on the fly? So we do not need preprocessed RData files for that reason?
@muuankarski
Leo Lahti
@antagomir
OK, at least the above example is fast, so processing time is not a reason for having our own RData files.
In fact, downloading these GeoJSON files is already fast now. Do we really need compressed versions?
It also comes to mind that we could have readily processed versions in the R package's data/ folder to avoid downloads entirely.
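(If that route were taken, the shipped object would presumably be created once at build time; a sketch, where nuts is the combined sf object from the example above and usethis::use_data is one way to do it:)

# store the preprocessed object in data/, so users get it via data(nuts)
# with no download; xz compression keeps the file small
usethis::use_data(nuts, compress = "xz")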
Markus Kainu
@muuankarski
I think we can manage with what they currently provide! No need for compressed files!
Leo Lahti
@antagomir
Yes, I thought so too.
Ok, so I can tell them that this was due to historical reasons & we are just planning to switch when time allows. I may also mention that we are still considering, at some point, having copies of the most common files in the R package in order to avoid the download.
Markus Kainu
@muuankarski
That is something worth considering.
Joona Lehtomäki
@jlehtoma
@muuankarski it could be an issue with GDAL reading TopoJSON, or else something funky has been going on in producing the TopoJSON files.
No personal experience with the CRSs / TopoJSON, though.
But: everything seems to be in order, so carry on :)
Leo Lahti
@antagomir
il
Markus Kainu
@muuankarski

One issue still prevails: in the current implementation of get_eurostat_geospatial the user can opt for SpatialPolygonsDataFrame, fortified data.frame, or sf output. We could provide those conversions on the fly if we rely on the JSON files from Eurostat (now they come preprocessed, fetched with download.file()). If we were to provide all three, it would require the following steps on the fly.

# =======================================================
# If the user passes output_class = "sf" OR does not specify it (default behaviour)
## Download and return an sf object
# =======================================================
library(sf)
library(dplyr)
jsontemp <- tempfile(fileext = ".geojson")
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_0.geojson",
              jsontemp)
shape <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
return(shape)

# =======================================================
# If the user passes output_class = "sp", this is done in addition to the default behaviour
## Convert the sf object into an sp object (SpatialPolygonsDataFrame)
# =======================================================
shape_sp <- as(shape, "Spatial")
return(shape_sp)

# =======================================================
# If the user passes output_class = "data.frame", this is done in addition to the steps above
## Convert the SpatialPolygonsDataFrame into a "fortified" regular data.frame to be plotted with ggplot2::geom_polygon
# =======================================================
shape_sp$id <- row.names(shape_sp)
fortified <- broom::tidy(shape_sp)
fortified <- left_join(fortified, shape_sp@data, by = "id")
return(fortified)

@jlehtoma what do you think, is that feasible to do on the fly, OR should we provide just an sf output and nothing else..?
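(Folded into a single function, the three branches above might look like the following sketch; the function name and the output_class handling are hypothetical:)

get_shape <- function(url, output_class = c("sf", "sp", "data.frame")) {
  output_class <- match.arg(output_class)
  jsontemp <- tempfile(fileext = ".geojson")
  download.file(url, jsontemp)
  shape <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
  if (output_class == "sf") return(shape)
  shape_sp <- as(shape, "Spatial")
  if (output_class == "sp") return(shape_sp)
  # fortify for ggplot2::geom_polygon
  shape_sp$id <- row.names(shape_sp)
  fortified <- broom::tidy(shape_sp)
  dplyr::left_join(fortified, shape_sp@data, by = "id")
}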

Leo Lahti
@antagomir
sf could be the default and the others optional?
Markus Kainu
@muuankarski

Yep, that is the current behavior (in the sf branch), but providing the other options would require adding at least a broom dependency.

I could try preserving the exact same behavior as currently, while changing the source and processing. A new argument would be nuts_level, where the user could pass 0, 1, 2, 3, or all. all would be the default, allowing the current behavior of subsetting with inner_join only.
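(A sketch of that signature; the argument names follow the proposal above, everything else is hypothetical, and the output_class conversions are omitted:)

get_eurostat_geospatial <- function(output_class = "sf", nuts_level = "all", ...) {
  # "all" keeps the current join-based subsetting; a single level fetches one file
  lvls <- if (identical(nuts_level, "all")) 0:3 else as.integer(nuts_level)
  base_url <- "http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_%s.geojson"
  shapes <- lapply(lvls, function(lvl) {
    jsontemp <- tempfile(fileext = ".geojson")
    download.file(sprintf(base_url, lvl), jsontemp)
    sf::st_read(jsontemp, stringsAsFactors = FALSE)
  })
  do.call(rbind, shapes)
}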