Joona Lehtomäki
@jlehtoma
But neither of them works with the basic build, then
Markus Kainu
@muuankarski

Thanks, in that case I'll leave the tricks out. I rewrote the edu and edudata packages a bit more sensibly. Once I also get the vignettes and such into shape, I'll ask you to test.

The idea behind splitting these exercises is now that the edu package ships with just a couple of demo exercises. The more extensive exercises come with the edudata package, which can be selected as a source in the addin (it complains if the package is not installed). A third source of exercises can be your own custom package, which must have the same structure as the edudata package.

When checking an exercise you also have to choose the package the exercises come from. In both cases the default is the edudata package. (That drat may turn out to be handy after all..)

btw, a blog post about the learnr package: https://blog.rstudio.com/2017/07/11/introducing-learnr/ . No real overlap with this as such; it requires a server, infrastructure and quite a lot of manual work. But it could work in those kinds of settings.

And hey, my digiroad addition has shown up in the Trello over there on the right. So at least that integration works
Leo Lahti
@antagomir
yesss
Yeah, I'll gladly test it at least; I should be pushing this forward on my own end too for work, and this fits that well
Leo Lahti
@antagomir
We got some feedback by email, I think from the Eurostat data people, but I'm not sure: "Why you not used directly the shapefiles and relied now on RDATA on your github account ? Was ESTAT infrastructure not efficient enough? Do you need another format ? We also provide GeoJSON/TopoJSON and PBF format at http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/. "
Starting to think about this.. I do not remember why we ended up hosting our own copy in RData format - do you remember - @muuankarski
or @jlehtoma
Markus Kainu
@muuankarski
@jlehtoma could json/topojson as input, with sf conversion on the fly, somehow be better? Then again, we have only one shapefile, in 5 different resolutions, that changes every few years
Leo Lahti
@antagomir
if there is no remarkable difference in size & speed, then fetching the data straight from Eurostat would be preferable to our own secondary copies
it even seems that they might be ready to add other output formats if we ask, though I'm not sure
Leo Lahti
@antagomir
By the way, they also reported this (I'll fix it right away, but fyi): Small typo on page 388 of the journal article,
Instead of Merge Eurostat data with geodata from Cisco
it should be Merge Eurostat data with geodata from Gisco
Plus in source code get_eurostat_geospatial.R
Instead of @title Download Geospatial Data from Cisco
it should be @title Download Geospatial Data from GISCO
Markus Kainu
@muuankarski
ok, good fix.
Leo Lahti
@antagomir
That was from a GISCO team representative, so at least they have taken note and sent their thanks. Good stuff.
Markus Kainu
@muuankarski
As for data formats, considering the current speed of development within sf and R geospatial in general, I am a bit sceptical about whether they can provide such files currently. In the long term, definitely. We should ask!
Leo Lahti
@antagomir
hmm. I could ask if they can provide RData format as well. In fact, I do not see shapefiles on their website, only the GeoJSON files etc.
(unless these are now considered equivalent)
Markus Kainu
@muuankarski
r-spatial/sf#185
will check later today
Leo Lahti
@antagomir
+5
Joona Lehtomäki
@jlehtoma
The only reason I can think of for having RData files is if some sort of pre-processing is done on an R object read from a spatial file (i.e. a shapefile), OR if size is an issue
As for shapefile vs. GeoJSON, again the only reason for using the former is probably size (GeoJSON is basically an uncompressed text file). More generally, the shapefile format should be avoided, but unfortunately that is still very hard.
Joona Lehtomäki
@jlehtoma
I think sf (or rather GDAL) can process GeoJSON quite well out of the box. GDAL also supports reading (but I think not writing) TopoJSON, which can encode data much more efficiently than GeoJSON (i.e. smaller file size)
Leo Lahti
@antagomir
So to reply to the GISCO people, I would say that we are moving away from shapefiles, and GeoJSON (which they already provide) is otherwise a good option, but the larger size and immature (R) tools form a sort of bottleneck which we (and others) are working on. That is why we have preprocessed and compressed the data into RData files. We could provide the code and propose that they share RData files as well, in which case we can switch to that. Would you agree with such a reply?
Markus Kainu
@muuankarski
I will test once home! In an hour
Leo Lahti
@antagomir
prrfect !!
Joona Lehtomäki
@jlehtoma
In case we do some pre-processing, then using RData seems reasonable. If not, I wouldn't bother. Also I think it makes fairly little sense for them to provide RData files, unless they want to facilitate R access without relying on specific packages (such as rgdal/sf) for reading.
Personally, I think it makes sense to provide the data in commonly accessible spatial data formats (e.g. GeoJSON) since reading the data is a fairly small overhead for us (unless we want to get rid of the spatial deps)
Leo Lahti
@antagomir
yep. perhaps the main question is indeed whether they want to support R such that we do not need to host the data copies in github
Markus Kainu
@muuankarski

just a quick look here:
http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/ref-nuts-2013.html makes it clear that we would need to rethink a few things in the package.

The main thing is that the geofile we have used contains all the different NUTS levels, and with an inner_join you have been able to subset the geodata to the same levels as your Eurostat attribute data. Here each level is separated into its own file, which will require the user to specify the NUTS level explicitly. Certainly it would be clearer to have them separate, but I kind of like the current behaviour where a single geodata object always matches your primary Eurostat data.

With get_eurostat you don't need to specify the NUTS level, and the levels available vary between datasets. However, an experienced user should be aware of this and able to download the right geodata, I think.
I will try reading the files properly tomorrow
Leo Lahti
@antagomir
Identification of the level can be automated, so I think we could still maintain the current default behavior from the user's perspective, while at the same time adding the option to treat the levels separately.
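The automated level identification could be sketched in a couple of lines of base R. This is only a sketch: `nuts_level` is a hypothetical helper, not an existing package function, and it assumes the Eurostat geo-code convention used in the filter examples above (a two-letter country prefix plus one character per NUTS level, e.g. FI02 for NUTS-2):

```r
# Hypothetical helper: infer the NUTS level of a geo code from its length.
# "FI" -> 0 (country), "FI1" -> 1, "FI02" -> 2, "FI193" -> 3
nuts_level <- function(geo) nchar(as.character(geo)) - 2L

nuts_level(c("FI", "FI1", "FI02", "FI193"))
#> [1] 0 1 2 3
```

With something like this, the package could keep the current all-levels default while letting the user filter to, or fetch, a single level explicitly.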
Markus Kainu
@muuankarski

Right, we can do that by downloading all the levels, row_binding and merging I suppose.

So, this works fine at NUTS2-level:

library(eurostat)
library(sf)
library(dplyr)

# 1. Download the data
sp_data <- get_eurostat("tgs00026", time_format = "raw", stringsAsFactors = FALSE) %>%
  # filter to year 2014 and NUTS-2 level (code length == 4), e.g. FI02
  dplyr::filter(time == 2014, nchar(as.character(geo)) == 4)

# 2. Download the geodata at NUTS-2 level (ROUGH CODE)
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)

# 3. merge
map <- left_join(nuts2, sp_data, by = c("NUTS_ID" = "geo"))

# 4. draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values",
              title = "Disposable household\nincomes in 2014",
              palette = "Oranges")
I am reading in the GeoJSON file, not the TopoJSON. The TopoJSON file size is marginally smaller, but it contains neither an epsg (SRID) nor a proj4string field when read with st_read()
Markus Kainu
@muuankarski

@jlehtoma can perhaps shed some light on that?

At 1:60 million resolution (the most common for such thematic maps) the file size is ~800 kB, whereas 1:1 million is ~5 MB. Implementing a similar cache to the one we currently have would make this pretty smooth

Leo Lahti
@antagomir
ok sounds feasible
are you thinking we should switch from our own RData files into this ?
I ran into an error with eurostat_geodata, so I did not yet check how long the processing takes, and thus how necessary the ready-made RData files are. Regarding file size, we could ask GISCO to share compressed GeoJSON files if that would help with transfer speed
Markus Kainu
@muuankarski
To keep the current behaviour we can just download and rbind all the levels, as in this example
library(eurostat)
library(sf)
library(dplyr)

# 1. Download the data
sp_data <- get_eurostat("ilc_li01", time_format = "raw", stringsAsFactors = FALSE) %>%
  # filter to year 2016, household type A1, currency EUR, indicator LI_C_M40
  dplyr::filter(time == 2016, hhtyp == "A1", currency == "EUR", indic_il == "LI_C_M40")

# 2. Download the geodata at ALL NUTS levels
# NUTS0
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_0.geojson",
              jsontemp)
nuts0 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS1
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_1.geojson",
              jsontemp)
nuts1 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS2
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_2.geojson",
              jsontemp)
nuts2 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
# NUTS3
jsontemp <- tempfile()
download.file("http://ec.europa.eu/eurostat/cache/GISCO/distribution/v1/geojson/nuts-2013/NUTS_RG_60M_2013_4258_LEVL_3.geojson",
              jsontemp)
nuts3 <- sf::st_read(jsontemp, stringsAsFactors = FALSE)
nuts <- rbind(nuts0, nuts1, nuts2, nuts3)

# 3. merge
map <- inner_join(nuts, sp_data, by = c("NUTS_ID" = "geo"))

# 4. draw the map
library(tmap)
tm_shape(map) +
  tm_polygons("values",
              title = "Poverty thresholds",
              palette = "Oranges")
Yes, I think we should. A good thing about this is also that the data comes from the same domain, http://ec.europa.eu, as the rest of the eurostat package's data, so no new domain needs to be whitelisted by IT..
Leo Lahti
@antagomir
Do you know by heart what the difference in file size is for GeoJSON, non-compressed vs. compressed?
Ok, I will reply to the GISCO guys, I think this is clear. Once they share compressed files, we can (and perhaps should, if we ask..) switch to using those.
Leo Lahti
@antagomir
Was it the case that processing of the files can be done on the fly? So that we do not need preprocessed RData files for this?
@muuankarski
Leo Lahti
@antagomir
OK at least the above example is fast so processing time is not a reason for having our own RData files.
In fact, downloading these GeoJSON files is already fast now. Do we really need compressed versions?
It also comes to mind that we could ship readily processed versions in the R package's data/ folder to avoid downloads entirely..
Markus Kainu
@muuankarski
I think we can manage with what they currently provide! No need for compressed files!
Leo Lahti
@antagomir
Yes I thought so too.