Note: it doesn't have any logic to go looking for GDAL_DATA or proj data folders, similar to what is in Fiona - so if you run into those errors in conda, I'll need to add that...
I think that will probably come up when looking into wheels ... (but for conda it seems to be working fine)
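For reference, the kind of lookup logic mentioned above could be sketched roughly as follows. This is only an illustration, not what pyogrio or Fiona actually ships: `find_data_dir` is a hypothetical helper, the candidate folders are assumed typical conda layouts, and the marker files (`header.dxf` for GDAL data, `proj.db` for proj data) are assumptions about what identifies each folder.

```python
import os
import sys
from pathlib import Path


def find_data_dir(env_var, candidates, marker):
    """Return the first directory containing `marker`, or None if none does.

    Hypothetical helper: the environment variable wins if it points at a
    usable folder, otherwise the candidate folders are tried in order.
    """
    env_value = os.environ.get(env_var)
    if env_value and (Path(env_value) / marker).exists():
        return Path(env_value)
    for candidate in candidates:
        if (Path(candidate) / marker).exists():
            return Path(candidate)
    return None


# Assumed conda layouts: <prefix>/share/... on Unix, <prefix>/Library/share/... on Windows.
prefix = Path(sys.prefix)
gdal_data = find_data_dir(
    "GDAL_DATA",
    [prefix / "share" / "gdal", prefix / "Library" / "share" / "gdal"],
    "header.dxf",  # assumed marker file for a GDAL data folder
)
proj_data = find_data_dir(
    "PROJ_LIB",
    [prefix / "share" / "proj", prefix / "Library" / "share" / "proj"],
    "proj.db",  # assumed marker file for a proj data folder
)
```

If either result is `None`, the library would still need to fall back to whatever GDAL itself finds, or raise a clear error.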
@jorisvandenbossche thank you for getting conda-forge build in place and testing this out.
I can't test conda on Windows. On Mac, a new conda env with the latest version of pyogrio from conda-forge works, but with lots of encoding issues that I don't see outside conda:
>>> df = read_dataframe('/vsizip//.../pyogrio/pyogrio/tests/fixtures/test_fgdb.gdb.zip', layer='test_areas')
Warning 1: Recode from CP437 to UTF-8 failed with the error: "Invalid argument".
(repeated many times)
Geopandas via Fiona appears to read this file too, but with different warnings:
>>> df = gp.read_file('/vsizip//.../pyogrio/pyogrio/tests/fixtures/test_fgdb.gdb.zip', layer='test_areas')
/private/tmp/conda_test/conda_test/lib/python3.8/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
for feature in features_lst:
pgpkg
while I am writing a proposal, and what is expected in the project proposal.
@rajuthegr8 Welcome and thanks for your interest in GSoC!
In case you haven't seen it, there is some high-level info here: https://github.com/geopandas/geopandas/wiki/Google-Summer-of-Code-2021#pure-python-geopackage-io
`pgpkg` is just a starting point; contributions don't necessarily have to be there. Depending on what is involved, it may be appropriate for those changes to land directly in `geopandas` in the end (but developing them outside geopandas would probably be easiest at the start).
`pgpkg` already does some of this). We'd like the solution created under this project to produce GeoPackages that are compatible with the GPKG specification, so there will be a little bit involved in getting a good understanding of the spec and how it manifests in the tables stored in the SQLite database that make it a valid GeoPackage.
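To give a feel for how the spec shows up in the SQLite file, here is a minimal sketch that checks two of the things that mark a database as a GeoPackage: the `application_id` in the SQLite header and the presence of the two core metadata tables. `looks_like_geopackage` is a hypothetical helper, and this covers only a small subset of what the specification requires.

```python
import sqlite3

# application_id values used by GeoPackage: "GPKG" (1.2+), "GP10" (1.0), "GP11" (1.1)
GPKG_APPLICATION_IDS = {0x47504B47, 0x47503130, 0x47503131}

# Core metadata tables every GeoPackage must contain.
REQUIRED_TABLES = {"gpkg_spatial_ref_sys", "gpkg_contents"}


def looks_like_geopackage(connection):
    """Rough check only: inspects the application_id and the core tables."""
    application_id = connection.execute("PRAGMA application_id").fetchone()[0]
    tables = {
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    return application_id in GPKG_APPLICATION_IDS and REQUIRED_TABLES <= tables
```

A file with vector layers would additionally need `gpkg_geometry_columns` and correct rows in each of these tables, which is where most of the compatibility work is likely to be.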
`pgpkg` isn't quite right; other GIS tools don't recognize the CRS info it creates)
`pgpkg` does a few hackish things to try and do this, but a better solution is likely needed. (Context: geometry bytes in a GeoPackage are a structured set of header bytes plus the well-known binary of the actual geometry.) We'd like this handled better so that it isn't hackish and meets the spec.
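As a rough illustration of that layout, the sketch below wraps existing WKB bytes in the simplest form of the header (no envelope) and reads them back. `wrap_wkb` and `unwrap_gpkg` are hypothetical names, not functions from `pgpkg`, and a full solution would also need to write envelopes and handle empty geometries.

```python
import struct

# Envelope size in bytes for each envelope indicator code in the flags byte.
ENVELOPE_SIZES = {0: 0, 1: 32, 2: 48, 3: 48, 4: 64}


def wrap_wkb(wkb, srs_id):
    """Prefix WKB with a GeoPackage header: little-endian, no envelope."""
    flags = 0b00000001  # bit 0 set: header integers are little-endian
    # magic "GP", version 0, flags, then srs_id as a 32-bit signed integer
    header = struct.pack("<2sBBi", b"GP", 0, flags, srs_id)
    return header + wkb


def unwrap_gpkg(blob):
    """Return (srs_id, wkb) from a GeoPackage geometry blob."""
    magic, _version, flags = struct.unpack_from("<2sBB", blob, 0)
    if magic != b"GP":
        raise ValueError("not a GeoPackage geometry blob")
    byte_order = "<" if flags & 0b1 else ">"
    envelope_code = (flags >> 1) & 0b111
    if envelope_code not in ENVELOPE_SIZES:
        raise ValueError("invalid envelope indicator")
    (srs_id,) = struct.unpack_from(byte_order + "i", blob, 4)
    # WKB starts after the 8 fixed header bytes and the optional envelope.
    return srs_id, blob[8 + ENVELOPE_SIZES[envelope_code]:]
```

For example, `wrap_wkb(struct.pack("<BIdd", 1, 1, 1.0, 2.0), 4326)` produces the blob for a 2D point at (1, 2) in EPSG:4326, assuming 4326 is registered in `gpkg_spatial_ref_sys`.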
`pgpkg` and have a better idea of what must be done. From what I have understood, I should probably start putting together a basic proposal once I get the template, and in the meantime start sending some PRs to make `pgpkg` more compatible with the official specs. Any other suggestions or advice would be highly appreciated.
`pgpkg` to capture your ideas / findings that can be addressed in later PRs (mostly I'm thinking of these as capturing a few of the technical specifics while they are fresh). You can certainly link these into your proposal, but like @martinfleis said, your focus should be on the overall proposal and bigger picture: overall approach, goals, your sense of how hard it will be to go from what is available to meeting the goals for the project, etc. (I probably got overly excited by the specifics I listed off above; apologies if that was a bit of a distraction from the overall proposal.)
`dask-geopandas` is in an early stage of development so there is a lot to do. I would say that the ideal roadmap now would be 1) spatial partitioning, 2) spatial indexing, 3) overlapping computation, based on my own experience and needs (which may be biased). Maybe IO... You can check the issues at https://github.com/geopandas/dask-geopandas/issues for a discussion on some of these. If you have some needs yourself, feel free to embed them in the proposal. Also note that we have submitted a workshop proposal around dask-geopandas to Dask Summit, so there may be a chance to have a good discussion on priorities during that with a wider range of people.
`spatialpandas`, which implements some aspects of geospatial parallelization using dask - https://github.com/holoviz/spatialpandas.