    Joris Van den Bossche
    @jorisvandenbossche
    So if there is a conda-forge package built, I can always test that (the only problem is that I suppose we can't get the built package artifact from a PR, so that we could test it before merging?)
    Brendan Ward
    @brendan-ward
    @jorisvandenbossche thanks so much for testing it out locally! No idea yet why we are getting DLL loading issues in pytest on Windows and yet build and manual test works fine.
    pyogrio 0.2.0 release is out and should hopefully be enough to get the conda forge Windows build working; if not we can keep patching it. Note: it doesn't have any logic to go looking for GDAL_DATA or proj data folders, similar to what is in Fiona - so if you run into those errors in conda, I'll need to add that...
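A rough sketch of what such lookup logic could look like, for anyone following along. This is not pyogrio's or Fiona's actual code: `find_gdal_data` is a hypothetical helper, and the candidate folders (`share/gdal` under the environment prefix, and `Library/share/gdal` for conda on Windows) are assumptions about typical install layouts.

```python
import os
import sys
from pathlib import Path


def find_gdal_data(env=None, prefix=None):
    """Return a likely GDAL data directory, or None if nothing is found.

    Hypothetical helper: the search order and candidate folders are
    assumptions, not the behaviour of any released library.
    """
    env = os.environ if env is None else env
    prefix = Path(sys.prefix if prefix is None else prefix)

    # 1) Respect an explicit GDAL_DATA setting if it points at a directory.
    explicit = env.get("GDAL_DATA")
    if explicit and Path(explicit).is_dir():
        return Path(explicit)

    # 2) Fall back to folders that commonly hold the data files
    #    (assumed layouts for Unix-like and Windows conda environments).
    candidates = [
        prefix / "share" / "gdal",
        prefix / "Library" / "share" / "gdal",
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

The same pattern would apply to the proj data folder, with a different environment variable and folder name.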
    Joris Van den Bossche
    @jorisvandenbossche
    Conda-forge on windows is working now!
    I merged the PR, and created an environment with it on my windows machine, and it works fine

    Note: it doesn't have any logic to go looking for GDAL_DATA or proj data folders, similar to what is in Fiona - so if you run into those errors in conda, I'll need to add that...

    I think that will probably come up when looking into wheels ... (but for conda it seems to be working fine)

    One thing I noticed is that I don't succeed in reading a zip file directly
    Brendan Ward
    @brendan-ward

    @jorisvandenbossche thank you for getting conda-forge build in place and testing this out.

    I can't test conda on Windows, but a new conda env on Mac w/ latest version of pyogrio from conda-forge works, but with lots of encoding issues that I don't see outside conda:

    >>> df = read_dataframe('/vsizip//.../pyogrio/pyogrio/tests/fixtures/test_fgdb.gdb.zip', layer='test_areas')
    Warning 1: Recode from CP437 to UTF-8 failed with the error: "Invalid argument".
    (repeated many times)

    Geopandas via Fiona appears to read this file too, but with different warnings:

    >>> df = gp.read_file('/vsizip//.../pyogrio/pyogrio/tests/fixtures/test_fgdb.gdb.zip', layer='test_areas')
    /private/tmp/conda_test/conda_test/lib/python3.8/site-packages/geopandas/geodataframe.py:577: RuntimeWarning: Sequential read of iterator was interrupted. Resetting iterator. This can negatively impact the performance.
      for feature in features_lst:
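For context, the `/vsizip/` prefix used in both calls above is GDAL's virtual file system for zip archives. A small sketch of building such a path follows; `vsizip_path` is a hypothetical helper (not part of pyogrio or geopandas) and it assumes forward-slash paths.

```python
def vsizip_path(zip_path, inner=None):
    """Build a GDAL /vsizip/ path for a zip archive.

    Hypothetical helper. `inner` is an optional path inside the archive.
    """
    path = "/vsizip/" + str(zip_path)
    if inner:
        # Avoid a doubled slash between the archive and the inner path.
        path += "/" + str(inner).lstrip("/")
    return path


# An absolute archive path yields the double slash seen in the calls above.
print(vsizip_path("/data/test_fgdb.gdb.zip"))
```

The resulting string can be passed as the path argument in the same way as the calls above.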
    Raj Gupta
    @rajuthegr8
    Hi everyone,
    I am an undergraduate student at IIT Kharagpur. I am interested in participating in GSoC and would like to contribute to the project "Pure Python GeoPackage IO". I know I am a bit late but I would still like to try.
    I wanted to ask what contributions are expected in the package pgpkg while I am writing a proposal, and what is expected in the project proposal.
    Brendan Ward
    @brendan-ward

    @rajuthegr8 Welcome and thanks for your interest in GSoC!
    In case you haven't seen it, there is some high-level info here: https://github.com/geopandas/geopandas/wiki/Google-Summer-of-Code-2021#pure-python-geopackage-io

    pgpkg is just a starting point; contributions don't necessarily have to be there. Depending on what is involved, it may be appropriate for those changes to land directly in geopandas in the end (but developing them outside geopandas would probably be easiest at the start)

    Brendan Ward
    @brendan-ward
    We'd like a vectorized approach that uses pygeos, so that it is fast (pgpkg already does some of this). We'd like the solution created under this project to produce geopackages that are compatible with the GPKG specification, so there will be a little bit involved to get a good understanding of the spec and how it manifests in the tables stored in the sqlite database that makes it a valid geopackage.
    For sure it needs to support CRS information so that we can round-trip data through a geopackage without any information loss there (the approach in pgpkg isn't quite right; other GIS tools don't recognize the CRS info it creates)
    There are a few awkward bits around encoding the geometry to bytes that are stored for each record in the geopackage; pgpkg does a few hackish things to try and do this, but a better solution is likely needed. (context: geometry bytes in geopackage are a structured set of header bytes plus well-known binary of the actual geometry). We'd like this handled better so that it isn't hackish and meets the spec.
    Aside from the actual implementation, we'd want things like user-facing documentation (probably mostly in docstrings for the read / write functions) and tests.
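To make the geometry-encoding point concrete, here is a minimal sketch of the layout described above: a small header followed by the well-known binary. It follows one reading of the GeoPackage binary header (magic `GP`, version, flags, SRS id, optional envelope) and is not pgpkg's code. `wkb_point` and `gpkg_geometry_blob` are hypothetical names, and the flag layout should be checked against the specification before relying on it.

```python
import struct


def wkb_point(x, y):
    """Little-endian WKB for a 2D point: byte order, geometry type 1, x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def gpkg_geometry_blob(wkb, srs_id, envelope=None):
    """Prefix WKB with a GeoPackage-style header (hypothetical helper).

    envelope, if given, is (minx, maxx, miny, maxy).
    """
    flags = 0b00000001  # bit 0 set: header values are little-endian
    envelope_bytes = b""
    if envelope is not None:
        flags |= 1 << 1  # envelope indicator 1: four doubles (xy only)
        envelope_bytes = struct.pack("<4d", *envelope)
    # magic "GP", version 0, flags byte, 32-bit SRS id
    header = struct.pack("<2sBBi", b"GP", 0, flags, srs_id)
    return header + envelope_bytes + wkb


blob = gpkg_geometry_blob(wkb_point(1.0, 2.0), srs_id=4326)
print(len(blob), blob[:2])
```

A vectorized version would build the header once per SRS and envelope setting and concatenate it with the WKB produced for the whole geometry array, rather than packing record by record.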
    Brendan Ward
    @brendan-ward
    Beyond geopackages, there is also room to help complete the support for shapefiles and GeoJSON.
    Benchmarks would be nice too.
    Martin Fleischmann
    @martinfleis
    Hi @rajuthegr8, happy to see the interest! @brendan-ward pretty much covered it all. You can base your proposals on the description on our wiki and the ideas above. At this stage, there's no need to be more technical, we can discuss implementation details together later (though it is certainly welcome to include them if you have something in mind!). We'll share the template you may use later today or tomorrow, so it is easier to write the proposal.
    Adrian Garcia Badaracco
    @adriangb
    This may be useful to test windows builds: https://github.com/webcomics/pywine
    I've been using it to test tensorflow
    Be warned: it's slow
    Sönke Schmachtel
    @srenoes
    Oh wow! Nice projects :-) I have some code for querying OPENAPI and writing that into a geojson stream, which I used earlier to download data. Since it downloads all the data fairly quickly in my browser, I started to wonder how fast this kind of approach would be with the new direct python method.
    Apart from that I could also contribute to other stuff, if you have some list or something that I could crack into.
    Code for benchmarks should be quite easy to write, but some stuff related to windows should also be possible to tackle, as I use windows 10 and I have also compiled some stuff. Especially geos related compiling things are not really new to me. GDAL seems much, much more difficult
    Raj Gupta
    @rajuthegr8
    I read the GPKG official specification and had a closer look at pgpkg and have a better idea of what must be done. From what I have understood I should probably start to put together a basic proposal after I get the template and in the meantime start to send some PRs to make `pgpkg` more compatible with the official specs. Any other suggestions or any advice would be highly appreciated.
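As an illustration of the kind of spec compatibility discussed here, the sketch below sets up part of the CRS bookkeeping a GeoPackage keeps in SQLite. `init_minimal_gpkg` is a hypothetical name, and the table layout and the two "undefined" rows follow one reading of the GPKG specification, so they should be verified against it. A real file also needs an EPSG:4326 row with a full WKT definition plus the `gpkg_contents` and `gpkg_geometry_columns` tables, which are omitted here.

```python
import sqlite3

GPKG_APPLICATION_ID = 0x47504B47  # ASCII "GPKG" stored in the SQLite header


def init_minimal_gpkg(conn):
    """Create the spatial reference table on an open connection (sketch only)."""
    conn.execute(f"PRAGMA application_id = {GPKG_APPLICATION_ID}")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS gpkg_spatial_ref_sys (
            srs_name TEXT NOT NULL,
            srs_id INTEGER PRIMARY KEY,
            organization TEXT NOT NULL,
            organization_coordsys_id INTEGER NOT NULL,
            definition TEXT NOT NULL,
            description TEXT
        )
        """
    )
    # Placeholder rows for undefined cartesian / geographic reference systems.
    conn.executemany(
        "INSERT OR IGNORE INTO gpkg_spatial_ref_sys VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("Undefined cartesian SRS", -1, "NONE", -1, "undefined", None),
            ("Undefined geographic SRS", 0, "NONE", 0, "undefined", None),
        ],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
init_minimal_gpkg(conn)
print(conn.execute("SELECT srs_id, srs_name FROM gpkg_spatial_ref_sys").fetchall())
```

Round-tripping CRS information then comes down to writing a matching row here and referencing its `srs_id` from the geometry columns table and from each geometry header.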
    Martin Fleischmann
    @martinfleis
    @rajuthegr8 superb! The template is available here (I have also added the link to the wiki). No need to make PRs yet I would say. Focus on the proposal now and try to understand what needs to be done and how. Then once the application period is over we can have a chat about the ideas and agree on the final roadmap.
    Brendan Ward
    @brendan-ward
    @rajuthegr8 Feel free to log issues onto pgpkg to capture your ideas / findings that can be addressed in later PRs (mostly I'm thinking of these as capturing a few of the technical specifics while they are fresh). You can certainly link these into your proposal, but like @martinfleis said, your focus should be on the overall proposal and bigger picture: overall approach, goals, what is your sense of how hard it will be to go from what is available to meeting the goals for the project, etc. (I probably got overly excited by the specifics I listed off above, apologies if that was a bit of a distraction from the overall proposal)
    Brendan Ward
    @brendan-ward
    @rajuthegr8 also see the NumFOCUS (of which GeoPandas is a part) student guide: https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-students.md
    there are several useful suggestions for putting together your proposal and pre-application activities.
    froast
    @abhinav9414
    Hi everyone, I am an undergraduate student at Birla Institute of Technology, Mesra. I would like to contribute to the project "Beautiful maps made simple: a static plotting project". I wanted to ask what contributions are required to go from a static matplotlib map to a complex, production-quality map. I am writing a project proposal; what are the expectations for it?
    Martin Fleischmann
    @martinfleis
    Hi @abhinav9414, happy to see the interest! As mentioned in the project idea, part of the work should be a diagnosis of the current state and community needs, based on which we should be able to point at specific changes. A good place to start is the issues labelled as "plotting", which should cover the most pressing problems. We don't have a list of changes that need to be made (yet); we hope that the initial stage of the GSoC will produce one. A project proposal template might also help with drafting.
    Thomas Statham
    @tastatham
    Hi Geopandas, I am interested in contributing to dask-geopandas through the Google Summer of Code this year. There are a few goals listed on the wiki page and I was wondering whether there was a priority on any one of these?
    Martin Fleischmann
    @martinfleis
    Hi @tastatham, dask-geopandas is in an early stage of development so there is a lot to do. I would say that the ideal roadmap now would be 1) spatial partitioning, 2) spatial indexing, 3) overlapping computation, based on my own experience and needs (which may be biased). Maybe IO... You can check the issues https://github.com/geopandas/dask-geopandas/issues for a discussion on some of these. If you have some needs yourself, feel free to embed them in the proposal. Also note that we have submitted a workshop proposal around dask-geopandas to Dask Summit, so there may be a chance to have a good discussion on priorities during that with a wider range of people.
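For readers unfamiliar with the first roadmap item, here is one common way to think about spatial partitioning: sort points along a space-filling curve and cut the sorted sequence into chunks, so that nearby points tend to land in the same partition. This is a generic illustration using a Z-order (Morton) key, not how dask-geopandas does or will implement it; all function names are hypothetical.

```python
def interleave_bits(ix, iy, bits=16):
    """Interleave the bits of two integers into a Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((ix >> i) & 1) << (2 * i)      # x bits go to even positions
        code |= ((iy >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return code


def morton_key(x, y, bounds, bits=16):
    """Map a point inside bounds=(minx, miny, maxx, maxy) to a Morton code."""
    minx, miny, maxx, maxy = bounds
    scale = (1 << bits) - 1
    ix = int((x - minx) / (maxx - minx) * scale)
    iy = int((y - miny) / (maxy - miny) * scale)
    return interleave_bits(ix, iy, bits)


def partition_points(points, bounds, npartitions):
    """Sort points by Morton key and split them into roughly equal chunks."""
    if not points:
        return []
    ordered = sorted(points, key=lambda p: morton_key(p[0], p[1], bounds))
    size = -(-len(ordered) // npartitions)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
print(partition_points(points, bounds=(0.0, 0.0, 1.0, 1.0), npartitions=2))
```

Once partitions are spatially coherent, the bounding box of each one can act as a coarse spatial index, which is where the second roadmap item comes in.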
    Thomas Statham
    @tastatham
    thanks @martinfleis, that's really helpful. Ok, I'll consider in the proposal aspects that are important to my work too (which will probably be useful to others) but the geopandas/dask-geopandas#40 could be really useful!
    Martin Fleischmann
    @martinfleis
    @tastatham check also the initial implementation by Matt Rocklin https://github.com/mrocklin/dask-geopandas and spatialpandas, which implements some aspects of geospatial parallelization using dask - https://github.com/holoviz/spatialpandas.
    Thomas Statham
    @tastatham
    @martinfleis, I'm aware of spatialpandas but I'm not familiar with their API. I'll explore this further because they have some really nice examples.
    Martin Fleischmann
    @martinfleis
    @tastatham the API tends to mimic ours I think. What they have implemented is some form of spatial indexing (maybe only for points? Haven’t checked lately), which may be worth checking.
    Joris Van den Bossche
    @jorisvandenbossche
    @tastatham cool to see interest in dask-geopandas! ;)
    One general comment in addition to what Martin already said: it's indeed in an early stage of development, which also means that we have almost no documentation or examples, not much test coverage, etc. Those are of course not "big" topics to focus on, but something that will be useful to give some time to along the way.
    Joris Van den Bossche
    @jorisvandenbossche
    And related to that, what I think will be useful at some point: take an (advanced) use case (which can be something from your own work / interest, or a general one like the "NYC taxi data"), implement a full workflow using dask-geopandas, and see all the issues that you run into / areas that can be optimized.
    Since not a lot of people already did this with their workflow (and we don't have example case studies), I think in general we will still run into quite some issues in the beginning.
    Joris Van den Bossche
    @jorisvandenbossche
    (eg I was planning to replicate the benchmark from a Scipy talk with billions of points and spatial joins from the spatialpandas people; need to look up the source again, to learn from it what we are still missing in dask-geopandas to do this efficiently)
    Thomas Statham
    @tastatham
    thanks @jorisvandenbossche for pointers, the NYC taxi dataset is a good one and there are a number of others listed on datashader that would be worth exploring too and my own work of course
    Martin Fleischmann
    @martinfleis
    Hi @tastatham, @abhinav9414, @rajuthegr8,
    Just a quick reminder that you need to register and submit your proposals (the application shouldn’t be in draft mode) on the GSoC website, summerofcode.withgoogle.com before the deadline, April 13th.
    Joris Van den Bossche
    @jorisvandenbossche
    Hi @tastatham, @abhinav9414, @rajuthegr8, another reminder for the deadline. Also, feel free to share the proposal with us here if you still want to get some feedback on it.
    froast
    @abhinav9414
    hi @martinfleis, @jorisvandenbossche, I made a proposal, but I don't know what to write in the coding approach section for the beautiful maps and pgpkg projects. I started late, so I only have a basic understanding of the project. Can I get help with this?
    Martin Fleischmann
    @martinfleis
    @abhinav9414 You mean the "Approach" section in the template? It depends. I guess that for the plotting project it is fairly straightforward Python code based on existing matplotlib functionality. For the gpkg IO, there may be some SQL involved, in terms of technology. But it all depends on the rest of the proposal. You haven't shared the draft via GSoC so it is hard to advise. If you need some PRs in related packages it should be there etc.
    froast
    @abhinav9414
    thank you @martinfleis, I'll share within 2 hrs.
    Raj Gupta
    @rajuthegr8
    @martinfleis @jorisvandenbossche I have submitted my draft, I would appreciate any feedback if possible. Thank you and apologies for submitting so close to the deadline
    Joris Van den Bossche
    @jorisvandenbossche
    @rajuthegr8 I am not sure if I can see the drafts submitted on GSOC's website. Could you share it somewhere else as well? (eg copy it in a google docs or hackmd, where we can comment on it)?
    Thomas Statham
    @tastatham
    @martinfleis and @jorisvandenbossche, I have also completed my draft in Google Docs and would also appreciate some suggestions. Apologies for submitting this so close to the deadline.
    Joris Van den Bossche
    @jorisvandenbossche
    Thanks, taking a quick look at both now
    Martin Fleischmann
    @martinfleis
    @tastatham Thanks! I just checked your proposal and left a few (very) minor comments. It looks great! We may end up doing less but it is fine being ambitious :).
    Martin Fleischmann
    @martinfleis
    @rajuthegr8 one comment from my side in the doc, @jorisvandenbossche covered the rest.
    Martin Fleischmann
    @martinfleis
    @abhinav9414 I didn't realise you are planning to combine two projects into a single proposal. I have left a comment regarding that in the document (and didn't review the rest in detail as a result) but I would recommend focusing on a single project.
    Thomas Statham
    @tastatham
    @martinfleis @jorisvandenbossche , thanks for taking a look at the proposal. It's much appreciated.