    John Vandenberg
    @jayvdb
    It would be nice if we could read GeoJSON, and even better if we could write the same data back out as GeoJSON losslessly.
    I've been trying to do that with csvkit (https://github.com/wireservice/csvkit). It reads GeoJSON OK, but its csvjson fails to emit GeoJSON.
    John Vandenberg
    @jayvdb
    @chfw For now, I have created pyexcel/pyexcel-text#29 about geojson
    jaska
    @chfw
    For GeoJSON support, pyexcel needs one more abstraction on top of the existing book and sheet: a semantic schema, which can make sense of a two-dimensional array where Feature, Type, Polygon, Line and Point are mixed in one sheet. Looked at more broadly, such a semantic schema would cover more topics, such as a school teacher's exam report in xls or a company invoice. Even the UK census sheet had redundant info, which could be presented better with a specific semantic schema. What's more, such schemas could be made into sharable packages, and people could contribute the schemas they know of.
    pandas has a data reader, which I think falls short in this category as well.
    They have a bunch of specs for special schemas: https://github.com/wireservice/ffs
    csvkit supports reading CSV, DBF, fixed-width, GeoJSON, JSON, NDJSON, XLS and XLSX.
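The semantic-schema idea could look roughly like the minimal sketch below. Nothing here is a pyexcel API: `apply_schema`, `GEO_SCHEMA`, the discriminator column `type` and the `x`/`y`/`coords` columns are all hypothetical, chosen only to show how one sheet that mixes GeoJSON geometry kinds might be interpreted row by row.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Builder = Callable[[Row], Dict[str, Any]]


def point(row: Row) -> Dict[str, Any]:
    # hypothetical columns "x" and "y" hold one coordinate pair
    return {"type": "Point", "coordinates": [float(row["x"]), float(row["y"])]}


def line_string(row: Row) -> Dict[str, Any]:
    # hypothetical column "coords" holds pairs like "0 0;1 1"
    pairs = [pair.split() for pair in row["coords"].split(";")]
    return {"type": "LineString",
            "coordinates": [[float(x), float(y)] for x, y in pairs]}


# The "schema": which builder understands which kind of row.
GEO_SCHEMA: Dict[str, Builder] = {"Point": point, "LineString": line_string}


def apply_schema(array: List[List[Any]], schema: Dict[str, Builder],
                 tag_column: str = "type") -> List[Dict[str, Any]]:
    """Interpret a 2D array (header row first) using a per-row-type schema."""
    header, *body = array
    objects = []
    for cells in body:
        row = dict(zip(header, cells))
        # dispatch on the discriminator column; unknown tags raise KeyError
        objects.append(schema[row[tag_column]](row))
    return objects
```

Under this sketch, sharing a schema would amount to sharing a mapping like `GEO_SCHEMA` as a package.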
    John Vandenberg
    @jayvdb
    I've created pyexcel/pyexcel-text#30 for ndjson
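NDJSON is one JSON object per line, so mapping it to a two-dimensional array can be fairly direct. A minimal standard-library sketch follows; it is not pyexcel-text's implementation, and it assumes every object has the same keys as the first one (keys missing from a later object become None, extra keys are dropped).

```python
import json
from typing import Any, Iterable, Iterator, List


def read_ndjson(lines: Iterable[str]) -> Iterator[List[Any]]:
    """Yield a header row, then one row per JSON object (one object per line)."""
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if header is None:
            header = list(record)  # column order taken from the first object
            yield header
        yield [record.get(key) for key in header]


def write_ndjson(array: Iterable[List[Any]]) -> Iterator[str]:
    """Inverse direction: header row + data rows -> one JSON line per row."""
    rows = iter(array)
    header = next(rows, None)
    if header is None:
        return
    for row in rows:
        yield json.dumps(dict(zip(header, row))) + "\n"
```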
    John Vandenberg
    @jayvdb
    The csvkit DBF importer is agate-dbf. (I am not interested in this format; I only mention it to compare features with csvkit/agate.)
    Something like agate-lookup, using the static LOV tables stored in https://github.com/wireservice/lookup, would be very helpful.
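A lookup of this kind boils down to joining a code column against a small static table. The sketch below is hypothetical: it does not use agate-lookup or the actual file layout of the wireservice/lookup repository, and the `LOOKUP_CSV` contents are made up for illustration.

```python
import csv
import io
from typing import Any, Dict, List, Optional

# Hypothetical static lookup table, as it might be stored in a repository.
LOOKUP_CSV = """code,name
AU,Australia
GB,United Kingdom
"""


def load_lookup(text: str, key: str, value: str) -> Dict[str, str]:
    """Read a CSV lookup table into a key -> value dict."""
    return {row[key]: row[value] for row in csv.DictReader(io.StringIO(text))}


def add_lookup_column(array: List[List[Any]], column: str, table: Dict[str, str],
                      new_name: str, default: Optional[str] = None) -> List[List[Any]]:
    """Append a column whose value is table[row[column]], or default if unknown."""
    header, *body = array
    position = header.index(column)
    result = [header + [new_name]]
    for row in body:
        result.append(row + [table.get(row[position], default)])
    return result
```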
    John Vandenberg
    @jayvdb

    for geojson support, pyexcel needs one more abstraction on top of existing book, sheet.

    I really like the csvkit GeoJSON importer. I want to serialise the schema into a non-schema format, and it does that: it simply puts any complex values into columns as JSON blobs. This lets me work with the non-complex fields easily, and I can reconstruct the GeoJSON afterwards (once my patches are merged: wireservice/csvkit#868).
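The flattening approach described above (plain fields as columns, the complex geometry as a JSON blob) can be sketched in plain Python. This is an illustration under stated assumptions, not csvkit's code: every feature is assumed to carry only scalar property values with the same set of keys, no property may be named `id` or `geometry`, and top-level members other than `id`, `geometry` and `properties` are dropped. Under those assumptions the round trip is lossless.

```python
import json
from typing import Any, Dict, List


def flatten_features(collection: Dict[str, Any]) -> List[List[Any]]:
    """FeatureCollection -> header row + one row per feature.

    Each property becomes its own column; the geometry is kept as a JSON string.
    """
    features = collection["features"]
    keys: List[str] = []
    for feature in features:
        for key in feature.get("properties") or {}:
            if key not in keys:
                keys.append(key)  # preserve first-seen column order
    table: List[List[Any]] = [["id"] + keys + ["geometry"]]
    for feature in features:
        props = feature.get("properties") or {}
        table.append([feature.get("id")]
                     + [props.get(key) for key in keys]
                     + [json.dumps(feature["geometry"])])
    return table


def unflatten_features(table: List[List[Any]]) -> Dict[str, Any]:
    """Inverse of flatten_features: parse the geometry blob back into GeoJSON."""
    header, *body = table
    features = []
    for cells in body:
        row = dict(zip(header, cells))
        feature: Dict[str, Any] = {"type": "Feature",
                                   "geometry": json.loads(row.pop("geometry"))}
        feature_id = row.pop("id")
        if feature_id is not None:
            feature["id"] = feature_id  # "id" is optional in GeoJSON
        feature["properties"] = row  # whatever columns remain
        features.append(feature)
    return {"type": "FeatureCollection", "features": features}
```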

    John Vandenberg
    @jayvdb
    An interesting way to handle lots of repositories: https://github.com/gitshelf/gitshelf
    It supports adding links inside the repos, e.g. adding commons/ as a symlink to ../pyexcel-commons.
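What such a link amounts to on disk can be sketched with the standard library. This is not how gitshelf itself is configured or implemented; `link_shared_dir` is a hypothetical helper, and the directory names simply mirror the example above.

```python
import os
from pathlib import Path


def link_shared_dir(repo: Path, link_name: str, target: str) -> Path:
    """Create repo/link_name as a relative symlink to target, e.g. ../pyexcel-commons."""
    link = repo / link_name
    if link.is_symlink() or link.exists():
        raise FileExistsError(str(link))
    # a relative target is resolved from the directory that contains the link,
    # so "../pyexcel-commons" points at a sibling checkout of the repo
    os.symlink(target, link, target_is_directory=True)
    return link
```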
    jaska
    @chfw
    A dedicated xlsx reader, pyexcel-xlsxr, was recently released. It uses lxml instead of openpyxl or xlrd, so it should be faster, and it supports partial reading.
    Alcides Sorto
    @slonak79
    Hi everyone, I'm working with files where the first column is pre-filled with a default value for the first x rows, and I only want to import rows where the rest of the columns have values. skip_empty_rows does not work for me because the first column always has a value. I see we can pass in a custom skip_row_func; my idea is to skip a row if the key column values are empty. My question is: what does skip_row_func have access to? That is, can I reference columns by name? Thank you.
    jaska
    @chfw
    Unfortunately, skip_row_func works only with row indices. Here is the default implementation for any skip_row/column_func: https://github.com/pyexcel/pyexcel-io/blob/master/pyexcel_io/utils.py#L44
    jaska
    @chfw
    However, that's not the end of the world. You may consider the iget_array() method, which provides powerful data manipulation capabilities. For instance:
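    import pyexcel as p
    
    def filter_out(rows):
        # keep every row except those whose first cell is the unwanted value
        for row in rows:
            if row[0] != 'what I do not want':
                yield row
    
    # iget_array() returns a generator, so rows are streamed rather than loaded at once
    data_generator = p.iget_array(
        file_name="your_file.csv"
    )
    p.isave_as(
        array=filter_out(data_generator),
        dest_file_name="my_filtered.csv"
    )
    # close the file handle left open by the iget_* call
    p.free_resources()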
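Applying the same generator pattern to the original question, rows can be filtered by column name if the first row is treated as a header. In this sketch, "empty" is assumed to mean None or a whitespace-only string, and a row is skipped only when all key columns are empty (change `not all(...)` to `not any(...)` to require every key column to be filled). The file and column names in the usage comment are hypothetical.

```python
from typing import Any, Iterable, Iterator, List, Sequence


def _is_empty(value: Any) -> bool:
    # assumption: None and whitespace-only strings count as empty cells
    return value is None or (isinstance(value, str) and not value.strip())


def skip_rows_missing_keys(rows: Iterable[List[Any]],
                           key_columns: Sequence[str]) -> Iterator[List[Any]]:
    """Yield the header, then only rows where some key column has a value."""
    iterator = iter(rows)
    header = next(iterator, None)
    if header is None:
        return
    yield header
    positions = [header.index(name) for name in key_columns]  # names -> indices
    for row in iterator:
        # short rows are treated as having empty trailing cells
        cells = [row[i] if i < len(row) else None for i in positions]
        if not all(_is_empty(cell) for cell in cells):
            yield row

# Usage with pyexcel (file and column names are hypothetical):
#   import pyexcel as p
#   rows = p.iget_array(file_name="your_file.csv")
#   p.isave_as(array=skip_rows_missing_keys(rows, ["amount", "date"]),
#              dest_file_name="my_filtered.csv")
#   p.free_resources()
```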
    Alcides Sorto
    @slonak79
    Thank you
    Ant Zucaro
    @antzucaro
    Hi everyone. I've noticed some regressions with the latest version of openpyxl. Here's the backtrace using a fresh virtualenv:
    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        book = pyexcel.get_book(file_stream=f, file_type="xlsx")
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel/core.py", line 46, in get_book
        book_stream = sources.get_book_stream(**keywords)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel/internal/core.py", line 33, in get_book_stream
        sheets = a_source.get_data()
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel/plugins/sources/memory_input.py", line 34, in get_data
        **self._keywords)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel/plugins/parsers/excel.py", line 21, in parse_file_stream
        file_stream, file_type=self._file_type, **keywords)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel/plugins/parsers/excel.py", line 35, in _parse_any
        anything, file_type=file_type, **keywords)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel_io/io.py", line 71, in get_data
        afile, file_type=file_type, streaming=False, **keywords
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel_io/io.py", line 89, in _get_data
        return load_data(**keywords)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel_io/io.py", line 191, in load_data
        reader.open_stream(file_stream, **keywords)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel_xlsx/xlsxr.py", line 146, in open_stream
        self._load_the_excel_file(file_stream)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/pyexcel_xlsx/xlsxr.py", line 194, in _load_the_excel_file
        read_only=read_only_flag)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/openpyxl/reader/excel.py", line 249, in load_workbook
        ws_parser.parse()
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/openpyxl/reader/worksheet.py", line 130, in parse
        dispatcher[tag_name](element)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/openpyxl/reader/worksheet.py", line 296, in parser_conditional_formatting
        cf = ConditionalFormatting.from_tree(element)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/openpyxl/descriptors/serialisable.py", line 100, in from_tree
        return cls(**attrib)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/openpyxl/formatting/formatting.py", line 33, in __init__
        self.sqref = sqref
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/openpyxl/descriptors/base.py", line 69, in __set__
        value = _convert(self.expected_type, value)
      File "/home/azucaro/tmp/new-pyexcel/lib/python2.7/site-packages/openpyxl/descriptors/base.py", line 59, in _convert
        raise TypeError('expected ' + str(expected_type))
    TypeError: expected <class 'openpyxl.worksheet.cell_range.MultiCellRange'>
    This happens with pyexcel-xlsx==0.5.6 and openpyxl==2.5.5. If I revert to pyexcel-xlsx==0.5.5 and openpyxl==2.4.9, the issue is not present.
    Anything I can do to help or troubleshoot, please let me know!
    jaska
    @chfw
    openpyxl 2.5.5 was used because it fixes Mac epoch (1904 date system) dates, and v0.5.6 did pass the unit tests. I will need to look into it closely.
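For context on the Mac epoch fix: Excel stores dates as day counts from one of two epochs, and workbooks using the 1904 date system (the old Excel for Mac default, flagged by `date1904` in the workbook properties) are offset by 1462 days from the usual 1900 system. The helper below is a minimal illustration of that conversion, not openpyxl's implementation.

```python
from datetime import datetime, timedelta

# Day zero of each Excel date system. The 1900 system is anchored at
# 1899-12-30 because Excel counts a non-existent 1900-02-29, so this mapping
# is only correct for serials >= 61 (1900-03-01 onwards).
EPOCH_1900 = datetime(1899, 12, 30)
EPOCH_1904 = datetime(1904, 1, 1)  # the "Mac" 1904 date system


def from_excel_serial(serial: float, date1904: bool = False) -> datetime:
    """Convert an Excel date serial (fraction = time of day) to a datetime."""
    epoch = EPOCH_1904 if date1904 else EPOCH_1900
    return epoch + timedelta(days=serial)
```

Reading a 1904-system workbook as if it used the 1900 system therefore shifts every date by 1462 days, roughly four years.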