Pedro Rodriguez
@EntilZha
although for now, they are just aliases
I could probably find a better way to reduce code duplication though
good idea for tests though
Pedro Rodriguez
@EntilZha
Changing LazyFile to this solves the problem:
class LazyFile(object):
    # pylint: disable=too-few-public-methods,too-many-instance-attributes
    def __init__(self, path, delimiter=None, mode='r', buffering=-1, encoding=None,
                 errors=None, newline=None):
        # pylint: disable=too-many-arguments
        self.path = path
        self.delimiter = delimiter
        self.mode = mode
        self.buffering = buffering
        self.encoding = encoding
        self.errors = errors
        self.newline = newline
        self.file = None

    def __iter__(self):
        if self.file is not None:
            self.file.close()
        self.file = builtins.open(self.path, mode=self.mode, buffering=self.buffering,
                                  encoding=self.encoding, errors=self.errors, newline=self.newline)
        return self.file
however, the file stays open...
Pedro Rodriguez
@EntilZha
This fixes it
class LazyFile(object):
    # pylint: disable=too-few-public-methods,too-many-instance-attributes
    def __init__(self, path, delimiter=None, mode='r', buffering=-1, encoding=None,
                 errors=None, newline=None):
        # pylint: disable=too-many-arguments
        self.path = path
        self.delimiter = delimiter
        self.mode = mode
        self.buffering = buffering
        self.encoding = encoding
        self.errors = errors
        self.newline = newline
        self.file = None

    def __iter__(self):
        with builtins.open(self.path,
                           mode=self.mode,
                           buffering=self.buffering,
                           encoding=self.encoding,
                           errors=self.errors,
                           newline=self.newline) as file_content:
            for line in file_content:
                yield line
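A minimal sketch of why the generator-based version behaves better, assuming a simplified stand-in class (`LazyLines` and its `last_handle` attribute are hypothetical names used only for this demo, not part of the library). It shows that each iteration re-opens the file and that the `with` block closes it once the generator is exhausted:

```python
import os
import tempfile


class LazyLines:
    """Hypothetical stand-in for LazyFile: re-opens the file on every iteration."""

    def __init__(self, path, encoding=None):
        self.path = path
        self.encoding = encoding
        self.last_handle = None  # kept only so the demo can inspect the handle

    def __iter__(self):
        # The with-block closes the file once the generator is exhausted
        with open(self.path, encoding=self.encoding) as handle:
            self.last_handle = handle
            for line in handle:
                yield line


# Write a small temporary file to iterate over
fd, demo_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("a\nb\nc\n")

lazy = LazyLines(demo_path)
first_pass = list(lazy)
second_pass = list(lazy)  # iterating again opens a fresh handle
closed_after = lazy.last_handle.closed
os.remove(demo_path)
```

One caveat: if the consumer stops early, the file stays open until the generator is closed or garbage-collected.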
Pedro Rodriguez
@EntilZha
Alright, the fix is pushed to master
Pedro Rodriguez
@EntilZha
Anyone have suggestions for the docs? They look good to me but a second opinion would be nice
Adrian Wielgosik
@adrian17
Nice, a fix that simplifies code is a great fix.
I don't really have anything to say about the docs.
Pedro Rodriguez
@EntilZha
Cool. Will be taking a last lookover, adding some docs to some of the internally used modules/functions/classes, then preparing for release
Pedro Rodriguez
@EntilZha
Well, after a while of wrestling with the release, it's done. While releasing 0.4.0 I found an issue: the wheel distribution depends on the build environment (building on Python 2 includes enum34, which breaks Python 3, while building on Python 3 leaves out enum34, which breaks Python 2), so I needed to make a hotfix release, 0.4.1
The solution was to completely remove enum34, since I don't use it all that much, so it's not worth the hassle (and I don't trust enum-compat either)
Adrian Wielgosik
@adrian17
Hm, Google Groups? Seems like overkill, with how little talk happens here and on GH issues. Also, making the contents hidden is unusual.
Adrian Wielgosik
@adrian17

Small suggestions:

  • make __str__ show only part of the sequence, as in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...]. Would be even greater if this only evaluated this part, as this would make it usable with infinite generators.*
  • make __repr__ show what it really is; maybe <Sequence: [1, 2, 3, 4...]>, or not evaluate at all and show <Sequence wrapping list>, <Sequence wrapping generator>, <Sequence wrapping File>
  • add seq.count as a wrapper for itertools.count? I'm not sure how far down the "replace seq(thing()) with seq.thing()" path you want to go, that may be an overkill.

*(seq(itertools.count()) just killed my computer :P)
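The first suggestion could be sketched roughly as follows, assuming a standalone helper (`preview` and its `limit` parameter are hypothetical names, not part of the library). It pulls at most `limit + 1` items with `itertools.islice`, so it also terminates on an infinite generator:

```python
from itertools import count, islice


def preview(iterable, limit=12):
    """Render at most `limit` items, consuming no more than limit + 1 of them."""
    # One extra item tells us whether anything was left out
    head = list(islice(iterable, limit + 1))
    shown = ", ".join(repr(item) for item in head[:limit])
    suffix = ", ..." if len(head) > limit else ""
    return "[" + shown + suffix + "]"


short_text = preview([1, 2, 3])
infinite_text = preview(count(1), limit=5)
```

Note that previewing a one-shot generator this way consumes the items it displays, so a real implementation would need to buffer them or accept that loss.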

Pedro Rodriguez
@EntilZha
I think for __str__ it should do the same thing as lists, which I think prints the whole thing?
the main reason for __repr__ to do the same thing is for ipython notebook
and interactive python terminal sessions
double checking right now, but I believe these use repr
Ya, it uses repr
Pedro Rodriguez
@EntilZha
That's funny...
Adrian Wielgosik
@adrian17
The issue with __str__ is that the stream may not be list-like, for example a big file's contents that you don't plan on collecting into a list at any point, or an infinite generator.
hm, maybe __str__ and __repr__ could behave differently depending on type of underlying sequence?
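A rough sketch of that idea, assuming a hypothetical wrapper class (`Wrapped` is not the library's Sequence class): sized containers are shown in full, while anything that may be lazy or infinite is only named by type and never evaluated.

```python
import collections.abc
from itertools import count


class Wrapped:
    """Hypothetical wrapper used for illustration only."""

    def __init__(self, source):
        self.source = source

    def __repr__(self):
        # Sized containers such as lists are finite and safe to display
        if isinstance(self.source, collections.abc.Sized):
            return "<Sequence: %r>" % (self.source,)
        # Anything else may be lazy or infinite, so only name its type
        return "<Sequence wrapping %s>" % type(self.source).__name__


list_repr = repr(Wrapped([1, 2, 3]))
count_repr = repr(Wrapped(count()))
gen_repr = repr(Wrapped(x for x in range(3)))
```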
Pedro Rodriguez
@EntilZha
That would be possible
right now, there is a list of execution strategies
at the moment it contains only one
but this would be the mechanism to change this I think
Pedro Rodriguez
@EntilZha
Thought I would mention I am working on a new project (starting as a final course project in distributed systems) here https://github.com/EntilZha/GrappaRDD. It uses a C++ MPI library called Grappa as an execution engine to implement RDDs from spark. The interesting part is if it goes well, I might think about making python hooks to it via ScalaFunctional
Pedro Rodriguez
@EntilZha
Getting an odd failure only in Python 3.4 (3.3 and 3.5 work). If anyone has any thoughts, that would be useful: https://travis-ci.org/EntilZha/PyFunctional/jobs/128396745. It fails on the parallel tests for some reason
Pedro Rodriguez
@EntilZha
Looks like Travis uses 3.4.2 and the bug is fixed in 3.4.4
Pedro Rodriguez
@EntilZha

Planning on releasing 0.7.0 next week. The primary features shipping are the parallel execution engine and file compression support. The main website and docs are now at www.pyfunctional.org and docs.pyfunctional.org respectively (readthedocs can't change scalafunctional.readthedocs.org to pyfunctional.readthedocs.org, so this was an easy solution).

The release following this one will most likely be 1.0, so I'm open to thoughts on what should go in. So far, I would like to revamp the readme docs, the webpage at www.pyfunctional.org (currently a clone of the readme), and the actual docs; clarify the package description; revisit linq; revisit underscore; revisit supporting more than just sqlite; and/or look into adding an option/support to stream things (rather than force open iterables). All this is off the top of my head, so there could be more

Mingsterism
@mingsterism
Hi guys. I'm new to Spark and just came across PyFunctional. Just wondering, what's the difference between Python's built-in functional libraries and PyFunctional?
e.g. functools/itertools
Pedro Rodriguez
@EntilZha
@mingsterism The primary goal is to mimic the Scala collections and Spark RDD API. functools/itertools can accomplish the same goal, but it will just look different. The other difference is that there are a number of nice features out of the box, like reading json/jsonl/text/csv and, in the next release, parallelism
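To illustrate "it will just look different", here is a small standard-library-only sketch. The `Pipeline` class is a hypothetical fluent wrapper written for this comparison; it is not PyFunctional's API, only an approximation of the chained style that Scala collections and Spark RDDs use.

```python
from itertools import islice


class Pipeline:
    """Hypothetical fluent wrapper; illustrative only, not PyFunctional's API."""

    def __init__(self, items):
        self._items = items

    def map(self, func):
        return Pipeline(map(func, self._items))

    def filter(self, func):
        return Pipeline(filter(func, self._items))

    def take(self, n):
        return Pipeline(islice(self._items, n))

    def to_list(self):
        return list(self._items)


numbers = range(1, 11)

# Standard library: the calls nest and read inside-out
nested = list(islice(filter(lambda x: x % 2 == 0, map(lambda x: x * x, numbers)), 3))

# Fluent style: the same steps read left to right
chained = (
    Pipeline(numbers)
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
    .take(3)
    .to_list()
)
```

Both forms compute the first three even squares; the difference is purely in how the pipeline reads.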
Mingsterism
@mingsterism
I see. Ok. Thanks for the clarification. @EntilZha
Pedro Rodriguez
@EntilZha
0.7.0 is released now, thanks @ChuyuHsu and versae for the great work https://github.com/EntilZha/PyFunctional/releases/tag/v0.7.0. Also posted on python reddit https://www.reddit.com/r/Python/
Keith Yang
@keitheis
The work of 0.7 release is wonderful. Thank you!
Pedro Rodriguez
@EntilZha
Thanks @keitheis! Definitely wouldn't have had so much without the help of contributors so thank them too! As always, I would love to hear the parts you like about the library, and what is missing or could be improved
Mark Tse
@neverendingqs
Hey! Just got directed to this project by @carlthome from https://github.com/neverendingqs/pyiterable/issues/71#issuecomment-226752854. I'm off to bed, but just wanted to say hi and am interested to see if there's opportunities to consolidate or even jump on board here.
Pedro Rodriguez
@EntilZha
@neverendingqs Hey! It would be great to chat more to see where things could go. I'll spend some time looking over pyiterable and will be on gitter some tomorrow
Looks like docs aren't public for some reason http://pyiterable.readthedocs.io/en/latest/
Would be great to hear about any ideas you have in mind
Mark Tse
@neverendingqs
Woops sorry I updated the readme to point to stable: http://pyiterable.readthedocs.io/en/stable/
Pedro Rodriguez
@EntilZha
@neverendingqs just let me know if you still wanted to chat
Mark Tse
@neverendingqs
Work's been busy so I might just hang around here for now and listen in.
Pedro Rodriguez
@EntilZha
Np. Not sure exactly what I want to do next. A SQL to lineage compiler would be interesting. It would be along the lines of letting you query Python objects with SQL, in a similar way to how you can write Spark SQL queries against a data frame. I might also be interested in venturing out of data pipelines to a better _ operator or pattern matching (that would be really hard to do I think, been looking at existing libraries). Open to suggestions
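As a very rough illustration of what "querying Python objects with SQL" could feel like, the sketch below copies namedtuples into an in-memory SQLite table and runs a query on them. This is a swapped-in technique, not a SQL to lineage compiler, and `User` and `query_objects` are hypothetical names invented for the example.

```python
import sqlite3
from collections import namedtuple

User = namedtuple("User", ["name", "age"])  # hypothetical record type


def query_objects(rows, table, sql):
    """Load namedtuples into an in-memory SQLite table and run `sql` against it."""
    if not rows:
        return []
    columns = ", ".join(rows[0]._fields)
    placeholders = ", ".join("?" for _ in rows[0]._fields)
    conn = sqlite3.connect(":memory:")
    try:
        # Table and column names are interpolated directly: demo use only,
        # never do this with untrusted input
        conn.execute("CREATE TABLE %s (%s)" % (table, columns))
        conn.executemany("INSERT INTO %s VALUES (%s)" % (table, placeholders), rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


users = [User("ana", 31), User("bo", 19), User("cy", 45)]
adults = query_objects(users, "users", "SELECT name FROM users WHERE age > 30 ORDER BY name")
```

A real compiler would translate the SQL into pipeline operations instead of copying the data, which is the hard part this sketch avoids.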