Pedro Rodriguez
@EntilZha
but one thing I could do: after the file is iterated over, if nothing else happens to it, the file stays open. If it is iterated over again, __iter__ is called, so I can close the old file before making a new one
but closing the file that is iterated over requires catching the StopIteration exception, which I can't do if I define a custom next
going to test that real quick
Adrian Wielgosik
@adrian17
On another topic... I'm slightly concerned about code duplication for select/map, where/filter. They're basically aliases, the only thing that's really differing between them is the debug lineage text.
Another thing, about tests, it's possible that if you mocked file IO (instead of using real files), you may be able to check if the file is properly opened/closed and iterated no more than necessary.
Just random suggestions I guess
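A minimal sketch of the mocked file IO idea, assuming Python 3.8+ (where `unittest.mock.mock_open` supports iteration). `read_lines` and `check_open_close` are hypothetical names for illustration, not part of the library's test suite:

```python
from unittest import mock


def read_lines(path):
    """Yield lines lazily; the with-block closes the file once iteration ends."""
    with open(path) as handle:
        for line in handle:
            yield line


def check_open_close():
    """Run read_lines against a fake open() and report what happened to it."""
    fake_open = mock.mock_open(read_data="a\nb\n")
    with mock.patch("builtins.open", fake_open):
        lines = list(read_lines("data.txt"))
    handle = fake_open.return_value
    # The with-statement calls __exit__ rather than close(), so count that.
    return lines, fake_open.call_count, handle.__exit__.call_count
```

Because no real file is touched, the test can assert that the file was opened exactly once and that its context manager was exited exactly once.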
Pedro Rodriguez
@EntilZha
true
I am thinking about, sometime in the future, allowing the LINQ functions (select/where/etc) to take strings
which are parsed
somehow
and applied
although for now, they are just aliases
I could probably find a better way to reduce code duplication though
good idea for tests though
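One possible way to cut the alias duplication while keeping a distinct lineage label per name, sketched on a toy class. `Pipeline`, `_transform`, and `_make_op` are hypothetical and do not reflect the library's actual internals:

```python
class Pipeline:
    """Toy stand-in for a sequence type that records a lineage of operations."""

    def __init__(self, items, lineage=()):
        self._items = list(items)
        self.lineage = tuple(lineage)

    def _transform(self, name, func, builtin):
        # Apply the built-in (map or filter) and record the name the caller used.
        return Pipeline(builtin(func, self._items), self.lineage + (name,))

    def to_list(self):
        return list(self._items)


def _make_op(name, builtin):
    """Build a method that differs from its alias only in the lineage label."""
    def op(self, func):
        return self._transform(name, func, builtin)
    op.__name__ = name
    return op


# Each alias pair shares one implementation; only the label differs.
for _name, _builtin in [("map", map), ("select", map),
                        ("filter", filter), ("where", filter)]:
    setattr(Pipeline, _name, _make_op(_name, _builtin))
```

For example, `Pipeline([1, 2, 3, 4]).select(lambda x: x * 2).where(lambda x: x > 4)` records the lineage `("select", "where")` even though it runs the same code as `map` and `filter`.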
Pedro Rodriguez
@EntilZha
Changing LazyFile to this solves the problem:
class LazyFile(object):
    # pylint: disable=too-few-public-methods,too-many-instance-attributes
    def __init__(self, path, delimiter=None, mode='r', buffering=-1, encoding=None,
                 errors=None, newline=None):
        # pylint: disable=too-many-arguments
        self.path = path
        self.delimiter = delimiter
        self.mode = mode
        self.buffering = buffering
        self.encoding = encoding
        self.errors = errors
        self.newline = newline
        self.file = None

    def __iter__(self):
        if self.file is not None:
            self.file.close()
        self.file = builtins.open(self.path, mode=self.mode, buffering=self.buffering,
                                  encoding=self.encoding, errors=self.errors, newline=self.newline)
        return self.file
however, the file stays open...
Pedro Rodriguez
@EntilZha
This fixes it
class LazyFile(object):
    # pylint: disable=too-few-public-methods,too-many-instance-attributes
    def __init__(self, path, delimiter=None, mode='r', buffering=-1, encoding=None,
                 errors=None, newline=None):
        # pylint: disable=too-many-arguments
        self.path = path
        self.delimiter = delimiter
        self.mode = mode
        self.buffering = buffering
        self.encoding = encoding
        self.errors = errors
        self.newline = newline
        self.file = None

    def __iter__(self):
        with builtins.open(self.path,
                           mode=self.mode,
                           buffering=self.buffering,
                           encoding=self.encoding,
                           errors=self.errors,
                           newline=self.newline) as file_content:
            for line in file_content:
                yield line
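A small self-contained check of why the generator version closes the file, using a simplified stand-in rather than the real LazyFile. `LazyLines`, `last_handle`, and `demo` are hypothetical names added only so the behavior can be inspected:

```python
import os
import tempfile


class LazyLines:
    """Simplified stand-in: reopens the file on every iteration."""

    def __init__(self, path):
        self.path = path
        self.last_handle = None  # kept only so the demo can inspect it

    def __iter__(self):
        # The with-block exits, and closes the file, when the loop finishes.
        with open(self.path) as handle:
            self.last_handle = handle
            for line in handle:
                yield line


def demo():
    """Iterate twice over a temporary file and report whether it was closed."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("a\nb\n")
    try:
        lazy = LazyLines(tmp.name)
        first = list(lazy)
        closed_after_first = lazy.last_handle.closed
        second = list(lazy)
        return first, second, closed_after_first
    finally:
        os.remove(tmp.name)
```

One caveat: if iteration is abandoned partway through, the file is closed only when the generator itself is closed or garbage collected.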
Pedro Rodriguez
@EntilZha
Alright, the fix is pushed to master
Pedro Rodriguez
@EntilZha
Anyone have suggestions for the docs? They look good to me but a second opinion would be nice
Adrian Wielgosik
@adrian17
Nice, a fix that simplifies code is a great fix.
I don't really have anything to say about the docs.
Pedro Rodriguez
@EntilZha
Cool. Will be taking a last lookover, adding some docs to some of the internally used modules/functions/classes, then preparing for release
Pedro Rodriguez
@EntilZha
Well, after a while of wrestling with the release, it's done. In releasing 0.4.0 I found an issue: the wheel distribution depends on the build environment (i.e., building on Python 2 includes enum34, which breaks Python 3, while building on Python 3 leaves out enum34, which breaks Python 2), so I needed to make a hotfix release, 0.4.1
The solution was to completely remove enum34, since I don't use it all that much, so it's not worth the monolithic hassle (and I don't trust enum-compat either)
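To illustrate the kind of build-time dependency check that produces this problem, here is a sketch. It is not the project's actual setup.py, and `build_time_requires` and `MARKER_REQUIRES` are hypothetical names:

```python
import sys


def build_time_requires(version_info=sys.version_info):
    """Dependency list decided by whichever interpreter runs the build."""
    requires = []
    if tuple(version_info[:2]) < (3, 4):
        # Baked into the wheel's metadata if the wheel is built on Python 2.
        requires.append("enum34")
    return requires


# Declarative alternative: an environment marker is evaluated by the installer
# on the target interpreter instead of being fixed at build time.
MARKER_REQUIRES = ['enum34; python_version < "3.4"']
```

A wheel built from the first form carries whatever list the build interpreter computed, which matches the breakage described above. The conversation settled on dropping enum34 entirely instead of either approach.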
Adrian Wielgosik
@adrian17
Hm, Google Groups? Seems like overkill, with how little talk happens here and on GH issues. Also, making the contents hidden is unusual.
Adrian Wielgosik
@adrian17

Small suggestions:

  • make __str__ show only part of the sequence, as in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...]. Would be even greater if this only evaluated this part, as this would make it usable with infinite generators.*
  • make __repr__ show what it really is; maybe <Sequence: [1, 2, 3, 4...]> or not evaluate at all and show <Sequence wrapping list>, <Sequence wrapping generator>, <Sequence wrapping File>
  • add seq.count as a wrapper for itertools.count? I'm not sure how far down the "replace seq(thing()) with seq.thing()" path you want to go, that may be an overkill.

*(seq(itertools.count()) just killed my computer :P)
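A sketch of the truncated __str__ idea using itertools.islice, so only the displayed prefix is evaluated. `preview` and its `limit` parameter are hypothetical, not an existing library function:

```python
import itertools


def preview(iterable, limit=10):
    """Render at most `limit` items, pulling one extra to detect truncation."""
    head = list(itertools.islice(iterable, limit + 1))
    shown = ", ".join(repr(item) for item in head[:limit])
    if len(head) > limit:
        shown += ", ..."
    return "[" + shown + "]"
```

This works on infinite generators such as itertools.count(), but note that for a one-shot iterator the previewed items are consumed by the call.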

Pedro Rodriguez
@EntilZha
I think for __str__ it should do the same thing as lists, which I think prints the whole thing?
the main reason for __repr__ to do the same thing is for ipython notebook
and interactive python terminal sessions
double checking right now, but I believe these use repr
Ya, it uses repr
Pedro Rodriguez
@EntilZha
That's funny...
Adrian Wielgosik
@adrian17
The issue with __str__ is that the stream may not be list-like, for example a big file's contents which you don't plan on collecting to a list at any point, or an infinite generator.
hm, maybe __str__ and __repr__ could behave differently depending on type of underlying sequence?
Pedro Rodriguez
@EntilZha
That would be possible
right now, there is a list of execution strategies
although it currently contains only one
but this would be the mechanism to change this I think
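A rough sketch of a __repr__ that depends on the underlying type: it evaluates only containers that are already in memory and otherwise just names what is wrapped. `Seq` here is a hypothetical toy class, not the library's Sequence:

```python
import itertools


class Seq:
    """Toy wrapper whose repr never forces a lazy or infinite source."""

    def __init__(self, source):
        self._source = source

    def __repr__(self):
        if isinstance(self._source, (list, tuple)):
            # Already materialized, so showing the contents is safe.
            return "<Seq: {!r}>".format(list(self._source))
        # Generators, files, and other iterators are named but not evaluated.
        return "<Seq wrapping {}>".format(type(self._source).__name__)
```

With this, `repr(Seq(itertools.count()))` returns immediately instead of trying to evaluate an infinite stream.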
Pedro Rodriguez
@EntilZha
Thought I would mention I am working on a new project (starting as a final course project in distributed systems) here: https://github.com/EntilZha/GrappaRDD. It uses a C++ MPI library called Grappa as an execution engine to implement RDDs from Spark. The interesting part is that if it goes well, I might think about making Python hooks to it via ScalaFunctional
Pedro Rodriguez
@EntilZha
Getting an odd failure in Python 3.4 only (3.3 and 3.5 work). If anyone has any thoughts, that would be useful: https://travis-ci.org/EntilZha/PyFunctional/jobs/128396745. It fails on the parallel tests for some reason
Pedro Rodriguez
@EntilZha
Looks like Travis uses 3.4.2 and the bug is fixed in 3.4.4
Pedro Rodriguez
@EntilZha

Planning on releasing 0.7.0 next week; the primary features that will be shipping are the parallel execution engine and file compression support. The main website and docs are now at www.pyfunctional.org and docs.pyfunctional.org respectively (readthedocs can't change scalafunctional.readthedocs.org to pyfunctional.readthedocs.org so this was an easy solution).

The release following this one will most likely be 1.0, so I'm open to thoughts on what should go in. So far, I would like to revamp the readme docs, the webpage at www.pyfunctional.org (currently a clone of the readme), and the actual docs, clarify the package description, revisit linq, revisit underscore, revisit supporting more than just sqlite, and/or look into adding an option/support to stream things (rather than force open iterables). All this is off the top of my head so there could be more

Mingsterism
@mingsterism
hi guys. I'm new to Spark. just came across pyfunctional. just wondering what's the difference between Python's built-in functional libraries and pyfunctional
e.g. functools/itertools
Pedro Rodriguez
@EntilZha
@mingsterism The primary goal is to mimic the Scala collections and Spark RDD API. functools/itertools can accomplish the same goal, but it will just look different. The other difference is that there are a number of nice features out of the box, like reading json/jsonl/text/csv and, in the next release, parallelism
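To make "it will just look different" concrete, here is the same small computation written with only the standard library. The chained form in the comment follows the style of the project's README and needs the package installed, so it is left as a comment. `doubled_over_four_sum` is a hypothetical name:

```python
import functools


def doubled_over_four_sum(numbers):
    """Double each number, keep results above 4, and add them up."""
    doubled = map(lambda x: x * 2, numbers)
    kept = filter(lambda x: x > 4, doubled)
    return functools.reduce(lambda x, y: x + y, kept)


# Chained equivalent in PyFunctional style (requires the package):
# seq(1, 2, 3, 4).map(lambda x: x * 2).filter(lambda x: x > 4).reduce(lambda x, y: x + y)
```

The standard-library version reads inside-out or needs intermediate names, while the chained version reads left to right in the order the steps run.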
Mingsterism
@mingsterism
I see. Ok. Thanks for the clarification. @EntilZha
Pedro Rodriguez
@EntilZha
0.7.0 is released now, thanks @ChuyuHsu and versae for the great work https://github.com/EntilZha/PyFunctional/releases/tag/v0.7.0. Also posted on python reddit https://www.reddit.com/r/Python/