Pedro Rodriguez
@EntilZha
Getting an odd failure in Python 3.4 only (3.3 and 3.5 work); if anyone has any thoughts that would be useful: https://travis-ci.org/EntilZha/PyFunctional/jobs/128396745. It fails on the parallel tests for some reason
Pedro Rodriguez
@EntilZha
Looks like Travis uses 3.4.2 and the bug is fixed in 3.4.4
Pedro Rodriguez
@EntilZha

Planning on releasing 0.7.0 next week; the primary features that will be shipping are the parallel execution engine and file compression support. The main website and docs are now at www.pyfunctional.org and docs.pyfunctional.org respectively (readthedocs can't change scalafunctional.readthedocs.org to pyfunctional.readthedocs.org, so this was an easy solution).

The release following this one will most likely be 1.0, so I'm open to thoughts on what should go in. So far, I would like to revamp the readme docs, the webpage at www.pyfunctional.org (currently a clone of the readme), and the actual docs; clarify the package description; revisit linq; revisit underscore; revisit supporting more than just sqlite; and/or look into adding an option/support to stream things (rather than force-open iterables). All this is off the top of my head, so there could be more

Mingsterism
@mingsterism
Hi guys, I'm new to Spark and just came across PyFunctional. Just wondering, what's the difference between Python's built-in functional libraries (e.g. functools/itertools) and PyFunctional?
Pedro Rodriguez
@EntilZha
@mingsterism The primary goal is to mimic the Scala collections and Spark RDD API. functools/itertools can accomplish the same goal, but it will just look different. The other difference is that there are a number of nice features out of the box, like reading json/jsonl/text/csv and, in the next release, parallelism
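To make the difference concrete, here is a small stdlib-only sketch contrasting the two styles. The data and lambdas are made up for illustration, and the PyFunctional form is shown only as a comment (assuming the `seq` API described above):

```python
from functools import reduce

data = [1, 2, 3, 4, 5]

# Plain functools/itertools style: the composition reads inside-out.
total = reduce(
    lambda a, b: a + b,
    map(lambda x: x * 2, filter(lambda x: x % 2 == 1, data)),
)
print(total)  # odds 1, 3, 5 doubled -> 2 + 6 + 10 = 18

# A PyFunctional-style pipeline reads left to right instead, roughly:
#   seq(data).filter(lambda x: x % 2 == 1).map(lambda x: x * 2).sum()
```

Both compute the same result; the fluent chain is what PyFunctional borrows from Scala collections and Spark RDDs.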
Mingsterism
@mingsterism
I see. Ok. Thanks for the clarification. @EntilZha
Pedro Rodriguez
@EntilZha
0.7.0 is released now, thanks @ChuyuHsu and versae for the great work https://github.com/EntilZha/PyFunctional/releases/tag/v0.7.0. Also posted on python reddit https://www.reddit.com/r/Python/
Keith Yang
@keitheis
The work of 0.7 release is wonderful. Thank you!
Pedro Rodriguez
@EntilZha
Thanks @keitheis! Definitely wouldn't have gotten so much done without the help of contributors, so thank them too! As always, I would love to hear which parts of the library you like, and what is missing or could be improved
Mark Tse
@neverendingqs
Hey! Just got directed to this project by @carlthome from https://github.com/neverendingqs/pyiterable/issues/71#issuecomment-226752854. I'm off to bed, but just wanted to say hi and that I'm interested to see if there are opportunities to consolidate or even jump on board here.
Pedro Rodriguez
@EntilZha
@neverendingqs Hey! It would be great to chat more to see where things could go. I'll spend some time looking over pyiterable and will be on gitter some tomorrow
Looks like docs aren't public for some reason http://pyiterable.readthedocs.io/en/latest/
Would be great to hear about any ideas you have in mind
Mark Tse
@neverendingqs
Whoops, sorry! I updated the readme to point to stable: http://pyiterable.readthedocs.io/en/stable/
Pedro Rodriguez
@EntilZha
@neverendingqs just let me know if you still want to chat
Mark Tse
@neverendingqs
Work's been busy so I might just hang around here for now and listen in.
Pedro Rodriguez
@EntilZha
Np. Not sure exactly what I want to do next. A SQL-to-lineage compiler would be interesting; it would be along the lines of letting you query Python objects with SQL, similar to how you can write Spark SQL queries against a data frame. I might also be interested in venturing out of data pipelines to a better _ operator or pattern matching (that would be really hard to do, I think; I've been looking at existing libraries). Open to suggestions
Sivabudh Umpudh
@sivabudh
Can PyFunctional be used with Django's model collections?
Pedro Rodriguez
@EntilZha
Currently there isn't any special handling for it @sivabudh, but if you have something in mind it would be easy to add. For example, pandas doesn't do the right thing with seq, but I am working on detecting it as input and doing the correct thing (it gives a list of columns I believe atm but should give a list of rows)
Sivabudh Umpudh
@sivabudh

@EntilZha thank you for getting back to me. The use case I had in mind (which might already be supported by PyFunctional; I haven't tried, but thought of checking with the community first) is, for example, that I want to write the code below a bit more "functionally":

    # `customer` is a Django model, and has a one-to-many relationship with `circuit`
    customer_circuits = customer.circuit_set
    circuit_ids_that_violate_constraints = []
    for circuit_id, new_bandwidth in new_bandwidths.items():
        circuit = customer_circuits.get(cid=circuit_id)

        if new_bandwidth < circuit.bandwidth_minimum or new_bandwidth > circuit.bandwidth_maximum:
            circuit_ids_that_violate_constraints.append(circuit_id)

The code above could be expressed more "functionally" in Ruby, for example, along these lines (just pseudocode, please don't take it literally):

circuit_ids_that_violate_constraints = circuits
    .select { |circuit| new_bandwidth < circuit.bandwidth_minimum || new_bandwidth > circuit.bandwidth_maximum }
    .map { |circuit| circuit.cid }
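For reference, here is a stdlib-only Python sketch of the same pipeline. The `Circuit` namedtuple and the sample data are hypothetical stand-ins for the Django model rows, and the PyFunctional form is only sketched in a comment:

```python
from collections import namedtuple

# Hypothetical stand-in for the Django Circuit model.
Circuit = namedtuple('Circuit', ['cid', 'bandwidth_minimum', 'bandwidth_maximum'])

circuits = {
    1: Circuit(1, 10, 100),
    2: Circuit(2, 50, 60),
}
new_bandwidths = {1: 45, 2: 200}  # circuit_id -> requested bandwidth

# Keep the ids whose requested bandwidth falls outside [min, max].
circuit_ids_that_violate_constraints = [
    cid for cid, bw in new_bandwidths.items()
    if bw < circuits[cid].bandwidth_minimum or bw > circuits[cid].bandwidth_maximum
]
print(circuit_ids_that_violate_constraints)  # [2]

# With PyFunctional the same pipeline might read (untested sketch):
#   seq(new_bandwidths.items()) \
#       .filter(lambda kv: kv[1] < circuits[kv[0]].bandwidth_minimum
#                          or kv[1] > circuits[kv[0]].bandwidth_maximum) \
#       .map(lambda kv: kv[0]) \
#       .to_list()
```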
Pedro Rodriguez
@EntilZha
Hmm, I don't know Django's collections very well. I think they do work to push those filters down to SQL, so it might be difficult to get it to work at comparable performance
Shiqiao Du
@lucidfrontier45

@EntilZha
Do you know the smart_open library?
https://github.com/RaRe-Technologies/smart_open

I think seq.open could use this library as a backend to support S3 or HDFS and remove the gzip/bzip2-related code.

Benjamin Sims
@benjaminsims
Hi, just wondering if there is any support for CSVs with headers
So, csv.DictReader
Pedro Rodriguez
@EntilZha
I looked at smart_open and it seems like a good idea to move to it at some point.
@benjaminsims Currently there isn't, but it doesn't seem too hard to implement. It's mostly a matter of figuring out the right API. Currently csv.reader is used in seq.csv, so if it's not possible to pass the right parameters to it, then it might make sense to add another method like seq.csv_dict
Pedro Rodriguez
@EntilZha
If you have an example in mind, that would be useful. Looks like csv.DictReader makes each row a dict keyed by a list of field names or, if that's not given, by the first row of the CSV file. Seems like having another method is good here; the question is what to name it.
This would play well with this issue as well, as it turns out: EntilZha/PyFunctional#91. EntilZha/PyFunctional#92 is for implementing that
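To illustrate what such a headers-aware method would wrap, here is a minimal stdlib sketch of csv.DictReader (the sample data is made up; seq.csv_dict above is only a proposed name, not an existing API):

```python
import csv
import io

# csv.DictReader yields one dict per row, keyed by the header row
# (or by an explicit fieldnames list if one is passed).
raw = "name,age\nalice,30\nbob,25\n"
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]['name'], rows[1]['age'])  # alice 25
```

Note all values come back as strings; any type conversion is left to the caller.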
Pedro Rodriguez
@EntilZha
@benjaminsims that has been added via EntilZha/PyFunctional#92. I am planning on doing the 1.0 release sometime in the next week, since I would like to have the starmap stuff on pip
Benny Elgazar
@Bennyelg
Is there a way to use to_csv but specify append mode?
Pedro Rodriguez
@EntilZha
@Bennyelg Did you try passing mode='wa'? That gets passed to open, which if I recall correctly opens the file in append mode
Benny Elgazar
@Bennyelg
map(lambda cell: [cell, cell_level, user_input.area_name]).to_csv('/tmp/test.csv', mode='wa')

  File "/usr/local/lib/python2.7/dist-packages/functional/pipeline.py", line 1498, in to_csv
    with universal_write_open(path, mode=mode, compression=compression) as output:
  File "/usr/local/lib/python2.7/dist-packages/functional/io.py", line 221, in universal_write_open
    newline=newline)
ValueError: must have exactly one of read/write/append mode

map(lambda cell: [cell, cell_level, user_input.area_name]).to_csv('/tmp/test.csv', mode=u'a')

  File "/usr/local/lib/python2.7/dist-packages/functional/pipeline.py", line 1501, in to_csv
    csv_writer.writerow([six.u(str(element)) for element in row])
TypeError: must be unicode, not str
Pedro Rodriguez
@EntilZha
Sorry, try just 'a'
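The mechanics being discussed, sketched with just the stdlib (the path is a throwaway temp file; to_csv passing its mode through to open() is the behavior described above, shown here with csv.writer directly):

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.csv')

# mode 'a' opens for appending, so a second write adds rows
# instead of truncating the file the way 'w' would.
for batch in ([['a', 1]], [['b', 2]]):
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerows(batch)

with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)  # [['a', '1'], ['b', '2']]
```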
Ram Janovski
@RamJanovski_twitter
hey Pedro, great package :)
I'm running into what looks like caching issues (a second call to a function uses the first call's results although the parameters are different)
is there a way to disable the caching/lazy nature?
Pedro Rodriguez
@EntilZha
Can you give an example? It's hard to say what to do without some more context.
Ram Janovski
@RamJanovski_twitter
The consistency issues came from my code, sorry. But I still need to add .cache() manually when accessing a seq several times; otherwise the second access ends up empty. So, is there no way to disable the lazy/cache optimization?
Pedro Rodriguez
@EntilZha
It would still be useful to have an example. Disabling lazy computation would be hard, but caching is normally only done in a select few cases, like calls to array access and repr
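The "second access ends up empty" symptom is easy to reproduce with a plain generator, which is roughly what a lazy sequence built from one behaves like (stdlib-only sketch; .cache() in PyFunctional is, loosely speaking, the materialization step):

```python
# A generator is consumed once; iterating it again yields nothing.
gen = (x * x for x in range(3))
first = list(gen)
second = list(gen)   # generator is already exhausted here
print(first, second)  # [0, 1, 4] []

# Materializing the results once makes repeated access safe.
cached = list(x * x for x in range(3))
print(cached, cached)  # same both times
```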
Florian Kromer
@fkromer
I want to write a little database wrapper. Is it possible to wrap files = seq.sqlite3(db_path, 'SELECT * FROM file').map(lambda x: FileTableRow(*x)) (FileTableRow is a namedtuple) into something like files(db_path)?
Florian Kromer
@fkromer
Figured out that wrapping a sequence generation into a function is simple:
def files(db_path):
    return seq.sqlite3(db_path, 'SELECT * FROM file').map(lambda x: FileTableRow(*x))
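A stdlib-only equivalent of that wrapper, runnable without PyFunctional (the in-memory database and the `file` table schema here are hypothetical stand-ins for db_path and the real table):

```python
import sqlite3
from collections import namedtuple

FileTableRow = namedtuple('FileTableRow', ['id', 'name'])

def files(conn):
    # Map each raw tuple from the query onto the namedtuple,
    # mirroring the seq.sqlite3(...).map(...) pipeline above.
    return [FileTableRow(*row) for row in conn.execute('SELECT * FROM file')]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE file (id INTEGER, name TEXT)')
conn.execute("INSERT INTO file VALUES (1, 'a.txt')")
rows = files(conn)
print(rows[0].name)  # a.txt
```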
York
@ysyyork
Is this lib still actively maintained?
It seems a lot of dependencies are outdated
Ⓜ️R. Developer 🇵🇰
@OyyNomi_twitter
Can anyone here help me convert a small piece of Python code into Scala?
from sklearn.preprocessing import LabelEncoder

y_train = train_['country_destination']
train_user.drop(['country_destination', 'id'], axis=1, inplace=True)
x_train = train_df.values

label_encoder = LabelEncoder()
encoded_y_train = label_encoder.fit_transform(y_train)
Ⓜ️R. Developer 🇵🇰
@OyyNomi_twitter
Can anyone explain what is train_ ?
And what is train_df.values ?