Joe Jevnik
@llllllllll
also, all the s3 tests fail with 403s, is the bucket not public?
Daniel Mahler
@mhlr
how can I use sql functions from blaze? in particular I would like to construct a query that uses the postgres levenshtein function. Is there some import that provides a python wrapper for it?
Danilo Horta
@horta
How can I contribute to Blaze and Odo more directly?
Joe Jevnik
@llllllllll
The easiest way would be to open issues or pull requests on github. The only warning I have is that we are often slow to respond to all of the issues that are opened
Joe Jevnik
@llllllllll
I am thinking about killing the in memory python iterator backend. this is really hard to maintain and I don't think anyone uses it. if you have any thoughts, please comment here: blaze/blaze#1610
Joe Jevnik
@llllllllll
@/all sorry for the ping, but I want to be sure that people are aware of this ^
rich fernandez
@richiverse

Is there an easier way of defaulting all dshape columns to optional strings?

def get_ds_columns(ds):
    return [item[0] for item in ds.parameters[1].__dict__['_parameters'][0]]

def do_not_infer_types(ds):
    return dshape('var * {{\n{}\n}}'.format(
        ',\n'.join('"{}": ?string'.format(item)
                   for item in get_ds_columns(ds))))

this works but is super hacky

Joe Jevnik
@llllllllll
You can create datashapes programmatically. You probably want something like:

Ugh, the gitter phone app is terrible, no newlines
from datashape import var, Record, Option, string
ds = var * Record([(name, Option(string)) for name in get_names()])
rich fernandez
@richiverse
thanks! I changed get_names() to ds[1].names
Joe Jevnik
@llllllllll
that was just a placeholder for however you were getting the names before, glad it worked
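For reference, the dshape string that the hacky helper above produces can also be built with a small, dependency-free function; `names` here is a hypothetical stand-in for however you get the column names (e.g. ds[1].names above), and the result can be passed to datashape's dshape() to get a DataShape object:

```python
def all_optional_strings(names):
    """Build a 'var * {...}' dshape string with every column typed ?string."""
    fields = ',\n'.join('"{}": ?string'.format(name) for name in names)
    return 'var * {{\n{}\n}}'.format(fields)

# Example with two hypothetical column names:
print(all_optional_strings(['id', 'title']))
```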
Daniel Mahler
@mhlr
is there a simple way to read a largish number of small, compressed, record-per-line JSON files?
the files do not have 'json' anywhere in the name, just "my-file-01.gz" etc.
the total amount of data fits in RAM, so efficiency is not a major issue, mainly convenience.
It would also be nice to be able to read the data directly from s3
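No answer was posted in the room, but for the local-files part of the question a plain-stdlib sketch works regardless of the file extension; the "my-file-*.gz" pattern is taken from the message above, and the s3 part would need an extra library such as s3fs or boto3 (not shown):

```python
import glob
import gzip
import json

def read_jsonl_gz(pattern):
    """Yield one record per line from every gzipped JSON-lines file
    matching `pattern` (the files need not be named *.json)."""
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, 'rt', encoding='utf-8') as fh:
            for line in fh:
                line = line.strip()
                if line:
                    yield json.loads(line)

records = list(read_jsonl_gz('my-file-*.gz'))
```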
villafrancog
@villafrancog
Anyone online?
Matthew Frei
@mattfrei27
I'm having a couple of problems with Blaze, including an apparent mishandling of missing values in the count function with some back ends: http://stackoverflow.com/questions/42170492/python-blaze-count-incorrect-for-some-back-ends. Can anyone help?
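The linked question concerns counts over columns with missing values. Independent of Blaze, this is the distinction a SQL backend has to get right: COUNT(*) counts rows, while COUNT(col) skips NULLs. A minimal stdlib sqlite3 illustration (hypothetical table t):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x INTEGER)')
conn.executemany('INSERT INTO t VALUES (?)', [(1,), (None,), (3,)])

# COUNT(*) counts all rows; COUNT(x) skips NULLs -- exactly the
# behaviour a count over a column with missing data depends on.
n_rows = conn.execute('SELECT COUNT(*) FROM t').fetchone()[0]
n_vals = conn.execute('SELECT COUNT(x) FROM t').fetchone()[0]
print(n_rows, n_vals)  # 3 2
```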
Matthew Frei
@mattfrei27
And here is my second (and more important) question: Blaze seems to be generating incorrect SQL for a GROUP BY - HAVING operation. Any help would be greatly appreciated! http://stackoverflow.com/questions/42171715/blaze-generating-invalid-sql-for-simple-sql-having-style-query
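Whatever SQL Blaze emits in that case, the valid shape of such a query puts the aggregate condition in a HAVING clause after GROUP BY, never in WHERE. A stdlib sqlite3 sketch with a hypothetical sales table:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (dept TEXT, amount REAL)')
conn.executemany('INSERT INTO sales VALUES (?, ?)',
                 [('a', 10.0), ('a', 5.0), ('b', 1.0)])

# The aggregate filter goes in HAVING, after GROUP BY; WHERE cannot
# reference the aggregate.
rows = conn.execute(
    'SELECT dept, SUM(amount) AS total '
    'FROM sales GROUP BY dept HAVING SUM(amount) > 6'
).fetchall()
print(rows)  # [('a', 15.0)]
```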
Dave Hirschfeld
@dhirschfeld
@mattfrei - I'm not entirely sure blaze is currently being actively worked on :/ If you haven't already I'd open issues on GH which could reference your SO questions so that they're not lost...
Matthew Frei
@mattfrei27
That seems to be true. Thanks I'll do that.
Ethan Tang
@ethantang95
Hey guys, I was wondering if Blaze can serve data from a distributed file system such as HDFS into a numpy array to be fed into tensorflow/keras/pytorch
EG: folders of images, or CSV
Eyal Ben Zion
@benzione
import blaze as bz
from time import time

def f(a, b):
    return a * b

t = time()
data1 = bz.data('File_INPUT.csv')
print('Load file time %.4f' % (time() - t))

t = time()
data2 = bz.transform(data1, PRICE = f(data1.MARKET_PRICE, data1.QTY))
print('Price buy time %.4f' % (time() - t))

t = time()
data3 = data2[data2.BUY_SELL_IND_ID == 1]
print('Select buy time %.4f' % (time() - t))

t = time()
data4 = bz.by(bz.merge(data3.DATE, data3.ID), BUY_SUM=data3.PRICE.sum())
print('Group by time %.4f' % (time() - t))

t = time()
data5 = data2[data2.BUY_SELL_IND_ID == 3]
print('Select buy time %.4f' % (time() - t))

t = time()
data6 = bz.by(bz.merge(data5.DATE, data5.ID), SELL_SUM=data5.PRICE.sum())
print('Group by time %.4f' % (time() - t))

t = time()
data7 = bz.join(data4, data6)
print('Join time %.4f' % (time() - t))

t = time()
bz.odo(data7, 'File_OUTPUT.csv')
print('Write to file time %.4f' % (time() - t))
I got:
Load file time 0.0500
Price buy time 0.0010
Select buy time 0.0010
Group by time 0.0020
Select buy time 0.0010
Group by time 0.0010
Join time 0.0010
Traceback (most recent call last):
  File "test.py", line 36, in <module>
    bz.odo(data7, 'Output/Sell_buy_price.csv')
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\odo\odo.py", line 91, in odo
    return into(target, source, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\multipledispatch\dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\interactive.py", line 404, in into
    result = compute(b, return_type='native', **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\multipledispatch\dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\interactive.py", line 195, in compute
    return compute(expr, resources, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\multipledispatch\dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 409, in compute
    result = top_then_bottom_then_top_again_etc(expr3, d4, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 153, in top_then_bottom_then_top_again_etc
    return compute_down(expr, *leaf_data, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\multipledispatch\dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\chunks.py", line 55, in compute_down
    return compute(agg_expr, {agg: intermediate})
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\multipledispatch\dispatcher.py", line 164, in __call__
    return func(*args, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 409, in compute
    result = top_then_bottom_then_top_again_etc(expr3, d4, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 158, in top_then_bottom_then_top_again_etc
    expr2, scope2 = bottom_up_until_type_break(expr, scope, **kwargs)
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 301, in bottom_up_until_type_break
    for i in inputs])
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 301, in <listcomp>
    for i in inputs])
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 301, in bottom_up_until_type_break
    for i in inputs])
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 301, in <listcomp>
    for i in inputs])
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 301, in bottom_up_until_type_break
    for i in inputs])
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 301, in <listcomp>
    for i in inputs])
  File "D:\Users\eyal.benzion\AppData\Local\Programs\Python\Python36\lib\site-packages\blaze\compute\core.py", line 301, in bottom_up_until_type_break
    for i in inputs])
ValueError: not enough values to unpack (expected 2, got 0)
What am I doing wrong?
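No fix was posted in the room; since the crash happens inside blaze's chunked CSV compute path, one hedged alternative is to run the same pipeline directly in pandas. This sketch uses hypothetical data with the column names from the snippet above (MARKET_PRICE, QTY, BUY_SELL_IND_ID, DATE, ID) in place of File_INPUT.csv:

```python
import pandas as pd

# Hypothetical stand-in for File_INPUT.csv.
df = pd.DataFrame({
    'DATE': ['d1', 'd1', 'd2'],
    'ID': [1, 1, 2],
    'MARKET_PRICE': [2.0, 3.0, 4.0],
    'QTY': [5, 5, 5],
    'BUY_SELL_IND_ID': [1, 3, 1],
})

# transform: PRICE = MARKET_PRICE * QTY
df['PRICE'] = df['MARKET_PRICE'] * df['QTY']

# select + group by, once for buys (IND == 1) and once for sells (IND == 3)
buys = (df[df['BUY_SELL_IND_ID'] == 1]
        .groupby(['DATE', 'ID'], as_index=False)['PRICE'].sum()
        .rename(columns={'PRICE': 'BUY_SUM'}))
sells = (df[df['BUY_SELL_IND_ID'] == 3]
         .groupby(['DATE', 'ID'], as_index=False)['PRICE'].sum()
         .rename(columns={'PRICE': 'SELL_SUM'}))

# join on the grouping keys
out = buys.merge(sells, on=['DATE', 'ID'], how='outer')
# out.to_csv('File_OUTPUT.csv', index=False)
```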
Priyabrata Dash
@bobquest33
Hi
anyone available?
I need some quick help
I have a blaze dataframe df_feed with dshape:
dshape("""20 * {
    id: string,
    link: string,
    published: datetime,
    title: string,
    summary: string
    }""")
when I run odo(df_feed, "feed.csv", dshape='var * {id: string, link: string, published: datetime, title: string, summary: string}')
I get this error

ValueError                                Traceback (most recent call last)
<ipython-input-41-2913f9310573> in <module>()
----> 1 odo(df_feed, "feed.csv", dshape='var * {id: string, link: string, published: datetime, title: string, summary: string}')

~\rcs\Miniconda36\Lib\site-packages\odo\odo.py in odo(source, target, **kwargs)
     89     odo.append.append  - Add things onto existing things
     90     """
---> 91     return into(target, source, **kwargs)

~\rcs\Miniconda36\Lib\site-packages\multipledispatch\dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165
    166         except MDNotImplementedError:

~\rcs\Miniconda36\Lib\site-packages\blaze\interactive.py in into(a, b, **kwargs)
    404     result = compute(b, return_type='native', **kwargs)
    405     kwargs['dshape'] = b.dshape
--> 406     return into(a, result, **kwargs)
    407
    408

~\rcs\Miniconda36\Lib\site-packages\multipledispatch\dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165
    166         except MDNotImplementedError:

~\rcs\Miniconda36\Lib\site-packages\odo\into.py in wrapped(*args, **kwargs)
     41         raise TypeError('dshape argument is not an instance of DataShape')
     42     kwargs['dshape'] = dshape
---> 43     return f(*args, **kwargs)
     44     return wrapped
     45

~\rcs\Miniconda36\Lib\site-packages\odo\into.py in into_string(uri, b, dshape, **kwargs)
    141
    142     a = resource(uri, dshape=resource_ds, expected_dshape=dshape, **kwargs)
--> 143     return into(a, b, dshape=dshape, **kwargs)
    144
    145

~\rcs\Miniconda36\Lib\site-packages\multipledispatch\dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165
    166         except MDNotImplementedError:

~\rcs\Miniconda36\Lib\site-packages\odo\into.py in wrapped(*args, **kwargs)
     41         raise TypeError('dshape argument is not an instance of DataShape')
     42     kwargs['dshape'] = dshape
---> 43     return f(*args, **kwargs)
     44     return wrapped
     45

~\rcs\Miniconda36\Lib\site-packages\odo\into.py in into_object(target, source, dshape, **kwargs)
    129     if dshape is None:
    130         dshape = discover(source)
--> 131     return append(target, source, dshape=dshape, **kwargs)
    132
    133

~\rcs\Miniconda36\Lib\site-packages\multipledispatch\dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165
    166         except MDNotImplementedError:

~\rcs\Miniconda36\Lib\site-packages\odo\backends\csv.py in append_object_to_csv(c, seq, **kwargs)
    249 @append.register(CSV, object)
    250 def append_object_to_csv(c, seq, **kwargs):
--> 251     append(c, convert(chunks(pd.DataFrame), seq, **kwargs), **kwargs)
    252     return c
    253

~\rcs\Miniconda36\Lib\site-packages\multipledispatch\dispatcher.py in __call__(self, *args, **kwargs)
    162             self._cache[types] = func
    163         try:
--> 164             return func(*args, **kwargs)
    165
    166         except MDNotImplementedError:

~\rcs\Miniconda36\Lib\site-packages\odo\backends\csv.py in append_iterator_to_csv(c, cs, **kwargs)
    285 @append.register(CSV, chunks(pd.DataFrame))
    286 def append_iterator_to_csv(c, cs, **kwargs):
--> 287     for chunk in cs:
    288         append(c, chunk, **kwargs)
    289     return c

~\rcs\Miniconda36\Lib\site-packages\odo\chunks.py in __iter__(self)
     46     dsk['p%d' % i] = (f,)
     47     p.append('p%d' % i)
---> 48     self.data = dsk_get(dsk, p)
     49     return iter(self.data)
     50

~

ValueError: Error parsing datetime string "Tue, 29 May 2018 12:31:22 GMT" at position 0
this is the exact error
under:
~\rcs\Miniconda36\Lib\site-packages\odo\convert.py in list_to_numpy(seq, dshape, **kwargs)
    196             not isscalar(dshape)):
    197         seq = list(map(tuple, seq))
--> 198     return np.array(seq, dtype=dshape_to_numpy(dshape))
    199
    200
Dave Hirschfeld
@dhirschfeld
It apparently doesn't know how to convert the string "Tue, 29 May 2018 12:31:22 GMT" to a datetime object, so a quick workaround is to use published: string for the type
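An alternative to keeping the column as a string: timestamps in that RFC 2822 form (the format feed parsers typically emit) parse fine with the Python stdlib, so the values can be converted to real datetimes before handing the data to odo. A minimal sketch:

```python
from email.utils import parsedate_to_datetime

# parsedate_to_datetime handles RFC 2822 strings like the one in the
# traceback; for GMT it returns a timezone-aware UTC datetime.
dt = parsedate_to_datetime('Tue, 29 May 2018 12:31:22 GMT')
print(dt.isoformat())
```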
aohan237
@aohan237
has blaze stopped development? it has been a year since the latest update, compared to dask, which is very active
Vishesh Mangla
@XtremeGood
hi, anyone good with sparse matrices here?
how do I create one from a dense block plus an overall matrix size?
I want to use csr_matrix
but I want it to auto-pad the matrix with zeros
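No answer was posted in the room, but assuming scipy's csr_matrix is meant, no explicit padding is needed: everything outside the entries you supply is an implicit zero, so passing the larger shape does the "auto-padding". A sketch with a hypothetical 2x2 block placed in a 5x5 matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 2x2 dense block in the top-left corner of a 5x5 sparse matrix.
# csr_matrix((data, (rows, cols)), shape=...) stores only the given
# entries; the rest of the 5x5 area is an implicit zero.
block = np.array([[1, 2],
                  [3, 4]])
rows, cols = np.nonzero(block)
mat = csr_matrix((block[rows, cols], (rows, cols)), shape=(5, 5))
```

To place the block elsewhere, offset `rows` and `cols` before constructing the matrix.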