Sam Lau
@SamLau95

@tap2k i believe they’ll show up automatically -- check out David Wagner’s lecture on privacy: http://data8.org/text/slides/lec10.pdf

i think what you want has been done before

if you have a table called foo_table and a column called x with integers, you can change the type to floats with Table.apply like:

foo_table.apply(float, 'x')
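A minimal sketch of that, assuming the standard datascience Table API; the table and column below are made up, and since apply returns a new array rather than changing the table in place, the result is assigned back with with_column:

from datascience import Table

foo_table = Table().with_columns('x', [1, 2, 3])   # hypothetical table with an integer column 'x'

# apply returns an array of converted values; assign it back to replace the column
foo_table = foo_table.with_column('x', foo_table.apply(float, 'x'))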
tap2k
@tap2k
thanks! helpful
tap2k
@tap2k
when I use the Circle.map or Marker.map functions, how do I set the map center and zoom?
Sam Lau
@SamLau95
not super sure myself -- @papajohn @alvinwan, do you know how?
tap2k
@tap2k
figured that out
one issue I had is that if Circle.map or Marker.map is not the last expression in the cell, the map doesn't show. This makes it hard to wrap in a function unless someone has a workaround
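One possible workaround, assuming the object returned by Marker.map (or Circle.map) renders through IPython's display machinery; the coordinates below are made up:

from datascience import Marker
from IPython.display import display

def show_points(lats, lons):
    # display() forces the map to render even though it is not
    # the last expression in the cell
    points_map = Marker.map(lats, lons)
    display(points_map)
    return points_map

show_points([37.87, 37.88], [-122.26, -122.27])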
Carl Boettiger
@cboettig
Curious if any connectors are touching on SQL databases. Just need some very simple imports from postgres; doesn't look like psycopg2 is available.
Guess I could just export the data to CSV for them, but I'm torn between wanting to give a light exposure to databases vs just streamlining things
Stefan van der Walt
@stefanv
@cboettig Is postgres the system you have to use? Because Python has built-in SQLite, if you can port the DB to that.
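A rough sketch of that route, assuming the data has already been ported into a local SQLite file; the file name, table name, and columns are hypothetical:

import sqlite3
from datascience import Table

# query the local SQLite database with Python's built-in driver
conn = sqlite3.connect('fisheries.db')
rows = conn.execute('SELECT assessid, ssb FROM assessments').fetchall()
conn.close()

# turn the result rows into a datascience Table
assessids, ssbs = zip(*rows)
values = Table().with_columns('assessid', list(assessids), 'ssb', list(ssbs))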
Carl Boettiger
@cboettig
yeah, in this particular case data is already in postgres, though pedagogically maybe sqlite makes more sense then
tap2k
@tap2k
ok got it - chalk it up to the stupid question dept
henryem
@henryem
@cboettig Tables support a lot of SQL-like things (join, group by, where) that they learn about in the first month or so of the base class. Could be easier to introduce database operations that way, without a new language. (Sorry if you're already aware of that!)
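For reference, a small sketch of those Table operations on made-up data; where, group, and join are the datascience Table methods being referred to:

from datascience import Table

catch = Table().with_columns(
    'species', ['cod', 'cod', 'crab'],
    'tons',    [10, 12, 3])
regions = Table().with_columns(
    'species', ['cod', 'crab'],
    'region',  ['north', 'south'])

catch.where('species', 'cod')    # like SQL WHERE
catch.group('species', sum)      # like SQL GROUP BY with an aggregate
catch.join('species', regions)   # like SQL JOIN on a shared column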
Carl Boettiger
@cboettig
@henryem right, yes, and I'd probably stick with doing most of these manipulations in tables (though I'm still learning those myself!). More just wondering about it as a data ingest step -- I often see students struggle with importing data from databases even when they are already well equipped to manipulate the data within a given framework once the data are imported. Teaching all that is of course beyond the scope of what I can get into a connector, but just wondering if it's worth giving some glimpse of data read/parse command that isn't csv. More a pedagogy issue than a technical one I suppose, and I'm still on the fence.
henryem
@henryem
Ah, I see. That sounds cool.
Carl Boettiger
@cboettig
Hmm, struggling to figure out the best python way to do an operation that is pretty simple in R's dplyr.... I have a table in which one column is a grouping factor, so for each group I want to apply a summary function. Here's my R version: https://gist.github.com/cboettig/7ce0f311daa428b023f9
henryem
@henryem
I'm not 100% familiar with the dplyr syntax, but I think you would say:
values.select(['assessid', 'ssb']).group('assessid', collapsed)
where collapsed is
def collapsed(an_array):
    return an_array[-1] < 0.1 * max(an_array)
the main difference, as far as I can tell, is that the tables group() will apply the summary function to every column, whereas group_by lets you apply it only to some columns
though I'm not sure what happens to the columns that are not summarized
in dplyr's group_by, I mean
anyway, the .select(['assessid', 'ssb']) pares down the columns to just the grouping factor and the column you wanted to summarize
if you want to summarize several columns in different ways (or the same column in multiple ways) it takes several steps
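Putting those pieces together, a sketch of the whole step on toy data (the values are made up; assessid and ssb follow the gist's column names):

from datascience import Table

def collapsed(an_array):
    # True when the last value is below 10% of the array's maximum
    return an_array[-1] < 0.1 * max(an_array)

values = Table().with_columns(
    'assessid', ['a', 'a', 'b', 'b'],
    'ssb',      [5.0, 0.3, 2.0, 1.9])

# keep only the grouping column and the column to summarize, then group
values.select(['assessid', 'ssb']).group('assessid', collapsed)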
Carl Boettiger
@cboettig
@henryem Thanks! That looks very promising -- however, I'm puzzled why I get different results in R vs python now! How is max handling the nan values in python?
henryem
@henryem
Ah, it propagates them, so it will return nan. Looks like there are two options: for max in particular there is np.nanmax, which ignores nans. In general you could use np.ma.masked_array(my_array, np.isnan(my_array)) to get a view of my_array that doesn't include the nans, and then do whatever computation you wanted on that view.
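A quick sketch of both options on a toy array:

import numpy as np

my_array = np.array([1.0, np.nan, 4.0])

np.max(my_array)      # nan: the nan propagates
np.nanmax(my_array)   # 4.0: nans are ignored

# masked-array route: hide the nans, then compute on the view
masked = np.ma.masked_array(my_array, np.isnan(my_array))
masked.max()          # 4.0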
Carl Boettiger
@cboettig
thanks, that sounds handy. Curiously I don't get any nans in the output from the original python version, but I get a different set of True/False values in the new column...
Carl Boettiger
@cboettig
hmm, looks like I just get an error on calling np.ma.masked_array on a datascience Table object
Stefan van der Walt
@stefanv
I’d steer clear of masked arrays unless you really need them. It’s another layer of complexity on an already complex operation.
Yes, that almost certainly won’t work. NumPy does not know anything about Tables.
Carl Boettiger
@cboettig
right, okay, will avoid that. Meanwhile still puzzled by the handling of nas and the different results between R and python here.
e.g. starting from the gist, https://gist.github.com/cboettig/7ce0f311daa428b023f9 , if I take the group x = values.select(["assessid", "ssb"]).where("assessid", "AFSC-BKINGCRABPI-1960-2008-JENSEN"), then collapsed(x["ssb"]) returns False
note that x has nan values, so I'd have expected it to return nan. And in R, when dropping nans, it returns True.
Stefan van der Walt
@stefanv
Let me install R quickly and take a look at what you’re expecting
What is “collapsed” supposed to do? Check whether the last element is smaller than 0.1 * max of the array, ignoring nans?
Carl Boettiger
@cboettig
yup
Stefan van der Walt
@stefanv
Try replacing max(an_array) with np.nanmax(an_array)
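That is, a revised version of the earlier helper under this suggestion:

import numpy as np

def collapsed(an_array):
    # np.nanmax ignores nans when taking the maximum
    return an_array[-1] < 0.1 * np.nanmax(an_array)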
Carl Boettiger
@cboettig
throws error
Stefan van der Walt
@stefanv
Can you show me the error?
(I think the error shows up there, from In [14])
Stefan van der Walt
@stefanv
Hah, I did not expect an_array to be “a_list” :)
I’ll take a quick look at what’s happening underneath the hood
Carl Boettiger
@cboettig
yeah, guess columns in Tables are list objects? I'm still a bit foggy on the difference between a list and an array. is an array a numpy object? for doubles only?
and thanks much for the help!
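For what it's worth, a tiny illustration of the list/array distinction: a list is a plain Python container that can mix types, while a numpy array is a NumPy object with a single dtype, which need not be doubles:

import numpy as np

a_list = [1, 'two', 3.0]               # plain Python list, mixed types allowed
an_array = np.array([1.0, 2.0, 3.0])   # numpy array, one dtype for every element

an_array.dtype           # float64 here
np.array([1, 2]).dtype   # an integer dtype, so arrays are not doubles-only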
Stefan van der Walt
@stefanv
I’ve just tried it with the latest version of datascience and it seems to work OK for me
Are you using the same dataset as in the gist?
Carl Boettiger
@cboettig
yup
Stefan van der Walt
@stefanv
So, it looks like there’s at least one column with all NaNs
Carl Boettiger
@cboettig
is the latest version what is on ds8.berkeley.edu? I could switch to that; I'm running Jupyter from the jupyter/datascience-notebook docker image and just did a pip install datascience... not quite sure how to check my module version info
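One way to check, assuming the package exposes a __version__ attribute (pip show datascience from a shell works regardless):

import datascience
print(datascience.__version__)   # assumes the package defines __version__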
Stefan van der Walt
@stefanv
Ah, I’m running the latest dev version, 0.3.dev21