Carl Boettiger
@cboettig
@SamLau95 From the Jupyter homepage there's an option to open a new Python 3 notebook, and under it an option to open a terminal which runs in the browser. It's really convenient since I can then move my notebooks between my local machine, GitHub, and the hosted ds8.berkeley.edu site; it seems to be pretty much a fully functional bash terminal, except for the lack of copy-paste, which is a bit of a nuisance.
Sam Lau
@SamLau95

@cboettig ah i see, this is running on your local machine (e.g. localhost)? it looks like if you open a terminal window and run

git clone https://github.com/deculler/TableDemos

you can switch back to the ‘Files’ tab and open the notebooks locally

Carl Boettiger
@cboettig
nope, terminal is running on ds8.berkeley.edu instance (though also works locally)
Sam Lau
@SamLau95
ah, gotcha. that command should work both on ds8 and locally
Carl Boettiger
@cboettig
@SamLau95 Any recommendation for a style guide for python? In particular, looking for clarification on when to use the notation x.method() vs the notation method(x)
Stefan van der Walt
@stefanv
Is this specifically wrt numpy?
Carl Boettiger
@cboettig
anything really, it's all pretty new to me. does style differ wrt different modules?
Stefan van der Walt
@stefanv
I think numpy has both class and normal methods available due to history, but in most other packages you have only one or the other.
Most of the time, we try to use Python's rich containers in combination with functions. Only when there's a lot of state being dragged around the system do we even start looking at objects.
Carl Boettiger
@cboettig
that seems like a good starting point. so just to be clear, you call method(object) function notation, and object.method object notation?
it seems that plotting starts off with method notation in the textbook, but switches later into object notation?
Sam Lau
@SamLau95

@cboettig do you have an example? we use stuff like

foo_table.hist()

quite a bit, but when we needed plotting functionality that the table didn’t provide we switched to using matplotlib directly instead (the plots variable). for example, in http://data8.org/text/4_prediction.html#correlation we make a scatter plot using

plots.scatter(ht_pw['mat_ht'], ht_pw['mat_pw'], s=5, color='gold')
plots.xlabel('height (inches)')
plots.ylabel('pregnancy weight (pounds)')

this is because the Table class didn't provide the functionality we wanted in order to make these plots; we typically try to use the Table class as much as possible

tap2k
@tap2k
Do I need to do anything special to show a Marker once I've called the function?
for maps
tap2k
@tap2k
basically the functionality I'm looking for is to add circles one by one and then show the resulting map. possible?
tap2k
@tap2k
also - is there a way to change the datatype of a column? I need the lat, long values to be treated as floats for Circle.map to work
Sam Lau
@SamLau95

@tap2k i believe they’ll show up automatically — check out david wagner’s lecture on privacy: http://data8.org/text/slides/lec10.pdf

i think what you want has been done before

if you have a table called foo_table and a column called x with integers, you can change the type to floats with Table.apply like:

foo_table.apply(float, 'x')
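
One caveat worth keeping in mind (my reading of the Table API, so treat this as a sketch): apply returns a new array rather than changing the table in place, so to actually store the floats you'd put the result back into the column, for example:

float_x = foo_table.apply(float, 'x')      # apply returns an array; the table itself is unchanged
foo_table['x'] = float_x                   # older mutable-style API: overwrite the column
# or, if your version of datascience provides with_column:
foo_table = foo_table.with_column('x', float_x)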
tap2k
@tap2k
thanks! helpful
tap2k
@tap2k
when I use the Circle.map or Marker.map functions how do I set the map center and zoom?
Sam Lau
@SamLau95
not super sure myself — @papajohn @alvinwan , do you know how?
tap2k
@tap2k
figured that out
one issue I had is that if Circle.map or Marker.map is not the last expression in the cell, the map doesn't show. this makes it hard to wrap in a function unless someone has a workaround
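
If it helps, two possible workarounds, sketched under the assumption that Circle.map(latitudes, longitudes) returns a map object that Jupyter can render: have the wrapper return the map (and call the wrapper as the last expression of a cell), or force rendering with IPython's display():

from IPython.display import display

def circles_map(lats, lons):
    m = Circle.map(lats, lons)   # assumed signature: Circle.map(latitudes, longitudes)
    display(m)                   # ask the notebook to render it mid-function, if the map object supports notebook display
    return m                     # returning it also works when the call is the cell's last expression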
Carl Boettiger
@cboettig
Curious if any connectors are touching on sql databases. Just need some very simple imports from postgres, doesn't look like psycopg2 is available.
Guess I could just export the data to csv for them, but torn between wanting to give a light exposure to databases vs just streamlining things
Stefan van der Walt
@stefanv
@cboettig Is postgres the system you have to use? Because Python has built-in sqlite, if you can port the DB to that.
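
For what it's worth, a minimal sketch of the standard-library route (the file, table, and column names here are made up):

import sqlite3

conn = sqlite3.connect('fish_stocks.db')                                  # hypothetical SQLite file
rows = conn.execute('SELECT assessid, ssb FROM assessments').fetchall()  # hypothetical table/columns
conn.close()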
Carl Boettiger
@cboettig
yeah, in this particular case data is already in postgres, though pedagogically maybe sqlite makes more sense then
tap2k
@tap2k
ok got it - chalk it up to the stupid question dept
henryem
@henryem
@cboettig Tables support a lot of SQL-like things (join, group by, where) that they learn about in the first month or so of the base class. Could be easier to introduce database operations that way, without a new language. (Sorry if you're already aware of that!)
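
To make the correspondence concrete, a rough sketch with hypothetical tables t and u that share an 'id' column:

t.where('category', 'A')    # roughly SQL: WHERE category = 'A'
t.group('category', max)    # roughly SQL: GROUP BY category with an aggregate
t.join('id', u)             # roughly SQL: JOIN u ON t.id = u.id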
Carl Boettiger
@cboettig
@henryem right, yes, and I'd probably stick with doing most of these manipulations in tables (though I'm still learning those myself!). More just wondering about it as a data ingest step -- I often see students struggle with importing data from databases even when they are already well equipped to manipulate the data within a given framework once the data are imported. Teaching all that is of course beyond the scope of what I can get into a connector, but just wondering if it's worth giving some glimpse of data read/parse command that isn't csv. More a pedagogy issue than a technical one I suppose, and I'm still on the fence.
henryem
@henryem
Ah, I see. That sounds cool.
Carl Boettiger
@cboettig
Hmm, struggling to figure out the best python way to do an operation that is pretty simple in R's dplyr.... I have a table in which one column is a grouping factor, so for each group I want to apply a summary function. Here's my R version: https://gist.github.com/cboettig/7ce0f311daa428b023f9
henryem
@henryem
I'm not 100% familiar with the dplyr syntax, but I think you would say:
values.select(['assessid', 'ssb']).group('assessid', collapsed)
where collapsed is
def collapsed(an_array):
    return an_array[-1] < 0.1 * max(an_array)
the main difference, as far as I can tell, is that the Table's group() will apply the summary function to every column, whereas group_by lets you apply it only to some columns
though I'm not sure what happens to the columns that are not summarized
in dplyr's group_by, I mean
anyway, the .select(['assessid', 'ssb']) pares down the columns to just the grouping factor and the column you wanted to summarize
if you want to summarize several columns in different ways (or the same column in multiple ways) it takes several steps
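
For that multi-step case, one way it could look (a sketch; group typically names each output column after the source column and the function, and the pieces are joined back together at the end):

last_vs_max = values.select(['assessid', 'ssb']).group('assessid', collapsed)
ssb_peaks = values.select(['assessid', 'ssb']).group('assessid', max)
summary = last_vs_max.join('assessid', ssb_peaks)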
Carl Boettiger
@cboettig
@henryem Thanks! That looks very promising -- However, I'm puzzled why I get different results in R vs python now! how is max handling the nan values in python?
henryem
@henryem
Ah, it propagates them, so it will return nan. Looks like there are two options: for max in particular there is nanmax, which ignores nans. In general you could use np.ma.masked_array(my_array, np.isnan(my_array)) to get a view of my_array that doesn't include that nans, and then do whatever computation you wanted on that view.
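
Side by side, the behaviors being described (a small illustrative sketch):

import numpy as np

a = np.array([1.0, np.nan, 5.0, 0.2])
np.max(a)                                  # nan -- nans propagate through max
np.nanmax(a)                               # 5.0 -- ignores nans
np.ma.masked_array(a, np.isnan(a)).max()   # 5.0 -- the masked view excludes the nans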
Carl Boettiger
@cboettig
thanks, that sounds handy. Curiously I don't get any nans in the output from the original python version, but I get a different set of True/False values in the new column...
Carl Boettiger
@cboettig
hmm, looks like I just get an error on calling np.ma.masked_array on a datascience Table object
Stefan van der Walt
@stefanv
I’d steer clear of masked arrays unless you really need them. It’s another layer of complexity on an already complex operation.
Yes, that almost certainly won’t work. NumPy does not know anything about Tables.
Carl Boettiger
@cboettig
right, okay, will avoid that. Meanwhile still puzzled by the handling of nas and the different results between R and python here.
e.g. starting from the gist, https://gist.github.com/cboettig/7ce0f311daa428b023f9 , I see that for the group x = values.select(["assessid", "ssb"]).where("assessid", "AFSC-BKINGCRABPI-1960-2008-JENSEN"), calling collapsed(x["ssb"]) returns False
note that x has nan values, so I'd have expected it to return nan. And in R, when dropping nans, it returns TRUE.
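
A likely culprit: in Python, ordering comparisons involving nan evaluate to False rather than propagating nan, so collapsed quietly returns False whenever max(...) comes out as nan. A quick check:

import numpy as np

np.nan < 1.0   # False -- any ordering comparison against nan is False
1.0 < np.nan   # False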
Stefan van der Walt
@stefanv
Let me install R quickly and take a look at what you’re expecting
What is “collapsed” supposed to do? Check whether the last element is smaller than 0.1 * max of the array, ignoring nans?