Stefan van der Walt
@stefanv
Is this specifically wrt numpy?
Carl Boettiger
@cboettig
anything really, it's all pretty new to me. does style differ wrt different modules?
Stefan van der Walt
@stefanv
I think numpy has both class and normal methods available due to history, but in most other packages you have only one or the other.
Most of the time, we try to use Python's rich containers in combination with functions. Only when there's a lot of state being dragged around the system do we start looking at objects even.
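
As a minimal sketch of the two styles Stefan contrasts (the names here are hypothetical, not from any package):

```python
# Function-over-containers style: plain dicts plus a free function.
def total_mass(samples):
    """Sum the 'mass' field across a list of plain dicts."""
    return sum(s["mass"] for s in samples)

samples = [{"name": "a", "mass": 1.5}, {"name": "b", "mass": 2.5}]
print(total_mass(samples))  # 4.0

# Object style: worth reaching for once there is real state to manage.
class SampleSet:
    def __init__(self):
        self._samples = []  # accumulated state lives on the object

    def add(self, name, mass):
        self._samples.append({"name": name, "mass": mass})

    def total_mass(self):
        return sum(s["mass"] for s in self._samples)

ss = SampleSet()
ss.add("a", 1.5)
ss.add("b", 2.5)
print(ss.total_mass())  # 4.0
```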
Carl Boettiger
@cboettig
that seems like a good starting point. so just to be clear, you call method(object) function notation, and object.method() object notation?
it seems that plotting starts off with method notation in the textbook, but switches later into object notation?
Sam Lau
@SamLau95

@cboettig do you have an example? we use stuff like

foo_table.hist()

quite a bit, but when we needed plotting functionality that the table didn’t provide we switched to using matplotlib directly instead (the plots variable). for example, in http://data8.org/text/4_prediction.html#correlation we make a scatter plot using

plots.scatter(ht_pw['mat_ht'], ht_pw['mat_pw'], s=5, color='gold')
plots.xlabel('height (inches)')
plots.ylabel('pregnancy weight (pounds)')

this is because the table class didn’t provide the functionality we wanted in order to make these plots; we typically try to use the Table class as much as possible

tap2k
@tap2k
Do I need to do anything special to show a Marker once I've called the function?
for maps
tap2k
@tap2k
basically the functionality I'm looking for is to add circles one by one and then show the resulting map. possible?
tap2k
@tap2k
also - is there a way to change the datatype of a column? I need the lat, long values to be treated as floats for Circle.map to work
Sam Lau
@SamLau95

@tap2k i believe they'll show up automatically, check out david wagner's lecture on privacy: http://data8.org/text/slides/lec10.pdf

i think what you want has been done before

if you have a table called foo_table and a column called x with integers, you can change the type to floats with Table.apply like:

foo_table.apply(float, 'x')
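
(A runnable stand-in for the conversion above, using a plain list since the `datascience` library may not be on hand; `foo_table` and the reassignment pattern are assumptions, not confirmed API usage:)

```python
# Stand-in for the chat example: a column 'x' of integers.
column_x = [1, 2, 3]

# Table.apply-style conversion: map float over the column to get new values.
floats = [float(v) for v in column_x]
print(floats)  # [1.0, 2.0, 3.0]

# With the datascience library, if apply returns a new array rather than
# modifying the table in place, you would reassign, something like:
#     foo_table['x'] = foo_table.apply(float, 'x')
```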
tap2k
@tap2k
thanks! helpful
tap2k
@tap2k
when I use the Circle.map or Marker.map functions how do I set the map center and zoom?
Sam Lau
@SamLau95
not super sure myself — @papajohn @alvinwan , do you know how?
tap2k
@tap2k
figured that out
one issue I had is that if Circle.map or Marker.map is not the last expression in the cell, the map doesn't show. this makes it hard to wrap in a function unless someone has a workaround
Carl Boettiger
@cboettig
Curious if any connectors are touching on sql databases. Just need some very simple imports from postgres, doesn't look like psycopg2 is available.
Guess I could just export the data to csv for them, but torn between wanting to give a light exposure to databases vs just streamlining things
Stefan van der Walt
@stefanv
@cboettig Is postgres the system you have to use? Because Python has built in sqlite, if you can port the DB to that.
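
A quick sketch of the built-in `sqlite3` route Stefan mentions (the table and column names here are made up for illustration):

```python
import sqlite3

# An in-memory database stands in for a file exported from postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fish (assessid TEXT, ssb REAL)")
conn.executemany("INSERT INTO fish VALUES (?, ?)",
                 [("A", 1.0), ("A", 0.05), ("B", 2.0)])

# Students can pull rows straight into ordinary Python containers.
rows = conn.execute("SELECT assessid, ssb FROM fish").fetchall()
print(rows)  # [('A', 1.0), ('A', 0.05), ('B', 2.0)]
conn.close()
```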
Carl Boettiger
@cboettig
yeah, in this particular case data is already in postgres, though pedagogically maybe sqlite makes more sense then
tap2k
@tap2k
ok got it - chalk it up to the stupid question dept
henryem
@henryem
@cboettig Tables support a lot of SQL-like things (join, group by, where) that they learn about in the first month or so of the base class. Could be easier to introduce database operations that way, without a new language. (Sorry if you're already aware of that!)
Carl Boettiger
@cboettig
@henryem right, yes, and I'd probably stick with doing most of these manipulations in tables (though I'm still learning those myself!). More just wondering about it as a data ingest step -- I often see students struggle with importing data from databases even when they are already well equipped to manipulate the data within a given framework once the data are imported. Teaching all that is of course beyond the scope of what I can get into a connector, but just wondering if it's worth giving some glimpse of data read/parse command that isn't csv. More a pedagogy issue than a technical one I suppose, and I'm still on the fence.
henryem
@henryem
Ah, I see. That sounds cool.
Carl Boettiger
@cboettig
Hmm, struggling to figure out the best python way to do an operation that is pretty simple in R's dplyr.... I have a table in which one column is a grouping factor, so for each group I want to apply a summary function. Here's my R version: https://gist.github.com/cboettig/7ce0f311daa428b023f9
henryem
@henryem
I'm not 100% familiar with the dplyr syntax, but I think you would say:
values.select(['assessid', 'ssb']).group('assessid', collapsed)
where collapsed is
def collapsed(an_array):
    return an_array[-1] < 0.1 * max(an_array)
the main difference, as far as I can tell, is that the tables group() will apply the summary function to every column, whereas group_by lets you apply it only to some columns
though I'm not sure what happens to the columns that are not summarized
in dplyr's group_by, I mean
anyway, the .select(['assessid', 'ssb']) pares down the columns to just the grouping factor and the column you wanted to summarize
if you want to summarize several columns in different ways (or the same column in multiple ways) it takes several steps
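
The group-then-summarize step described above can be sketched in plain Python (made-up data, with `collapsed` and the `assessid`/`ssb` names taken from the chat), in case the `datascience` library isn't available:

```python
from collections import defaultdict

def collapsed(an_array):
    # True when the last value has dropped below 10% of the group's max.
    return an_array[-1] < 0.1 * max(an_array)

# Flattened stand-in for values.select(['assessid', 'ssb']).
rows = [("A", 10.0), ("A", 0.5), ("B", 4.0), ("B", 3.0)]

# Group the ssb values by assessid, then apply the summary per group.
groups = defaultdict(list)
for assessid, ssb in rows:
    groups[assessid].append(ssb)

summary = {k: collapsed(v) for k, v in groups.items()}
print(summary)  # {'A': True, 'B': False}
```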
Carl Boettiger
@cboettig
@henryem Thanks! That looks very promising -- However, I'm a puzzled why I get different results in R vs python now! how is max handling the nan values in python?
henryem
@henryem
Ah, it propagates them, so it will return nan. Looks like there are two options: for max in particular there is nanmax, which ignores nans. In general you could use np.ma.masked_array(my_array, np.isnan(my_array)) to get a view of my_array that doesn't include that nans, and then do whatever computation you wanted on that view.
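
The three behaviors described above, side by side on a small made-up array:

```python
import numpy as np

arr = np.array([1.0, np.nan, 5.0, 0.2])

# np.max propagates the nan...
print(np.max(arr))     # nan

# ...while np.nanmax ignores it.
print(np.nanmax(arr))  # 5.0

# The masked-array route: mask out the nans, then compute on the view.
masked = np.ma.masked_array(arr, np.isnan(arr))
print(masked.max())    # 5.0
```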
Carl Boettiger
@cboettig
thanks, that sounds handy. Curiously I don't get any nans in the output from the original python version, but I get a different set of True/False values in the new column...
Carl Boettiger
@cboettig
hmm, looks like I just get an error on calling np.ma.masked_array on a datascience Table object
Stefan van der Walt
@stefanv
I’d steer clear of masked arrays unless you really need them. It’s another layer of complexity on an already complex operation.
Yes, that almost certainly won’t work. NumPy does not know anything about Tables.
Carl Boettiger
@cboettig
right, okay, will avoid that. Meanwhile still puzzled by the handling of nas and the different results between R and python here.
e.g. starting from the gist, https://gist.github.com/cboettig/7ce0f311daa428b023f9 , for the group x = values.select(["assessid", "ssb"]).where("assessid", "AFSC-BKINGCRABPI-1960-2008-JENSEN"), collapsed(x["ssb"]) returns False
note that x has nan values, so I'd have expected it to return nan. And in R, when dropping nans, it returns True.
Stefan van der Walt
@stefanv
Let me install R quickly and take a look at what you’re expecting
What is “collapsed” supposed to do? Check whether the last element is smaller than 0.1 * max of the array, ignoring nans?
Carl Boettiger
@cboettig
yup
Stefan van der Walt
@stefanv
Try replacing max(an_array) with np.nanmax(an_array)
Carl Boettiger
@cboettig
throws error
Stefan van der Walt
@stefanv
Can you show me the error?