
Sam Lau
@SamLau95
we haven’t incremented the version number after making that change
Carl Boettiger
@cboettig
right
Sam Lau
@SamLau95
blah, that’s a pain
maybe there’s a way to locally install the datascience package from the GitHub repo on ds8
Chris Holdgraf
@choldgraf
yeah I get the error on the ds8 servers
Sam Lau
@SamLau95
that makes sense
Chris Holdgraf
@choldgraf
I mean, could you theoretically just clone the datascience repo using a terminal in the IPython notebook and then run make install from there?
Sam Lau
@SamLau95
yeah, that might work
i have to bother Ryan manually to update the package version on the ds8 server, so it’s not something i can take care of right this minute
Chris Holdgraf
@choldgraf
hmmm, I just tried cloning + make install and it threw a permissions error
I guess I could try to specify a folder within my home directory, but now we are getting into "too much of a hassle to be a solution for instructors" territory, I think
Carl Boettiger
@cboettig
Okay, well at least my code is running successfully after pulling the latest github copy and running make
(I'm running the jupyter/datascience-notebook Docker image, so I just docker exec'd in and ran make as root, so no permissions error for me ;-) )
So once the update hits the ds8 server, my issue should be resolved, though bumping the version would be good too.
Chris Holdgraf
@choldgraf
cool
@SamLau95 PR made for the python3 check
Sam Lau
@SamLau95
merged! thanks
Carl Boettiger
@cboettig
@choldgraf Another quick puzzle for you:
import datascience as ds
import matplotlib.pyplot as plt

nasa_temp = "http://climate.nasa.gov/system/internal_resources/details/original/647_Global_Temperature_Data_File.txt"
temp = ds.Table.read_table(nasa_temp, skiprows=range(4), na_values="*", delim_whitespace=True,
                           names=["Year", "Annual", "FiveYear"])

## Pandas plots this just fine
temp.to_df().plot()

## datascience not so much
temp.plot("Year")
plt.plot(temp["Year"], temp["Annual"])
Some error about getting a string when it expects a float. I still haven't made sense of the idea that a column in a Table need not have a consistent type. Is that really the case? Why?
Sam Lau
@SamLau95

@cboettig this is what i get from the last 5 rows of the temp table:

Year                                 | Annual | FiveYear
2012                                 | 0.63   | 0.67
2013                                 | 0.66   | nan
2014                                 | 0.75   | nan
2015                                 | nan    | nan
------------------------------------ | nan    | nan

looks like you have some missing values and a string, too
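A minimal pandas sketch of why one stray string breaks plotting. The keyword arguments in the read_table call above (skiprows, na_values, delim_whitespace, names) are pandas parsing options, so the sketch reproduces the parse with pandas directly. The sample text is hypothetical: it is modeled on the printed tail of the temp table, and the exact layout of the NASA file (including where its "*" markers sit) is an assumption. A column is only converted to numbers when every entry parses as one, so the single dashed line keeps all of Year as text. The sketch uses sep=r"\s+", the equivalent of delim_whitespace=True that newer pandas versions prefer.

```python
import io

import pandas as pd

# Hypothetical sample modeled on the last rows of the temp table;
# "*" marks a missing value and the final line is the dashed separator.
raw = """2012 0.63 0.67
2013 0.66 *
2014 0.75 *
2015 * *
------------------------------------
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["Year", "Annual", "FiveYear"], na_values="*")

# The dashed line cannot be parsed as a number, so Year stays a text column,
# while Annual and FiveYear become floats with NaN for the missing entries
# (the short dashed line is padded with NaN for the columns it lacks).
print(df.dtypes)
```

Once the last row is gone, Year can be converted to numbers and plotting works, which is what the fixes below do.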

Chris Holdgraf
@choldgraf
Yeah it looks like that big ------- is the problem. You could always drop the last row, aka this works:
temp = temp.take(range(temp.num_rows-1))
temp.plot('Year')
Sam Lau
@SamLau95
or, for a slightly more succinct first line:
temp = temp.take[:-1]
Chris Holdgraf
@choldgraf
or you could write a function that does "for each row in this column, try to cast it as an integer; if it errors, return np.nan." Then you could make one more pass and drop any rows that are nan (checking with np.isnan, since nan == nan is False)
oo @SamLau95 thanks for the tip, didn't know that Table.take behaves like pandas iloc
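The two-pass idea can be sketched with plain NumPy arrays standing in for the Year and Annual columns; the helper name to_number and the sample values are hypothetical, and applying it to a datascience Table is not shown. The sketch casts to float rather than int in the first pass because an integer array cannot hold nan, then converts the surviving years to int after the bad rows are dropped.

```python
import numpy as np


def to_number(value):
    """Return value as a float, or np.nan if it cannot be parsed as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return np.nan


# Hypothetical column values shaped like the temp table's Year and Annual columns.
years = np.array(["2012", "2013", "2014", "2015", "------"])
annual = np.array([0.63, 0.66, 0.75, np.nan, np.nan])

# Pass 1: convert every entry, turning unparseable ones into nan.
year_numbers = np.array([to_number(v) for v in years])

# Pass 2: keep only rows whose Year parsed. nan never equals itself,
# so the check uses np.isnan rather than == np.nan.
keep = ~np.isnan(year_numbers)
clean_years = year_numbers[keep].astype(int)
clean_annual = annual[keep]

print(clean_years)
print(clean_annual)
```

In pandas the same two passes are available as pd.to_numeric(column, errors="coerce") followed by dropna().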
Sam Lau
@SamLau95
yup, it was done in #120
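For readers coming from pandas, a small sketch of the iloc analogy, assuming the data has been converted with to_df(); the frame below is a hypothetical stand-in built from the printed tail of the temp table. Both calls drop the last row by position, mirroring temp.take[:-1] and temp.take(range(temp.num_rows-1)) in the messages above.

```python
import pandas as pd

# Hypothetical stand-in for temp.to_df(), using values from the printed table.
df = pd.DataFrame({
    "Year": ["2012", "2013", "2014", "2015", "------"],
    "Annual": [0.63, 0.66, 0.75, None, None],
})

# Slice form, the counterpart of temp.take[:-1].
by_slice = df.iloc[:-1]

# Explicit list of row positions, the counterpart of
# temp.take(range(temp.num_rows - 1)).
by_positions = df.take(list(range(len(df) - 1)))

print(by_slice.equals(by_positions))
```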
Carl Boettiger
@cboettig
@SamLau95 @choldgraf Really nice, thanks for catching that. Also good to know that take can use [] like iloc, pretty cool; but I agree with the thread in that PR that having both notations is confusing to a beginner; I've already made that mistake on iloc before. Will have to decide what is better to teach the students.
henryem
@henryem
FYI, for the main course, we're talking about not using square bracket indexing at all, at least for the first few weeks
Since that was very confusing for students last semester
Carl Boettiger
@cboettig
@henryem thanks, good to know! Then I probably shouldn't introduce square brackets in my first lesson! I suppose I should just pre-clean the data so none of that is necessary, though I really do hope to convey some sense of "where data comes from" in my area, even if it means teaching a little bit of simple data cleaning...
henryem
@henryem
We'll still do indexing, just with method calls instead of []
Also, FYI, they'll see arrays in week 2 and tables in week 3
So if you'll use that stuff earlier then you'll get to teach them however you want
Chris Holdgraf
@choldgraf
I'm +1 on using methods associated with Table objects first... I think one of the main confusions of pandas is that there are so many ways to do the same thing
Carl Boettiger
@cboettig
@henryem Thanks for the heads up about how material will be introduced; that's really helpful, particularly for the first few weeks. Sounds like things might be slightly different from how they were taught in the Fall, at least early on? I'm trying to avoid being needlessly inconsistent with what is taught in the class. (Hopefully the students can just tell me the way they have learned if I start doing something unorthodox.) Is this description fleshed out any further now, at least for the first few weeks? Or are data8.org and data8.org/text still the best guide for that?
henryem
@henryem
This is just from conversations with John last week, but yes, we're hoping to shave off some of the complicated syntax. I don't think there will be huge changes. The only thing written up, afaik, is a rough draft of lab 1 I'm working on, though John might be working on the text. I'm afk at the moment, but I definitely agree we should pin down the early curriculum asap.
Carl Boettiger
@cboettig
e.g. one question I have right now is whether I should start with a module on "simulation" (which puts off Tables but needs both loops and function declarations) or with a module on "real data", which means Tables from the get-go.
Carl Boettiger
@cboettig
@SamLau95 Any update on when the ds8.berkeley.edu site might get the newer version of datascience Tables?
henryem
@henryem
Tables will still come before function declarations and for loops in the main class, I think
Carl Boettiger
@cboettig
@henryem good, that makes sense, I'll start with manipulating real data in Tables then; hopefully will reinforce rather than confuse what they learn in the main class... Finding it hard already to write an intro lesson without [] subsetting...
for instance, which plot for a time series is less confusing / most consistent with the core class:
import matplotlib.pyplot as plt

## Tables method (needs to select the columns first, otherwise it still plots all columns!?)
co2.select(["decimal.date", "average"]).plot("decimal.date")

## matplotlib style, also requires [] subsetting
plt.plot(co2["decimal.date"], co2["average"])
henryem
@henryem
The first one
Though we'll still do everything you can do with [] indexing, except we'll use method calls instead
It's purely a syntactic simplification
Carl Boettiger
@cboettig
cool. It would be nice if Table.plot were as smart as the pandas plot method for line plots like these, though
henryem
@henryem
I don't know if the get-a-column method exists yet
Yeah that's not my department :-/
What's the problem in this case?
Carl Boettiger
@cboettig
no worries. The data frame we read in has extra columns, and the Table.plot method tries to plot all columns as additional lines in different colors if we don't select them out.
Pandas syntax is more concise: it would just be co2.plot("decimal.date", "average"), without the repetition needed in the datascience call
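A minimal pandas sketch of the one-call form, assuming the data is in a DataFrame (for example via the to_df() call used in the temperature example). The "trend" column and all numbers are hypothetical placeholders, not real measurements. It shows both behaviors discussed here: with only x given, every other numeric column becomes its own line; naming y as well plots one column without a separate select step. The positional form in the message should fill x and y in that order; keywords are used here to make that explicit.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import pandas as pd

# Hypothetical stand-in for the co2 data: placeholder values, plus one extra
# column beyond the two we want to plot.
co2 = pd.DataFrame({
    "decimal.date": [2000.0, 2000.5, 2001.0],
    "average": [1.0, 2.0, 3.0],
    "trend": [1.5, 2.0, 2.5],
})

# With only x given, pandas draws one line per remaining numeric column.
ax_all = co2.plot(x="decimal.date")

# Naming y as well plots just that column.
ax_one = co2.plot(x="decimal.date", y="average")

print(len(ax_all.get_lines()), len(ax_one.get_lines()))
```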
Carl Boettiger
@cboettig
but that's all pretty minor. I think you've got me on the right path by focusing on the datascience method calls and trying to avoid [] indexing... it does get tricky very fast, though; I keep wanting to introduce pandas functions here and there where datascience doesn't have an easy way (that I know of) to do what I need. (And I'm just learning Python as I go myself, coming mostly from R.)