Carl Boettiger
@cboettig
@henryem thanks, good to know! Then I probably shouldn't introduce square brackets on my first lesson! I suppose I should just pre-clean the data so none of that is necessary; though I really do hope to convey some sense of "where data comes from" in my area, even if it does mean teaching a little bit of simple data cleaning...
henryem
@henryem
We'll still do indexing, just with method calls instead of []
Also, FYI, they'll see arrays in week 2 and tables in week 3
So if you'll use that stuff earlier then you'll get to teach them however you want
Chris Holdgraf
@choldgraf
I'm +1 on using methods associated with Table objects first...I think one of the main confusions of pandas is the fact that there are so many ways to do the same thing
Carl Boettiger
@cboettig
@henryem Thanks for the heads up about how material will be introduced; that's really helpful (particularly for the first few weeks). Sounds like things might be slightly different than how they were taught in the Fall, at least in the first few weeks then? I'm trying to avoid being needlessly inconsistent with what is taught in the class (hopefully the students can just tell me the way they have learned if I start doing something unorthodox). Is this description fleshed out any further now (at least for the first few weeks)? Or is data8.org and data8.org/text still the best guide for that?
henryem
@henryem
This is just from conversations with John last week, but yes, we're hoping to shave some of the complicated syntax. I don't think there will be huge changes. The only thing written up, afaik, is a rough draft of lab 1 I'm working on, though John might be working on the text. I'm afk at the moment but I definitely agree we should pin down the early curriculum asap.
Carl Boettiger
@cboettig
e.g. one question I have right now is whether I should start with a module on "simulation" (which puts off Tables but needs both loops & function declarations) or a module starting with "real data" that means Tables from the get-go.
Carl Boettiger
@cboettig
@SamLau95 Any update on when the ds8.berkeley.edu site might get the newer version of datascience Tables?
henryem
@henryem
Tables will still come before function declarations and for loops in the main class, I think
Carl Boettiger
@cboettig
@henryem good, that makes sense, I'll start with manipulating real data in Tables then; hopefully will reinforce rather than confuse what they learn in the main class... Finding it hard already to write an intro lesson without [] subsetting...
for instance, which plot for a timeseries is less confusing / most consistent with the core class:
## Tables method: (needs to select column first otherwise it still plots all columns!?)
co2.select(["decimal.date", "average"]).plot("decimal.date")

## matplotlib style, also requires [] subsetting
plt.plot(co2["decimal.date"], co2["average"])
henryem
@henryem
The first one
Though we'll still do everything you can do with [] indexing, except we'll use method calls instead
It's purely a syntactic simplification
Carl Boettiger
@cboettig
cool. It would be nice if Tables.plot were as intelligent as pandas.plot for these line objects though
henryem
@henryem
I don't know if the get-a-column method exists yet
Yeah that's not my department :-/
What's the problem in this case?
Carl Boettiger
@cboettig
no worries. the data.frame we read in has extra columns, and the Tables.plot method tries to plot all columns as additional lines in different colors if we don't select them out.
Pandas syntax is more concise; it would just be co2.plot("decimal.date", "average") without the repetition needed in the datascience call
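For comparison, here is a minimal sketch of the pandas side of this, with a select-then-plot call alongside it. It is an illustration only: the `extra` column and all numeric values are made-up placeholders rather than the real CO2 data, `DataFrame.filter` stands in for the datascience `select` call, and the `Agg` backend is chosen just so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder values for illustration only; this is not the real CO2 record.
co2 = pd.DataFrame({
    "decimal.date": [2000.0, 2000.5, 2001.0, 2001.5],
    "average": [1.0, 2.0, 3.0, 4.0],
    "extra": [10.0, 20.0, 30.0, 40.0],  # hypothetical unwanted column
})

# pandas style: name both x and y, so the extra column is never drawn.
ax_pandas = co2.plot(x="decimal.date", y="average")

# Select-first style: keep only the needed columns, then plot against x.
ax_select = co2.filter(items=["decimal.date", "average"]).plot(x="decimal.date")

# Without selecting, every remaining numeric column becomes its own line.
ax_all = co2.plot(x="decimal.date")

# Count the lines drawn on each set of axes, then free the figures.
line_counts = [len(ax.get_lines()) for ax in (ax_pandas, ax_select, ax_all)]
plt.close("all")
```

The third call shows the behaviour described above: with no selection, the unwanted column is drawn as an additional line.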
Carl Boettiger
@cboettig
but that's all pretty minor. I think you've got me on the right path by focusing on the datascience method calls and trying to avoid [] indexing... it does get tricky very fast though; keep wanting to introduce pandas functions here & there where datascience doesn't have an easy way (that I know of) to do what I need. (and I'm just learning python as I go myself; coming from R mostly)
henryem
@henryem
I think there's nothing wrong with using Pandas or matplotlib stuff here and there if it's substantially easier. For example, there's currently no way to label a plot without using matplotlib functions, so we did that in labs. I think some degree of magical thinking about library functions is inevitable anyway.
Chris Holdgraf
@choldgraf
happy new year data science people!
Sam Lau
@SamLau95
@cboettig @henryem to get column values from a table without brackets, you can use
table.values('my_column')
henryem
@henryem
Cool, thanks
Chris Holdgraf
@choldgraf
hey folks - how up-to-date will the pip version of this class be?
in previous pip builds there have been some nasty bugs that had already been fixed on the dev branch, but pip wasn't updated
I'm trying to figure out which I should tell an instructor to use...git clone or pip. They're not super familiar with the shell/git/etc so I'd prefer pip, but not if it's going to lag considerably behind the dev branch
@SamLau95 maybe you have thoughts?
Sam Lau
@SamLau95
@choldgraf last semester we were actively developing the package and were releasing new versions every week because oftentimes students needed the fixes to complete labs. i think using the pip version for class should be fine if for no other reason than parity between instructor / student code output
right now releasing a new version has a lot of friction (depends on both john to update the Pypi version and ryan to update + push the dockerfile) which is why it’s been delayed so long. i’m actually waiting on john to push a bunch of changes to pypi at the moment
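One way an instructor could check for that parity is to print the installed package version and compare it with what students see. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, and it assumes the package is installed under the distribution name `datascience`.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version pip installed, or None if the package is not present.
print(installed_version("datascience"))
```

Running the same snippet in the student environment should print the same version string if both sides installed from pip at the same release.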
Chris Holdgraf
@choldgraf
ok cool - so you think the pip version is stable enough to use primarily...I'll pass that along to instructors
Sam Lau
@SamLau95
yup, thanks for that :)
is there a place where this conversation is happening? i’d be willing to listen in and answer questions directly if needed
i think i’d like to push out a written, consolidated, collaborative guide of how an instructor can be productive in creating material using jupyter and datascience
but i’m not sure if something like that is useful / already being worked on / done
Chris Holdgraf
@choldgraf
there's no consolidated place for discussion, more just little conversations here and there
I think a guide will be useful, especially for some instructors who have no background at all in computing
E.g., I've been writing up a short post on how to do scientific computing in windows
because somebody was confused about people saying "bring up a terminal and type XXX" which didn't work in windows
but it sounds like having some materials for instructors will be almost just as important (at least early on) as material for students...at least if we want to attract instructors who don't already do scientific computing in python
Sam Lau
@SamLau95
gotcha. personally i lean towards helping instructors without background in scientific computing since i think long-term that’ll result in more diversity in terms of courses and students addressed
maybe i’ll just throw up a github page on the dsten org with some info
what are some things that are absolutely necessary for a page like that?
Chris Holdgraf
@choldgraf
that's a good question, I think after the previous and this iteration it'll be clearer what the main pain points are
but it might be worth a brainstorm
either way, we should be documenting what people have questions about
Sam Lau
@SamLau95
agreed
are there some particular topics that would be the best bang for the buck for instructors right now?