These are chat archives for data-8/datascience

24th
Dec 2015
Carl Boettiger
@cboettig
Dec 24 2015 05:52
@choldgraf Another quick puzzle for you:
import datascience as ds
import matplotlib.pyplot as plt

nasa_temp = "http://climate.nasa.gov/system/internal_resources/details/original/647_Global_Temperature_Data_File.txt"
temp = ds.Table.read_table(nasa_temp, skiprows=range(4), na_values="*", delim_whitespace=True,
                           names=["Year", "Annual", "FiveYear"])

## Pandas plots this just fine
temp.to_df().plot()

## datascience not so much
temp.plot("Year")
plt.plot(temp["Year"], temp["Annual"])
I get an error about getting a string where a float is expected. I still haven't made sense of the idea that a column in a Table need not have a consistent type. Is that really the case? Why?
Sam Lau
@SamLau95
Dec 24 2015 07:46

@cboettig this is what i get from the last 5 rows of the temp table:

Year                                 | Annual | FiveYear
2012                                 | 0.63   | 0.67
2013                                 | 0.66   | nan
2014                                 | 0.75   | nan
2015                                 | nan    | nan
------------------------------------ | nan    | nan

looks like you have some missing values and a string, too
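
One way to locate an offending entry like that dashed line is to try parsing every value in the column and report the ones that fail. This is a minimal sketch in plain Python, not part of the datascience API: the helper name `non_numeric_entries` and the sample `year` list are made up for illustration. Note that the string "nan" parses as a float, so only the dashes are flagged.

```python
def non_numeric_entries(values):
    """Return (index, value) pairs for entries that float() cannot parse."""
    bad = []
    for i, value in enumerate(values):
        try:
            float(value)
        except (TypeError, ValueError):
            bad.append((i, value))
    return bad


# Hypothetical stand-in for the tail of the "Year" column shown above.
year = ["2012", "2013", "2014", "2015", "------"]
print(non_numeric_entries(year))
```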

Chris Holdgraf
@choldgraf
Dec 24 2015 08:19
Yeah, it looks like that big ------- is the problem. You could always drop the last row; this works:
temp = temp.take(range(temp.num_rows-1))
temp.plot('Year')
Sam Lau
@SamLau95
Dec 24 2015 08:20
or, for a slightly more succinct first line:
temp = temp.take[:-1]
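
For intuition, both forms pick every row except the last. The same equivalence can be sketched with a plain Python list; this is only an analogy, and the `rows` list here is a made-up stand-in rather than a Table.

```python
# Hypothetical stand-in for the rows of the table.
rows = ["2012", "2013", "2014", "2015", "------"]

# Explicit indices, like take(range(num_rows - 1)).
by_index = [rows[i] for i in range(len(rows) - 1)]

# Slice notation, like take[:-1].
by_slice = rows[:-1]

print(by_index == by_slice)
```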
Chris Holdgraf
@choldgraf
Dec 24 2015 08:20
or you could write a function that does "for each row in this column, try to cast it as an integer. If it errors, return np.nan." Then you could make one more pass and drop any rows that are NaN (check with np.isnan, since nan == nan is False)
oo @SamLau95 thanks for the tip, didn't know that Table take behaves like pandas iloc
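
The cast-then-drop idea could look roughly like the sketch below. It is written against plain Python dicts rather than a Table, so `to_number`, `drop_bad_rows`, and the sample `rows` are all hypothetical names for illustration. It also casts to float instead of integer so that valid values and NaN share one type, and uses `math.nan` from the standard library in place of `np.nan`.

```python
import math


def to_number(value):
    """Return value as a float, or NaN if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def drop_bad_rows(rows, column):
    """Keep only rows whose `column` entry parses to a non-NaN number."""
    cleaned = []
    for row in rows:
        number = to_number(row[column])
        # NaN never compares equal to itself, so test with isnan, not ==.
        if not math.isnan(number):
            cleaned.append({**row, column: number})
    return cleaned


# Hypothetical rows mirroring the tail of the table above.
rows = [
    {"Year": "2014", "Annual": 0.75},
    {"Year": "2015", "Annual": math.nan},
    {"Year": "------", "Annual": math.nan},
]
cleaned = drop_bad_rows(rows, "Year")
print([row["Year"] for row in cleaned])
```

Only the "Year" column is checked here, so the 2015 row survives even though its "Annual" value is missing; calling the helper once per column would drop that row too.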
Sam Lau
@SamLau95
Dec 24 2015 08:22
yup, it was done in #120