Thomas Havlik
Hey folks! When using either df.drop(df.index[i], inplace=True) or df = pd.concat([df[:i], df[i+1:]]), some cells become NaN
df.isna()['Timestamp'].sum() == 0 until I try and remove rows from df. At some point, df.at[i+1, 'Timestamp'] becomes nan. The loop for this is range(0, len(df)-1)
i.e. df.at[i+1, :] is always accessible
start, repeat = (0, True)
initial_rows = len(df)
num_dupes = 0
while repeat:
    repeat = False
    for i in range(0, len(df)-1):
        c = df.iloc[i]
        n = df.iloc[i+1]
        ct = int(c['Timestamp'])
        nt = int(n['Timestamp'])
        diff = nt - ct
        if diff < dupe_threshold:
            # remove the latter sample, logically ORing
            # this sample with it before removal
            if bool(n['Label']):
                df.at[i, 'Label'] = True
            # this doesnt work
            #df.drop(df.index[i+1], inplace=True)

            # neither did this
            #indexes_to_drop = set([i+1])
            #indexes_to_keep = set(range(df.shape[0])) - indexes_to_drop
            #df = df.take(list(indexes_to_keep))

            # and this didnt either
            df = pd.concat([df[:i], df[i+1:]])

            start, repeat = (i, True)
            num_dupes += 1
            total_num_dupes += 1
nt = int(n['Timestamp']) throws "cannot convert nan to int"
Thomas Havlik
I think when I am setting the value with df.at[i, 'Label'] = True, I am losing those rows' data
Thomas Havlik
This is nonsense. Why is it that df.loc/iloc modify the entire row even if you only want to modify a single column from the row?
df.loc[row, col] = val is a setter for the entire row, values indexed by {col:val}
How can I patch a single column's value for a single row in-place?
Thomas Havlik
assert df.isna().sum()['Timestamp'] == 0
df.at[i, 'Label'] = True
assert df.isna().sum()['Timestamp'] == 0 # AssertionError
Why is this happening? df.at[i, 'Label'] = True should not be purging values for other columns
Joshua Wilson
@thavlik Can you check the dtype of column Label before and after using df.at?
Joshua Wilson
Also, are you sure you don't mean to be using df.iat instead of df.at?
Thomas Havlik
it's bool
df.iat[i, 'Label'] = True fails with "iAt based indexing can only have integer indexers"
df.iat[i, df.columns.get_loc('Label')] = True appears to work
Joshua Wilson
Awesome, glad that was all it was.
Hi, I'm having a problem with a pandas DataFrame...
It rounds away the last decimal.
hey yo
Hello, I would really like to contribute, but I feel so lost in this huge project.
Can anyone recommend a simple yet productive issue? I'd love to handle it.
Gabriel Corona
@jreback: Shall I rebase instead of merge? (pandas-dev/pandas#28459)
OK, never mind, it's documented in the contributing doc.
@Dr-Irv thanks, I have multiple dataframes of different shapes on a single Excel tab, so keeping track of the exact position and extent of each df precluded using xlsxwriter. I did end up figuring out an implementation for the 3-colormap styler without xlsxwriter, modifying this SO post (https://stackoverflow.com/a/57445863/8731272)

I'm trying to create a Python environment as described here https://dev.pandas.io/docs/development/contributing.html#creating-a-development-environment but the command "python -m pip install -e . --no-build-isolation" fails due to an unknown version of pandas (pandas 0+unknown). How can I fix it?


Obtaining file:///C:/Users/GuestUser/Documents/GitHub/pandas
Preparing wheel metadata ... done
Requirement already satisfied: pytz>=2017.2 in c:\users\guestuser\miniconda3\envs\pandas-dev\lib\site-packages (from pandas==0+unknown) (2019.2)
Requirement already satisfied: numpy>=1.13.3 in c:\users\guestuser\miniconda3\envs\pandas-dev\lib\site-packages (from pandas==0+unknown) (1.16.5)
Requirement already satisfied: python-dateutil>=2.6.1 in c:\users\guestuser\miniconda3\envs\pandas-dev\lib\site-packages (from pandas==0+unknown) (2.8.0)
Requirement already satisfied: six>=1.5 in c:\users\guestuser\miniconda3\envs\pandas-dev\lib\site-packages (from python-dateutil>=2.6.1->pandas==0+unknown) (1.12.0)
ERROR: xarray 0.13.0 has requirement pandas>=0.19.2, but you'll have pandas 0+unknown which is incompatible.
ERROR: statsmodels 0.10.1 has requirement pandas>=0.19, but you'll have pandas 0+unknown which is incompatible.
ERROR: seaborn 0.9.0 has requirement pandas>=0.15.2, but you'll have pandas 0+unknown which is incompatible.
ERROR: fastparquet 0.3.2 has requirement pandas>=0.19, but you'll have pandas 0+unknown which is incompatible.
Installing collected packages: pandas
Found existing installation: pandas 0+unknown
Can't uninstall 'pandas'. No files were found to uninstall.
Running setup.py develop for pandas

Successfully installed pandas-0.25.1


Tom Augspurger

It looks like it may have succeeded?

But if you’re worried, you might repeatedly run conda uninstall -y --force pandas and pip uninstall -y pandas, and maybe remove any pandas directories under your site-packages folder, including a possible pandas .dist-info or .egg-info directory.

It looks like it succeeded, but it also emitted 4 errors. Can I ignore them?
Tom Augspurger
Not sure. Do pandas.__version__ and pandas.__file__ look correct? If so, then yes you can ignore.

--- print(pandas.__version__)
--- print(pandas.__file__)

i think it is not good
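For reference, a minimal sketch of the check Tom describes. The reading of the results is an assumption based on the log above: a healthy development install would be expected to report something other than "0+unknown" and a path inside the cloned repository rather than under site-packages.

```python
import pandas

# Version string reported by the installed package.
version = pandas.__version__

# Path of the package's __init__ file; shows which copy Python imports.
location = pandas.__file__

print(version)
print(location)

# Flag the placeholder version seen in the pip log above.
looks_unversioned = version == "0+unknown"
print("placeholder version:", looks_unversioned)
```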

Ochirgarid Chinzorig
Hello guys, I'm trying to set up a pandas-dev environment following the instructions at https://dev.pandas.io/docs/development/contributing.html
But I keep getting dependency errors; so far I've installed sphinx and pyzmq manually.
I'm on
  • Ubuntu 18.04.03
  • Python 3.6.8
  • pip 19.1.1
  • virtualenv 16.0.0
Ochirgarid Chinzorig
Any help with the environment? Or should I use Docker or another environment?
Thanks in advance
William Ayd
Anyone else noticing an issue with 32 bit Linux environment?
Haven’t looked deeply but seen on a couple PRs
Maybe an issue with pytest release
Josiah Baker
hey @Ochirgarid can you provide an example of what errors you are getting? have you created an environment using conda as per the instructions?
you shouldn’t have to install any of the dependencies manually as that is managed by conda env create -f environment.yml
Ochirgarid Chinzorig
@josibake The error clearly said that a dependency is missing: "No module named sphinx". I used pip and mkvirtualenv from virtualenvwrapper to create the environment. Do I really need to use conda? If so, which one (Anaconda or Miniconda) would you recommend on Ubuntu?
Are there any plans to put pandas_lib on armhf? I have been trying to put pandas on an armhf board and have been failing because the Debian distro with python3 does not allow for pandas_lib or sudo apt install python3-pandas-lib. I tried with pip3 too. I was just thinking that if people were willing to try this excursion, it might prove valuable.
Josiah Baker
@ochirgarid i haven’t tried anything except conda, so i cant speak to other methods. id suggest trying it with miniconda per the contributing guide and see if you are still getting the error
Uh oh. Never mind me. I was just told something. Debian Stretch does not have armhf support for pandas. Buster does!
Ochirgarid Chinzorig
@josibake thanks. I'll try using conda
William Ayd
Thinking out loud but is there any reason why we still choose to roll our own is_list_like instead of doing an isinstance(x, collections.abc.Sequence)?
Partially inspired by #28770
Aligning to abc.Sequence would seem more pythonic and help the type system
I vaguely recall performance maybe being a consideration, though in Py3.7 a lot of the abc stuff was rewritten in C, which improved isinstance checks against those
Jeff Reback
that accepts strings
Joris Van den Bossche
and does not accept numpy arrays, or pandas Series
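The two objections can be checked directly. A small sketch comparing isinstance(x, collections.abc.Sequence) with pandas.api.types.is_list_like on a few sample values (the sample values are arbitrary):

```python
from collections.abc import Sequence

import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

candidates = {
    "str": "abc",
    "list": [1, 2],
    "ndarray": np.array([1, 2]),
    "Series": pd.Series([1, 2]),
}

# For each value: (is it an abc.Sequence?, does pandas treat it as list-like?)
results = {
    name: (isinstance(obj, Sequence), is_list_like(obj))
    for name, obj in candidates.items()
}

for name, (is_sequence, list_like) in results.items():
    print(f"{name:8} Sequence={is_sequence!s:5} is_list_like={list_like}")
```

Strings pass the Sequence check but are not list-like, while NumPy arrays and Series are list-like but not Sequences, so the two checks disagree on exactly the cases mentioned above.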