Dr. Muhammad Anjum
@anjumuaf123_twitter
Hi dears, I want to get the average of some specific rows and then subtract that mean from Tb. How do I do that, please?
Dr. Muhammad Anjum
@anjumuaf123_twitter
how do I ignore negative floating-point values in a data set?
William Ayd
@WillAyd
@anjumuaf123_twitter I just want to reiterate a prior warning that this list is for developing pandas. For usage questions please ask on StackOverflow
Dr. Muhammad Anjum
@anjumuaf123_twitter
@WillAyd Thank you very much for such kindness
Erfan Nariman
@erfannariman

Today I had to use DataFrame.lookup on a DataFrame of ~1.5 million rows and 30 columns and found it wasn't very performant. Then I looked at the code and saw it's a for loop internally:

            for i, (r, c) in enumerate(zip(row_labels, col_labels)):
                result[i] = self._get_value(r, c)

I created my own method with DataFrame.melt and loc, which performed better:

def lookup(data, col_label):
    # Reshape to long format, keeping the column of target labels as an id.
    result = data.melt(id_vars=col_label)
    # Keep only the rows where the melted column name matches the label.
    result = result.loc[result[col_label] == result['variable'], 'value']

    return result

Would it be an idea to open an ENH ticket for this? Thought I'd check here first.
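For comparison, a minimal sketch (toy frame and column names are made up, assuming pandas 1.1) of the two approaches side by side; note that melt discards the original row order, so the index is carried along explicitly:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6],
                   "pick": ["a", "b", "a"]})

# Built-in lookup: one _get_value call per row internally.
print(df.lookup(df.index, df["pick"]))  # [1 5 3]

# Melt-based alternative: keep the original index so the result can be
# re-aligned after filtering.
m = df.reset_index().melt(id_vars=["index", "pick"])
out = (m.loc[m["pick"] == m["variable"]]
         .set_index("index")["value"]
         .sort_index())
print(out.to_numpy())  # [1 5 3]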

Jeff Reback
@jreback
lookup should be deprecated; there is an issue about it
Erfan Nariman
@erfannariman
I see, I think you mean #18262
avinashpancham
@avinashpancham
I would like to work on pandas-dev/pandas#23992, but I am not sure of my approach. The issue requires unit-testing a plot, and for that I thought of using pytest-mpl. Is that the way to go, or do we already have some built-in functions for this type of unit test? I checked pandas/tests/plotting, but I couldn't find a similar unit test.
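For what it's worth, the existing tests in pandas/tests/plotting generally assert on the returned matplotlib Axes rather than comparing rendered images; a minimal sketch of that style (test name and assertions are illustrative):

import matplotlib
matplotlib.use("Agg")  # headless backend, as on CI
import pandas as pd

def test_series_line_plot_legend():
    s = pd.Series([1, 3, 2], name="vals")
    ax = s.plot(legend=True)
    # Check properties of the Axes instead of pixel-comparing an image.
    assert ax.get_legend() is not None
    labels = [t.get_text() for t in ax.get_legend().get_texts()]
    assert labels == ["vals"]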
Erfan Nariman
@erfannariman
I am working on #35224. I get a docstring warning Deprecation warning should precede extended summary. I tried to move it around in the docstring (starting at the top of the docstring), but I still get the error when I try to validate. I checked some other examples where a deprecation warning is added, and they look similar to mine. Can someone point me in the right direction here? Thanks
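For context, the validator wants the .. deprecated:: directive immediately after the one-line summary, before any extended-summary paragraph; a hypothetical docstring with the expected ordering (name and version are made up):

def some_func():
    """
    One-line summary ends here.

    .. deprecated:: 1.1.0
        Use :func:`some_other_func` instead.

    The extended summary starts only after the deprecation warning,
    which is what this validation check enforces.
    """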
Robin to Roxel
@r-toroxel
Hello, I need a little help here. I can't get the tests running. Master is up to date with upstream, and the conda environment as well. But all of a sudden, tests/test_downstream.py fails along with several others. It did work before.
3 replies
Erfan Nariman
@erfannariman
I created #35299 and linked PR #35300; can one of the devs maybe have a look at it and share some thoughts? I left the PR as a draft for now, else I can finish it up. Thanks
Also I am still running into Deprecation warning should precede extended summary at #35224; the docstring error does not make a lot of sense to me, so I'm not sure how to solve it
Yutaro Ikeda
@ikedaosushi
Hello team, I saw this release (https://github.com/pandas-dev/pandas/releases/tag/v1.1.0rc0) and found that the link (https://pandas.pydata.org/pandas-docs/version/1.1.0/) on that page returns a 404. Is it already known? It's just FYI.
Ozan Öğreden
@oguzhanogreden
A question re #32542: I added a what's new entry after a successful test run. Now it fails with a seemingly unrelated error. Curious if this has anything to do with @r-toroxel's report here.
1 reply
guru kiran
@gurukiran07

Here's the current str_get implementation

    def f(x):
        if isinstance(x, dict):
            return x.get(i)
        elif len(x) > i >= -len(x):
            return x[i]
        return np.nan

Can we change it to something like this:

def f(x):
    try:
        return x[i]
    except (KeyError, IndexError):
        return np.nan
2 replies
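One behavioral difference worth noting (a standalone sketch of the two branches, not the pandas internals themselves): dict.get returns None for a missing key, while the try/except version returns np.nan:

import numpy as np

i = 1  # the looked-up position/key, as in the closures above

def f_current(x):
    if isinstance(x, dict):
        return x.get(i)
    elif len(x) > i >= -len(x):
        return x[i]
    return np.nan

def f_proposed(x):
    try:
        return x[i]
    except (KeyError, IndexError):
        return np.nan

print(f_current([10, 20]), f_proposed([10, 20]))  # 20 20
print(f_current([10]), f_proposed([10]))          # nan nan
print(f_current({1: "a"}), f_proposed({1: "a"}))  # a a
print(f_current({2: "b"}), f_proposed({2: "b"}))  # None nan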
guru kiran
@gurukiran07

pd.Series.str.get documentation states parameters as:

Parameters: i: int
    Position of element to extract.

i can be any hashable object, right? In the case of dictionaries, i need not necessarily be an int. Would it be good to change it to i: hashable object?

17 replies
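A quick illustration of the point (toy data, relying on the dict branch of the implementation quoted earlier):

import pandas as pd

s = pd.Series([{"name": "ada"}, {"name": "grace"}])
print(s.str.get("name").tolist())  # ['ada', 'grace']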
Marco Gorelli
@MarcoGorelli
Some time ago someone was asking about checking that types declared in docstrings match type annotations. Seems there's a library that (among other things) does this: https://github.com/terrencepreilly/darglint
fleimgruber
@fleimgruber
Why is import pandas as pd; pd.Timestamp("2019-10-27 02:00:00+02:00", tz="Europe/Warsaw").ceil(freq="1H") erroring out with pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 2019-10-27 02:00:00, try using the 'ambiguous' argument? I would expect the .ceil to be a no-op here. Pandas 1.0.5
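A sketch of the workaround the error message itself points to, assuming Timestamp.ceil's ambiguous parameter: 02:00 occurs twice on that date because of the DST fold, so pandas has to be told which occurrence the re-localized result should take:

import pandas as pd

ts = pd.Timestamp("2019-10-27 02:00:00+02:00", tz="Europe/Warsaw")
# True picks the DST (+02:00) side of the fold; False would pick +01:00.
print(ts.ceil(freq="1H", ambiguous=True))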
ldacey
@ldacey_gitlab
for some reason pd.read_excel() is returning columns as an index even if I say index_col=None
like 4 out of the 5 columns are indexed for no reason at all
Erfan Nariman
@erfannariman
This message was deleted
2 replies
William Ayd
@WillAyd
Does anyone know why pd.Int64Index([1, 2])._engine.get_loc(2) would ever raise a KeyError? I can’t reproduce in the REPL but when debugging #34997 (also related - #33439) and stepping through internals that seems to happen
William Ayd
@WillAyd
As far as I can tell the hashtable that the index uses isn’t correctly populated by the time the get_loc call gets made. I’m not clear though on how that lifecycle is managed
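The plain-REPL repro from the messages above, for anyone following along (private API; interactively it behaves as described):

import pandas as pd

idx = pd.Int64Index([1, 2])
# Returns 1 in a fresh REPL; the KeyError only surfaced while stepping
# through internals under the debugger, before the engine's hashtable
# had been populated.
print(idx._engine.get_loc(2))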
Erfan Nariman
@erfannariman
For my understanding: when using conda, why doesn't it get the latest release? conda install pandas still grabs 1.0.5, so you need to be specific with conda install -c conda-forge pandas==1.1.0?
This is not the case with pip.
Jeff Reback
@jreback
because it’s not on the default channel (built by anaconda) yet; conda-forge is another channel
Erfan Nariman
@erfannariman
I see thanks.
Erfan Nariman
@erfannariman
Is there a reason we don't have a pandas page on linkedin?
2 replies
onlysolace
@onlysolace
Hello
I am wondering how I can create a Series that is derived from a computation
Like if Series_A = Series_B + Series_C
onlysolace
@onlysolace
Then if Series B or C change, it should be reflected in Series_A
Erfan Nariman
@erfannariman
@onlysolace this channel is for development, please ask your usage question on StackOverflow.com
Erfan Nariman
@erfannariman
@datapythonista @tomaugspurger do you guys need any help with socials? I can maybe help out if time is the issue: think about expanding to other platforms, or writing more accessible blogs about (new) functionality so less technical users get interested.
onlysolace
@onlysolace
I am looking through the documents right now. I would like to add an observer to a Series
William Ayd
@WillAyd
Has anyone been looking at the CI failures on numpydev by chance?
William Ayd
@WillAyd
Looks to be an issue with unsigned integer wraparound on an Index constructor with the intersection call
Thomas Smith
@smithto1

For a PR to be merged is it a requirement that all of the Checks pass? Or can a PR be merged with some of the Checks failing?

I ask because some tests are failing on py37_np_dev; looks like they are failing for every PR.

Is it still worth chasing PRs for approval/merge while the tests are in this failing state, or should we wait until these tests are fixed?

@jreback @WillAyd @rhshadrach I'm sure any one of you could answer.

1 reply
Jeff Reback
@jreback
generally we will wait to merge things, though if a PR is good and reviewed then it's ok (but I haven't reviewed anything yet)
Simon Hawkins
@simonjayhawkins
now that we are down to just 4 fails on np-dev, maybe we could xfail them for now, xref #35481
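A rough sketch of the xfail pattern being suggested (the guard and test are illustrative, not the actual patch in #35481):

import numpy as np
import pytest

# Illustrative guard; the real change may detect a numpy dev build differently.
is_numpy_dev = "dev" in np.__version__

@pytest.mark.xfail(is_numpy_dev, reason="GH 35481: fails on numpy dev",
                   strict=False)
def test_placeholder():
    assert True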
ldacey
@ldacey_gitlab
is it possible to specify an encoding for pd.read_excel() in 1.1.0? I get an error now and I see that kwds are no longer accepted
I just wonder what to do when a file is in cp1252 or something like that
1 reply
ldacey
@ldacey_gitlab
it is also possible that the encoding argument never did anything and I just had it default to utf-8
2 replies
ldacey
@ldacey_gitlab
from pandas 1.0.5 to pandas 1.1.0, using astype("string") causes a SIGTERM with my scheduler and a dead kernel with Jupyter. Changing this to astype(str) fixes the issue
I checked the column and it is an event_body_html column from Zendesk; some of the transcripts are huge, so it doesn't need to be the new "string" data type anyway, but it did cause some schedule failures until I made the switch
9 replies
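For readers hitting the same thing, a minimal illustration (toy data) of the switch described above; both produce text, but they land on different dtypes:

import pandas as pd

s = pd.Series(["<html>huge transcript...</html>", None])
print(s.astype("string").dtype)  # string  (extension StringDtype, new in 1.0)
print(s.astype(str).dtype)       # object  (plain Python str values)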
matrixbot
@matrixbot
cyberjunkie: Who are these idiots?
cyberjunkie: Spamming ads
MankaranSingh
@MankaranSingh
off-topic question, but what are all the matrix bots all over Gitter?
Rohan Mathew
@MavicMaverick
I am currently running SQL queries for my Python application through pd.read_sql_query. I am planning to use the same application with larger datasets on RDBs soon, and I was wondering if pandas runs the query and stores the result set in memory, or keeps an open connection to the database so that the data is read over the network. It is not feasible for me to send large amounts of data across the network to be stored in memory, because that would fill the memory and would be slow. So does it store the result set, no matter the size, in a DataFrame in memory, or does it read from the database through an open connection?
3 replies
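By default read_sql_query materializes the full result set into a single in-memory DataFrame; passing chunksize instead returns an iterator that pulls rows over the open connection in pieces. A small sketch (sqlite3 stands in for the real RDB):

import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

# Each iteration yields a small DataFrame instead of one giant one.
for chunk in pd.read_sql_query("SELECT x FROM t", con, chunksize=4):
    print(len(chunk))  # 4, 4, 2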
Deepak RATHOD
@rathoddeepak537_twitter
hiii