djangoReactGuy
@SanskarSans
I solved it with this commit_data.groupby(by=["Author"])["SHA"].nunique()
For a particular author, this is how I did it:
commits = (
        commit_data[commit_data["Author"] == author]
        .groupby(by=["Author"])["SHA"]
        .nunique()
    )
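Below is a minimal, self-contained sketch of the same approach. The column names Author and SHA come from the message above, and the rows are invented. Counting unique SHAs instead of rows matters when one commit appears on several rows, for example one row per changed file.

```python
import pandas as pd

# Made-up commit log: one row per (commit, changed file), so a SHA can repeat.
commit_data = pd.DataFrame(
    {
        "Author": ["alice", "alice", "bob", "alice", "bob"],
        "SHA": ["a1", "a1", "b1", "a2", "b2"],
    }
)

# Distinct commits per author (a Series indexed by Author).
per_author = commit_data.groupby(by=["Author"])["SHA"].nunique()

# For a single author, filtering first and calling nunique() directly
# returns a plain integer instead of a one-element Series.
author = "alice"
alice_count = commit_data.loc[commit_data["Author"] == author, "SHA"].nunique()
```

Here alice has three rows but only two distinct SHAs, so both forms report 2 for her.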
mocquin
@mocquin
Hello there, I am creating an extension using the pandas extension API, and I am facing a problem with Series creation. Basically, my CustomArray (which subclasses ExtensionArray) ends up with a different index depending on how it is created: either RangeIndex(start=0, stop=10, step=1) or RangeIndex(start=0, stop=1, step=1), for an expected length of 10. In other words, I sometimes get a one-element index when 10 elements are expected. I can't figure out what process creates the index for my CustomArray class.
Alex-Gregory-1
@Alex-Gregory-1
Hi, I opened a pull request pandas-dev/pandas#42143 and changed my code according to the requests of the reviewers as best as I could but I haven’t heard anything in around 10 days. How should I move forward from here?
mocquin
@mocquin
Could anyone explain what an object has to expose to be considered "list-like"? The docs for pd.core.dtypes.common.is_list_like are pretty light.
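Judging from observed behaviour rather than a documented contract, is_list_like (also public as pandas.api.types.is_list_like) roughly checks three things:

- the object defines __iter__;
- it is not a str or bytes;
- it is not a zero-dimensional NumPy array.

Sets are accepted unless allow_sets=False is passed. The probe below exercises those cases. HasIter and OnlyGetitem are hypothetical classes made up for the test.

```python
import numpy as np
from pandas.api.types import is_list_like


class HasIter:
    """Defines __iter__, so it counts as iterable."""

    def __iter__(self):
        return iter([1, 2, 3])


class OnlyGetitem:
    """Supports indexing but defines no __iter__."""

    def __getitem__(self, i):
        if i < 3:
            return i
        raise IndexError(i)


checks = {
    "list": is_list_like([1, 2]),
    "str": is_list_like("abc"),               # strings are excluded on purpose
    "int": is_list_like(5),
    "custom __iter__": is_list_like(HasIter()),
    "only __getitem__": is_list_like(OnlyGetitem()),
    "0-d array": is_list_like(np.array(5)),   # zero-dimensional arrays count as scalars
    "set": is_list_like({1, 2}),
    "set, allow_sets=False": is_list_like({1, 2}, allow_sets=False),
}
```

OnlyGetitem can still be looped over in plain Python through the old __getitem__ protocol, yet is_list_like reports False for it. Defining __iter__ explicitly therefore looks like the safer choice for a custom container.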
mocquin
@mocquin
Hello again, I would like to know whether it is possible to override the HTML repr of a DataFrame using the extension interface. I guess not, but you never know...
ldacey
@ldacey_gitlab
Is there any performance difference between the new pyarrow string data type and astype("string")? I am using pyarrow for all of my data work, so I feel like I should switch to it, but I am not sure whether there is any actual difference.
Dave Hirschfeld
@dhirschfeld
I'm sure constructing a DataFrame from a scalar used to work - i.e. it would just broadcast to the shape implied by the index/columns
Now it's throwing a ValueError:
>>> pd.DataFrame(0, index=[1,2,3])
Traceback (most recent call last):
  File "C:\Users\dhirschf\envs\dev\lib\site-packages\IPython\core\interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-125-7b95127dfe8b>", line 1, in <module>
    pd.DataFrame(0, index=[1,2,3])
  File "C:\Users\dhirschf\envs\dev\lib\site-packages\pandas\core\frame.py", line 590, in __init__
    raise ValueError("DataFrame constructor not properly called!")
ValueError: DataFrame constructor not properly called!
When/where was this changed?
Ok, it helps to actually define the columns :facepalm:
>>> pd.DataFrame(0, index=[1,2,3], columns=['A'])
   A
1  0
2  0
3  0
ghobona
@ghobona
The Open Geospatial Consortium (OGC) invites developers to the July 2021 OGC API Virtual Code Sprint (scheduled for July 21st - 23rd starting at 07:00am EDT each day). Some of the participants will be implementing Web APIs based on the OGC API - Processes candidate standard that can offer access to Machine Learning (ML) algorithms. It would be great to have some implementations using pandas, at the code sprint, wrapped inside Web APIs based on OGC API - Processes. Attend the pre-event webinar on July 14th at 09:00am EDT to get an overview of what to expect during the code sprint. Register for the Code Sprint and Webinar at https://portal.ogc.org/public_ogc/register/210721api_codesprint.php
Irv Lustig
@Dr-Irv
@jreback During the meeting you mentioned opening an issue to get pandas 1.3.0 onto the main anaconda channel. Do you know where that issue should be opened?
Jeff Reback
@jreback
try conda/conda ?
Irv Lustig
@Dr-Irv
That led me to a different repo. Here's the issue I created: ContinuumIO/anaconda-issues#12527
Filipe
@ocefpaf
@Dr-Irv try commenting in https://github.com/AnacondaRecipes/pandas-feedstock/pulls; it looks like their bot already picks up conda-forge's PR.
EmmaYang29
@EmmaYang29
Alex-Gregory-1
@Alex-Gregory-1
Hi, I have created a pull request pandas-dev/pandas#42591 but I cannot work out why my code is failing the CI tests. Can anybody help? I created my development environment in miniconda by following the "Creating a Python environment" instructions here https://pandas.pydata.org/docs/development/contributing_environment.html
RoboZoom
@RoboZoom
Hi - I'm brand new to Python, but I'm trying to start a project with pandas. I'm importing a CSV into a DataFrame via read_csv, and I want to create a validation function to ensure that my file is good.
Following that validation, I basically want to type-safe the columns everywhere else in the application.
I'm getting stuck on the right data structure to store the key-value pairs... In an ideal world, I'd have an object whose properties IntelliSense understands and can fill in, and each property returns a string. I initially thought I wanted a dictionary, but I don't think that works.
Will enums work like this in Python?
Actually, I haven't seen any documentation for string-based enums,
where the enum evaluates to a string.
Any guidance on what I should google would help.
Use case would be:
```
df = read_csv('my-file.csv')
validateDf(df, schema)  # Aborts app if false
df[schemaDef.name]  # Returns name column
```
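One way to get string-valued names that editors can autocomplete is an Enum that also subclasses str. Each member then compares equal to its string value. Below is a minimal sketch in place of the validateDf/schema pseudocode, under some stated assumptions:

- the two-column schema is hypothetical;
- Col, validate_df and the sample CSV text are made up for illustration;
- it only checks that the expected columns exist.

```python
import io
from enum import Enum

import pandas as pd


class Col(str, Enum):
    """Hypothetical schema: each member's value is the CSV column header."""

    NAME = "name"
    AGE = "age"


def validate_df(df: pd.DataFrame) -> None:
    """Raise ValueError if any column named in Col is missing (hypothetical helper)."""
    missing = {c.value for c in Col} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")


# Stand-in for read_csv('my-file.csv'), so the sketch runs without a file.
csv_text = "name,age\nAda,36\nAlan,41\n"
df = pd.read_csv(io.StringIO(csv_text))
validate_df(df)

# Col.NAME == "name" because Col subclasses str; .value makes the lookup explicit.
names = df[Col.NAME.value]
```

On Python 3.11 and later, enum.StrEnum serves the same purpose without the explicit str mixin.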
Farhan Saeed
@fantamlab_gitlab
My ECS container task has been assigned 15GB of RAM, but pandas df.memory_usage(deep=True).sum() is giving me more than 15GB. How is that possible?
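One possible explanation is below. It is an assumption about this particular frame, not a diagnosis. For object-dtype columns, memory_usage(deep=True) adds up the size of every element. A Python object referenced from many cells is therefore counted once per cell, even though it is stored only once. The sketch makes that visible with a single string repeated 1,000 times.

```python
import sys

import pandas as pd

# 1,000 references to one and the same 1,000-character string object.
text = "x" * 1000
s = pd.Series([text] * 1000, dtype=object)

shallow = s.memory_usage(index=False)          # only the object pointers
deep = s.memory_usage(index=False, deep=True)  # pointers plus the size of every element

# Deep usage counts the shared string once per cell.
counted_string_bytes = deep - shallow
actual_string_bytes = sys.getsizeof(text)
```

Here counted_string_bytes is roughly 1,000 times actual_string_bytes. If the real frame has many repeated strings, the deep figure can exceed the memory the process actually uses. Converting such columns to category dtype is one way to store each distinct value once.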
buhtz
@buhtz_gitlab
I am new to Gitter and was only seeing messages from 2015...
Now I see the latest messages.
Just wanted to point out a Stack Overflow question (https://stackoverflow.com/q/68526846/4865723) about a behaviour change in .agg(list, axis=1) since version 1.3.0...
Gert Hulselmans
@ghuls
@RoboZoom you can convert your string columns to categorical columns and use isin to check whether every value is one you consider valid:
>>> s = pd.Categorical(['lama', 'cow', 'lama', 'beetle', 'lama',
...                'hippo'])
>>> s.isin(['cow', 'lama'])
array([ True,  True,  True, False,  True, False])
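Building on that suggestion, here is a small sketch that reports which values fall outside the allowed set, instead of returning a boolean mask. invalid_values and VALID_ANIMALS are hypothetical names.

```python
import pandas as pd

# Hypothetical whitelist for one column.
VALID_ANIMALS = ["cow", "lama"]


def invalid_values(s: pd.Series, valid) -> list:
    """Return the sorted distinct values of s that are not in valid."""
    bad = s[~s.isin(valid)]
    return sorted(bad.unique().tolist())


animals = pd.Series(
    ["lama", "cow", "lama", "beetle", "lama", "hippo"], dtype="category"
)
problems = invalid_values(animals, VALID_ANIMALS)
```

An empty list means the column passed. A non-empty one gives the offending values for an error message.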
baggiponte
@baggiponte
Hi! Am I the only one experiencing a weird bug? I have a notebook in which I call a custom function that draws a boxplot from a given pd.Series with a DatetimeIndex. Yet the function breaks with this error: AttributeError: 'Index' object has no attribute 'strftime'. This makes no sense.
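A common cause of this error is that the index is a plain Index holding date strings or mixed objects rather than a DatetimeIndex. This is an assumption here, since the function itself isn't shown. It can happen, for example, after reading from a file or concatenating. strftime is a DatetimeIndex method and a plain Index has no such attribute, so converting with pd.to_datetime first avoids the error.

```python
import pandas as pd

# Index of date *strings*: looks like dates, but it is a plain Index.
s = pd.Series([1.0, 2.0, 3.0], index=["2021-01-05", "2021-01-20", "2021-02-03"])
has_strftime_before = hasattr(s.index, "strftime")

# Convert once; afterwards DatetimeIndex methods such as strftime are available.
s.index = pd.to_datetime(s.index)
labels = s.index.strftime("%Y-%m")
```

Printing type(s.index) inside the failing function is a quick way to confirm whether this is what is happening.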
RoboZoom
@RoboZoom
@ghuls - thanks!
Paul Ganssle
@pganssle

I dunno if anyone's around who understands the time zone code, but I've been trying to put together a patch to prevent pandas from breaking when it encounters a zoneinfo zone (and from breaking when dateutil and eventually pytz are updated with new implementations), and I'm a bit confused about the various time zone conversions. For example, this function: https://github.com/pandas-dev/pandas/blob/9936902c5aa2396195bca0e07d40104c96ed20e1/pandas/_libs/tslibs/tzconversion.pyx#L84

It says that it localizes a "naïve datetime", but it takes an array of integers.

Is the idea that a "naïve" datetime is represented under the hood as an offset from 1970-01-01T00:00 in its own local time? It's a weird and confusing choice, but treating it as an actual epoch timestamp seems to be giving the wrong answer.
I'm also not entirely sure what context that function is used in.
Ideally I'd refactor the whole thing so that that function no longer exists, but that may be a bit ambitious for my immediate goals.
Paul Ganssle
@pganssle
Similar question with tz_convert_from_utc — is that returning the weird "int64 representing offset from 1970 in the local zone"?
For tz_convert_from_utc, if it is "int64 representing offset from 1970 in the local zone", I don't understand where the fold information is going, since that's not necessarily a monotonic timeline.
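The public API shows the same convention. A naive Timestamp's .value is nanoseconds since 1970-01-01 00:00, read as wall-clock time. A tz-aware Timestamp's .value is a true UTC epoch. The sketch below only illustrates that observable behaviour. It does not show how the Cython functions in tzconversion.pyx are implemented. The date and zone are arbitrary choices.

```python
import pandas as pd

NS_PER_HOUR = 3_600 * 10**9

wall = pd.Timestamp("2021-07-01 12:00")        # naive: wall-clock time only
aware = wall.tz_localize("America/New_York")    # same wall time, now EDT (UTC-4)

naive_i8 = wall.value   # 12:00 counted as if it were UTC
utc_i8 = aware.value    # 16:00 UTC, a real epoch timestamp

# The difference between the two integers is the UTC offset at that instant.
offset_hours = (naive_i8 - utc_i8) / NS_PER_HOUR

# Dropping the zone goes back to the wall-clock integer.
round_trip_i8 = aware.tz_localize(None).value
```

Because round_trip_i8 equals naive_i8, the wall-clock integer by itself cannot tell the two occurrences of an ambiguous local time apart. At the public level, that choice is made through the ambiguous argument of tz_localize.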
hassan baiga
@hassan_baiga_twitter
hey guys
I'm getting this syntax error and I wonder why:
def create.app(config_file='setings.py'):
^
SyntaxError: invalid syntax
Any ideas?
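Python identifiers cannot contain a dot, so def create.app(...) is rejected before anything runs. A def name must be a plain identifier, and create.app is an attribute access. Renaming to create_app, a guess at the intended name, fixes it. The sketch below checks both with compile(). The function body is a placeholder.

```python
# The original header does not parse: a dot cannot appear in a def name.
bad_source = "def create.app(config_file='setings.py'):\n    pass\n"
try:
    compile(bad_source, "<demo>", "exec")
    original_parses = True
except SyntaxError:
    original_parses = False


def create_app(config_file="setings.py"):
    """Hypothetical app factory: same signature, underscore instead of a dot."""
    # Placeholder body; a real factory would build and return the app here.
    return {"config_file": config_file}


app = create_app()
```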
Varun Shrivastava
@Varun270
Hello everyone, I am new to open source, have basic knowledge of pandas, and want to contribute to this repository. Can anyone tell me how I should start? I know the basics of opening a first PR and tackling good first issues. Now I want to fix bugs and make code contributions, but I have trouble understanding the issues. Can anyone guide me?
Wolfgang Kerzendorf
@wkerzendorf
I’m trying to add units to a DataFrame by subclassing (it almost works). However after a slicing operation the __init__ is not called anymore https://gist.github.com/wkerzendorf/52459080be83c7c382bac11ef9ac3195 and so my construction fails
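The pandas documentation on subclassing pandas data structures describes the supported pattern. Override the _constructor property so operations return your subclass, and list extra attributes in _metadata so that __finalize__ copies them onto derived objects. Objects produced by slicing are built internally by pandas, so custom arguments or setup done in __init__ should not be relied on to survive. Below is a minimal sketch, assuming a hypothetical units attribute stored as a dict.

```python
import pandas as pd


class UnitDataFrame(pd.DataFrame):
    """DataFrame subclass carrying a hypothetical per-column units mapping."""

    # Names in _metadata are copied to derived objects by __finalize__.
    _metadata = ["units"]
    # Class-level default, so instances without units still have the attribute.
    units = None

    @property
    def _constructor(self):
        # Make operations such as slicing return UnitDataFrame, not DataFrame.
        return UnitDataFrame


df = UnitDataFrame({"length": [1.0, 2.0, 3.0], "time": [4.0, 5.0, 6.0]})
df.units = {"length": "m", "time": "s"}

rows = df.iloc[:2]       # row slice
cols = df[["length"]]    # column selection
```

Both rows and cols come back as UnitDataFrame with the same units mapping attached.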
ldacey
@ldacey_gitlab
I am looping over missing columns to add them and received this warning. Is there a better approach?
PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider using pd.concat instead.  To get a de-fragmented frame, use `newframe = frame.copy()`
    missing_columns = set(final_columns).difference(df.columns)
    for col in missing_columns:
        df[col] = np.nan
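The warning comes from inserting columns one at a time. A single-step alternative is sketched below: compute the missing names in a stable order, then add them all with one reindex call, which fills new columns with NaN. df and final_columns are made-up stand-ins for the ones in the message.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
final_columns = ["a", "b", "c", "d"]

# A list comprehension keeps the order of final_columns (a set difference does not).
missing_columns = [c for c in final_columns if c not in df.columns]

# One reindex call adds every missing column at once, filled with NaN.
df = df.reindex(columns=[*df.columns, *missing_columns])
```

If final_columns already lists every column in the desired order, df.reindex(columns=final_columns) does the same in one line. It also drops any existing column that is not in that list.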