Sasidhar Kasturi
@sasidharkasturi_twitter
Thank you for the suggestion @Jeff. Can you please help with steps to run asv_benchmark?
Marco Gorelli
@MarcoGorelli

How does one productively add type annotations?

The contributing guide says

pandas uses mypy to statically analyze the code base and type hints. After making any change you can ensure your type hints are correct by running

mypy pandas

but if I do that I get a really long list of errors

Dr. Muhammad Anjum
@anjumuaf123_twitter
hi, I want to select one particular column but it's not working; my code is print(df['min'])
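For reference, bracket selection on a column named 'min' does work when the frame actually has that column (a minimal sketch; the asker's df isn't shown, so the data here is invented):

```python
import pandas as pd

# invented stand-in for the asker's frame; it must actually contain a 'min' column
df = pd.DataFrame({"min": [1, 2], "max": [3, 4]})

# bracket selection works even though the name shadows the DataFrame.min method
print(df["min"])
```

If this raises a KeyError, the column name likely differs (e.g. stray whitespace); df.columns shows the exact names.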
Gert Hulselmans
@ghuls
Is there a way to combine 2 dataframes (which contain different dtypes) in consolidated form (so it won't try to copy any data to a new dataframe)? The input data comes from a feather file and is 200 GiB, so each memory copy is painful. Both dataframes contain the same number of rows in the same order. I would like pandas to use the 2 underlying numpy matrices (consolidated) in the new pandas dataframe. Is there a function to do that?
In [10]: df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])

In [11]: df2
Out[11]:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9

In [12]: df3 = pd.DataFrame(np.array([['D', 'A'], ['C', 'B'], [ 'X', 'Y']]),  columns=['d', 'e'])

In [13]: df3
Out[13]:
   d  e
0  D  A
1  C  B
2  X  Y

In [14]: df = df2.assign(**df3)

In [15]: df
Out[15]:
   a  b  c  d  e
0  1  2  3  D  A
1  4  5  6  C  B
2  7  8  9  X  Y
Joshua Wilson
@jwilson8767
@ghuls You may be able to use pd.concat([df2, df3], copy=False, sort=False)
Also, you may want to look into dask + xarray for datasets that large: http://xarray.pydata.org/en/stable/dask.html
It's overkill for something you only need to do once (and can use swap space for), but really powerful for more complex workflows that need to be repeatable.
Gert Hulselmans
@ghuls
@jwilson8767 pd.concat([df2, df3], axis=1, copy=False, sort=False) gives at least the same data. We've had a bad experience with dask, unfortunately (with a different codebase).
Joshua Wilson
@jwilson8767
Do you mean that .concat worked for you, or that it produced the same result as .assign?
Gert Hulselmans
@ghuls
just that after adding axis=1 it gives the same result.
Joshua Wilson
@jwilson8767
Hmmm, could also try df2.join(df3), not sure what flags will be needed, can't remember if this copies data or not.
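To settle whether one of these calls actually copies, a quick check with numpy's shares_memory can help (a sketch with invented data; whether blocks are reused depends on the pandas version and copy/Copy-on-Write settings, so the printed result isn't guaranteed either way):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(1, 10).reshape(3, 3), columns=['a', 'b', 'c'])
df3 = pd.DataFrame([['D', 'A'], ['C', 'B'], ['X', 'Y']], columns=['d', 'e'])

# column-wise concatenation, as discussed above
df = pd.concat([df2, df3], axis=1, sort=False)

# True if the 'a' column still points at df2's buffer, i.e. no copy was made
shared = np.shares_memory(df['a'].to_numpy(), df2['a'].to_numpy())
print(shared)
```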
Peter Goodall
@pjgoodall
Can someone explain where the documentation is now, and where the useful data-creation functions are going after the deprecation of pandas.util.testing?
Functions like makeTimeDataFrame() and makeDataFrame().
A Google search just sends me all over the world.
Documentation for right now would be good - thanks...
tsoernes
@tsoernes
Where can I find a list of functions accessible with a string name, e.g. 'mean' ?
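For context, a string like 'mean' passed to .agg (or .transform, .groupby().agg, etc.) is resolved to the method of the same name on the object being aggregated, so any applicable Series/DataFrame method name works; a minimal sketch:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# the string is looked up as a Series method of the same name
result = s.agg("mean")
print(result)
```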
Derek McCammond
@Vlek
Could someone please take a look at the tests for frame/test_apply.py? test_apply_noreduction_tzaware_object is failing for me. I'm getting an AssertionError: values.dtype == "i8"
Joshua Wilson
@jwilson8767
@Vlek, looks like the last time that was touched was for a regression in v1.0 which was resolved in v1.0.1: pandas-dev/pandas#31614
Joris Van den Bossche
@jorisvandenbossche
But for now, the dataframe creation functions haven't been made part of the public testing module. If there is demand for that, we should discuss it (you could open an issue for it)
Peter Goodall
@pjgoodall
Thank you @jorisvandenbossche, especially the part about where the data creation methods are. I may raise an issue, perhaps for creating an associated library. Rich example-data creation facilities can be very useful.
Quang Nguyễn
@quangngd

My PR at pandas-dev/pandas#32817 failed at the building documentation step with the output:

build finished with problems, 2 warnings.
##[error]Process completed with exit code 1.

and the 2 warnings are

WARNING: failed to reach any of the inventories with the following issues:
intersphinx inventory 'https://dateutil.readthedocs.io/en/latest/objects.inv' not fetchable due to <class 'requests.exceptions.HTTPError'>: 502 Server Error: Bad Gateway for url: https://dateutil.readthedocs.io/en/latest/objects.inv
WARNING: failed to reach any of the inventories with the following issues:
intersphinx inventory 'https://pandas-gbq.readthedocs.io/en/latest/objects.inv' not fetchable due to <class 'requests.exceptions.HTTPError'>: 502 Server Error: Bad Gateway for url: https://pandas-gbq.readthedocs.io/en/latest/objects.inv

Should I rerun the checks by closing and reopening the PR? It looks like it's just a connectivity issue on the CI server's end

Shipov-create
@Shipov-create
Hello everybody. Is it appropriate to ask questions about matplotlib here?
matrixbot
@matrixbot

emporea > <@gitter_shipov-create:matrix.org> Hello everybody. Is it appropriate to ask questions about matplotlib here?

There is a specific room for that. #gitter_matplotlib=2Fmatplotlib:matrix.org

Marco Gorelli
@MarcoGorelli

Trying to use pre-commit, but I get an error from mypy:

pandas/core/groupby/generic.py:1059: error: "object" has no attribute "_data"
pandas/core/groupby/generic.py:1068: error: "object" has no attribute "_data"
pandas/core/groupby/generic.py:1069: error: "object" has no attribute "_data"
pandas/core/groupby/generic.py:1104: error: "object" has no attribute "shape"
pandas/core/groupby/generic.py:1107: error: "object" has no attribute "iloc"
Found 5 errors in 1 file (checked 2 source files)

Do people get around this by just doing

$ SKIP=mypy git commit -m "descriptive commit message"

?

Uwe L. Korn
@xhochy
@MarcoGorelli Is this in code you changed? Can you share your diff?
Marco Gorelli
@MarcoGorelli
@xhochy sure, here it is
--- a/pandas/core/groupby/generic.py
+++ b/pandas/core/groupby/generic.py
@@ -523,7 +523,6 @@ class SeriesGroupBy(GroupBy):
         builtin/cythonizable functions
         """
         ids, _, ngroup = self.grouper.group_info
-        result = result.reindex(self.grouper.result_index)
         cast = self._transform_should_cast(func_nm)
         out = algorithms.take_1d(result._values, ids)
         if cast:
@@ -1455,7 +1454,6 @@ class DataFrameGroupBy(GroupBy):
         # for each col, reshape to to size of original frame
         # by take operation
         ids, _, ngroup = self.grouper.group_info
-        result = result.reindex(self.grouper.result_index)
         output = []
         for i, _ in enumerate(result.columns):
             res = algorithms.take_1d(result.iloc[:, i].values, ids)
Uwe L. Korn
@xhochy
If that is your total change, then there is something weird with mypy; it's ok to do SKIP=mypy and have someone have a look in the PR, or report a bug to mypy.
Probably needs a change in line 1019 to no_result: Any = object()
Marco Gorelli
@MarcoGorelli

@xhochy if I do

$ git checkout master
$ git fetch upstream master
$ git reset --hard upstream/master
$ mypy pandas

then I get a really long list of errors, starting with

pandas/io/common.py:503: error: Argument 1 to "writestr" of "ZipFile" has incompatible type "Optional[str]"; expected "Union[str, ZipInfo]"

and ending with

pandas/__init__.py:347: error: Incompatible default for argument "dummy" (default has type "int", argument has type "__SparseArraySub")
Found 201 errors in 24 files (checked 957 source files)

Do you think this means there is something wrong with my development environment? If you do mypy pandas from the master branch do you get no errors?

Uwe L. Korn
@xhochy
The mypy version needs to be pinned to the same as on the master branch (mypy==0.730), with newer mypy you get more errors.
The pre-commit hook though should use exactly that version and should thus report the same errors.
Marco Gorelli
@MarcoGorelli
@xhochy great, thanks! With mypy 0.730, I get no errors :)
Marco Gorelli
@MarcoGorelli

Sorry to keep asking, but I still can't get around this during pre-commit.

I have run

$ pre-commit clean

I have mypy 0.730 installed:

$ mypy --version
mypy 0.730

The following returns no errors:

$ mypy pandas
Success: no issues found in 957 source files

but, when I try to commit the diff I showed above, I still get

mypy.....................................................................Failed
- hook id: mypy
- exit code: 1

pandas/core/groupby/generic.py:1064: error: "object" has no attribute "_data"
pandas/core/groupby/generic.py:1073: error: "object" has no attribute "_data"
pandas/core/groupby/generic.py:1074: error: "object" has no attribute "_data"
pandas/core/groupby/generic.py:1109: error: "object" has no attribute "shape"
pandas/core/groupby/generic.py:1112: error: "object" has no attribute "iloc"
Found 5 errors in 1 file (checked 1 source file)

Might have to develop without pre-commit (at least without mypy) for now then

Uwe L. Korn
@xhochy
I can reproduce this behaviour also. This is because the pre-commit hook is only running a stripped-down type check and this seems to miss some inference information.
Joris Van den Bossche
@jorisvandenbossche
Yes, from time to time, it fails on unrelated lines. Then doing a --no-verify when committing might be needed
Uwe L. Korn
@xhochy
Is this then really a useful check, or should we rather get rid of it? I spent a bit of time making it faster, but if it leads to false positives, it's of no help either.
Joris Van den Bossche
@jorisvandenbossche
Most of the time it is still helpful, I would say
Uwe L. Korn
@xhochy
An alternative would be to use dmypy run -- --follow-imports=skip pandas (probably with --timeout 3600). This would run the checks on the whole scope but through the daemon should be much faster than reloading the cache every time. I'm not sure though how large the branch switching slowdown will be.
Marco Gorelli
@MarcoGorelli

If I'm working with an ArrayLike (which is defined as

ArrayLike = TypeVar("ArrayLike", "ExtensionArray", np.ndarray)

) is there a way to tell mypy that it is specifically of Categorical type, i.e. that it has .categories and .codes attributes? Adding

assert is_categorical(my_object)

doesn't work

William Ayd
@WillAyd
Not at the moment; we don’t define ExtensionArray as a generic type (though ideally maybe we should)
Also you can’t currently narrow types using functions like is_categorical (if you search the mypy issue tracker, there is a request to allow functions to declare that they narrow types, but no one has implemented it)
Your best bet for now would be to keep the assert and cast directly thereafter to the type you need
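A sketch of the assert-then-cast pattern @WillAyd describes (the helper name is invented, and isinstance stands in for is_categorical here, since the point is that mypy doesn't narrow on arbitrary predicate functions):

```python
from typing import cast

import pandas as pd
from pandas import Categorical

def categorical_codes(arr):
    # hypothetical helper: arr arrives typed as a broad array-like
    assert isinstance(arr, Categorical)
    # the explicit cast tells mypy the concrete type, so .codes/.categories resolve
    cat = cast(Categorical, arr)
    return cat.codes

codes = categorical_codes(pd.Categorical(["a", "b", "a"]))
print(list(codes))
```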
Marco Gorelli
@MarcoGorelli
Thanks @WillAyd , can confirm that works!
Vishesh Mangla
@XtremeGood
import pandas as pd
df = pd.read_json("/kaggle/input/caselaw-dataset-illinois/xml.data.jsonl.xz")
what would be the right way to open this file?
mukul-agrawal-09
@mukul-agrawal-09
Hey guys, can somebody help me with an agglomerative clustering issue? I have spatial data with some weights assigned to each location/object. I want to make clusters using thresholds on the weights and on the distance between locations. Can somebody suggest a technique or research paper related to this approach?
Basically, any clustering algorithm which takes two parameters for the threshold.
matrixbot
@matrixbot

hannsen94 > <@gitter_xtremegood:matrix.org> what would be the right way to open this file?

It seems like the file is a compressed one (based on the *.xz ending). Maybe you should decompress it either before even using Python (with the xz command line tool) or by using the Python package lzma.
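A sketch of the lzma route for a .jsonl.xz file, using an in-memory stand-in since the real path isn't reproducible here (the .jsonl part means JSON Lines, which needs lines=True):

```python
import io
import lzma

import pandas as pd

# stand-in for the real file: two JSON Lines records, xz-compressed
payload = lzma.compress(b'{"id": 1, "text": "a"}\n{"id": 2, "text": "b"}\n')

# decompress with lzma, then parse one JSON record per line
df = pd.read_json(io.BytesIO(lzma.decompress(payload)), lines=True)
print(df.shape)
```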

Vishesh Mangla
@XtremeGood
done that thanks @matrixbot
Jim Klo
@jimklo
hi, I’ve got a bunch of 2-col DataFrames that I want to join by their index, but I’d like to add a discriminator to the column indices… e.g. the cols in each DF are “date”, “value”, but when joining I’d like the column MultiIndex to be (state, country, date), (state, country, value)…. Can anyone suggest how I do this?
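One way to get that column MultiIndex is pd.concat along axis=1 with tuple keys, which become the extra column levels above each frame's own columns (a sketch with invented state/country labels):

```python
import pandas as pd

# two hypothetical per-location frames sharing the same index and columns
ny = pd.DataFrame({"date": ["2020-01-01"], "value": [1.0]})
ca = pd.DataFrame({"date": ["2020-01-02"], "value": [2.0]})

# tuple keys add (state, country) levels, giving columns like ('NY', 'US', 'date')
wide = pd.concat([ny, ca], axis=1, keys=[("NY", "US"), ("CA", "US")])
print(wide.columns.tolist())
```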