Erfan Nariman
@erfannariman
btw in the error it points to: File "/Users/erfannariman/Workspace/pandas/pandas/core/groupby/groupby.py", line 702 in __getattr__
Daniel Saxton
@dsaxton
@erfannariman I was actually noticing something similar trying to debug groupby code today (also using PyCharm); I didn't get a stack overflow, but it would just bomb and quit when trying to step into DataFrameGroupBy
in the exact place you posted
William Ayd
@WillAyd
Not sure how PyCharm works, but can you just place an explicit breakpoint() call where you are trying to debug instead? Or does that not work either?
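For reference, a minimal sketch of that suggestion (hypothetical function and column name; breakpoint() is built into Python 3.7+):

# Calling the built-in breakpoint() drops into pdb (or PyCharm's debugger,
# if one is registered via the PYTHONBREAKPOINT hook) at exactly this line.
def aggregate_with_stop(df):
    breakpoint()  # execution pauses here; inspect df, step, continue
    return df.groupby("key").sum()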
Daniel Saxton
@dsaxton
On another note, what are the Travis jobs testing that isn't covered by the other pipelines? Curious because they seem fairly slow/flaky compared to Azure, for instance.
5 replies
Might be able to do that; it lets you put little red "circles" next to lines where you want to stop (maybe it uses breakpoint() behind the scenes for that)
Vishesh Mangla
@XtremeGood
import pandas as pd
import sympy as sm
s = sm.Symbol("S_m")
print(pd.DataFrame(data={"symbols":[s], "sentence":["Voltage of something"], "value":[r" $ 50^\circ C $"]}).to_latex(escape=True))
\begin{tabular}{llll}
\toprule
{} & symbols &              sentence &            value \\
\midrule
0 &     S\_m &  Voltage of something &   \$ 50\textasciicircum \textbackslash circ C \$ \\
\bottomrule
\end{tabular}
How do I write LaTeX in cells in pandas? I need 50 degrees Celsius
and much more LaTeX in cells.
Also, the sympy symbols are mangled.
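One hedged suggestion (a minimal sketch): to_latex has an escape parameter, and passing escape=False leaves raw LaTeX in the cells untouched instead of escaping backslashes, carets, and underscores.

import pandas as pd

# With escape=False the $...$ markup survives into the tabular output.
df = pd.DataFrame({"value": [r"$50^\circ C$"]})
print(df.to_latex(escape=False))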
Irv Lustig
@Dr-Irv
@XtremeGood this channel is for pandas development issues. Please post your question to Stack Overflow, which is where it will hopefully be answered.
Vishesh Mangla
@XtremeGood
@Dr-Irv it seems like a bug, though I'm not sure of it. Is there a general channel for pandas somewhere? The one on IRC is not responsive.
Irv Lustig
@Dr-Irv
@XtremeGood You can ask on Stack Overflow. If you think it is a bug, then you can create an issue on the pandas GitHub.
Erfan Nariman
@erfannariman
I made this checklist for #36777; would it be an idea to add it to the top post? @dsaxton
Complete checklist:

**doc/source/development:**
- [ ] code_style.rst (no changes needed)
- [ ] contributing.rst (no changes needed)
- [ ] contributing_docstring.rst
- [ ] developer.rst
- [ ] extending.rst
- [ ] index.rst (no changes needed)
- [ ] internals.rst (no changes needed)
- [ ] maintaining.rst (no changes needed)
- [ ] meeting.rst (no changes needed)
- [ ] policies.rst (no changes needed)
- [ ] roadmap.rst (no changes needed)
**doc/source/getting_started/comparison:**
- [ ] comparison_with_r.rst
- [ ] comparison_with_sas.rst
- [ ] comparison_with_sql.rst
- [ ] comparison_with_stata.rst
- [ ] index.rst (no changes needed)
**doc/source/getting_started/intro_tutorials:**
- [ ] 01_table_oriented.rst
- [ ] 02_read_write.rst
- [ ] 03_subset_data.rst
- [ ] 04_plotting.rst
- [ ] 05_add_columns.rst
- [x] 06_calculate_statistics.rst
- [x] 07_reshape_table_layout.rst
- [x] 08_combine_dataframe.rst
- [x] 09_timeseries.rst
- [x] 10_text_data.rst
**doc/source/getting_started:**
- [ ] index.rst (no changes needed)
- [ ] install.rst (no changes needed)
- [ ] overview.rst (no changes needed)
- [ ] tutorials.rst (no changes needed)
**doc/source/reference:** (no changes needed)
**doc/source/user_guide**:
- [x] 10min.rst
- [x] advanced.rst
- [x] basics.rst
- [ ] boolean.rst
- [ ] categorical.rst
- [ ] computation.rst
- [ ] cookbook.rst
- [ ] dsintro.rst
- [ ] duplicates.rst
- [ ] enhancingperf.rst
- [ ] gotchas.rst
- [ ] groupby.rst
- [ ] index.rst
- [ ] integer_na.rst
- [ ] io.rst
- [ ] merging.rst
- [ ] missing_data.rst
- [ ] options.rst
- [ ] reshaping.rst
- [ ] scale.rst
- [x] sparse.rst
- [ ] text.rst
- [ ] timedeltas.rst
- [ ] timeseries.rst
- [ ] visualization.rst
2 replies
Erfan Nariman
@erfannariman
Something I thought of lately and wanted to share here to see what others think. Besides contributing to the code base, I spend quite some time (on average 0.5 to 1 hour a day) answering questions on Stack Overflow under the pandas tag (over 1100 answers so far). In my opinion, quick, high-quality answers to questions are a really valuable thing from the user perspective, especially for an open source package. Having answered for over a year, I often see that pandas has quite a few dedicated people who deliver quality answers daily and spend quite some time doing so. But right now this group is not considered "contributors" to pandas, although I think they could be (or some other word for it, maybe not contributor). You can find the top users on the pandas tag, all time and last 30 days, here: link. Would it be an idea to do something with this? Give them a mention somewhere? This would incentivize more people to start giving answers, especially the group who knows pandas really well from the user perspective but has a hard time contributing to the pandas code base (which described me for a long time as well). I don't think any other open source module has considered this.
1 reply
Daniel Saxton
@dsaxton
Anyone know what the deal is with PRs like this? I've seen a couple lately making a tiny edit with no context: pandas-dev/pandas#36812
8 replies
Daniel Saxton
@dsaxton
Is pandas part of the hacktoberfest topic? Apparently DigitalOcean made a change where that's necessary in order for contributors to get credit for PRs: https://hacktoberfest.digitalocean.com/hacktoberfest-update
1 reply
Erfan Nariman
@erfannariman
[attached image: image.png]
2 replies
This made me laugh @marcogorelli
biancaisla1
@biancaisla1
peak_latency = (df.filter(regex=r'condition|epoch|FP1')
                .groupby(['condition', 'epoch'])
                .aggregate(lambda x: df['time'].iloc[x.idxmax(1)])
                .reset_index(level=0, drop=True)
                .melt(id_vars=['condition', 'epoch'],
                      var_name='channel',
                      value_name='latency of peak'))
I have a question for the room. I seem to be having trouble with NaN values. The above was my input, and below is my output.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-4b970230a667> in <module>
      2 peak_latency = (df.filter(regex=r'condition|epoch|FP1')
      3                 .groupby(['condition', 'epoch'])
----> 4                 .aggregate(lambda x: df['time'].iloc[x.idxmax(1)])
      5                 .reset_index(level=0, drop=True)
      6                 .melt(id_vars=['condition', 'epoch'],

~\Anaconda3\lib\site-packages\pandas\core\groupby\generic.py in aggregate(self, arg, *args, **kwargs)
   1453     @Appender(_shared_docs["aggregate"])
   1454     def aggregate(self, arg=None, *args, **kwargs):
-> 1455         return super().aggregate(arg, *args, **kwargs)
   1456 
   1457     agg = aggregate

~\Anaconda3\lib\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
    235             # grouper specific aggregations
    236             if self.grouper.nkeys > 1:
--> 237                 return self._python_agg_general(func, *args, **kwargs)
    238             else:
    239 

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_agg_general(self, func, *args, **kwargs)
    904 
    905         if len(output) == 0:
--> 906             return self._python_apply_general(f)
    907 
    908         if self.grouper._filter_empty_groups:

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_apply_general(self, f)
    740 
    741     def _python_apply_general(self, f):
--> 742         keys, values, mutated = self.grouper.apply(f, self._selected_obj, self.axis)
    743 
    744         return self._wrap_applied_output(

~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py in apply(self, f, data, axis)
    235             # group might be modified
    236             group_axes = _get_axes(group)
--> 237             res = f(group)
    238             if not _is_indexed_like(res, group_axes):
    239                 mutated = True

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in <lambda>(x)
    892     def _python_agg_general(self, func, *args, **kwargs):
    893         func = self._is_builtin_func(func)
--> 894         f = lambda x: func(x, *args, **kwargs)
    895 
    896         # iterate through "columns" ex exclusions to populate output dict

<ipython-input-34-4b970230a667> in <lambda>(x)
      2 peak_latency = (df.filter(regex=r'condition|epoch|FP1')
      3                 .groupby(['condition', 'epoch'])
----> 4                 .aggregate(lambda x: df['time'].iloc[x.idxmax(1)])
      5                 .reset_index(level=0, drop=True)
      6                 .melt(id_vars=['condition', 'epoch'],

~\Anaconda3\lib\site-packages\pandas\core\frame.py in idxmax(self, axis, skipna)
   8099         """
   8100         axis = self._get_axis_number(axis)
-> 8101         indices = nanops.nanargmax(self.values, axis=axis, skipna=skipna)
   8102         index = self._get_axis(axis)
   8103         result = [index[i] if i >= 0 else np.nan for i in indices]

~\Anaconda3\lib\site-packages\pandas\core\nanops.py in _f(*args, **kwargs)
     65             if any(self.check(obj) for obj in obj_iter):
     66                 msg = "reduction operation {name!r} not allowed for this dtype"
---> 67                 raise TypeError(msg.format(name=f.__name__.replace("nan", "")))
     68             try:
     69                 with np.errstate(invalid="ignore"):

TypeError: reduction operation 'argmax' not allowed for this dtype
biancaisla1
@biancaisla1
What's useful is that the output suggests using with np.errstate(invalid="ignore"), but I'm not sure where to insert this in my code. I've tried it after df.filter() with np.errstate(), and separately df.filter() with np.errstate(). Any advice would be great!
Shao Yang Hong
@hongshaoyang

https://stackoverflow.com/a/54713407

Most likely the types of your cell values are non-numeric. The easiest solution is to use pd.to_numeric() to convert the Series to a numeric dtype.
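A minimal sketch of that suggestion (hypothetical column name):

import pandas as pd

# Coerce an object-dtype column to numeric; values that can't be parsed
# become NaN with errors="coerce", after which reductions like idxmax work.
df = pd.DataFrame({"FP1": ["1.2", "3.4", "oops"]})
df["FP1"] = pd.to_numeric(df["FP1"], errors="coerce")
print(df.dtypes)  # FP1 is now float64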

Alex Lim
@alexhlim
Hi, I'm having trouble rebuilding the C/Cython internals. I'm running python setup.py build_ext --inplace -j 4 in a pandas docker container and am up to date with upstream master. This is only a small snippet of the errors (I'm getting an error for pretty much every .c file). Any help/advice would be appreciated!
pandas/_libs/lib.c:99:3: error: type defaults to ‘int’ in declaration of ‘__Pyx_XDECREF’ [-Werror=implicit-int]
pandas/_libs/lib.c:99:3: error: parameter names (without types) in function declaration [-Werror]
pandas/_libs/missing.c:1720:72: error: unknown type name ‘CYTHON_UNUSED’
 static PyObject *__pyx_pf_6pandas_5_libs_7missing_6NAType_10__reduce__(CYTHON_UNUSED PyObject *__pyx_self, CYTHON_UNUSED PyObject *__pyx_v_self) {
                                                                        ^~~~~~~~~~~~~
pandas/_libs/lib.c:100:3: error: data definition has no type or storage class [-Werror]
   __Pyx_XDECREF(__pyx_v_val);
   ^~~~~~~~~~~~~
pandas/_libs/lib.c:100:3: error: type defaults to ‘int’ in declaration of ‘__Pyx_XDECREF’ [-Werror=implicit-int]
pandas/_libs/lib.c:100:3: error: parameter names (without types) in function declaration [-Werror]
pandas/_libs/lib.c:101:3: error: data definition has no type or storage class [-Werror]
   __Pyx_XDECREF(__pyx_v_stub);
   ^~~~~~~~~~~~~
pandas/_libs/lib.c:101:3: error: type defaults to ‘int’ in declaration of ‘__Pyx_XDECREF’ [-Werror=implicit-int]
pandas/_libs/lib.c:101:3: error: parameter names (without types) in function declaration [-Werror]
pandas/_libs/lib.c:102:3: error: data definition has no type or storage class [-Werror]
   __Pyx_XGIVEREF(__pyx_r);
   ^~~~~~~~~~~~~~
pandas/_libs/lib.c:102:3: error: type defaults to ‘int’ in declaration of ‘__Pyx_XGIVEREF’ [-Werror=implicit-int]
pandas/_libs/lib.c:102:3: error: parameter names (without types) in function declaration [-Werror]
pandas/_libs/lib.c:103:3: error: data definition has no type or storage class [-Werror]
   __Pyx_RefNannyFinishContext();
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
pandas/_libs/lib.c:103:3: error: type defaults to ‘int’ in declaration of ‘__Pyx_RefNannyFinishContext’ [-Werror=implicit-int]
pandas/_libs/lib.c:103:3: error: function declaration isn’t a prototype [-Werror=strict-prototypes]
pandas/_libs/lib.c:104:3: error: expected identifier or ‘(’ before ‘return’
   return __pyx_r;
   ^~~~~~
pandas/_libs/lib.c:105:1: error: expected identifier or ‘(’ before ‘}’ token
 }
 ^
pandas/_libs/lib.c:116:8: error: unknown type name ‘PyObject’
 static PyObject *__pyx_pw_6pandas_5_libs_3lib_13fast_unique_multiple_list_gen(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
        ^~~~~~~~
pandas/_libs/lib.c:116:79: error: unknown type name ‘PyObject’
 static PyObject *__pyx_pw_6pandas_5_libs_3lib_13fast_unique_multiple_list_gen(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
                                                                               ^~~~~~~~
pandas/_libs/lib.c:116:101: error: unknown type name ‘PyObject’
 static PyObject *__pyx_pw_6pandas_5_libs_3lib_13fast_unique_multiple_list_gen(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
                                                                                                     ^~~~~~~~
pandas/_libs/missing.c:1720:108: error: unknown type name ‘CYTHON_UNUSED’
 static PyObject *__pyx_pf_6pandas_5_libs_7missing_6NAType_10__reduce__(CYTHON_UNUSED PyObject *__pyx_self, CYTHON_UNUSED PyObject *__pyx_v_self) {
Irv Lustig
@Dr-Irv
@alexhlim which platform? Linux? Mac? Are you using a conda environment?
46 replies
Sam Cohen
@samc1213

Hi. I am having an issue running the ASV benchmarks, as requested in comments on my PR: pandas-dev/pandas#36867.

I created a docker environment and ran the following from /home/pandas/asv_bench:

asv continuous -f 1.1 upstream/master HEAD -b ^indexing

However, I got the below error:

Error running /opt/conda/bin/conda env create -f /tmp/tmpb0n32z3e.yml -p /home/pandas/asv_bench/env/11a1c20ede452de2525075dc4a15eb94 --force (exit status -256)

2 replies
Thomas Smith
@smithto1
I have a fix for #36757. The issue is a performance regression between 1.0.5 and 1.1.x. What is the policy for adding a test to the pull request? I think we want to avoid the same regression happening again, but I assume we don't want to include time limits in the tests. Is there another way to cover this?
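For context, pandas guards performance with ASV benchmarks rather than timed unit tests; a minimal sketch of what one looks like (hypothetical class and data, following asv's setup/time_* convention):

import numpy as np
import pandas as pd

# asv runs setup() before timing each time_* method, so a slowdown in the
# timed operation shows up in benchmark comparisons rather than failing CI.
class GroupByAggRegression:
    def setup(self):
        n = 100_000
        self.df = pd.DataFrame({
            "key": np.random.randint(0, 100, n),
            "val": np.random.randn(n),
        })

    def time_groupby_sum(self):
        self.df.groupby("key")["val"].sum()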
Michael
@michael_xsr_twitter

Hi everyone,

I am a final-year computer science student and I am interested in contributing to pandas as part of my open source development project. My area of contribution will be adding user-friendly GUIs to some of the IO functions, plus some enhancements to missing-data handling.

This is my expression of interest in contributing to pandas and a statement of my area of contribution. Thank you.

Erfan Nariman
@erfannariman
Cool @michael_xsr_twitter! See the contributing guide for how to get started. After you set up your environment you can start with a "good first issue": link. If you have any development-related questions, feel free to post them here and one of the devs or other contributors will try to help you.
Erfan Nariman
@erfannariman

When I try to git bisect, I get the following error when I try to build an older commit:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Not sure what this means; does anyone have an idea how to solve it?

5 replies
In this case, git bisect good was the commit for release 1.0.5.
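For reference, a minimal sketch of the idiom the error message points at (hypothetical worker function): on platforms that spawn rather than fork child processes, the process-spawning code must sit behind the __main__ guard, because each spawned child re-imports the main module.

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # Without this guard, each spawned child would re-run the Pool creation
    # on import and raise the bootstrapping RuntimeError above.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))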
Avinash Pancham
@avinashpancham
Hi all, when I run ./test_fast.sh on master (last commit 75a5fa7600d517773172f495f01e20b734883706) I get 565 errors at the moment. Would you advise merging master into my new branch now, or waiting until these errors are fixed? I don't know how long this will take, that's why I am asking.
1 reply
tools4origins
@tools4origins
Hello everyone,
I was wondering whether pandas is considering extending its SQL support to allow manipulating pandas DataFrames using SQL?
From what I understood from Spark, it is an effective feature, for example for performance, since SQL formalizes the expected result and some tasks can then be optimized.
The question has been tickling me for a few weeks now, but I ask it now because another library, dask, has just received an overlay that allows this: https://pypi.org/project/dask-sql/#description (demo here: https://youtu.be/av08UM1HG3M?t=381)
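For reference, a rough sketch of what that dask-sql overlay looks like in use (Context, create_table, and sql are taken from the dask-sql docs; treat this as illustrative rather than verified):

import dask.dataframe as dd
import pandas as pd
from dask_sql import Context

# Register a dask dataframe as a SQL table, then query it; .compute()
# materializes the lazy result as a pandas object.
c = Context()
ddf = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=1)
c.create_table("my_table", ddf)
result = c.sql("SELECT SUM(a) AS total FROM my_table").compute()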
Dave Hirschfeld
@dhirschfeld
Does anyone happen to have a link to the PR where offsets were made unhashable?
I want to better understand the reasons for the change and hopefully find some suggested workarounds, as my code was relying on their hashability :(
Joris Van den Bossche
@jorisvandenbossche
@dhirschfeld I am not sure this was done on purpose.
Do you have an example?
So it seems it's mostly the Tick-based offsets (up to Day) that are affected (I first tried BusinessDay, which works fine)
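A hypothetical repro of what's being described, on an affected 1.1.x install:

import pandas as pd

# Non-Tick offsets still hash fine...
hash(pd.offsets.BusinessDay())
# ...but Tick-based offsets (up to Day) raise on affected versions:
hash(pd.offsets.Day())  # TypeError: unhashable type: 'Day'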
Joris Van den Bossche
@jorisvandenbossche
So this is caused by pandas-dev/pandas#34227, which accidentally removed a __hash__ method during a refactor
We can fix that in 1.1.4
Dave Hirschfeld
@dhirschfeld
Great, thanks! I was surprised I couldn't find an entry in the breaking changes!
I can wait for 1.1.4 (easier than the refactor I'd otherwise have to do :grimacing:)
Dave Hirschfeld
@dhirschfeld
I opened pandas-dev/pandas#37267 to track the problem
I might take a crack myself if I can find the time (and no one beats me to it). Won't be this week though...
Jeff Reback
@jreback
@jorisvandenbossche nice talk today!
5 replies
you have to register, but it's easy
epifanio
@epifanio
hi! do we have a gitter channel for xarray as well?
found it