Ian Alexander Joiner
@iajoiner:matrix.org
[m]
May I ask how this is supposed to be fixed? Is this due to some issue in main I’m unaware of?
oyesavan
@oyesavan
Hi all, I am trying to run the pandas test suite against a database (mostly the tests under /tests/io/sql.py), and as the connection is only allowed through SQLAlchemy, I am using SQLAlchemy's create_engine() to get an Engine object. My question is about the method
def pandasSQL_builder(con, schema: str | None = None)
as it checks whether the given con parameter is either a string or a SQLAlchemy Connectable, and otherwise raises a UserWarning.
  1. The check is something like this:
    ```
    if sqlalchemy is not None and isinstance(con, sqlalchemy.engine.Connectable):
        return SQLDatabase(con, schema=schema)
    ```
    This gives me an AttributeError: module 'sqlalchemy.engine' has no attribute 'Connectable'. Did you mean: 'Connection'?
  2. If I pass the con parameter as my connection string, this method calls SQLAlchemy's create_engine(), which returns an Engine object. How can this be compared against engine.Connection or engine.Connectable?
    Link: https://github.com/pandas-dev/pandas/blob/a853022ea154dd38dd759300ee50b456f3a9ddf6/pandas/io/sql.py#L731
    I would appreciate your help on how to use SQLAlchemy to connect pandas to a database. Let me know if I am doing anything wrong.
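(For reference, the usual pattern is to pass the Engine itself to pandas, since pd.read_sql accepts an SQLAlchemy Engine or Connection as con. A minimal sketch with a placeholder SQLite connection string and table name; the AttributeError above suggests a SQLAlchemy 2.x install, where Connectable no longer exists:)

```
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string and table name.
engine = create_engine("sqlite:///example.db")

# pandas accepts the Engine (or a Connection) directly as `con`:
df = pd.read_sql("SELECT * FROM my_table", con=engine)
```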
Milad-Laly
@Milad-Laly
This message was deleted
Kha
@nlhkh

Hi, how can I count the number of groups after a groupby? For example, when I do df.groupby([…]).apply(lambda …), I also want to know how many groups there are after the groupby.

I tried groupby().count, but that returns the count within each group. My current workaround is groupby().apply(lambda _: 1).sum(). I heard that apply is not the most performant, so I would like to find something else.
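(A cheaper alternative worth noting: GroupBy exposes the number of groups directly as ngroups, so no apply is needed. A minimal sketch with made-up data:)

```
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "c"], "val": [1, 2, 3, 4]})

# ngroups reports how many groups the groupby produced:
df.groupby("key").ngroups  # 3
```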

Kha
@nlhkh
Jesus, there are scammers even in here?
mfilipe
@mfilipe:matrix.org
[m]

Hey! I'm trying to use apply() on multiple columns but I'm facing this error: cannot convert the series to <class 'float'>. Below is the code snippet:

```
import pandas as pd
import locale
locale.setlocale(locale.LC_NUMERIC, '')

statusinvest_csv = pd.read_csv('statusinvest.csv', sep=';', on_bad_lines='skip')
df = pd.DataFrame(statusinvest_csv, columns=['TICKER','DY','ROIC','LPA'])
df[['DY','LPA','ROIC']] = df[['DY','LPA','ROIC']].apply(locale.atof)
```

Anyone know what is wrong?
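(Context on the error: DataFrame.apply passes each whole column to the function as a Series, so locale.atof receives a Series rather than a single string. An element-wise sketch that avoids this, using made-up data and assuming a locale where ',' is the decimal separator:)

```
import locale

import pandas as pd

locale.setlocale(locale.LC_NUMERIC, '')

df = pd.DataFrame({"DY": ["1,5", "2,5"], "ROIC": ["3,5", "4,5"]})

# applymap applies the function element by element instead of column by column:
df[["DY", "ROIC"]] = df[["DY", "ROIC"]].applymap(locale.atof)
```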

Kha
@nlhkh
I have a dataframe with some columns typed as string (specifically string[pyarrow]), but after some operations, such as groupby and melt, those columns turn into object. Is there a way to prevent string columns from turning into object?
Kha
@nlhkh
During groupby, what does the observed parameter do? I looked it up in many resources, but still could not understand what it does. Does it mean that if it is true, only categorical columns are shown in the final result?
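(For what it's worth, observed only matters when grouping by a Categorical: with observed=False, every category appears in the result even if no rows carry it; with observed=True, only the category values actually present are kept. A small sketch:)

```
import pandas as pd

key = pd.Categorical(["a", "a"], categories=["a", "b"])
df = pd.DataFrame({"key": key, "val": [1, 2]})

df.groupby("key", observed=False)["val"].sum()  # index has 'a' and 'b'; 'b' is an empty group
df.groupby("key", observed=True)["val"].sum()   # index has 'a' only
```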
Tek4congnghe
@tek4congnghe_twitter
Oh
Boris Rumyantsev
@bdrum
hey all, I've run into recommendations in the Python packaging docs (and other sources) to use a src directory for better integration with pip, where the authors explain that this layout is best practice. I'm a bit confused because I haven't seen any well-known project that uses this approach. Can someone explain why pandas doesn't follow this 'best practice'? https://packaging.python.org/en/latest/tutorials/packaging-projects/
Boris Rumyantsev
@bdrum
got it, many thanks!
TraverseTowner
@TraverseTowner
Hi guys, I've been working on a free personal website that allows you to search for Pandas recipes instantly. I've already written around 430 Pandas recipes with clear explanations (https://skytowner.com/explore/pandas_recipes_reference), and I thought this would be a good addition to the list of community tutorials at https://pandas.pydata.org/docs/getting_started/tutorials.html#communitytutorials. Who should I contact, or what should I do, for this potential addition? Many thanks in advance!
Mu
@warm200

Hi all, I updated our pandas version from v1.1.5 to v1.4.1 and found that a certain API is much slower in the new version (in this case, DataFrame.update).
Has anyone had a similar experience? It seems like the internal implementation is completely different.
I don't know why the new version is worse, or maybe I'm not understanding it correctly.

Here is the screenshot; I am using pyinstrument to profile everything inside DataFrame.update.

[attachment: image001.jpg]
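(A minimal sketch of profiling a single call with pyinstrument, using made-up frames:)

```
import pandas as pd
from pyinstrument import Profiler

df = pd.DataFrame({"a": range(100_000)})
other = pd.DataFrame({"a": range(0, 200_000, 2)})

profiler = Profiler()
profiler.start()
df.update(other)  # the call under investigation
profiler.stop()

print(profiler.output_text(unicode=True, color=False))
```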
Martin Nguyen
@Martin20494
Hi all, I'm creating a simple pandas dataframe like df = pd.DataFrame(data={'group': [1, 2], 'element': [0.1, 0.2, 0.3]}), with the condition that each group gets all three elements. My expected dataframe would look like: group (col 1): 1 1 1 2 2 2; element (col 2): 0.1, 0.2, 0.3, 0.1, 0.2, 0.3. I can do this with a for loop, but I wonder if you have a faster way using pandas functions?
Berel Levy
@berellevy
Look into .fromproduct I think
Actually I don’t see it 🤨 coulda sworn….
Martin Nguyen
@Martin20494
@berellevy: Actually I have just given it a try thanks to your advice :)) I tried pandas.MultiIndex.from_product() and pandas.MultiIndex.to_frame() and it works beautifully :)). Thank you very much! I didn't know about this before.
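(For the record, a sketch of that pattern with the two lists from the original question:)

```
import pandas as pd

groups = [1, 2]
elements = [0.1, 0.2, 0.3]

# Cartesian product as a MultiIndex, then flattened back into columns:
idx = pd.MultiIndex.from_product([groups, elements], names=["group", "element"])
df = idx.to_frame(index=False)
```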
Berel Levy
@berellevy
Yes I saw the multi index one. Happy to help.
I also found an extensive SO article if you want to go deep lol
https://stackoverflow.com/questions/11144513/cartesian-product-of-x-and-y-array-points-into-single-array-of-2d-points
Martin Nguyen
@Martin20494
@berellevy: So awesome!!! Yea I would love to go deeper! Thank you Berel :))
Berel Levy
@berellevy
👌
serenaelia
@serenaelia
anyone here work in Blender or know of a chat where I can ask questions?
Prem Patel
@PremPatel8

Hello, I have a Pandas DataFrame-related doubt that's stumped me for hours, and I want to ask if what I need to do is even possible in Pandas. I have a DataFrame that looks like this:

| Company Name              |   Task_A by Month |   Task_B by Month |   Task_A by YTD     |  Task_B by YTD      |   Task_C by YTD     |
|:--------------------------|------------------:|------------------:|--------------------:|--------------------:|--------------------:|
| Company_A                 |                14 |                22 |                  19 |                  88 |                  62 |
| Company_B                 |               345 |               156 |                 200 |                 563 |                 172 |

And I need to convert it into this form so that I can then turn it into an Excel sheet as output:

| Company Name | Work Type | Previous Month | Year to Date |
|--------------|-----------|----------------|--------------|
| Company_B    | Task_A    | 345            | 200          |
|              | Task_B    | 156            | 563          |
|              | Task_C    | N/A            | 172          |
| Company_A    | Task_A    | 14             | 19           |
|              | Task_B    | 22             | 88           |
|              | Task_C    | N/A            | 62           |
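(One possible approach, sketched under the assumption that every header follows the "Task_X by Period" pattern: split the headers into a (task, period) column MultiIndex, then stack the task level into rows; combinations with no source column, such as Task_C by Month, come out as NaN:)

```
import pandas as pd

df = pd.DataFrame({
    "Company Name": ["Company_A", "Company_B"],
    "Task_A by Month": [14, 345],
    "Task_B by Month": [22, 156],
    "Task_A by YTD": [19, 200],
    "Task_B by YTD": [88, 563],
    "Task_C by YTD": [62, 172],
})

# Split "Task_X by Period" headers into a (task, period) column MultiIndex:
wide = df.set_index("Company Name")
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(col.split(" by ")) for col in wide.columns],
    names=["Work Type", "Period"],
)

# Stack the task level into rows and relabel the period columns:
long = (
    wide.stack(level="Work Type")
        .rename(columns={"Month": "Previous Month", "YTD": "Year to Date"})
        .reset_index()
)
```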
Prem Patel
@PremPatel8
Thank you @Dr-Irv, I'll take a look at this. The only thing I am worried about is that "Previous Month" only has two tasks (A and B) whereas "Year to Date" has three (A, B, C). Will this mismatch in rows cause any issues? If so, perhaps I could add an extra empty column to make the number of tasks match between the two.
Pankwings
@Pankwings

I have forked pandas to my GitHub account and cloned it to my computer. Now when I print(pd.__version__), it prints my username instead of the current version of pandas. Any suggestions or references?

There is a bug reported in #47480; in the latest version that I have forked, I don't observe the mentioned bug. So, to report that the current version doesn't have the bug, I need the real version number.

Phil Reinhold
@PhilReinhold
Is it always true that for a MultiIndex, idx.levels[k] == idx.unique(idx.names[k])? I'm asking because it seems like the former is much more performant, but the latter seems to be recommended.
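(One caveat on the "always": after slicing or filtering, levels keeps values that no longer appear in the index, while unique reports only the values actually present; that is what remove_unused_levels() is for. A small sketch:)

```
import pandas as pd

idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]])
sub = idx[:2]  # only ("a", 1) and ("a", 2) remain

sub.levels[0]        # Index(['a', 'b']) -- the unused 'b' is retained
sub.unique(level=0)  # Index(['a'])      -- only values actually present
```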
adrienpacifico
@adrienpacifico

Hi,
I am trying to use the Dockerfile with the .devcontainer.json file in VS Code. When I launch the tests, I get many errors, including pyarrow/pandas-shim.pxi:65: ImportError.

Does anyone know why the docker image does not run the tests correctly?

serenaetang
@serenaetang:matrix.org
[m]
Hello! I am trying to test the compatibility between Pandas and a cloud container I created, and I am using the Pandas test suite to do so. I was wondering if there is a way to run a specific test module (e.g. the arithmetic module), instead of having to run the full test suite via pandas.test()?
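(One way to skip the full suite: point pytest at the subpackage directly. A sketch assuming an installed pandas that ships its tests:)

```
import os

import pandas
import pytest

# Locate the arithmetic tests inside the installed pandas package:
tests = os.path.join(os.path.dirname(pandas.__file__), "tests", "arithmetic")
pytest.main([tests])
```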
joooeey
@joooeey
hiii
I'm trying to develop some little thing for Pandas. But setting up the dev environment is becoming a PIA. I'm working on an ASUSpro Zenbook with i7 CPU and 8 GB RAM. Is that enough to get the pandas development environment running? I'm right now trying to set up the Docker container and it's filling up RAM. Specifically, a process called Vmmem is eating up 6GB of RAM, bringing the total to 96-99%.
This happens when running docker build --tag pandas-joooeey-env .
Is there another way to contribute to Pandas? All I want is to insert a badly needed warning and the corresponding test (see here: https://github.com/pandas-dev/pandas/issues/47005#issuecomment-1146665556). Would it be sensible to just contribute those code changes without testing locally and let CI do everything?
joooeey
@joooeey
Vmmem, which is the virtual machine process (I'm running Docker on WSL), also eats up all the CPU (100%) in addition to virtually all the memory. The docker build . command does keep printing progress, although it's been running for almost an hour.
joooeey
@joooeey
looks like I'm filling the disk space now too. So it's finally time for a new laptop. I remember I've had that disk space issue many times before and have already sunk a lot of time into optimizing disk space so I can fit everything on 210 GB. So for developing pandas comfortably, what are the system requirements?
joooeey
@joooeey

now after just setting up the container and building the C extensions (no changes to the code yet), I can't even run pytest in the docker container:

```
(base) root@fac85d3b9b08:/home/pandas# pytest tests/indexes/period/test_constructors.py
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --strict-data-files
  inifile: /home/pandas/pyproject.toml
  rootdir: /home/pandas
```

Am I doing something wrong? What? Or is there a mistake in the Dockerfile?

Irv Lustig
@Dr-Irv
@joooeey If you are on Windows, you can just set up your environment on Windows. Avoids the overhead of docker running on your laptop. If you are using WSL, then just use the WSL environment directly. Docker is a real pig on Windows platforms.
joooeey
@joooeey
okay. The docs said it's easier with Docker (something about it being complicated to install the C compiler), so that was my first try.
The contributing page lists a lot of steps without docker: https://pandas.pydata.org/docs/dev/development/contributing_environment.html#id2
So that's the easier way? Perhaps also easier on resources?
I've never used WSL before
I've thought of replacing the OS with Linux altogether to use my 7-year-old hardware more efficiently. But the only thing that keeps me on Windows is MS Office.
William Ayd
@WillAyd
If you build a conda environment using the environment.yml file in the repository that should include the compilers for you
Irv Lustig
@Dr-Irv
If on Windows, you have to install a Visual Studio compiler, but that is pretty straightforward
joooeey
@joooeey
Installing the Visual Studio Build Tools 2019 seems to have worked.
conda env create -f environment.yml remains stuck at "Solving environment" for over an hour.
Marco Edward Gorelli
@MarcoGorelli

On my old travelling laptop, I have a development environment created with only the minimal requirements:

```
cython==0.29.30
pytest>=6.0
pytest-cov
pytest-xdist>=1.31
pytest-asyncio>=0.17
pytest-cython
```

If you're short on space, then you could try making a virtual environment with just these, and then following the python setup.py part of the instructions

Robin Fishbein
@RobinFiveWords
Hi folks - vague question, as I'm struggling to grasp the framework of how everything in reshaping is built. In a couple cases where I do large joins, I encode the values of what will become the MultiIndex in each DataFrame before joining, and this avoids memory errors and collisions. I wonder whether encode-join-decode would be practical to implement, as an option or perhaps default behavior when certain conditions are met. However, my encode/decode wrapper code is customized for each use case, and it's not clear to me how to generalize this concept, given all the options for merge/concat/etc. Is there a point deep within the reshaping operations where joining could be wrapped with encode/decode?
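(A toy sketch of the encode-join-decode idea using factorize, with hypothetical frames; real use would need care with NaNs and multi-column keys:)

```
import pandas as pd

left = pd.DataFrame({"key": ["alpha", "beta"], "x": [1, 2]})
right = pd.DataFrame({"key": ["beta", "gamma"], "y": [3, 4]})

# Encode: map key values from both frames to shared integer codes.
all_keys = pd.concat([left["key"], right["key"]])
codes, uniques = pd.factorize(all_keys)
left_enc = left.assign(key=codes[: len(left)])
right_enc = right.assign(key=codes[len(left):])

# Join on the cheap integer key.
merged = left_enc.merge(right_enc, on="key", how="outer")

# Decode: restore the original key values.
merged["key"] = uniques[merged["key"]]
```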
zartarr88
@zartarr88:matrix.org
[m]
Hey guys, this might be a little basic, but I have a dataframe that gives me a cumulative total over a month.
I want to turn it into a daily rate, i.e. divide by the number of days in the month.
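(A minimal sketch with made-up monthly totals, using DatetimeIndex.days_in_month:)

```
import pandas as pd

# Hypothetical cumulative monthly totals, indexed by month end:
s = pd.Series([310, 560], index=pd.to_datetime(["2022-01-31", "2022-02-28"]))

# Divide by the calendar length of each month to get a daily rate:
daily_rate = s / s.index.days_in_month  # 10.0 and 20.0
```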
NimaFakoor
@NimaFakoor

Hi

I'm looking for a way to solve fuzzy optimal control problems with Python.

Can anyone help me?