guru kiran
@gurukiran07

Here is a question from StackOverflow (https://stackoverflow.com/q/69954697/12416453) where the OP is confused about .loc assignment. I think it would be helpful to add this line (taken from the user guide's indexing docs):

pandas aligns all AXES when setting Series and DataFrame from .loc, and .iloc.

to the DataFrame.loc docs page.
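As a quick illustration of that alignment rule (a minimal sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [0.0, 0.0, 0.0]})

# The Series' index is [2, 0]: deliberately out of order and incomplete
s = pd.Series([10.0, 20.0], index=[2, 0])

# .loc aligns on index labels, not on position: row 0 gets 20.0,
# row 2 gets 10.0, and row 1 (no matching label) becomes NaN
df.loc[:, "b"] = s
print(df["b"].tolist())  # [20.0, nan, 10.0]
```

This label alignment, rather than positional copying, is exactly what surprises people in questions like the one above.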

Qian Hong
@fracting
Dear group, I have a dataset with str column A and str column B. I want to select all rows where A is a substring of B. What's the recommended way? I know I can use .apply() and a lambda function to store the substring-test result in an extra boolean column C, then select using that boolean series as a condition. But is there a way without using any extra column? Thanks!
4 replies
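One way to do this without a helper column (a minimal sketch with made-up data) is to build the boolean mask inline and use it directly:

```python
import pandas as pd

df = pd.DataFrame({"A": ["cat", "dog", "x"],
                   "B": ["catalog", "fish", "box"]})

# Row-wise substring test, kept as a throwaway mask instead of a column C
mask = [a in b for a, b in zip(df["A"], df["B"])]
print(df[mask])  # keeps only the rows where A is a substring of B
```

df.apply(lambda r: r["A"] in r["B"], axis=1) produces the same mask; in either case only a boolean sequence is needed as the condition, never a stored column.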
theompm
@theompm:matrix.org
[m]
2 replies
MUD
@penalty-zone
Hello everyone. When I use the to_sql() method, is there a way to get the equivalent of SQL's INSERT IGNORE? Is there a way to solve this in pandas?
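pandas has no INSERT IGNORE option itself, but one common workaround is to to_sql into a staging table and copy across using the database's own ignore clause. A sketch for SQLite (the items/staging table names are made up for the example):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'old')")

df = pd.DataFrame({"id": [1, 2], "val": ["new", "fresh"]})

# Stage the frame, then let SQLite skip rows whose primary key already exists
df.to_sql("staging", conn, index=False, if_exists="replace")
conn.execute("INSERT OR IGNORE INTO items SELECT id, val FROM staging")

rows = conn.execute("SELECT id, val FROM items ORDER BY id").fetchall()
print(rows)  # [(1, 'old'), (2, 'fresh')] — row 1 kept, row 2 inserted
```

The same pattern works with MySQL's INSERT IGNORE or PostgreSQL's ON CONFLICT DO NOTHING; to_sql's method= callback is another route if you want to customize the insert statement itself.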
Ramon Cepeda Jr.
@rcepeda95

Hello everyone, I'm new here and glad to be.
I was playing with a dataset that had columns labeled "sensor_id", "sensor_name", "sensor_value", "item_id", among others. After running df.nunique() I saw that "sensor_id", "sensor_name", and "item_id" columns had the same number of unique values. That got me thinking that they might be representing the same thing, such as {sensor_id: 123, sensor_name: "foo", item_id: 456}, were always linked and therefore I can reduce my data by just keeping one of these columns. I tried to look for a pandas feature that lets me compare two series to see if they represent the same thing and didn't end up finding anything. I ended up writing something on my own that seems to work.

Is there a way to do this already? If not, would this be a useful enhancement that I should push up?
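For reference, one way to sketch that check with groupby (made-up data; this is not a built-in pandas feature):

```python
import pandas as pd

df = pd.DataFrame({
    "sensor_id": [123, 123, 456],
    "sensor_name": ["foo", "foo", "bar"],
})

# Two columns encode the same thing if the mapping is one-to-one in both
# directions: each id maps to exactly one name, each name to exactly one id
one_to_one = (
    df.groupby("sensor_id")["sensor_name"].nunique().eq(1).all()
    and df.groupby("sensor_name")["sensor_id"].nunique().eq(1).all()
)
print(one_to_one)  # True
```

Equal nunique() counts alone are only a hint; the bidirectional check above is what actually confirms the columns are redundant.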

Richard Shadrach
@rhshadrach
Any contributors use Windows? We were having trouble getting a development environment set up with the MS Build Tools -- I only have a Linux machine, so I wasn't able to help out much.
1 reply
Ian Alexander Joiner
@iajoiner:matrix.org
[m]
I'd like to reopen the following PR and finish all the necessary work.
May I ask how that can be achieved?
P.S. I'm the guy who wrote the ORC writer adapter in Apache Arrow. I really want my work to be integrated into pandas so that it can get more widespread usage.
Marco Edward Gorelli
@MarcoGorelli
They deleted the branch, so not much can be done to reopen unfortunately
Ian Alexander Joiner
@iajoiner:matrix.org
[m]
Ok..so I will add my own. What needs to be done in terms of tests etc?
Marco Edward Gorelli
@MarcoGorelli

You'll need to cover any lines of code you add - perhaps check to_parquet to see what's done there?

I don't know anything about ORC so can't be more specific than that, but if you open a PR (even if it's not finished) then reviewers can help you out

Ian Alexander Joiner
@iajoiner:matrix.org
[m]
Hello Joris! :)
Joris Van den Bossche
@jorisvandenbossche
You can still fetch the branch from github, if it has useful content
9 replies
Ian Alexander Joiner
@iajoiner:matrix.org
[m]
Sure! I will!
Ian Alexander Joiner
@iajoiner:matrix.org
[m]
Really thanks @jorisvandenbossche and @MarcoGorelli ! I got it to work using gh.
jmgduarte
@jmgduarte:matrix.org
[m]

Hello everyone I'm currently working on a project where we need to serialize our DataFrames to JSON, we want to use orient='table' however to_json loses date information for datetime.date and datetime.time as it does not serialize those objects properly (see pandas-dev/pandas#32037).

As a workaround I've started looking for those types and encoding that info in the name, when reading back, I re-convert them into the expected values. This approach works but it isn't perfect.

I've also looked into ExtensionDtype and implemented a PoC for the datetime.date class, while this saves a lot of RAM, it still does not serialize as I wanted to (pandas-dev/pandas#20612).

I am interested in making this work, both for my use case and for pandas, but I have no clue where to look, thus I come asking for help!

14 replies
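The tag-and-restore workaround described above can be sketched with the stdlib json module (the __type__ key is just this sketch's convention, not anything pandas or the Table Schema defines):

```python
import datetime as dt
import json

def encode_special(obj):
    # Tag date/time objects so they survive the round trip as ISO strings
    if isinstance(obj, (dt.date, dt.time)):
        return {"__type__": type(obj).__name__, "value": obj.isoformat()}
    raise TypeError(f"not JSON serializable: {obj!r}")

def decode_special(d):
    # Undo the tagging on read
    if d.get("__type__") == "date":
        return dt.date.fromisoformat(d["value"])
    if d.get("__type__") == "time":
        return dt.time.fromisoformat(d["value"])
    return d

payload = json.dumps({"d": dt.date(2022, 1, 27), "t": dt.time(23, 46)},
                     default=encode_special)
restored = json.loads(payload, object_hook=decode_special)
print(restored)
```

This keeps the type information explicit in the payload instead of encoding it in column names, at the cost of a non-standard JSON shape (and note datetime.datetime, being a date subclass, would need its own branch).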
Pranjal Pandey
@pranjii
Anyone working on #44696 ? Should I start working on it?
2 replies
jmgduarte
@jmgduarte:matrix.org
[m]
That snippet is from pandas/io/json/_table_schema.py, lines 192-196 on b6d9c894da
jmgduarte
@jmgduarte:matrix.org
[m]
@jorisvandenbossche: I've created a new issue for this pandas-dev/pandas#44705
2 replies
Angel
@SNR20db
Hi is there anybody there?
3 replies
Unnati
@unnati914
Hey everyone! I am Unnati, a pre-final-year student from India. I want to contribute to pandas; how can I start?
2 replies
Vimal Octavius
@VimalOctavius_twitter
Hello, I'm a first-time contributor trying to update an example in the pandas documentation. If I need to update, let's say, https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html, how do I know which file in pandas-dev to look into?
1 reply
James
@jsal13

Howdy, y'all! I'm James, and I'm beginning to contribute back to things that I use all the time. :']

I'm having an issue with the Docker "contributing environment" (https://pandas.pydata.org/docs/dev/development/contributing_environment.html) when I build it on Windows in WSL2 Ubuntu. I'm getting logs now, but has anyone else had an issue doing this? It seems to get to the final step and error out when installing the local pandas dev package.

(Logs incoming soon, it takes a bit of time to build on my pc.)

3 replies
James
@jsal13
Leonardo
@leonardojimenez1990
Hi, I want to read a 9.5 GB CSV file, but the Jupyter kernel closes unexpectedly; it does not load the data into a df.
df = pd.read_csv('aux.csv')
df.shape
(231128640, 5)  # 231128640 records with 5 columns
Do you have any suggestions for reading the data from this file?
Leonardo
@leonardojimenez1990
errorkerneljupyter.png
Stefano Alberto Russo
@sarusso
@leonardojimenez1990 it is likely not fitting in your RAM. If this is the case, you can try to read it line by line and work on aggregates instead of the original dataset.
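A minimal sketch of that approach with pandas' own chunked reader (a tiny in-memory stand-in plays the role of the 9.5 GB file):

```python
import io
import pandas as pd

# Stand-in for the large file on disk
csv_file = io.StringIO("a,b\n1,2\n3,4\n5,6\n")

# chunksize turns read_csv into an iterator, so only one chunk is ever
# in RAM; aggregate as you go instead of keeping all rows
total = 0
for chunk in pd.read_csv(csv_file, chunksize=2):
    total += chunk["a"].sum()
print(total)  # 9
```

Passing dtype= and usecols= to read_csv also cuts the memory footprint substantially; object (string) columns are usually what makes the in-RAM size balloon far beyond the on-disk size.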
Leonardo
@leonardojimenez1990
@sarusso on my laptop I have 32GB of RAM, Intel® Core™ i7-1065G7 CPU @ 1.30GHz × 8
Stefano Alberto Russo
@sarusso
There is no need to showcase your HW setup: do your homework and check RAM usage. It is stated nowhere that a file taking 9.5 GB on your hard drive will use the same space in RAM; it could easily take 10 times more or 10 times less, depending on a lot of factors. If you then find you do not have a RAM issue, discard this message and re-ask the question, stating that you already checked RAM usage.
Leonardo
@leonardojimenez1990
Thank you
Anderson Bravalheri
@abravalheri

Hello guys, I just wanted to let you know that we added a very simple integration test in setuptools that will try to build pandas from the sdist and install it in a virtualenv: https://github.com/pypa/setuptools/blob/main/setuptools/tests/integration/test_pip_install_sdist.py

The main idea here is to have a regression test in place so that changes in setuptools don't break the pandas installation unexpectedly (there might still be breakages caused by deliberate changes to the setuptools design).

So if you ever decide to pin the setuptools version in pyproject.toml, replace the build backend, or stop seeing value in this integration test, please drop us a line so we can change accordingly :smile:

Jeff Reback
@jreback
thanks that is great!
MMDRZA
@Pymmdrza
In this code, I just want to print the wallet balance from the address's "Balance" page, but I can't get it to work; thank you for your help.
  import requests
  from bs4 import BeautifulSoup

  z = priv.to_string().hex()  # priv, address and i come from code not shown here
  url = "https://etherscan.io/address/%s" % address
  html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
  soup = BeautifulSoup(html, "html.parser")
  table = soup.find("div", {"class": "col-md-6"})
  value = table.findAll("td")[1].text.split(" ")[0].strip()
  print(str(i) + " PrivateKey = " + z + " Address = 0x" + address + " " + str(value))
Pratim Ugale
@pratimugale
Hi, I want to contribute. Are there any issues up for grabs? I can't find any BUG/ENH that still requires work under the Good First Issue label.
Rafael Antonio Ribeiro Gomes
@grafael

Hi all, I just wrote this blog post: https://www.storj.io/blog/reading-and-writing-files-from-to-storj-with-pandas

I would like your feedback on this "workaround" solution using s3fs. I plan to create a feature request to implement a native solution using Storj DCS.

Varun Shrivastava
@Varun270
Hey everyone, I have solved a couple of good first issues and now want to focus on some of medium difficulty. But I realized that I would have to understand the code base to do that. Can anyone tell me where I should start?
Joris Van den Bossche
@jorisvandenbossche
@all as a heads up, "master" branch has been renamed to "main", see pandas-dev/pandas#39577 for more details
gabs
@gabrielbhl
Can someone help me?
I am having an issue trying to docker build an API that uses NumPy and pandas
Ajitesh Gupta
@AjiteshGupta
image.png
conda env create -f environment.yml is crashing when I try to set up the environment. Is it a RAM issue? I am running Ubuntu 20.04 on a virtual machine with 4 GB of RAM.
3 replies
Leonidas Tsaprounis
@ltsaprounis
Hi all! Quick question.
Why is there no eq method for DataFrames, so that the user needs to use pandas.testing.assert_frame_equal instead? (Same for Series.)
I'm sure there is a valid reason for this, but I can't wrap my head around it.
2 replies
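For what it's worth, DataFrame does have an elementwise eq and a whole-frame equals; assert_frame_equal exists for tests, where a rich failure message and tolerances are more useful than a bare bool. A quick comparison:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [1, 2]})

print(a.eq(b))      # elementwise comparison -> DataFrame of bools
print(a.equals(b))  # single bool: True

# Raises AssertionError with a detailed diff on mismatch, and supports
# tolerances (check_exact, rtol/atol), dtype checks, etc.
assert_frame_equal(a, b)
```

So the split is deliberate: equals for a quick boolean answer, assert_frame_equal for diagnostics in test suites.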
Rohith Varma
@rohithrtvarma
Hi, I have just started with Gitter. How do I make the most of the website?
Michal Novotný
@clime
Guys, can you tell me what's going on here? I am trying to append a value to a list in a DataFrame cell. At first it works, but after I add a new column it starts behaving strangely:
>>> x = pd.DataFrame({'A':[[], []], 'B':[[], []]})
>>> x.loc[0, 'B'] += ['foo']
>>> x
    A      B
0  []  [foo]
1  []     []
>>> x['C'] = [[] for _ in range(len(x))]
>>> x.loc[0, 'B'] += ['foo']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/clime/.virtualenvs/keras_tf/lib64/python3.9/site-packages/pandas/core/indexing.py", line 692, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/home/clime/.virtualenvs/keras_tf/lib64/python3.9/site-packages/pandas/core/indexing.py", line 1635, in _setitem_with_indexer
    self._setitem_with_indexer_split_path(indexer, value, name)
  File "/home/clime/.virtualenvs/keras_tf/lib64/python3.9/site-packages/pandas/core/indexing.py", line 1688, in _setitem_with_indexer_split_path
    raise ValueError(
ValueError: Must have equal len keys and value when setting with an iterable
>>> x
    A           B   C
0  []  [foo, foo]  []
1  []          []  []
>>> x.loc[0, 'C'] += ['foo']
>>> x
    A           B    C
0  []  [foo, foo]  foo
1  []          []   []
>>>
this just seems broken
Irv Lustig
@Dr-Irv
@clime pandas doesn't support the + / += operators on lists stored in cells. To do what you want, you have to use list.append():
x.loc[0, 'B'].append('foo')
Michal Novotný
@clime
thanks. What puzzles me is that pandas changes behavior along the way (first it works, then it doesn't). Additionally, it throws an error but still changes something and... yeah, very confusing
Jason T. Kiley
@jtkiley

I'm running into a strange problem, and I'm curious if anyone has seen this (or knows of a solution).

I have two dataframes (one about 1GB and another about 150MB) that are full of text documents and their metadata. I can write out each as parquet individually with no issues. But, if I concat them, the resulting dataframe (which looks fine) will run away on memory usage (hits about 110GB on a 64GB machine before the Jupyter kernel dies after about 3 minutes of running) in df.to_parquet(). It doesn't appear to be sensitive to fastparquet vs pyarrow, I used df.convert_dtypes() on everything, and checking by column shows that it's the big text column that hangs (i.e. all other columns will write out together in about 1.5 seconds).

Any ideas?

Jason T. Kiley
@jtkiley
As a follow-up, if anyone ends up with a similar issue: I resolved it by upgrading pyarrow from 4.0.1 (the newest that conda's defaults channel was giving me) to 6.0.1 (the actual latest, current in conda-forge).