Vinayak Mehta
@vinayak-mehta
Hello world!
lsternlicht
@lsternlicht
Anyone know an HTML table parsing library as good as camelot?
Vinayak Mehta
@vinayak-mehta
@lsternlicht HTML table parsing is way more deterministic than PDF table parsing. pandas.read_html works most of the time for me.
Oleg Gavrilov
@OlegGavrilov
Hello guys! Can anyone help me out with this? I need to strip the non-breaking space character from my output, but -strip '\u00a0' doesn't work.
Are there any other options I can try?
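A possible workaround, sketched under assumptions: if the flag cannot be made to match the character, the non-breaking space can be removed after extraction with plain Python string handling. The helper name `clean_cells` and the list-of-rows input are hypothetical and are not part of Camelot.

```python
NBSP = "\u00a0"  # the non-breaking space character

def clean_cells(rows, replacement=" "):
    """Return a copy of rows with NBSP replaced in every cell.

    rows is assumed to be a list of lists of strings (a hypothetical
    stand-in for extracted table data, not a Camelot object).
    """
    return [
        [cell.replace(NBSP, replacement).strip() for cell in row]
        for row in rows
    ]

rows = [["Total\u00a0amount", "\u00a042"], ["n/a", "x\u00a0"]]
cleaned = clean_cells(rows)
```

Replacing with a regular space instead of deleting keeps words such as "Total amount" separated, and the final strip() trims any space left at the edges of a cell.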
Deepak Dhaka
@dhaka22
Hi Vinayak, I am working on table extraction and Camelot is giving me the content of a whole column in a single row. How do I handle that? It is also not working with borderless tables.
essentialols
@essentialols
Hi Vinayak, I'm trying to use camelot but I receive different kinds of error messages. The last error I received was OSError: [Errno 22] Invalid argument
Dimiter Naydenov
@dimitern
@vinayak-mehta Hey, do you think you'll have time to fix the TravisCI setup for Camelot after yesterday's renaming of the repo to atlanhq/camelot?
Vinayak Mehta
@vortex_ape_twitter
I'm fixing it today.
Vinayak Mehta
@vortex_ape_twitter
I've fixed the failing tests. Travis now runs on https://github.com/camelot-dev/camelot. We can continue development on there.
Dimiter Naydenov
@dimitern
Awesome! I've some PRs to propose :)
Vinayak Mehta
@vinayak-mehta
Camelot v0.7.3 released. This is a bugfix release.
Abhi0495
@Abhi0495
Hi Vinayak, I am having an issue reading tables. An exception appears: "OSError: exception: access violation writing 0x16F3B7B0". Could you please suggest how to resolve this?
Attila Skalina
@Synzzz
Hi, by any chance did anyone create a Java wrapper for Camelot?
Attila Skalina
@Synzzz
Also, what's the situation with Ghostscript having a paid commercial license but Camelot itself having an MIT license?
Éléonore
@Eleonore9
Hello!
I'm failing to extract a PDF table using Excalibur and would love to have some sample data, like a simple PDF that should work for sure.
Éléonore
@Eleonore9
@Eleonore9 I've selected a table and I'm stuck on a 'Refresh' page like camelot-dev/excalibur#69
Attila Skalina
@Synzzz
did you refresh? how much time did you wait?
Pravar Agrawal
@pravarag
@vinayak-mehta is there any way I can point my virtual environment to my local camelot in order to test local changes?
Vinayak Mehta
@vinayak-mehta
@pravarag You can create a new virtual env altogether and then install Camelot in editable mode.
@pravarag These are some of the easy open issues that you could pick up:
Pravar Agrawal
@pravarag
@vinayak-mehta sure. Thanks :)
Pravar Agrawal
@pravarag
@vinayak-mehta is pip install camelot-py[dev] same for editable mode?
nightwarrior-xxx
@nightwarrior-xxx
@vinayak-mehta Can you explain again how Camelot calculates the accuracy? Correct me if I am wrong: first the coordinates of the PDF tables are calculated, then the coordinates of each cell, and after combining the cells we get the whole table again and calculate the coordinates from that.
Pravar Agrawal
@pravarag
@vinayak-mehta I was able to run camelot-py with changes to stream.py for this issue: camelot-dev/camelot#88. Now, while trying to handle the exception for no text being present in (xmin, ymin, xmax, ymax), I'm wondering where text_bbox should be defined. Otherwise I get a "variable referenced before assignment" error for (xmin, ymin, xmax, ymax). Any suggestions?
Vinayak Mehta
@vinayak-mehta

@vinayak-mehta is pip install camelot-py[dev] same for editable mode?

Use pip install -e . for editable mode.

@nightwarrior-xxx
  1. Calculate table coordinates (which include cell coordinates)
  2. Get list of text boxes from PDF
  3. Assign the text boxes one by one, checking each one's overlap with a table cell. The more the overlap, the better the accuracy.
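The three steps above can be illustrated with a small sketch. This is a simplified illustration only, not Camelot's actual implementation: the function names, the (x0, y0, x1, y1) tuple layout, and the choice to average the overlap fractions into a percentage are all assumptions made for the example.

```python
def overlap_fraction(box, cell):
    """Fraction of the text box's area that lies inside the cell.

    Both arguments are (x0, y0, x1, y1) tuples (assumed layout).
    """
    bx0, by0, bx1, by1 = box
    cx0, cy0, cx1, cy1 = cell
    width = max(0.0, min(bx1, cx1) - max(bx0, cx0))
    height = max(0.0, min(by1, cy1) - max(by0, cy0))
    area = (bx1 - bx0) * (by1 - by0)
    return (width * height) / area if area > 0 else 0.0

def assign_and_score(text_boxes, cells):
    """Assign each text box to its best-overlapping cell and score the fit."""
    assignments = []
    scores = []
    for box in text_boxes:
        fractions = [overlap_fraction(box, cell) for cell in cells]
        best = max(range(len(cells)), key=fractions.__getitem__)
        assignments.append(best)
        scores.append(fractions[best])
    # Hypothetical scoring: mean overlap as a percentage.
    accuracy = 100.0 * sum(scores) / len(scores) if scores else 0.0
    return assignments, accuracy

cells = [(0, 0, 10, 10), (10, 0, 20, 10)]    # two adjacent cells
text_boxes = [(1, 1, 9, 9), (9, 2, 13, 8)]   # second box straddles the border
assignments, accuracy = assign_and_score(text_boxes, cells)
```

Here the first box sits fully inside the first cell, while the second lies three quarters inside the second cell, so it is assigned there and pulls the overall score below 100.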
Vinayak Mehta
@vinayak-mehta
@pravarag I'm not sure I follow your question. Can you point me to the line where you're trying to do this? A simple try..except should do the trick.
Pravar Agrawal
@pravarag
@vinayak-mehta I'm trying to put a try around this line: https://github.com/camelot-dev/camelot/blob/master/camelot/parsers/stream.py#L98 . With that handled, I've put "text_bbox" after the except for now, and that is where I'm getting the above-mentioned error.
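For context, a "referenced before assignment" error of this kind usually means a name is only bound inside the try body, so it does not exist when the except path is taken. A minimal sketch of one way around it, assuming text lines are plain (x0, y0, x1, y1) tuples; the function `text_bbox_or_none` is hypothetical and is not the code in stream.py:

```python
def text_bbox_or_none(textlines):
    """Return the bounding box of all text lines, or None if there are none.

    textlines is assumed to be a list of (x0, y0, x1, y1) tuples.
    """
    text_bbox = None  # bound before the try, so it exists on every path
    try:
        text_bbox = (
            min(t[0] for t in textlines),
            min(t[1] for t in textlines),
            max(t[2] for t in textlines),
            max(t[3] for t in textlines),
        )
    except ValueError:
        # min() and max() raise ValueError on an empty sequence,
        # which here means no text was found.
        pass
    return text_bbox

empty_result = text_bbox_or_none([])
bbox_result = text_bbox_or_none([(1, 2, 3, 4), (0, 5, 2, 6)])
```

The caller can then check for None and skip the page or table area instead of failing. Returning early from inside the except block would work equally well.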
Pravar Agrawal
@pravarag
@vinayak-mehta I've submitted a PR for this. Kindly review it and let me know if any changes are needed.
Vinayak Mehta
@vinayak-mehta
I'll check it out today! :)