Camelot and Excalibur: PDF Table Extraction for Humans.
lsternlicht
@lsternlicht
Does anyone know an HTML table parsing library as good as Camelot?
Vinayak Mehta
@vinayak-mehta
@lsternlicht HTML table parsing is way more deterministic than PDF table parsing. pandas.read_html works most of the time for me.
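HTML tables are more deterministic because rows and cells are marked explicitly with <tr>, <td> and <th> tags, so extraction is mostly a matter of walking the markup. The sketch below shows this using only Python's standard-library html.parser. The class name SimpleTableParser is hypothetical. It assumes well-formed markup with explicit closing tags and ignores colspan/rowspan and nested tables, all of which pandas.read_html handles.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings.

    Hypothetical helper: assumes explicit closing tags, ignores colspan/rowspan
    and does not handle nested tables.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join fragments and collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)


html = """
<table>
  <tr><th>Name</th><th>Qty</th></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
"""
parser = SimpleTableParser()
parser.feed(html)
print(parser.tables)  # [[['Name', 'Qty'], ['apple', '3']]]
```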
Oleg Gavrilov
@OlegGavrilov
Hello guys! Can anyone help me out with this? I need to strip the non-breaking space character from my output, but strip('\u00a0') doesn't work. Any other options I can try?
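One likely cause is that str.strip('\u00a0') in Python only removes non-breaking spaces at the start and end of a string, so any inside the text survive. Replacing them everywhere is usually what is wanted. Below is a minimal sketch over rows of cell strings; clean_nbsp is a hypothetical helper name. On a pandas DataFrame such as Camelot's table.df, df.replace('\u00a0', ' ', regex=True) does the same job.

```python
def clean_nbsp(rows, replacement=" "):
    """Replace every non-breaking space (U+00A0) in each cell, not just at the ends."""
    return [[cell.replace("\u00a0", replacement).strip() for cell in row] for row in rows]


rows = [["Total\u00a0amount", "\u00a01\u00a0200\u00a0"]]

# strip() only touches the ends, so the inner non-breaking space remains
print(repr(rows[0][1].strip("\u00a0")))  # '1\xa0200'

# replace() removes them wherever they occur
print(clean_nbsp(rows))  # [['Total amount', '1 200']]
```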
Deepak Dhaka
@dhaka22
Hi Vinayak, I am working on table extraction and Camelot is giving me the content of one column in a single row. How do I handle that? It is also not working with borderless tables.
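For borderless tables, Camelot's flavor='stream' (instead of the default 'lattice', which relies on ruling lines) is usually the first thing to try. When several rows still end up in one cell, the values are often newline-separated inside that cell. As a post-processing fallback, the hypothetical helper below splits such cells back into separate rows. It assumes the lines of each cell correspond row by row and pads shorter cells with empty strings.

```python
from itertools import zip_longest


def split_merged_rows(rows):
    """Expand rows whose cells contain newline-joined values into one row per line.

    Cells with fewer lines are padded with "" so every output row keeps its width.
    """
    out = []
    for row in rows:
        parts = [cell.split("\n") for cell in row]
        for line in zip_longest(*parts, fillvalue=""):
            out.append([value.strip() for value in line])
    return out


rows = [["Item", "Price"], ["apple\npear\nplum", "1.00\n2.50\n0.75"]]
for r in split_merged_rows(rows):
    print(r)
# ['Item', 'Price']
# ['apple', '1.00']
# ['pear', '2.50']
# ['plum', '0.75']
```

This only helps when the merged values line up. If they do not, tuning the extraction itself (for example, the stream flavor's parameters) is the better route.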
essentialols
@essentialols
Hi Vinayak, I'm trying to use Camelot but I keep getting different error messages. The latest one was: OSError: [Errno 22] Invalid argument
Dimiter Naydenov
@dimitern
@vinayak-mehta Hey, do you think you'll have time to fix the Travis CI setup for Camelot after yesterday's renaming of the repo to atlanhq/camelot?
Camelot v0.7.3 released. This is a bugfix release.
Abhi0495
@Abhi0495
Hi Vinayak, I am having an issue reading tables. This exception appears: "OSError: exception: access violation writing 0x16F3B7B0". Could you please suggest how to resolve this?
Attila Skalina
@Synzzz
Hi, by any chance has anyone created a Java wrapper for Camelot?
Attila Skalina
@Synzzz
Also, what's the situation with Ghostscript having a paid commercial license while Camelot itself has an MIT license?
Éléonore
@Eleonore9
Hello! I'm failing to extract a PDF table using Excalibur and would love to have some sample data, like a simple PDF that is known to work.
Éléonore
@Eleonore9
@Eleonore9 I've selected a table and I'm stuck on a 'Refresh' page like camelot-dev/excalibur#69
Attila Skalina
@Synzzz
Did you refresh? How long did you wait?
Pravar Agrawal
@pravarag
@vinayak-mehta Is there any way I can point my virtual environment to my local copy of Camelot to test local changes?
Vinayak Mehta
@vinayak-mehta
@pravarag You can create a new virtual env altogether and then install Camelot in editable mode.
@pravarag These are some of the easy open issues that you could pick up:
@vinayak-mehta is pip install camelot-py[dev] same for editable mode?
nightwarrior-xxx
@nightwarrior-xxx
@vinayak-mehta Can you explain again how Camelot calculates accuracy? Correct me if I am wrong: first the coordinates of the PDF tables are calculated, then the coordinates of each cell, and by combining the cells we get the whole table again, from which we calculate the coordinates.
Pravar Agrawal
@pravarag
@vinayak-mehta I was able to run camelot-py with changes to stream.py for the following issue: camelot-dev/camelot#88. Now, while trying to handle the exception for when no text is present in (xmin, ymin, xmax, ymax), I'm wondering where text_bbox should be defined. Otherwise I get a "variable referenced before assignment" error for (xmin, ymin, xmax, ymax). Any suggestions?
Vinayak Mehta
@vinayak-mehta
@vinayak-mehta is pip install camelot-py[dev] same for editable mode?
pip install -e . for editable mode
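After running pip install -e . inside the new virtual environment, you can confirm that Python picks up the local checkout rather than a released copy by asking the import system where the package lives. The small sketch below uses the standard library's importlib.util.find_spec. The helper name module_location is hypothetical. For Camelot the import name is camelot, and with an editable install the result should point into your cloned repository.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# With an editable install, this should print a path inside your git checkout
print(module_location("camelot"))
# A standard-library module for comparison
print(module_location("json"))
```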
@nightwarrior-xxx
Calculate table coordinates (which include cell coordinates)
Get list of text boxes from PDF
Assign each text box to a table cell by checking their overlap. The greater the overlap, the better the accuracy.
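The three steps above can be illustrated with a toy version of the overlap idea. This is a simplified sketch, not Camelot's actual code. It makes these assumptions:

- Boxes are (x0, y0, x1, y1) tuples with non-zero area.
- There is at least one cell.
- Each text box is assigned to the cell it overlaps most.
- A box's error is the fraction of its area lying outside that cell.
- Accuracy is 100 × (1 − mean error).

All function names are hypothetical.

```python
def overlap_area(a, b):
    """Area of intersection of two (x0, y0, x1, y1) boxes, 0 if they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def assign_and_score(cells, text_boxes):
    """Assign each text box to its best-overlapping cell; return (assignments, accuracy).

    Error per box = share of the box outside its assigned cell;
    accuracy = 100 * (1 - mean error). Simplified for illustration only.
    """
    assignments, errors = [], []
    for box in text_boxes:
        best = max(range(len(cells)), key=lambda i: overlap_area(box, cells[i]))
        assignments.append(best)
        errors.append(1 - overlap_area(box, cells[best]) / area(box))
    accuracy = 100 * (1 - sum(errors) / len(errors)) if errors else 0.0
    return assignments, accuracy


# Two side-by-side cells, each 10 wide and 10 tall
cells = [(0, 0, 10, 10), (10, 0, 20, 10)]
# First box lies fully inside cell 0; second box has 80% of its area in cell 1
text_boxes = [(1, 1, 5, 3), (8, 4, 18, 6)]
assignments, accuracy = assign_and_score(cells, text_boxes)
print(assignments, round(accuracy, 2))  # [0, 1] 90.0
```

The box that straddles the cell border drags the score down. This matches "the greater the overlap, the better the accuracy".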
Vinayak Mehta
@vinayak-mehta
@pravarag Not sure about your questions. Can you point me to the line where you're trying to do this? A simple try..except should do the trick.
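Here is a sketch of the try..except suggestion for the "referenced before assignment" error in the question above. Computing the bounding box with min()/max() over an empty list of text boxes raises ValueError. Catching it gives a single place to handle the no-text case, instead of falling through to an unassigned (xmin, ymin, xmax, ymax). The function name text_bbox_or_none and the (x0, y0, x1, y1) tuple format are assumptions for illustration, not stream.py's actual code.

```python
def text_bbox_or_none(text_boxes):
    """Return (xmin, ymin, xmax, ymax) enclosing all text boxes, or None if there is no text.

    text_boxes: iterable of (x0, y0, x1, y1) tuples (hypothetical format).
    """
    boxes = list(text_boxes)
    try:
        xmin = min(b[0] for b in boxes)
        ymin = min(b[1] for b in boxes)
        xmax = max(b[2] for b in boxes)
        ymax = max(b[3] for b in boxes)
    except ValueError:
        # min()/max() on an empty sequence raise ValueError: no text in this region
        return None
    return (xmin, ymin, xmax, ymax)


print(text_bbox_or_none([(10, 20, 50, 30), (5, 40, 60, 55)]))  # (5, 20, 60, 55)
print(text_bbox_or_none([]))  # None
```

When None comes back, the caller can then skip the region or raise a more descriptive error.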