nftopham
@nftopham
so I was getting a NotImplementedError because the PDF version I was reading had an unsupported encryption protocol, as stated on the camelot docs
so I searched for some solutions and ended up re-writing the file using ghostscript and downgrading the version. this actually completely removed the encryption which is quite funny. so much for password protected PDFs!
here is my solution, a bit messy right now but you get the gist. would be great if this could be included in future releases as there are only going to be more PDFs written > version 1.4 and PyPDF2 seems to be not interested in a fix
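import os
import camelot

try:
    tables = camelot.read_pdf(**camelot_params)
except NotImplementedError:
    # Re-write the PDF with Ghostscript at compatibility level 1.4; the copy comes out unencrypted
    output = os.system('gswin64c -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dSAFER -dNOPAUSE -dBATCH -o temp.pdf C:/Users/User/Desktop/input.pdf')
    url = os.path.join(os.getcwd(), "temp.pdf")
    # get_camelot_params and meta are defined elsewhere in my code
    camelot_params = get_camelot_params(meta, url)
    tables = camelot.read_pdf(**camelot_params)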
Vinayak Mehta
@vinayak-mehta
@nftopham Did you also try qpdf like mentioned in the docs? https://camelot-py.readthedocs.io/en/master/user/quickstart.html#reading-encrypted-pdfs
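The qpdf route from the docs could be scripted roughly like this. It is only a sketch: it assumes `qpdf` is installed and on PATH, and `build_qpdf_decrypt_command` and `decrypt_pdf` are hypothetical helper names, not part of camelot.

```python
import subprocess


def build_qpdf_decrypt_command(src, dst, password=""):
    """Return the qpdf argument list that writes a decrypted copy of src to dst."""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def decrypt_pdf(src, dst, password=""):
    """Run qpdf; raises CalledProcessError if qpdf exits with a non-zero status."""
    subprocess.run(build_qpdf_decrypt_command(src, dst, password), check=True)
```

After `decrypt_pdf("input.pdf", "decrypted.pdf", "userpass")`, the decrypted copy can be passed to `camelot.read_pdf` as usual. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.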
Arky
@arky
I think it would be great to have an OCR feature in Camelot/Excalibur, it would be really helpful for creating open datasets from PDF documents like this one: https://data.opendevelopmentmekong.net/dataset/facility-quarantine-is-necessary-for-returners-from-other-state-and-region-mandalay
Vinayak Mehta
@vinayak-mehta
Yes, I'm trying to find time to experiment with https://github.com/JaidedAI/EasyOCR as it says that it works with different languages and even on snapshots.
Arky
@arky
Great
SolarDesalination
@SolarDesalination
Hi, I've installed excalibur, but when I write excalibur initdb in the python terminal, it says invalid syntax
squareofseo
@squareofseo
hello i have question..
RuntimeError: Please make sure that Ghostscript is installed
i want to solve this error..ㅠ^ㅠ

OSError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/camelot/ext/ghostscript/_gsprint.py in <module>()
259 try:
--> 260 libgs = cdll.LoadLibrary("libgs.so")
261 except OSError:

8 frames
OSError: libgs.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/camelot/ext/ghostscript/_gsprint.py in <module>()
265 libgs = ctypes.util.find_library("gs")
266 if not libgs:
--> 267 raise RuntimeError("Please make sure that Ghostscript is installed")
268 libgs = cdll.LoadLibrary(libgs)
269

RuntimeError: Please make sure that Ghostscript is installed

this error..
Vinayak Mehta
@vinayak-mehta
Looks like ghostscript isn't available on your PATH, how did you install it? These are the install instructions: https://camelot-py.readthedocs.io/en/master/user/install-deps.html
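A quick way to check what Python can see, as a rough sketch. The traceback above shows camelot falling back to `ctypes.util.find_library("gs")`, so the same call is used here. `ghostscript_status` is a hypothetical helper name, and the executable name `gs` is an assumption for Linux/macOS (the Windows binary is named differently, e.g. `gswin64c`).

```python
import ctypes.util
import shutil


def ghostscript_status():
    """Report whether the Ghostscript shared library and executable can be found."""
    return {
        # Same lookup camelot's bundled wrapper falls back to in the traceback
        "library": ctypes.util.find_library("gs"),
        # Assumed executable name on Linux/macOS
        "executable": shutil.which("gs"),
    }


print(ghostscript_status())
```

If `library` comes back as `None`, Python cannot locate the shared library, which would match the RuntimeError above.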
Shivam-Fullstack
@Shivam-Fullstack
Hi @vinayak-mehta, I'm using camelot-py==0.7.3 and reading a table sometimes hangs indefinitely, so I have to re-run the scheduler, after which the same PDF gets read. Is this a known issue? Please suggest how I can fix it.
Shivam-Fullstack
@Shivam-Fullstack
Could anyone please answer my question?
AndrewDaher
@AndrewDaher
hey guys, having some issues with excalibur
trying to run on windows
tried the executable but doesn't work properly, had a bunch of issues, so trying the manual way, but can't get past this step
$ excalibur initdb
Arky
@arky
@AndrewDaher I think it might work if you run it inside a virtualenv, created with 'python -m venv ./venv'
Vinayak Mehta
@vinayak-mehta
@Shivam-Fullstack Sorry for the late reply. This is not a known issue. Does it only happen on that particular PDF? There might be a problem with the file itself.
@AndrewDaher python -m excalibur initdb should also work.
Arky
@arky
@vinayak-mehta Let's catch up, just put something up on your calendar.
Vinayak Mehta
@vinayak-mehta
:+1:
Vinayak Mehta
@vinayak-mehta
Would love to get everyone's thoughts on this: camelot-dev/camelot#233
Arky
@arky
@vinayak-mehta How do you update Docs theme 'alabaster' to latest ? "-e git+https://github.com/bitprophet/alabaster/@3b68afcfe55a80508254b22904294100a160e6a7#egg=alabaster"
avidalonc
@avidalonc
Hello everybody, I want to export all tables from my PDF to Excel or CSV, but only the first page gets exported, even though I use this code to read all the tables: tables = camelot.read_pdf(file, pages="all"). Could you help me with this, please?
Vinayak Mehta
@vinayak-mehta
Hi @avidalonc you can export all tables into multiple csvs by following the docs here: https://camelot-py.readthedocs.io/en/master/user/quickstart.html
To get a single csv, you'll have to combine all table dataframes into one using Python code, and then export that single dataframe
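Combining the dataframes could look something like this. It is a sketch, assuming pandas is available (camelot tables expose a `.df` DataFrame); `combine_tables` and `export_single_csv` are hypothetical helper names.

```python
import pandas as pd


def combine_tables(frames):
    """Stack per-table DataFrames vertically into one DataFrame with a fresh index."""
    return pd.concat(list(frames), ignore_index=True)


def export_single_csv(pdf_path, csv_path):
    """Read every page, merge all tables, and write one CSV file."""
    import camelot  # imported here so combine_tables works without camelot installed

    tables = camelot.read_pdf(pdf_path, pages="all")
    combined = combine_tables(table.df for table in tables)
    combined.to_csv(csv_path, index=False)
```

Note that `pd.concat` raises a ValueError when no tables were found, and tables with different column counts end up padded with NaN, so this works best when all tables share one layout.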
Gaurav
@gggauravgandhi
The cell coords are always calculated at 72 dpi, regardless of what dpi I pass to read_pdf. Is this a correct assumption?
Pankaj S Y
@pankajsy9_twitter
Can "camelot" run on Python version 2 ?
Pankaj S Y
@pankajsy9_twitter
Is it possible to detect tables ?
sridharvelusamy
@sridharvelusamy

Hi vinayak,
I'm running camelot python script in red hat linux environment

i get following error

  File "/home/dsadm/Python/anaconda3/lib/python3.7/site-packages/camelot/io.py", line 117, in read_pdf
    **kwargs
  File "/home/dsadm/Python/anaconda3/lib/python3.7/site-packages/camelot/handlers.py", line 172, in parse
    p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs
  File "/home/dsadm/Python/anaconda3/lib/python3.7/site-packages/camelot/parsers/lattice.py", line 402, in extract_tables
    self._generate_image()
  File "/home/dsadm/Python/anaconda3/lib/python3.7/site-packages/camelot/parsers/lattice.py", line 219, in _generate_image
    with Ghostscript(*gs_call, stdout=null) as gs:
  File "/home/dsadm/Python/anaconda3/lib/python3.7/site-packages/camelot/ext/ghostscript/__init__.py", line 95, in Ghostscript
    stderr=kwargs.get("stderr", None),
  File "/home/dsadm/Python/anaconda3/lib/python3.7/site-packages/camelot/ext/ghostscript/__init__.py", line 39, in __init__
    rc = gs.init_with_args(instance, args)
  File "/home/dsadm/Python/anaconda3/lib/python3.7/site-packages/camelot/ext/ghostscript/_gsprint.py", line 174, in init_with_args
    raise GhostscriptError(rc)
camelot.ext.ghostscript._gsprint.GhostscriptError: -1442225768

fernandogoncalez
@fernandogoncalez

use flavor="stream" in camelot.read_pdf to see if it works
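The traceback fails inside lattice's `_generate_image`, which is where Ghostscript gets called, so retrying with the stream parser can be wrapped like this. A minimal sketch: `read_with_stream_fallback` is a hypothetical helper, and the reader is passed in as an argument (normally `camelot.read_pdf`) so the snippet does not depend on a particular camelot version.

```python
def read_with_stream_fallback(path, read_pdf, **kwargs):
    """Try the lattice parser first; if it raises, retry with the stream parser."""
    try:
        return read_pdf(path, flavor="lattice", **kwargs)
    except Exception:
        # Broad on purpose: the exception class raised by the Ghostscript
        # wrapper depends on the installed camelot version.
        return read_pdf(path, flavor="stream", **kwargs)


# Usage, assuming camelot is installed:
#   import camelot
#   tables = read_with_stream_fallback("file.pdf", camelot.read_pdf, pages="1")
```

Stream guesses columns from whitespace instead of ruling lines, so the output may need checking even when the call succeeds.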

fernandogoncalez
@fernandogoncalez
Folks, what do I need to type in place of "[base]" in pip install "camelot-py[base]"?
Teoh Sin Yee
@teohsinyee

How can I detect tables in a bunch of PDF files using Camelot?

I want to pass in a batch of PDF files and get back only True or False for each.

I have 2000 PDF files and want to separate out the files that contain tables.
It's impossible to open the file 1 by 1 to check if it contains table.
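One possible approach is to run camelot over every file and check whether it returns any tables. This is a sketch, not a built-in camelot feature: `has_tables` and `classify_pdfs` are hypothetical helpers, and the reader is passed in (normally `camelot.read_pdf`).

```python
def has_tables(path, read_pdf, **kwargs):
    """Return True if the reader finds at least one table anywhere in the PDF."""
    tables = read_pdf(path, pages="all", **kwargs)
    return len(tables) > 0


def classify_pdfs(paths, read_pdf, **kwargs):
    """Map each path to True/False, or None when the file could not be parsed."""
    results = {}
    for path in paths:
        try:
            results[path] = has_tables(path, read_pdf, **kwargs)
        except Exception:  # e.g. encrypted or corrupt files; record and move on
            results[path] = None
    return results


# Usage, assuming camelot is installed and the PDFs live in ./pdfs:
#   import camelot
#   from pathlib import Path
#   results = classify_pdfs([str(p) for p in Path("pdfs").glob("*.pdf")], camelot.read_pdf)
```

The default lattice flavor only finds tables drawn with ruling lines. Passing `flavor="stream"` would also catch whitespace-separated tables, but it is likely to report tables on ordinary text pages too, so treat the result as a first filter rather than a definitive answer.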

prasanna malla
@prasmalla_gitlab
having trouble getting camelot/excalibur up and running
is there a guide somewhere i can follow
mac / linux / docker - running in to issues
prasanna malla
@prasmalla_gitlab
on a fresh install of python 3.10.4, excalibur initdb errors out with - ImportError: cannot import name 'MutableMapping' from 'collections'
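For context on that ImportError: the abstract base class aliases in `collections` were removed in Python 3.10, so any installed package that still does `from collections import MutableMapping` fails there. The snippet below only illustrates the difference; `legacy_alias_available` is a hypothetical helper, and which dependency triggers the error is not identified here.

```python
import collections
import sys
from collections.abc import MutableMapping  # valid location on every Python 3 version


def legacy_alias_available():
    """True if the old `collections.MutableMapping` alias still exists on this interpreter."""
    return hasattr(collections, "MutableMapping")


print(sys.version_info[:2], legacy_alias_available())
```

If this prints `False` for the alias, the options are running an older interpreter (3.9 or earlier) or upgrading the package that still uses the old import.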
Balamurugan Gnanakumar
@balamurugan.gnanakumar:matrix.org
I am trying to use camelot for the first time. I have installed the camelot , ghostscripts as per docs...
when i try to check the version of camelot i am seeing the version
camelot --version
but when i type import camelot i get error
'import' is not recognized as an internal or external command,
operable program or batch file.
Please help
I am using python 3.10.5
BalamuruganGnanakumar
@BalamuruganGnanakumar
I am able to see the CSV files after setting the lib and bin location to the variable.
Currently it produces one file per table: if the PDF has 5 tables, it gives 5 CSV files. Can we have all of that in one PDF?
Any solution?
GFM
@gil_fm84_twitter
I am a newbie here: I get a PDF file from a database table as bytes, but when I try to have camelot read the document it shows an error. Any advice?
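A common workaround is to write the bytes to a temporary file and give camelot the path, since `camelot.read_pdf` is documented as taking a file path or URL. A minimal sketch: `write_pdf_bytes_to_tempfile` and `remove_tempfile` are hypothetical helpers, and `pdf_bytes` stands for whatever the database returns.

```python
import os
import tempfile


def write_pdf_bytes_to_tempfile(pdf_bytes):
    """Write raw PDF bytes to a temporary .pdf file and return its path."""
    # delete=False keeps the file on disk after the handle is closed
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as handle:
        handle.write(pdf_bytes)
        return handle.name


def remove_tempfile(path):
    """Delete the temporary file once camelot is done with it."""
    os.remove(path)


# Usage, assuming camelot is installed:
#   import camelot
#   path = write_pdf_bytes_to_tempfile(pdf_bytes)
#   tables = camelot.read_pdf(path)
#   remove_tempfile(path)
```

The file is closed before camelot opens it, which matters on Windows where an open temporary file cannot be reopened by another reader.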
ramSeraph
@ramSeraph
Given that the project has been unmaintained for a year, should we consider a community-maintained fork? @vinayak-mehta any objections?
Also.. any volunteers please do come forward