Vinayak Mehta
@vinayak-mehta
Or in a gist / pastebin
nftopham
@nftopham
Hello, I am getting a huge number of debug messages when running Camelot. The extraction works fine, and passing suppress_warnings=True does not do anything.
they are all logs/debug messages from pdfminer
nftopham
@nftopham
I have disabled them manually via logging.getLogger("pdfminer").setLevel(logging.WARNING) but this is not really desirable
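One way to make the workaround above less intrusive is to raise the logger level only around the noisy call and restore it afterwards. This is a minimal sketch using the standard `logging` module; the helper name `quiet_logger` is hypothetical and not part of Camelot or pdfminer.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name="pdfminer", level=logging.WARNING):
    """Temporarily raise a logger's level, then restore the previous one."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Restore whatever level was configured before entering the block.
        logger.setLevel(previous)


# Usage (assumes camelot is installed; quiet_logger is a hypothetical helper):
# with quiet_logger("pdfminer"):
#     tables = camelot.read_pdf("input.pdf")
```

Child loggers such as those under the `pdfminer.` namespace inherit the effective level from `pdfminer` as long as they have no level of their own, so setting the parent is usually enough.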
Vinayak Mehta
@vinayak-mehta
@nftopham I understand, thanks for reporting it here. I'll start work on fixing logging and the CLI's terminal output in general soon.
nftopham
@nftopham
Thanks, it's great otherwise
nftopham
@nftopham
Hi @vinayak-mehta I wanted to share with you a problem I had with Camelot and the solution
so I was getting a NotImplementedError because the PDF version I was reading had an unsupported encryption protocol, as stated on the camelot docs
so I searched for some solutions and ended up re-writing the file using ghostscript and downgrading the version. this actually completely removed the encryption which is quite funny. so much for password protected PDFs!
here is my solution, a bit messy right now but you get the gist. would be great if this could be included in future releases as there are only going to be more PDFs written > version 1.4 and PyPDF2 seems to be not interested in a fix
import os
import camelot

try:
    tables = camelot.read_pdf(**camelot_params)
except NotImplementedError:
    # Rewrite the PDF at compatibility level 1.4 with Ghostscript, then retry on the copy.
    output = os.system('gswin64c -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dSAFER -dNOPAUSE -dBATCH -o temp.pdf C:/Users/User/Desktop/input.pdf')
    url = os.path.join(os.getcwd(), "temp.pdf")
    camelot_params = get_camelot_params(meta, url)  # get_camelot_params and meta are the poster's own helpers
    tables = camelot.read_pdf(**camelot_params)
Vinayak Mehta
@vinayak-mehta
@nftopham Did you also try qpdf like mentioned in the docs? https://camelot-py.readthedocs.io/en/master/user/quickstart.html#reading-encrypted-pdfs
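For reference, the qpdf route from the docs can be scripted roughly like this. It is a sketch that assumes the `qpdf` executable is on your PATH; the function names `build_qpdf_decrypt_command` and `decrypt_pdf` are hypothetical helpers, not part of Camelot.

```python
import subprocess


def build_qpdf_decrypt_command(src, dst, password=""):
    """Return the qpdf argv that writes a decrypted copy of src to dst."""
    return ["qpdf", f"--password={password}", "--decrypt", src, dst]


def decrypt_pdf(src, dst, password=""):
    """Run qpdf; raises CalledProcessError if qpdf exits with a non-zero status."""
    subprocess.run(build_qpdf_decrypt_command(src, dst, password), check=True)
    return dst


# Usage (assumes qpdf and camelot are installed):
# decrypted = decrypt_pdf("input.pdf", "decrypted.pdf", password="secret")
# tables = camelot.read_pdf(decrypted)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.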
Arky
@arky
I think it would be great to have an OCR feature in Camelot/Excalibur. It would be really helpful for creating open datasets from PDF documents like this one: https://data.opendevelopmentmekong.net/dataset/facility-quarantine-is-necessary-for-returners-from-other-state-and-region-mandalay
Vinayak Mehta
@vinayak-mehta
Yes, I'm trying to find time to experiment with https://github.com/JaidedAI/EasyOCR as it says that it works with different languages and even on snapshots.
Arky
@arky
Great
SolarDesalination
@SolarDesalination
Hi, I've installed excalibur, but when I write excalibur initdb in the python terminal, it says invalid syntax
4 replies
squareofseo
@squareofseo
Hello, I have a question.
RuntimeError: Please make sure that Ghostscript is installed
I want to solve this error. ㅠ^ㅠ

OSError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/camelot/ext/ghostscript/_gsprint.py in <module>()
259 try:
--> 260 libgs = cdll.LoadLibrary("libgs.so")
261 except OSError:

8 frames
OSError: libgs.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/camelot/ext/ghostscript/_gsprint.py in <module>()
265 libgs = ctypes.util.find_library("gs")
266 if not libgs:
--> 267 raise RuntimeError("Please make sure that Ghostscript is installed")
268 libgs = cdll.LoadLibrary(libgs)
269

RuntimeError: Please make sure that Ghostscript is installed

This is the error.
Vinayak Mehta
@vinayak-mehta
Looks like ghostscript isn't available on your PATH, how did you install it? These are the install instructions: https://camelot-py.readthedocs.io/en/master/user/install-deps.html
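A quick way to check what Python can actually see is sketched below. It uses the same `ctypes.util.find_library("gs")` lookup that appears in the traceback, plus a PATH search for the command-line executable. The function name `find_ghostscript` is hypothetical, and the list of executable names is an assumption about common installs.

```python
import ctypes.util
import shutil


def find_ghostscript():
    """Report where, if anywhere, Ghostscript can be found on this machine."""
    # Executable names to try: "gs" on Linux/macOS, "gswin64c"/"gswin32c" on Windows.
    candidates = ("gs", "gswin64c", "gswin32c")
    return {
        # Shared library lookup, the same call that appears in the traceback.
        "library": ctypes.util.find_library("gs"),
        # First executable found on PATH, or None if there is none.
        "executable": next(
            (path for path in map(shutil.which, candidates) if path),
            None,
        ),
    }


if __name__ == "__main__":
    print(find_ghostscript())
```

If both values are `None`, Ghostscript is most likely not installed in that environment, or not visible to it.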
Shivam-Fullstack
@Shivam-Fullstack
Hi @vinayak-mehta, I'm using camelot-py==0.7.3 and it sometimes waits indefinitely while reading a table, so I have to re-run the scheduler, after which the same PDF gets read. Is this a known issue? Please suggest how I can fix it.
Shivam-Fullstack
@Shivam-Fullstack
Could anyone please answer my question?
AndrewDaher
@AndrewDaher
hey guys, having some issues with excalibur
trying to run on windows
tried the executable but doesn't work properly, had a bunch of issues, so trying the manual way, but can't get past this step
$ excalibur initdb
image.png
Arky
@arky
@AndrewDaher I think it might work if you run it inside a virtualenv: 'python -m venv ./venv'
Vinayak Mehta
@vinayak-mehta
@Shivam-Fullstack Sorry for the late reply. This is not a known issue. Does it only happen on that particular PDF? There might be a problem with the file itself.
@AndrewDaher python -m excalibur initdb should also work.
Arky
@arky
@vinayak-mehta Let's catch up, just put something up on your calendar.
Vinayak Mehta
@vinayak-mehta
:+1:
Vinayak Mehta
@vinayak-mehta
Would love to get everyone's thoughts on this: camelot-dev/camelot#233
Arky
@arky
@vinayak-mehta How do you update the docs theme 'alabaster' to the latest version? "-e git+https://github.com/bitprophet/alabaster/@3b68afcfe55a80508254b22904294100a160e6a7#egg=alabaster"
3 replies
avidalonc
@avidalonc
Hello everybody, I want to export all tables from my PDF into an Excel or CSV file, but it only exports the first page, even though I use this code to read all the tables: tables = camelot.read_pdf(file, pages="all"). Could you help me with this, please?
Vinayak Mehta
@vinayak-mehta
Hi @avidalonc you can export all tables into multiple csvs by following the docs here: https://camelot-py.readthedocs.io/en/master/user/quickstart.html
To get a single csv, you'll have to combine all table dataframes into one using Python code, and then export that single dataframe
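The combining step could look something like the sketch below. It assumes pandas is installed and that each table exposes its data as a DataFrame (with Camelot, `table.df`). The helper name `combine_tables` is hypothetical, and the two small DataFrames stand in for real extracted tables.

```python
import pandas as pd


def combine_tables(frames):
    """Stack table DataFrames vertically into one DataFrame with a fresh index."""
    return pd.concat(list(frames), ignore_index=True)


# Stand-ins for the per-table DataFrames (with Camelot: [t.df for t in tables]).
page1 = pd.DataFrame([["a", "1"], ["b", "2"]])
page2 = pd.DataFrame([["c", "3"]])

combined = combine_tables([page1, page2])

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = combined.to_csv(index=False, header=False)

# To write a file instead:
# combined.to_csv("all_tables.csv", index=False, header=False)
```

This only gives a sensible result when the tables share the same column layout; tables with different columns would need to be aligned or exported separately.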
Gaurav
@gggauravgandhi
The cell coords are always calculated at 72 dpi, regardless of what dpi I pass to read_pdf. Is this a correct assumption?
1 reply
Pankaj S Y
@pankajsy9_twitter
Can "camelot" run on Python 2?
2 replies
Pankaj S Y
@pankajsy9_twitter
Is it possible to detect tables?
3 replies