Joshua Taillon
@jat255
office-support will be another PR, I think
it will probably rely on unoconv https://github.com/dagwieers/unoconv
and only run if libreoffice is installed
as an optional dependency
preview-generator probably isn't strictly necessary
Joshua Taillon
@jat255
Ok, #395 deals with the text file handling
happy to incorporate any ideas/suggestions
Daniel Quinn
@danielquinn
Excellent! I'll have a look soon. This week is pretty crazy for me.
Joshua Taillon
@jat255
After exporting, (painfully) changing a lot of dates in the manifest.json, and re-importing, I have a lot of month filters on the right side that don't point to any documents:
any thoughts on how to prune that list of filters of empty values?
Daniel Quinn
@danielquinn

Those month filters are automatically generated based on the documents it finds in the database, so unless you've done something creative in hacking at the filtering system in the admin, it's likely that you've still got documents with .created values in those dates. If you disable all other filters, and then select a month, do you get any documents?

The code that dictates that list is here: https://github.com/danielquinn/paperless/blob/master/src/documents/admin.py#L24-L31

Joshua Taillon
@jat255
figured it out, and I think it's a minor bug
the documents missing have a T00:00.000 time stamp of the date in the list
but since I set my timezone to local, it actually pushed it back into the previous month
so in this example, the DB has a (obviously bogus) timestamp of 7550-01-01 00:00:00
but I don't see it unless I browse to http://localhost:8000/admin/documents/document/?month=7549-12
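The month shift described here can be reproduced with the standard library alone. This is a minimal sketch, not Paperless or Django code: the date, the UTC-5 offset, and the `month_bucket` helper are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# A stored value at midnight UTC on the first of a month (illustrative date).
stored = datetime(2018, 1, 1, 0, 0, tzinfo=timezone.utc)

# Hypothetical local zone, five hours behind UTC.
local_zone = timezone(timedelta(hours=-5))
local = stored.astimezone(local_zone)


def month_bucket(dt):
    """Return the YYYY-MM label a month filter would use for dt."""
    return dt.strftime("%Y-%m")


print(month_bucket(stored))  # 2018-01
print(month_bucket(local))   # 2017-12
```

The same instant lands in two different month buckets depending on which zone is used to read it, which would explain a filter entry that looks empty.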
Daniel Quinn
@danielquinn
Oooh. Yeah I wouldn't recommend that. Fiddling with timezones is a nightmare. Better to keep your timezone in UTC and then use middleware to set the timezone in the UI.
Joshua Taillon
@jat255
knowing nothing about django, I'll leave that to someone else :smile:
Daniel Quinn
@danielquinn
By all means :-)
Tobias Markmann
@tfar
As paperless has a login mode, does it support multiple users, so that each user can have a different consumption directory?
Joshua Taillon
@jat255
@danielquinn don't mean to push, but there are a couple PRs and issues on the repo that have been submitted; any chance of getting a review of them?
Daniel Quinn
@danielquinn

@tfar theoretically it could but it doesn't at the moment. The problem isn't the login, but the consumption. There's currently no way to tell paperless when it's consuming a file that the file in question is for User A or User B. We'd have to refactor the consumer to watch multiple subdirectories, each with an associated user, and then use that relationship to attach the user to the document. It's a neat idea, but I'm sorry, I just don't have time for it.
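A rough sketch of the subdirectory-to-user idea, assuming one first-level subdirectory per user under the consumption directory. The `owner_for` function and the directory layout are hypothetical; nothing like this exists in the consumer today.

```python
from pathlib import Path
from typing import Optional


def owner_for(consume_root: Path, file_path: Path) -> Optional[str]:
    """Return the first-level subdirectory name that holds file_path.

    Returns None when the file sits directly in consume_root or is
    not under consume_root at all.
    """
    try:
        relative = file_path.relative_to(consume_root)
    except ValueError:
        return None  # not inside the consumption directory
    if len(relative.parts) < 2:
        return None  # directly in the root, so no owner can be inferred
    return relative.parts[0]


print(owner_for(Path("/consume"), Path("/consume/alice/scan.pdf")))
```

The returned name would still have to be matched against a real user account before attaching it to a document.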

@jat255 I'm sorry but I've been just overrun of late, trying to get another project off the ground while I'm also starting a new job. It's not left a lot of time for Paperless. I'll try to post notes on your PRs now.

Joshua Taillon
@jat255
@danielquinn not a problem! congrats on the new job!
Daniel Quinn
@danielquinn

Ok I've merged one of your PRs and commented on another. For the third one, that's a lot more than I have time for right now (have to cook dinner for the pregnant wife!) so I'm going to leave that for tomorrow.

There's a bug somewhere in the existing codebase that has one of the tests failing, so I'm going to try to iron that one out before I look at yours 'cause I think they may be related to the same area (date guessing), but I promised @erikarvstedt that I'd release this weekend, so I'll definitely be digging around in your PR before that happens.

Joshua Taillon
@jat255
thanks! the other PR is a bit bigger, and while it's no guarantee that it's bug-free, I've been running it on my system for the last couple weeks without issue
Joshua Taillon
@jat255
has anyone seen "database is locked" errors before?
this happened when trying to add a tag while the consumer was working on some documents
from what I can find, it's due to multiple processes trying to access the database
would a migration to mysql fix this, perhaps?
Daniel Quinn
@danielquinn

As far as I know, a "locked" database is usually the result of a filesystem problem. As the db is Sqlite here, my guess is that either (a) the permissions on the file (or parent directory?) are such that you can't write to it, or (b) the filesystem on which said file is mounted has gone away somehow. If this was a Windows system, you'd have to worry about the file being read at the same time as it's being written, but it looks like you're on a Linux box, so I'd have to guess either (a) or (b).

Sorry for the delay!

Phoara
@Phoara
Hey everyone. I've been trying to install the Paperless Docker container on my Synology. However, it crashes and reboots leaving "staticfiles" -> "runserver" the last entry in my logs. I defined four entries in Volumes for "media", "data", "consume", "export" with paperless:paperless as the owner. In the environment tab I added "PAPERLESS_OCR_LANGUAGES=deu" and "PAPERLESS_OCR_THREADS=2". Can anyone help me out?
And a happy new year to everyone of course. ;)
Daniel Quinn
@danielquinn
Hi @Phoara, I'm afraid I don't have any experience with Paperless on Synology, but I'd say that the first thing you should do is try running it manually from the command line with docker-compose up and see what comes out. I'm not sure how you installed it or how you're running it, but Paperless should definitely not be causing a reboot. In fact, I can't even think of how it would be causing one.
Phoara
@Phoara
Hi @danielquinn. Thanks for your reply. Docker on synology is just a graphical interface to add volumes, environment entries etc. I'm not really familiar with running things manually. The paperless docker container does not cause a reboot of my Synology, sorry. The container itself crashes and just keeps restarting itself. Too bad, Docker is still such a hassle sometimes.
Blake Smith
@Xboxhacka_twitter
So has anyone been able to get this to run on Unraid? I have seen people get this to work, but they use two Docker containers to do it, one called webserver and one called consumer. Is this supposed to run in two containers?
Daniel Quinn
@danielquinn
Hi @Xboxhacka_twitter, yes Paperless is composed of two running processes: the consumer and webserver. The Docker Way is to run each process in a separate container, so that's how it's typically deployed.
Johann Bauer
@bauerj
Hey, any tips on how to debug the consumer? I added some new files but they don't show up in the web frontend
There is nothing in the docker logs for the consumer container
Johann Bauer
@bauerj
I see, apparently files in subdirectories of the consumer directory are not being processed: https://github.com/the-paperless-project/paperless/blob/master/src/documents/consumer.py#L90
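To illustrate the difference, here is a sketch of a flat listing next to a recursive one. It is not the code at the linked line; both function names are made up for this example.

```python
import os


def top_level_files(directory):
    """Files directly inside directory; subdirectories are skipped."""
    return sorted(e.path for e in os.scandir(directory) if e.is_file())


def all_files(directory):
    """Files anywhere below directory, found by walking the tree."""
    found = []
    for root, _dirs, names in os.walk(directory):
        found.extend(os.path.join(root, name) for name in names)
    return sorted(found)
```

A file saved to a nested folder would show up in `all_files` but not in `top_level_files`.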
Johann Bauer
@bauerj
I sent in a PR to address this. I hope there is no smart reason against this
And thanks for your work, paperless has been working great so far!
Nick Gerakines
@ngerakines
Hey everyone, I have a dumb question: What is a correspondent? The docs are kind of light on high level concepts and terminology.
kallangerard
@kallangerard
Hey everyone!
Just a question: is it possible to run multiple consumers, say in a cluster like Kubernetes?
My understanding is it could work, just I don't know if multiple workers would try to lay claim to the same document and cause issues
kallangerard
@kallangerard
I'm just not sure of the advantage of splitting up the consumer and server if it can't be.
Daniel Quinn
@danielquinn

Hey @bauerj, sorry for the late reply, but thanks for the PR! This is an issue that regularly crops up and typically has to do with the (un)availability of inotify (network shares are a notoriously common culprit). Using --no-inotify (or a similar-sounding argument, I don't remember it exactly) will have the system revert to its old behaviour of a loop-poll.
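For anyone unfamiliar with the term, a loop-poll just means rescanning the directory on a timer instead of waiting for filesystem events. A minimal sketch follows, assuming nothing about how Paperless implements it; `snapshot`, `new_or_changed`, and `poll` are names invented here.

```python
import os
import time


def snapshot(directory):
    """Map each file name in directory to its modification time."""
    return {e.name: e.stat().st_mtime for e in os.scandir(directory) if e.is_file()}


def new_or_changed(before, after):
    """Names in after that are absent from before or have a different mtime."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)


def poll(directory, handle, interval=10.0, rounds=None):
    """Call handle(name) for each new or changed file, checking every
    interval seconds. rounds limits the number of checks; None means forever."""
    seen = snapshot(directory)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        current = snapshot(directory)
        for name in new_or_changed(seen, current):
            handle(name)
        seen = current
        done += 1
```

Polling works on network shares because it only needs directory listings, at the cost of a delay of up to one interval.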

@ngerakines a "correspondent" is just a word Paperless uses to refer to "the other party" in a document. As the project was initially designed as a means for one person to log documents that had been sent to them by post, the correspondent was usually the person sending the letter to you. Otherwise, if it was a copy of something you sent to someone else, the correspondent would be the person/company/whatever you were sending the document to. If you feel this wasn't sufficiently clear, feel free to amend the docs! Pull requests are always welcome for documentation :-)

@kallangerard I believe so, yes, but there's a few things you have to be concerned about:

  1. The consumer processes should all be looking at different source directories. Currently, there's no locking system for the consumer to say "hey, I've got this, pick something else" to other would-be consumers. If you target the same dir, you may end up with a race condition.
  2. The default configuration is to use sqlite which does not play well with multiple concurrent writes. If you want to go this route, I would strongly recommend that you use PostgreSQL instead.
  3. Assuming a single target PostgreSQL database, you won't need to run multiple instances of the webserver (unless you want to).
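Point 2 can be seen with Python's built-in `sqlite3` module. This is a small sketch, assuming two connections that stand in for the consumer and the webserver; the table and values are invented, and `timeout=0` is set so the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")

# isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
first = sqlite3.connect(path, isolation_level=None)
first.execute("CREATE TABLE tag (name TEXT)")

# Second connection gives up at once if the database is busy.
second = sqlite3.connect(path, timeout=0, isolation_level=None)

first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
first.execute("INSERT INTO tag VALUES ('from-consumer')")

try:
    second.execute("INSERT INTO tag VALUES ('from-webserver')")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)  # the second writer is refused while the lock is held

first.execute("COMMIT")  # lock released
second.execute("INSERT INTO tag VALUES ('from-webserver')")  # now succeeds

print(error)
```

With the default timeout the second writer would wait a few seconds before failing, so short writes usually get through; long-running ones are where the error tends to surface.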

I think that's all you'd have to concern yourself with. If you do end up doing this up on a kubernetes cluster, I'd love to read the details of what you did to make it go. Please feel free to contribute some documentation and/or share a link to a blog/reddit post on the subject should you feel the urge to write down what you did.

Nick Gerakines
@ngerakines
Thanks!