lukas2
@lukas2:matrix.org
[m]

Hi, after using Docspell for a while now I would have 2 feature ideas regarding the machine learning capabilities.

  1. it would be nice to have support for custom fields as well. For example, if you label a lot of your documents with an ASN, it would be cool if that was also automatically entered into a custom field.

(Info about ASN, quoted from https://paperless-ng.readthedocs.io/en/latest/usage_overview.html#processing-of-the-physical-documents)

For each document, decide if you need to keep the document in physical form. This applies to certain important documents, such as contracts and certificates.

If you need to keep the document, write a running number on the document before scanning, starting at one and counting upwards. This is the archive serial number, or ASN in short.

Scan the document.

If the document has an ASN assigned, store it in a single binder, sorted by ASN. Don’t order this binder in any other way.

If the document has no ASN, throw it away. Yay!

lukas2
@lukas2:matrix.org
[m]
  2. You could add another field for the date of birth when creating a person. From time to time, the date suggestions also include a person's date of birth. If the date of birth were stored with the person, it could be filtered out of the date suggestions when adding a document.
    However, since it does not actually add any extra work when sorting documents, this feature is rather less important.
lukas2
@lukas2:matrix.org
[m]
Lucki: Yeah, it doesn't have to be, but I kind of thought it would be a useful value to add.
(You could also say that you search for the company name with the full text search every time to find a document :) )
eikek: Exactly, invoice amounts would be a similar use case, but probably a bit more complicated. I had imagined that the machine-learning function could recognize the pattern of a 5-digit number that is usually at the top of the page (top left), and the regularity that it is always entered in a certain field :)
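To make the ASN idea above a bit more concrete, here is a minimal sketch of recognizing a 5-digit number near the top of the extracted text. It is not how Docspell works; the names `ASN_PATTERN`, `guess_asn` and `head_lines` are made up for illustration, and the "5 digits within the first few lines" rule is only an assumption taken from the description above.

```python
import re
from typing import Optional

# Hypothetical pattern: a standalone 5-digit number, not part of a longer number.
ASN_PATTERN = re.compile(r"(?<!\d)\d{5}(?!\d)")


def guess_asn(text: str, head_lines: int = 5) -> Optional[str]:
    """Return the first standalone 5-digit number in the first `head_lines` lines."""
    # Assumption: the ASN is written near the top of the page, so only
    # the beginning of the extracted text is searched.
    head = "\n".join(text.splitlines()[:head_lines])
    match = ASN_PATTERN.search(head)
    return match.group(0) if match else None


page = "00042\nACME GmbH\nInvoice 2021-0611\nTotal: 123456 EUR"
print(guess_asn(page))
```

A plain pattern like this would also pick up other 5-digit numbers such as postal codes, which is one reason the target field and the pattern would probably need to be configurable.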
eikek
@eikek:matrix.org
[m]

lukas2: Yes, detecting the invoice amount is indeed much more complicated, and you are right that it is probably relatively simple to recognize the ASN even when only looking at text. It would require configuring which field to set, though, so I think I'll maybe add it later as a separate plugin; but I'm not sure at all right now, this is only my current thinking :).

Regarding dates: Yes, these dates should really be sorted in the suggestions! They are sorted for the due dates, but not for the item date… good catch!

Lucki
@raumende:matrix.org
[m]
always thought they were sorted for best match
eikek
@eikek:matrix.org
[m]
yes you're right Lucki it is sorted by "best match". That is some heuristic which considers some properties like the position in the text. Dates that come near the top of the document are weighted more and come to the top of the suggestion list.
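As a rough illustration of a position-based "best match", the sketch below scores each date by how early it first appears in the text. This is only an assumed, simplified heuristic and not Docspell's actual implementation; `rank_dates` is a hypothetical name, and only ISO-formatted dates (YYYY-MM-DD) are considered.

```python
import re
from datetime import date
from typing import Dict, List

# Assumption: only ISO dates (YYYY-MM-DD) are looked for in this sketch.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def rank_dates(text: str) -> List[date]:
    """Return the distinct dates in `text`, best score (earliest position) first."""
    best: Dict[date, float] = {}
    for m in DATE_PATTERN.finditer(text):
        try:
            found = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
        except ValueError:
            continue  # looks like a date but is not valid, e.g. month 13
        # Score in (0, 1]: the closer to the top of the text, the higher.
        score = 1.0 - m.start() / len(text)
        if score > best.get(found, 0.0):
            best[found] = score
    return sorted(best, key=lambda d: best[d], reverse=True)


letter = "Invoice 2021-06-10\nCustomer born 1980-02-03\nDue 2021-07-01"
print(rank_dates(letter))
```

In this sketch a date of birth near the top would still rank highly, which matches the situation described earlier in the chat.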
lukas2
@lukas2:matrix.org
[m]

Another suggestion for the future :D

It is currently too much work for me to assign a meaningful title to each document. If the position on the page could be used, it might be possible to suggest a title (if no meaningful title was assigned), since the title line is often in the same place on a letter and may also be in bold print. Maybe there are already other tools that could provide such a feature in the future. However, I have never searched to see if something like that exists.

Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
When filtering via the sidebar, I would find the following very helpful:
If the documents are filtered using one of the meta fields (tag, organization, etc.), all other fields should only contain pre-filtered entries in order to only receive possible hits.
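A minimal sketch of the requested behaviour, assuming each document is reduced to one value per meta field (in practice a document can carry several tags). The function name `remaining_options` and the sample data are hypothetical.

```python
from typing import Dict, List, Set

Item = Dict[str, str]  # field name -> value, e.g. {"tag": "invoice"}


def remaining_options(items: List[Item], selected: Dict[str, str]) -> Dict[str, Set[str]]:
    """For every field not yet filtered on, list the values that can still yield hits."""
    # Keep only the documents that match all filters chosen so far.
    matching = [it for it in items if all(it.get(f) == v for f, v in selected.items())]
    # Offer values only for the fields that are not already filtered on.
    open_fields = {f for it in items for f in it} - set(selected)
    return {f: {it[f] for it in matching if f in it} for f in open_fields}


docs = [
    {"tag": "invoice", "organization": "ACME"},
    {"tag": "invoice", "organization": "Telekom"},
    {"tag": "contract", "organization": "1&1"},
]
print(remaining_options(docs, {"tag": "invoice"}))
```

With the tag filter set to "invoice", the organization dropdown would then offer only ACME and Telekom.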
eikek
@eikek:matrix.org
[m]
I see; yeah, this sounds useful! Will create an issue so it doesn't get forgotten
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
Is that correct in version 0.23?
eikek
@eikek:matrix.org
[m]
Yeah, I forgot to change this text - I didn't notice until I translated it :-)
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
All right. Something else that caught my eye and that annoys me a bit;).
I have a lot of correspondents with spaces in their names. When I search for an entry in edit mode, the focus jumps out of the search field after I enter the space, and I have to click back in to be able to continue typing. That is a bit exhausting in the long run.
eikek
@eikek:matrix.org
[m]
Oh yes! Really strange that I haven't noticed this. Will create a ticket!
Thanks for reporting!
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
I have another question regarding drop down box search.
I don't quite understand the results. Shouldn't "A_Org" come earlier in this case, and shouldn't it be displayed as the only result after entering the entire string?
eikek
@eikek:matrix.org
[m]
I'm using some fuzzy-search package here. I think it might treat the underscore like the space, as a separator for incremental search. If there weren't the other bug you found, you would search by typing "a [space] org" and get there quite fast.
So, I'm not exactly sure how it really works in detail. I think it might not even use a separator but treat every character as incremental
and it's not case-sensitive; what does it present when you search "aorg"?
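One possible reading of "treat every character as incremental" is a case-insensitive subsequence match, sketched below. This is an assumption for illustration only; the behaviour of the actual fuzzy-search package is not confirmed here, and `is_subsequence_match` is a hypothetical name.

```python
def is_subsequence_match(query: str, candidate: str) -> bool:
    """True if the query's characters occur in the candidate in order, ignoring case.

    Whitespace in the query is skipped, so "a org" behaves like "aorg".
    """
    remaining = iter(candidate.lower())
    # `ch in remaining` advances the iterator, so each query character
    # must be found after the previously matched one.
    return all(ch in remaining for ch in query.lower() if not ch.isspace())


for query in ["aorg", "a org", "a_org", "orga"]:
    print(query, is_subsequence_match(query, "A_Org"))
```

Under this reading, "aorg", "a org" and "a_org" would all match "A_Org", along with any other entry that contains those letters in that order, which could explain why the list does not shrink to a single result.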
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
Ah, OK. Good to know.
Looks like the same results every time.
eikek
@eikek:matrix.org
[m]
ok, unfortunately the docs for that package are not very detailed. would need to look at the code. could be that the underscore is ignored
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
It's not really a problem either. If it's a fuzzy search, it's fine for me;)

As I have just noticed, the problem does not seem to exist on the smartphone.

eikek
@eikek:matrix.org
[m]
oh uh 😕 I have no idea why that is. To me it should behave exactly the same….
Lucki
@raumende:matrix.org
[m]
can confirm the spacebar bug in firefox
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
OK, in chromium based browser it works but not in firefox...
eikek
@eikek:matrix.org
[m]
oh ok, thanks! but this makes it quite hard to fix, I guess. too bad, because I'm using firefox, too. Maybe it's possible to suppress this effect somehow…
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
All right. We'll see :)
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
Hi, I'm not sure if this is a problem with docspell, so I'll write here first ... I was surprised that the correspondent is not recognized in a perfect PDF (not scanned). As you can see in the picture, the labels do not recognize "1&1 ..." but only "1 ...". My saved correspondent "1&1" is never recognized?!
eikek
@eikek:matrix.org
[m]
Hm, this is difficult. The NLP doesn't seem to recognize 1&1 as a company name, so it can't look it up. This can happen, because these models are generally trained for the German language (there is more info somewhere on the stanford nlp sites). That's why docspell should also use some patterns from the address book. And last, a classifier is also applied that should be able to find 1&1 as a company after it has been trained on some of those documents. But only if you don't also have a company "Telekom"; in that case I don't know which would win. I'll try to reproduce it here to see if I can find something. Currently I'd say this is a bug, it should be recognized. Do you have any other companies that are in the document, too? And do the suggestions make sense at all?
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
Telekom unfortunately already exists. So I have to look for something else. So is the address included with the NLP? Funny that it wasn't used for keywording, right? All address fields were filled in for "1 & 1". I will first change the name to "1und1", but this is only ever used in the company's email address.
I'll add the website and the email address and run the documents again.
eikek
@eikek:matrix.org
[m]
it's a good idea to add email address and website to an organization, this is also used when looking up a match. The NLP tries to recognize the "kind" for a term: organization or person for example. It seems it doesn't recognize 1&1.
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
All right. I'll try it out and then report if it works.
Lucki
@raumende:matrix.org
[m]
meh, stupid github sends a release mail for every nightly tag…
eikek
@eikek:matrix.org
[m]
Yeah, that's due to my recent change…. I don't like it either. Do you have a watch on the repo for releases only? What I wanted is to create a "nightly" release whenever something gets merged into master. This works now and also docker images are built and pushed. But github will now send notifications 😒 any ideas how we can get rid of these notifications?
I was hoping that this doesn't happen for pre-releases…
Lucki
@raumende:matrix.org
[m]
yes, it's on releases only
Lucki
@raumende:matrix.org
[m]
guess I'll have to set up an email filter
eikek
@eikek:matrix.org
[m]
Hm looking at this it's not possible to filter notifications for pre-releases :(
Lucki
@raumende:matrix.org
[m]
filter successful
Markus Adler
@g-0651829:matrix-test.gwdg.de
[m]
eikek I don't have a lot of practical experience with Github (until now, I've only been reading along). I have adjusted other files for the German translation in my fork. Is it ok to create a pull request for this now, before the files I changed earlier have been merged by you? I hope you understand what I mean?! ☺
eikek
@eikek:matrix.org
[m]
Markus Adler: I'm not sure 🙂 But don't worry. There is already a pull request - you don't want this to be merged? The pull request is still open, you can change files however you like and push it to your fork. The pull request gets updated automatically. If you don't want to have it merged, you can close the pull request and open another. Or you can just open a new pull request and then we have two :-) also fine.