Andris Reinman
@andris9
I'm not sure, but there are at least 6 replica shards, which means 6*3=18 physical servers. In addition there are MongoDB mongos servers and a configuration shard (also 3 servers)
if you have a 3-member replica set then at least 2 of these must be able to communicate with each other, otherwise there would be no primary instance anymore (even if there's nothing wrong with the current primary, it steps down automatically once it does not have enough votes)
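For illustration, a 3-member replica set split over two datacenters would be initiated roughly like this in mongosh (hostnames are made up):

```js
// mongosh sketch: 3 voting members across two DCs (example hostnames)
rs.initiate({
  _id: "shard01",
  members: [
    { _id: 0, host: "dc1-mongo-a.example.com:27017" },
    { _id: 1, host: "dc1-mongo-b.example.com:27017" },
    { _id: 2, host: "dc2-mongo-a.example.com:27017" }
  ]
});
// a primary needs a majority: 2 of 3 votes. If DC1 (two members) drops off,
// the lone DC2 member cannot win an election and the set loses its primary
```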
Louis
@louis:laureys.me
[m]
Yeah, that sounds hard to manage. Are you doing 3 data-bearing nodes or 2 + 1 arbiter?
The arbiter setup seems more cost-effective, but with additional risk
Andris Reinman
@andris9
3 data nodes: 2 in one DC and 1 in a 2nd DC. But if the DC with 2 nodes goes offline, the remaining one can't process anything anymore, as it does not have enough votes. And the system makes so many write operations that using only a replica member is not possible. At first I tried to design the system in a way where emails would still be readable even if there is no primary member anymore, but each read causes several writes (e.g. marking unseen emails as seen), so it requires an actual primary to be available
there are also separate disks for different kinds of data. Messages and user information are stored in a db that lives on a fast SSD. Attachments (but not attachment indexes) are stored on a very large but slow HDD. So if that HDD becomes inaccessible, most of the system still works: you can log in and read emails etc., but you cannot download any attachments
so when using IMAP you can only download messages without attachments; requests against messages that have at least one attachment fail. In webmail attachments are loaded later, so each message can be read (but attachment requests will fail, so no images etc.)
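A minimal sketch of that read-triggers-writes pattern in Node.js (collection and field names are illustrative, not WildDuck's actual schema):

```js
// Sketch: "reading" a message also writes, so a writable primary is required.
// Collection and field names are illustrative, not WildDuck's actual schema.
const { MongoClient } = require('mongodb');

async function openMessage(messageId) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('wildduck');
  try {
    // fetching the message body is a plain read...
    const message = await db.collection('messages').findOne({ _id: messageId });

    // ...but clearing the unseen flag is a write, and writes need a primary
    await db.collection('messages').updateOne(
      { _id: messageId, unseen: true },
      { $set: { unseen: false } }
    );
    return message;
  } finally {
    await client.close();
  }
}
```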
Louis
@louis:laureys.me
[m]
I like how you separated those databases :)
Makes a lot of sense
Andris Reinman
@andris9
yeah, it is much cheaper that way. SSD is quite pricey, and you don't even need to access attachments most of the time
Louis
@louis:laureys.me
[m]
And attachments use the most data as well
People love sending large files over email hahaha
My relatives always complain about the 25MB limit that's basically everywhere, but I know that I don't want a higher limit
Daviesmolly
@Daviesmolly
How do I integrate a simple text-based captcha into wildduck-webmail?
Andris Reinman
@andris9
@Daviesmolly wildduck webmail supports reCaptcha but it is disabled by default, https://github.com/nodemailer/wildduck-webmail/blob/3371984a32a7942d7859c3fcde923cf62484e7fa/config/default.toml#L48-L51
Tiny product news: the WildDuck Auditing System now generates verification hashes for email downloads (each download is logged, and you can later download a verification hash for the file to verify whether the downloaded file has been changed or not)
Screenshot 2021-07-03 at 11.49.01.png
Screenshot 2021-07-03 at 11.49.22.png
Louis
@louis:laureys.me
[m]
Cool! What's the exact use case for this?
Andris Reinman
@andris9
Once an email is downloaded as evidence, it must be possible to later validate that the email has not been tampered with and is the same as what was on the server
not every download is actually signed. Instead the download hash is logged, and once you request the verification file, it is put together and signed with the server key
Louis
@louis:laureys.me
[m]
Ah, that's pretty cool
Andris Reinman
@andris9
Btw this does not hash the actual emails but the container, e.g. the downloaded zip file. Every time you download emails, be it a single email file or a zipped selection, that action is logged and you can later go and download a signed verifying hash for that download. I would prefer to somehow include the hash with the initial download, but the zip files are streamed (they can be very large) and there is no way to know the hash before the file has actually been downloaded
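The streamed-hash part could look roughly like this in Node.js (a sketch, not the actual audit code):

```js
// Sketch: hash a zip download while it streams to the client, log the digest after.
// Illustrative only, not WildDuck's actual audit code.
const crypto = require('crypto');

function streamWithAuditHash(zipStream, res, logHash) {
  const hash = crypto.createHash('sha256');
  zipStream.on('data', chunk => hash.update(chunk));
  zipStream.on('end', () => {
    // the digest only exists once the whole zip has been streamed out,
    // which is why it cannot be embedded in the download itself
    logHash(hash.digest('hex'));
  });
  zipStream.pipe(res);
}
```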
The audit system does not show email contents, only metadata (including subject and to/from addresses). To actually see the email you have to download it, and that action is logged.
venusian
@venusian:matrix.org
[m]
Hi all, I'm doing research for a medium to large installation and was wondering what scale of installations WildDuck is used for at the moment. It sounds like it should scale really well, but I have not found any references so far
Andris Reinman
@andris9
@venusian:matrix.org WildDuck is mainly developed for a single specific email system. That system currently stores about 70TB of emails (that's the virtual size; the actual db size with deduplication is 47TB) and has 100k+ registered accounts. I'm not 100% sure but I guess there are about 10k-20k logged-in IMAP users in peak hours. There are 7 MongoDB shards. New shards are added whenever free space runs out.
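Adding a shard itself is a one-liner in mongosh (the replica set name and hosts below are examples):

```js
// mongosh sketch: attach a new replica set as an additional shard (example hosts)
sh.addShard("shard8rs/mongo8-a.example.com:27017,mongo8-b.example.com:27017");
sh.status(); // confirm the shard is listed and the balancer starts moving chunks
```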
Louis
@louis:laureys.me
[m]
Have you ever run into RAM or CPU limitations before space ran out on a shard? Or is that generally not a problem?
Andris Reinman
@andris9
CPU is usually not an issue. The real problem is memory size, as MongoDB needs to keep indexes in memory
if there is not enough memory then Mongo loads only "hot" indexes into memory and keeps everything else on disk, which makes irregular operations (e.g. search) quite slow
this is also the main thing that limits shard size: too much data on a single shard means there is no way the indexes fit into memory
another limit is backups: regularly backing up a lot of TBs is a real pain
so if the shard is smaller then it is also easier to back up, as there is less data
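A quick way to sanity-check this in mongosh is to compare index sizes against the machine's RAM (the collection name here is an example):

```js
// mongosh sketch: how much index data does this shard hold vs. available memory?
const stats = db.stats();
print(`total index size: ${(stats.indexSize / 1024 ** 3).toFixed(1)} GiB`);
// per-index breakdown for one collection ("messages" is an example name)
printjson(db.messages.stats().indexSizes);
```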
Venusian
@venusian:matrix.org
[m]
We're looking at a small multiple of those stats and are interested in alternatives to the known Dovecot setups; however, at this scale everything new is scary :)
Backing up would need to be 'smart' and not simply back up the MongoDB files, but the contents based on actual changes, I think, as copying everything every time would murder any viable setup I can think of
Louis
@louis:laureys.me
[m]
Afaik the only viable backup method without MongoDB Enterprise is filesystem snapshots:
https://docs.mongodb.com/manual/tutorial/backup-sharded-cluster-with-filesystem-snapshots/
Andris Reinman
@andris9
we use PerconaDB, where you can create db snapshots. So on each shard there is one replica set member with an extra 10TB disk; once a day we run the command to create the snapshot on that disk.
but this approach is not 100% perfect. Percona has newer backup tools available that are better, but we have not started using them yet
by PerconaDB I mean the MongoDB version released by Percona. It is an otherwise exact copy but has additional options
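The usual pattern around such a snapshot is to flush and lock the snapshot member first; in mongosh roughly:

```js
// mongosh sketch: quiesce one replica set member around a filesystem snapshot
db.fsyncLock();   // flush pending writes and block new ones on this member
// ...take the LVM/ZFS snapshot of the data disk(s) here, outside mongosh...
db.fsyncUnlock(); // let the member resume and catch up via replication
```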
Louis
@louis:laureys.me
[m]
Oh interesting. I have looked at their backup tool (https://www.percona.com/software/mongodb/percona-backup-for-mongodb), but my test backup failed to restore without much additional info. So I gave up on that.
But that's something different I think
Andris Reinman
@andris9
Anyway, so far most issues with scaling have been with MongoDB. Regarding WildDuck, the issues have usually been bugs, not scaling.
and the issues with MongoDB have not been about MongoDB being bad, but more a lack of experience in scaling a large cluster
each large cluster is different, so you can't really follow a tutorial or anything
Venusian
@venusian:matrix.org
[m]
Can I ask what the 16 nodes look like in terms of specifications, like hardware and CPU? So scaling, and backing up for that matter, large MongoDB databases is something that has been done. Interesting…
Andris Reinman
@andris9

I can't say the specs as I don't have access to these machines. I remember that the first shard was 3 machines, each with 64GB RAM. I guess there were 32 cores, not 100% sure. There was no RAID (instead the system relied on MongoDB replication), but there were 2 disks:

  1. 1.9TB SSD that was mounted to /var/lib/mongodb
  2. 10TB HDD mounted to /var/lib/mongodb/attachments/collection

This disk setup ensured that all the indexes, message metadata etc. were stored on the fast SSD and all the attachment content was stored on that large, slow HDD

Daviesmolly
@Daviesmolly
I just really want a simple text captcha instead of Google's reCaptcha @andris9
Andris Reinman
@andris9
There is no ready-made solution; you could probably edit the code and replace the reCaptcha integration with your own
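A minimal sketch of what such a homemade text captcha could look like in Express (route names and session wiring are made up, not wildduck-webmail's actual code):

```js
// Sketch: a homemade text captcha with Express sessions.
// Routes and wiring are illustrative, not wildduck-webmail's actual code.
const express = require('express');
const session = require('express-session');
const crypto = require('crypto');

const app = express();
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));
app.use(express.urlencoded({ extended: false }));

app.get('/signup', (req, res) => {
  // generate a simple arithmetic challenge and remember the answer server-side
  const a = crypto.randomInt(1, 10);
  const b = crypto.randomInt(1, 10);
  req.session.captcha = String(a + b);
  res.send(`<form method="post">What is ${a} + ${b}? <input name="captcha"><button>Go</button></form>`);
});

app.post('/signup', (req, res) => {
  // reject the submission unless the answer matches the stored one
  if (req.body.captcha !== req.session.captcha) {
    return res.status(403).send('Captcha failed');
  }
  res.send('Captcha passed');
});

app.listen(3000);
```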
Daviesmolly
@Daviesmolly
Thanks for your response
Although I've got some other challenges @andris9