These are chat archives for ractivejs/ractive

1st
Aug 2018
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 12:08
does anyone here run a linux server as its system admin?
Chris Reeves
@evs-chris
Aug 01 2018 12:12
yes
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 12:13
okay. here is some advice from a crying sysadmin: make sure your backup system works properly
Chris Reeves
@evs-chris
Aug 01 2018 12:14
indeed
learned the hard way that if you don't restore every now and then, you may not actually have backups
dr (disaster recovery) checks are important 😥
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 12:21
I have no words for this
I now understand that a 99% complete strategy is no strategy at all when it comes to backups
Chris Reeves
@evs-chris
Aug 01 2018 12:25
postgres backups are beautiful... pretty much bulletproof. If the dump completes with return code 0, you can be confident you have a good backup. Restores work everywhere, even cross-platform. Good postgres dumps have saved me several times.
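A minimal Python sketch of the exit-code check described above, assuming PostgreSQL's `pg_dump` and `pg_restore` are on the PATH; the helper names `run_checked` and `dump_and_check` are hypothetical. It treats any non-zero exit status as a failed backup, then asks `pg_restore --list` to read back the archive's table of contents. That read-back only shows the archive file is readable; as said earlier in the thread, only restoring every now and then proves you actually have backups.

```python
import subprocess
import sys
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run cmd and raise RuntimeError if it exits with a non-zero status."""
    result = subprocess.run(list(cmd))
    if result.returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {result.returncode}")


def dump_and_check(dbname: str, outfile: str) -> None:
    """Hypothetical helper: dump dbname in custom format, then read the archive back."""
    # -Fc selects pg_dump's custom archive format; -f names the output file.
    run_checked(["pg_dump", "-Fc", "-f", outfile, dbname])
    # --list prints the archive's table of contents without restoring anything,
    # so it confirms the file is readable, not that a full restore would succeed.
    run_checked(["pg_restore", "--list", outfile])
    print(f"dump of {dbname} written to {outfile} and readable", file=sys.stderr)
```

Usage would be something like `dump_and_check("mydb", "/backups/mydb.dump")` from a cron or Jenkins job, where an uncaught `RuntimeError` fails the job.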
Arnaud Dagnelies
@dagnelies
Aug 01 2018 14:20
I once saw a complete site go offline because the hosting provider had no "proper" backups in place. They advertised that they had backups; that was the theory, at least. In practice, when the server broke one day, they also noticed a "technical issue" with the backups. The site's database backup was corrupt. The site lost all its data, then switched hosting providers and restarted from zero.
I also think this phenomenon comes from "cheapness"... like: do you want it with or without backups? That costs double. Do you want really, really safe backups? That costs triple. Hmmm... nah, it's OK, I'll just take the simple but cheap offer.
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 14:26
I think it's like health or life or freedom. people simply don't want to believe it's perfectly possible to lose it at any time, until they've lost it.
it's like a belly: you don't think you'll get one, and that's exactly why you'll get one.
Joseph
@fskreuz
Aug 01 2018 14:49
Well, it's not just checking if the backup systems work properly. You'll also have to check if the backups themselves work. :grin:
kouts
@kouts
Aug 01 2018 14:50
@fskreuz that's the hardest part
especially in complex systems
Joseph
@fskreuz
Aug 01 2018 14:51
You can have some Jenkins job dumping the DB routinely. But that doesn't guarantee the dump wasn't corrupted, or that the data was sane to begin with. :grin:
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 14:52
by the way, the disaster I'm currently dealing with is a full server blowup (whole-disk failure)
Joseph
@fskreuz
Aug 01 2018 14:53
oof :scream:
Chris Reeves
@evs-chris
Aug 01 2018 14:55
I have a client that's trying to start quarterly full dr drills for exactly this reason
we have failovers, but there are still a few single points of failure that would require rebuilds should the NOC disappear
kouts
@kouts
Aug 01 2018 14:59
@ceremcem I wish you all the luck!
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 14:59
thanks
kouts
@kouts
Aug 01 2018 15:00
I have been there, so I know how it feels...
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 15:02
I've spent an extensive amount of work/time getting the backup tasks right, but the last missing part sat on my todo list for a long time, and that's what bit me now
ddrescue has about 5 hours left to finish the image dump, and somehow I have a feeling that I could restore everything without any significant data loss, but I know what I must do now
I should fill the gap so that I can take a whole-server backup and build a new machine from scratch over 3G
Chris Reeves
@evs-chris
Aug 01 2018 15:13
hope it works out!
Joseph
@fskreuz
Aug 01 2018 15:14
On the bright side, GitLab was able to recover from a similar failure. They even did a webcast while they did it. :grin:
kouts
@kouts
Aug 01 2018 15:47
https://github.com/ seems down from here
Cerem Cem ASLAN
@ceremcem
Aug 01 2018 15:47
thanks. so far 750GB of the disk image has been read (48%), and still no bad sectors, no bad areas, no read errors.
@kouts it seems ok here
@fskreuz I can share the statistics while recovering, if that would help me too :))