Jan Poctavek
@YanChii
(there's only one headnode logfile)
balwo
@balwo
@YanChii Your instructions worked like a charm ! Nodes are now running 4.2.1. Thanks for your quick responses and outstanding help !
Jan Poctavek
@YanChii
glad to help
enjoy the Danube ;)
infinity202
@infinity202
Hi, I tried several Ubuntu installations (the ones provided by Joyent) and I still see that all KVMs get the kvm-clock setting and that performance is totally poor.
Jan Poctavek
@YanChii
Hi @infinity202, do you observe this behavior also on vanilla SmartOS? We don't modify the SmartOS platform so heavily to have such issues. If we can replicate it on vanilla, we can create an issue for Joyent.
infinity202
@infinity202
it is time for me to start up my old server and reinstall SmartOS on it. I will see what happens there.
Although I must say I have an ubuntu 16.04 20171122 b2da7f6e-7ef5-454c-9d76-d15e2ef8abf1 running on an OVH server with SmartOS 5.11 joyent_20181011T004530Z and that one is running fine and also has KVM-Clock, as I just saw:
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:03:55 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:18:43 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:03:59 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:04:00 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:04:01 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:18:47 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:06:41 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:21:52 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:21:52 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:21:52 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:21:53 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:21:53 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:21:55 UTC 2019
ubuntu@--on danube Cloud--:/opt$ date
Thu Dec 19 16:21:56 UTC 2019
infinity202
@infinity202

OK, at least I found a solution to keep the clocksource at hpet after reboots:
sudo apt-get install sysfsutils

Set clocksource to hpet

sudo tee -a /etc/sysfs.d/clocksource.conf <<-EOF
devices/system/clocksource/clocksource0/current_clocksource = tsc
EOF

---------------------------------

sudo systemctl enable sysfsutils.service
sudo systemctl start sysfsutils.service
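
(A minimal check after the reboot, assuming a Linux guest with sysfs mounted as usual:)

cat /sys/devices/system/clocksource/clocksource0/available_clocksource   # clocksources the kernel offers
cat /sys/devices/system/clocksource/clocksource0/current_clocksource     # should print "hpet" once the setting sticks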

Jan Poctavek
@YanChii
so the vanilla smartos does the same... that was my suspicion
you can use hwclock -c if available
to see the diff between os clock (vm) and hw clock (hypervisor)
BTW you don't have to reinstall Danube to test vanilla smartos... just reboot with smartos USB stick and then reboot back with Danube stick
all VMs will be kept
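(A quick sketch of that check - assuming a Linux guest whose util-linux build of hwclock supports --compare:)

sudo hwclock -c
# prints the hardware clock next to the system clock every few seconds;
# a growing gap means the guest (OS) clock is drifting away from the hypervisor clock
# stop it with Ctrl+C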
Jan Poctavek
@YanChii
BTW2 I've added your howto to our known issues https://github.com/erigones/esdc-ce/wiki/Clock-unstable-in-KVM-VMs
infinity202
@infinity202
Crap, I see I posted the wrong copy of the code.
Now it states "tsc" instead of "hpet".
I found it on a forum where someone needed tsc, and I thought let me try this with hpet. It worked, but I copied the lines from the forum to gitter without proper checking.
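
(For reference, the corrected config line - assuming the same sysfsutils approach as above - would presumably read:)

devices/system/clocksource/clocksource0/current_clocksource = hpet

(The clocksource can also be switched at runtime, without waiting for a reboot:)

echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource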
Jan Poctavek
@YanChii
no prob, I've corrected it
tsc might also work… but it also might not (kvm-clock is a normalized tsc) and with tsc you will for sure have time issues during live migration
anyway, I think we should replicate this issue and create a SmartOS bug report… too many people suffer from this bug
klebed
@klebed

@YanChii Hi! Could you give some details on the proper way of updating the whole DC infrastructure?
I have all HW nodes booted from USB (2 USB flash drives in each for redundancy). I have HA deployed, therefore I have mgmt01-03, and on the first node (node01) the USB flash drive contains the first-node image (not CN).
Now I see that the recommended way is to update mgmt01 (bin/esdc-git-update and bin/esdc-appliance-update) and then update the first node (but how to do that in HA? And more than that, I have some changes in the templates on mgmt01, which I guess I'll have to redo by hand after the upgrade of mgmt).

Then I guess I'll have to execute /opt/erigones/bin/esdc-platform-upgrade v4.2 on all nodes, and it seems to upgrade the image on a USB key, but on which one? They are all unmounted, and there are 2 of them. How do I update the other one (or all of them at once)?

Jan Poctavek
@YanChii
Hi. The normal way of upgrading is to run esdc-git-update, which will automatically also call esdc-appliance-update. The latter applies the update to all numbered management instances (e.g. mgmt??, dns??, etc.), so it should upgrade the HA instances as well.
If you encounter problems during the HA upgrade, let us know.
Not sure what you mean by templates of mgmt01.
esdc-platform-upgrade rewrites the first USB key. Then, after a successful node reboot, you can find the second USB key with the rmformat command and use dd to copy the first USB over the second (use rdsk device names).
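(A rough sketch of that last step - the device paths below are made-up examples, so always check your own rmformat output before running dd:)

rmformat                                             # lists removable media; identify both USB keys
dd if=/dev/rdsk/c1t0d0p0 of=/dev/rdsk/c2t0d0p0 bs=1024k
# if= would be the freshly upgraded first key and of= the second key,
# both addressed via their raw rdsk device paths as recommended above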
Jan Poctavek
@YanChii
@/all FYI We are going to FOSDEM this weekend. If you want to meet us, just let us know.
FilipFrancis
@FilipFrancis
Cool I can come on Sunday to see you guys
DigitalOzUT
@DigitalOzUT
Hello
klebed
@klebed

Greetings, everyone!

some feedback.
1) If you have more than one backup to delete, the action fails. To reproduce: set the retention rate to 10, wait for all backups to fill up, and then change the retention to 5. You will see that backup deletion always fails after that. The action should create a queue and delete the backups one by one, though.

2) When running appliance-update, I had to run it on every mgmt0X by hand. And I can definitely say that 3 mgmt machines are not enough for fault tolerance. I suggest session sharing between the mgmt machines, transaction sharing and using haproxy, rather than VIP, corosync and other questionable stuff.

3) mon0 has a lot of issues, for example no partitioning, and consequently the housekeeper is busy more than 75% every day; there are also some false positives with "agent unavailable for more than 5 min" on nodes (without any real issues). And when mon0 is loaded with housekeepers, you may hit timeouts.
Also, when looking at a large list like server CPU consumption, it's possible but usually not clear which color belongs to which machine.

Maybe it's worth looking at monit/grafana rather than Zabbix?

Jan Poctavek
@YanChii
Hi @klebed
thank you for your reports and feedback.
  1. I have created an issue for this: erigones/esdc-ce#478
  2. The HA can be improved indeed. The main problem is the HA of the postgresql master. It would require massive schema changes to support a multi-master DB setup (using pub/sub replication). Therefore I'm not sure whether haproxy would help, because you would still need to wait for a DB failover. But no doubt session sharing can be useful during failover/switchover anyway. Regarding running appliance-update on each node - what happened? It should be enough to run it just once.
  3. Database partitioning is not supported by Zabbix LLC (even though everybody is using it). You might have a look at this doc: https://github.com/erigones/esdc-ce/wiki/Postgresql-partitioning-on-zabbix-monitoring-server
    We have a preparation for partitioning, so you can just enable it. The same goes for housekeeper tuning (an illustrative snippet follows below).
    On top of that, DC supports horizontal Zabbix scaling - deploying a separate mon server per virtual datacenter. This way you will avoid a massive monolithic Zabbix VM that is hard to manage and has its own issues. See here: https://docs.danubecloud.org/user-guide/monitoring/dc-monitoring-server.html
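(For illustration only - the exact settings Danube prepares may differ; these are the standard zabbix_server.conf housekeeper parameters that usually get tuned:)

# /etc/zabbix/zabbix_server.conf - illustrative values
HousekeepingFrequency=1      # how often (in hours) the housekeeper runs
MaxHousekeeperDelete=5000    # cap on rows deleted per table per run, keeps each run short
# With PostgreSQL partitioning enabled, history/trends housekeeping is usually
# disabled in the Zabbix frontend and old partitions are dropped instead.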
Jan Poctavek
@YanChii
guys, I've been able to successfully replicate the kvm-clock issue
watch this issue for more info erigones/esdc-ce#480
klebed
@klebed
Yay! :)
klebed
@klebed

Hi everyone! How's your pandemic lockdown going? Hope everyone is OK!

Couple of questions:

1) zpool status -v gives the following:

  pool: zones
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: none requested
config:

so is it worth upgrading zpool features?

2) I had one of the backup pools go down due to sudden errors... it happened unnoticed, so I'm thinking: it's crucial to monitor the health of the storage on the nodes... I believe the admin panel could also be enriched with some data about disk and pool health, and it's probably worth including SMART monitoring in the node images by default.

Jan Poctavek
@YanChii
Hi @klebed
thank you for your feedback. We are quite fine. Some folks here are a bit overwhelmed by children at home but otherwise we're doing good :).
1) If you don't plan to downgrade your platform to an older version (probably not), you can run zpool upgrade <pool> to enable the new ZFS features (see the sketch below).
2) Good point. Would you mind creating an issue in esdc-ce? I'll reference it in the fix.
Apart from that, the upcoming version will also contain the kvm-clock fix. It is already done and tested.
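(A minimal sketch for point 1, assuming the pool is called "zones" as in the status output above:)

zpool upgrade zones       # enables all supported feature flags on this pool
zpool status zones        # the "Some supported features are not enabled" notice should be gone
# note: after this the pool can no longer be imported by older platforms lacking those features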
klebed
@klebed

2) Good point. Would you mind creating an issue in esdc-ce? I'll reference it in the fix.
Apart from that, the upcoming version will also contain the kvm-clock fix. It is already done and tested.

That's very cool! I was struggling with that a lot over the last few months... every machine has to be manually switched to hpet, which wastes time...

BTW, any news on bhyve?

Would you mind creating an issue in esdc-ce?

I'll create an issue, sure...

Jan Poctavek
@YanChii
Not much news on bhyve so far. We need to get a lot of app upgrades out of the door first (e.g. SmartOS platform upgrade, in-place upgrades of pkgin to 2019Q4, PowerDNS redesign, Zabbix upgrade, Python upgrade, etc.). Also there's one new (and pretty big) feature that is wanted before bhyve - kubernetes integration (erigones/esdc-ce#481). But it's not ready yet. We'll keep you posted.
Jan Poctavek
@YanChii
Almost forgot, we also want to add opnsense integration into the next release - it will (optionally) deploy a pre-configured opnsense as a router from the admin zone. This makes installations in Hetzner or OVH work out of the box.
klebed
@klebed
Also, as a suggestion: get rid of crappy Zabbix and switch to monit, node-exporter, prometheus, grafana, etc. Because even with 100 VMs Zabbix is already suffering and sometimes not able to show graphs. There is also the problem of devalued notifications, when important notifications are indistinguishable from unimportant ones in one uninformative flow... the monitoring isn't useful out of the box at all right now. More than that, it's better to make the infrastructure able to take countermeasures against some errors automatically, which monit is capable of.
And DNS and network control... today it's crucial to have native support for IPv6, and also CAA record support in the DNS zone editor and some other modern stuff.
I could continue, but I'm gracious... grateful =)))))
Jan Poctavek
@YanChii
@klebed thank you. If you have suggestions what to change to make the zabbix notifications more usable after install, pls share. We can adjust the defaults.
klebed
@klebed

@klebed thank you. If you have suggestions what to change to make the zabbix notifications more usable after install, pls share. We can adjust the defaults.

Honestly I don't think it's worth polishing Zabbix, since it has so many drawbacks and architecture issues. It is probably good for almost-zero-effort monitoring of standard old-fashioned infrastructures, but it gets worse with all the other stuff. And the problem of devalued notifications is one of the very important and recognizable stories about Zabbix.
If you are really looking into Docker, k8s, container zones, etc., then consider Prometheus, since it natively supports all the modern stuff. Docker exports Prometheus metrics, for instance. And its notification and alert system is way more efficient, which allows you to control severity and differentiate alerts properly. And unlike Zabbix it's drastically more efficient at dealing with time-series data, and you could also benefit from the flexible and feature-rich Grafana to make the monitoring more informative.
I understand that it might look like too much effort at the moment, but in the end it would be worth it. You have certain control over the infrastructure, and your management service is more or less smartly built, so automation on top of the monitoring could do more than just Zabbix with some defaults.

klebed
@klebed

That's what Zabbix notifications look like :))))

169681 node02 PROBLEM: Fault Management reports an issue

  • 2020.05.06 05:32:57 - High - Value: 4|UNKNOWN|UNKNOWN|UNKNOWN (4)