klebed
@klebed

might be because of messed up federations?

might be... how to check/resolve?

I'm not yet familiar with rmq
or maybe I can google it...
Jan Poctavek
@YanChii
these links are a good start:
and
klebed
@klebed
already looking there
Jan Poctavek
@YanChii
this is what should've been done during ha-deploy
it is a federation mesh
klebed
@klebed
aha
ok
Jan Poctavek
@YanChii
line 31, fortunately for us, it is a shell command
it's even possible to enable a rmq web interface to see what's there: https://www.rabbitmq.com/management.html
just enable it and local-forward a http port using ssh
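A minimal sketch of those two steps (the hostname mgmt01 and the default management port 15672 are assumptions, adjust to your setup):

```shell
# On the mgmt VM: enable the RabbitMQ management web UI
rabbitmq-plugins enable rabbitmq_management

# From your workstation: forward the management port over ssh,
# then browse http://localhost:15672
ssh -L 15672:localhost:15672 root@mgmt01
```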
Jan Poctavek
@YanChii
but there might be too many queues/exchanges and stuff to see something useful (however, it might also be a good source of comparison between the servers)
just for orientation: when client (erigonesd or mgmt) connects, it creates a queue and an exchange
you publish into an exchange (different types, like fanout (=broadcast), unicast, etc.) and exchange will direct the message into appropriate queue
queues are one-way, so you have to have at least two queues to have a conversation
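That exchange/queue model can be played with by hand using rabbitmqadmin (shipped with the management plugin); all the names below are made up for illustration:

```shell
# Declare a fanout exchange and a queue in the esdc vhost, bind them,
# publish a message into the exchange, and fetch it from the queue.
rabbitmqadmin -V esdc declare exchange name=demo.ex type=fanout
rabbitmqadmin -V esdc declare queue name=demo.q
rabbitmqadmin -V esdc declare binding source=demo.ex destination=demo.q
rabbitmqadmin -V esdc publish exchange=demo.ex routing_key='' payload='ping'
rabbitmqadmin -V esdc get queue=demo.q
```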
klebed
@klebed
ok... it seems something is still not working well. My commands to machines are not executing, and even when I've stopped a machine via vmadm, it still shows as running in mgmt
Jan Poctavek
@YanChii
4 daemons per node (fast, slow, image, backup), each has its own queue
so the messages are not forwarded correctly
if I were you, I'd deploy a new separate standalone VM named mgmt01.local (or any number) with exactly the same metadata values as the original mgmt01, transfer the postgres DB, shut down the other mgmt*.local VMs, reconfigure the IP (just the IP) in local_config.py on each node and restart erigonesd
that is a rough way to go back to single-node, where we can have safe assumptions about functionality
(if it goes smoothly, you can delete old mgmt* VMs)
don't forget to add appropriate mgmt<number>.local into ERIGONES_MGMT_WORKERS list
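A rough sketch of the per-compute-node part of that rollback (the config path, the service-restart command and the example IPs are assumptions, not verified against the actual installation):

```shell
# Example values only -- substitute your real addresses:
OLD_IP=192.0.2.10   # IP the node currently points at
NEW_IP=192.0.2.20   # IP of the freshly deployed standalone mgmt01.local

# Point this node's erigonesd at the new mgmt VM (path is an assumption):
sed -i "s/${OLD_IP}/${NEW_IP}/g" /opt/erigones/core/celery/local_config.py

# Restart erigonesd with whatever service manager the node uses,
# e.g. on SmartOS (service name assumed):
svcadm restart erigonesd
```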
klebed
@klebed
I'll probably do that if I can't figure out what is going on directly
[2019-10-14 20:44:57,247: WARNING/MainProcess] fast@node04.local ready.
[2019-10-14 20:45:10,339: IMPORTANT/Worker-23] Task "api.node.sysinfo.tasks.node_sysinfo_cb" with id 7e7u1-22fb270d-8964-4bb6-97d0 was created by 7e7u1-69643934-3773-428e-a567
it seems fast is working on the node
Jan Poctavek
@YanChii
yes, that is normal node communication
Jan Poctavek
@YanChii
and also save the original ssh private key from mgmt01 (or you can just add the newly generated one to the other VMs in the admin vDC (mon01, dns01, img01, cfgdb01))
but restoring the original one is one step easier
klebed
@klebed
all mgmt nodes have NODENAME=rabbit@mgmt01
klebed
@klebed
so this is totally wrong. Either ha-deploy messed up here, or it is a bug in the deploy, because node names should be unique
klebed
@klebed

so... I've got things almost fixed.
First of all I've changed the node names in /etc/rabbitmq/rabbitmq-env.conf on every node according to their real names.

then rmq created new mnesia db files after restart.
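For reference, that per-host fix is a one-line setting in /etc/rabbitmq/rabbitmq-env.conf (hostnames assumed, each mgmt VM gets its own):

```shell
# /etc/rabbitmq/rabbitmq-env.conf on mgmt02 -- unique per host:
NODENAME=rabbit@mgmt02
```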

Then I came up with a federation config, something as follows:

rabbitmqctl add_vhost esdc
rabbitmqctl add_user esdc PASSWORD
rabbitmqctl set_permissions -p 'esdc' esdc ".*" ".*" ".*"
rabbitmqctl set_parameter -p 'esdc' federation-upstream 'interconnect_mgmt01' '{"uri":"amqp://esdc:PASSWORD@mgmt01","ack-mode":"on-confirm","trust-user-id":true}'
rabbitmqctl set_parameter -p 'esdc' federation-upstream 'interconnect_mgmt02' '{"uri":"amqp://esdc:PASSWORD@mgmt02","ack-mode":"on-confirm","trust-user-id":true}'
rabbitmqctl set_parameter -p 'esdc' federation-upstream 'interconnect_mgmt03' '{"uri":"amqp://esdc:PASSWORD@mgmt03","ack-mode":"on-confirm","trust-user-id":true}'
rabbitmqctl set_policy -p 'esdc' --apply-to 'queues' 'federate_dc_queues' '^(fast|slow|backup|image)\..*' '{"federation-upstream-set":"all"}'
rabbitmqctl set_policy -p 'esdc' --apply-to 'exchanges' 'federate_events' '^celeryev$' '{"federation-upstream-set":"all"}'
rabbitmqctl set_policy -p 'esdc' --apply-to 'queues' 'federate_mgmt' '^mgmt$' '{"federation-upstream-set":"all"}'
rabbitmqctl set_policy -p 'esdc' --apply-to 'exchanges' 'federate_pidbox' '^celery.pidbox$' '{"federation-upstream-set":"all"}'
rabbitmqctl set_policy -p 'esdc' --apply-to 'exchanges' 'federate_reply_pidbox' '^reply.celery.pidbox$' '{"federation-upstream-set":"all"}'
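As a quick sanity check, the queue pattern used by the federate_dc_queues policy (presumably `^(fast|slow|backup|image)\..*`) can be exercised with grep -E; the queue names below are made up for illustration:

```shell
# The policy's queue-name pattern:
pattern='^(fast|slow|backup|image)\..*'

# Hypothetical queue names -- real ones depend on the deployment.
# Only the first two should match; mgmt and celeryev have their own policies.
printf '%s\n' 'fast.node04.local' 'backup.node01.local' 'mgmt' 'celeryev' \
  | grep -E "$pattern"
```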

I was able to see all the nodes instantly
but still not able to perform any actions from the mgmt web interface
klebed
@klebed
It returns: Task queue worker (fast@node04.local) is not responding! (400)
klebed
@klebed
rabbitmqctl list_exchanges -p 'esdc' name policy returns pretty much the same list (excluding the node which is currently off)

It returns: Task queue worker (fast@node04.local) is not responding! (400)

I think this one is left to be fixed though
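When chasing this kind of problem, it can help to compare what federation actually set up on each mgmt VM; a few inspection commands (the vhost name esdc comes from the config above):

```shell
# List the policies, upstream parameters, and per-queue state in the vhost:
rabbitmqctl list_policies -p esdc
rabbitmqctl list_parameters -p esdc
rabbitmqctl list_queues -p esdc name messages consumers policy
```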

klebed
@klebed
so... everything still works only on mgmt01
after all the struggle, the only thing that causes issues is the permission to access queues on rmq from gunicorn-sio/api/gui. Something I don't get, because the esdc user has all the permissions from my point of view.
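To double-check that assumption about the esdc user, its effective permissions can be dumped directly:

```shell
# What every user may do in the esdc vhost:
rabbitmqctl list_permissions -p esdc

# What the esdc user may do across all vhosts:
rabbitmqctl list_user_permissions esdc
```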
klebed
@klebed
Is there any code for default rmq configuration for single node?
Jan Poctavek
@YanChii
the ha-prepare deploys the mgmt0[23].local VMs with metadata that configures the correct rmq nodenames
did you add the vhost parameter when you added the permissions?
ok, I see you did
Jan Poctavek
@YanChii
after rmq start, there are no queues or exchanges... everything is created by processes that connect to rmq (gunicorn-*, erigonesd on nodes and on mgmt, etc)
so the permissions you have set should be enough (as you can also see from the factory link)
klebed
@klebed
so... what could be the reason for getting error 400 when accessing CN queues from mgmt02?
or how do I dig for the reason? Because I guess I'm very close to a resolution without rolling back to single node