These are chat archives for Exa-Networks/exabgp

7th
Nov 2016
Ben Agricola
@benagricola
Nov 07 2016 12:34
@thomas-mangin this was on a build from master about 2h before that comment so I don't think it's fixed there (unless somehow I'm triggering it differently?) the healthcheck command is kinda complex
Thomas Mangin
@thomas-mangin
Nov 07 2016 12:35
or it was fixed in 3.4 ….
Either way the fix was partial anyway .. Parsing complex comand like is hard … The issue most of the time is the quoting, if you can get rid of it, it should just work
Ben Agricola
@benagricola
Nov 07 2016 12:37
yeah, i noticed - for example i run a 'curl' command in the healthcheck that sets a host header so i'd usually do -H 'Host: blah.com' but that doesn't parse correctly
but -H Host:blah.com does work
but that doesn't work in all situations :)
but i think it probably makes sense for the command that healthcheck runs to go into a script anyway since curl response codes aren't necessarily accurate
Thomas Mangin
@thomas-mangin
Nov 07 2016 12:39
shell wrapping is indeed the current workaround.
Ben Agricola
@benagricola
Nov 07 2016 12:39
yeah, cool
i only have 1 other problem to solve so far, and that's the fact that exabgp will not overwrite a pidfile that already exists
Thomas Mangin
@thomas-mangin
Nov 07 2016 12:41
feel free to open an issue for that :-)
Ben Agricola
@benagricola
Nov 07 2016 12:41
haha okay
it's a little weirder in that it complains about it could not update PID, not starting - but seems to start anyway
Thomas Mangin
@thomas-mangin
Nov 07 2016 12:42
3.4 or master ?
Ben Agricola
@benagricola
Nov 07 2016 12:42
(this happens if it exits uncleanly for some reason e.g. oom killed)
master
so i came back in this morning and found 400 exabgp processes running on this (test) vm because the init script / puppet can't detect that exabgp is already running because the pidfile has a stale pid in it
Thomas Mangin
@thomas-mangin
Nov 07 2016 12:43
ok - make sense
Thomas Mangin
@thomas-mangin
Nov 07 2016 12:53
git diff
diff --git a/lib/exabgp/reactor/daemon.py b/lib/exabgp/reactor/daemon.py
index 71bd165..bdb67f6 100644
--- a/lib/exabgp/reactor/daemon.py
+++ b/lib/exabgp/reactor/daemon.py
@@ -34,6 +34,14 @@ class Daemon (object):
                # os.umask(0)
                os.umask(0137)

+       def check_pid (self,pid):
+               try:
+                       os.kill(pid, 0)
+               except OSError:
+                       return False
+               else:
+                       return True
+
        def savepid (self):
                self._saved_pid = False

@@ -48,8 +56,13 @@ class Daemon (object):
                try:
                        fd = os.open(self.pid,flags,mode)
                except OSError:
-                       self.logger.daemon("PIDfile already exists, not updated %s" % self.pid)
-                       return False
+                       try:
+                               pid = os.fdopen(self.pid,'r').read().readline()
+                               if self.check_pid(int(pid)):
+                                       self.logger.daemon("PIDfile already exists and program still running %s" % self.pid)
+                                       return False
+                       except (OSError,IOError,ValueError):
+                               pass

                try:
                        f = os.fdopen(fd,'w')
diff --git a/lib/exabgp/reactor/loop.py b/lib/exabgp/reactor/loop.py
index 34b46ba..a4bc17c 100644
--- a/lib/exabgp/reactor/loop.py
+++ b/lib/exabgp/reactor/loop.py
@@ -172,7 +172,7 @@ class Reactor (object):
                        return

                if not self.daemon.savepid():
-                       self.logger.reactor('could not update PID, not starting','error')
+                       return

                # did we complete the run of updates caused by the last SIGUSR1/SIGUSR2 ?
                reload_completed = True
This is for 3.4 but should work on master - do you want to try this patch ?
Ben Agricola
@benagricola
Nov 07 2016 12:54
ah yep, that looks good
Thomas Mangin
@thomas-mangin
Nov 07 2016 12:54
The checkpid need improving but it should be good enough as test
    def check_pid (self,pid):
        try:
            os.kill(pid, 0)
            return True
        except OSError as err:
            if err.errno == errno.EPERM:  # a process we were denied access to
                return True
            if err.errno == errno.ESRCH:  # No such process
                return False
            # should never happen
            return False
Thomas Mangin
@thomas-mangin
Nov 07 2016 13:06
final patch applied to master
Ben Agricola
@benagricola
Nov 07 2016 13:06
ah, cool
Thomas Mangin
@thomas-mangin
Nov 07 2016 13:08
and 3.4 branch
Ben Agricola
@benagricola
Nov 07 2016 13:14
awesome - thanks for your help as usual @thomas-mangin
Ben Agricola
@benagricola
Nov 07 2016 13:47
hmm @thomas-mangin, pid = os.fdopen(self.pid,'r').read().readline() - should this be os.open instead? when i start exabgp now I get pid = os.fdopen(self.pid,'r').read().readline() TypeError: an integer is required
since self.pid is the string to the pidfile
Thomas Mangin
@thomas-mangin
Nov 07 2016 14:25
Yes it should :wink:
not os.open, just open
fixed
not tested - my code always work first time .. cough, cough !
Ben Agricola
@benagricola
Nov 07 2016 14:28
hehe