These are chat archives for Exa-Networks/exabgp

16th
Mar 2017
Justin
@JustinAzoff
Mar 16 2017 21:36
@thomas-mangin hi! Did that bug report about the memory usage have enough info in it?
Justin
@JustinAzoff
Mar 16 2017 21:42
can probably run more tests if needed. One thing I noticed from the data is that the memory growth is perfectly linear, so testing with 50k routes is just as good as 150k
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:44
All good - I have all the info I need to see if I can help
Justin
@JustinAzoff
Mar 16 2017 21:44
i'm running one test now to see whether the memory usage is the same if i inject 150k routes using a process
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:44
Should be
Justin
@JustinAzoff
Mar 16 2017 21:46
yeah.. though the times i first noticed this problem were at a clean start
so, start with 50k routes, add 100k routes using api, then restart using 150k routes: OOM
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:46
The issue is that I must create too many objects for the attributes instead of re-sharing them.
However to re-share you need to cache.
Justin
@JustinAzoff
Mar 16 2017 21:47
though i THINK that may just be an issue with when it forks the process and the kernel thinks it will fail and OOMs
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:47
There is an API command called
attribute
I think it is:
Justin
@JustinAzoff
Mar 16 2017 21:47
oh, can i just define a route template and use that 150k times?
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:47
attribute <your attributes here> nlri <nlri> <nlri>
Justin
@JustinAzoff
Mar 16 2017 21:47
every single route I add is the same
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:48
and it can share the memory for the NLRI and does not parse as much so it is faster
so this syntax should help
Justin
@JustinAzoff
Mar 16 2017 21:53
ah, so if i can do something like
attribute mynullroute next-hop self community [ 64512:666 no-export ] ; 
or such in the config, and then just add all the routes as 'route whatever mynullroute' that could work
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:53
not sure self works there but yes, it is the spirit
No it is
Justin
@JustinAzoff
Mar 16 2017 21:54
well this is interesting.. my process stress test used a lot less memory
Thomas Mangin
@thomas-mangin
Mar 16 2017 21:54
attribute med 100 next-hop 10.10.0.1 community [ no-export ] nlri 1.2.3.4 5.6.7.8 1.1.1.1 2.2.2.2
Justin
@JustinAzoff
Mar 16 2017 21:55
so..
# ./stress.py |head
announce route 1.1.1.1/32 next-hop self
announce route 1.1.1.2/32 next-hop self
announce route 1.1.1.3/32 next-hop self
announce route 1.1.1.4/32 next-hop self
announce route 1.1.1.5/32 next-hop self
announce route 1.1.1.6/32 next-hop self
announce route 1.1.1.7/32 next-hop self
announce route 1.1.1.8/32 next-hop self
announce route 1.1.1.9/32 next-hop self
./stress.py |wc -l
150000
static {

}

family {
  ipv4 unicast;
}

process bhr-dynamic {
    run /tmp/stress.py;
}
so, i change that to add an infinite loop at the end so it doesn't exit, and start that up...
takes a while to settle
but once it does..
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12432 nobody    20   0  865652 830412   6740 S   0.0 20.5   1:39.50 python2.7
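A minimal sketch of what a generator like stress.py could look like, based only on the output shown above. The original script is not in the log, so the function names `route_lines` and `main` are hypothetical, and this version assumes Python 3 for the `ipaddress` module. It writes one `announce route` line per /32 to stdout and then sleeps forever so the process does not exit.

```python
import sys
import time
from ipaddress import IPv4Address


def route_lines(count, start="1.1.1.1"):
    """Yield one 'announce route' command per consecutive /32 host route."""
    first = int(IPv4Address(start))
    for offset in range(count):
        # IPv4Address(int) converts the integer back to dotted-quad form
        yield "announce route %s/32 next-hop self" % IPv4Address(first + offset)


def main(count=150000):
    """Print all commands, then idle (hypothetical entry point)."""
    for line in route_lines(count):
        sys.stdout.write(line + "\n")
    sys.stdout.flush()
    # Infinite loop at the end so the process keeps running
    while True:
        time.sleep(60)
```

Calling `main()` at the bottom of the script gives the behaviour described above; `route_lines` can be used on its own to check the output without blocking.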
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:00
use attribute and announce 10 routes per line ..
Justin
@JustinAzoff
Mar 16 2017 22:01
seems the memory usage blows up after it is done reading the routes as it starts parsing the neighbors
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:01
k
Justin
@JustinAzoff
Mar 16 2017 22:01
so this does explain why i had troubles on a bootup
it was fine to add 150k routes using the api after startup, but starting with a config with those routes baked in blows up
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND 
12454 nobody    20   0 2400112 2.255g   6752 S   0.0 58.4   2:21.06 python2.7
so that's the same 150k routes and 3 neighbors with the config
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:03
It may be that the configuration caches the routes and then it goes to the ADJ-RIB-OUT
Justin
@JustinAzoff
Mar 16 2017 22:03
so, no config + 150k routes via api = 845M, config+150k routes = 2.2G
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:03
when with the API the route is only in the ADJ-RIB-OUT .. Need to think about it
Justin
@JustinAzoff
Mar 16 2017 22:04
i can actually just modify my program to work this way, I always just figured templating out the initial config would be faster than starting with an empty config and injecting all the routes back in
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:04
You should still be able to reduce this with attribute :p
Justin
@JustinAzoff
Mar 16 2017 22:05
yeah I need to work out the easiest way to template that out
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:06
a folder with
one folder for attribute grouping
and in that one all the nlri as file
so you only have to touch
:p
Justin
@JustinAzoff
Mar 16 2017 22:07
hmm? I don't follow
my issue is i have an api that returns the list of things that should be blocked, so I mostly need to just add in a chunk function to get sublists of 10+ things to block
and then render that
going to prototype it out with a modified version of that gen.py file i wrote - which is basically what my app does just with real ips
Justin
@JustinAzoff
Mar 16 2017 22:13
hmm
can i use attribute via the config file?
Thu, 16 Mar 2017 22:13:02 | ERROR    | 12519  | configuration | line 11: attribute med 100 next-hop self community [ 65101:666 ; no-export ] nlri 1.1.1.1 1.1.1.2 1.1.1.3 1.1.1.4 1.1.1.5 1.1.1.6 1.1.1.7 1.1.1.8 1.1.1.9 1.1.1.10 ;
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:14
I do not think so ..
It should not be hard to add but I think it is api only atm
would not want to add it now as otherwise I would have to provide the same feature on master and it would be bad (reason not to)
Justin
@JustinAzoff
Mar 16 2017 22:17
hmm
Thu, 16 Mar 2017 22:17:28 | WARNING  | 12579  | reactor       | Command from process not understood : attribute med 100 next-hop self community [ 65101:666 ; no-export ] nlri 1.3.78.206 1.3.78.207 1.3.78.208 1.3.78.209 1.3.78.210 1.3.78.211 1.3.78.212 1.3.78.213 1.3.78.214 1.3.78.215
oh, you said self might not work?
well, it doesn't work, but it sure makes the config smaller, 2.6M vs 15M
i should be able to just tweak my startup process to start with an empty config and then inject the 150k routes for now
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:21
:-)
Do you want to update the issue with your finding for the next person
Justin
@JustinAzoff
Mar 16 2017 22:22
yeah i added a comment about the use at startup vs inject
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:22
I will still try to improve things but I am working long days and do not have much time free atm
Thanks
Justin
@JustinAzoff
Mar 16 2017 22:22
there an easy way to ask a running exabgp how many routes it has?
I realized I've been using exabgp for 5 years but only literally this one feature
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:23
The connect TLS .. use -proxy-binding
but anything else ?
Justin
@JustinAzoff
Mar 16 2017 22:25
oh, i'll have to figure that out
oh, i see in self/attribute/api-internet.py
it's
msg = 'announce attribute next-hop 1.2.3.4 med 100 as-path [ 100 101 102 103 104 105 106 107 108 109 110 ] nlri %s'
Justin
@JustinAzoff
Mar 16 2017 22:33
interesting
community [ 65101:666 no-export ] works
community [ 65101:666 ; no-export ] doesn't parse.. 
wow, doing batches of 50 using attributes resulted in
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12971 nobody    20   0  245712 209692   6784 S   0.3  5.2   0:29.48 python2.7
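The batching described above can be sketched roughly as follows. This is an illustration, not the script used in the test: the helper names `chunked` and `attribute_lines` are hypothetical, the next-hop value is borrowed from the earlier example in this log, and the /32 suffix on each prefix in the usage note is an assumption. The command shape follows the `announce attribute ... nlri` form quoted from the example file, with the community list written without a semicolon.

```python
def chunked(items, size):
    """Yield consecutive sublists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Shared attributes for every route (assumed values, adjust as needed)
ATTRIBUTES = "next-hop 10.10.0.1 community [ 65101:666 no-export ]"


def attribute_lines(prefixes, size=50, attributes=ATTRIBUTES):
    """Render one 'announce attribute' command per batch of prefixes."""
    for batch in chunked(list(prefixes), size):
        yield "announce attribute %s nlri %s" % (attributes, " ".join(batch))
```

For example, `attribute_lines(["1.1.1.%d/32" % i for i in range(1, 121)])` produces three commands, two carrying 50 prefixes and one carrying the remaining 20.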
Justin
@JustinAzoff
Mar 16 2017 22:40
so.. 150k in config: 3G of ram. 150k announced after startup: 845M. 150k announced in batches of 50: 204M
@thomas-mangin possibly stupid question.. is there a difference in what the remote router sees if I do announce attribute with 50 routes vs announcing 50 separately?
I'm sure the fancy juniper routers we have would support both if there's a difference, but if there is a difference i'd need to get the networking people involved in the testing
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:53
It is more efficient for the remote router when you group with attribute
as one UPDATE message will have multiple NLRI so faster convergence
Hence why master groups the config per attribute (or will be doing)
and will have named attributes so you can link nlri to the name
not done yet tho
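To illustrate the grouping idea, here is a conceptual sketch. It is not ExaBGP's implementation, and the function name `group_by_attributes` is hypothetical. Routes that share an identical attribute string are collected into one group, and each group could then in principle be sent as a single UPDATE carrying several NLRI (in practice the BGP maximum message size can force a large group to be split).

```python
from collections import OrderedDict


def group_by_attributes(routes):
    """Group (prefix, attributes) pairs by their attribute string.

    Returns an ordered mapping of attributes -> list of prefixes,
    keeping the order in which each attribute set was first seen.
    """
    groups = OrderedDict()
    for prefix, attributes in routes:
        groups.setdefault(attributes, []).append(prefix)
    return groups
```

With 150k routes that all carry the same attributes, this yields a single group, which is why batching them under one attribute set reduces the work on both sides.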
Justin
@JustinAzoff
Mar 16 2017 22:55
oh? i have master too.. lemme re-do that last test with it
interesting.. the behavior is much different
looks like it doesn't set up any routes because it can't connect to any of the peers i have configured
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:55
I rewrote the WHOLE config parser and it is not finished
Justin
@JustinAzoff
Mar 16 2017 22:56
oh, so don't use master :-)
Thomas Mangin
@thomas-mangin
Mar 16 2017 22:56
sorry ?
what is the issue here ?
Justin
@JustinAzoff
Mar 16 2017 22:56
no issue :)
i'm planning on finally upgrading from 3.4.9 when I re-do this startup process
Thomas Mangin
@thomas-mangin
Mar 16 2017 23:00
:-)
Thomas Mangin
@thomas-mangin
Mar 16 2017 23:10
openssl s_client -crlf -connect 82.219.4.112:1443
CONNECTED(00000003)
57570:error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version:/BuildRoot/Library/Caches/com.apple.xbs/Sources/OpenSSL098/OpenSSL098-64.30.2/src/ssl/s23_clnt.c:593: