beckhamaaa
@beckhamaaa
OK, I can see.
beckhamaaa
@beckhamaaa
Nov 12 09:26:47 ubuntu-1 kresd[5751]: /usr/lib/knot-resolver/sandbox.lua:383: can't open cache path '.'; working directory '/var/cache/knot-resolver'
Nov 12 09:26:47 ubuntu-1 kresd[5751]: [cache] failed to clear stale lockfile './.cachelock': Permission denied
I hit this when configuring kresd under systemd.
How can I fix it?
@vcunat
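For context, the usual fix for this error is to make the cache directory writable by the user kresd runs as, or to point the cache at a writable location explicitly. A minimal sketch of the latter in kresd's Lua config, using the standard cache.open() API (the directory is a placeholder and must already exist):

-- kresd.conf: open a 100 MB LMDB cache in an explicitly writable directory
-- (/tmp/kresd-cache is a placeholder; it must be writable by the kresd user)
cache.open(100 * MB, 'lmdb:///tmp/kresd-cache')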
beckhamaaa
@beckhamaaa
My server has 48 CPUs, so I run "systemctl restart kresd@{1..48}.service", but the QPS is only 200,000?
@vcunat
Petr Špaček
@pspacek
That's a limitation of the current systemd implementation, described at https://github.com/systemd/systemd/issues/8096#issuecomment-505759188 . For that many cores you need to remove socket activation from systemd and use net.listen() in the kresd config to get better performance.
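For reference, once the systemd socket units are out of the picture (the packaging documentation covers the exact unit names for your release), the kresd side of that advice is a direct net.listen() call. A minimal sketch with a placeholder address:

-- kresd.conf: bind the listening socket directly instead of
-- inheriting it from systemd (192.0.2.1 is a placeholder address)
net.listen('192.0.2.1', 53)

Every instance reads the same config and binds its own socket; SO_REUSEPORT lets them all share the address and port.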
beckhamaaa
@beckhamaaa
Oh, I will try it. I hope to reach 2,000,000 QPS on 48 cores.
@pspacek
beckhamaaa
@beckhamaaa
How can I enable SO_REUSEPORT in kresd?
Vladimír Čunát
@vcunat
It's always used. What you need is not to use systemd sockets for listening, e.g. use that net.listen() API instead. (Or more precisely, you need to avoid sharing the same kernel-level socket descriptor among kresd instances.)
beckhamaaa
@beckhamaaa
So I need to isolate the kernel sockets per instance?
beckhamaaa
@beckhamaaa
How can I configure net.listen()? Currently I configure the socket with net = { '1.1.1.1' }.
And how can I run it with multiple threads?
@vcunat
Petr Špaček
@pspacek
Just run the kresd process multiple times, that's all.
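One way to do that by hand, assuming kresd.conf contains the net.listen() line (-f 1 keeps each process as a single instance):

# start one independent kresd process per core; all of them share
# the same address and port thanks to SO_REUSEPORT
for i in $(seq 1 48); do
    kresd -c kresd.conf -f 1 &
done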
beckhamaaa
@beckhamaaa
ok, I can see the reuseport?
Petr Špaček
@pspacek
I'm not sure what you mean. REUSEPORT is always enabled, and that's the reason why you are able to run multiple kresd instances on the same IP address. The Linux kernel will distribute queries among the instances automatically.
beckhamaaa
@beckhamaaa
ps -ef|grep kresd
knot-re+ 31053 1 0 22:06 pts/3 00:00:00 kresd -c kresd.conf -f 1
knot-re+ 31056 1 0 22:06 pts/3 00:00:00 kresd -c kresd.conf -f 1
knot-re+ 31057 1 0 22:06 pts/3 00:00:01 kresd -c kresd.conf -f 1
knot-re+ 31058 1 0 22:06 pts/3 00:00:00 kresd -c kresd.conf -f 1
knot-re+ 31059 1 0 22:06 pts/3 00:00:01 kresd -c kresd.conf -f 1
knot-re+ 31060 1 0 22:06 pts/3 00:00:00 kresd -c kresd.conf -f 1
knot-re+ 31061 1 0 22:06 pts/3 00:00:01 kresd -c kresd.conf -f 1
I can run multiple kresd instances and they do share the port via REUSEPORT, but the QPS is equal to a single instance?
@pspacek
Vladimír Čunát
@vcunat
That sounds like sending DNS queries from a single address-port pair.
beckhamaaa
@beckhamaaa
net.listen('1.1.1.111',53)
net.ipv4 = true
Yes, the result looks the same as a single thread.
That is my config above.
Vladimír Čunát
@vcunat
address-port pair of the sender, not kresd itself.
beckhamaaa
@beckhamaaa
You mean the address-port pair of the client?
How can I test the QPS correctly with dnsperf or resperf?
@vcunat
Petr Špaček
@pspacek
For dnsperf, use a higher number of outgoing sockets: dnsperf -c 1000 should do it.
Beware that dnsperf and resperf will give you unrealistic numbers, because they do not follow the query patterns of real clients (they ignore TTLs in answers). For more realistic benchmarking, have a look at https://ripe79.ripe.net/archives/video/198/
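A full invocation along those lines might look like this (the server address and query file are placeholders; -s, -p, -d, -c and -l are standard dnsperf options):

# drive the resolver from up to 1000 client address-port pairs for 60 seconds
dnsperf -s 192.0.2.1 -p 53 -d queries.txt -c 1000 -l 60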
beckhamaaa
@beckhamaaa
I tried dnsperf -c 1000; the result is the same as a single thread.
Petr Špaček
@pspacek
It's hard to say why; your results do not match ours, and the cause is not apparent.
The only advice I have is: you need to experiment more :-)
beckhamaaa
@beckhamaaa
OK, I will keep trying.
Robert Šefr
@robcza

I have encountered a rather peculiar issue with the prefill module. The module is loaded properly, but the process crashes on this configuration line:
prefill.config({ ['.'] = { url = 'https://www.internic.net/domain/root.zone', ca_file = '/etc/ssl/certs/ca-certificates.crt', interval = 86400 }})
The weird thing is that when the process is restarted, it runs fine and does not crash anymore. It seems to me that I'm not initializing something properly, but I cannot find what exactly.

Trace is available here: https://gist.github.com/robcza/5cfee5d46fe3af22c0f1512e680f83fd

Vladimír Čunát
@vcunat
I don't think that problem can be caused by misconfiguration. It's more likely to be a bug in the prefill module.
Robert Šefr
@robcza
@vcunat let me know if I can help reproducing the issue
beckhamaaa
@beckhamaaa
How can I make kresd reload a different data.mdb via systemctl?
@vcunat
Moreover, can you share your method for benchmarking kresd with multiple processes?
@vcunat
Petr Špaček
@pspacek
@robcza AFAIK an older version was crashing if the zone file was empty, so maybe there is another similar bug which manifests itself only if the zone file is incomplete or so.
Vladimír Čunát
@vcunat

@beckhamaaa: I always look at what resource seems to be the bottleneck. What we did:

  • start multiple individual kresd processes (-f 1)
  • put the cache into RAM (a tmpfs) to avoid overloading a disk
  • avoid virtualization and preferably even containers
  • send over network: you need a good card, ensure enough queues, e.g. ethtool -L $DEVICE combined $CPU_COUNT (may differ a bit by card vendor); at high speeds, localhost may be slower than network (!)
  • generating traffic: for me, resperf was only able to reach something like 200 kQPS; we used tcpreplay or our new shotgun tool instead.

IIRC those are all the important steps (a command-level sketch follows below). It's mostly not specific to kresd or even DNS.
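A hedged sketch of the environment-related steps (device name, sizes and paths are placeholders; ethtool options vary by NIC vendor):

# keep the cache in RAM so the disk is never the bottleneck
mount -t tmpfs -o size=4G tmpfs /var/cache/knot-resolver

# spread NIC queues across all CPUs (exact syntax depends on the vendor)
ethtool -L eth0 combined 48

Then start one kresd -f 1 process per core, as sketched earlier.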

beckhamaaa
@beckhamaaa
Is that card a RAID card? I made a RAID 10 on my disks.
Vladimír Čunát
@vcunat
For "card" I only wrote about network card, and in that context I haven't heard of "raid". Disk doesn't seem interesting to me; I'm missing any real use case to persist cache across OS reboots.
Petr Špaček
@pspacek
@robcza Hello! We are planning some changes in process management for release 5.0.0, and I remembered that you are using supervisord. Can you share your supervisord config as inspiration for us?
Robert Šefr
@robcza
@pspacek I will share the supervisord config through another channel.

Another topic: we have encountered an awkward domain, "cdn.analyzeo.com" - no answer section, only two authority records. Other resolvers return SERVFAIL for it, while kresd seems to return only the authority section. Is this behaviour defined in an RFC?

dig @193.17.47.1 cdn.analyzeo.com

; <<>> DiG 9.10.3-P4-Debian <<>> @193.17.47.1 cdn.analyzeo.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30764
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;cdn.analyzeo.com.              IN      A

;; AUTHORITY SECTION:
cdn.analyzeo.com.       3600    IN      NS      ns2.uti.com.pl.
cdn.analyzeo.com.       3600    IN      NS      ns3.uti.com.pl.

;; Query time: 85 msec
;; SERVER: 193.17.47.1#53(193.17.47.1)
;; WHEN: Thu Nov 21 19:36:37 UTC 2019
;; MSG SIZE  rcvd: 91

The other resolver:

dig @9.9.9.9 cdn.analyzeo.com

; <<>> DiG 9.10.3-P4-Debian <<>> @9.9.9.9 cdn.analyzeo.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 46644
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;cdn.analyzeo.com.              IN      A

;; Query time: 1 msec
;; SERVER: 9.9.9.9#53(9.9.9.9)
;; WHEN: Thu Nov 21 19:41:47 UTC 2019
;; MSG SIZE  rcvd: 45