These are chat archives for CZ-NIC/knot-resolver

10th
Sep 2018
Petr Špaček
@pspacek
Sep 10 2018 06:27
@courtneycouch Hi! In principle you can hook into the path that handles cache misses and send new data to other instances, i.e. do this asynchronously. A synchronous approach with memcached/redis does not scale very well.
Vladimír Čunát
@vcunat
Sep 10 2018 08:03
I think I can reveal that CloudFlare uses a "reversed" approach - a lua module that multi-casts cache stores to other instances. ATM that module isn't public, though it might be in future. (IIRC this information is scattered across public comments on our GitLab.)
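
To make the asynchronous idea concrete, here is a minimal sketch in Python. It is not the CloudFlare module and does not use any kresd API: `CacheReplicator`, `on_cache_store` and the peer callables are all hypothetical names. The only point it shows is that the resolving path just enqueues, while a background worker forwards stores to the other instances.

```python
import queue
import threading


class CacheReplicator:
    """Hypothetical helper: forward cache stores to peers off the hot path."""

    def __init__(self, peers):
        # Each peer is a callable send(key, value); in practice this might
        # wrap a UDP or multicast socket (assumption, not shown here).
        self.peers = list(peers)
        self.pending = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def on_cache_store(self, key, value):
        # Called from the resolving path: only enqueues, never waits on peers.
        self.pending.put((key, value))

    def _run(self):
        while True:
            item = self.pending.get()
            if item is None:  # sentinel queued by close()
                break
            for send in self.peers:
                try:
                    send(*item)
                except Exception:
                    # A slow or dead peer must not affect local resolution.
                    pass

    def close(self):
        # Entries queued before the sentinel are delivered first (FIFO).
        self.pending.put(None)
        self.worker.join()
```

In a test the peers can simply be dictionaries (`peer.__setitem__`), which is enough to see that every stored entry reaches every instance.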
Vladimír Čunát
@vcunat
Sep 10 2018 09:10

@eadwu: I don't know, everything seems fine in the log you posted - no errors happen there, though only a single request goes over network (DNSKEY .) and the rest is just from cache.

BTW, it's a little sad that 9.9.9.9 closes the TLS connection just two seconds after answering, given how "expensive" the handshakes are, but I haven't researched which services do this better.

Robert Šefr
@robcza
Sep 10 2018 10:11
@hectorm we are looking into the same issue. I think there should be a module responsible for rate limiting. We can look into it together if you are interested and open the module for the rest of the world.
Vladimír Čunát
@vcunat
Sep 10 2018 10:18
Just note that with UDP the source address is easy to spoof, so rate-limiting by source address seems unlikely to really help against attacks.
Well, it would actually hamper reflection attacks, as the rate of answers would be limited per-IP.
Robert Šefr
@robcza
Sep 10 2018 10:19
@vcunat for the amplification attacks the attacker wants to spoof a particular address or range. rate limiting will still be effective, even if it is spoofed
Vladimír Čunát
@vcunat
Sep 10 2018 10:20
Yes, that's what I meant by the last comment :-)
Still, in his/her case I'd try to block packets at network perimeter if that's possible, as that feels much more reliable.
Limiting could be added later to e.g. decrease effectiveness of attacks from within the served network, should they happen.
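
A minimal sketch of per-source limiting of answers, assuming a token bucket per client address. This is not an existing kresd module; `TokenBucket`, `PerSourceLimiter` and the default rate and burst values are made up for illustration. Note that buckets are never expired here, so a flood of spoofed sources would grow the table without bound; a real implementation would need to cap or age it.

```python
import time


class TokenBucket:
    """Allow `rate` answers per second, with bursts of up to `burst`."""

    def __init__(self, rate, burst, now):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerSourceLimiter:
    """One bucket per (possibly spoofed) source address."""

    def __init__(self, rate=10.0, burst=20, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.buckets = {}  # source address -> TokenBucket (never expired here)

    def allow(self, source_ip):
        now = self.clock()
        bucket = self.buckets.get(source_ip)
        if bucket is None:
            bucket = self.buckets[source_ip] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)
```

Because the limit applies to whichever address appears in the packet, a victim whose address is spoofed receives at most `rate` answers per second from this resolver, which is the reflection case discussed above.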
Robert Šefr
@robcza
Sep 10 2018 10:24
of course, that is the proper way for a regular resolver. But you cannot do this if you are going to provide a public resolver, as the "served network" is basically the whole internet
Vladimír Čunát
@vcunat
Sep 10 2018 10:26
"if a user exposes the service by mistake" made me think the intention is not to have a public service in this case.
Robert Šefr
@robcza
Sep 10 2018 10:27
Oh sorry, I missed that. Anyway, our use case is to expose the resolvers on purpose :)
Vladimír Čunát
@vcunat
Sep 10 2018 10:34
Ah, I see, I expect your service should work "directly over the internet's UDP", so you don't have much choice. BTW, isn't it suitable to use iptables for this, at least for now, similarly to https://unix.stackexchange.com/a/164032/41413
Robert Šefr
@robcza
Sep 10 2018 10:48
@vcunat thank you, interesting, it is worth testing
ah, but this would limit the service for everyone, making the resolver susceptible to DoS (as a target)
Vladimír Čunát
@vcunat
Sep 10 2018 10:55
The attacker can always spoof IP as the client he wants to DoS. You won't be able to tell if the UDP packets are from the attacker or the "customer" (at least not easily).
(Perhaps I'm missing something but here I can't see a difference between rate-limiting at kernel level vs. at resolver level.)
Robert Šefr
@robcza
Sep 10 2018 13:45

we have identified a few issues with different .cz domains in kresd 2.4.1. These are the traces from two different resolvers running the same version and configuration:
https://gist.githubusercontent.com/robcza/82d0addff85182d6aadb7e24f2ffff2c/raw/a433176299a1df09f07346918ff3c254eac9daf9/gistfile1.txt

One is able to resolve www.cezdistribuce.cz, the other is not. I'm not really sure about the reason. Could you point me in the right direction?

Vladimír Čunát
@vcunat
Sep 10 2018 13:49
@robcza: it's bad authoritative servers.
We've seen a few of these already. IIRC the ones we saw recently were produced by an F5 load balancer.
Robert Šefr
@robcza
Sep 10 2018 13:51
@vcunat is there anything we can do as a workaround? you mean it depends on the NS we reach?
Vladimír Čunát
@vcunat
Sep 10 2018 13:51
It's online-signed with black lies done wrong - the NSEC3 record claims that only a single type exists on the name (TXT in your log) - and then when a different qtype on that name is asked, kresd re-uses that proof.
It depends on what's in cache - first a negative answer needs to be cached on that name (which is wrong) and then queries for different types on that name will be denied.
I don't know a reasonably simple work-around, and I don't like spending much time on it.
The authoritatives return proofs stating something other than what they mean, so I'd push them to fix it (if they want it to work). It's a bit annoying that kresd is probably the only resolver so far using aggressive caching by default, so it should typically "just work" with other resolvers...
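
The failure mode described above can be modelled with a toy negative cache. This is only a sketch under simplifying assumptions, not kresd's code: real NSEC3 proofs are keyed by hashed owner names and carry TTLs and signatures, and `NegativeCache` with its methods is a hypothetical name.

```python
class NegativeCache:
    """Toy model of re-using a cached NSEC3 type bitmap."""

    def __init__(self):
        # owner name -> set of types the cached proof says exist at that name
        self.proofs = {}

    def store_proof(self, name, types_present):
        self.proofs[name] = frozenset(types_present)

    def lookup(self, name, qtype):
        """Return 'NODATA' if a cached proof denies qtype, else None (ask upstream)."""
        bitmap = self.proofs.get(name)
        if bitmap is not None and qtype not in bitmap:
            return "NODATA"
        return None


cache = NegativeCache()
# A negative answer arrives whose proof claims that only TXT exists at the name.
cache.store_proof("www.example.cz", {"TXT"})
# A later query for A is denied from cache, even if the A record really exists.
result = cache.lookup("www.example.cz", "A")
```

A resolver without aggressive caching would simply ask the authoritative server again for the new type, which is why the same zones appear to work elsewhere.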
Petr Špaček
@pspacek
Sep 10 2018 13:56
Good news is that it is going to break with latest versions of Unbound and BIND as well :-)
(AFAIK they are working on code for this.)
Vladimír Čunát
@vcunat
Sep 10 2018 13:57
Unbound supports it but it isn't on by default.
Ah, maybe for NSEC only and not NSEC3 for now.
Petr Špaček
@pspacek
Sep 10 2018 13:59
@robcza We have had the same problem with www.csas.cz and they worked around it on their side by configuring a lower TTL for non-existing data.
It is not perfect but "good enough" as workaround.
Vladimír Čunát
@vcunat
Sep 10 2018 14:00
Oh, I see... but that can mean it's still severely broken!
Robert Šefr
@robcza
Sep 10 2018 14:00
and would omitting DNSSEC help as a temporary workaround? the problem also affects the domain lekar-soap.erecept.sukl.cz which is kind of sensitive in terms of availability :)
Vladimír Čunát
@vcunat
Sep 10 2018 14:00
They have A record but not AAAA. The TTL of the bad proof is down to two minutes, but with happy eyeballs the work-around might not really help at all.
Petr Špaček
@pspacek
Sep 10 2018 14:00
It is, Ceska sporitelna has a ticket open about this.
@robcza A negative trust anchor for the whole zone should help but it needs verification on your side.
Vladimír Čunát
@vcunat
Sep 10 2018 14:02
@robcza: right, negative trust anchors for particular subtrees are probably the best you can do.
I hope other resolvers will soon get aggressive cache by default as well, so that our persuading power is better.
This is getting big. I'll file at least something provisional in https://github.com/dns-violations/dns-violations
Petr Špaček
@pspacek
Sep 10 2018 14:10
@robcza Please file a new PR for https://github.com/dns-violations/dns-violations so it is clear to other people that it is not only us complaining. Thank you!
Robert Šefr
@robcza
Sep 10 2018 18:05
@pspacek working on the DVE. Could you please explain where in the NSEC3 record is the flag regarding the allowed record types? The one that includes just the TXT
Vladimír Čunát
@vcunat
Sep 10 2018 18:29
The matching NSEC3 contains the set of types present at that name.
The problem is that these bogus implementations just claim that only one single type exists, a type other than the one that is currently queried.
Lying can work, but they need to do it the other way - claim that all but the queried type exist. https://blog.cloudflare.com/black-lies/
@robcza ^^
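
The two ways of lying can be compared side by side. This is an illustrative sketch only: `ALL_TYPES` is a tiny made-up subset of record types, the choice of filler type in the broken variant is an assumption, and neither function reflects the code of any particular authoritative server.

```python
# Tiny illustrative subset of record types (assumption, not a real registry).
ALL_TYPES = frozenset({"A", "AAAA", "MX", "TXT", "SRV", "CAA"})


def bitmap_single_type(qtype):
    """Broken variant: claim that one arbitrary other type is all that exists."""
    filler = "TXT" if qtype != "TXT" else "A"  # hypothetical choice of filler
    return frozenset({filler})


def bitmap_all_but_queried(qtype):
    """Variant described above: claim every type exists except the queried one."""
    return ALL_TYPES - {qtype}


def denied(bitmap, qtype):
    """A resolver re-using the proof treats any type missing from it as absent."""
    return qtype not in bitmap


wrong = bitmap_single_type("AAAA")
right = bitmap_all_but_queried("AAAA")
```

Both bitmaps correctly deny the AAAA query that triggered them, but only `wrong` also denies a later A query when the proof is re-used from cache.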