waclaw66
@waclaw66
Is there some other way to verify that the problem is not in kresd but at Cloudflare?
Vladimír Čunát
@vcunat
Traffic from kresd to 1.1.1.1 can be captured with tcpdump/wireshark.
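A command-line sketch of such a capture (the interface and file name are illustrative; kresd's forwarding to 1.1.1.1 uses port 53 for plain DNS or 853 for DNS-over-TLS):

```shell
# Capture kresd's upstream traffic to Cloudflare into a pcap file.
# Run as root; -i any listens on all interfaces.
tcpdump -i any -w /tmp/kresd-cloudflare.pcap 'host 1.1.1.1 and (port 53 or port 853)'
```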
waclaw66
@waclaw66
the message "range search found stale or insecure entry" looks suspicious to me
Vladimír Čunát
@vcunat
That's a normal state.
A suitable record isn't always found in the cache.
waclaw66
@waclaw66
I could also try setting up bind with the same configuration; it should then behave the same. Wireshark is beyond me.
Vladimír Čunát
@vcunat
That works too. I then report confirmed bugs in 1.1.1.1 on their forum, e.g. https://community.cloudflare.com/t/incomplete-nxdomain-proofs-with-qtype-ds/183499 (though this isn't happening to me)
waclaw66
@waclaw66
Bind resolves correctly :/ though it runs without TLS
waclaw66
@waclaw66
I put kresd back and now it returns the IP correctly, so perhaps there really are some problems at Cloudflare. It's strange that it only happened with the mentioned domain.
waclaw66
@waclaw66
Thanks for now; I'll keep watching it for a while to see whether some pattern emerges.
waclaw66
@waclaw66
Regarding yesterday's problem, the network situation: a server/router with FC32 + Knot Resolver forwarding to Cloudflare repeatedly returns SERVFAIL after the TTL expires, while a Win10 client using Cloudflare DNS directly returns the correct address after the TTL expires, and after that the server also starts returning NOERROR from Cloudflare. Both machines are behind NAT with the same IP; the only difference is the client, kresd vs. Win10. Bind on the server also works without problems. I don't understand it. I started noticing this behavior only after upgrading to 5.1.1, but it may be unrelated.
Petr Špaček
@pspacek
That sounds like some problem with cache state at Cloudflare, but it's hard to say without more details.
The best approach would be to set the SSLKEYLOGFILE variable per https://gnutls.org/manual/html_node/Debugging-and-auditing.html so that kresd sees it, log the traffic towards 1.1.1.1 into a PCAP, and then send us the whole thing so we can take a look.
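Sketched for a systemd-managed kresd (the unit name kresd@1.service and the file paths are assumptions; adjust to your deployment):

```shell
# 1) Have kresd's GnuTLS library export TLS session keys: add under [Service]
#      Environment=SSLKEYLOGFILE=/tmp/kresd-keys.log
sudo systemctl edit kresd@1.service
sudo systemctl restart kresd@1.service
# 2) Capture the encrypted DoT traffic towards 1.1.1.1
sudo tcpdump -i any -w /tmp/kresd-dot.pcap 'host 1.1.1.1 and port 853'
# 3) Wireshark can then decrypt the pcap using the key log file
```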
waclaw66
@waclaw66
I also turned off TLS in Knot so that both DNS clients are under exactly the same conditions. Then, I assume, SSLKEYLOGFILE would not be needed?
Vladimír Čunát
@vcunat
Exactly.
waclaw66
@waclaw66
To what address can I send the pcap? I don't want to post it here publicly. pcapng, or plain pcap?
Petr Špaček
@pspacek
If we are to help you, it will need to be posted somewhere publicly; the rules are here: https://www.knot-resolver.cz/support/free/
waclaw66
@waclaw66
Understood, here it is... https://cloud.waclaw.cz/index.php/s/DWYkHyq2mGpjEie Thanks! I hope it's of some use.
Vladimír Čunát
@vcunat
@waclaw66: to me it clearly looks like a bug on their side; I've opened a thread.
waclaw66
@waclaw66
@vcunat Thanks, I wouldn't have known what to report :) The mixed-up letter case in the domain name also surprised me; is that normal?
Vladimír Čunát
@vcunat
Yes, at least with us; elsewhere it's probably not a common default. Details: https://tools.ietf.org/html/draft-vixie-dnsext-dns0x20
waclaw66
@waclaw66
@vcunat Interesting, you learn something new every day :)
Petr Špaček
@pspacek
Let's switch back to English so we do not deter other people :-)
waclaw66
@waclaw66
Regarding the topic from yesterday, I have test results: the first command, nslookup commuting.waclaw.cz, returns SERVFAIL; the second, nslookup commuting.waclaw.cz 1.1.1.1, returns NOERROR. Could you check this pcap? Thanks! I will attach it to the Cloudflare issue too.
Vladimír Čunát
@vcunat
Yes, it could give a hint that some of the flags from kresd make a difference in this case, but it's hard to guess. The important information is why the SERVFAIL is happening, and we can't really see that. (There's a fresh RFC that will hopefully help with such cases in the future by sending more information.)
If it keeps repeating, I expect they'll look into their logs.
waclaw66
@waclaw66
For me as an amateur, the only difference between those two queries and their packets is the letter case within the domain name. Is it possible to turn this 0x20 feature off, just for a test?
waclaw66
@waclaw66
I found there should be a NO_0X20 flag, although I didn't manage to combine it with the current setting policy.add(policy.all(policy.FORWARD('1.1.1.1'))). Could you help, please?
Vladimír Čunát
@vcunat
Before this rule you can add policy.add(policy.all(policy.FLAGS({'NO_0X20'}))), but I'm fairly confident that Cloudflare won't have problems with this feature (and the case sent to authoritative servers should be independent).
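Put together, the resulting configuration fragment would look like this (rule order matters, as noted above):

```lua
-- disable DNS 0x20 query-name randomization, then forward everything
policy.add(policy.all(policy.FLAGS({'NO_0X20'})))
policy.add(policy.all(policy.FORWARD('1.1.1.1')))
```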
waclaw66
@waclaw66
You're right; even SAFEMODE doesn't help either.
waclaw66
@waclaw66
BUT, querying nslookup commuting.waclaw.cz 1.1.1.1 in SAFEMODE doesn't fix a subsequent nslookup commuting.waclaw.cz: it still returns SERVFAIL, which differs from running without SAFEMODE. Cloudflare apparently struggles with queries from Knot Resolver.
Winfried Angele
@paddg_gitlab
Hi, is it possible to attach to an already running kresd with an interactive console?
Vladimír Čunát
@vcunat
Winfried Angele
@paddg_gitlab
Ah, I see, @vcunat, very good, thanks :-)
Vladimír Čunát
@vcunat
:-)
Jan Krajdl
@spamik

Hi,
we're using Knot Resolver as the main DNS resolver in our company. We have an MS AD domain (company.local); AD also acts as a DNS server, so company.local is also a DNS zone. In Knot Resolver I have a STUB policy redirecting the company.local zone to the AD servers. It works fine... for a few days :-) Sometimes (currently 2-3 times per week) it stops working. When I try to resolve a hostname with dig, I get no result: https://pastebin.com/7MFHgNXR

If I turn on verbose log I get this when trying to resolve that name: https://pastebin.com/MUbjQxvH
Also, it seems that at such times all queries to the company.local zone fail with the same error. And when I clear the cache via kresc, it starts working again... for a few days, until it happens again.

Any idea what could be happening, or how to debug it further? Previously we had bind in a similar configuration without this issue.

Petr Špaček
@pspacek
Vladimír Čunát
@vcunat
I was about to post exactly that link.
Petr Špaček
@pspacek
Most importantly, it is a very bad idea to use "fake" domain names; it would be far easier to use something like "internal.company.cz", where "company.cz" is a real name delegated from the parent domain. If you really must, you can use the configuration described at https://knot-resolver.readthedocs.io/en/v5.1.2/modules-policy.html#replacing-part-of-the-dns-tree but again, expect various sorts of problems: such a setup is architecturally broken, so all of these are just workarounds.
Jan Krajdl
@spamik

Well, is it? In the configuration I have exactly this:
company_mng_tree = policy.todnames({
'mng.company.com.',
'16.172.in-addr.arpa.',
})
company_ad_tree = policy.todnames({
'company.local.',
})
policy.add(policy.suffix(policy.STUB({'172.16.5.52', '172.16.5.53'}), company_mng_tree))
policy.add(policy.suffix(policy.STUB({'172.16.5.32', '172.16.5.33'}), company_ad_tree))

(just s/realcompanyname/company/, the rest is copy/paste). I thought that was the right way to do it :-) Regarding fake domain names: as you can see, I have mng.company.com for that. But unfortunately I also have some historical Microsoft mess running here with the company.local domain, and it would probably be really hard to change...

(as for faking... well, it's much the same, but I don't want internal names resolvable from the outside world)
Robert Šefr
@robcza
@spamik start the config with the following: policy.add(policy.suffix(policy.FLAGS('NO_CACHE'), {todname('local')}))
That way you avoid the situation where the non-existence of the local TLD gets cached.
Overall, my suggestion is not to cache any of the internal zones. Latency is not an issue internally, and you want all changes propagated at the moment they are made.
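A sketch of the combined rules following this advice, reusing the addresses and names quoted earlier in the thread:

```lua
-- never cache anything under .local, then stub company.local to the AD servers
policy.add(policy.suffix(policy.FLAGS('NO_CACHE'), {todname('local')}))
policy.add(policy.suffix(policy.STUB({'172.16.5.32', '172.16.5.33'}),
                         policy.todnames({'company.local.'})))
```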
Vladimír Čunát
@vcunat
:+1:
Jan Krajdl
@spamik
Ok, thanks, I'll do that :-) Just to note: how long is the non-existence of a domain kept in the cache? (I left it in that state for several hours and it still wasn't working :-) )
Vladimír Čunát
@vcunat
TTL in root NSEC records is a full day.
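A command-line sketch of how one might observe this TTL (requires network access; the name local. is just an example of a TLD absent from the root zone):

```shell
# Ask a root server about a nonexistent TLD; the NSEC records in the
# authority section carry the TTL that bounds negative caching.
dig +dnssec +noall +authority local. SOA @k.root-servers.net
```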
Jan Krajdl
@spamik
ok, that explains it. Thanks a lot. I'll disable the cache and hopefully it won't happen again :-)
Robert Šefr
@robcza
@spamik it will be stable and reliable. The setup you have is a good one, with no AD issues as far as I am aware.
On top of that, if your clients are configured to use kres as their primary resolver, you also shield the domain controllers against vulnerabilities such as SIGRed (CVE-2020-1350).
Jan Krajdl
@spamik
@robcza yeah, it's configured that way. I try to use MS components as little as possible, so clients use only kres as their resolver, an external DHCP server, and so on :-)
beckhamaaa
@beckhamaaa
Are there any performance tests of kresd in a KVM virtual machine? My KVM guest has 30GB of memory and a 70GB hard disk; when I set cache.size to 10GB, the kresd process crashed. How can I solve this?
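For reference, a minimal sketch of the cache setting being discussed (the MB/GB multipliers are constants provided by kresd; the 4 GB value below is an illustrative, more conservative choice, on the assumption that the cache file plus resident memory must fit comfortably within the host's limits):

```lua
-- kresd's cache is backed by an LMDB file in the working directory;
-- its size must fit the backing storage and memory with headroom
cache.size = 4 * GB
```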