Winfried Angele
@paddg_gitlab
Ah, I see, @vcunat, very good, thanks :-)
Vladimír Čunát
@vcunat
:-)
Jan Krajdl
@spamik

Hi,
We're using Knot Resolver as the main DNS resolver in our company. We have an MS AD domain (company.local). AD also runs a DNS server, so company.local is also a DNS zone. In Knot Resolver I have a STUB policy that forwards the company.local zone to the AD servers. It works fine... for a few days :-) Sometimes (currently 2-3 times per week) it stops working. When I try to resolve a hostname with dig I get no result: https://pastebin.com/7MFHgNXR

If I turn on verbose logging I get this when trying to resolve that name: https://pastebin.com/MUbjQxvH
It also seems that during this time all queries to the company.local zone fail with the same error. When I clear the cache via kresc, it starts working again... for a few days, until it happens again.

Any idea what could be happening, or how to debug it further? Previously we had BIND in a similar configuration without this issue.

Petr Špaček
@pspacek
Vladimír Čunát
@vcunat
I was about to post exactly that link.
Petr Špaček
@pspacek
Most importantly, it is a very bad idea to use "fake" domain names. It would be much easier to use something like "internal.company.cz", where "company.cz" is a real name delegated from the parent domain. If you really must, you can use the configuration described at https://knot-resolver.readthedocs.io/en/v5.1.2/modules-policy.html#replacing-part-of-the-dns-tree but again, expect various sorts of problems: such a setup is architecturally broken, so all of these are just workarounds.
Jan Krajdl
@spamik

Well, isn't that what I already have? In my configuration I have exactly this:
    company_mng_tree = policy.todnames({
        'mng.company.com.',
        '16.172.in-addr.arpa.',
    })
    company_ad_tree = policy.todnames({
        'company.local.',
    })
    policy.add(policy.suffix(policy.STUB({'172.16.5.52', '172.16.5.53'}), company_mng_tree))
    policy.add(policy.suffix(policy.STUB({'172.16.5.32', '172.16.5.33'}), company_ad_tree))

(I just did s/realcompanyname/company; the rest is copy/paste.) I thought this was the right way to do it :-) Regarding fake domain names: as you can see, I use mng.company.com for that. Unfortunately, we also have some Microsoft mess running historically on the company.local domain, and it would probably be really hard to change it...

(As for faking... well, it's pretty much the same, but I don't want internal names to be resolvable from the outside world.)
Robert Šefr
@robcza
@spamik start the config with the following: policy.add(policy.suffix(policy.FLAGS('NO_CACHE'), {todname('local')}))
This way you avoid the situation where the non-existence of the local TLD gets cached.
Overall, my suggestion is not to cache any of the internal zones. Latency is not an issue internally, and you want all changes to propagate as soon as they are made.
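A minimal sketch of a kresd configuration that applies this advice to the policy rules posted above, assuming Knot Resolver 5.x and the `policy.FLAGS` / `policy.STUB` pattern from the "replacing part of the DNS tree" documentation. Zone names and addresses are the anonymized ones from this thread. The key design choice is rule order: FLAGS is a chain action that lets later rules run, while STUB is not a chain action and ends policy processing, so the NO_CACHE rules are added first. This is a config fragment loaded by kresd, not a standalone Lua script.

```lua
-- Sketch of a kresd 5.x config (assumption: policy module loaded, as by default).
-- Zone names and IPs are the anonymized values from this thread.
company_mng_tree = policy.todnames({
    'mng.company.com.',
    '16.172.in-addr.arpa.',
})
company_ad_tree = policy.todnames({
    'company.local.',
})

-- 1) Chain actions first: disable caching for the internal trees.
--    The 'local.' suffix also covers company.local.
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), {todname('local.')}))
policy.add(policy.suffix(policy.FLAGS({'NO_CACHE'}), company_mng_tree))

-- 2) Non-chain actions last: STUB stops further policy evaluation.
policy.add(policy.suffix(policy.STUB({'172.16.5.52', '172.16.5.53'}), company_mng_tree))
policy.add(policy.suffix(policy.STUB({'172.16.5.32', '172.16.5.33'}), company_ad_tree))
```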
Vladimír Čunát
@vcunat
:+1:
Jan Krajdl
@spamik
OK, thanks, I'll do it :-) Just out of curiosity: how long is the non-existence of a domain kept in the cache? (I left it in this state for several hours and it still wasn't working :-) )
Vladimír Čunát
@vcunat
The TTL of the root NSEC records is a full day.
Jan Krajdl
@spamik
OK, that explains it. Thanks a lot. I'll disable the cache and hopefully it won't happen again :-)
Robert Šefr
@robcza
@spamik it will be stable and reliable. The setup you have is a good one, with no AD issues as far as I am aware.
On top of this, if your clients are configured to use kres as their primary resolver, you shield the domain controllers against vulnerabilities such as SIGRed (CVE-2020-1350).
Jan Krajdl
@spamik
@robcza yeah, it's configured that way. I'm trying to use MS things as little as possible, so clients use only kres as their resolver, external DHCP and so on :-)
beckhamaaa
@beckhamaaa
Are there any performance tests for KVM virtual machines? My KVM guest has 30 GB of memory and a 70 GB disk. When I set the kresd cache.size to 10 GB, the kresd process crashed. How can I solve it?
beckhamaaa
@beckhamaaa
@vcunat
Petr Špaček
@pspacek
Interesting. Please use gdb or coredumpctl to obtain a backtrace and post it here.
beckhamaaa
@beckhamaaa
OK, I'll do it.
Fred
@Fred81_gitlab
The Debian 10 (stable) packages actually require dependencies that are only present in Debian testing.
For example, libknot10 has this unsatisfied dependency: libknot10 : Depends: libc6 (>= 2.29) but 2.28-10 is to be installed
Oh, maybe I'm wrong, and it's trying to install the packages with the same version number from testing...
Fred
@Fred81_gitlab
Yep, indeed, it's not a problem; everything is actually fine. Sorry for the noise.
Fred
@Fred81_gitlab
It would, however, be better if the packages in this repo used a distinct version number, something like the official Debian backports do. apt also seems confused by having different packages with exactly the same version number available. Now it says that I installed lua-cqueues from bullseye/sid and that it will downgrade to the version in the Knot stable repository (because I have pinned that one higher). I do so, and the next apt-get upgrade says exactly the same thing.
There is no way for me to fix it. Even completely removing lua-cqueues and reinstalling it leads to the same loop, where every apt-get upgrade says it will downgrade lua-cqueues.
Héctor Molinero Fernández
@hectorm

Hi, I'm building Knot Resolver 5.1.2 on a Raspberry Pi 4 with Ubuntu 20.04 (arm64), and when I run kresd I get the following error:

PANIC: unprotected error in call to Lua API (bad light userdata pointer)

This is a known issue with LuaJIT on arm64: https://gitlab.nic.cz/knot/knot-resolver/issues/216

What puzzles me is that with the official Knot Resolver package this problem does not occur and kresd works fine, so I was wondering why it behaves differently.

Vladimír Čunát
@vcunat
I believe we've patched kresd so that it doesn't suffer from this, so the error normally comes from LuaJIT packages that haven't been fixed yet.
lua-cqueues is the typical one in our case, as it's loaded by default if found.
Héctor Molinero Fernández
@hectorm

Does cqueues suffer from this problem in its version 20200726.51-0?

I tried installing all Lua packages from LuaRocks instead of the Ubuntu repositories, but I encountered the same error.

I'm not familiar with the Knot Resolver code or LuaJIT, but I've seen this line in the code and was wondering if it could be related.
Vladimír Čunát
@vcunat

It is related. We did

    /** Static to work around lua_pushlightuserdata() limitations.
     * TODO: convert to a proper singleton like worker, most likely. */
    static struct engine engine;

just because of that line and that error.

But that was many months ago.
The best test for cqueues is on that ticket:

    luajit -l cqueues -e 'os.exit(0)'

So far I'm not sure why it's causing problems on *.deb systems.
I don't really have access to an aarch64 machine where I can test deb packages well.
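The cqueues one-liner tests a single module. A minimal Lua sketch extending the same idea to several modules, assuming it is run with the same LuaJIT build that kresd uses; the module list is a hypothetical example. Each `require` is wrapped in `pcall`, so a load-time error such as "bad light userdata pointer" is reported as a message instead of aborting the interpreter, assuming the error is raised as an ordinary Lua error. A clean run only clears the listed modules; it does not rule out the problem lying elsewhere, for example in kresd itself.

```lua
-- Hypothetical probe script: try to load each listed Lua module and
-- report which ones fail. Run it with the LuaJIT build used by kresd.
local function probe(modules)
  local results = {}
  for _, name in ipairs(modules) do
    -- pcall converts a load error into a return value, so one failing
    -- module does not stop the remaining checks.
    local ok, err = pcall(require, name)
    results[#results + 1] = {
      name = name,
      ok = ok,
      err = (not ok) and tostring(err) or nil,
    }
  end
  return results
end

-- Example module list (an assumption; adjust to your installation).
local modules = { 'cqueues', 'http' }

for _, r in ipairs(probe(modules)) do
  if r.ok then
    print(('%-10s OK'):format(r.name))
  else
    print(('%-10s FAILED: %s'):format(r.name, r.err))
  end
end
```

Run it as `luajit probe.lua` (the file name is arbitrary).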
Héctor Molinero Fernández
@hectorm
The luajit -l cqueues -e 'os.exit(0)' command works properly with moonjit 2.1.2 and cqueues 20200726.51-0. Could you tell me how to find out which other LuaJIT package might be causing this crash?
Héctor Molinero Fernández
@hectorm
Never mind: even with no LuaJIT packages installed on my system, I get the same error when I run kresd. I'll need more tests to find out what's going on.
Vladimír Čunát
@vcunat
Hmm, that is interesting.
Héctor Molinero Fernández
@hectorm

To reproduce the problem reliably, I've written down the build steps in Dockerfiles, and the conclusion is that the crash occurs on Ubuntu 20.04, Debian Buster, Debian Sid and Alpine 3. It works correctly on Fedora 32, CentOS 7 and openSUSE Tumbleweed.

I get the same results with the latest commit of LuaJIT's v2.1 branch (570e758) and with Moonjit 2.1.2.

I'm not using any Lua package and the Docker images have been built directly on a Raspberry Pi 4 (8 GB) with a 64-bit kernel.

Héctor Molinero Fernández
@hectorm
@vcunat If you think it might be helpful, I can post my results on the issue you linked above.
Vladimír Čunát
@vcunat
Yes. That does sound useful. Thanks!
Héctor Molinero Fernández
@hectorm
Furthermore, if you don't have the required hardware, I can provide SSH access to my Raspberry Pi to help you track down this issue.
Vladimír Čunát
@vcunat
I'll see; maybe we can manage to test this reasonably easily. Otherwise I'll send you a private message here.
mrvne
@mrvne
Hello guys, any idea what this error message could mean? I see it quite often.
kresd[32410]: ERROR: udp sendmmsg() sent -1 / 1; Operation not permitted
Vladimír Čunát
@vcunat
It's about sending a UDP answer.
I think I've seen EPERM on systems that explicitly disable IPv6 (while you try to use it).
mrvne
@mrvne
I am using IPv6 and it's doing its job fine. So should I be worried about this error?