    Charlie Voiselle
    @angrycub
    well, it's more of a hate/begrudgingly acknowledge the power of
    At one point I had a branch where I'd added a pile of extra template functions, which made it less challenging, but it landed at a weird time
    I need to try and keep it up to date. So much rebase.
    Tommy Alatalo
    @tommyalatalo

    I keep getting this error when I run nomad job stop -purge for my fabio job, which gets dynamic consul credentials from vault:

     [WARN]  agent: Service deregistration blocked by ACLs: service=api-1936b59b9c1b-9997 accessorID=

    For some reason the accessorID ends up empty?

    Charlie Voiselle
    @angrycub
    Could you see if it feels like this GH issue? hashicorp/consul#7669
    Nah, sorry. I didn't read it well enough
    Florian Apolloner
    @apollo13
    uff, what is nomad trying to tell me now: Failed to start container a5d888a098d3143b83c48697393243f3833c8240f5b2bb583f478e1465488400: API error (500): error while creating mount source path '/srv/storage/wiki': mkdir /srv/storage: file exists
    isn't that the point of bind mounts?
    Charlie Voiselle
    @angrycub
    maybe you need to restart the docker daemon?
    That error is out in dockerland though. When you see errors formatted like that, those are coming directly back from the docker API itself.
    Michael Aldridge
    @the-maldridge
    @angrycub it would be really helpful if Nomad could preface these in some way like "docker daemon error:"
    Charlie Voiselle
    @angrycub
    Fair point. It would be good to highlight that edge.
    Daniel Durante
    @durango
    Is there any way for me to store a template in the root of a container, instead of in the local dir?
    Charlie Voiselle
    @angrycub
    you would have to mount it there, but mounting individual files has a known Docker pain point for files that are "atomically updated" by writing to a temp file and then renaming it over the original (which is exactly what the template library does)
    Michael Aldridge
    @the-maldridge
    you also just can't do that, because the template file needs to exist on the host before the container starts
    Charlie Voiselle
    @angrycub
    If you templated into local, you could mount that filepath into the container wherever you want. It's just that single-file mounts will never be updated by the template stanza. So you are at least bringing a folder to the party if you want in-container updates from Nomad's templating. Once you are bringing a folder, it may as well be one of the ones that Nomad automatically mounts for you, like local or alloc.
    I guess it depends on what you mean when you say store, @durango. Like @the-maldridge said... it's gotta live on the Nomad host while it's being written to be mounted into your workload. There's no way to bypass that part. If you're worried about its life on disk, you can write into the secrets dir (which is a tmpfs), which gets auto-mounted to your container too.
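    Rough sketch of what I mean (untested; the image name, paths, and Consul key are all made up):

        task "app" {
          driver = "docker"

          # render into local/, which Nomad auto-mounts at /local in the container;
          # re-renders stay visible because a directory (not a single file) is mounted
          template {
            data        = "key = {{ key \"app/config\" }}"   # hypothetical Consul key
            destination = "local/conf/app.conf"
          }

          config {
            image   = "your-image:latest"       # hypothetical image
            # or mount the whole folder somewhere custom inside the container
            volumes = ["local/conf:/etc/app"]   # hypothetical container path
          }
        }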
    Daniel Durante
    @durango
    Problem is, I can't really change the "config path" with spark :/
    Charlie Voiselle
    @angrycub
    It would seem like you could use the SPARK_CONF_DIR env var (https://spark.apache.org/docs/latest/configuration.html#overriding-configuration-directory). But I'll say that my Spark knowledge is all but non-existent.
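    Totally untested sketch, assuming a Docker-based Spark task (image name is made up):

        task "spark" {
          driver = "docker"

          # Nomad auto-mounts the task's local/ dir at /local inside the container
          template {
            data        = "spark.executor.memory 1g"   # placeholder conf line
            destination = "local/conf/spark-defaults.conf"
          }

          env {
            # point Spark at the templated directory instead of its baked-in default
            SPARK_CONF_DIR = "/local/conf"
          }

          config {
            image = "your-spark-image"   # hypothetical
          }
        }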
    Daniel Durante
    @durango
    @angrycub sweet, thanks! One more issue that I'm running into though is a CSI volume's writer allocation still stuck on that "lost" job (24 days now) -- anything I can do? :P (besides give it new volumes)
    Charlie Voiselle
    @angrycub
    Did the client that was running the job die? It's interesting that it's stuck on lost.
    Daniel Durante
    @durango
    @angrycub it did
    Charlie Voiselle
    @angrycub
    Is there any way to bring back that client in such a way that the servers can figure out that the client is back and the job isn't running?
    Daniel Durante
    @durango
    i don't remember the client details unfortunately (auto scaling / rotating IPs based on subnet)
    Charlie Voiselle
    @angrycub
    You might be able to look in nomad node status. I think that you would see a client stuck in the lost state there too
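    Something like this (from memory):

        nomad node status                     # scan for the old client stuck in a bad state
        nomad node status -verbose <node-id>  # full detail for that client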
    Florian Apolloner
    @apollo13
    @angrycub The mount point below went haywire (glusterfs :D)
    Paul
    @pauldon2
    Good day. Can I limit which tasks run on a client via its configuration?
    So that only one type of task is executed on a particular client.
    Florian Apolloner
    @apollo13
    @pauldon2 by "type" do you mean batch jobs, or something else?
    Charlie Voiselle
    @angrycub
    If you trust your operators, you can use node_class or client metadata to create values that you can use to filter jobs. If you have to enforce policy against unruly operators, your options are more limited without Enterprise.
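    Roughly like this (the class name is made up):

        # client agent config
        client {
          enabled    = true
          node_class = "payments"   # hypothetical class
        }

        # in the job file, pin work to that class
        constraint {
          attribute = "${node.class}"
          value     = "payments"
        }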
    Michael Aldridge
    @the-maldridge
    always assume unruly operators, if you don't you get people like me who come along and do weird things to your cluster
    Paul
    @pauldon2

    @pauldon2 by "type" do you mean batch jobs, or something else?

    service job

    Shantanu Gadgil
    @shantanugadgil
    @pauldon2 an easy way to limit would be to use a different named datacenter, so no user can say, "oh, but I didn't know ..." :grin:
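    something like this, i.e. jobs only land there if they name the DC explicitly (names made up):

        # agent config on the fenced-off clients
        datacenter = "dc-special"

        # the job file has to opt in
        job "example" {
          datacenters = ["dc-special"]
        }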
    Michael Aldridge
    @the-maldridge
    @shantanugadgil never underestimate the user. I have found stuff deployed to the wrong DC and oblivious users who didn't even know what they were copy pasting
    Shantanu Gadgil
    @shantanugadgil
    :smile:
    Paul
    @pauldon2
    @shantanugadgil hmm - this is not exactly what I want
    gioxoay
    @gioxoay
    Hello, I'm using Nomad for the first time and I have an issue setting up a cluster. The master node has a private IP of 10.0.0.10, but after installing Docker the machine gained another private IP, 172.17.0.1, and Nomad picked that one as the host IP. Now when I add Nomad clients, they can't connect to the server node. What am I doing wrong? Anybody please help!
    andrekzn
    @andrekzn
    hi nomaders! has anyone tried running a nomad cluster on RHEL/CentOS 8? are there any gotchas you encountered?
    @gioxoay hi, check this out: https://learn.hashicorp.com/nomad/operating-nomad/clustering , there are clear steps about clustering there; also read the docs for a more detailed explanation of the low-level cluster config
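    in your case it sounds like nomad fingerprinted the docker0 address (172.17.0.1) instead of your real one; you can pin it in the agent config, roughly like this (interface name assumed):

        # advertise the real private IP from your question
        bind_addr = "10.0.0.10"

        advertise {
          http = "10.0.0.10"
          rpc  = "10.0.0.10"
          serf = "10.0.0.10"
        }

        # or pin the interface instead
        client {
          network_interface = "eth0"   # hypothetical interface name
        }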
    Florian Apolloner
    @apollo13
    @pauldon2 there is an option to limit clients to execute only batch or service tasks (can't find it right now though). If you need more than that you can do what @angrycub said.
    @andrekzn selinux will probably hate you, also docker is not supported on EL 8
    Florian Apolloner
    @apollo13
    @andrekzn to be clear, generally nomad will run just fine on RHEL -- I personally don't think it is the best idea to run it on EL 8 though (my other boxes are indeed EL, but nomad is on debian for various reasons)
    Robert Edström
    @Legogris
    Anyone else had issues with Vault token renewal in recent releases? We had it working fine in 0.11.0; after an upgrade to 0.11.2 (unclear yet if directly related), nomad didn't successfully renew the vault token anymore (vault v1.4.1), which eventually led to the token expiring and the nomad/vault integration breaking.
    Tommy Alatalo
    @tommyalatalo
    Has anyone set up rabbitmq with consul peer discovery as a nomad job? I'm having issues getting the cluster to bootstrap, with this kind of error: [warning] <0.267.0> Could not auto-cluster with node rabbit@10.17.1.159: {badrpc,nodedown} -- which is strange, since it's not supposed to use IP addresses as far as I can tell
    Tommy Alatalo
    @tommyalatalo
    I also keep getting fabio health checks failing with the message "TTL expired", does anyone know anything about this? The instance restarts and then works again, but it's obviously not nice that it keeps restarting every now and then for some odd reason
    Charlie Voiselle
    @angrycub
    @Legogris... maybe you are running into this? hashicorp/nomad#7968 It sounds like what you are describing.
    Juan Carlos Alonso
    @jcalonso

    Hello! Yesterday I was following this guide: https://learn.hashicorp.com/nomad/stateful-workloads/csi-volumes
    I have almost everything running: EBS volume created, Nomad EBS controller and node plugins running, and the volume registered in Nomad. But when I try to run the MySQL job I get the following error:

    failed to setup alloc: pre-run hook "csi_hook" failed: claim volumes: rpc error: controller publish: attach volume: controller attach volume: rpc error: code = NotFound desc = Instance "i-xxxxxxxx" not found

    Any ideas what could be wrong?

    Is the NotFound referring to the EBS volume?
    Charlie Voiselle
    @angrycub
    I think that that is the EC2 instance ID. I'd double-check the status of the CSI Node plugin and make sure that it is running on all of your cluster nodes, and specifically check it on the Nomad client running on the instance mentioned in the error.
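    e.g. (plugin id is whatever you registered; from memory):

        nomad plugin status -type=csi      # list CSI plugins and their healthy counts
        nomad plugin status <plugin-id>    # detail for one plugin, incl. controller/node health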