    Kolin Korr
    @kolinkorr839
    Status      = failed
    Description = Failed due to progress deadline
    (1) Which variable/parameter controls the "Task not running by deadline" message in the worker allocation?
    (2) The worker actually started fine eventually, so the "Last Deployment" status seems misleading. Is there a way around this? We use this status to tell whether the deployment of the job succeeded or not.
    The deployment is considered unhealthy once you have crossed that deadline (work could be degraded, etc.).
    Kolin Korr
    @kolinkorr839
    thanks. I will play with this healthy_deadline parameter.
    Also, is there a way to change the last deployment status from failed to success manually without doing another deployment?
    Charlie Voiselle
    @angrycub
    @kolinkorr839 I wonder if this could do the trick. I've never tried before, tbh. https://www.nomadproject.io/api/deployments.html#set-allocation-health-in-deployment
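
A rough sketch of what a call to that endpoint could look like, assuming the request shape described on the linked page (a POST to /v1/deployment/allocation-health/:deployment_id with HealthyAllocationIDs and UnhealthyAllocationIDs). The helper name, the agent address, and the IDs are made up for illustration, and whether this flips an already failed deployment back to successful is untested here.

```python
import json
import urllib.request


def build_set_health_request(nomad_addr, deployment_id, healthy_ids, unhealthy_ids=()):
    """Build (but do not send) a 'set allocation health in deployment' request.

    Hypothetical helper: the endpoint path and field names follow the linked
    API page; the address and IDs passed in are placeholders.
    """
    payload = {
        "HealthyAllocationIDs": list(healthy_ids),
        "UnhealthyAllocationIDs": list(unhealthy_ids),
    }
    return urllib.request.Request(
        url=f"{nomad_addr}/v1/deployment/allocation-health/{deployment_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_set_health_request(
        "http://127.0.0.1:4646", "example-deployment-id", ["example-alloc-id"]
    )
    # Print instead of sending, so nothing is changed on a real cluster.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it; the sketch only prints it so it can be inspected first.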
    Kolin Korr
    @kolinkorr839
    thanks will try that
    Shantanu Gadgil
    @shantanugadgil
    The default 5 minute timeout for the docker pull has been a source of many a problem for me too. I just keep it at 10 minutes.
    Urjit Singh Bhatia
    @urjitbhatia
    Hi folks, What are you guys using for auto-scaling jobs+hardware with Nomad? Replicator or something in-house?
    Shantanu Gadgil
    @shantanugadgil
    @urjitbhatia I am keeping my eye on Sherpa https://github.com/jrasell/sherpa
    Other than that, I would recommend a simple script that polls whichever time-series DB your metrics are gathered in and makes custom decisions for scaling out the service and EC2.
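
A minimal sketch of the decision step such a polling script might use. Everything here is an assumption for illustration: the function name, the thresholds, the one-step-at-a-time policy, and the min/max bounds. Fetching the metric from the time-series DB and applying the new count to Nomad or EC2 are left out.

```python
def desired_count(current, metric, scale_out_above, scale_in_below, minimum=1, maximum=10):
    """Return the task count to aim for, given one metric sample.

    Hypothetical policy: step up by one when the metric is above the
    scale-out threshold, step down by one when it is below the scale-in
    threshold, otherwise hold. The result is clamped to [minimum, maximum].
    """
    if metric > scale_out_above:
        target = current + 1
    elif metric < scale_in_below:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    # Example: CPU utilisation as a fraction, thresholds picked arbitrarily.
    for sample in (0.95, 0.50, 0.05):
        print(sample, desired_count(3, sample, scale_out_above=0.8, scale_in_below=0.2))
```

Keeping the decision as a pure function makes it easy to test separately from the polling loop and from whatever performs the actual scaling.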
    Urjit Singh Bhatia
    @urjitbhatia
    Thanks, yeah I am using replicator currently and unfortunately it is quite buggy (+ not maintained anymore). Was looking for ideas and thought of asking the community here.
    Might go the route of the script with our custom metrics
    Václav Boch
    @vasekboch
    @urjitbhatia I'm looking at https://github.com/trivago/scalad, but have had no time to try it out.
    Farhad Shahbazi
    @Grauwolf_gitlab
    frank
    @franksquaretwo_twitter
    That's cool
    ping2balaji
    @ping2balaji
    Hi guys, can we get the IP address of a job's allocations using the HTTP API? I see the output of "nomad alloc status <allocid>" displaying the "Address" column, but the same is not available in the REST response for EVAL. Can anyone please help here?
    Charlie Voiselle
    @angrycub
    For a sample allocation of the example job:
    curl http://127.0.0.1:4646/v1/allocation/1ee54fa6-662d-f77e-3f41-253d9bce2f0d | jq '.AllocatedResources.Tasks.redis'
    Will get you real close to it. There is a little more info you'd want to consume in there, like the port list, etc
    But it's in the allocation api, not the evals one.
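
As a sketch of consuming that response, the helper below pulls address and port pairs out of an allocation document. It assumes the task entry carries a Networks list with IP, ReservedPorts and DynamicPorts (each port having Label and Value); check those field names against your own output, since they can differ between Nomad versions. The function name and the sample data are made up.

```python
def task_addresses(alloc, task):
    """Return (label, ip, port) tuples for one task of an allocation.

    `alloc` is the decoded JSON from /v1/allocation/<alloc_id>. The field
    names used here are an assumption about that response's layout.
    """
    networks = alloc["AllocatedResources"]["Tasks"][task].get("Networks") or []
    addresses = []
    for net in networks:
        ports = (net.get("ReservedPorts") or []) + (net.get("DynamicPorts") or [])
        for port in ports:
            addresses.append((port["Label"], net["IP"], port["Value"]))
    return addresses


if __name__ == "__main__":
    # Hand-written sample shaped like the assumed response, not real output.
    sample = {
        "AllocatedResources": {
            "Tasks": {
                "redis": {
                    "Networks": [
                        {
                            "IP": "10.0.0.5",
                            "ReservedPorts": None,
                            "DynamicPorts": [{"Label": "db", "Value": 23456}],
                        }
                    ]
                }
            }
        }
    }
    for label, ip, port in task_addresses(sample, "redis"):
        print(f"{label} -> {ip}:{port}")
```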
    Kolin Korr
    @kolinkorr839
    so I am running nomad node-drain -enable -self ... and sometimes it runs fast!
    but sometimes it is so slow... as if it is waiting for something
    and when I do ps aux | grep executor, I see that there are some nomad executors running which I assume the nomad drain is waiting for...
    Charlie Voiselle
    @angrycub
    Probably. It also will wait to make sure that allocations have restarted when you are using the defaults.
    like it depends on your parallelism
    Kolin Korr
    @kolinkorr839
    oh... I do have max_parallel = 3...
    Charlie Voiselle
    @angrycub
    That can make it slow down.
    While it waits for replacement allocations to start up
    Michael Aldridge
    @the-maldridge
    If I make the job version a consul value does that re-actively redeploy jobs if it changes?
    Aaron Hurt
    @leprechau
    Levant would trigger on that
    Michael Aldridge
    @the-maldridge
    levant would, but I'd need it running somewhere
    I want to just push a change to consul and have prod carry out my intent
    Shantanu Gadgil
    @shantanugadgil
    @the-maldridge how about a meta key used from inside Nomad's template syntax
    then the Nomad agent would restart the job/task, right?
    Michael Aldridge
    @the-maldridge
    What, just read the version and the meta tag from the same consul tag?
    I suppose that would work
    but it does seem a bit clunky
    ping2balaji
    @ping2balaji
    @angrycub thanks that helped
    Charlie Voiselle
    @angrycub
    Awesome! Glad that unblocked you.
    Daniel Santos
    @danlsgiga
    @the-maldridge, maybe a consul watch?
    Michael Aldridge
    @the-maldridge
    on the version itself?
    Daniel Santos
    @danlsgiga
    Is the version a k/v? If yes, a watch would do the trick for you
    In my case I have a special salt module that handles a single k/v with a json object and I update only a specific value inside the json block
    Works wonders
    Michael Aldridge
    @the-maldridge
    yes I'm wanting to have the version be a value at the end of the key
    Daniel Santos
    @danlsgiga
    And a consul watch triggers a deploy using nomad under the hood whenever the k/v is updated and the job file hash changes
    Since I run that in all my hashistack nodes I also use a consul lock to avoid multiple nodes pushing the same nomad job
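
A rough sketch of how a watch handler along those lines might be put together; this is not the salt module mentioned above. It assumes a key watch hands the handler the key's JSON on stdin with a base64-encoded Value, and that wrapping the deploy in "consul lock" is acceptable. The function names, the lock prefix, and the job path are hypothetical, and rendering the job file from the new value is left out.

```python
import base64
import hashlib
import json


def decode_watch_value(watch_json):
    """Decode the Value field of a key watch payload; None if the key is absent."""
    data = json.loads(watch_json)
    if data is None:
        return None
    return base64.b64decode(data["Value"]).decode("utf-8")


def job_hash(rendered_job):
    """Hash the rendered job file so unchanged jobs can be skipped."""
    return hashlib.sha256(rendered_job.encode("utf-8")).hexdigest()


def deploy_command(rendered_job, last_hash, job_path, lock_prefix):
    """Return the command to run, or None when the job file hash is unchanged.

    The command wraps 'nomad job run' in 'consul lock' so that only one node
    pushes the job when every node runs the same watch.
    """
    if job_hash(rendered_job) == last_hash:
        return None
    return ["consul", "lock", lock_prefix, "nomad", "job", "run", job_path]


if __name__ == "__main__":
    payload = json.dumps({"Key": "app/version", "Value": base64.b64encode(b"1.2.3").decode()})
    version = decode_watch_value(payload)
    rendered = f'job "app" {{ meta {{ version = "{version}" }} }}'
    # The command is only printed here; a real handler would pass it to subprocess.run.
    print(deploy_command(rendered, last_hash=None, job_path="app.nomad", lock_prefix="locks/app-deploy"))
```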