Cristian Măgherușan-Stanciu
@cristim
I just wanted to give people the chance to look over the code and maybe report if I'm missing anything in the logic
it's a big change and it's better if more people look over it
Johannes Tigges
@lenucksi
Sounds good, hopefully will find a bit of time to look over it
Cristian Măgherușan-Stanciu
@cristim
@/all testers wanted for the new event-based logic. The latest build for #354 can be installed using https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/template?stackName=AutoSpotting&templateURL=https://s3.amazonaws.com/cloudprowess/custom/template_build_1439509.yaml
Please let me know if you notice any issues.
Ken Cieszykowski
@bestpeppep_twitter
I haven't seen this explicitly written anywhere, but I'm currently doing some investigation into spot instances in EKS. There seem to be a couple of pods (and a termination handler pod) for spot transition and spot termination, but it also seems like AutoSpotting's new event handling can function with all of that. Does anybody use this with EKS clusters (we are solely using it with ECS), and could you let me know what other pods (if any) you use? I'm trying to figure out whether I want to launch new nodes as spot instances or let this handle everything.
Ken Cieszykowski
@bestpeppep_twitter
I feel like that in tandem with AutoSpotting would kick ass for an EKS cluster. The only thing I'm worried about is the situation where 10/10 nodes are all reaped; I don't see a mechanism to quickly spin up 10 on-demand instances and reschedule all of the pods the other way to guard against downtime.
Cristian Măgherușan
@magheru_san_twitter
That's why you should have multi-regional deployment where uptime really matters
We should also implement a diversification strategy
Ken Cieszykowski
@bestpeppep_twitter
Fair point, an ASG in 1a and 1b, with the cluster-autoscaler balancing between the AZs
hrmmm
Thank you
Cristian Măgherușan
@magheru_san_twitter
You're welcome 😁
Cristian Măgherușan
@magheru_san_twitter
I actually meant to have a second cluster in a different AWS region, not just spanning multiple AZs
threeio
@threeio
Anyone else seeing issues with the latest 354 build? I've been tempted to bring it to prod but would love any other feedback folks have seen
Cristian Măgherușan-Stanciu
@cristim
There's a race condition with the ASG which is apparently manifesting in unexpected churn when replacing multiple instances at once, but I couldn't reproduce it yet
Anyway we still have the current replacement logic in place and should resolve this soon
I would test it in a lower environment first but by all means please give it a try and provide feedback
I'm still working on handling the startup lifecycle hooks using CloudTrail events
Then I will look into this race condition and implement a fix by temporarily suspending the Autoscaling termination process
Cristian Măgherușan-Stanciu
@cristim
My current idea is to create a small DynamoDB table with TTL/expiring records for each processed ASG and keep postponing the TTL on each execution against a given ASG. Then handle the deletion event to resume the Autoscaling termination process once the TTL expires and the entry for a given ASG is deleted
But if anyone has a better idea I'm all ears 😁
I wish the Autoscaling API would allow setting such expiration when suspending the Autoscaling processes, so I wouldn't have to build it myself
I created a support case asking for this but I'm not going to wait for it to be done
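The TTL idea above can be sketched roughly as follows. This is a minimal in-memory illustration, not AutoSpotting code: the `SuspensionLeases` class and the `suspend`/`resume` callbacks are hypothetical names, and a dictionary stands in for the DynamoDB table. In a real setup the callbacks would presumably wrap the Auto Scaling `SuspendProcesses`/`ResumeProcesses` calls for the `Terminate` process. Note also that DynamoDB TTL deletions happen in the background and are not guaranteed to fire at the exact expiry time.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SuspensionLeases:
    """In-memory stand-in for the proposed table of expiring per-ASG records."""

    ttl_seconds: float
    suspend: Callable[[str], None]  # hypothetical: suspend termination for an ASG
    resume: Callable[[str], None]   # hypothetical: resume termination for an ASG
    expires_at: Dict[str, float] = field(default_factory=dict)

    def touch(self, asg_name: str, now: float) -> None:
        """Called on every execution against an ASG: suspend once, then keep
        postponing the expiry while executions continue."""
        if asg_name not in self.expires_at:
            self.suspend(asg_name)
        self.expires_at[asg_name] = now + self.ttl_seconds

    def sweep(self, now: float) -> List[str]:
        """Stand-in for the record-deletion event: drop expired records and
        resume the termination process for each of them."""
        expired = [name for name, deadline in self.expires_at.items()
                   if deadline <= now]
        for name in expired:
            del self.expires_at[name]
            self.resume(name)
        return expired
```

For example, with a 60 second TTL, calling `touch("web-asg", now=0)` and then `touch("web-asg", now=30)` suspends the group once and moves its expiry to 90, so a `sweep(now=70)` leaves it suspended and a `sweep(now=90)` resumes it.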
Tummala Dhanvi (c0mrad3)
@dhanvi
@cristim Thank you for AutoSpotting! The ability to cap the max cost of spot instances at the on-demand price, and the flexibility to switch between on-demand and spot, is a game-changer!
Cristian Măgherușan-Stanciu
@cristim
thank you @dhanvi
Ryan Rivest
@ryanriv
I'm trying AutoSpotting for the first time and hoping to use it with Elastic Beanstalk. It is bringing up the spot instances, but they seem to be coming up without my application deployed. We're using Windows servers. Any suggestions?
Cristian Măgherușan-Stanciu
@cristim
I'm not aware of anyone running Windows and Elastic Beanstalk so far, so you seem to be in uncharted territory
Cristian Măgherușan-Stanciu
@cristim
@/all I recently held an intro session on AutoSpotting development for some people interested in contributing. If you're also interested, feel free to have a look at https://autospotting.org/development-intro.mp4
it also explains in detail what I've been working on within the last month or two, hopefully to land into mainline within a few weeks
xlr-8
@xlr-8
Hello @cristim - I thought I remembered an issue regarding the replacement logic where the "health/stability of the AZ" should be taken into account
Does that ring a bell? If not, it might be worth opening one. For example, I can see that eu-west-1b is very stable today (at least for us), but the others are awful: some instances last less than 10 minutes, or even less than a minute
Cristian Măgherușan
@magheru_san_twitter
Thanks
I think we had something
Cristian Măgherușan-Stanciu
@cristim
The current implementation should eventually converge to stable instance types when failing to launch new instances due to insufficient capacity
Jinyu Pan
@swimablefish
Hi, how can AutoSpotting work with the Kubernetes cluster-autoscaler?
Cristian Măgherușan-Stanciu
@cristim
I am not using it on such a setup but I know many people who use it with their Kubernetes infrastructure and nobody reported any issues
Jakub Rutkowski
@rutkowskij
Hello @cristim, I've been using AutoSpotting for nearly 2 years. It is great, and my team and I love using it. After the recent changes related to CloudFormation and UserData, we rebuilt and installed the new version. Since then we have noticed different behavior when replacing on-demand instances with spot instances. Previously, a newly spawned spot instance was attached to the Auto Scaling group immediately. Now it is created but only attached after the grace period, and the on-demand instance is killed immediately (previously the on-demand instance was killed on the next AutoSpotting run).
I think the previous behavior was far more graceful for environments that receive traffic. Previously the spot instance had time to warm up gracefully next to other warm instances. Currently it is attached at the same moment the previous instance is killed, and it is pushed to handle as much load as a warmed-up instance does, which is sometimes risky. This behavior (late attachment to the Auto Scaling group) is very often the cause of problems when deleting a stack (for Elastic Beanstalk, for example), because the instance is not attached but still related to it, which causes a problem and requires manual intervention.
I'm of course not aware of how AutoSpotting is used by others, but in our case it would be great if the instance were attached to the ASG immediately. We use the ELB health check type, so it is recognized as healthy only if it really is. The grace period is of course also honored. As for killing the on-demand instance, I think it would be great if that were configurable, for example by the time elapsed since the last spot instance was launched, kept separate from the grace period parameter and executed by subsequent AutoSpotting runs.
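One possible reading of the proposal above is a separate, configurable delay that must pass after the spot launch before the on-demand instance is terminated, checked on each later run. The sketch below only illustrates that reading; `ReplacementState`, `should_terminate_on_demand` and `termination_delay_seconds` are hypothetical names, not existing AutoSpotting settings.

```python
from dataclasses import dataclass


@dataclass
class ReplacementState:
    """What a later run would need to know about a pending replacement."""

    spot_attached: bool               # spot instance already attached to the ASG
    spot_healthy: bool                # e.g. passing the ELB health check
    seconds_since_spot_launch: float  # time elapsed since the spot launch


def should_terminate_on_demand(state: ReplacementState,
                               termination_delay_seconds: float) -> bool:
    """Terminate the on-demand instance only once the spot instance is
    attached, healthy, and has been running for the configured delay.
    The delay is assumed to be independent of the ASG grace period."""
    return (
        state.spot_attached
        and state.spot_healthy
        and state.seconds_since_spot_launch >= termination_delay_seconds
    )
```

With a hypothetical delay of 600 seconds, a run that sees an attached, healthy spot instance launched 300 seconds ago would leave the on-demand instance alone, and a later run past the 600 second mark would terminate it.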
Cristian Măgherușan-Stanciu
@cristim
Thanks Jakub and I am happy to hear that you are happy with the project. I had some work in progress on making the instances attach immediately after launch but I recently switched jobs and I have very little time to work on Autospotting these days. Feel free to pick up from where I left it, I would love to see that work eventually land but I'm afraid I won't be able to make it anytime soon
Cristian Măgherușan-Stanciu
@cristim
If you are a Patreon backer, I can help you get started with this and iron out whatever issues we find, for an additional charge we can agree on, but I can't afford to offer pro-bono support in my very limited spare time
Jakub Rutkowski
@rutkowskij
Sure, I understand this. What about the plans? Do you plan to spend much less time extending and developing AutoSpotting yourself, or to abandon it? Are there any contributors who are ready to lead this project as you do?
Cristian Măgherușan-Stanciu
@cristim
I have no plans to abandon it, I put too much time and effort in it, but I can't afford to spend large chunks of time on large feature development. I will continue to add small features and maintain the project when someone else is contributing. I tried to make working on it a full-time job but I had very few paying users and it didn't work out so I need to keep a full time job and can only work on it on the side
But I would gladly expand my involvement if it starts to bring me more significant income. In its current state I can only afford a few hours per month considering my usual rates
Kapil Thangavelu
@kapilt
@cristim fwiw, you should also apply for credits for opensource projects, https://aws.amazon.com/blogs/opensource/aws-promotional-credits-open-source-projects/
Cristian Măgherușan-Stanciu
@cristim
Thanks Kapil, but AutoSpotting doesn't require much. There's very little infrastructure behind it and it's entirely serverless so it would cost me peanuts. For now I still have a bunch of credits received at various events, enough to cover my experiments for the next year or so. But thanks for the idea, I'll consider it once those run out.
Cristian Măgherușan-Stanciu
@cristim
To summarize, if you aren't happy with the rate of progress and would like to see me work more on Autospotting and other such tools (I've been working on more and also have a few ideas for enhancements and even new tools), please consider buying a support plan on Patreon. My open source activity will be proportional to your support.
Jakub Rutkowski
@rutkowskij
I'm sorry if you felt hurt by my question. I understand your point and really appreciate your effort and great work. I'm not sure I will try to change anything in the code (I don't know Go), but I will try to help with testing and analyzing cases.
Jakub Rutkowski
@rutkowskij
I've written up my thoughts about instance replacement in the hope of starting a discussion. I think my proposal needs relatively few changes compared to the event-based approach (though I haven't evaluated this)
Cristian Măgherușan-Stanciu
@cristim
Oops, I wanted to write more... As far as I understand the requirement is to get the spot instances launched and attached to the group as soon as possible after being launched to avoid the current issues. The event based execution should achieve this, unless the group has been configured with launch lifecycle hooks. In addition it would also immediately replace on demand instances with spot right after they were launched.
Cristian Măgherușan-Stanciu
@cristim
@/all I'm pleased to announce that a new stable version has just been released to Patreon supporters. It includes a number of features developed and matured over the last half a year and has a reduced monthly licensing cost of only $29/account/month. And another piece of news: the Open Source version of AutoSpotting has been relicensed from MIT to OSL-3.0 in order to ensure contributions back into the open source code base and to discourage closed source forks. If for whatever reason this license isn't suitable for anyone out there, I'm open to negotiating custom license terms on a case-by-case basis.