Cristian Măgherușan-Stanciu @magheru_san
@cristim
xlr-8
@xlr-8
I believe that's not quite correct; I've been playing with the tags on other projects and I haven't had any issues.
I'm speaking about adding filtering not about cost allocation tags
Cristian Măgherușan-Stanciu @magheru_san
@cristim
yes, I did that and it returned cost zero for a tag that wasn't propagated to the billing data, and non-zero for another one that was enabled
xlr-8
@xlr-8
hum :-/ - I'm still doubtful
If you can get me a read-only account running AutoSpotting, I could try checking it further
Cristian Măgherușan-Stanciu @magheru_san
@cristim
unfortunately I can't do that, but no worries, it's okay
xlr-8
@xlr-8
You could play with it on the cost explorer console, to see
Cristian Măgherușan-Stanciu @magheru_san
@cristim
I think it's going to be tricky to calculate anyway: for older groups we'd need to look into the launch configuration, get the configured type, and then compare that with the spot instances we're running
we can do it for the current groups and estimate an hourly savings rate based on the current configuration, even without using this Cost Explorer API
this tool is really valuable for historical savings, but it requires those tags to be propagated to the billing console, which we can't guarantee, and also requires running AutoSpotting with the instance selection logic restricted to the same instance type as configured in the LaunchConfiguration
but that's doable if people want this sort of data to be precise
xlr-8
@xlr-8
I think the code I wrote is mostly for day-to-day calculation
as the price of spot instances fluctuates over time
Cristian Măgherușan-Stanciu @magheru_san
@cristim
the fluctuations are small nowadays, so this will be relatively precise
but I guess I'll just log the current hourly savings, and then we can graph that and make a total over the last 30 days or a projection for the next month as well
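The hourly-savings estimate discussed above could be sketched roughly like this; the instance types and prices below are made-up placeholders, and a real implementation would pull them from the EC2 pricing and spot price history APIs rather than hard-coded maps:

```go
package main

import "fmt"

// Hypothetical hourly prices in USD. The real values would come from
// the EC2 pricing data and spot price history, not literal maps.
var onDemandPrice = map[string]float64{
	"m5.large": 0.096,
}

var spotPrice = map[string]float64{
	"m5.large":  0.035,
	"m5a.large": 0.033,
}

// hourlySavings estimates the current savings rate for a group: what
// running this many instances of the configured on-demand type would
// cost, minus what the spot instances currently running actually cost.
func hourlySavings(configuredType string, runningSpotTypes []string) float64 {
	onDemandCost := onDemandPrice[configuredType] * float64(len(runningSpotTypes))
	spotCost := 0.0
	for _, t := range runningSpotTypes {
		spotCost += spotPrice[t]
	}
	return onDemandCost - spotCost
}

func main() {
	s := hourlySavings("m5.large", []string{"m5.large", "m5a.large"})
	fmt.Printf("estimated savings: $%.3f/hour\n", s)
}
```

Logging this figure on each run would give the data points needed for the 30-day totals and next-month projections mentioned above.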
xlr-8
@xlr-8
Yes, that was my idea at first, as you can't really know for sure otherwise
But it will always be an estimate, since you can't get data for the current day itself, IIRC
Cristian Măgherușan-Stanciu @magheru_san
@cristim
Yes, and that's okay
Cristian Măgherușan-Stanciu @magheru_san
@cristim
@/all I've been recently working on improving the LaunchTemplate support. Please try the latest build, or if that doesn't work also have a look at PR #352
also, the latest build shouldn't interfere with the AWS native mixed ASGs anymore
Cristian Măgherușan-Stanciu @magheru_san
@cristim
there were a few issues left even after merging #352, but I managed to address most of them, so LaunchTemplates should now work in most cases. Let me know if you still notice any issues
Cristian Măgherușan-Stanciu @magheru_san
@cristim
@/all I just published a draft PR for a fairly substantial refactoring I'm working on at the moment, feel free to check it out at AutoSpotting/AutoSpotting#354
threeio
@threeio
Looking forward to the new PR implementation... using pending as a trigger point will help me with a bunch of race conditions that occur on our side :)
Cristian Măgherușan-Stanciu @magheru_san
@cristim
thanks @threeio, hopefully I'll have something running within the next few days, stay tuned
I just wanted to give people the chance to look over the code and maybe report if I'm missing anything in the logic
it's a big change and it's better if more people look over it
Johannes Tigges
@lenucksi
Sounds good, hopefully will find a bit of time to look over it
Cristian Măgherușan-Stanciu @magheru_san
@cristim
@/all testers wanted for the new event-based logic, the latest build for #354 can be installed using https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/template?stackName=AutoSpotting&templateURL=https://s3.amazonaws.com/cloudprowess/custom/template_build_1439509.yaml please let me know if you notice any issues
Ken Cieszykowski
@bestpeppep_twitter
I haven't seen this explicitly written anywhere; but I'm currently doing some investigation for spot instances in EKS. There seem to be a couple of pods (and a termination handler pod) for spot transition and spot termination-- but it also seems like Autospotting's new event handling can function with all of that. Does anybody use this with EKS clusters (we are solely using it with ECS), and could you let me know what other pods (if any) you use? Trying to figure out if I want to launch new nodes as SI's or let this handle everything.
Ken Cieszykowski
@bestpeppep_twitter
I feel like that in tandem with AutoSpotting would kick ass for an EKS cluster. The only thing I'm worried about is the situation where 10/10 nodes are all reaped; I don't see a mechanism to quickly spin up 10 on-demand instances and reschedule all of the pods the other way to guard against downtime.
Cristian Măgherușan
@magheru_san_twitter
That's why you should have multi-regional deployment where uptime really matters
We should also implement a diversification strategy
Ken Cieszykowski
@bestpeppep_twitter
Fair point-- an ASG in 1a and 1b, with the cluster-autoscaler balancing between the AZ's
hrmmm
Thank you
Cristian Măgherușan
@magheru_san_twitter
You're welcome 😁
Cristian Măgherușan
@magheru_san_twitter
I actually meant to have a second cluster in a different AWS region, not just spanning multiple AZs
threeio
@threeio
Anyone else seeing issues with the latest 354 build? I've been tempted to bring it to prod but would love any other feedback folks have seen
Cristian Măgherușan-Stanciu @magheru_san
@cristim
There's a race condition with the ASG which apparently manifests as unexpected churn when replacing multiple instances at once, but I couldn't reproduce it yet
Anyway, we still have the current replacement logic in place and should resolve this soon
I would test it in a lower environment first but by all means please give it a try and provide feedback
I'm still working on handling the startup lifecycle hooks using CloudTrail events
Then I will look into this race condition and implement a fix by temporarily suspending the AutoScaling termination process
Cristian Măgherușan-Stanciu @magheru_san
@cristim
My current idea is to create a small DynamoDB table with TTL/expiring records for each processed ASG, and keep postponing the TTL on each execution against a given ASG. Then handle the deletion event to resume the AutoScaling termination process once the TTL expires and the entry for that ASG is deleted
But if anyone has a better idea I'm all ears 😁
I wish the AutoScaling API would allow setting such an expiration when suspending the AutoScaling processes, so I wouldn't have to build it myself
I created a support case asking for this but I'm not going to wait for it to be done