Ty
@karock
so deploy from the feature branch stackset and then replace the lambda .zip archive with a build from master?
Ty
@karock
our devops guy reports:
OK getting the below error when trying to create the stack
Nested Stack is not supported in SERVICE_MANAGED permission model
Cristian Măgherușan-Stanciu
@cristim
Okay, thanks!
Cristian Măgherușan-Stanciu
@cristim
I'll try to implement the nested stack resources inline, hope it's not going to break later because of the nested StackSet, we're in uncharted territory here
Cristian Măgherușan-Stanciu
@cristim
@karock I replaced the nested stacks with inline resources and it seems to work on a test Org-managed StackSet, it's just very slow to roll out
Cristian Măgherușan-Stanciu
@cristim
it's much faster with maximum failure percentage set to 50%, so I'm using that now
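(For context on the failure percentage setting, here is a minimal Go sketch, using aws-sdk-go, of how a 50% failure tolerance can be passed as StackSet operation preferences when creating stack instances; the StackSet name, regions and OU ID are placeholders rather than values from the actual AutoSpotting template.)

```go
// Minimal sketch, not AutoSpotting's deployment code: create stack instances
// for a service-managed StackSet, tolerating failures in up to 50% of the
// targets per region. A higher failure tolerance also lets CloudFormation
// act on more targets concurrently, which speeds up the rollout.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudformation"
)

func main() {
	cfn := cloudformation.New(session.Must(session.NewSession()))

	out, err := cfn.CreateStackInstances(&cloudformation.CreateStackInstancesInput{
		StackSetName: aws.String("autospotting-regional"), // placeholder name
		Regions:      []*string{aws.String("us-east-1"), aws.String("eu-west-1")},
		DeploymentTargets: &cloudformation.DeploymentTargets{
			OrganizationalUnitIds: []*string{aws.String("ou-xxxx-xxxxxxxx")}, // placeholder OU
		},
		OperationPreferences: &cloudformation.StackSetOperationPreferences{
			FailureTolerancePercentage: aws.Int64(50),  // the "maximum failure percentage"
			MaxConcurrentPercentage:    aws.Int64(100), // deploy to as many targets as allowed
		},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("started StackSet operation:", aws.StringValue(out.OperationId))
}
```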
Cristian Măgherușan-Stanciu
@cristim
@/all Happy to announce that AutoSpotting now automatically handles the Spot Capacity Rebalancing events, to proactively drain workloads from terminating spot instances, without any required configuration changes.
Huge thanks to @gjmveloso from AWS for this contribution!
As usual with the non-stable builds, this is experimental code, so please let us know if you notice any issues.
mello7tre
@mello7tre
why do we create another events rule?
We can simply use the current one for termination by adding another match pattern to "detail-type:"
"detail-type:" is a list, and the event is matched if any of the elements match
this way we can simplify the template a lot
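(A minimal Go sketch of the single-rule approach suggested above, using aws-sdk-go's CloudWatch Events API; the rule name is hypothetical and the pattern is illustrative rather than copied from the AutoSpotting template.)

```go
// Minimal sketch: one events rule whose "detail-type" is a list, so it matches
// both spot interruption warnings and the new rebalance recommendations,
// instead of creating a second rule.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/cloudwatchevents"
)

func main() {
	// An event matches if any element of the "detail-type" list matches.
	pattern := `{
	  "source": ["aws.ec2"],
	  "detail-type": [
	    "EC2 Spot Instance Interruption Warning",
	    "EC2 Instance Rebalance Recommendation"
	  ]
	}`

	svc := cloudwatchevents.New(session.Must(session.NewSession()))
	_, err := svc.PutRule(&cloudwatchevents.PutRuleInput{
		Name:         aws.String("autospotting-spot-lifecycle"), // hypothetical rule name
		EventPattern: aws.String(pattern),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("rule created/updated")
}
```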
Cristian Măgherușan-Stanciu
@cristim
@mello7tre patches welcome :-)
I'm actually working on CloudFormation simplifications as part of the StackSet change, so I may take care of this myself. Here are my plans for the next few days:
  • extract the Manage ASG Python code into a separate file that could also be reused from Terraform
  • look into using SAM for the CloudFormation stack, which should be able to also use this file
  • update the Terraform code to match what we currently have in CloudFormation, leveraging recent TF features for creating the regional resources
Cristian Măgherușan-Stanciu
@cristim
@/all if anyone is bored over the weekend, I'd like to have a review of my recent CloudFormation stack code changes, which greatly simplify the template code by removing most conditionals and making use of StackSets for deploying the regional resources instead of a custom resource backed by Lambda: https://github.com/AutoSpotting/AutoSpotting/pull/449/files For even more review fun, there are also a few code style changes on the Go code and a bunch of deletions of unused code :-)
next week I'll work on porting all this stuff to Terraform, now that I understand again what's really going on there :-)
Aarat Nathwani
@aaratn
Hey!! AutoSpotting looks really awesome! I was going through https://rancher.com/reducing-aws-spend and was wondering if Rancher still supports AutoSpotting?
Cristian Măgherușan-Stanciu
@cristim
thanks @aaratn, I haven't been in touch with the people from Rancher for a while now, but last time we talked they had rearchitected their product and as far as I know they no longer use it
but don't let that stop you from giving it a try, AutoSpotting has improved a great deal since that blog post
Aarat Nathwani
@aaratn
I totally agree on the improvements!! Cluster autoscaler really looks awesome, I will give it a shot for my EC2 workloads. However, I was looking for a solution that works with Rancher and thought I'd ask here!
Cristian Măgherușan-Stanciu
@cristim
AutoSpotting supports anything that runs on ASGs that you can tag, I guess that also applies to Rancher
Aarat Nathwani
@aaratn
Yeah, maybe the k8s cluster autoscaler + AutoSpotting could be a solution
Cristian Măgherușan-Stanciu
@cristim
the current version hooks into the instance launch events and as soon as you run a new on-demand instance in an ASG it gets replaced with a spot clone
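(A hypothetical Go sketch of how such an event-driven Lambda handler might branch on the incoming detail-type; the detail-types and handler structure here are assumptions for illustration, not AutoSpotting's actual code.)

```go
// Hypothetical sketch of an event-driven handler: instance launch events
// trigger the on-demand -> spot replacement, while spot lifecycle events
// trigger draining of the instance that is about to be reclaimed.
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

func handler(ctx context.Context, event events.CloudWatchEvent) error {
	switch event.DetailType {
	case "EC2 Instance State-change Notification":
		// Assumed launch hook: a new on-demand instance in a tagged ASG
		// would be replaced with a cheaper spot "clone" here.
		log.Printf("instance state change in %s: %s", event.Region, string(event.Detail))
	case "EC2 Spot Instance Interruption Warning",
		"EC2 Instance Rebalance Recommendation":
		// Proactively drain workloads off the terminating spot instance.
		log.Printf("spot lifecycle event: %s", event.DetailType)
	default:
		log.Printf("ignoring event of type %q", event.DetailType)
	}
	return nil
}

func main() { lambda.Start(handler) }
```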
Aarat Nathwani
@aaratn
Okay
Do you know if there's a tested use case of autospotting + k8s autoscaler ?
Cristian Măgherușan-Stanciu
@cristim
I haven't tested it myself, but by all means try it and let me know if you notice any issues
Aarat Nathwani
@aaratn
sure ! I will give it a shot !
Cristian Măgherușan-Stanciu
@cristim
I know of people using it with K8s, just not sure if they used the k8s autoscaler
Aarat Nathwani
@aaratn
Okay. I can totally think of a solution if we have EKS + AWS ASGs, which could come in handy
And I will test it on a project that is using it; I am sure that will help us bring down some $$s
Cristian Măgherușan-Stanciu
@cristim
cool, thanks and please keep me posted of your results
Aarat Nathwani
@aaratn
Sure thing !
Cristian Măgherușan-Stanciu
@cristim
just keep in mind that for now the Terraform installation method is likely broken
so I'd recommend using CloudFormation until that's fixed, unless you want to figure it out and fix it yourself
Cristian Măgherușan-Stanciu
@cristim
@aaratn ^^^^
Aarat Nathwani
@aaratn

Thanks for the heads-up, I just checked this with CloudFormation and it looks like there's something wrong with the way it estimates the billing savings. I am seeing the below log messages:

SC:2020-11-26T05:33:00 2020/11/26 05:33:33 autoscaling.go:247: ap-south-1 eks-worker-asg-dev Skipping group, license limit reached: would reach estimated monthly savings of $2643.70 when processing this group, above the $1000 evaluation limit

I tested this on an ASG which was costing me $100/mo running 2x c5.large on-demand instances
Also, I kept the desired/min/max capacity at 2 instances on the ASG; however, I am seeing 4 instances created, out of which 2 are in the ASG and 2 are not
Cristian Măgherușan-Stanciu
@cristim
those outside the ASG should eventually be swapped with those from the ASG, just give it some time. Regarding the savings limit, I'm contemplating removing it soon and bringing back the no-limit-but-expires-within-30-or-60-days model that I had in the past, as it's too much hassle to maintain and update this savings computation logic to work with the new event-based execution mode
Cristian Măgherușan-Stanciu
@cristim
with a fresh group or new instances you shouldn't run into any limits, as the limit is only checked in the legacy cron mode
Cristian Măgherușan-Stanciu
@cristim
@/all reminder that the second Ask Me Anything Zoom call about AutoSpotting is scheduled in about 45 min, everyone is invited and can join at https://www.linkedin.com/events/askmeanythingzoomcallaboutautos6727895500017823744/
Cristian Măgherușan-Stanciu
@cristim
@/all I've just pushed a draft PR (AutoSpotting/terraform-aws-autospotting#37) that updates the Terraform code to match the latest CloudFormation template, please have a look and provide feedback. I successfully applied it but didn't yet test if AutoSpotting actually works with it. Testing it and any sort of feedback about it would be more than welcome
Cristian Măgherușan-Stanciu
@cristim
@/all the Terraform changes have been merged and seem to work fine after a few iterations. They're still experimental though, please let me know if you notice any issues
@aaratn try the latest build, I removed the savings limits logic and replaced it with the expiration logic we had in the past
@aaratn in the latest version of the Terraform module I just released there's now also experimental Terraform-native support for all the features available in CloudFormation, you may want to give that a try
Cristian Măgherușan-Stanciu
@cristim
@/all would anyone be interested in automatic provisioning of GP3 and IO2 volumes (where supported) instead of GP2 and IO1 respectively on the spot instances launched by AutoSpotting?
Ty
@karock
yeah I don't see any pricing/performance reason why that shouldn't be the default going forward.
Cristian Măgherușan-Stanciu
@cristim
I think the fixed performance regardless of size may make a difference to some people. If you have a large volume, a baseline GP3 would perform slower than a GP2 of the same size.
But perhaps we could do the math and choose GP3 automatically below a certain size threshold
Ty
@karock

yeah, I suppose so. Looks like you have to have a GP2 volume of at least 170 GiB before it can burst above 128 MiB/s, and larger than 334 GiB before it'll do 250 MiB/s all the time.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html

Cristian Măgherușan-Stanciu
@cristim
And 1TB of GP2 if you want to match the 3000 IOPS baseline of GP3
I think it makes sense to have the conversion done automatically based on a size threshold configurable by the user, with a sane default: 170 GiB if you need to match the throughput and 1TB for IOPS
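(A rough Go sketch of the threshold logic discussed above, illustrative only and not the actual AutoSpotting implementation: convert GP2 to GP3 only when the volume is small enough that GP3's baseline performance is at least roughly what the GP2 volume would deliver.)

```go
// Rough sketch of a size-threshold decision for GP2 -> GP3 conversion.
// GP2 delivers 3 IOPS/GiB and caps throughput at 128 MiB/s below ~170 GiB,
// while baseline GP3 gives a flat 3000 IOPS and 125 MiB/s regardless of size.
package main

import "fmt"

const (
	gp3BaselineIOPS = 3000 // GP3 free baseline IOPS
	gp2IOPSPerGiB   = 3    // GP2 scales at 3 IOPS per GiB

	// Below ~170 GiB a GP2 volume cannot exceed 128 MiB/s, which baseline
	// GP3 (125 MiB/s) roughly matches; above that GP2 can burst/sustain more.
	throughputThresholdGiB = 170
	// 3000 IOPS / 3 IOPS per GiB = 1000 GiB, i.e. roughly the 1TB figure above.
	iopsThresholdGiB = gp3BaselineIOPS / gp2IOPSPerGiB
)

// shouldUpgradeToGP3 reports whether a GP2 volume of the given size should be
// converted, based on a user-configurable threshold (0 means "use the default").
func shouldUpgradeToGP3(sizeGiB, userThresholdGiB int64) bool {
	threshold := userThresholdGiB
	if threshold == 0 {
		// Sane default from the discussion: the stricter throughput threshold.
		threshold = throughputThresholdGiB
	}
	return sizeGiB <= threshold
}

func main() {
	for _, size := range []int64{100, 170, 500, 1000, 2000} {
		fmt.Printf("GP2 %4d GiB -> upgrade with default threshold: %v, with IOPS threshold: %v\n",
			size, shouldUpgradeToGP3(size, 0), shouldUpgradeToGP3(size, iopsThresholdGiB))
	}
}
```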
Cristian Măgherușan-Stanciu
@cristim
@/all experimental/WIP support for the automated storage upgrade from GP2->GP3 and IO1->IO2 EBS volume types is now available at AutoSpotting/AutoSpotting#436. Feel free to try it out and report issues