Tuesday, August 3, 2021

An issue I had with AWS CodeDeploy

Hi there

Clearly, I've not really written anything here in 7 years. Let's change that by writing about an issue we had at Qixxit about 3 years ago. It's pretty specific, so really not exciting, but I've published a bit of code for it and wanted to finally do a bit of a write-up on it.

The problem

We ran Qixxit's backend on AWS as an auto-scaling group behind an application load balancer. The auto-scaling group would manage the EC2 instances running the application. It would know about the linked load balancer through a "target-group" property in its configuration. This allows the auto-scaling group to tell the load balancer to stop sending traffic to an instance before removing it.

We used AWS CodeDeploy's blue/green deployment feature to roll out new versions. This would create a new configuration of the auto-scaling group based on a newly built AMI. It would then bring up new instances for that configuration and redirect traffic from the load balancer to them. Eventually, the old configuration would be removed.

The new configuration for the auto-scaling group has all the settings of the previous configuration copied over automatically by CodeDeploy, except for that "target-group" setting.

Now, when the auto-scaling group decides that it wants to scale in and remove an instance, it will just immediately shut down that instance without telling the load balancer first. (Sidenote: maybe it's because English isn't my native language, but I really don't get why it's scale in/scale out and not scale down/scale up. Is this maybe specific to Amazon's documentation?) Anyway, the instance is gone but the load balancer still sends traffic to it.

I don't actually remember what specific problems this caused or whether this was also an issue when scaling out. It wasn't too dramatic, and initially we worked around it by just setting the target group manually and filing a support request.

Amazon's support was helpful and acknowledged the issue. But they didn't provide a way to directly track whether this was being worked on (I guess they generally don't?) and from the communication it sounded like it was pretty unlikely anyone would ever actually work on it.

A solution

And so I wrote a bit of JavaScript to solve it for us. CodeDeploy sends notifications for the various steps of a deployment. Those can be hooked into to set the target group of the auto-scaling group. The whole thing runs as a Lambda function managed via the Serverless Framework.

This was a nice way for me to practice my JavaScript test-driving skills and also to play around with the Serverless Framework and, as of recently, with GitHub Actions. Though I don't actually know whether any of it still works or is even still relevant; maybe Amazon has fixed the bug by now.
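
To make the idea a bit more concrete, here is a rough sketch of what such a handler could look like. It is not the published code, and several things in it are assumptions: that the notification arrives wrapped in an SNS event, that its JSON message carries `status`, `deploymentGroupName` and `deploymentId` fields, that reacting to a `SUCCEEDED` status is the right moment, and that the new group is named `CodeDeploy_<deploymentGroupName>_<deploymentId>`. The function names and the injected `attach` callback are made up for illustration; in a real Lambda, `attach` would wrap Auto Scaling's AttachLoadBalancerTargetGroups operation.

```javascript
// Sketch only: the message shape, the naming scheme and all function names
// here are assumptions for illustration, not the published code.

// Assumed naming scheme for the group CodeDeploy creates during a
// blue/green deployment.
function greenGroupName(deploymentGroupName, deploymentId) {
  return `CodeDeploy_${deploymentGroupName}_${deploymentId}`;
}

// Turn an SNS-wrapped CodeDeploy notification into the parameters needed to
// attach a target group. Returns null when the notification is not relevant.
function buildAttachParams(snsEvent, targetGroupArn) {
  const record = snsEvent.Records && snsEvent.Records[0];
  if (!record || !record.Sns) return null;

  const message = JSON.parse(record.Sns.Message);
  // Assumption: only act once the deployment has succeeded.
  if (message.status !== "SUCCEEDED") return null;

  return {
    AutoScalingGroupName: greenGroupName(
      message.deploymentGroupName,
      message.deploymentId
    ),
    TargetGroupARNs: [targetGroupArn],
  };
}

// Handler logic. `attach` is injected so the logic can be tested without
// talking to AWS; it must accept the params object and return a promise.
async function handleNotification(snsEvent, attach, targetGroupArn) {
  const params = buildAttachParams(snsEvent, targetGroupArn);
  if (params === null) return false; // nothing to do for this notification
  await attach(params);
  return true;
}
```

Keeping the decision logic in a plain function and injecting the AWS call is what makes this kind of thing easy to test-drive: the tests only need a fake `attach` and a hand-written event.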


In closing

I don't like how this write-up turned out. It seems impossible to me to describe this in a generic way that doesn't require some knowledge of a bunch of specific AWS products. But maybe that's a good way to keep expectations low for this blog.