And Finally, a WordPress Cluster using CloudFormation in AWS

Ok, let’s do this thing! In this post I’m going to add a WordPress Auto Scaling Group spread across two availability zones, and an Application Load Balancer to accept connections on port 80 and direct traffic to the Auto Scaling Group. Both will be added to the CloudFormation template I started in my last blog post.

When we’re done, our VPC will look like the following image, which is back to the target image I showed in my first post for the complete solution.

Complete diagram of my VPC with WordPress

Don’t be fooled, however. While I will have deployed all of the resources shown in the picture, the WordPress cluster won’t actually auto scale until my next post. Even though it’s still called an auto scaling group in AWS, it’s really more of a collection of auto healing instances right now.

The WordPress Scaling Group

The following resources describe a scalable group of Apache web servers running WordPress.

The WordPress Security Group

By now, if you’ve been reading through the series, it’s probably obvious that I generally create a dedicated security group for each addressable resource or group of resources. Trying to reuse security groups across resources that serve different purposes usually leads to bad compromises. If one group needs to allow SSH in and the other needs to allow HTTP, and they’re sharing a security group, the lazy admin will just allow SSH and HTTP to both groups. You can always block unwanted protocols locally with iptables, but defense in depth says you should block them at both levels.

With that said, here is my WordPress security group, which just allows HTTP in from anywhere. It also allows SSH and ICMP, but only from sources within the CIDR range of the VPC.
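The template listing isn’t reproduced in this text, so here’s a rough sketch of a security group with those three rules. It’s written as Python dicts that serialize with json.dumps to the JSON CloudFormation expects. The parameter names VpcId and VpcCidr are hypothetical stand-ins for however your template passes in the VPC and its CIDR range.

```python
import json


def wordpress_security_group(vpc_id_param="VpcId", vpc_cidr_param="VpcCidr"):
    """AWS::EC2::SecurityGroup: HTTP from anywhere, SSH and ICMP from the VPC only.

    vpc_id_param and vpc_cidr_param are hypothetical template parameter names.
    """
    vpc_cidr = {"Ref": vpc_cidr_param}
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": "WordPress web servers",
            "VpcId": {"Ref": vpc_id_param},
            "SecurityGroupIngress": [
                # HTTP from anywhere
                {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80, "CidrIp": "0.0.0.0/0"},
                # SSH only from inside the VPC
                {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "CidrIp": vpc_cidr},
                # ICMP only from inside the VPC; -1/-1 means all ICMP types and codes
                {"IpProtocol": "icmp", "FromPort": -1, "ToPort": -1, "CidrIp": vpc_cidr},
            ],
        },
    }


if __name__ == "__main__":
    # Print the resource as it would appear under "Resources" in the template.
    print(json.dumps({"WebServerSecurityGroup": wordpress_security_group()}, indent=2))
```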

The Instance Profile and Role

Here is the profile that I will associate with my WordPress instances so that I can convey permissions to them without having to do something silly and insecure like storing access keys on the VMs.

And permissions are conveyed to the profile through a role, so here is that role:

All of the permissions granted through this role have been discussed in previous posts.
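For readers without the template handy, here’s a minimal sketch of the role/profile pairing, again as Python dicts mirroring the CloudFormation JSON. The key piece is the trust policy that lets EC2 assume the role. The actual permissions from previous posts are deliberately left out, and the logical ID WebServerRole is hypothetical.

```python
def web_server_role():
    """AWS::IAM::Role that EC2 instances are allowed to assume.

    Permissions are omitted here; they would go in "Policies" or
    "ManagedPolicyArns" on this resource.
    """
    return {
        "Type": "AWS::IAM::Role",
        "Properties": {
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Effect": "Allow",
                    # Only the EC2 service may assume this role.
                    "Principal": {"Service": ["ec2.amazonaws.com"]},
                    "Action": ["sts:AssumeRole"],
                }],
            },
            "Path": "/",
        },
    }


def web_server_profile(role_logical_id="WebServerRole"):
    """AWS::IAM::InstanceProfile wrapping the role (Ref on a role returns its name)."""
    return {
        "Type": "AWS::IAM::InstanceProfile",
        "Properties": {"Path": "/", "Roles": [{"Ref": role_logical_id}]},
    }
```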

The WordPress Scaling Group

And below is the scaling group for my WordPress cluster. For the most part, this is just like the scaling groups declared in previous posts in this series, but with a few crucial differences highlighted below.

The first difference is that there is no launch configuration to describe the VMs for this group. Instead there is a launch template. Launch templates are a newer resource type with the following advantages over launch configurations:

  • Launch templates allow you to specify some newer attributes related to newer machine types. For instance, you can specify unlimited bursting for t2 and t3 instances.
  • Also, unlike launch configurations, multiple versions of a launch template can coexist, which is why I have to specify what version of the template I want.

For these reasons, Amazon recommends always using launch templates and maintains launch configurations only for backwards compatibility. I haven’t used launch templates until now, because all of the sample templates from AWS, and most of the samples found elsewhere, tend to be dated and use only launch configurations. I’m not actually using any of the newer properties at the moment and don’t really care about the versions, but figured it was time I checked them out. I’ll describe the differences between launch configurations and launch templates in the next section.

The next highlighted change is that the HealthCheckType is specified as ELB. That means that in addition to the normal status checks performed on all EC2 instances, the scaling group will also take health check advice from the Elastic Load Balancer. So if the load balancer determines through its own health checks that the instance is unhealthy, the scaling group will eventually terminate and replace the instance. I’ll also talk more about the load balancer health checks when I get to that section a bit further down the page.

The last highlighted difference is that I specify a target group ARN. Load balancers forward traffic to target groups based on rules, and EC2 instances can be registered with a target group, either individually or by pointing a scaling group at it. And of course, I’ll talk more about target groups a bit below.
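Since the scaling group listing isn’t shown here, this is a hedged sketch of the three highlighted differences as Python dicts mirroring the CloudFormation JSON. All logical IDs and parameter names (WebServerLaunchTemplate, WebServerTargetGroup, WebServerSubnets, WebServerCapacity) are hypothetical. MinSize, MaxSize, and DesiredCapacity are all pinned to the same capacity, which matches the "auto healing, not yet auto scaling" state of the cluster in this post; the grace period is an assumed value sized for a slow UserData script.

```python
def web_server_group(capacity_param="WebServerCapacity",
                     subnets_param="WebServerSubnets",
                     template_id="WebServerLaunchTemplate",
                     target_group_id="WebServerTargetGroup"):
    """AWS::AutoScaling::AutoScalingGroup using a launch template and ELB health checks."""
    capacity = {"Ref": capacity_param}
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            "VPCZoneIdentifier": {"Ref": subnets_param},   # subnets in two AZs
            # 1) A launch template instead of a launch configuration. Ref returns
            #    the template ID, and a version must be named explicitly.
            "LaunchTemplate": {
                "LaunchTemplateId": {"Ref": template_id},
                "Version": {"Fn::GetAtt": [template_id, "LatestVersionNumber"]},
            },
            # Min == Max == Desired: replaces failed instances but never scales.
            "MinSize": capacity,
            "MaxSize": capacity,
            "DesiredCapacity": capacity,
            # 2) Take health advice from the load balancer as well as EC2.
            "HealthCheckType": "ELB",
            "HealthCheckGracePeriod": 600,  # assumed; seconds before ELB checks count
            # 3) Register the group's instances with the target group (Ref returns its ARN).
            "TargetGroupARNs": [{"Ref": target_group_id}],
        },
    }
```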

The Launch Template

And now we get to the launch template. Now, it’s a monster, as you can see, but no more so than a launch configuration. I’ve just hidden some of its girth in previous posts by breaking out some of the bigger sections and explaining them separately (like Metadata and UserData). But we’re past that now, so here’s the whole load of crap in all its glory:

Now the Metadata and UserData are structurally the same as in the launch configuration, so I’m going to concentrate on the highlighted bits, which are the key differences for a launch template. Obviously, the first change is that the type is different. The next change is that there are only two properties for the launch template: LaunchTemplateName and LaunchTemplateData.

So where did all of the other properties of a launch configuration go? Surely you can configure all of the same stuff, plus more, in a launch template? Of course, but those properties are nested one level deeper. For instance, instead of Properties.ImageId, now we have Properties.LaunchTemplateData.ImageId. And finally, some of the property values have changed structurally, and since there aren’t a ton of sample templates using launch templates, it can be difficult to figure out the differences at first. I’ve highlighted those properties, which are:

  • Monitoring – used to be just a boolean (and in a launch configuration the property is actually called InstanceMonitoring). Now it’s an object with a property called Enabled, with a boolean value. Why? I haven’t looked into it, but perhaps there are other properties you can set on that object? Or maybe there will be in the future? I don’t know for sure, but that’s the only thing that makes sense to me.
  • IamInstanceProfile – used to be just a reference to the instance profile. Now it’s an object with a property called Name whose value is a reference to the instance profile (you can give an Arn instead of a Name). I’m not going to try to explain it; see above for my best guess.
  • SecurityGroupIds – used to be just SecurityGroups with a value of security group references. Now it’s either SecurityGroupIds or SecurityGroups, where SecurityGroups takes names and only works for security groups in the default VPC; for any other VPC you must use SecurityGroupIds. The Ref function returns the ID for a security group declared with a VpcId and the name for one declared without it, so that helps a little, but this seems like an unnecessary and idiotic level of added complexity. And you can’t specify both properties, so I guess a single launch template can’t mix IDs and names? Actually, maybe you can reference a default security group and call Fn::GetAtt to get its GroupId, which would allow you to use SecurityGroupIds for both (I haven’t actually tried this). I don’t tend to use default security groups anyway, so I’ll just stick with SecurityGroupIds. And thanks a bunch for this one, Amazon!
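The structural changes above are mechanical enough to sketch as a translation function. This is a hypothetical helper, not anything AWS provides: it takes launch configuration properties (as Python dicts mirroring the JSON) and returns LaunchTemplateData. It assumes IamInstanceProfile is a Ref (which returns the profile name) and that the security group references resolve to IDs, i.e. a non-default VPC. A couple of launch configuration properties (AssociatePublicIpAddress, SpotPrice) move into nested structures in a launch template, so the sketch refuses them rather than guess.

```python
# Launch configuration properties with no one-to-one home in LaunchTemplateData
# (they live under NetworkInterfaces / InstanceMarketOptions instead).
NOT_TRANSLATED = {"AssociatePublicIpAddress", "SpotPrice"}


def launch_config_to_template_data(lc_props):
    """Map AWS::AutoScaling::LaunchConfiguration properties to LaunchTemplateData."""
    data = {}
    for key, value in lc_props.items():
        if key in NOT_TRANSLATED:
            raise ValueError(f"{key} needs manual translation")
        if key == "InstanceMonitoring":
            data["Monitoring"] = {"Enabled": value}        # bool -> {"Enabled": bool}
        elif key == "IamInstanceProfile":
            data["IamInstanceProfile"] = {"Name": value}   # assumes a name, not an ARN
        elif key == "SecurityGroups":
            data["SecurityGroupIds"] = value               # assumes IDs (non-default VPC)
        else:
            data[key] = value                              # same value, one level deeper
    return data


def launch_template(name, lc_props):
    """AWS::EC2::LaunchTemplate with just its two top-level properties."""
    return {
        "Type": "AWS::EC2::LaunchTemplate",
        "Properties": {
            "LaunchTemplateName": name,
            "LaunchTemplateData": launch_config_to_template_data(lc_props),
        },
    }
```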

And lastly, here is the UserData, in a better format than embedded in JSON:

What I’m doing here is, before I call my bootstrap script, writing a bunch of my stack parameters out to /root/.aws/, which I lock down at both the folder and file level so that only root can see them. I also turn off bash history at the beginning of the script, because I don’t want my password showing up in my command line history. I’m going to need this information in order to configure things like my wp-config.php file, which tells WordPress how to connect to the database.
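To make the pattern concrete, here’s a rough sketch of a UserData property along those lines, built in Python as the Fn::Base64/Fn::Sub structure a launch template expects. The file names under /root/.aws and the database logical ID WordPressDB are hypothetical; DBName, DBUser, and DBPassword are the stack parameters described later in this post. Inside Fn::Sub, ${Name} is replaced by CloudFormation, so any literal bash variable would need to be written as ${!Name}.

```python
# Bash run at first boot. CloudFormation's Fn::Sub fills in the ${...} values.
# Caveat: a single quote inside a parameter value would break the quoting below.
USER_DATA = """#!/bin/bash
# Keep the password out of shell history.
set +o history
# umask 077 makes everything created below readable by root only.
umask 077
mkdir -p /root/.aws
chmod 700 /root/.aws
printf '%s\\n' '${DBName}' > /root/.aws/db_name
printf '%s\\n' '${DBUser}' > /root/.aws/db_user
printf '%s\\n' '${DBPassword}' > /root/.aws/db_password
printf '%s\\n' '${WordPressDB.Endpoint.Address}' > /root/.aws/db_host
chmod 600 /root/.aws/*
# ...then download and run the bootstrap script...
"""


def user_data_property(script=USER_DATA):
    """LaunchTemplateData.UserData must be base64; Fn::Sub runs first."""
    return {"Fn::Base64": {"Fn::Sub": script}}
```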

The WordPress Bootstrap Script

And finally there’s the bootstrap script, which does all the normal stuff my bootstrap scripts from previous posts did, like creating admin accounts and installing/configuring iptables. The WordPress-specific stuff is:

The comments kind of tell the story, but the highlights are:

  1. Install Apache and PHP packages.
  2. Download and install WordPress in the web server root folder.
  3. Configure WordPress.
  4. Enable systemd to start Apache on boot, and start Apache now.
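The bootstrap script itself is bash and isn’t reproduced here, but step 3 is mostly string substitution, so here’s a hedged Python sketch of just that step. It assumes you start from the stock wp-config-sample.php, whose define() lines ship with placeholder values (database_name_here, username_here, password_here, and localhost for DB_HOST), and that the values come from the files written by UserData.

```python
import re


def set_define(config: str, name: str, value: str) -> str:
    """Replace the value in define( 'NAME', '...' ) within wp-config.php text."""
    # Escape backslashes and single quotes for a PHP single-quoted string.
    php_value = value.replace("\\", "\\\\").replace("'", "\\'")
    pattern = re.compile(r"(define\(\s*'" + re.escape(name) + r"'\s*,\s*')[^']*(')")
    # A function replacement avoids re treating backslashes in the value specially.
    new, count = pattern.subn(lambda m: m.group(1) + php_value + m.group(2), config)
    if count != 1:
        raise ValueError(f"expected exactly one define for {name}, found {count}")
    return new


def render_wp_config(sample: str, db_name: str, db_user: str,
                     db_password: str, db_host: str) -> str:
    """Fill in the four database settings WordPress needs to reach MariaDB."""
    for name, value in (("DB_NAME", db_name), ("DB_USER", db_user),
                        ("DB_PASSWORD", db_password), ("DB_HOST", db_host)):
        sample = set_define(sample, name, value)
    return sample
```

The result would be written to wp-config.php in the web server root, readable by Apache but not world-readable.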

There is one other change I made for this script that isn’t WordPress-specific, but was necessary because of the switch from a launch configuration to a launch template. It’s at the bottom of the script where I call cfn-init and cfn-signal:

In previous posts I’ve called both cfn-init and cfn-signal with the resource specified as the launch configuration. So I tried doing the same for the launch template, but got an error calling cfn-signal (something about an illegal resource type). I had to change the cfn-signal resource to the scaling group. The metadata for cfn-init is still in the launch template, so specifying the launch template for cfn-init worked just fine.
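Here’s a sketch of that split, as the tail of a UserData/bootstrap string plus the CreationPolicy that makes the scaling group wait for the signals. The logical IDs WebServerLaunchTemplate and WebServerGroup, the parameter WebServerCapacity, and the 15-minute timeout are hypothetical. Fn::Sub fills in ${AWS::StackName} and ${AWS::Region}; $? has no braces, so it passes through to bash untouched and carries cfn-init’s exit code.

```python
# Tail of the bootstrap script (processed with Fn::Sub).
# cfn-init reads Metadata from the launch template; cfn-signal must target the
# scaling group, because that's the resource waiting on a CreationPolicy.
BOOTSTRAP_TAIL = """
/opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource WebServerLaunchTemplate --region ${AWS::Region}
/opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --resource WebServerGroup --region ${AWS::Region}
"""


def creation_policy(count_param="WebServerCapacity", timeout="PT15M"):
    """CreationPolicy for the scaling group: wait for one success signal per instance."""
    return {"ResourceSignal": {"Count": {"Ref": count_param}, "Timeout": timeout}}
```

The CreationPolicy sits beside Type and Properties on the scaling group resource, not inside Properties.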

The Load Balancer

Now that we have a WordPress scaling group, we need a traffic cop to balance the load between WordPress instances (i.e. a load balancer).

The Security Group

We start with our load balancer security group, which just allows inbound port 80.

The Application Load Balancer

The load balancer declaration just specifies what subnets it will run on and what security group it will be in.
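A minimal sketch of both pieces as Python dicts mirroring the CloudFormation JSON. The parameter names VpcId and PublicSubnets and the logical ID LoadBalancerSecurityGroup are hypothetical; an Application Load Balancer needs subnets in at least two availability zones.

```python
def load_balancer_security_group(vpc_id_param="VpcId"):
    """AWS::EC2::SecurityGroup allowing only inbound HTTP."""
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": "WordPress load balancer",
            "VpcId": {"Ref": vpc_id_param},
            "SecurityGroupIngress": [
                {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80, "CidrIp": "0.0.0.0/0"},
            ],
        },
    }


def application_load_balancer(subnets_param="PublicSubnets",
                              sg_id="LoadBalancerSecurityGroup"):
    """AWS::ElasticLoadBalancingV2::LoadBalancer: just subnets and a security group."""
    return {
        "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
        "Properties": {
            "Type": "application",
            "Scheme": "internet-facing",
            "Subnets": {"Ref": subnets_param},      # a List<AWS::EC2::Subnet::Id> parameter
            "SecurityGroups": [{"Ref": sg_id}],     # Ref returns the group ID (VpcId is set)
        },
    }
```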

The Target Group

The target group is where the action is:

The target group health check serves two purposes. First, if an instance comes up as unhealthy, the load balancer will stop directing traffic to it. But also, if the scaling group specifies ELB as its health check type, then the load balancer will report unhealthy instances back to the scaling group, which will eventually terminate and replace those instances if they don’t right themselves in a reasonable period of time. You specify:

  • A server relative path.
  • An interval for how often to check.
  • How long to wait before a timeout.
  • A protocol, port, and expected HTTP response code.
  • A count of bad checks before marking an instance as unhealthy.
  • A count of good checks before marking an instance as healthy again.
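Here’s how those six settings map onto TargetGroup properties, sketched as a Python dict with hypothetical values (the post doesn’t list the exact ones). The matcher accepts redirects on the assumption that a not-yet-installed WordPress answers "/" with a redirect to its installer, which would otherwise fail a strict 200 check. The helper gives a rough sense of how long a dead instance keeps receiving traffic.

```python
def target_group(vpc_id_param="VpcId"):
    """AWS::ElasticLoadBalancingV2::TargetGroup with an HTTP health check."""
    return {
        "Type": "AWS::ElasticLoadBalancingV2::TargetGroup",
        "Properties": {
            "VpcId": {"Ref": vpc_id_param},
            "Protocol": "HTTP",
            "Port": 80,
            "TargetType": "instance",
            "HealthCheckPath": "/",               # server-relative path
            "HealthCheckIntervalSeconds": 30,     # how often to check
            "HealthCheckTimeoutSeconds": 5,       # how long to wait for a response
            "HealthCheckProtocol": "HTTP",
            "HealthCheckPort": "traffic-port",    # same port traffic is sent to
            "Matcher": {"HttpCode": "200-399"},   # expected response codes
            "UnhealthyThresholdCount": 2,         # bad checks before "unhealthy"
            "HealthyThresholdCount": 3,           # good checks before "healthy" again
        },
    }


def approx_detection_seconds(props):
    """Roughly how long before a failed instance is marked unhealthy:
    it takes UnhealthyThresholdCount consecutive failures, one per interval."""
    return props["HealthCheckIntervalSeconds"] * props["UnhealthyThresholdCount"]
```

With these numbers that’s about 30 × 2 = 60 seconds; lowering either value reacts faster but makes the check more trigger-happy.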

There are a couple of things I don’t love about this health check, starting with the fact that you can only specify a server relative path for the health check. This means if you configure Apache for VHosts, you can only test the default VHost. You also can’t specify multiple health checks, which also isn’t great for VHost configurations.

To get around that, you’d need to write a custom health check. This could be a shell script running as a cron job that uses the CLI to report healthy/unhealthy based on whatever criteria you like. Or it could be a Lambda function doing much the same with the API. But without some custom code, you can only check the default VHost.
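As a rough illustration of the cron-job option (written in Python rather than shell), here’s a sketch that asks the local Apache for each virtual host by setting the Host header, and, if any fail, reports the instance unhealthy through the Auto Scaling SetInstanceHealth API via boto3. The VHost names are hypothetical, the instance ID is passed in rather than fetched from instance metadata, and the instance role would need autoscaling:SetInstanceHealth. Redirects are deliberately not followed, since following them could send the request out through the load balancer instead of testing this instance.

```python
import urllib.error
import urllib.request

# Hypothetical virtual host names; replace with the sites Apache serves.
VHOSTS = ["blog.example.com", "shop.example.com"]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # don't follow; the 3xx surfaces as an HTTPError instead


_OPENER = urllib.request.build_opener(_NoRedirect)


def vhost_is_healthy(host_header, path="/", port=80, timeout=5.0):
    """Request `path` from the local Apache as if it were for `host_header`."""
    req = urllib.request.Request(f"http://127.0.0.1:{port}{path}",
                                 headers={"Host": host_header})
    try:
        with _OPENER.open(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError as err:
        return 300 <= err.code < 400  # a redirect still means Apache/PHP answered
    except OSError:                   # connection refused, timeout, etc.
        return False


def overall_status(results):
    """'Healthy' only if at least one VHost was checked and every one passed."""
    return "Healthy" if results and all(results.values()) else "Unhealthy"


def check_and_report(instance_id, vhosts=VHOSTS):
    """Check every VHost; on failure, tell the scaling group to replace us."""
    status = overall_status({host: vhost_is_healthy(host) for host in vhosts})
    if status == "Unhealthy":
        import boto3  # imported lazily so the checks above work without it
        boto3.client("autoscaling").set_instance_health(
            InstanceId=instance_id, HealthStatus=status,
            ShouldRespectGracePeriod=True)
    return status
```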

The other interesting bit here is the TargetGroupAttributes. Here I’ve configured stickiness, which means a given client will continue getting connected to the same instance until some period of inactivity has passed. This can lead to something of an unbalanced load, since some clients may hit the backend harder than others. And it is not strictly speaking necessary, since WordPress stores session state in the database.
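For reference, load-balancer cookie stickiness is expressed as three TargetGroupAttributes entries. A small sketch follows; the one-hour duration is a hypothetical default, and the documented range for the ALB cookie duration is 1 second to 7 days. Note that attribute values are strings, even for booleans and numbers.

```python
def stickiness_attributes(duration_seconds=3600):
    """TargetGroupAttributes enabling ALB-generated cookie stickiness."""
    if not 1 <= duration_seconds <= 604800:  # 604800 s = 7 days
        raise ValueError("lb_cookie duration must be between 1 second and 7 days")
    return [
        {"Key": "stickiness.enabled", "Value": "true"},
        {"Key": "stickiness.type", "Value": "lb_cookie"},
        {"Key": "stickiness.lb_cookie.duration_seconds", "Value": str(duration_seconds)},
    ]
```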

The Listener

And last but not least, you need a listener, which just ties together a load balancer and target group based on a protocol and port.

And since it is based on a protocol and port, if you want to handle both HTTP and HTTPS, you need two listeners. And since the target group health check is also based on a protocol and port, you’ll also need two of these for that scenario, but both of them can contain the same scaling group as their backend.
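A sketch of the listener pair as Python dicts mirroring the CloudFormation JSON, with hypothetical logical IDs. This version terminates TLS at the load balancer and forwards both listeners to the same HTTP target group; if you want HTTPS all the way to the instances, as described above, you’d pass a second target group to the HTTPS listener instead. The certificate ARN is a placeholder you’d supply.

```python
def http_listener(lb_id="LoadBalancer", tg_id="WebServerTargetGroup"):
    """AWS::ElasticLoadBalancingV2::Listener forwarding port 80 to the target group."""
    return {
        "Type": "AWS::ElasticLoadBalancingV2::Listener",
        "Properties": {
            "LoadBalancerArn": {"Ref": lb_id},   # Ref on a load balancer returns its ARN
            "Protocol": "HTTP",
            "Port": 80,
            "DefaultActions": [{"Type": "forward", "TargetGroupArn": {"Ref": tg_id}}],
        },
    }


def https_listener(certificate_arn, lb_id="LoadBalancer", tg_id="WebServerTargetGroup"):
    """Same forwarding on 443; HTTPS listeners require at least one certificate."""
    listener = http_listener(lb_id, tg_id)  # a fresh dict, safe to modify
    listener["Properties"].update({
        "Protocol": "HTTPS",
        "Port": 443,
        "Certificates": [{"CertificateArn": certificate_arn}],
    })
    return listener
```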

I find it interesting that you specify DefaultActions when the listener itself doesn’t seem to offer a way to specify any other actions. So shouldn’t this just be called Actions?
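For what it’s worth, the other actions do exist, just not on the listener resource: extra routing lives in separate AWS::ElasticLoadBalancingV2::ListenerRule resources, each with conditions, actions, and a priority, and DefaultActions is what happens when no rule matches. Here’s a sketch of a host-header rule, which is also one way to send different VHosts to different target groups. The listener, target group, and host names are hypothetical.

```python
def host_rule(host, priority, listener_id="HttpListener", tg_id="BlogTargetGroup"):
    """ListenerRule forwarding requests for one host name to a specific target group."""
    return {
        "Type": "AWS::ElasticLoadBalancingV2::ListenerRule",
        "Properties": {
            "ListenerArn": {"Ref": listener_id},  # Ref on a listener returns its ARN
            "Priority": priority,                 # lower numbers are evaluated first
            "Conditions": [{
                "Field": "host-header",
                "HostHeaderConfig": {"Values": [host]},
            }],
            "Actions": [{"Type": "forward", "TargetGroupArn": {"Ref": tg_id}}],
        },
    }
```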

Now the listener works just fine for VHosts, because the original host name you asked for is passed through in the host header by default. Also, if you install multiple certificates on the load balancer, it will look for the best certificate to use based on the requested host name and SNI (Server Name Indication), so VHosts work fine for SSL too.

So it’s just the health check that doesn’t have good built-in VHost support.

Deploying the Cluster/Load Balancer

The deployment is just like the last post, so just upload the template in the zip file below to CloudFormation and fill in the parameters. If you’ve already deployed the database from the last post, then upload the template as a stack update or change set. Otherwise upload it by creating a new stack. Here’s what I entered for the parameters:

The MariaDB Parameters

Remember, the parameters marked with a red asterisk were present but were not actually used before, so take care to make sure they’re correct now.

The AdministratorsGroup parameter is an IAM group from which the bootstrap script will attempt to create administrator accounts. If you don’t have a motd banner ready, just leave that alone and use mine for now. Unless you expect tens of thousands of users per month, db.t3.micro is probably fine for now. But if you really want high availability, MutiAzDatabase should be true and WebServerCapacity should be at least 2 (of course, both of those will drive up the cost).

The reason the web server needs to have a capacity of more than 1 is that from time to time your web servers will wig out and report unhealthy. In my current setup, it’s unusual for a web server to run more than a week. That’s partially because my health check is sort of a hair trigger, but also maybe due to not following some Linux best practices. For instance, I only have one partition. At a bare minimum, /var should ideally be on a separate partition, so right now log files may be filling up my hard drive. I haven’t really investigated the cause of this yet. But if I only had one web server instance, on every termination for a health check failure, I would be down at least until the new server got through its UserData script, which takes 3 to 5 minutes on a good day. With two instances, most of my outages are less than a minute, just long enough for the load balancer to notice and redirect all traffic to the healthy host during the boot sequence.

As before, the DBUser and DBPassword parameters are for the database administrator. The DBName is the name of the database to which WordPress will be deployed. I’ve set the DBClass to db.t3.micro; this is the equivalent of Instance Type for the VMs, and I’ve selected the smallest t3 size I could. I’ve also entered 5 for DBAllocatedStorage, which is 5 GB, the smallest hard drive you can allocate. I’ve also entered true for MultiAZ (choose false if you want to save a little money), and selected the VPC I deployed in the first post in this series, plus the two private subnets from that VPC. Click Create stack and smoke ’em if you’ve got ’em, because this is a pretty slow deployment. How long it takes depends on whether you’ve already deployed the database or you’re installing the entire template from scratch. For me it took about 5 minutes with the database already deployed, so from scratch expect something like 20 minutes.
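If you’d rather script the deployment than click through the console, here’s a hedged boto3 sketch of the same create-or-update decision. The stack name and template path are up to you, and the parameter keys you pass must match the template’s. CAPABILITY_IAM is needed because the template creates an IAM role and instance profile (it would be CAPABILITY_NAMED_IAM if they had explicit names). TemplateBody is capped at 51,200 bytes, so a very large template would have to go through S3 and TemplateURL instead.

```python
def to_cfn_parameters(values):
    """Turn {"Name": value} into CloudFormation's Parameters list format.
    Booleans become "true"/"false"; everything else is stringified."""
    params = []
    for key, value in values.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append({"ParameterKey": key, "ParameterValue": str(value)})
    return params


def deploy(stack_name, template_path, values):
    """Create the stack if it doesn't exist yet, otherwise update it in place."""
    import boto3
    from botocore.exceptions import ClientError

    cfn = boto3.client("cloudformation")
    with open(template_path) as f:
        body = f.read()
    kwargs = {
        "StackName": stack_name,
        "TemplateBody": body,
        "Parameters": to_cfn_parameters(values),
        "Capabilities": ["CAPABILITY_IAM"],  # the template creates IAM resources
    }
    try:
        cfn.describe_stacks(StackName=stack_name)
    except ClientError as err:
        if "does not exist" not in str(err):
            raise
        cfn.create_stack(**kwargs)
        cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
        return "created"
    try:
        cfn.update_stack(**kwargs)
    except ClientError as err:
        if "No updates are to be performed" in str(err):
            return "unchanged"
        raise
    cfn.get_waiter("stack_update_complete").wait(StackName=stack_name)
    return "updated"
```

Rather than hard-coding DBPassword in the values dict, prompting for it with getpass keeps it out of your shell history and source files.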

Testing the WordPress Installation

Testing this one is pretty easy. All you need is a browser. In the AWS console, go to EC2 and click on Load Balancers in the left-hand menu:

Load Balancer Tab in Console

Find the load balancer you just deployed (if you have multiple load balancers and it’s not easy to tell which one is yours, look at the Created At field). Click the check box next to it and copy the DNS Name, then put that in your browser:

WordPress Setup

If you see the WordPress setup screen, we have lift off!
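The same check can be scripted. Here’s a rough sketch that picks the most recently created load balancer (the same "Created At" trick as in the console) and checks whether its home page ends up at WordPress’s installer. It assumes boto3 credentials for the account and region, and it assumes a fresh, not-yet-installed WordPress redirects "/" to wp-admin/install.php.

```python
import urllib.request


def newest_load_balancer_dns(load_balancers):
    """DNS name of the most recently created entry from describe_load_balancers."""
    if not load_balancers:
        raise ValueError("no load balancers found")
    return max(load_balancers, key=lambda lb: lb["CreatedTime"])["DNSName"]


def fetch_newest_dns():
    """List every load balancer in the region and return the newest one's DNS name."""
    import boto3  # imported lazily so the pure helper above works without it
    elbv2 = boto3.client("elbv2")
    lbs = []
    for page in elbv2.get_paginator("describe_load_balancers").paginate():
        lbs.extend(page["LoadBalancers"])
    return newest_load_balancer_dns(lbs)


def shows_setup_screen(dns_name, timeout=10.0):
    """True if the site lands on the WordPress installer after redirects."""
    with urllib.request.urlopen(f"http://{dns_name}/", timeout=timeout) as resp:
        return "wp-admin/install.php" in resp.geturl()
```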

Sum Up

Ok…alright…beers all around, we’re done, right? Once again, not so fast, sparky! We have WordPress up, and that’s great. We have a MultiAz MariaDB instance. And we’ve got a couple of Apache front ends. If one of them goes down, AWS will bring it back up again. All good stuff. But if 10,000 people hit our site in short order, the only way we’re scaling up is if we log into the console and raise the minimums on our scaling group. To scale automatically with demand, we need to add an auto scaling policy.

Fortunately, that’s not a lot of work. It will probably take more time to show you how to test it than to implement the scaling policy and explain it to you. It’s going to be a simple but effective scaling policy, based on CPU load. But this post is quite long enough (my print preview says 27 pages; talk about TL;DR, ow), so I’ll do that in the next post.

The CloudFormation template

