Ensuring Your CloudFormation Scripts Deploy Properly in Production

ACM.87 How one change can affect other working code, leading to unexpected disaster recovery and deployment failures

This is a continuation of my series of posts on Automating Cybersecurity Metrics.

I already fixed the error I’m going to write about here in a prior post. I wrote this post while in the middle of trying to create a user-specific secret in Secrets Manager, but I didn’t want to break up those posts, so I moved this one to the end. I hit a snag and wanted to show how you can overcome it, along with things you should think about with regard to production CloudFormation deployments.

Access to KMS is not allowed (again)

In my post where I was trying to deploy a user-specific secret, I used my standard template to allow the IAM administrator group role to encrypt data with the KMS key. Even so, I was getting an error stating that the IAM administrator did not have permission.

An error occurred (AccessDeniedException) when calling the CreateSecret operation: Access to KMS is not allowed

Now, this same template for KMS key creation worked just fine before. Why would a script that worked before be failing now?

The KMS error message, as I’ve written about before, is not that helpful. I know that I allow access to KMS both in the IAM policy and resource policy. Let’s take a look to verify what the problem is.

First, a trip to CloudTrail logs. As in prior posts, I’m going to scroll over and look for an error message in the error column we added earlier in this series.

Well, that message is, again, not super helpful, so let’s look at the details by clicking on the log item. The detailed error message is not very helpful either.

What is interesting is that I know there’s another error in the logs that is not appearing with the default filter read-only = False. It also doesn’t show up when I set read-only = True. This is strange to me because a log entry is either read-only or it is not, so it should show up for one or the other, shouldn’t it?

I saw the error last night while testing, so I know there is a related error message. Let’s search for KMS errors:

Now I see an Access Denied error:

It is occurring on the GenerateDataKey action:

I don’t recall seeing this error before while trying to implement KMS scripts earlier in this series but maybe I’m just forgetting. Anyway, let’s look at the details:

User: arn:aws:sts::xxxx:assumed-role/IAMAdminsGroup/botocore-session-xxxx is not authorized to perform: kms:GenerateDataKey on resource: arn:aws:kms:xxxx:xxxx:key/xxxx because no resource-based policy allows the kms:GenerateDataKey action
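The console search that surfaced this event can also be expressed in code. Here is a minimal sketch, assuming the CloudTrail records have already been exported as Python dictionaries. The sample records are made up for illustration; only the field names (`eventSource`, `eventName`, `errorCode`, `readOnly`) follow the CloudTrail record format. The filter ignores the read-only flag entirely, so an error cannot hide behind it.

```python
from typing import Dict, Iterable, List


def kms_errors(events: Iterable[Dict]) -> List[Dict]:
    """Return KMS records that carry an error code, whatever their readOnly value."""
    return [
        e for e in events
        if e.get("eventSource") == "kms.amazonaws.com" and e.get("errorCode")
    ]


# Made-up sample records for illustration only.
sample_events = [
    {"eventSource": "secretsmanager.amazonaws.com", "eventName": "CreateSecret",
     "errorCode": "AccessDeniedException", "readOnly": False},
    {"eventSource": "kms.amazonaws.com", "eventName": "GenerateDataKey",
     "errorCode": "AccessDenied", "readOnly": True},
    {"eventSource": "kms.amazonaws.com", "eventName": "DescribeKey",
     "readOnly": True},
]

for event in kms_errors(sample_events):
    print(event["eventName"], event["errorCode"])
```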

How can this be? We passed that role in as the role allowed to encrypt data with this key, the same way we did before, using the exact same scripts. Let’s review the resource policy for the KMS key in the AWS console.

Clearly this policy allows the IAMAdminsGroup role to perform the GenerateDataKey action:

I double-checked the key ID: the key to which the policy is attached matches the key ID in the error message. What??

Ah, but recall that I added some code to set a condition that varies depending on which service is requesting to use the KMS key. This condition will either be a Secrets Manager condition or this generic condition for anything else:

Let’s see what the event source is on our error message. Nope, that is also correct in the item that is failing and giving us the access denied message:

Back over in my KMS policy deployment script I set the condition service to EC2 because initially I was going to use this key to encrypt AWS EC2 instances.

If you recall from a prior post, anything other than Secrets Manager uses the default condition. Well, we are using Secrets Manager, but it wasn’t the event source.

Inspecting the request further, we can see that this action is invoked by a service, which is shown above the event source line:
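To make that check repeatable, here is a rough sketch that pulls both values out of a CloudTrail record. I am assuming the value in question is the `invokedBy` field inside `userIdentity`, and that for this request it named Secrets Manager. The record below is a hypothetical, trimmed-down example, not the actual log entry.

```python
from typing import Dict, Optional


def invoking_service(event: Dict) -> Optional[str]:
    """Return the service that made the call on the principal's behalf, if any."""
    return event.get("userIdentity", {}).get("invokedBy")


# Hypothetical, trimmed-down record for illustration only.
denied = {
    "eventSource": "kms.amazonaws.com",
    "eventName": "GenerateDataKey",
    "errorCode": "AccessDenied",
    "userIdentity": {
        "type": "AssumedRole",
        "invokedBy": "secretsmanager.amazonaws.com",
    },
}

print("event source:", denied["eventSource"])
print("invoked by:", invoking_service(denied))
```

The event source only tells you which API received the call. When a key policy condition depends on the requesting service, the invoking service is the value to compare against.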

Let’s change the policy condition service to Secrets Manager:

Deploy the new key policy and then try to create the SSH key again.

Test all the paths when you make a change

What I realized at this point is that I had a typo (as well as an extraneous double negative) in my KMS YAML CloudFormation template. Although my template had worked for my new case of a non-Secrets Manager service, I forgot to go back and validate that the template was still working for the Secrets Manager condition.

I had a typo in this line that checks whether the service is Secrets Manager (I spelled manager wrong).

I also removed the “!Not” in the line above as it was extra work for no reason and altered the subsequent code to fix the logic accordingly.

After fixing that typo the code worked fine and the SSH key got uploaded to Secrets Manager.
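To illustrate why this kind of typo slips through, here is a minimal sketch of the same branching logic in Python. It is not the template itself: the function names, the service strings, and the specific misspelling are all hypothetical stand-ins. The point is that a misspelled literal silently sends every request down the generic path, and only a check that exercises both paths catches it.

```python
GENERIC = "generic"
SECRETS_MANAGER = "secretsmanager"


def choose_condition_buggy(service: str) -> str:
    # Hypothetical typo: this literal can never equal "secretsmanager".
    if service == "secretsmanger":
        return SECRETS_MANAGER
    return GENERIC


def choose_condition(service: str) -> str:
    # Corrected comparison, with no extra negation to reason about.
    if service == "secretsmanager":
        return SECRETS_MANAGER
    return GENERIC


def failing_paths(chooser) -> list:
    """Run every known service through the chooser and list the wrong results."""
    expected = {"secretsmanager": SECRETS_MANAGER, "ec2": GENERIC}
    return [svc for svc, want in expected.items() if chooser(svc) != want]


print(failing_paths(choose_condition_buggy))  # ['secretsmanager']
print(failing_paths(choose_condition))        # []
```

Testing only the new "ec2" case would have passed against the buggy version, which is what happened to me.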

Test the things you didn’t change, not just the things you changed!

Well, I’m only one person trying to write and test all this code. Hopefully, if you are working on a mission-critical or customer-facing system, you have a QA team as well as a Dev team. The QA team should test not only the things that changed but also anything that might be affected by the change. Automated testing is best where possible, as I mentioned in a prior post, and I am working on implementing it along with my code.

There’s a test script in the root directory of the GitHub repository associated with these posts that ultimately calls all my deploy scripts, so I’ll need to go back and make sure that’s working when I’m done here.

Test deployments on top of what you have in production and from scratch

Additionally, CloudFormation doesn’t execute a template when there are no changes to it, so to truly test CloudFormation templates you need to do two things:

  1. Test deploying new code from the existing state of the code in production. If you test on any other version or variation of the code you don’t know what will happen when you deploy in production. Typically this is the purpose of a staging environment.
  2. Delete and deploy all the code from scratch and test that after each change if you are counting on these scripts in a disaster recovery scenario. Otherwise, you don’t really know if they are going to work or not.
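The two starting points above, combined with each service that uses the key, give a small test matrix. Here is a minimal sketch of how that matrix could be enumerated. The state and consumer names are hypothetical labels; a real test script would replace the tuples with calls to the deploy scripts.

```python
from itertools import product

# Hypothetical labels for the two starting points described in the list above.
START_STATES = ["production state", "empty account"]
# Hypothetical labels for the services the KMS key template has to support.
KEY_CONSUMERS = ["secretsmanager", "ec2"]


def build_test_plan() -> list:
    """Pair every starting state with every key consumer."""
    return list(product(START_STATES, KEY_CONSUMERS))


for state, consumer in build_test_plan():
    print(f"start from: {state}; deploy key for: {consumer}")
```

Even with two states and two consumers that is four deployments to verify, which is a good argument for automating them.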

In this case, I discovered the error when I tried to implement new code. I had not yet gone back and tested my other code that leverages a KMS key in conjunction with AWS Secrets Manager. I’ll need to figure out a test script I can write to do that.

For now, I need to get on to deploying an EC2 instance that uses this SSH key and see if it works.

Teri Radichel

© 2nd Sight Lab 2022

All the posts in this series:

Automating Cybersecurity Metrics (ACM)

Ensuring Your CloudFormation Scripts Deploy Properly in Production was originally published in Cloud Security on Medium, where people are continuing the conversation by highlighting and responding to this story.
