ACM.96 Gotchas when trying to create zero-trust policies and then later deleting and recreating the policies
This is a continuation of my series of posts on Automating Cybersecurity Metrics.
Dependencies between CloudFormation stacks
When I tried to delete some stacks, I could not, because other CloudFormation stacks reference them. I have to delete whatever references a stack before I can delete the stack itself.
Then, when I tried to redeploy the stacks, I ended up with errors where I couldn’t create a policy because another resource did not exist, but I couldn’t create the resource without the policy. That’s a design issue. Here’s an example:
- I gave IAM Administrators permission to create SSH credentials and store them in Secrets Manager encrypted with a specific key.
- I cannot create the IAM policy because it is dependent on a non-existent KMS key.
- But I cannot create the KMS key because I have not yet created the KMS Admins.
- But I cannot yet create the KMS admins because I can’t deploy the IAM admin policy.
Separate KMS IAM Policies and Deployment Order
I got around this problem by separating out all KMS permissions for each role into a KMS-specific policy. These KMS-specific policies depend on the KMS keys being created first. The deployment order then becomes:
- Deploy the IAM admins
- Deploy the non-KMS portion of the policy
- Deploy the KMS admins with their KMS privileges
- Deploy the rest of the IAM users, roles, and groups
- Deploy the KMS keys which have dependencies on IAM resources
- Deploy the KMS policies for any other roles that depend on the keys
- Deploy any other resources that depend on the keys (Secrets, EC2 instance, Parameters, etc.)
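The ordering above can be sketched as a list that a script walks from top to bottom, stopping at the first failure so later steps never run against missing dependencies. This is only an illustration: the step names and the deploy_step placeholder are hypothetical and stand in for whatever actually deploys each group of stacks.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run deployment steps in dependency order.
# The step names and deploy_step body are placeholders, not real stack names.

deploy_step() {
  # Placeholder: replace with the real deployment call for this step.
  echo "Deploying: $1"
}

deploy_in_order() {
  local step
  for step in "$@"; do
    # Stop at the first failure so dependent steps are not attempted.
    deploy_step "$step" || { echo "Failed at: $step" >&2; return 1; }
  done
}

DEPLOY_ORDER=(
  "iam-admins"
  "iam-admin-policy-non-kms"
  "kms-admins"
  "iam-users-roles-groups"
  "kms-keys"
  "kms-policies"
  "key-dependent-resources"
)

deploy_in_order "${DEPLOY_ORDER[@]}"
```

Deleting is then roughly the same list walked in reverse.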
Overcoming risks associated with AWS changing ARNs in KMS policies
I ended up adding delete permissions for the “root” user in KMS key policies because it’s too easy to delete the admin user, after which you cannot administer the KMS key. The root user permissions include the root user in the account, which cannot be deleted. (They also include administrators, as explained in a much earlier post.) At least that way, if I can’t administer a key, I can delete it and re-add it.
The problem still exists, however, that you could lock yourself out of all your data in a production environment if you accidentally deleted all the roles used in your AWS KMS policies: no one could encrypt or decrypt with that key anymore, and no one could alter the policy. To me, the design that allows AWS to change your KMS and trust policies is very dangerous, and I hope they stop doing that. I might have to consider other alternatives long term, but for my immediate purposes, I’m just testing and want to make sure I can delete the key if I make a mistake.
Refactoring and reducing lines of code
In addition to fixing the order in which things are deleted in my account, I refactored my code to cut down on repetition. Instead of repeating the lines that call a deletion script for every stack, I pass in a list of the stacks I want to delete and loop through those items to delete them.
Create the array and pass it into a function:
Function that loops through the array and calls the delete stack function:
Delete stack function:
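A minimal sketch of that pattern follows. It is not the code from the post: the function bodies and the stack names in the usage comment are assumptions, though the two AWS CLI commands (delete-stack and wait stack-delete-complete) are standard. Each command is printed before it runs so it can be replicated by hand.

```bash
#!/usr/bin/env bash
# Sketch only: illustrative function bodies, not the original script.

delete_stack() {
  local stack="$1"
  echo "aws cloudformation delete-stack --stack-name $stack"
  aws cloudformation delete-stack --stack-name "$stack" || return 1
  # Block until the deletion finishes so dependent stacks are removed in order.
  aws cloudformation wait stack-delete-complete --stack-name "$stack"
}

delete_stacks() {
  local stack
  for stack in "$@"; do
    delete_stack "$stack" || return 1
  done
}

# Usage (hypothetical stack names, deleted in the order listed):
#   stacks=("KMSPolicyStackA" "KMSPolicyStackB")
#   delete_stacks "${stacks[@]}"
```

Waiting for each deletion to complete is a design choice: it is slower than firing all deletions at once, but it keeps a failed delete from being hidden behind later ones.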
I added a similar construct for the IAM profiles function I wrote about in the last post:
And I moved the functions to a delete_functions.sh file to clean things up a bit. I also moved them into a /Delete folder in the root of the repository.
Special handling for KMS keys
KMS keys are tricky because if you delete the key administrator before the keys are deleted you will not be able to administer the keys. I created a special function for deletion of KMS admins to check if keys exist before allowing the script to continue. Note that I am excluding AWS-managed keys and keys that are pending deletion. I’m also querying the Alias since I’ve been having issues with the key description when using the command line as outlined in prior posts.
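One way such a check might look is sketched below. The function names and the profile argument are hypothetical. It assumes that aliases beginning with alias/aws/ identify AWS-managed keys and that a KeyState of PendingDeletion marks keys already scheduled for deletion.

```bash
#!/usr/bin/env bash
# Sketch: refuse to delete KMS admins while customer-managed keys remain.
# Function names and the profile argument are illustrative assumptions.

list_active_customer_keys() {
  local profile="$1" rows alias_name key_id state
  # Skip aliases under alias/aws/, which point at AWS-managed keys.
  rows="$(aws kms list-aliases --profile "$profile" --output text \
    --query "Aliases[?!starts_with(AliasName, 'alias/aws/')].[AliasName,TargetKeyId]")" || return 1
  while read -r alias_name key_id; do
    # Ignore blank rows and aliases with no target key.
    [ -n "$key_id" ] && [ "$key_id" != "None" ] || continue
    state="$(aws kms describe-key --profile "$profile" --key-id "$key_id" \
      --query 'KeyMetadata.KeyState' --output text)" || return 1
    # Keys already pending deletion do not block removal of the admins.
    [ "$state" = "PendingDeletion" ] || echo "$alias_name"
  done <<< "$rows"
}

ok_to_delete_kms_admins() {
  local remaining
  remaining="$(list_active_customer_keys "$1")" || return 1
  if [ -n "$remaining" ]; then
    echo "Keys still exist; not deleting KMS admins:" >&2
    echo "$remaining" >&2
    return 1
  fi
}
```

A delete script could then run something like `ok_to_delete_kms_admins "delete" || exit 1` before touching the KMS admin stacks.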
After creating this check it is clear that my delete functionality for KMS keys is not working correctly:
These KMS keys are really, really giving me grief. I find KMS to be the most difficult service to work with on AWS at the moment (if you are trying to create zero-trust policies; if you’re not, it’s “easy”).
Deleting KMS Keys created by CloudFormation
I can see the above keys are not getting deleted.
Here’s my code to call delete_keys, with the list of key aliases passed into the delete_keys function:
The Alias name was used in the outputs of my CloudFormation stacks, so I can look up the Key ID using that Alias name by formulating the output name produced by our Key.yaml CloudFormation template. The Alias is also used in the stack name for the key Aliases that we need to delete.
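As a rough illustration of that lookup, the sketch below reads a single output value from a stack with describe-stacks. The naming convention used here (a stack called Key-<alias> with an output called <alias>KeyId) is purely hypothetical; the real names come from the Key.yaml template.

```bash
#!/usr/bin/env bash
# Sketch: look up a key ID from a CloudFormation stack output.
# The stack and output naming convention below is a hypothetical example.

get_stack_output() {
  local stack="$1" output_key="$2" profile="$3"
  aws cloudformation describe-stacks --stack-name "$stack" --profile "$profile" \
    --query "Stacks[0].Outputs[?OutputKey=='$output_key'].OutputValue" \
    --output text
}

get_key_id_for_alias() {
  local alias_name="$1" profile="$2"
  # Build the (assumed) stack name and output name from the alias.
  get_stack_output "Key-$alias_name" "${alias_name}KeyId" "$profile"
}

# Usage (hypothetical profile name):
#   key_id="$(get_key_id_for_alias "DeveloperSecrets" "delete")"
```

Note that once the key stack is deleted its outputs are gone too, so the key ID has to be captured before the stack deletion runs.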
Here’s the key deletion function. What’s the problem? I’m printing out my command line calls so I can see and replicate them.
As I explained in the last post I added if-then statements to skip to the portion of the script that is giving me grief. It’s not pretty but it gets the job done.
What I notice when I run my script is that I never get the output for the command lines that are supposed to be executed, so they are never being run.
The script only runs if it can find the key id from the CloudFormation stack associated with the alias we pass in:
Aha. I called the stack “aliasstack” but then passed in “stack” — another reason variable and type checking in languages is helpful but anyway let’s fix that.
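Bash has no type checking, but `set -u` catches at least this class of mistake. The sketch below assumes the bug was a misnamed variable and reproduces it on purpose; the function name and stack name are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: "set -u" turns a reference to a never-set variable into an error.

print_stack_name() {
  local aliasstack="$1"
  # Bug on purpose: the variable is named aliasstack, not stack.
  echo "Deleting stack: $stack"
}

# Without set -u the typo silently expands to an empty string.
( print_stack_name "Alias-DeveloperSecrets" )

# With set -u bash reports an unbound variable error and the subshell
# exits with a non-zero status instead of carrying on with an empty name.
( set -u; print_stack_name "Alias-DeveloperSecrets" ) || echo "caught the typo"
```

Putting `set -u` near the top of a delete script is a cheap way to make this kind of naming mismatch fail loudly rather than skip work quietly.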
Then I get an error that my “delete” user, which is an administrator, is not allowed to delete the key. That makes sense because our initial key policies only allow the KMS Admins role to administer the key.
Let’s change the profile used to run this code to our KMS profile. We can change it back to our delete profile which uses our root user after fixing the above KMS policies, but right now I’m still trying to delete the old keys with the prior key policy where the root user cannot delete them.
Now I run into two problems.
First, the DeveloperSecrets alias is not found because it was already deleted.
When I check in my account, this key has in fact been deleted. I have to deal with this error message somehow.
Secondly, the KMS user can’t list keys. We should probably add that but for now I’m going to change the profile in the KMS admin function to my “delete” user.
At first, I thought adding || true would fix my problem and allow the script to continue, but it did not. When I swallow the error, I can’t see whether I have permissions issues. I guess I should bite the bullet and write a “key_exists” function, as my current approach is not working.
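A key_exists function along those lines might look like the sketch below. It is an assumption about one possible approach, not the function from the post: it asks KMS directly with describe-key on the alias and separates "not found" from every other error, so permission problems stay visible instead of being swallowed.

```bash
#!/usr/bin/env bash
# Sketch of a key_exists check keyed on the alias name.
# Returns 0 if the alias resolves to a key, 1 if KMS reports it is not
# found, and 2 for any other error (for example missing permissions),
# which is printed to stderr rather than hidden.

key_exists() {
  local alias_name="$1" profile="$2" err
  # Capture stderr only; the key metadata on stdout is discarded.
  if err="$(aws kms describe-key --key-id "alias/$alias_name" \
        --profile "$profile" --query 'KeyMetadata.KeyId' \
        --output text 2>&1 >/dev/null)"; then
    return 0
  fi
  case "$err" in
    *NotFoundException*) return 1 ;;
    *) echo "$err" >&2; return 2 ;;
  esac
}

# Usage (hypothetical profile name):
#   if key_exists "DeveloperSecrets" "delete"; then
#     echo "key still exists"
#   fi
```

The three-way return code is the point of the design: "already deleted" can be skipped safely, while an access error should stop the script.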
That got me past my CLI functions but then the code tries to delete the CloudFormation stacks and something with that is failing. It looks like the script is trying to delete the CloudFormation key stack but is not able to:
Heading over to CloudFormation I see the following error:
Notice KMS in the policy name. I don’t want to delete my IAM administrator until the very end of my delete script. What I did as you can see above is put the KMS permissions in a separate policy that I can delete prior to deletion of the KMS keys.
At the moment I see two KMS policy stacks I previously created while testing re-deployment:
I can add the following code to delete them:
Passing arrays to functions in bash
After further testing, I realized that only the first item in my array was making it through to the function I’m calling. I could see this when I printed out my list of stacks each time the delete_stacks function was called.
As it turns out, you cannot pass an array itself to a function in bash; you can only send its expanded values (a list of strings). You do that by following the array name with [@] in the function call, as in "${array[@]}".
Then in the called function, reconstruct the array using the list:
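The difference is easy to see in a small standalone example. The function and stack names below are invented for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: pass array elements to a function and rebuild the array inside it.

count_stacks() {
  local stacks=("$@")   # rebuild the array from the positional arguments
  echo "${#stacks[@]}"
}

stacks=("KMSPolicyOne" "KMSPolicyTwo")

count_stacks "$stacks"          # prints 1: only the first element is passed
count_stacks "${stacks[@]}"     # prints 2: each element is its own argument
```

Quoting "${stacks[@]}" matters as well: it keeps each element intact as a single argument even if an element contains spaces.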
Now I get both my KMS policies that I need to delete:
Logic for Deleting KMS Keys
KMS Keys are cumbersome to manage if you want zero-trust policies. That doesn’t mean you shouldn’t use zero-trust policies; it means that you have to be careful with them. And perhaps AWS will eventually make this easier.
For some reason I can’t remember at the moment, I found that I needed to separately call commands to schedule key and alias deletion on top of the CloudFormation stack deletion. I’m not sure why, but this got my script working. I’ll need to revisit this and see if the separate CLI commands are required once I add my new policies to the keys that will allow my delete CLI profile to delete keys. I also need to add KMS list-keys to my KMS Admin IAM profile.
I had a number of other problems with the logic and order of things, and correctly obtaining the KMS key. Once the stack was deleted I could not pull the KMS Key ID and the CloudFormation alias stacks didn’t delete for some reason. I think I had a bug in there somewhere that caused my script to fail before it got to that point. Once the key was gone my logic never got to the alias stack deletion command, so I had to rearrange things.
At the moment my KMS key deletion function looks like this:
I also created a separate function to loop through and delete any aliases for the key:
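For reference, the CLI side of that logic could be sketched roughly as follows. The function names, the profile argument, and the order of operations (aliases first, then the key) are assumptions for illustration; the CloudFormation stack deletions described above would sit around these calls.

```bash
#!/usr/bin/env bash
# Sketch: delete a key's aliases, then schedule the key for deletion.
# Function names, the profile argument, and the ordering are assumptions.

delete_key_aliases() {
  local key_id="$1" profile="$2" aliases alias_name
  aliases="$(aws kms list-aliases --key-id "$key_id" --profile "$profile" \
    --query 'Aliases[].AliasName' --output text)" || return 1
  # Alias names cannot contain whitespace, so word splitting is safe here.
  for alias_name in $aliases; do
    echo "aws kms delete-alias --alias-name $alias_name"
    aws kms delete-alias --alias-name "$alias_name" --profile "$profile" || return 1
  done
}

schedule_key_deletion() {
  local key_id="$1" profile="$2"
  delete_key_aliases "$key_id" "$profile" || return 1
  echo "aws kms schedule-key-deletion --key-id $key_id"
  # 7 days is the shortest waiting period KMS allows.
  aws kms schedule-key-deletion --key-id "$key_id" \
    --pending-window-in-days 7 --profile "$profile"
}
```

KMS never deletes a key immediately, which is why the keys end up "scheduled to delete" rather than gone.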
Finally, finally, finally! My KMS keys are all scheduled to delete and the aliases and CloudFormation stacks are gone.
Deleting Resources is Not Simple
I remember when I was on the GDPR committee for a company and I learned about the data deletion requirements. I knew the VP (now CIO) and knew that when they told him he would need to delete certain resources for certain timeframes, he would think that was simple. The company had to meet certain deadlines. I made sure to go talk to him, explain the complexities, and make sure he gave his team enough time.
Having written deletion routines for a bank, I understood the complexities of data deletion scripts. If you delete the wrong data, that can turn into a nightmare. Deletion scripts can take a long time to run, or impact system performance.
Data deletion scripts are easy to write. Like “Drop Table x” in a database. Blam. All your data is gone. The problem and complexities come into play when you should only delete certain data — such as the customers who have asked to delete themselves from your systems.
Let’s say you don’t have proper integrity checks in your system — sort of like our lovely KMS keys that allow you to delete key administrators when the key still exists. You can end up with orphan records that are impossible to decipher because they have an ID but no parent record indicating the customer to which that data belongs. Is that OK? Talk to your lawyer. GDPR is mostly a question for lawyers and the tech teams write code aligning to legal specifications.
In our case, the complexity lies in the dependencies that won’t let us delete things unless we delete in precisely the correct order. There’s also the fact that if we prematurely delete permissions, we might get stuck with resources in our cloud account that we cannot manage.
That deletion script that I was “just” going to quickly create now consists of two scripts that look like this — and I’m not done.
delete.sh
delete_functions.sh
Now I “just” have to remember to add each new stack I create to this delete script.
I also expect to move the deletion functionality into different resource stacks the way I did with my test scripts eventually.
Eventually I’ll add users and networking.
Now, can I get back to what I was doing before I was interrupted by a mangled KMS policy thanks to the way AWS handles ARNs for deleted roles in KMS policies??
The dreaded “Access to KMS is not allowed” error message:
One way to potentially fix it — if you haven’t deleted your key administrator.
Access to KMS is Not Allowed: Workaround
In my case, I thought it would be faster just to delete everything and start over because I got into a circular loop that was proving challenging to address. I think it may have been the same issue, except in an IAM trust policy. I don’t even remember.
I also can’t remember exactly what I was trying to do before that at the moment because this took about three days to complete in between consulting calls, house projects, and weekend activities. At the time of this writing I’m about two weeks ahead on blog posts, so by the time you read this it may not be a weekend anymore. :) I’ll figure it out tomorrow.
Teri Radichel
© 2nd Sight Lab 2022
Deleting CloudFormation resources with IAM and Resource Policy Dependencies was originally published in Cloud Security on Medium, where people are continuing the conversation by highlighting and responding to this story.