While this is well-reasoned, I see the timeline differently. In our environment, Vault, AWS KMS, and SOPS worked better, largely because observability is not optional: you can't improve what you can't measure. That said, context matters a lot, and what works for us might not work for everyone. The key is to start small and iterate.
For context, we're using Terraform, AWS CDK, and CloudFormation.
Additionally, we found that failure modes should be designed for, not discovered in production.
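To make "designed for" a bit more concrete, here is a minimal sketch of the idea, not our actual code. It assumes a secret (or any config value) is read from a primary source that can fail, with bounded retries and an explicit fallback. The names `fetch_with_fallback` and `SecretUnavailable` are hypothetical, and the callables stand in for whatever client you really use.

```python
import time
from typing import Callable, Optional


class SecretUnavailable(Exception):
    """Raised when neither the primary source nor a fallback can supply a value."""


def fetch_with_fallback(
    primary: Callable[[], str],
    fallback: Optional[Callable[[], str]] = None,
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try `primary` up to `attempts` times with exponential backoff,
    then use `fallback` if one was given, otherwise fail loudly."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return primary()
        except Exception as exc:  # a real version would catch narrower errors
            last_error = exc
            # Back off before the next try, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    if fallback is not None:
        return fallback()
    raise SecretUnavailable(
        f"primary source failed after {attempts} attempts"
    ) from last_error
```

The point is that every outcome (transient failure, permanent failure with a fallback, permanent failure without one) is a decision made up front rather than something you learn about during an incident.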
From an implementation perspective, three things mattered most: compliance requirements, failover strategy, and cost optimization. We spent significant time on documentation, and it was worth it. Code samples are available on our GitHub if anyone wants to take a look. Performance testing showed a 2x improvement.
I'd recommend checking out the official documentation for more details.
The end result was a 3x increase in deployment frequency.
We also found that automation should augment human decision-making, not replace it entirely.
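One way to picture that last point is an approval gate: automation applies routine changes on its own but asks a person before anything destructive. This is only an illustrative sketch under that assumption. `ProposedChange`, `apply_changes`, and the `approve` and `apply` callbacks are hypothetical names, not part of Terraform, CDK, or any other tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProposedChange:
    description: str
    destructive: bool  # e.g. deletes or replaces a resource


def apply_changes(
    changes: Iterable[ProposedChange],
    approve: Callable[[ProposedChange], bool],
    apply: Callable[[ProposedChange], None],
) -> List[str]:
    """Apply non-destructive changes automatically; destructive ones
    are applied only if the `approve` callback (a human) says yes."""
    applied: List[str] = []
    for change in changes:
        if change.destructive and not approve(change):
            continue  # skipped: the human declined
        apply(change)
        applied.append(change.description)
    return applied
```

In practice `approve` could be as simple as a function that prints the change and reads a yes/no answer from the operator. The automation still does the work, and the person keeps the final say where it matters.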