What we'd suggest based on our work: 1) Test in production-like environments 2) Use feature flags 3) Review and iterate 4) Build for failure. Common m...
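For anyone wondering what "use feature flags" can look like in practice, here is a minimal sketch of a percentage rollout check. It is not taken from any setup described in this thread: the function name `is_enabled`, the flag name, and the bucketing scheme are all hypothetical, and a real deployment would more likely sit behind a flag service than a hand-rolled helper.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Decide whether a user falls inside a flag's rollout bucket.

    Hypothetical helper: hashes flag + user into a stable bucket 0..99
    so the same user always gets the same answer for the same flag.
    """
    if rollout_percent <= 0:
        return False
    if rollout_percent >= 100:
        return True
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


# Example: expose a (made-up) flag to roughly 10% of users.
if is_enabled("new-checkout", "user-123", 10):
    pass  # new code path
else:
    pass  # existing code path
```

Because the bucket depends only on the flag and user ID, raising the percentage only adds users and never flips anyone back, which keeps a gradual rollout predictable.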
This level of detail is exactly what we needed! I have a few questions: 1) How did you handle testing? 2) What was your approach to migration? 3) Did ...
A few operational considerations to add that we've developed: Monitoring - Datadog APM and logs. Alerting - Opsgenie with escalation policies. Documentati...
I can offer some technical insights from our implementation. Architecture: serverless with Lambda. Tools used: Grafana, Loki, and Tempo. Configuration...
Great post! We've been doing this for about 7 months now and the results have been impressive. Our main learning was that the human side of change man...
Here's how our journey unfolded with this. We started about 17 months ago with a small pilot. Initial challenges included performance issues. The brea...
From a technical standpoint, here is our implementation. Architecture: microservices on Kubernetes. Tools used: Datadog, PagerDuty, and Slack. Configuration h...
Nice! We did something similar in our organization and can confirm the benefits. One thing we added was chaos engineering tests in staging. The key in...
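As a rough illustration of the chaos-testing idea for staging, here is a small sketch that wraps a call with randomly injected failures and checks that a retry loop copes with them. Everything in it is an assumption for illustration (`InjectedFault`, `with_faults`, `call_with_retry`, the failure rate); it is not the tooling the poster used, and real chaos experiments usually inject faults at the infrastructure level rather than in-process.

```python
import random


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a dependency failure."""


def with_faults(func, failure_rate, rng):
    """Wrap func so each call fails with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("injected failure")
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func, attempts):
    """Call func up to `attempts` times, retrying only injected faults."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except InjectedFault as exc:
            last_error = exc
    raise last_error


# Example: a stand-in dependency that fails about 30% of the time.
rng = random.Random(42)  # seeded so a staging run is repeatable
flaky_lookup = with_faults(lambda: "ok", failure_rate=0.3, rng=rng)
result = call_with_retry(flaky_lookup, attempts=50)
```

Seeding the random generator is a deliberate choice here: it makes a failing run reproducible, which matters more in a test than true randomness.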
Here are some operational tips that worked for us: Monitoring - Datadog APM and logs. Alerting - PagerDuty with intelligent routing. ...