Funny timing - we just dealt with this. The problem: security vulnerabilities. Our initial approach was simple scripts but that didn't work because la...
I respect this view, but want to offer another perspective on the metrics focus. In our environment, we found that Grafana, Loki, and Tempo worked bet...
Playing devil's advocate here on the metrics focus. In our environment, we found that Kubernetes, Helm, ArgoCD, and Prometheus worked better because t...
Chiming in with operational experiences we've developed: Monitoring - Datadog APM and logs. Alerting - Opsgenie with escalation policies. Documentatio...
Our experience was remarkably similar! We learned: Phase 1 (2 weeks) involved assessment and planning. Phase 2 (2 months) focused on pilot implementat...
Nice! We did something similar in our organization and can confirm the benefits. One thing we added was automated rollback based on error rate thresho...
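For anyone wondering what a rollback trigger based on an error rate threshold can look like, here is a minimal Python sketch. It is not the setup described above. The class name `RollbackMonitor`, the 5% threshold, the window size, and the minimum request count are all hypothetical choices for illustration. The actual rollback step (a Helm or Argo rollback, for example) is left out, since it depends on your deploy tooling.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    """Request and error counts for one scrape interval."""
    requests: int
    errors: int


class RollbackMonitor:
    """Keeps a sliding window of samples and flags when a rollback looks warranted.

    All defaults are illustrative assumptions, not recommended values.
    """

    def __init__(self, threshold=0.05, window=5, min_requests=100):
        self.threshold = threshold        # maximum tolerated error rate (0.05 = 5%)
        self.min_requests = min_requests  # ignore windows with too little traffic
        self.samples = deque(maxlen=window)  # oldest sample drops off automatically

    def record(self, requests, errors):
        """Add the counts from the latest interval."""
        self.samples.append(Sample(requests, errors))

    def error_rate(self):
        """Error rate across the whole window, or 0.0 if there was no traffic."""
        total = sum(s.requests for s in self.samples)
        if total == 0:
            return 0.0
        return sum(s.errors for s in self.samples) / total

    def should_rollback(self):
        """True only when there is enough traffic and the rate exceeds the threshold."""
        total = sum(s.requests for s in self.samples)
        if total < self.min_requests:
            return False
        return self.error_rate() > self.threshold


if __name__ == "__main__":
    monitor = RollbackMonitor(threshold=0.05, window=3, min_requests=100)
    monitor.record(requests=200, errors=2)
    print(monitor.should_rollback())
    monitor.record(requests=200, errors=40)
    print(monitor.should_rollback())
```

The minimum request count is there so a handful of failed requests right after a deploy does not trigger a rollback on its own. The sliding window keeps one bad interval from being diluted by hours of healthy history.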
Thoughtful post - though I'd challenge one aspect on the metrics focus. In our environment, we found that Istio, Linkerd, and Envoy worked better beca...
Great info! We're currently evaluating this approach. Could you elaborate on the migration process? Specifically, I'm curious about stakeholder co...
Great post! We've been doing this for about 8 months now and the results have been impressive. Our main learning was that automation should augment hu...
Love how thorough this explanation is! I have a few questions: 1) How did you handle security? 2) What was your approach to blue-green deployments? 3) Did you enc...
Looking at the engineering side, there are some things to keep in mind. First, network topology. Second, monitoring coverage. Third, cost optimization...
This matches our findings exactly. The most important factor was recognizing that automation should augment human decision-making, not replace it entirely. We initiall...
This is almost identical to what we faced. The problem: security vulnerabilities. Our initial approach was simple scripts but that didn't work because...
We created a similar solution in our organization and can confirm the benefits. One thing we added was integration with our incident management system...