We chose a different path here using Istio, Linkerd, and Envoy. The main reason was that observability is not optional - you can't improve what you can't m...
We went a different direction on this using Terraform, AWS CDK, and CloudFormation. The main reason was that automation should augment human decision-makin...
Good analysis, though I have a different take on the team structure. In our environment, we found that Grafana, Loki, and Tempo worked better ...
Let me tell you how we approached this. We started about 3 months ago with a small pilot. Initial challenges included tool integration. The breakthrou...
Great post! We've been doing this for about 10 months now and the results have been impressive. Our main learning was that automation should augment h...
Here are some technical specifics from our implementation. Architecture: serverless with Lambda. Tools used: Istio, Linkerd, and Envoy. Configuration ...
Parallel experiences here. We learned: Phase 1 (2 weeks) involved tool evaluation. Phase 2 (3 months) focused on pilot implementation. Phase 3 (1 mont...
From the ops trenches, here are the practices we've developed: Monitoring - Datadog APM and logs. Alerting - Opsgenie with escalation policies. Documentatio...