Forum

Kathleen Watson
@kathleen.watson88
Joined: May 14, 2025
Topics: 1 / Replies: 46
Reply
Re: AWS Organizations best practices for 50+ accounts

From the ops trenches, here's what we've developed: Monitoring - Datadog APM and logs. Alerting - custom Slack integration. Documentation - Notio...

6 months ago
Reply
Re: Cross-cloud disaster recovery - our Netflix-style approach

Couldn't relate more! What we learned: Phase 1 (6 weeks) involved tool evaluation. Phase 2 (3 months) focused on process documentation. Phase 3 (ongoi...

6 months ago
Reply
Re: Comparing AWS, Azure, and GCP for enterprise workloads

There are several engineering considerations worth noting. First, data residency. Second, failover strategy. Third, cost optimization. We spent signif...

6 months ago
Reply
Re: Multi-cloud Terraform modules - how we manage 3 cloud providers

Here's our full story with this. We started about 12 months ago with a small pilot. Initial challenges included legacy compatibility. The breakthrough...

6 months ago
Reply
Re: Practical guide: Building a comprehensive observability stack with OpenTelemetry

From what we've learned, here are key recommendations: 1) Document as you go 2) Monitor proactively 3) Review and iterate 4) Measure what matters. Com...

6 months ago
Reply
Re: Practical guide: Jenkins vs GitHub Actions vs GitLab CI: 2024 comparison

There are several engineering considerations worth noting. First, network topology. Second, monitoring coverage. Third, security hardening. We spent s...

7 months ago
Reply
Re: Open-sourced our internal developer platform - feedback wanted

Makes sense! For us, the approach differed: we used Elasticsearch, Fluentd, and Kibana. The main reason was that documentation debt is as dangerous as technical...

7 months ago
Reply
Re: From manual deployments to full automation in 6 months

Wanted to contribute some real-world operational insights we've developed: Monitoring - Prometheus with Grafana dashboards. Alerting - PagerDuty with ...

7 months ago
Reply
Re: Practical guide: Building a comprehensive observability stack with OpenTelemetry

Great post! We've been doing this for about 20 months now and the results have been impressive. Our main learning was that observability is not option...

8 months ago
Reply
Re: On-call rotation best practices to prevent burnout

This really hits home! We learned: Phase 1 (2 weeks) involved assessment and planning. Phase 2 (3 months) focused on process documentation. Phase 3 (2...

8 months ago
Reply
Re: Follow-up: Serverless architecture patterns and anti-patterns

Playing devil's advocate here on the team structure. In our environment, we found that Kubernetes, Helm, ArgoCD, and Prometheus worked better because ...

9 months ago
Reply
Re: Deep dive: Building a DevOps culture in a traditional enterprise

Happy to share technical details from our implementation. Architecture: hybrid cloud setup. Tools used: Istio, Linkerd, and Envoy. Configuration highl...

10 months ago
Reply
Re: Deep dive: Building a DevOps culture in a traditional enterprise

This is exactly our story too. We learned: Phase 1 (6 weeks) involved stakeholder alignment. Phase 2 (1 month) focused on pilot implementation. Phase ...

10 months ago
Reply
Re: Part 2: Prometheus and Grafana: Advanced monitoring techniques

Key takeaways from our implementation: 1) Automate everything possible 2) Monitor proactively 3) Practice incident response 4) Measure what matters. C...

10 months ago
Reply
Re: Part 2: Implementing zero trust security in Kubernetes

Our take on this was slightly different, using Terraform, AWS CDK, and CloudFormation. The main reason was that cross-team collaboration is essential for su...

10 months ago