OpsX DevOps Team Forum

Nancy Howard

@nancy.howard864

Joined: Dec 5, 2024

Topics: 5 / Replies: 39

Topic

Quisque mattis nunc ex, ut iaculis eros venenatis!

2 months ago

Forum

Azure & GCP

Replies: 0

Re: GitHub Copilot for DevOps: worth the $39/month?

Great writeup! That said, I have some concerns about the team structure. In our environment, we found that Vault, AWS KMS, and SOPS worked better because...

4 months ago

Forum

AIOps Discussion

Re: ChatGPT for infrastructure code - game changer or security risk?

We hit this same problem! Symptoms: increased error rates. Root cause analysis revealed connection pool exhaustion. Fix: increased pool size. Preventi...

4 months ago

Forum

AI Automation

Re: AI-driven incident response - our experience with PagerDuty Copilot

Here's our full story with this. We started about 14 months ago with a small pilot. Initial challenges included performance issues. The breakthrough c...

4 months ago

Forum

AI Automation

Re: Service mesh showdown: Istio vs Linkerd vs Consul Connect

Great approach! We do this in our organization and can confirm the benefits. One thing we added was feature flags for gradual rollouts. The key insight for us wa...

5 months ago

Forum

CI/CD Pipelines

Re: OpenTofu reaches v1.10 - what changed from Terraform?

Here are some operational tips we've developed that worked for us: Monitoring - Datadog APM and logs. Alerting - custom Slack integration. Documentat...

5 months ago

Forum

Weekly Roundup

Topic

Multi-region Kubernetes setup with global load balancing

5 months ago

Forum

Lessons Learned

Replies: 22

Re: Best practices for managing secrets in Kubernetes 2025

We went through something very similar. The problem: scaling issues. Our initial approach was ad-hoc monitoring, but that didn't work because it lacked vi...

5 months ago

Forum

CI/CD Pipelines

Re: Kubernetes 1.32 released with groundbreaking security features

Experienced this firsthand! Symptoms: increased error rates. Root cause analysis revealed connection pool exhaustion. Fix: corrected routing rules. Pr...

5 months ago

Forum

Weekly Roundup

Re: How we achieved 99.99% uptime with chaos engineering

This mirrors what happened to us earlier this year. The problem: deployment failures. Our initial approach was simple scripts but that didn't work bec...

6 months ago

Forum

Lessons Learned

Re: Automated compliance scanning in CI/CD - SOC2 journey

Makes sense! Our approach differed: we used Grafana, Loki, and Tempo. The main reason was that automation should augment human decision-making, not repl...

6 months ago

Forum

Lessons Learned

Re: Practical guide: Implementing SLOs and error budgets for reliability

Lessons we learned along the way: 1) Test in production-like environments 2) Monitor proactively 3) Practice incident response 4) Build for failure. C...

6 months ago

Forum

Infrastructure as Code

Re: ChatGPT for infrastructure code - game changer or security risk?

This mirrors what happened to us earlier this year. The problem: scaling issues. Our initial approach was simple scripts but that didn't work because ...

6 months ago

Forum

AIOps Discussion

Re: Part 2: Best practices for Kubernetes pod security in production

Our solution was somewhat different, using Grafana, Loki, and Tempo. The main reason was that automation should augment human decision-making, not replace i...

6 months ago

Forum

AWS Cloud

Re: Follow-up: Prometheus and Grafana: Advanced monitoring techniques

Couldn't agree more. In our experience, the most important factor was that failure modes should be designed for, not discovered in production. We initially str...

6 months ago

Forum

DevOps News
