
AI-driven incident response - our experience with PagerDuty Copilot

Jason Brooks · 2025-10-11T11:26:42Z

We've been experimenting with PagerDuty Copilot for AI-driven incident response for the past 2 months, and the results are impressive.

Our setup:
- Cloud: GCP
- Team size: 47 engineers
- Deployment frequency: 27/day

Key findings:
1. Cost anomalies are caught automatically
2. False positives are still an issue
3. Impressive accuracy rate

Happy to answer questions about our implementation!


AI Automation


Brandon Williams

(@brandon.williams519)


Key takeaways from our implementation:
1) Test in production-like environments
2) Use feature flags
3) Practice incident response
4) Measure what matters

Common mistake to avoid: over-engineering early. Resource that helped us: the Google SRE book. The most important thing is consistency over perfection.

The end result was a 90% decrease in manual toil.

One thing I wish I'd known earlier: observability is not optional; you can't improve what you can't measure. Knowing that would have saved us a lot of time.

Posted : 03/12/2025 10:42 am

Rebecca Brown

(@rebecca.brown460)


Great points overall! One aspect I'd add is security. We learned this the hard way: we had to iterate several times before finding the right balance. Now we always make sure to document security decisions in our runbooks. It adds maybe a few hours to our process but prevents a lot of headaches down the line.

For context, we're using Grafana, Loki, and Tempo.

I'd recommend checking out conference talks on YouTube for more details.

Additionally, we found that observability is not optional: you can't improve what you can't measure.

Posted : 05/12/2025 6:47 am

Victoria Rivera

(@victoria.rivera433)


Here's the full arc of our experience with this. We started about 12 months ago with a small pilot. Initial challenges included legacy compatibility. The breakthrough came when we improved observability. Our key metric improved: a 70% reduction in incident MTTR. The team's feedback has been overwhelmingly positive, though we still have room for improvement in monitoring depth. Lesson learned: communicate often. Next step for us: add more automation.

Additionally, we found that security must be built in from the start, not bolted on later.

Posted : 05/12/2025 3:53 pm

Jose Jackson

(@jose.jackson593)


On the engineering side, there are a few things to keep in mind: first, compliance requirements; second, backup procedures; third, performance tuning. We spent significant time on testing, and it was worth it: performance testing showed a 2x improvement. Code samples are available on our GitHub if anyone wants to take a look.

Feel free to reach out if you have more questions; I'm happy to share our runbooks and documentation.

One thing I wish I'd known earlier: documentation debt is as dangerous as technical debt. Knowing that would have saved us a lot of time.

Posted : 09/12/2025 3:27 pm
