AI-driven incident response - our experience with PagerDuty Copilot

19 Posts
17 Users
0 Reactions
508 Views
Topic starter
(@jason.brooks11)
New Member
Joined: 4 months ago
[#48]

We've been experimenting with AI-driven incident response using PagerDuty Copilot for the past 2 months, and the results are impressive.

Our setup:
- Cloud: GCP
- Team size: 47 engineers
- Deployment frequency: 27/day

Key findings:
1. Cost anomalies caught automatically
2. False positives still an issue
3. Impressive accuracy rate
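
Findings 2 and 3 can both hold at once: when real incidents are rare, overall accuracy stays high even if many of the raised alerts are false positives. A minimal sketch of that arithmetic follows. The counts are hypothetical and are not figures from this thread.

```python
from dataclasses import dataclass


@dataclass
class AlertCounts:
    """Confusion-matrix counts for a batch of triaged alerts (hypothetical values)."""
    true_pos: int   # flagged, and it was a real incident
    false_pos: int  # flagged, but nothing was wrong
    true_neg: int   # not flagged, and nothing was wrong
    false_neg: int  # not flagged, but it was a real incident


def accuracy(c: AlertCounts) -> float:
    """Share of all decisions (flagged or not) that were correct."""
    total = c.true_pos + c.false_pos + c.true_neg + c.false_neg
    return (c.true_pos + c.true_neg) / total


def precision(c: AlertCounts) -> float:
    """Share of flagged alerts that were real incidents."""
    flagged = c.true_pos + c.false_pos
    return c.true_pos / flagged if flagged else 0.0


def recall(c: AlertCounts) -> float:
    """Share of real incidents that were flagged."""
    real = c.true_pos + c.false_neg
    return c.true_pos / real if real else 0.0


# Hypothetical sample: 1000 signals, 50 of them real incidents.
sample = AlertCounts(true_pos=40, false_pos=60, true_neg=890, false_neg=10)
print(f"accuracy={accuracy(sample):.2f}")    # accuracy=0.93
print(f"precision={precision(sample):.2f}")  # precision=0.40
print(f"recall={recall(sample):.2f}")        # recall=0.80
```

In this made-up sample the accuracy is 93%, yet 60 of the 100 raised alerts are false positives, so precision is the number to track if alert noise is the complaint.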

Happy to answer questions about our implementation!


18 Replies
(@jose.jackson593)
New Member
Joined: 1 year ago

For those asking about cost: in our case (AWS, us-east-1, ~500 req/sec), we're paying about $2000/month. That's 30% vs our old setup with Docker. ROI was positive after just 2 months when you factor in engineering time saved.


5 Replies
(@nancy.howard864)
Joined: 1 year ago

New Member

Has anyone else encountered issues with Jenkins when running in GCP us-west-2? We're seeing intermittent failures during peak traffic. Our setup is containerized and monitored with New Relic. We're starting to wonder if we should switch to GitLab CI.


(@michelle.gutierrez269)
Joined: 7 months ago

New Member

Cautionary tale: we rushed this implementation without proper testing and it caused a 4-hour outage. The issue was DNS resolution delay. Lesson learned: always test in staging first, especially when dealing with authentication services.
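
One way to catch this kind of problem in staging is to time DNS lookups for the hostnames the service depends on before promoting a release. The sketch below is only an illustration, not what the poster ran: the function names, the 0.5 second threshold, and the number of attempts are assumptions. It uses the standard library's `socket.getaddrinfo`, and the resolver is injectable so the check can be tested without network access.

```python
import socket
import time


def time_dns_lookup(hostname, resolver=socket.getaddrinfo, attempts=5):
    """Return lookup durations in seconds; a failed lookup is recorded as None."""
    durations = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            resolver(hostname, None)  # port is not needed for a plain name lookup
            durations.append(time.perf_counter() - start)
        except socket.gaierror:
            durations.append(None)
    return durations


def is_dns_healthy(durations, max_seconds=0.5):
    """True only if at least one lookup ran and every lookup succeeded in time."""
    return bool(durations) and all(
        d is not None and d <= max_seconds for d in durations
    )
```

Calling `is_dns_healthy(time_dns_lookup(host))` for each dependency gives a simple pass/fail gate. Local or OS-level caching can make repeated lookups look faster than a cold lookup, so treat the result as a smoke test rather than a benchmark.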


(@jose.williams694)
Joined: 1 year ago

New Member

Spot on. This is the direction the industry is moving.


(@david.morales35)
Joined: 11 months ago

New Member

We evaluated ArgoCD last quarter and decided against it because of its learning curve. Instead, we went with Grafana, which better fit our use case. The main factors were cost (30% cheaper), ease of use (2 days of training vs 2 weeks), and community support.


(@brandon.williams519)
Joined: 7 months ago

New Member

What about security? Did you run into any compliance issues? Our team is particularly concerned about production stability.


(@stephanie.long568)
New Member
Joined: 1 year ago

Here's our production setup:
- Tool A for X
- Tool B for Y
- Custom scripts for Z
Happy to share more details if interested.


5 Replies
(@william.harris811)
Joined: 6 months ago

New Member

The migration path we took:
Week 1-2: Research & POC
Week 3-4: Staging deployment
Week 5-6: Prod rollout (10% -> 50% -> 100%)
Week 7-8: Optimization
Total cost: ~200 eng hours
Would do it again in a heartbeat.
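
The 10% -> 50% -> 100% rollout in weeks 5-6 can be sketched with deterministic bucketing. The post does not say how the split was actually done, so the hashing approach and the names `bucket_for` and `is_enabled` are assumptions for illustration. Each key (for example a service or team name) is hashed into a stable bucket from 0 to 99, so anything enabled at 10% stays enabled at 50% and 100%.

```python
import hashlib

ROLLOUT_STAGES = (10, 50, 100)  # percentages taken from the post above


def bucket_for(key: str) -> int:
    """Map a stable key to a bucket in the range 0..99."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(key: str, percent: int) -> bool:
    """True if this key falls inside the current rollout percentage."""
    return bucket_for(key) < percent


# Hypothetical service names, used only to show the stages widening.
services = [f"service-{i}" for i in range(20)]
for stage in ROLLOUT_STAGES:
    enabled = [s for s in services if is_enabled(s, stage)]
    print(f"{stage}%: {len(enabled)} of {len(services)} services enabled")
```

Because the bucket depends only on the key, raising the percentage only ever adds services to the enabled set, and lowering it gives a predictable rollback.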


(@victoria.rivera433)
Joined: 1 year ago

New Member

How did you handle the migration? Any gotchas to watch for? Our team is particularly concerned about production stability.


(@timothy.wood427)
Joined: 5 months ago

New Member

Great for small teams, but doesn't scale well past 50 people.


(@victoria.rivera433)
Joined: 1 year ago

New Member

In our production environment with 200+ microservices, we found that Ansible significantly outperformed Prometheus. The key was proper configuration of memory limits. Deployment time dropped from 45min to 8min. Highly recommended for teams running Kubernetes at scale.


(@dennis.king704)
Joined: 8 months ago

New Member

Did you consider alternatives? Why did you choose this one? Trying to build a business case for management.


(@tyler.foster787)
New Member
Joined: 12 months ago

Resource consumption is a concern. What's your experience? Our team is particularly concerned about production stability.


5 Replies
(@evelyn.lewis664)
Joined: 8 months ago

New Member

For those asking about cost: in our case (AWS, us-east-1, ~500 req/sec), we're paying about $2000/month. That's 70% vs our old setup with Prometheus. ROI was positive after just 2 months when you factor in engineering time saved.


(@joyce.hughes421)
Joined: 10 months ago

New Member

Security team blocked this due to compliance requirements.


(@rebecca.brown460)
Joined: 11 months ago

New Member

We tried this but hit issues with X. How did you solve it? Our team is particularly concerned about production stability.


(@jose.jackson593)
Joined: 1 year ago

New Member

In our production environment with 200+ microservices, we found that ArgoCD significantly outperformed Terraform. The key was proper configuration of timeout settings. Deployment time dropped from 45min to 8min. Highly recommended for teams running Kubernetes at scale.


(@evelyn.williams270)
Joined: 1 year ago

New Member

Just implemented this last week. Already seeing improvements!

