On-call rotation best practices to prevent burnout

22 Posts
16 Users
0 Reactions
325 Views
(@evelyn.sanders800)
We ran into something similar during our last sprint. The problem was scaling. Our initial approach was simple scripts, but that didn't work because they lacked visibility. What actually worked was real-time dashboards for stakeholder visibility. The key insight: failure modes should be designed for, not discovered in production. Now we're able to detect issues early.

Additionally, we found that observability is not optional: you can't improve what you can't measure.

One thing I wish I knew earlier: automation should augment human decision-making, not replace it entirely. Would have saved us a lot of time.
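To make the "detect issues early" part concrete, here's a minimal sketch of the kind of rolling-window check that can feed dashboard alerts. The metric, window size, and threshold below are illustrative, not our production values:

```python
# Minimal early-warning check: compare a rolling error-rate window
# against a fixed threshold before users notice degradation.
# Window size and threshold are made up for illustration.
from collections import deque

class ErrorRateMonitor:
    def __init__(self, window_size=100, threshold=0.05):
        self.window = deque(maxlen=window_size)  # recent request outcomes
        self.threshold = threshold               # alert above this error rate

    def record(self, success: bool) -> bool:
        """Record one request; return True if the error rate breaches the threshold."""
        self.window.append(success)
        errors = self.window.count(False)
        rate = errors / len(self.window)
        return rate > self.threshold

# Simulate traffic where every 10th request fails (10% error rate,
# above the 5% threshold once the window has data).
monitor = ErrorRateMonitor(window_size=50, threshold=0.05)
alerts = [monitor.record(success=(i % 10 != 0)) for i in range(200)]
```

In practice you'd emit the breach as a metric or event rather than a return value, but the shape is the same: a cheap check on every request, surfaced on a dashboard before it becomes an incident.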


 
Posted : 10/08/2025 9:23 pm
(@william.harris811)
Our solution was somewhat different: we used Datadog, PagerDuty, and Slack. The main reason was that starting small and iterating is more effective than a big-bang transformation. However, I can see how your method would be better for legacy environments. Have you considered integrating with your incident management system?

We also found that cross-team collaboration is essential for success, and that security must be built in from the start, not bolted on later.
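On the incident-management integration question: one common pattern is wiring alerts into PagerDuty through its Events API v2. A minimal sketch (the routing key and alert details below are placeholders, and error handling is omitted):

```python
# Sketch of triggering a PagerDuty incident from an alert via the
# Events API v2. Replace the routing key with your own service
# integration key; summary/source values here are examples.
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

def build_trigger_event(routing_key: str, summary: str, source: str,
                        severity: str = "warning") -> dict:
    """Build an Events API v2 'trigger' payload."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,  # one of: critical, error, warning, info
        },
    }

def send_event(event: dict) -> int:
    """POST the event; PagerDuty returns HTTP 202 on success."""
    req = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

event = build_trigger_event("YOUR_ROUTING_KEY",
                            "Error rate above 5% on checkout service",
                            "checkout-prod-01")
# send_event(event)  # uncomment with a real routing key
```

The same trigger point can fan out to a Slack incoming webhook for visibility, so the on-call person and the wider team see the incident at the same time.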


 
Posted : 10/08/2025 11:10 pm
(@william.harris811)
Let me tell you how we approached this. We started about 24 months ago with a small pilot. Initial challenges included team training. The breakthrough came when we automated our testing. Key metrics improved, including a 50% reduction in deployment time. The team's feedback has been overwhelmingly positive, though we still have room to improve test coverage. Lessons learned: measure everything. Next steps for us: add more automation.

One thing I wish I knew earlier: the human side of change management is often harder than the technical implementation. Would have saved us a lot of time.


 
Posted : 11/08/2025 9:55 am
(@maria.turner939)
The technical implications here are worth examining, in three areas: compliance requirements, monitoring coverage, and cost optimization. We spent significant time on monitoring, and it was worth it. Code samples are available on our GitHub if anyone wants to take a look. Performance testing showed a 10x throughput increase.

The end result was 40% cost savings on infrastructure.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.


 
Posted : 12/08/2025 10:38 am
(@maria.turner939)
Great writeup! That said, I have some concerns about the focus on metrics. In our environment, Datadog, PagerDuty, and Slack worked better because automation should augment human decision-making, not replace it entirely. Context matters a lot, though: what works for us might not work for everyone. The key is to start small and iterate.

The end result was 3x increase in deployment frequency.

Additionally, we found that documentation debt is as dangerous as technical debt.


 
Posted : 13/08/2025 10:33 pm
(@timothy.scott735)
We went down this path too in our organization and can confirm the benefits. One thing we added was real-time dashboards for stakeholder visibility. The key insight for us was that the human side of change management is often harder than the technical implementation. The initial investment was also higher than expected, but the long-term benefits exceeded our projections.

For context, we're using Jenkins, GitHub Actions, and Docker. Two things worth mentioning: we discovered several hidden dependencies during the migration, and we underestimated the training time needed, though it was worth the investment. The end result was an 80% reduction in security vulnerabilities.

One thing I wish I knew earlier: failure modes should be designed for, not discovered in production. It would have saved us a lot of time. Happy to share more details if anyone is interested - feel free to reach out for our runbooks and documentation, and I'd recommend checking out the community forums as well.


 
Posted : 15/08/2025 3:47 pm
(@deborah.howard208)
Solid work putting this together! I have a few questions: 1) How did you handle monitoring? 2) What was your approach to backup? 3) Did you encounter any issues with latency? We're considering a similar implementation and would love to learn from your experience.

The end results were an 80% reduction in security vulnerabilities, a 70% reduction in incident MTTR, and 99.9% availability, up from 99.5%.

I'd recommend checking out the community forums for more details.


 
Posted : 16/08/2025 5:00 pm