Forum

Search
Close
AI Search
Classic Search
 Search Phrase:
 Search Type:
Advanced search options
 Search in Forums:
 Search in date period:

 Sort Search Results by:

AI Assistant
How we achieved 99....
 
Notifications
Clear all

How we achieved 99.99% uptime with chaos engineering

23 Posts
19 Users
0 Reactions
132 Views
0
[#104]
Topic starter
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

Project: How we achieved 99.99% uptime with chaos engineering

Timeline: 9 months
Team: 5 engineers
Budget: $276k

Challenge:
We needed to improve deployment speed while maintaining backward compatibility.

Solution:
We implemented a strangler fig pattern using:
- GitOps with ArgoCD
- Feature flags
- DevSecOps integration

Results:
✓ Deployment frequency: 1/week → 50/day
✓ Onboarding time cut in half
✓ Team can focus on features

Happy to discuss our approach and share learnings!


28/10/2025 7:07 am
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

For those asking about cost: in our case (AWS, us-east-1, ~500 req/sec), we're paying about $1000/month. That's 60% vs our old setup with Grafana. ROI was positive after just 2 months when you factor in engineering time saved.


29/10/2025 10:58 am
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

We evaluated Jenkins last quarter and decided against it due to licensing costs. Instead, we went with Docker which better fit our use case. The main factors were cost (30% cheaper), ease of use (2-day vs 2-week training), and community support.


03/11/2025 10:15 pm
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

Pro tip: if you're implementing this, make sure to configure memory limits correctly. We spent 2 weeks debugging random failures only to discover the default timeout was too low. Changed from 30s to 2min and all issues disappeared.


0
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

Here's our production setup:
- Tool A for X
- Tool B for Y
- Custom scripts for Z
Happy to share more details if interested.


0
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

Great point! We've seen similar results in our environment.


0
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

For those asking about cost: in our case (AWS, us-east-1, ~500 req/sec), we're paying about $10000/month. That's 40% vs our old setup with Kubernetes. ROI was positive after just 2 months when you factor in engineering time saved.


0
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian

Works well in theory, but production reality is different.


Share:
Scroll to Top