Built a self-service platform for 100+ developers using Backstage

23 Posts
20 Users
0 Reactions
293 Views
(@kathleen.watson88)

Couldn't relate more! What we learned: Phase 1 (6 weeks) involved tool evaluation. Phase 2 (2 months) focused on team training. Phase 3 (2 weeks) was all about full rollout. Total investment was $50K but the payback period was only 3 months. Key success factors: good tooling, training, patience. If I could do it again, I would involve operations earlier.
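The payback math above is easy to sanity-check (a quick sketch; the monthly-savings figure is implied by the numbers in the post, not stated there):

```python
investment = 50_000        # total platform investment ($), as stated above
payback_months = 3         # stated payback period
implied_monthly_savings = investment / payback_months
# roughly $16,667/month in recovered engineering time for the payback claim to hold
```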

For context, we're using Jenkins, GitHub Actions, and Docker.

Additionally, we found that security must be built in from the start, not bolted on later.


 
Posted : 13/11/2025 2:29 pm
(@alex_kubernetes)
Topic starter

This is almost identical to what we faced. The problem: security vulnerabilities. Our initial approach was manual intervention, but that didn't work because it was too error-prone. What actually worked: automated rollback based on error-rate thresholds. The key insight was that observability is not optional - you can't improve what you can't measure. Now we're able to detect issues early.

For context, we're using Datadog, PagerDuty, and Slack.

One thing I wish I knew earlier: invest in observability before you need it, not after the first incident. Would have saved us a lot of time.
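The threshold-based rollback idea above can be sketched in a few lines (a minimal sketch; the window size, threshold, and how outcomes are fed in are illustrative, not this poster's production setup):

```python
from collections import deque

class RollbackGuard:
    """Tracks request outcomes over a sliding window and flags when the
    error rate crosses a threshold (illustrative values, tune per service)."""

    def __init__(self, window_size=100, error_threshold=0.05):
        self.window = deque(maxlen=window_size)  # 1 = error, 0 = success
        self.error_threshold = error_threshold

    def record(self, is_error: bool) -> None:
        self.window.append(1 if is_error else 0)

    def should_rollback(self) -> bool:
        # Don't decide on too little data right after a deploy
        if len(self.window) < self.window.maxlen:
            return False
        error_rate = sum(self.window) / len(self.window)
        return error_rate > self.error_threshold
```

In practice you would feed `record()` from your metrics pipeline and wire `should_rollback()` to whatever triggers your deploy rollback.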


 
Posted : 14/11/2025 11:26 am
(@william.smith189)

Playing devil's advocate here on the tooling choice. In our environment, we found that Datadog, PagerDuty, and Slack worked better because automation should augment human decision-making, not replace it entirely. That said, context matters a lot - what works for us might not work for everyone. The key is to focus on outcomes.

I'd recommend checking out relevant blog posts for more details.

The end result was 50% reduction in deployment time.

For context, we're using Istio, Linkerd, and Envoy.


 
Posted : 14/11/2025 10:21 pm
(@karen.thomas72)

On the operational side, some thoughts we've developed: Monitoring - Prometheus with Grafana dashboards. Alerting - Opsgenie with escalation policies. Documentation - Confluence with templates. Training - monthly lunch-and-learns. These have helped us maintain high reliability while still moving fast on new features.
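The Opsgenie-style escalation policy mentioned above boils down to a simple tier lookup (a hedged sketch; the responder tiers and timeouts are made-up examples, not this team's actual policy):

```python
def current_responder(tiers, minutes_unacked):
    """Return which tier should be paged, given how long the alert has gone
    unacknowledged. Each tier is (name, escalate_after_minutes)."""
    elapsed = 0
    for name, escalate_after in tiers:
        elapsed += escalate_after
        if minutes_unacked < elapsed:
            return name
    return tiers[-1][0]  # all tiers exhausted: stay with the last responder

# Hypothetical policy: on-call for 10 min, then lead for 15, then manager
tiers = [("on-call engineer", 10), ("team lead", 15), ("engineering manager", 30)]
```

The real value of a tool like Opsgenie is that this logic lives outside your own code, with overrides and schedules attached.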

One thing I wish I knew earlier: cross-team collaboration is essential for success. Would have saved us a lot of time.

One more thing worth mentioning: we had to iterate several times before finding the right balance.


 
Posted : 20/11/2025 10:26 pm
(@deborah.howard208)

A few operational considerations we've developed: Monitoring - CloudWatch with custom metrics. Alerting - custom Slack integration. Documentation - Confluence with templates. Training - pairing sessions. These have helped us maintain high reliability while still moving fast on new features.
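For the CloudWatch custom metrics mentioned above, the shape of a `put_metric_data` payload looks roughly like this (a sketch: the namespace, metric name, and dimensions are hypothetical, and the actual boto3 send is commented out so the snippet stays self-contained):

```python
def build_metric(name, value, unit="Count", **dimensions):
    """Build one CloudWatch MetricData entry (the dict shape boto3 expects)."""
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
    }

# Hypothetical metric for deploy duration, tagged by service and environment
payload = build_metric("DeploymentDuration", 42.0, unit="Seconds",
                       Service="checkout", Environment="prod")

# To actually publish (requires boto3 and AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="Platform/Deployments", MetricData=[payload])
```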

One more thing worth mentioning: the hardest part was getting buy-in from stakeholders outside engineering.

The end result was 80% reduction in security vulnerabilities.

For context, we're using Terraform, AWS CDK, and CloudFormation.


 
Posted : 21/11/2025 7:15 pm
(@mark.perez536)

I respect this view, but want to offer another perspective on the tooling choice. In our environment, we found that Vault, AWS KMS, and SOPS worked better because failure modes should be designed for, not discovered in production. That said, context matters a lot - what works for us might not work for everyone. The key is to start small and iterate.
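"Failure modes should be designed for" applied to secrets fetching might look like this (a generic sketch; `fetch` stands in for whatever Vault/KMS/SOPS client you use, and the retry and fail-closed policy is illustrative):

```python
import time

def get_secret(fetch, key, retries=3, backoff=0.1):
    """Fetch a secret with bounded retries and exponential backoff; fail
    closed (raise) rather than fall back to a default, so a broken secrets
    path is loud instead of silently insecure."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(key)
        except Exception as exc:  # real code would catch the client's error type
            last_error = exc
            time.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"secret {key!r} unavailable after {retries} tries") from last_error
```

The design choice worth noting is the fail-closed default: a transient outage retries, a persistent one stops the deploy instead of shipping with a placeholder credential.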

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

For context, we're using Grafana, Loki, and Tempo.


 
Posted : 22/11/2025 8:32 am
(@david_jenkins)

Adding some engineering details from our implementation. Architecture: microservices on Kubernetes. Tools used: Istio, Linkerd, and Envoy. Configuration highlights: CI/CD with GitHub Actions workflows. Performance benchmarks showed 50% latency reduction. Security considerations: secrets management with Vault. We documented everything in our internal wiki - happy to share snippets if helpful.
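For the latency benchmarks mentioned above, a before/after percentile comparison is the usual way to state a reduction like that (a sketch with made-up sample data; the real benchmark would use actual traffic captures):

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[rank]

# Hypothetical request latencies in ms, before and after the change
before = [120, 135, 150, 180, 240, 300, 320, 400, 500, 900]
after  = [60, 70, 75, 90, 110, 140, 160, 200, 240, 450]

p95_before = percentile(before, 95)
p95_after = percentile(after, 95)
reduction = (p95_before - p95_after) / p95_before * 100
```

Comparing tail percentiles (p95/p99) rather than averages matters here, since mesh sidecars like Istio or Linkerd mostly affect the tail.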

One more thing worth mentioning: unexpected benefits included better developer experience and faster onboarding.


 
Posted : 26/11/2025 3:13 am
(@maria.turner939)

Really helpful breakdown here! I have a few questions: 1) How did you handle scaling? 2) What was your approach to rollback? 3) Did you encounter any issues with latency? We're considering a similar implementation and would love to learn from your experience.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.

I'd recommend checking out relevant blog posts for more details.

One more thing worth mentioning: team morale improved significantly once the manual toil was automated away.


 
Posted : 27/11/2025 5:01 pm