
Follow-up: Data lake architecture on AWS: S3, Glue, and Athena

19 Posts
17 Users
0 Reactions
116 Views
(@donald.lee803)
Posts: 0

Good analysis, though I have a different take on the team structure. In our environment, the Elasticsearch, Fluentd, and Kibana stack worked better, partly because it let us start small and iterate rather than attempt a big-bang transformation. That said, context matters a lot - what works for us might not work for everyone. The key is to focus on outcomes.
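For a concrete sense of what the logging side looks like, here is a rough sketch of indexing and querying events with the official Elasticsearch Python client - the endpoint, index name, and fields are placeholders, not our actual setup:

from elasticsearch import Elasticsearch

# Placeholder endpoint and index name; adjust for your cluster.
es = Elasticsearch("http://localhost:9200")

# Fluentd normally ships these documents automatically; indexing one by hand
# just shows the shape of the data Kibana ends up visualizing.
es.index(index="app-logs", document={
    "service": "checkout",
    "level": "error",
    "message": "upstream timeout after 5s",
})

# Quick triage query: error-level events.
hits = es.search(index="app-logs", query={"match": {"level": "error"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"]["message"])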

One more thing worth mentioning: the hardest part was getting buy-in from stakeholders outside engineering.

I'd recommend checking out relevant blog posts for more details.


 
Posted : 02/06/2025 12:46 am
(@mark.perez536)
Posts: 0

Let me share some ops lessons we've learned: Monitoring - Prometheus with Grafana dashboards. Alerting - Opsgenie with escalation policies. Documentation - Notion for team wikis. Training - certification programs. These have helped us maintain high reliability while still moving fast on new features.
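On the monitoring point, here is a minimal sketch of how a service exposes metrics for Prometheus to scrape, using the prometheus_client library - the metric names and port are illustrative, not from our actual dashboards:

import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Request counter (by status) and latency histogram, exposed on /metrics.
REQUESTS = Counter("app_requests_total", "Requests handled", ["status"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

def handle_request():
    with LATENCY.time():                       # records duration into the histogram
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    REQUESTS.labels(status="200").inc()

if __name__ == "__main__":
    start_http_server(8000)                    # Prometheus scrapes http://host:8000/metrics
    while True:
        handle_request()

Grafana dashboards and Opsgenie alerts then hang off the resulting time series.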

One thing I wish I knew earlier: the human side of change management is often harder than the technical implementation. Would have saved us a lot of time.

Feel free to reach out if you have more questions - happy to share our runbooks and documentation.


 
Posted : 02/06/2025 3:39 am
(@elizabeth.perez157)
Posts: 0

Been there with this one! Symptoms: frequent timeouts. Root cause analysis revealed a memory leak. Fix: patched the leak. Prevention measures: chaos engineering. Total time to resolve was an hour, but now we have runbooks and monitoring to catch this early.
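In case it helps anyone chasing the same symptom, here is one way to confirm a leak like this with Python's built-in tracemalloc - the growing list below is just a stand-in for the real code path:

import tracemalloc

tracemalloc.start(25)                      # keep 25 frames of traceback per allocation
baseline = tracemalloc.take_snapshot()

# Exercise the code path that was timing out; this loop simulates the leak.
leaky_cache = []
for i in range(100_000):
    leaky_cache.append({"request_id": i, "payload": "x" * 100})

current = tracemalloc.take_snapshot()
for stat in current.compare_to(baseline, "lineno")[:5]:
    print(stat)                            # top growth by line points at the leak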

The end result was 50% reduction in deployment time.

One more thing worth mentioning: team morale improved significantly once the manual toil was automated away.

For context, we're using Datadog, PagerDuty, and Slack.

Also worth noting: integration with existing tools was smoother than anticipated.


 
Posted : 03/06/2025 12:48 pm
(@brian.cook36)
Posts: 0

Great job documenting all of this! I have a few questions: 1) How did you handle scaling? 2) What was your approach to rollback? 3) Did you encounter any issues with latency? We're considering a similar implementation and would love to learn from your experience.
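On the latency question specifically, the kind of Athena access pattern we're evaluating looks roughly like this with boto3 - the database, query, and results bucket below are placeholders:

import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Kick off the query; Athena writes results to the given S3 location.
resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query finishes; this wait is where most of the latency shows up.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])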

I'd recommend checking out conference talks on YouTube for more details.

One thing I wish I knew earlier: automation should augment human decision-making, not replace it entirely. Would have saved us a lot of time.

For context, we're using Terraform, AWS CDK, and CloudFormation.


 
Posted : 04/06/2025 4:53 pm
Page 2 / 2