A leading global Amazon FBA aggregator was running all of its data transformations inside Redshift: approximately 900 SQL queries processing data across dozens of acquired brands. The monthly bill had reached $40,000, with no clear way to bring it under control. Critical queries were taking 20+ minutes, creating bottlenecks across dashboards, ML models, and operational workflows. And with no per-query cost or performance visibility, nobody could tell which queries were expensive or why.
What Cuedo Built
A hybrid lakehouse ETL architecture that moves heavy transformation work entirely off Redshift and onto Spark on EMR, with Apache Iceberg tables on S3 for storage and Airflow for orchestration. Redshift was retained only for consuming Gold-layer analytics.
Key engineering decisions:
- Medallion architecture with Bronze, Silver, and Gold layers stored as Apache Iceberg tables on S3
- Spark SQL on EMR for all heavy transformations, taking that compute off Redshift entirely
- Airflow orchestration integrated into the client's existing in-house platform, with no infrastructure changes required on their side
- AWS Lake Formation and IAM RBAC for governance and access control
- OpenTelemetry and Chronosphere for per-query cost tracking, giving the client this visibility for the first time
- Terraform and GitHub CI/CD for full infrastructure-as-code
- A documented, repeatable migration playbook for the remaining query flows
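To make the Bronze/Silver/Gold split concrete, here is a toy sketch of what each layer is responsible for. It is not Cuedo's implementation: plain Python lists stand in for Spark SQL over Iceberg tables, and the sample order records, field names, and the `to_silver` / `to_gold` functions are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from decimal import Decimal, InvalidOperation

# Bronze: raw records landed as-is. Hypothetical sample data; every field is a string.
BRONZE_ORDERS = [
    {"order_id": "A1", "brand": "brand_x", "amount": "19.99"},
    {"order_id": "A1", "brand": "brand_x", "amount": "19.99"},  # duplicate delivery
    {"order_id": "A2", "brand": "brand_y", "amount": "5.00"},
    {"order_id": "A3", "brand": "brand_x", "amount": "n/a"},    # malformed amount
    {"order_id": "A4", "brand": "brand_y", "amount": "12.50"},
]


def to_silver(bronze_rows):
    """Silver (hypothetical): cast types, drop rows that fail validation, dedupe on order_id."""
    seen = set()
    silver = []
    for row in bronze_rows:
        try:
            amount = Decimal(row["amount"])
        except InvalidOperation:
            continue  # a real pipeline might route bad rows to a quarantine table instead
        if row["order_id"] in seen:
            continue  # keep only the first copy of each order
        seen.add(row["order_id"])
        silver.append({"order_id": row["order_id"], "brand": row["brand"], "amount": amount})
    return silver


def to_gold(silver_rows):
    """Gold (hypothetical): business-level aggregate, here revenue per brand."""
    revenue = defaultdict(Decimal)  # Decimal() starts each brand at Decimal('0')
    for row in silver_rows:
        revenue[row["brand"]] += row["amount"]
    return dict(sorted(revenue.items()))


if __name__ == "__main__":
    silver = to_silver(BRONZE_ORDERS)
    print(len(silver))  # 3
    print(to_gold(silver))
```

The point of the layering is that each stage has one job: Bronze preserves raw input for replay, Silver enforces types and uniqueness, and Gold holds the small, query-ready tables that a warehouse such as Redshift would serve to dashboards.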
350+ of the roughly 900 queries migrated. Costs cut by up to 85%. Zero disruption to dashboards, ML models, or write-back automations during cutover. And a documented playbook is ready for the remaining flows, so the next migration should take weeks, not months.