DESCRIPTION
Our client is a publicly traded German online fashion and beauty retailer active across Europe.
Responsibilities
- Develop, monitor, and operate the client's most used and most critical curated data pipeline: Sales Order Data (incl. post-order information such as shipment, return, and payment). This pipeline processes hundreds of millions of records to provide high-quality datasets for analytical and machine learning use cases (a minimal sketch of this kind of curation step follows this list)
- Consult with analysts, data scientists, and product managers to build and continuously improve "Single Source of Truth" KPIs for business steering, such as the central Profit Contribution measurement (PC II)
- Leverage and improve a cloud-based tech stack that includes AWS, Databricks, Kubernetes, Spark, Airflow, Python, and Scala
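By way of illustration (not a description of the client's actual code), a minimal Scala sketch of this kind of Spark SQL and Delta Lake curation step, assuming hypothetical S3 paths, an order_id key, and an event_type/event_ts schema for post-order events:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CurateSalesOrders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("curate-sales-orders").getOrCreate()
    import spark.implicits._

    // Hypothetical raw inputs landed by upstream ingestion; paths and schemas are assumptions.
    val orders = spark.read.format("delta").load("s3://bucket/raw/sales_orders")
    val events = spark.read.format("delta").load("s3://bucket/raw/post_order_events")

    // Pivot shipment/return/payment events into one row per order,
    // keeping the latest timestamp per event type.
    val postOrder = events
      .groupBy($"order_id")
      .pivot("event_type", Seq("shipment", "return", "payment"))
      .agg(max($"event_ts"))

    // Enrich orders with their post-order state.
    val enriched = orders.join(postOrder, Seq("order_id"), "left")

    // Idempotent upsert into the curated Delta table keyed on order_id.
    DeltaTable.forPath(spark, "s3://bucket/curated/sales_orders")
      .as("target")
      .merge(enriched.as("source"), "target.order_id = source.order_id")
      .whenMatched().updateAll()
      .whenNotMatched().insertAll()
      .execute()

    spark.stop()
  }
}
```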
Requirements
- Expertise in Apache Spark, including Spark Streaming and Spark SQL (a minimal streaming sketch follows at the end of this posting)
- Good hands-on experience with Databricks and Delta Lake
- Fluency in the Scala programming language
- Good understanding of and hands-on experience with CI/CD
- Ability to build Apache Airflow pipelines
- Rich working experience with GitHub
- Fluency with the AWS landscape
- Upper-Intermediate level of English, both spoken and written (B2+)

Nice to have
- Presto
- Superset
- Starburst
- Exasol

We Offer
- Competitive compensation depending on experience and skills
- Variety of projects within one company
- Being part of a project following engineering excellence standards
- Individual career path and professional growth opportunities
- Internal events and communities
- Flexible work hours
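To give a flavor of the Spark Streaming and Delta Lake skills listed under Requirements, here is a minimal sketch of a Structured Streaming job writing into Delta; the broker address, topic name, schema, and paths are all assumptions, not the client's actual setup:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamPostOrderEvents {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("stream-post-order-events").getOrCreate()

    // Hypothetical event schema; the real pipeline's schema will differ.
    val schema = new StructType()
      .add("order_id", StringType)
      .add("event_type", StringType)
      .add("event_ts", TimestampType)

    // Read post-order events (shipment, return, payment) from a hypothetical Kafka topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "post-order-events")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    // Append into a Delta table; the checkpoint gives exactly-once sink semantics.
    events.writeStream
      .format("delta")
      .option("checkpointLocation", "s3://bucket/checkpoints/post_order_events")
      .start("s3://bucket/raw/post_order_events")
      .awaitTermination()
  }
}
```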