Data Engineer – Remote

Halo Labs Pty • Melbourne • 2w ago

About Us

Halo Labs is a future-focused, end-to-end data solutions firm – transforming tomorrow, today. Our intelligent and secure technology systems and data-driven solutions drive meaningful outcomes and unlock lasting value.

Why work at Halo Labs?

Leading Innovation: We don’t just solve problems; we illuminate a never-ending stream of innovation.
Exceptional Perks: Enjoy outstanding perks like a dedicated learning budget, performance bonuses, and comprehensive wellbeing support.
Remote-First Organisation: Experience the perks of remote work while having the flexibility to travel to client locations throughout Australia.
Inclusive and Engaging: Celebrate diversity in a welcoming space that thrives on new ideas and open conversations, all in a respectful environment.
Inspiring Origins: With a compelling founder story, we are a customer-focused, culture-first organisation.

About the role

• Design, develop, and maintain reusable in-house PySpark frameworks to enforce standardised data engineering patterns across the SaaS platform
• Architect and implement scalable, production-grade ETL/ELT pipelines across Azure and AWS environments
• Build distributed data processing solutions using Python and PySpark on Databricks
• Develop batch and near real-time ingestion pipelines integrating third-party clinical systems, healthcare APIs, and external enterprise platforms
• Design secure data integration patterns (REST APIs, SFTP, event-driven ingestion, webhooks) ensuring compliance and data integrity
• Work closely with Software Engineers to embed data services directly into the SaaS product architecture
• Contribute to backend system design discussions to ensure data layer scalability, observability, and performance
• Implement CI/CD pipelines using Azure DevOps, Git, and Azure Pipelines for automated deployment and testing of data workloads
• Apply infrastructure-as-code and environment management best practices across Azure and AWS
• Optimise Spark jobs, cluster configurations, and storage strategies for performance and cost efficiency
• Design and maintain robust data models, including dimensional models and SaaS-oriented data schemas
• Implement data validation, monitoring, and alerting to ensure pipeline reliability and production stability
• Provide technical mentorship and enforce engineering standards across the analytics and data engineering team

About you

• Strong hands-on experience with Azure services including Databricks, Azure Data Factory, Azure SQL, Azure Storage, and Azure DevOps
• Practical experience with AWS services relevant to modern data platforms (S3, Lambda, RDS, Glue, IAM, etc.)
• Advanced proficiency in Python, SQL, and PySpark for large-scale distributed data processing
• Deep experience configuring and managing Databricks clusters for scalable big data workloads
• Experience building production-ready data pipelines in a SaaS or product-led engineering environment
• Strong understanding of cloud-native data architecture, including data lakes, lakehouse architecture, and modular pipeline design
• Experience integrating with third-party systems via APIs and secure data exchange mechanisms
• Exposure to healthcare or regulated data environments, including handling sensitive data securely
• Strong knowledge of data modelling, metadata management, and data governance principles
• Experience implementing automated testing frameworks for data pipelines
• Solid understanding of DevOps practices including Git workflows, branching strategies, and CI/CD automation
• Degree in Computer Science, Engineering, Data Science, or related technical field