Databricks Engineer


<span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Overview:</span></span></span></b></span></span></span><br><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">We are seeking a Databricks Engineer to design, build, and operate a Data & AI platform with a strong foundation in the Medallion Architecture (raw/bronze, curated/silver, and mart/gold layers). This platform will orchestrate complex data workflows and scalable ELT pipelines to integrate data from enterprise systems such as PeopleSoft, D2L, and Salesforce, delivering high-quality, governed data for machine learning, AI/BI, and analytics at scale.</span></span></span></span></span></span><br><br><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">You will play a critical role in engineering the infrastructure and workflows that enable seamless data flow across the enterprise, ensure operational excellence, and provide the backbone for strategic decision-making, predictive modeling, and innovation.</span></span></span></span></span></span><div align="center" style="margin-left:24px;text-align:center;"><hr align="center" size="2" width="100%"></div><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Responsibilities:</span></span></span></b></span></span></span><br><span style="font-size:12pt;"><span 
style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;"><b>1. Data & AI Platform Engineering (Databricks-Centric):</b></span></span></span></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Design, implement, and optimize end-to-end data pipelines on Databricks, following Medallion Architecture principles.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">This is a fully remote role based in India.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Build robust and scalable ETL/ELT pipelines using Apache Spark and Delta Lake to transform raw (bronze) data into trusted curated (silver) and analytics-ready (gold) data layers.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Operationalize Databricks Workflows for orchestration, dependency management, and pipeline 
automation.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Apply schema evolution and data versioning to support agile data development.</span></span></span></span></span></span></span></li></ul><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">2. Platform Integration & Data Ingestion:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Connect and ingest data from enterprise systems such as PeopleSoft, D2L, and Salesforce using APIs, JDBC, or other integration frameworks.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Implement connectors and ingestion frameworks that accommodate structured, semi-structured, and unstructured data.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Design standardized data ingestion 
processes with automated error handling, retries, and alerting.</span></span></span></span></span></span></span></li></ul><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">3. Data Quality, Monitoring, and Governance:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Develop data quality checks, validation rules, and anomaly detection mechanisms to ensure data integrity across all layers.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Integrate monitoring and observability tools (e.g., Databricks metrics, Grafana) to track ETL performance, latency, and failures.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Implement Unity Catalog or equivalent tools for centralized metadata management, data lineage, and governance policy enforcement.</span></span></span></span></span></span></span></li></ul><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span 
style="font-family:Helvetica, sans-serif;">4. Security, Privacy, and Compliance:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Enforce data security best practices including row-level security, encryption at rest/in transit, and fine-grained access control via Unity Catalog.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Design and implement data masking, tokenization, and anonymization for compliance with privacy regulations (e.g., GDPR, FERPA).</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Work with security teams to audit and certify compliance controls.</span></span></span></span></span></span></span></li></ul><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">5. 
AI/ML-Ready Data Foundation:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Enable data scientists by delivering high-quality, feature-rich data sets for model training and inference.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Support AIOps/MLOps lifecycle workflows using MLflow for experiment tracking, model registry, and deployment within Databricks.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Collaborate with AI/ML teams to create reusable feature stores and training pipelines.</span></span></span></span></span></span></span></li></ul><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">6. 
Cloud Data Architecture and Storage:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Architect and manage data lakes on Azure Data Lake Storage (ADLS) or Amazon S3, and design ingestion pipelines to feed the bronze layer.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Build data marts and warehousing solutions using platforms like Databricks.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Optimize data storage and access patterns for performance and cost-efficiency.</span></span></span></span></span></span></span></li></ul><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">7. 
Documentation & Enablement:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Maintain technical documentation, architecture diagrams, data dictionaries, and runbooks for all pipelines and components.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Provide training and enablement sessions to internal stakeholders on the Databricks platform, Medallion Architecture, and data governance practices.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Conduct code reviews and promote reusable patterns and frameworks across teams.</span></span></span></span></span></span></span></li></ul><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">8. 
Reporting and Accountability:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Submit a weekly schedule of hours worked and progress reports outlining completed tasks, upcoming plans, and blockers.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Track deliverables against roadmap milestones and communicate risks or dependencies.</span></span></span></span></span></span></span></li></ul><div align="center" style="margin-left:24px;text-align:center;"><hr align="center" size="2" width="100%"></div><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Required Qualifications:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Hands-on experience with Databricks, Delta Lake, and Apache Spark for large-scale data engineering.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span 
style="font-family:Helvetica, sans-serif;">Deep understanding of ELT pipeline development, orchestration, and monitoring in cloud-native environments.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Experience implementing Medallion Architecture (Bronze/Silver/Gold) and working with data versioning and schema enforcement in enterprise-grade environments.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Strong proficiency in SQL, Python, or Scala for data transformations and workflow logic.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Proven experience integrating enterprise platforms (e.g., PeopleSoft, Salesforce, D2L) into centralized data platforms.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Familiarity with data governance, lineage tracking, and metadata management tools.</span></span></span></span></span></span></span></li></ul><div align="center" 
style="margin-left:24px;text-align:center;"><hr align="center" size="2" width="100%"></div><span style="font-size:12pt;"><span style="line-height:115%;"><span style="font-family:Aptos, sans-serif;"><b><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Preferred Qualifications:</span></span></span></b></span></span></span><ul><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Prior UMGC or USM experience.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Experience with Databricks Unity Catalog for metadata management and access control.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Experience deploying ML models at scale using MLflow or similar MLOps tools.</span></span></span></span></span></span></span></li><li style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Familiarity with cloud platforms like Azure or AWS, including storage, security, and networking aspects.</span></span></span></span></span></span></span></li><li 
style="text-align:justify;"><span style="font-size:12pt;"><span style="line-height:115%;"><span><span style="font-family:Aptos, sans-serif;"><span lang="en-us" style="font-size:11pt;"><span style="line-height:115%;"><span style="font-family:Helvetica, sans-serif;">Knowledge of data warehouse design and star/snowflake schema modeling.</span></span></span></span></span></span></span></li></ul>
