Exigo Tech is a Sydney-based Technology Solutions Provider focused on delivering solutions across three major verticals: Infrastructure, Cloud, and Applications, to businesses across Australia. We help companies achieve operational efficiency by empowering them with technology solutions that drive their business processes.
Exigo is looking for a full-time Sr. Data Engineer.
We are an ISO 27001:2022 certified organization.
Visit our website for more details.
Learn more: LIFE AT EXIGO TECH
Roles and Responsibilities
- Install, configure, and manage Apache Spark (open-source) clusters on Ubuntu, including Spark master / worker nodes and Spark environment files.
- Configure and manage Spark UI and Spark History Server for monitoring jobs, analyzing DAGs, stages, tasks, and troubleshooting performance.
- Develop, optimize, and deploy PySpark ETL / ELT pipelines using DataFrame API, UDFs, window functions, caching, partitioning, and broadcasting.
- Deploy PySpark jobs using spark-submit in client / cluster mode with proper logging and error handling.
- Install, configure, and manage Apache Airflow including UI, scheduler, webserver, connections, and variables.
- Create, schedule, and monitor Airflow DAGs for PySpark jobs using SparkSubmitOperator, BashOperator, or PythonOperator.
- Configure and manage cron jobs for scheduling data processing tasks where needed.
- Install, configure, and optimize Trino (PrestoSQL) coordinator and worker nodes; configure catalogs such as S3, MySQL, or PostgreSQL.
- Maintain Linux / Ubuntu servers including services, logs, environment variables, memory usage, and port conflict resolution.
- Design and implement scalable data architectures using Azure Data Services including ADF, Synapse, ADLS, Azure SQL, and Databricks.
- Develop, manage, and automate ETL / ELT pipelines using Azure Data Factory (Pipelines, Mapping Dataflows, Dataflows).
- Monitor, troubleshoot, and optimize data pipelines across Spark, Airflow, Trino, and Azure platforms.
- Work with structured, semi-structured, and unstructured data across multiple data sources and formats.
- Implement data analytics, transformation, backup, and recovery solutions.
- Perform data migration, upgrade, and modernization using Azure and database tools.
- Implement CI / CD pipelines for data solutions using Azure DevOps and Git.
- Ensure data quality, governance, lineage, metadata management, and security compliance across cloud and big data environments.
- Design and optimize data models using star and snowflake schemas; build data warehouses, Delta Lake, and Lakehouse systems.
- Develop and rebuild reports / dashboards using Power BI, Tableau, or similar tools.
- Collaborate with internal teams, clients, and business users to gather requirements and deliver high-quality data solutions.
- Provide documentation, runbooks, and operational guidance.
Technical Skills:
1. Apache Spark (Open Source) & PySpark - Must
- Apache Spark installation & cluster configuration (Ubuntu / Linux)
- Spark master / worker setup (standalone & cluster mode)
- Spark UI & History Server configuration and debugging
- PySpark development (ETL pipelines, UDFs, window functions, DataFrame API)
- Performance tuning (partitioning, caching, shuffles)
- spark-submit deployment with monitoring and logging
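By way of illustration, here is a minimal PySpark ETL sketch touching several of the items above (DataFrame API, a window function, a broadcast join, and a partitioned write). The paths, column names, and dataset are hypothetical placeholders, not part of any actual Exigo project.

```python
# Minimal PySpark ETL sketch: DataFrame API, a window function, a broadcast
# join, and a partitioned write. All paths and column names are illustrative.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = (
    SparkSession.builder
    .appName("orders_etl")  # job name as it appears in the Spark UI / History Server
    .getOrCreate()
)

# Read raw data (schema inferred here for brevity; define it explicitly in production).
orders = spark.read.option("header", True).csv("/data/raw/orders.csv")
customers = spark.read.option("header", True).csv("/data/raw/customers.csv")

# Window function: rank each customer's orders by amount.
w = Window.partitionBy("customer_id").orderBy(F.col("amount").desc())
ranked = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("rank", F.row_number().over(w))
)

# Broadcast the small dimension table to avoid shuffling the large side of the join.
enriched = ranked.join(F.broadcast(customers), on="customer_id", how="left")

# Partitioned write so downstream queries can prune by date.
enriched.write.mode("overwrite").partitionBy("order_date").parquet("/data/curated/orders")

spark.stop()
```

A script like this would typically be deployed with spark-submit against a standalone master, for example `spark-submit --master spark://<master-host>:7077 --deploy-mode client orders_etl.py`, with stdout / stderr captured for logging.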
2. Apache Airflow & Job Orchestration - Must
- Airflow installation & configuration (UI, scheduler, webserver)
- Creating and scheduling DAGs (SparkSubmitOperator, BashOperator, PythonOperator)
- Retry logic, triggers, alerting, and log management
- Cron job scheduling & process automation
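For orientation, a minimal sketch of an Airflow DAG that submits the PySpark job above via SparkSubmitOperator, assuming Airflow 2.x with the apache-spark provider installed; the DAG id, schedule, connection name, and file path are illustrative.

```python
# Minimal Airflow 2.x DAG sketch (requires apache-airflow-providers-apache-spark).
# DAG id, schedule, connection id, and application path are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    "retries": 2,                          # simple retry logic
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="orders_etl_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",            # or a cron expression, e.g. "0 2 * * *"
    catchup=False,
    default_args=default_args,
) as dag:

    run_orders_etl = SparkSubmitOperator(
        task_id="run_orders_etl",
        application="/opt/jobs/orders_etl.py",  # PySpark script to submit
        conn_id="spark_default",                # Airflow connection to the Spark master
        verbose=True,
    )
```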
3. Trino (PrestoSQL) - Must
- Trino coordinator & worker node setup
- Catalog configuration (S3, RDBMS sources)
- Distributed SQL troubleshooting & performance optimization
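As a small sketch of working with Trino from Python, the snippet below uses the Trino DB-API client (pip install trino); the coordinator host, catalog, schema, and table names are assumptions for illustration only.

```python
# Minimal sketch of querying Trino via its Python DB-API client.
# Host, catalog, schema, and table names are illustrative.
from trino.dbapi import connect

conn = connect(
    host="trino-coordinator.example.internal",
    port=8080,
    user="data_engineer",
    catalog="hive",     # e.g. an S3-backed catalog defined in etc/catalog/hive.properties
    schema="curated",
)

cur = conn.cursor()
# The coordinator plans the query and distributes work across the worker nodes.
cur.execute("SELECT order_date, count(*) AS orders FROM orders GROUP BY order_date")
for order_date, order_count in cur.fetchall():
    print(order_date, order_count)
```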
4. Azure Data Services (nice to have)
- Azure Data Factory
- Azure Synapse Analytics
- Azure SQL / Cosmos DB
- Azure Data Lake Storage (Gen2)
- Azure Databricks (Delta, Notebooks, Jobs)
- Azure Event Hubs / Stream Analytics
5. Microsoft Fabric (nice to have)
- Lakehouse
- Warehouse
- Dataflows
- Notebooks
- Pipelines
6. Programming & Querying
- Python
- PySpark
- SQL
- Scala
7. Data Modeling & Warehousing
- Star schema modeling
- Snowflake schema modeling
- Fact / dimension modeling
- Data warehouse & Lakehouse design
- Delta Lake / Lakehouse architectures
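To make the modeling items concrete, here is a minimal star-schema sketch in PySpark that writes a dimension and a fact table as Delta tables; it assumes the delta-spark package is installed and configured, and all source paths and columns are hypothetical.

```python
# Minimal star-schema sketch with Delta Lake output (assumes delta-spark is available).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("star_schema_build")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Dimension: one row per customer (illustrative source path and columns).
dim_customer = (
    spark.read.parquet("/data/curated/customers")
    .select("customer_id", "customer_name", "region")
    .dropDuplicates(["customer_id"])
)

# Fact: one row per order, keyed to the dimension by customer_id.
fact_orders = (
    spark.read.parquet("/data/curated/orders")
    .select("order_id", "customer_id", "order_date", "amount")
)

dim_customer.write.format("delta").mode("overwrite").save("/lakehouse/dim_customer")
fact_orders.write.format("delta").mode("overwrite") \
    .partitionBy("order_date").save("/lakehouse/fact_orders")
```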
8. DevOps & CI / CD
- Git / GitHub / Azure Repos
- Azure DevOps pipelines (CI / CD)
- Automated deployment for Spark, Airflow, ADF, Databricks, Fabric
9. BI Tools (nice to have)
- Power BI
- Tableau
- Report building, datasets, DAX
10. Linux / Ubuntu Server Knowledge
- Shell scripting
- Service management
- Logs & environment variables
Soft Skills:
- Excellent problem-solving and communication skills
- Able to work well in a team setting
- Excellent organizational and time management skills
- Taking end-to-end ownership
- Production support & timely delivery
- Self-driven, flexible, and innovative
- Microsoft Certified: Azure Data Engineer Associate (DP-203 / DP-300)
- Knowledge of DevOps and CI / CD pipelines in Azure
Education:
- BSc / BA in Computer Science, Engineering, or a related field

Work Location: Vadodara, Gujarat, India