Job Description
Data Pipeline Development:
Design, implement, and manage scalable ETL/ELT pipelines using AWS services and Databricks.
Data Integration:
Ingest and process structured, semi-structured, and unstructured data from multiple sources into AWS Data Lake or Databricks.
Data Transformation:
Develop advanced data processing workflows using PySpark, Databricks SQL, or Scala to enable analytics and reporting.
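For illustration, a minimal sketch of the kind of PySpark transformation this role involves (table paths and column names are hypothetical):

```python
# Hypothetical PySpark transformation: aggregate raw order events into a
# daily revenue table for analytics and reporting. Paths and columns are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_revenue").getOrCreate()

orders = spark.read.format("delta").load("s3://example-data-lake/raw/orders")

daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date", "region")
    .agg(
        F.sum("amount").alias("total_revenue"),
        F.countDistinct("order_id").alias("order_count"),
    )
)

# Write the curated table back to the lake, partitioned for downstream queries.
(
    daily_revenue.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3://example-data-lake/curated/daily_revenue")
)
```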
Databricks Management:
Configure and optimize Databricks clusters, notebooks, and jobs for performance and cost efficiency.
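As a rough sketch of cost-conscious job configuration, a Databricks Jobs API-style payload might look like the following; exact field names vary by API version, and the notebook path, node type, and schedule are assumptions:

```python
# Hypothetical Databricks job specification: an ephemeral job cluster with
# autoscaling and spot instances to balance performance and cost.
job_spec = {
    "name": "nightly-etl",
    "max_concurrent_runs": 1,
    "tasks": [
        {
            "task_key": "transform_orders",
            "notebook_task": {"notebook_path": "/Repos/data-eng/etl/transform_orders"},
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "m5.xlarge",
                "autoscale": {"min_workers": 2, "max_workers": 8},
                "aws_attributes": {"availability": "SPOT_WITH_FALLBACK"},
            },
        }
    ],
    "schedule": {"quartz_cron_expression": "0 0 2 * * ?", "timezone_id": "UTC"},
}
```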
AWS Architecture:
Design and implement solutions leveraging AWS-native services like S3, Glue, Redshift, EMR, Lambda, Kinesis, and Athena.
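A minimal boto3 example of the AWS-native pattern in play here: querying Glue-cataloged data in S3 through Athena (database, table, and bucket names are hypothetical):

```python
# Run an Athena query over data cataloged by Glue and stored in S3.
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT region, COUNT(*) AS orders FROM orders GROUP BY region",
    QueryExecutionContext={"Database": "analytics_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```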
Collaboration:
Work closely with Data Analysts, Data Scientists, and other Engineers to understand business requirements and deliver data-driven solutions.
Performance Tuning:
Optimize data pipelines, storage, and queries for performance, scalability, and reliability.
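Two common tuning techniques relevant here, shown as a short PySpark sketch (dataset paths and columns are assumptions): broadcasting a small dimension table to avoid a shuffle join, and partitioning output so downstream queries can prune files.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tuning_example").getOrCreate()

facts = spark.read.format("delta").load("s3://example-data-lake/curated/daily_revenue")
regions = spark.read.format("delta").load("s3://example-data-lake/reference/regions")

# Broadcast join: the small lookup table is shipped to every executor, avoiding a shuffle.
enriched = facts.join(F.broadcast(regions), on="region", how="left")

# Partitioned write: queries filtering on order_date only scan the relevant files.
(
    enriched.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3://example-data-lake/curated/daily_revenue_enriched")
)
```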
Monitoring and Security:
Ensure data pipelines are secure, robust, and monitored using CloudWatch, Datadog, or equivalent tools.
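By way of example, a pipeline might publish custom metrics and alarm on anomalies via CloudWatch; the namespace, thresholds, and SNS topic below are placeholders:

```python
# Publish a custom pipeline metric and create an alarm on it in CloudWatch.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Emit a custom metric from the pipeline run (e.g., rows processed).
cloudwatch.put_metric_data(
    Namespace="DataPipelines/NightlyETL",
    MetricData=[{"MetricName": "RowsProcessed", "Value": 125000, "Unit": "Count"}],
)

# Alarm if the pipeline processes suspiciously few rows in an hour.
cloudwatch.put_metric_alarm(
    AlarmName="nightly-etl-low-volume",
    Namespace="DataPipelines/NightlyETL",
    MetricName="RowsProcessed",
    Statistic="Sum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=1000,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:data-alerts"],
)
```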
Documentation:
Maintain clear and concise documentation for data pipelines, workflows, and architecture.
Key Skills Required
- SQL
- Architecture
- AWS
- CloudWatch
- Data Lake
- Data Processing
- Analytics
- Collaboration
- Cost Efficiency
- Data Integration
- Data Pipelines
- Data Transformation
- Datadog
- Design
- Development
- Documentation
- Integration
- Management
- Performance Tuning
- Pipeline
- PySpark
- Reporting
- Scalability
- Security