Job Description
Note: Although the role is categorized as Remote, it will follow a hybrid work model.
Key Responsibilities:
Design and automate the deployment of distributed systems for ingesting and transforming data from various sources (relational, event-based, unstructured).
Develop frameworks for continuous monitoring and troubleshooting of data quality and integrity issues.
Implement data governance processes for metadata management, data access, and retention policies for internal and external users.
Provide guidance on building reliable, efficient, scalable, and quality data pipelines with monitoring and alert mechanisms using ETL/ELT tools or scripting languages.
Design and implement physical data models to define database structures and optimize performance through indexing and table relationships.
Optimize, test, and troubleshoot data pipelines.
Develop and manage large-scale data storage and processing solutions using distributed and cloud-based platforms such as Data Lakes, Hadoop, HBase, Cassandra, MongoDB, Accumulo, DynamoDB, and others.
Utilize modern tools and architectures to automate common, repeatable, and tedious data preparation and integration tasks.
Drive automation in data integration and management by modernizing the data management infrastructure.
Ensure the success of critical analytics initiatives by employing agile development practices such as Scrum and Kanban, together with DevOps.
Coach and mentor less experienced team members.
RESPONSIBILITIES
Technical Skills:
Expert-level proficiency in Spark, including optimization, debugging, and troubleshooting Spark jobs.
Solid knowledge of Azure Databricks for scalable, distributed data processing.
Strong coding skills in Python and Scala for data processing.
Experience with SQL, especially for large datasets.
Knowledge of data formats such as Iceberg, Parquet, ORC, and Delta Lake.
Experience developing CI/CD processes.
Deep understanding of Azure Data Services (e.g., Azure Blob Storage, Azure Data Lake, Azure SQL Data Warehouse, Synapse Analytics, etc.).
Familiarity with data lakes, data warehouses, and modern data architectures.
Competencies:
System Requirements Engineering: Translates stakeholder needs into verifiable requirements, establishing acceptance criteria and assessing the impact of requirement changes.
Collaborates: Builds partnerships and works collaboratively with others to meet shared objectives.
Communicates effectively: Develops and delivers clear, audience-specific communications.
Customer focus: Builds strong customer relationships and delivers customer-centric solutions.
Decision quality: Makes timely and informed decisions to keep the organization moving forward.
Data Extraction: Performs ETL activities from various sources using appropriate tools and technologies.
Programming: Develops, tests, and maintains computer code and scripts to meet business and compliance requirements.
Quality Assurance Metrics: Uses IT Operating Model (ITOM) and SDLC standards to assess solution quality.
Solution Documentation: Documents solutions for improved productivity and knowledge transfer.
Solution Validation Testing: Ensures configuration changes and solutions meet customer requirements.
Data Quality: Identifies, understands, and corrects data flaws to enhance information governance.
Problem Solving: Uses systematic analysis to identify root causes and implement robust solutions.
Values differences: Recognizes and appreciates diverse perspectives and cultures.
QUALIFICATIONS
Education, Licenses, Certifications:
Bachelor's or master's degree in Computer Science, Information Technology, Engineering, or a related field.
Experience:
8+ years of experience in data engineering or a related field, including experience in a leadership role.
Intermediate experience in relevant disciplines is required.
Knowledge of the latest data engineering technologies and trends is preferred, including:
Analyzing complex business systems, industry requirements, and data regulations.
Processing and managing large datasets.
Designing and developing Big Data platforms using open-source and third-party tools.
Spark, Scala/Java, MapReduce, Hive, HBase, and Kafka, or equivalents.
SQL query language.
Cloud-based clustered compute implementation.
Developing applications requiring large file movement in a cloud-based environment.
Building analytical solutions.
Intermediate experience in the following is preferred:
IoT technology.
Agile software development.
Job: Systems/Information Technology
Organization: Cummins Inc.
Role Category: Remote
Job Type: Exempt - Experienced
ReqID: 2411156
Relocation Package: No