Job Description
Data Engineer:
Delivering the most comprehensive identity insights, our client's platform equips businesses with fully automated KYB (Know Your Business) solutions for risk and fraud management, setting new standards in business verification.
The solution is designed for financial services (FS) institutions that want to coordinate their efforts to mitigate B2B fraud, reduce the risk of working with small businesses, and create a centralized, privacy-compliant entity for data sharing between financial institutions.
We are looking for a data engineer to assist with data collection across a variety of fragmented data sources drawn from government, public, and private databases.
Responsibilities:
- Designing the database schema
- Migrating production tables
- Working with large-scale data in the terabyte range
- Maintaining the operational efficiency of the database
- Normalizing disparate schemas to a single unified schema (see the sketch after this list)
- Abstracting reusable components
- Minimizing the amount of new code written for each new pipeline by building internal packages that allow a high level of reuse
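A minimal sketch of the normalization pattern described above, assuming Pydantic v2; the unified Business model, the GovRegistryRecord source model, and every field name are hypothetical, for illustration only:

```python
# Sketch: normalize one fragmented source schema into a unified schema.
# Pydantic v2 assumed; all model and field names are hypothetical.
from pydantic import BaseModel, Field


class Business(BaseModel):
    """Unified target schema that every source is normalized into."""
    registration_id: str
    legal_name: str
    jurisdiction: str


class GovRegistryRecord(BaseModel):
    """One source's schema; aliases map its field names onto ours."""
    registration_id: str = Field(alias="regNo")
    legal_name: str = Field(alias="entityName")
    jurisdiction: str = Field(alias="state")

    def to_business(self) -> Business:
        return Business(**self.model_dump())


raw = {"regNo": "12-3456789", "entityName": "Acme LLC", "state": "DE"}
print(GovRegistryRecord.model_validate(raw).to_business())
```

Packaging source-specific models like this into an internal library is one way to keep per-pipeline code small, as the responsibilities above emphasize.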
Requirements
Experience with:
- ETL (extract, transform, load)
- Database design
- Primary key, foreign key
- Indexing
- Partitioning
- Access patterns
- Migrations
- Data pipelining
- Core data concepts: ACID transactions, idempotency, orchestration (an idempotent-load sketch follows this list)
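One common way to achieve idempotency is to make every load an upsert keyed on a natural primary key, so re-running the same batch leaves the table unchanged. A minimal sketch against PostgreSQL, with a hypothetical businesses table and connection string:

```python
# Sketch: idempotent load into PostgreSQL via INSERT ... ON CONFLICT.
# Table, columns, and connection DSN are hypothetical.
import psycopg2

UPSERT = """
    INSERT INTO businesses (registration_id, legal_name, jurisdiction)
    VALUES (%s, %s, %s)
    ON CONFLICT (registration_id)
    DO UPDATE SET legal_name = EXCLUDED.legal_name,
                  jurisdiction = EXCLUDED.jurisdiction;
"""

# psycopg2's connection context manager wraps the work in a single
# ACID transaction: commit on success, rollback on error.
with psycopg2.connect("dbname=kyb") as conn:
    with conn.cursor() as cur:
        cur.execute(UPSERT, ("12-3456789", "Acme LLC", "DE"))
```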
Technologies:
- Airflow (a minimal DAG sketch follows this list)
- Google Cloud Platform (GCP)
- GCP Dataflow (GCP's managed runner for Apache Beam)
- PostgreSQL
- Python
- Pydantic (a Python data-validation library)
- Distributed systems
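For orchestration, a minimal Airflow DAG sketch (Airflow 2.x API assumed) showing the extract-then-load shape of a typical ingestion pipeline; the DAG id, task ids, and placeholder callables are hypothetical:

```python
# Sketch: a minimal two-step Airflow DAG (Airflow 2.x API assumed).
# dag_id, task ids, and the callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    """Pull a batch from one fragmented source (placeholder)."""


def load(**context):
    """Idempotently load the batch, e.g. the upsert sketched above."""


with DAG(
    dag_id="kyb_source_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```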
Qualifications
Bachelor's Degree
Key Skills Required
- Python
- PostgreSQL
- Airflow
- Apache Beam
- Data Collection
- Database Design
- Google Cloud Platform
- Orchestration