Job Description
Job Title: HPC Security and Infrastructure Specialist
Location: [Specify Location or Remote]
Employment Type: Full-Time
About Us:
DMV IT Service LLC is a trusted IT consulting firm, established in 2020. We specialize in optimizing IT infrastructure, providing expert guidance, and supporting workforce needs with top-tier staffing services. Our expertise spans system administration, cybersecurity, networking, and IT operations. We empower our clients to achieve their technology goals with a client-focused approach that includes online training and job placements, fostering long-term IT success.
Job Purpose:
We are seeking an experienced HPC Security and Infrastructure Specialist to join our team. This role will focus on managing, securing, and maintaining high-performance computing (HPC) systems, storage solutions, and cloud infrastructure. The ideal candidate will have a strong background in systems administration, high-speed network storage systems, cloud systems, and providing technical support for complex IT environments.
Requirements
- Implement and maintain data management infrastructure and HPC security measures.
- Provide ongoing support for SAN, NAS storage, backup/recovery environments, and virtualization infrastructure.
- Ensure the security, disaster recovery, and service continuity of highly available enterprise storage and backup systems.
- Perform technical support for installation, configuration, maintenance, upgrade, troubleshooting, and retirement of IT systems.
- Use frameworks such as Ansible, Puppet, and Chef for configuration management.
- Administer high-speed network storage systems, including Mellanox switches and NAS clusters.
- Manage, configure, and support cloud systems, including setting up, maintaining, and troubleshooting cloud compute engines and storage buckets.
- Administer and manage databases such as SQL Server, PostgreSQL, MySQL, and Oracle.
- Assist staff with accessing and utilizing computational resources effectively.
- Collaborate with Labs and DTMB staff to maintain and manage computational resources.
Skills & Experience:
- 10+ years of experience with Linux CLI and programming languages such as R, Python, and Bash.
- 10+ years of experience with workload management systems, particularly SLURM.
- 10+ years of experience in setting up HPC systems, including identifying suitable hardware and software.
- 10+ years of experience with databases such as PostgreSQL and system administration tasks such as installation and support.
- Strong hands-on experience with Network Appliance clustered servers and applicable software.
- Expertise in Linux configuration for storage, networking, load balancing, memory management, VMs, firewalls, and system monitoring.
- 10+ years of experience with computer security and implementing security protocols.
- Familiarity with package management and container tools such as conda, Docker, and Singularity.
- Experience with automation and workflow tools such as Ansible, Puppet, and Nextflow.
- Knowledge of cloud computing , including setting up compute engines and managing storage buckets.
- Strong knowledge of enterprise storage solutions and big data analysis frameworks.
- Ability to provide recommendations for storage optimizations and cost-saving solutions for Labs.
- Familiarity with HL7 messaging and interpreting web.config files for plugins.
- Experience reviewing logs (e.g., IIS logs, Dynatrace logs) to identify excess resource utilization or performance spikes.
- Familiarity with Cloudflare, Forcepoint, and their related rules and policies (e.g., C86 rule).
- Expertise in understanding and configuring failover environments for applications.
Key Skills Required:
- Oracle
- SQL
- MySQL
- Python
- SQL Server
- Networking
- PostgreSQL
- Automation
- Ansible
- Cloud computing
- Cloudflare
- Computer security
- Conda
- Cybersecurity
- Analysis
- Bash
- Big Data
- Cloud Infrastructure
- Computing
- Configuration
- Consulting
- Data Management
- Disaster Recovery
- Enterprise Storage
- Guidance
- Infrastructure
- Installation
- IT Operations
- Linux
- Load Balancing
- Maintenance
- Management
- Management Systems
- Memory Management
- Ongoing Support
- Providing Technical Support
- Resource Utilization
- Security
- Service Continuity
- System Monitoring
- Training
- Troubleshooting
- Virtualization
- Workload Management