Senior Observability Architect, AI and HPC

Santa Clara, CA, USA

277 Days ago

Job Description


NVIDIA's Hardware Infrastructure organization is seeking a Senior or Principal Data and Observability Architect. We serve and collaborate directly with NVIDIA's rapidly growing AI, HW, and SW engineering and research teams across the company. We are looking for a technical leader to define a vision and roadmap for distributed observability systems for large-scale AI and HPC clusters and workloads and guide implementation towards this vision. You will architect systems for data collection, aggregation, enrichment, storage, retrieval, and visualization to spectacularly improve efficiency, performance, and productivity of AI and HPC workloads. You will lead technical teams to develop, deploy, and operate observability solutions for multiple compute clusters around the world.

What You'll Be Doing:

Collaborate with AI, HW, and SW engineering and research teams to define a vision and roadmap for AI/HPC cluster observability.

Architect and lead teams to develop, test, and deploy data collectors, pipelines, visualization and retrieval services.

Define data collection and retention policies to balance network bandwidth, system load, and storage capacity costs with data analysis requirements.

Work in a diverse team to provide operational and strategic data to empower our engineers and researchers to improve performance, productivity, and efficiency.

Continuously improve quality, workloads, and processes through better observability.

What We Need to See:

Experience designing and building large scale, distributed observability systems.

Ability to collaborate with data scientists, researchers, and engineering teams to identify high value data for collection and analysis.

Experience with turning raw data into actionable reports

Experience with observability platforms such as Apache Spark, Elastic/OpenSearch, Grafana, Prometheus, and other similar open-source tools

Technical-lead-level Python programming experience, including use of API calls

Passion for improving the productivity of others

Excellent planning and interpersonal skills

Flexibility/adaptability working in a dynamic environment with changing requirements

MS (preferred) or BS in Computer Science, Electrical Engineering, or related field or equivalent experience

12+ years of relevant experience.

Ways To Stand Out from The Crowd:

Background in computer science, machine learning, deep learning, open-source software, infrastructure technologies, and GPU technology.

Prior experience in infrastructure software, production application software development, software development, release and support methodology, and DevOps

Experience in the management of datacenters and large-scale distributed computing

Experience in working with AI researchers and/or EDA developers

Consistent track record of driving process improvements and measuring efficiency, a passion for sharing knowledge, and experience driving complex projects end-to-end.

The base salary range is 224,000 USD - 425,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/).

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Key Skills Required

Software Development, Python, Apache, API, Data Analysis, Adaptability, Analysis, Apache Spark, Application Software, Computer Science, Computing, Consistent, Data Collection, Deep Learning, Development, Distributed Computing, Driving Process, Dynamic Environment, Electrical Engineering, Grafana, Implementation, Infrastructure, Infrastructure Software, Interpersonal Skills, Learning, Machine Learning, Management, Methodology, Orientation, Productivity, Research, Roadmap, Science, SPARK, Visualization

Job Overview


Job Function: Other

Job Type: Full Time

Workplace Type: Not Specified

Experience Level: Mid-Senior level

Salary: Competitive & Based on Experience

Experience: 12 - 13 yrs

Contact Information


About the company:

NVIDIA is a leading company in the world of accelerated computing, with a rich history of innovation and growth. Since its inception in 1993, the company has been at the forefront of revolutionizing the way we use technology, particularly in the fields of gaming, computer graphics, and artificial intelligence. With...

Company Name: NVIDIA

Recruiting People: HR Department

Website: https://www.nvidia.com/en-us/

Headquarters: Santa Clara, California, USA 95050

Industry: IT/Computers - Hardware & Networking

Company Size: 10000+ Employees

