Overview
The Microsoft Azure Artificial Intelligence production team is looking for a Principal High Performance Computing (HPC) / Artificial Intelligence (AI) Load Planning Engineer to drive the design, validation, and orchestration of multi-megawatt-scale solutions that manage the power draw of high-throughput Graphics Processing Unit (GPU)-enabled AI training clusters. Azure is building the world's largest supercomputers to meet the massive computational demands of AI workloads. Its HPC virtual machines, such as the ND H100 v5, have already made their mark on the Top500, MLPerf, and Graph500 rankings. Safely operating these large clusters requires robust solutions that stabilize their power draw.
As a Principal High Performance Computing (HPC) / Artificial Intelligence (AI) Load Planning Engineer, you will establish best practices that drive architectural changes and influence the roadmap of relevant software and hardware components. Your work will directly impact the business goals of a wide range of users and help enable the next wave of growth and innovation in AI, and in cloud HPC in general.
At supercomputing scale, novel tools and techniques are needed to maintain system reliability, runtime performance, and health so that running jobs continue to meet user expectations. In this position, you will apply state-of-the-art methods; design, build, and validate novel tools; identify operational gaps; and instrument features to keep cloud-native supercomputers running smoothly.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
You will join a team of engineers and researchers who have experience in high performance computing infrastructure and are deeply familiar with the behavior of bulk synchronous loads in large-scale systems, middleware, and software. The following values drive us:
Your mission will be to help ensure that the Azure platform delivers consistent power and performance, scales on demand, and is engineered to withstand unparalleled computing demand from customer workloads. You will help build a test-driven engineering culture that reduces regressions and bugs in production and sets a higher bar for infrastructure quality. Your responsibilities also include the following:
Qualifications
Required Qualifications:
Other Requirements:
Preferred Qualifications:
Software Engineering IC5 – The typical base pay range for this role across the U.S. is USD $137,600 – $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 – $294,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications for the role until July 10, 2024.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.