Meta is seeking a Technical Program Manager (TPM) experienced in managing the design, development, and deployment of large-scale AI clusters. This position will work with cross-functional teams in Meta’s Infrastructure organization to build large-scale AI clusters that enable Meta’s AI applications and use cases. The role focuses on creating strategies and executing plans to onboard Meta’s various AI software workloads onto full-stack hardware in large-scale AI clusters. The TPM is responsible for the successful end-to-end design, build-out, delivery, and turn-up of these clusters. This includes influencing network topology; determining the most appropriate hardware infrastructure in terms of compute, storage, and network, and how these work together as a solution; delivering that hardware into data centers; influencing the orchestration system and cluster-level software tooling and provisioning; driving cluster-level testing; optimizing cluster-level software performance; and/or migrating existing software applications to enable cluster-level turn-up for AI applications and use cases. The role works with the Infrastructure Hardware Development, Infrastructure Software, Capacity Planning, Data Center, Network Infrastructure, and Infrastructure Sourcing teams. Meta’s Infrastructure Engineering organization is responsible for the growth, management, and 24×7 upkeep of all of Meta’s products and services.
Infra Hardware TPM – AI Cluster Responsibilities
Minimum Qualifications
Preferred Qualifications