Qualcomm

LLM Serving Engineer (Cloud AI Engineering)

Riyadh, Riyadh Province, Saudi Arabia • Full-time

Experience
2–5 years
Salary
Openings
1
Posted
1 hour ago
Work mode
On-site
Education
B.Tech
Eligibility
Candidates meeting one of the listed education-and-experience combinations in engineering, computer science, information systems, or related fields may apply. Applicants with equivalent experience may also be considered if they can show they can perform the core duties and have the required compete…
Resume
Required to apply

Job description

About the Company

Qualcomm Middle East Information Technology Company LLC is expanding its footprint in Riyadh and is looking for engineers who can help strengthen its data centre and cloud AI capabilities across the region.

As Saudi Arabia advances its digital ambitions under Vision 2030, Qualcomm is investing in high-end compute and data centre infrastructure to support AI, cloud services, and next-generation connectivity at scale. This role offers the chance to contribute to a growing technology hub and help shape mission-critical operations in the Kingdom and beyond.

The Qualcomm Cloud AI team builds hardware and software solutions that accelerate inference, with a strong focus on deep learning and cloud AI. The team is hiring LLM Serving Engineers across multiple seniority levels to join a collaborative environment working across research, product development, and commercial rollout.

Role Summary

This position covers the full lifecycle of an LLM serving solution, from early-stage innovation and engineering to deployment in production. It calls for strong technical judgment, execution discipline, and clear communication in a fast-moving environment.

What You Will Do

  • Design and build a scalable platform for LLM inference, using techniques such as disaggregated serving, KV-cache handling, advanced parallelism, speculative methods, model tuning, and specialized kernels.
  • Help develop and improve LLM serving stacks such as vLLM, SGLang, TGI, Triton Inference Server, Dynamo, and llm-d.
  • Partner with customers and internal compiler, firmware, and platform teams to deliver workable end-to-end solutions.
  • Explore advanced GenAI methods, including attention mechanisms and Mixture-of-Experts models, to uncover new optimization paths.
  • Improve serving efficiency through intelligent autoscaling, traffic balancing, and request routing.
  • Participate in open-source serving communities and help advance the frameworks used by the broader ecosystem.

What We Are Looking For

  • Practical experience with one or more LLM serving or orchestration tools such as Triton Inference Server, vLLM, SGLang, Ollama, llm-d, KServe, LMCache, or Mooncake.
  • Strong knowledge of LLMs, VLMs, SLMs, and transformer-based model architectures.
  • Solid hands-on experience building language models with PyTorch.
  • Strong foundations in algorithms, data structures, and parallel/distributed programming.
  • Good understanding of computer architecture, ML accelerators, in-memory processing, and distributed systems.
  • Advanced Python development skills for large-scale software projects.
  • Experience profiling, analyzing, and improving deep learning workloads.
  • Curiosity and initiative to keep up with the latest inference optimization approaches.
  • Excellent problem-solving and communication skills, and the ability to work well in a collaborative, fast-paced setting.
  • An advanced degree, such as an MS in Computer Science, Machine Learning, Computer Engineering, or Electrical Engineering.

Preferred Qualifications

  • Contributions to open-source GenAI projects.
  • Experience designing and building distributed systems at scale.
  • Exposure to kernel-level design using technologies such as PyTorch, CUDA, or Triton.
  • Familiarity with torch.compile or TorchDynamo.
  • A PhD in Computer Science, Computer Engineering, or Machine Learning.

Minimum Qualification Paths

  • Bachelor's degree in Computer Science, Electrical or Computer Engineering, Information Systems, or a related field, plus at least 5 years of experience in Hardware Engineering, Software Engineering, Systems Engineering, or a related area.
  • Master's degree in Computer Science, Electrical or Computer Engineering, Information Systems, or a related field, plus at least 4 years of experience in Hardware Engineering, Software Engineering, Systems Engineering, or a related area.
  • PhD in Computer Science, Engineering, Information Systems, or a related field, plus at least 2 years of experience in Hardware Engineering, Software Engineering, Systems Engineering, or a related area.

What’s on Offer

  • Base salary with housing and transportation allowance.
  • Stock compensation through RSUs, plus a performance-based bonus.
  • 16 weeks of fully paid maternity leave.
  • 6 weeks of fully paid paternity leave.
  • Employee stock purchase plan.
  • Child education allowance.
  • Relocation and immigration assistance where required.
  • Life and medical insurance coverage.
  • Live+Well reimbursement for health and recreational membership fees.

Additional Information

The years of experience listed are indicative only. Candidates with equivalent experience may still be considered if they can demonstrate the ability to perform the core responsibilities and have the required competencies.

Qualcomm is an equal opportunity employer and supports accessibility accommodations for applicants with disabilities during the hiring process.

Employees are expected to follow all applicable company policies, including requirements related to security and the protection of confidential and proprietary information, to the extent permitted by law.

Applications from staffing and recruiting agencies are not accepted through this careers site. Unsolicited resumes and applications will not be reviewed, and no fees will be paid for such submissions.