Research Scientist
Singapore · Full-time
- Experience: Any
- Salary: Not specified
- Vacancies: 1
- Posted: 1 hour ago
- Work mode: In office
- Eligibility: This role is intended for candidates with strong research and engineering experience in large-scale machine learning data systems, video generation models, and post-training methods who can work onsite in Singapore.
- Resume: Required to apply
Job description
About Cantina Labs
Cantina Labs is a social AI company creating advanced real-time models that expand what’s possible in expression, personality, and realism. Its work brings characters to life and changes the way people tell stories, connect, and create. The company builds full ecosystems, and its flagship social AI platform, Cantina, is only the first step.
Role Overview
Cantina is looking to add a Research Scientist to its Singapore team. This position focuses on foundational research for video generation models, with ownership across the entire research process and a strong emphasis on post-training research. You will work closely with teams across data, infrastructure, and related modeling functions to turn research insights into lasting improvements in model performance.
Key Responsibilities
- Develop and operate scalable systems that ingest, clean, and deliver video datasets at very large scale for training purposes.
- Create distributed pipelines for preprocessing, dataset creation, and repeated refresh cycles.
- Take ownership of workflow automation, job coordination, monitoring, and recovery handling for large processing workloads.
- Build and support container-based pipeline infrastructure with Kubernetes or comparable orchestration tools.
- Improve cloud storage and data transfer workflows across AWS, GCS, or Azure with attention to cost, speed, and operational efficiency.
- Set standards for dataset layout, version control, caching, retention, and access methods.
- Create tooling for deduplication at scale, including near-duplicate detection across large video collections.
- Investigate and build distillation approaches for large diffusion- and flow-based video generation systems, including guidance and adversarial distillation, while maintaining or improving output quality and lowering inference cost.
- Design reward models and preference-based fine-tuning workflows that align video generation quality with human judgment in areas such as aesthetics, motion, and prompt adherence.
- Study how base model behavior influences post-training results and collaborate with the foundation model team to guide pretraining choices.
Required Experience and Qualifications
- Proven hands-on background in building or scaling large data systems or pipelines for machine learning workflows.
- Experience with distributed processing tools such as PySpark or Ray, plus orchestration systems such as Airflow or similar alternatives.
- Working knowledge of containerization and orchestration, including Docker and Kubernetes.
- Practical experience with cloud storage and compute platforms such as AWS, GCS, and/or Azure, including tradeoffs involving cost, throughput, storage structure, and access patterns.
- Exposure to video and media processing tools such as FFmpeg, PyAV, DALI, or OpenCV.
- Comfort working with multimodal data, including video, images, text, and audio.
- Strong research experience in post-training methods for large diffusion or flow-based generative models, especially distillation methods that balance efficiency with quality retention.
- Experience with reward modeling or preference-based fine-tuning approaches such as RLHF, DPO, or similar alignment techniques.
- Understanding of how pretraining and post-training interact, and how core model properties influence distillation and fine-tuning outcomes.
- Strong Python skills and familiarity with modern machine learning frameworks, preferably PyTorch or JAX.
- Demonstrated ability to work independently and carry research projects from idea generation through experimental validation.
- Publications in top-tier venues such as NeurIPS, ICML, ICLR, CVPR, ICCV, or ECCV are preferred.
- Clear understanding of the practical requirements for building reliable, scalable, and reproducible ML data workflows.
Benefits
- Competitive pay plus meaningful company equity.
- Personal leave and paid public holidays.
- Health insurance coverage.
- Global travel insurance for international trips.
- A monthly spending allowance of $500 (about S$635).
- All equipment required for a home office setup.