Crossing Hurdles

Academic Researcher


Remote · Contract


Experience
Any
Salary
USD 80 – USD 110 / year
Openings
1
Posted
3 hours ago
Work mode
Work from home
Education
PhD
Eligibility
Current or retired professors and PhD candidates in STEM or professional disciplines based in the United States.
Resume
Required to apply

Job description

Role overview

This contract role is for academics and research professionals in the United States who want to contribute to a frontier model evaluation program. The work centers on improving next-generation large language model systems across technical and professional subject areas.

What you will do

  • Create demanding benchmark tasks based on your academic or professional expertise and make sure they reflect real-world use cases.
  • Develop Python-based problem sets for agent-style workflows that are executable, clearly specified, and backed by test cases.
  • Review model responses to spot weaknesses in reasoning, logic, and problem solving across complex scenarios.
  • Produce gold-standard answers and evaluation rubrics that enable consistent assessment.
  • Study system behavior to identify capability gaps and recurring failure patterns in advanced reasoning tasks.
  • Work with subject-matter experts from STEM and quantitative fields to raise the quality and rigor of evaluations.

Requirements

  • You should be a current or retired professor, or a PhD candidate, in a STEM or professional field such as computer science, mathematics, physics, engineering, statistics, economics, finance, law, or a closely related area.
  • A strong academic record from a leading university or an equivalent research setting is expected.
  • You need practical Python skills used in research, academic work, or a professional environment.
  • You should be able to create executable problem-solving tasks and computational workflows.
  • Prior exposure to benchmarking, structured evaluation, or research-based task design is an advantage.
  • Strong analytical judgment is important for checking logical validity and understanding system behavior.
  • You must be able to work on your own and maintain a steady schedule of at least 30 hours per week on weekdays.

Additional information

This position is a W-2 contingent role based in the United States. The pay range is stated as $80 to $110 per year, and the expected workload is 30+ hours per week. To apply, continue through the easy-apply process.
