Job Description

About the Company

Userpilot is a product analytics and user engagement platform that helps product teams at many companies understand, segment, and activate users. Its product combines a high-performance JavaScript SDK inside customer web applications, a Chrome extension for building in-app interfaces without code, a React dashboard for real-time data, and a distributed Elixir/Phoenix backend designed to handle very large-scale WebSocket traffic, Kafka-based event ingestion, and live content delivery.

The team moves quickly, ships frequently, and values engineers who care deeply about both the product experience and the systems that power it.

About the Role

This is a deeply AI-focused engineering role centered on Lia, Userpilot's agent platform. The platform converts rich product data into trustworthy, grounded, multi-step AI experiences. In this role, AI is the product itself rather than a support tool.

You will own and extend the agent platform, which is built as a Python service on Microsoft Agent Framework. It includes hybrid retrieval across multiple tool catalogs, multi-step orchestration with skills and sub-agents, multi-turn state management and grounding, and full trace-level observability with cost accounting. The architecture is designed around framework-neutral domain contracts.

This is an ownership role where you will strengthen architecture, improve reliability and evaluation standards, and help define the direction of a frontier agentic system. The company looks for engineers who can pursue a problem to its root, use judgment to choose between deterministic methods, statistics, and LLMs, and keep the customer experience at the center.

Key Work Areas

Build conversational AI experiences grounded in a rich product-data model, including tool use, retrieval, streaming, and orchestrated multi-turn grounding that is dependable rather than merely convincing.
Develop and evolve the agent runtime and orchestration layer, including multi-step agent workflows supported by framework-neutral domain contracts that preserve portability of business logic.
Create hybrid retrieval and tool-grounding systems using vector and lexical RAG over tool catalogs gathered from sources such as OpenAPI specifications and MCP, so the agent can choose the correct operation and arguments against live customer data.
Design packaged AI workflows that generate durable, editable, and actionable outputs instead of chat responses that disappear in history.
Own the evaluation, observability, and cost systems that keep the platform safe and viable at scale, including deterministic checks, live judge-scored reasoning evaluations, end-to-end tracing, and per-call cost tracking.
Support agent interoperability by building an MCP server that exposes Userpilot's tools to external AI agents.

What You Will Do

Design, build, and run the agent platform end to end, covering the API layer, runtime, tools, retrieval, persistence, and observability.
Deliver LLM and agent features that reliably ground in customer data, with streaming, retries, evaluations, and graceful fallback behavior suitable for production.
Select the most appropriate mechanism for each signal, whether that is retrieval, deterministic logic, structured outputs, statistics, or an LLM.
Treat evaluation quality, cost per call, and latency as core engineering concerns, since continuously running AI features have real unit economics.
Work in a spec-led, AI-assisted process by reading and contributing to PRDs that guide both human and agent implementation.
Improve the team's agentic infrastructure, including AGENTS.md, CLAUDE.md, DESIGN.md, slash commands, and architectural rules that help AI tooling understand the codebase.
Review code for architectural consistency and reliability, including agent-generated code, so it follows the same boundaries and framework-neutral contracts as human-written code.
Set standards for the team by defining patterns, writing the specs and evals others rely on, and helping engineers and agents grow.

Required Experience and Background

At least 3 years of production software experience, with evidence of owning systems rather than only individual features and of improving quality for those around you.
Strong Python skills and solid computer science fundamentals, including practical experience with databases, queues, or real-time systems. The platform uses FastAPI, Pydantic, and async Python.
Hands-on production experience with agentic or LLM systems beyond basic API calls, including tool use, retrieval grounding, structured outputs, multi-turn continuity, streaming, evaluations, and non-deterministic behavior. Owning an agent runtime or orchestration layer end to end is especially valuable.
Good architectural judgment for AI systems, including the ability to keep domain logic separate from fast-moving vendor frameworks and to make thoughtful build-versus-adopt decisions.
Clear judgment on when to use an LLM and when to rely on deterministic logic, retrieval, or statistics for better reliability, lower cost, or better reproducibility.
Comfort using AI coding assistants such as Claude Code or Cursor as part of the development workflow, while still reviewing output critically and pushing back when needed.
Strong product sense and a balanced focus on user experience and system correctness.
Self-directed work style and a mindset of continuous improvement, with comfort in a work environment that does not over-prescribe every step.

Bonus Experience

Experience with agent frameworks or orchestration tools such as Microsoft Agent Framework, LangGraph, AutoGen, or a runtime you built yourself.
Background in RAG and tool-use platforms, including retrieval over tools and APIs, OpenAPI-based tool generation, or MCP.
Experience designing and using LLM evaluation and observability systems, including tracing and cost tools such as Langfuse or OpenTelemetry GenAI.
Cost optimization experience for LLM workloads, including caching, batching, model routing, and prompt compaction.
Knowledge of embedding-based retrieval or clustering, such as vector databases, hybrid search, HDBSCAN, or UMAP.
Experience with multi-tenant SaaS architecture, including data isolation, per-tenant state, and noisy-neighbor management.
Depth in full-stack or core services work, including React/TypeScript or the core stack of Elixir/Phoenix, OTP, ClickHouse, and Kafka.
Exposure to time-series anomaly detection, drift monitoring, recommendation systems, ranking systems, feedback loops, developer experience, agentic infrastructure, technical leadership, or open source work.

How the Team Builds

The team believes statistics, heuristics, and LLMs each have distinct strengths, and that no single approach should be forced onto a task where another fits better, such as anomaly detection or risk scoring.
Work begins with a written specification or PRD that captures intent and constraints, whether a human or an agent will implement it.
Coding agents are used for scaffolding, while engineers remain responsible for architecture, review, and judgment.
Every LLM-shaped feature is expected to have an evaluation suite before release, and the team reviews the evaluation results themselves, not only whether the feature runs successfully.
LLM usage is treated as a costed engineering decision, so caching, batching, model routing, and prompt compaction are important.
Instrumenting feedback loops is central to improving AI products over time.
Patterns are written down explicitly in documents such as AGENTS.md so that both humans and agents follow the same product-domain rules and avoid mistakes like breaking cache invariants or violating design contracts.
Developer experience is considered a product quality issue, and unclear documentation or rules are treated as bugs to be fixed.

Right to Work

Applicants must already have the legal right to work in Ireland, as visa sponsorship is not available for this position.

Equal Opportunity

Userpilot states that it is an equal opportunity employer and is committed to an inclusive workplace. Hiring decisions are made without discrimination based on gender, civil status, family status, age, disability, race, religion, sexual orientation, or membership of the Traveller community, in line with the Employment Equality Acts 1998–2015.

Data Privacy

Personal data submitted during the application process will be processed for recruitment and candidate assessment purposes only, and retained only for as long as needed for that purpose.

Software Engineer - Agentic Platform