The demand for AI engineers is outpacing the supply of people who can actually do the job well. Companies in the US are paying $250–400K for senior AI engineers, and still shipping AI features that hallucinate, cost 10x what they should, and degrade silently in production.
The problem isn't a shortage of people who know how to call the OpenAI API. The problem is that most companies can't tell those people apart from engineers who can build AI systems that work at scale.
Here's what the hiring process actually needs to look for.
The three mistakes most companies make
1. Testing the wrong things
Most technical screens for AI roles test whether a candidate knows the theory — transformer architectures, attention mechanisms, the math behind embeddings. That knowledge matters for researchers. For an AI engineer building production systems, what matters is different: Can they design a RAG pipeline that doesn't hallucinate? Can they tell you what the latency profile of a specific retrieval strategy will look like? Can they model the cost of running this at 100K requests per day before you've committed to it?
These are engineering problems, not research problems. Screen for them.
2. Mistaking LLM API experience for AI engineering
Someone who has built a chatbot with the OpenAI API is not an AI engineer. That's a front-end developer who added an API call. AI engineering is about designing systems that are reliable, observable, and cost-effective — which requires understanding retrieval, evaluation, prompt engineering at scale, output validation, fallback behavior, and monitoring.
In your technical screen, ask them to design a system, not just write code. Give them a real problem ("we need to let users query 200,000 customer support tickets in natural language — design the system") and see if they ask the right questions before they start drawing boxes.
3. Not having an evaluation criterion
You can't hire for a role you can't define. If you don't know what "good AI engineering" looks like for your use case, you will hire someone who sounds good and find out six months later that they can't deliver. Before you hire, write down what success looks like in 90 days. What will they have shipped? What metrics will have moved? What problems will be solved?
What to actually test
RAG system design
Give them a specific domain (legal documents, customer support, medical records — pick something with real retrieval challenges) and ask them to design the retrieval and generation pipeline. What they should cover without prompting:
- Chunking strategy and chunk size rationale
- Embedding model selection (and the cost/quality tradeoff)
- Hybrid vs. dense-only retrieval and when each applies
- How they'd handle hallucinated source citations
- Latency and cost model for a given request volume
If they don't ask about scale and budget before they start designing, that's a red flag.
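To make the hybrid retrieval item concrete, here is a minimal sketch of one common way to merge a keyword ranking with a dense (embedding) ranking: reciprocal rank fusion. The ticket IDs and both input rankings are made up for illustration, and k=60 is a conventional default rather than a tuned value. A strong candidate might propose something quite different, such as a learned reranker.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from two retrievers.
keyword_hits = ["t17", "t03", "t42"]  # e.g. from a keyword index
dense_hits = ["t03", "t99", "t17"]    # e.g. from an embedding index

fused = reciprocal_rank_fusion([keyword_hits, dense_hits])
print(fused)
```

The useful interview signal is not whether they know this particular formula. It is whether they can say when keyword matching rescues dense retrieval (exact IDs, product names, error codes) and what the extra index costs in latency and infrastructure.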
Production debugging
Give them a simulated production problem. "Our RAG pipeline was returning great results last month. Now users are saying the answers are getting worse. We haven't changed the prompt. What do you investigate first?"
Good candidates will immediately ask about data drift, whether the document corpus changed, whether the embedding model was updated, and what the current evaluation metrics show. They'll think about the system, not just the model.
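One way a candidate might check this is to replay a fixed set of known-good questions against the retriever and compare the result with last month's. The sketch below assumes a small "golden set" of (question, expected document ID) pairs and stands in for the real retriever with dictionary lookups. All names and data here are hypothetical.

```python
def hit_rate_at_k(golden_set, retrieve, k=5):
    """Fraction of questions whose expected document is in the top k results.

    golden_set: list of (question, expected_doc_id) pairs.
    retrieve:   callable taking a question and returning ranked doc IDs.
    """
    if not golden_set:
        raise ValueError("golden_set must not be empty")
    hits = sum(
        1 for question, expected_id in golden_set
        if expected_id in retrieve(question)[:k]
    )
    return hits / len(golden_set)


# Hypothetical golden set and two snapshots of retriever output.
golden = [("q1", "d1"), ("q2", "d2"), ("q3", "d3"), ("q4", "d4")]
last_month = {"q1": ["d1", "d9"], "q2": ["d2"], "q3": ["d8", "d3"], "q4": ["d4"]}
this_month = {"q1": ["d1"], "q2": ["d7"], "q3": ["d8"], "q4": ["d4"]}

before = hit_rate_at_k(golden, lambda q: last_month.get(q, []))
after = hit_rate_at_k(golden, lambda q: this_month.get(q, []))
print(f"hit rate before: {before:.2f}, after: {after:.2f}")
```

If retrieval quality dropped on the same questions with the same prompt, the problem is upstream of the model: the corpus, the index, or the embeddings. A candidate who reaches for something like this before touching the prompt is thinking about the system.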
Cost and latency modeling
Ask them to estimate the monthly LLM cost for a product that processes 50,000 documents per month averaging 5,000 words each, with a query volume of 10,000 per day averaging 200 words per query. They should be able to do back-of-envelope math on tokens, select an appropriate model tier based on the task complexity, and identify where caching would reduce costs.
Someone who can't do this calculation will not build cost-efficient AI systems.
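For reference, the back-of-envelope version of that estimate fits in a few lines. Everything beyond the volumes stated above is an assumption made for illustration: roughly 4/3 tokens per English word, 2,000 tokens of retrieved context and 300 output tokens per query, a 30-day month, and placeholder prices of $1 per million input tokens and $4 per million output tokens. Substitute your actual model pricing.

```python
def monthly_llm_cost(
    docs_per_month,
    words_per_doc,
    queries_per_day,
    words_per_query,
    context_tokens_per_query,   # assumption: retrieved context sent with each query
    output_tokens_per_query,    # assumption: average answer length
    input_price_per_mtok,       # placeholder price, USD per million input tokens
    output_price_per_mtok,      # placeholder price, USD per million output tokens
    tokens_per_word=4 / 3,      # rough rule of thumb for English text
    days_per_month=30,
):
    """Rough monthly cost in USD, ignoring caching, retries, and embedding costs."""
    queries_per_month = queries_per_day * days_per_month

    # Input tokens: each document is read once, each query carries its own
    # text plus the retrieved context.
    doc_tokens = docs_per_month * words_per_doc * tokens_per_word
    query_tokens = queries_per_month * (
        words_per_query * tokens_per_word + context_tokens_per_query
    )
    output_tokens = queries_per_month * output_tokens_per_query

    input_cost = (doc_tokens + query_tokens) / 1e6 * input_price_per_mtok
    output_cost = output_tokens / 1e6 * output_price_per_mtok
    return {
        "doc_tokens": doc_tokens,
        "query_tokens": query_tokens,
        "output_tokens": output_tokens,
        "total_usd": input_cost + output_cost,
    }


estimate = monthly_llm_cost(
    docs_per_month=50_000,
    words_per_doc=5_000,
    queries_per_day=10_000,
    words_per_query=200,
    context_tokens_per_query=2_000,
    output_tokens_per_query=300,
    input_price_per_mtok=1.0,
    output_price_per_mtok=4.0,
)
print(f"estimated monthly cost: ${estimate['total_usd']:,.2f}")
```

Under these assumptions, the retrieved context attached to every query is the largest single block of input tokens, which is where a candidate should start talking about caching and context trimming. The exact dollar figure matters less than whether they can name their assumptions and see which one dominates.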
Red flags that actually matter
- They only ever mention OpenAI or ChatGPT when discussing models. The AI landscape is too diverse for a single-vendor perspective.
- They can't explain how they'd evaluate whether a change improved or degraded system quality. Vibes are not an evaluation framework.
- They haven't shipped anything to production. Side projects and notebooks are fine for junior roles, but senior AI engineers should have scars from production.
- They describe AI as magic. The best AI engineers are deeply skeptical of hype and precise about what models can and can't do.
What a reasonable senior AI engineer package looks like (US, 2025)
For a fully remote US-based senior AI engineer with production experience: $200–350K total comp depending on equity, location, and company stage. Startups at Series A–B typically compete on equity and interesting problems rather than cash.
For contract and consulting work, senior AI engineers are billing $150–300/hour or $15K–40K per project for scoped engagements.
If you're seeing rates below $100/hour for "senior AI engineering" from providers claiming strong US market experience, you're looking at either junior talent or offshore work presented as senior. Both can work for certain scopes, but go in with clear eyes about what you're getting.
The faster path: scope it as a project
If you need AI capability now and can't hire fast, scoped project contracts with a senior AI engineer are often faster and lower-risk than a full-time hire. You get the work done, you see how they operate, and you end up with a better idea of what a full-time hire would actually need to do.
At Goviaus, that's what we do. Scoped AI engineering engagements, production outcomes, no 6-month hiring process. If you're trying to ship AI and your hiring pipeline is blocked, get in touch.