What are the four levels of AI leverage in the Caliber rubric?

L1 Manual (does it by hand, no AI), L2 Assisted (one-off prompts, no reuse), L3 Augmented (reusable prompts with verification), and L4 Architect (workflow redesigned around AI with humans on judgment).

Why levels instead of a single AI fluency score?

Levels capture a maturity stage that is actionable for hiring. A score in the middle of a scale does not tell you whether the candidate can architect a workflow or just paste into ChatGPT. The level does.

What level should I hire for?

Depends on the role. Senior individual contributors in high-leverage motions (sales, ops, marketing, recruiting) should be at L3 or above. Junior or specialized roles can come in lower and grow. Caliber lets you specify the target level for the role.

The Four Levels of AI Leverage

A year ago, claiming you use AI was a hiring differentiator. Today every candidate says it. The question stopped being whether someone uses AI and started being how. Caliber scores candidates on a four-level rubric of how much leverage they actually get from AI on real work. Here is the rubric.

L1: Manual

Does the job by hand. AI is absent, or used only cosmetically (a sentence here, a phrasing fix there). The work gets done, but at the original cost. Sometimes legitimate: high-trust documents, regulated environments, or the candidate's first month before they have tooling set up. Sometimes a tell that the candidate has not been pushed.

L2: Assisted

Uses AI as a faster search box. Single prompts, single outputs, no reuse. The pattern is "paste this in, copy that out." Faster than L1, but no compounding. The work product is brittle: ask the candidate to do it again with new inputs and they start over. Verification is rare. Most candidates in the AI conversation today are L2 and don't know it.

L3: Augmented

Has built reusable prompts and light systems. Reads as an operator who has been doing this for a while and has gradually evolved their workflow. The prompt itself is an artifact. Verification is built in: spot checks on numbers, flags on anomalies, humans on the edges. The same task next week is faster, not just as fast.

L4: Architect

Redesigned the workflow around AI. The candidate connected tools, built a skill or agent, and now spends their time on judgment, not typing. The recurring pieces of the job run themselves. The candidate can explain the tradeoffs: why this part is automated, why this part is human, what fails and how they catch it. They are not faster at the old job. They have built a different job.
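For readers who think in code, the four levels can be sketched as an ordered scale plus a few observable signals. This is only an illustration of the rubric as described above, not how Caliber actually scores candidates: the names Level, Observation, classify, and meets_target, and the choice of four boolean signals, are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """The four levels as an ordered scale (hypothetical encoding)."""
    MANUAL = 1     # L1: by hand, AI absent or cosmetic
    ASSISTED = 2   # L2: one-off prompts, no reuse
    AUGMENTED = 3  # L3: reusable prompts with verification
    ARCHITECT = 4  # L4: workflow redesigned around AI


@dataclass
class Observation:
    """Signals a reviewer might note; the field names are assumptions."""
    uses_ai: bool                # more than cosmetic use
    reusable_prompts: bool       # the prompt is kept as an artifact
    built_in_verification: bool  # spot checks, anomaly flags, human review
    workflow_redesigned: bool    # recurring pieces run themselves


def classify(obs: Observation) -> Level:
    """Map observed signals to a level, checking the highest level first."""
    if not obs.uses_ai:
        return Level.MANUAL
    if obs.workflow_redesigned and obs.built_in_verification:
        return Level.ARCHITECT
    if obs.reusable_prompts and obs.built_in_verification:
        return Level.AUGMENTED
    return Level.ASSISTED


def meets_target(level: Level, target: Level) -> bool:
    """True when a candidate is at or above the level the role needs."""
    return level >= target


candidate = Observation(
    uses_ai=True,
    reusable_prompts=True,
    built_in_verification=True,
    workflow_redesigned=False,
)
print(classify(candidate).name)                            # AUGMENTED
print(meets_target(classify(candidate), Level.AUGMENTED))  # True
```

The sketch treats the levels as ordered stages rather than points on a scale, so the only comparison it supports is "at or above the target."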

Why levels, not scores

The levels are not a percentile. Two L3 sales operators in the same week look almost identical. Two L4s diverge based on what they chose to architect away. Levels capture a maturity stage. A score in the middle of an arbitrary scale tells you nothing actionable.

The hiring use

Different roles need different levels right now. A senior individual contributor hired into a high-leverage motion (sales, ops, marketing, recruiting) at a fifty-person company should be at L3 or above. A first-year analyst at the same company might come in at L1 and grow. The question is what level your role needs, not what is impressive in the abstract.

Where Caliber fits

We built Caliber to score this on a real scenario, not a quiz. Pick a role, read the scenario, watch four candidates do the same job at L1 through L4, and see what good looks like at the level you are hiring for. The interactive sample on the home page runs through Sales, Operations, and Marketing scenarios. The first three pilot orgs each quarter run as a manual service at no charge while we bring the self-serve product online.