AI PROFICIENCY ASSESSMENT

Hire for how people actually use AI.

Pick a role, pick a level, see the Leverage Profile. Caliber scores how a candidate redesigns the job around AI, not whether they can name the tools.

Join the waitlist · See how it works

Built by Inflection Group, the operator-led MSP and advisory firm. Pilot cohorts open monthly by role.

Everyone says they want AI-first people. Almost no one can test for it.

A resume keyword tells you nothing about how someone actually works. A coding test misses every non-technical role. Self-reported quizzes reward confidence, not competence.

SAMPLE ASSESSMENT

Pick a role. See how the same job gets done at four levels of AI leverage.

One scenario per role. Four candidates, from L1 Manual through L4 Architect. The Leverage Profile underneath is what Caliber would score for top-tier work in that role.

BRIEF · SALES · WEEKLY-PIPELINE-REPORT

Give me a weekly pipeline report every Friday.

You are a mid-level AE on a 12-person sales team. Every Friday by 4pm your VP wants a one-page pipeline report covering stage movement, the five biggest deals, and anything stalled 14+ days. The source is a 4,000-row Salesforce export. The audience is the head of sales and the CRO. Produce the report. Tell us how you did it.

L1 · MANUAL

Does it by hand. AI is absent or cosmetic.

BUILD LOG

I open the Salesforce export in Excel, sort by stage and amount, and pull the top deals into a Google Doc. I write the narrative myself. I do not use AI on any of it. I prefer to see every number with my own eyes before it goes to the CRO. About two and a half hours every Friday.

OUTPUT

WEEKLY PIPELINE / WEEK OF JUNE 3
Top 5 deals by amount:
  Acme Corp     $480K    Negotiation
  Globex        $310K    Proposal
  Initech       $290K    Discovery
  Hooli         $245K    Closed Won
  Umbrella      $210K    Proposal
Stage totals: Disc 14, Prop 9, Neg 4, CW 2, CL 3.
Notes: Acme legal still reviewing MSA. Hooli closed Tuesday.
Stalled: 3 deals untouched 14+ days. Will follow up Monday.
Submitted to head of sales 3:48pm.

L2 · ASSISTED

Uses AI as a faster search box. One-off prompts.

BUILD LOG

I export the CSV, paste it into ChatGPT, and ask it to summarize the pipeline by stage and surface anything stalled. I copy the answer into a doc and add the headline coverage number from a Salesforce report. I do not double-check the totals against the source. About 35 minutes start to finish.

OUTPUT

WEEKLY PIPELINE / WEEK OF JUNE 3
Pipeline coverage: 3.4x quota (per ChatGPT summary of export).
Top movement: Acme Corp moved Proposal -> Negotiation.
Stalled deals (no activity 10+ days): 7 listed below.
Narrative: "Pipeline is healthy heading into Q3 close week" (ChatGPT).
Watch item flagged by ChatGPT: "Acme legal exposure may slip the quarter."
[Note from manager: Which 7 stalled deals? Send list.]

L3 · AUGMENTED

Reusable prompt, verified output, humans on edges.

BUILD LOG

I wrote a reusable prompt that takes the Salesforce export, returns the report in our fixed format, and flags anomalies (no activity 14+ days, amounts that swung by more than 30%, stage regressions). I spot-check the top five numbers against the Salesforce dashboard every Friday before I post. Anything flagged anomalous routes to me for a one-line human judgment before it goes out. About 20 minutes including the verification step.

OUTPUT

WEEKLY PIPELINE / WEEK OF JUNE 3
Coverage: 3.4x (verified vs Salesforce dashboard, 3:31pm).
Stage movement: 4 deals advanced, 1 regressed.
Anomalies flagged for human review (3):
  Globex amount changed $260K -> $310K (rep override, accepted).
  Pied Piper $180K stage regressed Discovery -> Disqualified (real signal, followed up with rep).
  Vandelay $415K new in Discovery, 24 hours old, large for stage (asked rep to confirm sizing).
Stalled (14+ days, no activity): 3.  Names + last-touch dates attached.
Action items routed to reps in Slack with deal context. Posted 3:42pm.
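
The three anomaly rules named in the L3 build log (no activity for 14+ days, an amount swing of more than 30%, a stage regression) can be written as plain checks. The sketch below is illustrative only: the `Deal` fields, the `STAGE_ORDER` list, and the flag names are assumptions, not Caliber's or Salesforce's schema, and the candidate in this scenario uses a prompt rather than code.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Assumed stage ordering for illustration; the brief does not define one.
# Any stage not listed (for example "Disqualified") ranks below Discovery.
STAGE_ORDER = ["Discovery", "Proposal", "Negotiation", "Closed Won"]


@dataclass
class Deal:
    name: str
    amount: float
    prev_amount: Optional[float]  # last week's amount; None for a new deal
    stage: str
    prev_stage: Optional[str]     # last week's stage; None for a new deal
    last_activity: date


def stage_rank(stage: str) -> int:
    """Position of a stage in the assumed funnel; -1 if unlisted."""
    return STAGE_ORDER.index(stage) if stage in STAGE_ORDER else -1


def flag_anomalies(deal: Deal, today: date) -> List[str]:
    """Return the anomaly flags that apply to one deal."""
    flags: List[str] = []
    # Rule 1: no activity for 14 or more days.
    if (today - deal.last_activity).days >= 14:
        flags.append("stalled_14d")
    # Rule 2: amount moved by more than 30% in either direction.
    if deal.prev_amount:
        swing = abs(deal.amount - deal.prev_amount) / deal.prev_amount
        if swing > 0.30:
            flags.append("amount_swing_30pct")
    # Rule 3: stage moved backwards in the assumed funnel.
    if deal.prev_stage is not None and stage_rank(deal.stage) < stage_rank(deal.prev_stage):
        flags.append("stage_regression")
    return flags


# Hypothetical deals, not taken from the sample report.
today = date(2024, 6, 7)
risky = Deal("ExampleCo", 150_000, 100_000, "Discovery", "Proposal", date(2024, 5, 20))
steady = Deal("SteadyCo", 100_000, 100_000, "Proposal", "Proposal", date(2024, 6, 5))
print(flag_anomalies(risky, today))
print(flag_anomalies(steady, today))
```

Whether the rules live in a prompt or in code, the L3 signal is the same: the thresholds are written down once, and every flagged deal still routes to a human for a one-line judgment.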

L4 · ARCHITECT

Redesigned the workflow around AI. Reusable systems.

BUILD LOG

I connected Claude to Salesforce and Slack. A skill I built pulls the week's pipeline every Friday at 2pm, generates the report in our format, posts a draft to a private Slack channel, and waits for me to verify the headline metrics. After my green light it auto-posts to the leadership channel and DMs each AE their stalled-deal list with proposed next steps. I review the prompt monthly and verify the top-line numbers every week. Edge cases (regressions, large amount changes, brand-new deals over $250K) escalate to me as a Slack DM. I spend about 10 minutes on this end-to-end.

OUTPUT

WEEKLY PIPELINE / WEEK OF JUNE 3   [auto-generated 14:02 PT, verified 14:11 PT]
Coverage 3.4x   Stage movement +4 / -1   New >$250K this week: 2
Top 5: Acme $480K (Neg), Globex $310K (Prop), Initech $290K (Disc), Hooli $245K (Won), Umbrella $210K (Prop)
Escalations to head of sales (3):
  1. Pied Piper $180K stage regressed Discovery -> Disqualified
  2. Vandelay $415K new in Discovery (24h old, large for stage)
  3. Acme MSA in legal 18 days (above 14d SLA)
Rep DMs sent: 7 AEs, total 14 stalled deals with proposed next steps.
Time spent by AE this week: 11 min.   Time spent prior method (Q1 baseline): 142 min.

LEVERAGE PROFILE · SALES · L4 BENCHMARK

What “great” looks like for this role.

Tool Fluency               4/4
Workflow Architecture      4/4
Judgment & Verification    4/4
Leverage Ceiling           4/4
Responsible Use            3/4

OPERATIONS

BRIEF · OPERATIONS · WEEKLY-OPS-REVIEW

Run our Monday operations review.

You are the ops manager at a 60-person services business. Every Monday at 9am the leadership team reviews the prior week: utilization, project status, customer escalations, and headcount asks. Inputs are a Harvest time-tracking export, the project tracker in Notion, and an inbox of customer escalation emails. Produce the review doc. Tell us how you did it.

L1 · MANUAL

Opens four tabs and rebuilds the deck by hand each week.

BUILD LOG

I pull utilization out of Harvest, copy the project status from Notion line by line, and read the escalation inbox by hand. I rebuild the slides each Monday morning before standup. About three hours, every Monday. I do not use AI on any of it. I prefer to see every number with my own eyes before the leadership team does.

OUTPUT

MONDAY OPS REVIEW / WEEK OF JUNE 3
Utilization: 78% (target 75%).
Projects red: 2 (Acme migration, Globex onboarding).
Escalations open: 5. New this week: 2.
Headcount: 1 backfill pending, 1 contractor extension requested.
Submitted to leadership Slack 9:04am.

L2 · ASSISTED

Pastes raw data into a chatbot, accepts the summary.

BUILD LOG

I dump the Harvest CSV and the Notion project table into ChatGPT and ask for a Monday-review structure. I do the same with the escalation inbox. I review for tone before sending. I do not check the numbers against the source. About 45 minutes.

OUTPUT

MONDAY OPS REVIEW / WEEK OF JUNE 3
Utilization "trending up" per ChatGPT.
Projects "mostly green, with two of concern."
Escalations: ChatGPT summary calls out one customer (UmbrellaCorp) as "high risk."
Sent at 9:01am.
[Note from CEO at 9:03am: which two projects, exactly? Couldn't answer in the meeting.]

L3 · AUGMENTED

Reusable workflow with anomaly flags. Verifies the headline numbers.

BUILD LOG

I built a reusable prompt that takes the three exports, returns the review in our fixed format, and flags any utilization swing of more than 5 points week-over-week, projects newly turned red, escalations not responded to in 48 hours, and contractors above 90% utilization. I verify the utilization headline against Harvest before posting. Everything flagged anomalous routes to me for a one-line judgment call. About 25 minutes.

OUTPUT

MONDAY OPS REVIEW / WEEK OF JUNE 3
Utilization: 78% (target 75%; +3.0 pts vs LW, verified vs Harvest 8:42am).
Projects red: 2.  Acme migration (week 2 in red), Globex onboarding (new this week).
Escalations: 5 open. Flag: UmbrellaCorp ticket open 71 hours, exceeds 48h SLA.
Headcount: 1 backfill in late stage, 1 contractor extension approved Friday.
Anomalies queued for human read (2):
  Acme migration burn rate up 18% vs plan.
  Two contractors at 95%+ utilization. Burnout watch, flagged to their PMs.
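
The four flags in the L3 build log (a utilization swing of more than 5 points week-over-week, projects newly turned red, escalations unanswered for 48 hours, contractors above 90% utilization) reduce to a handful of comparisons. This is a minimal sketch under stated assumptions: the function name, argument shapes, and contractor labels are hypothetical, and the inputs are hand-typed from the sample output rather than read from Harvest or Notion.

```python
from typing import Dict, List, Set


def ops_review_flags(
    util_pct: float,
    util_pct_last_week: float,
    red_projects: Set[str],
    red_projects_last_week: Set[str],
    hours_without_response: Dict[str, float],
    contractor_util_pct: Dict[str, float],
) -> List[str]:
    """Apply the four L3 anomaly rules and return human-readable flags."""
    flags: List[str] = []
    # Rule 1: utilization moved more than 5 points week-over-week.
    swing = util_pct - util_pct_last_week
    if abs(swing) > 5:
        flags.append(f"utilization swing {swing:+.1f} pts")
    # Rule 2: projects that are red now but were not red last week.
    for project in sorted(red_projects - red_projects_last_week):
        flags.append(f"newly red: {project}")
    # Rule 3: escalations with no response for more than 48 hours.
    for customer, hours in sorted(hours_without_response.items()):
        if hours > 48:
            flags.append(f"escalation unanswered {hours:.0f}h: {customer}")
    # Rule 4: contractors above 90% utilization.
    for person, pct in sorted(contractor_util_pct.items()):
        if pct > 90:
            flags.append(f"contractor over 90%: {person}")
    return flags


# Figures from the sample review; contractor labels are placeholders.
flags = ops_review_flags(
    util_pct=78.0,
    util_pct_last_week=75.0,
    red_projects={"Acme migration", "Globex onboarding"},
    red_projects_last_week={"Acme migration"},
    hours_without_response={"UmbrellaCorp": 71},
    contractor_util_pct={"contractor_a": 95.0, "contractor_b": 88.0},
)
for flag in flags:
    print(flag)
```

With these inputs the +3.0 point utilization change stays under the threshold and is not flagged, while the newly red project, the 71-hour escalation, and the 95% contractor are.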

L4 · ARCHITECT

The review prepares itself. Ops manager reviews and ships.

BUILD LOG

I wired Claude to Harvest, Notion, the escalation inbox, and our Slack. Every Sunday at 6pm a skill pulls all three exports, generates the Monday review in our format, posts a draft to a private channel, and waits for me. By Monday at 8am I have spent fifteen minutes verifying the headline metrics and approving the anomalies. The doc auto-posts to the leadership channel at 8:45am. Customer-facing escalation drafts go to the account manager responsible. Anything truly novel (a customer asking to cancel, a project blowing burn by 25%+, a contractor flagging burnout) pages me directly.

OUTPUT

MONDAY OPS REVIEW / WEEK OF JUNE 3   [auto-drafted Sun 18:02, verified Mon 07:48, posted 08:45]
Utilization 78% (+3.0)   Projects red 2   Escalations open 5
Escalation drafts pre-written and queued in AM inboxes: 4
Headcount asks: 1 backfill in late stage. 1 contractor extension approved Friday.
Paged to ops manager Sunday night (1): UmbrellaCorp requested mid-quarter renegotiation.
Time to ship this week: 15 min review, 0 min rebuild.
Time spent prior method (Q1 baseline): 180 min per week.

LEVERAGE PROFILE · OPERATIONS · L4 BENCHMARK

What “great” looks like for this role.

Tool Fluency               4/4
Workflow Architecture      4/4
Judgment & Verification    3/4
Leverage Ceiling           4/4
Responsible Use            4/4

MARKETING

BRIEF · MARKETING · CAMPAIGN-BRIEF

Draft the launch campaign brief.

You are a marketing manager. Sales is launching a new product line in 4 weeks and needs a launch campaign brief: positioning, key messages, channel mix, asset list, and a 4-week timeline. Inputs are the product PRD (32 pages), the last three launch retros, and an interview transcript with the head of product. Produce the brief. Tell us how you did it.

L1 · MANUAL

Reads everything by hand, writes the brief from scratch.

BUILD LOG

I read the PRD, the three retros, and the interview transcript end to end. I drafted the brief in our template by hand. Two and a half days from open to lock. I do not use AI on positioning work because I want my own voice carrying it.

OUTPUT

LAUNCH BRIEF / Product X
Positioning: Fastest way for sales teams to brief themselves before calls.
Audience: AEs and SDRs at companies 50-500.
Key messages: speed, accuracy, prep time saved.
Channels: email, paid LinkedIn, partner amplification.
Timeline: T-4 brief lock, T-2 assets, T-0 launch, T+2 recap.
Asset list: 6 items (subject lines, ad copy, landing page hero, webinar abstract, partner one-pager, sales enablement slide).
Status: locked. Distributed to product, sales, CS.

L2 · ASSISTED

Asks AI to write the first draft. Edits for voice.

BUILD LOG

I uploaded the PRD and the interview transcript to ChatGPT and asked for a launch brief in our template. I edited the result for voice and added the channel mix from memory. About four hours start to finish. I did not check the brief against the retros for past lessons.

OUTPUT

LAUNCH BRIEF / Product X
Positioning (AI draft, lightly edited): The brief-yourself layer for revenue teams.
Audience: AEs and SDRs (per PRD).
Key messages: speed, accuracy, "the brief is already done."
Channels: email + paid LinkedIn (our default mix).
Timeline: standard 4-week.
Missed in this draft: last launch retro called out partner amplification as 0.6x cost ratio. Not addressed here. Caught in review the day before lock; cost us two days of rework.

L3 · AUGMENTED

Reusable launch-brief prompt, verified against retros and PRD.

BUILD LOG

I built a reusable launch-brief prompt: it ingests the PRD, the last three retros, and an interview transcript, returns our template populated, and explicitly cross-references the retros for what worked and what did not. I verify positioning lines against the head of product's interview before locking. Anything the model is unsure about (channel ROI assumptions, partner risk, pricing emphasis) gets flagged for me to decide. About four hours, most of it on the decisions.

OUTPUT

LAUNCH BRIEF / Product X
Positioning (verified vs head of product interview, transcript line 47).
Audience: AEs + sales managers (managers added from PRD §4.2).
Key messages: speed, accuracy, brief reliability.
Channel mix:
  Email (high-prior channel, kept).
  Paid LinkedIn (kept, retro shows 1.4x meeting rate last launch).
  Partner amplification: dropped (retro 2024-Q3: 0.6x cost ratio).
  Replaced with: webinar series with 3 design partners.
Decisions queued for marketing manager (2):
  Pricing emphasis Y/N for week 1 messaging.
  Analyst outreach budget: $20K Forrester or $0 grassroots.

L4 · ARCHITECT

The brief produces itself. Marketing manager makes the calls.

BUILD LOG

I built a launch-brief system. It pulls every PRD, retro, and interview transcript from our shared drive on demand, returns a populated brief in our template, and surfaces three to five specific decisions for me as the marketing manager. The same system generates the full asset spec list (subject lines, ad copy variants, webinar landing copy, sales enablement deck outline) and queues them for review in our content tool. Cross-team distribution drafts go to product, sales, and CS leads. The brief is ready 90 minutes after I trigger it. My time goes to the decisions, not the typing.

OUTPUT

LAUNCH BRIEF / Product X   [auto-drafted T-28d 09:14, verified by MM 10:31, distributed 10:42]
Positioning, key messages, channel mix: locked.
Asset spec list: 14 items, queued in content tool with owners.
Cross-team drafts ready: product readiness checklist sent, sales enablement deck outlined, CS macros drafted.
Decisions surfaced for marketing manager (3):
  Pricing emphasis Y/N (recommendation: N for week 1; learning from retro 2024-Q4).
  Analyst outreach: $20K Forrester or $0 grassroots (recommendation: grassroots, retro-supported).
  Risk: partner amplification (recommendation: drop, replace with 3-partner webinar; retro 2024-Q3 evidence).
Time-to-brief this launch: 90 min.   Median across last three launches: 18 hours.

LEVERAGE PROFILE · MARKETING · L4 BENCHMARK

What “great” looks like for this role.

Tool Fluency               4/4
Workflow Architecture      4/4
Judgment & Verification    3/4
Leverage Ceiling           4/4
Responsible Use            4/4

WHAT WE MEASURE

Five dimensions. One picture of how this person works.

01 · Tool Fluency
Can they drive the tools: prompting, context, connectors, automation?

02 · Workflow Architecture
Do they redesign the job around AI, or bolt it on as a faster search box?

03 · Judgment & Verification
Do they know when AI is wrong, how to check it, and where humans stay in the loop?

04 · Leverage Ceiling
How far they push: one-off prompts versus reusable systems and agents.

05 · Responsible Use
How they handle data, confidentiality, and disclosure.

WHO IT IS FOR

Built for the people doing the hiring, not the testing industry.

Hiring teams
Building AI-first benches who need a real signal, not a quiz.
Operating partners
Assessing the talent inside a portfolio company.
Founders
Who need every early hire to multiply themselves with AI.

Stop guessing who is actually AI-first.

Pilot cohorts open monthly by role. Sales, Operations, and Marketing first. We email you when your role unlocks, send a five-question intake, and book a 20-minute walkthrough.

We will not share your email. One message when your role opens. Unsubscribe in one click.

HIRING NOW · MANUAL SERVICE

Need a Leverage Profile this week, not next quarter?

Inflection Group runs Caliber as a manual service today. We author the scenario for your role, score the candidates you send, and deliver each Leverage Profile within five business days. Three pilot orgs per quarter run free while we onboard the self-serve product.

Email Shane to start a pilot
