Methodology, reviewed 2026-05-16

How Talvio scores AI augmentation potential for training prioritization

Talvio's score is a work-activity-exposure signal. It helps organizations decide where AI training and workflow discovery should start by reading the O*NET activities that make up each occupation, then weighting those activities by their importance and level in the role.

What The Score Claims

The Talvio Augmentation Potential score is designed for training prioritization, workflow discovery, and budget planning. It is not an individual performance measure and it is not a staffing-impact forecast.

The methodology keeps a high-resolution internal value for validation. Product screens display TAP on a 0-10 scale; the displayed value is a rescaling of that internal value, not a separate score.

How TAP Is Computed

TAP is deterministic: the same reviewed inputs always produce the same score, and the scoring run is pure arithmetic with no model inference at scoring time. The score combines what an occupation does, taken from O*NET work activities, with reviewed AI capability maturity values drawn from public benchmarks.

1. Occupation activity weights. For each occupation o and Generalized Work Activity (GWA) j, Talvio computes a raw weight as O*NET Importance times Level, then normalizes within the occupation so that all activity weights sum to 1.
$$w_{o,j} = \frac{IM_{o,j} \times LV_{o,j}}{\sum_{j'} (IM_{o,j'} \times LV_{o,j'})}$$
2. Activity-to-capability matrix. The reviewed 41 x 12 matrix uses 0-1 values as addressable-work fractions for each GWA-capability pair (j, k). The current reviewed scale uses 0.0, 0.2, 0.5, 0.8, and 1.0.
$$0 \le M_{j,k} \le 1, \qquad M_{j,k} \in \{0.0, 0.2, 0.5, 0.8, 1.0\}$$
3. Capability maturity. Each reviewed capability score C_k is stored on a 0-10 scale and used in scoring as a fraction.
$$m_k = \frac{C_k}{10}$$
4. Activity addressability. Capabilities combine with a probabilistic-OR rule, which preserves differences between activities that would be flattened by a capped additive sum.
$$A_j = 1 - \prod_{k=1}^{12} (1 - M_{j,k} \cdot m_k)$$
5. Physical/embodied ceiling. The physical-ceiling factor is not an added AI capability. It multiplicatively discounts activities requiring embodied, in-person, or real-time execution toward the 0.15 ceiling factor: when Physical_j = 1 the activity retains 15% of its addressability, and when Physical_j = 0 it is unchanged.
$$A^{\mathrm{final}}_j = A_j \cdot [1 - \mathrm{Physical}_j \cdot (1 - 0.15)]$$
6. Occupation rollup. The final TAP score is the weighted mean of final activity addressability, displayed on a 0-10 scale.
$$\mathrm{TAP}_o = 10 \cdot \sum_j w_{o,j} A^{\mathrm{final}}_j$$
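The six steps can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not Talvio's production code: the function names are hypothetical, the toy occupation has 3 activities and 2 capabilities rather than the reviewed 41 x 12 matrix, and every input value is invented for the example.

```python
from math import prod

PHYSICAL_CEILING = 0.15  # ceiling factor stated in step 5


def activity_weights(importance, level):
    """Step 1: Importance x Level, normalized to sum to 1 within one occupation."""
    raw = [im * lv for im, lv in zip(importance, level)]
    total = sum(raw)
    return [r / total for r in raw]


def addressability(matrix_row, maturity_0_10):
    """Steps 3-4: probabilistic OR across capabilities for one activity."""
    return 1.0 - prod(
        1.0 - m_jk * (c_k / 10.0)  # step 3: maturity used as a fraction
        for m_jk, c_k in zip(matrix_row, maturity_0_10)
    )


def apply_physical_ceiling(a_j, physical_j):
    """Step 5: multiplicative discount; physical_j = 1 leaves 15% of a_j."""
    return a_j * (1.0 - physical_j * (1.0 - PHYSICAL_CEILING))


def tap_score(importance, level, matrix, maturity_0_10, physical):
    """Step 6: weighted mean of final addressability on a 0-10 scale."""
    weights = activity_weights(importance, level)
    finals = [
        apply_physical_ceiling(addressability(row, maturity_0_10), p)
        for row, p in zip(matrix, physical)
    ]
    return 10.0 * sum(w * a for w, a in zip(weights, finals))


# Hypothetical toy inputs (not O*NET data, not reviewed Talvio values).
importance = [4.0, 3.0, 2.0]
level = [5.0, 4.0, 3.0]
matrix = [[0.8, 0.5], [0.2, 0.0], [0.5, 0.5]]  # rows: activities, cols: capabilities
maturity = [9.0, 7.0]                           # 0-10 capability scores
physical = [0.0, 1.0, 0.5]                      # assumed 0-1 physical flags

print(round(tap_score(importance, level, matrix, maturity, physical), 3))
```

Because every step is plain arithmetic on fixed inputs, rerunning the sketch with the same values always gives the same score, which is the determinism property described above.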

TAP is a within-occupation share of addressable weighted work, not a rank and not an absolute capability claim. Occupations with the same TAP have similar proportions of addressable weighted work, not necessarily similar job content. In the current release, product score explanations are work-activity-based, so top O*NET work-activity drivers are the honest explanation for a role's score.
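One way to surface work-activity drivers is to split the score into per-activity contribution points, 10 x weight x final addressability, which sum back to TAP. The sketch below assumes that decomposition; the function names, activity labels, and numbers are hypothetical placeholders rather than values from the production master.

```python
def contribution_points(weights, finals):
    """Per-activity share of the 0-10 score: 10 * w_j * A_final_j."""
    return [10.0 * w * a for w, a in zip(weights, finals)]


def top_drivers(names, weights, finals, n=3):
    """Return the n activities contributing the most points, largest first."""
    points = contribution_points(weights, finals)
    ranked = sorted(zip(names, points), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical occupation with three activities (placeholder labels and values).
names = ["Activity A", "Activity B", "Activity C"]
weights = [0.5, 0.3, 0.2]   # normalized activity weights, sum to 1
finals = [0.8, 0.1, 0.6]    # final addressability after the physical ceiling

for name, points in top_drivers(names, weights, finals, n=2):
    print(f"{name}: {points:.2f} points")

# The contributions add up to the occupation's TAP score.
print(f"TAP: {sum(contribution_points(weights, finals)):.2f}")
```

Two occupations can reach the same total through very different driver lists, which is why equal TAP values do not imply similar job content.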

[Figure: TAP calculation example for Registered Nurses, showing work-activity weights, addressability, physical ceiling factors, and contribution points.]
Caption: Real calculation example from O*NET 30.2: Registered Nurses. Source data is generated with the current reviewed matrix, capability scores, physical ceiling, and importance-times-level weighting strategy.

[Figure: Sorted distribution curve of TAP scores across all direct-scored O*NET occupations, with six labeled anchor occupations.]
Caption: Sorted TAP distribution for the 894 direct-scored occupations. Anchor labels are real O*NET occupations from the production master.

Capability Maturity As Training Context

The reviewed capability scores help shape training design and capability views. They answer "what AI skills are relevant here?" while the priority ranking remains work-activity-exposure-based in the current validated release.

| Capability | Reviewed maturity | Primary benchmark |
| --- | --- | --- |
| Written content generation and editing | 9.2/10 | HELM Capabilities |
| Information synthesis and research | 7.8/10 | Artificial Analysis Intelligence Index |
| Structured data analysis and quantitative reasoning | 8.1/10 | Artificial Analysis Intelligence Benchmarking |
| Coding and software engineering | 8.4/10 | SWE-bench |
| Conversational support and customer interaction | 8.4/10 | Arena Text Leaderboard |
| Translation and cross-language work | 8.5/10 | Artificial Analysis Multilingual Index |
| Speech and audio processing | 7.7/10 | Artificial Analysis Speech to Text Leaderboard |
| Image and document understanding | 8.6/10 | MMMU-Pro |
| Image, video, and design generation | 8.2/10 | Artificial Analysis Image Model Leaderboard |
| Planning, scheduling, and structured decision support | 7.6/10 | GDPval |
| Tool use and autonomous agents | 7.1/10 | METR Time Horizons |
| Domain-specialist reasoning | 7.7/10 | GPQA |

External Validation

The score is work-activity-exposure-based: the training-prioritization ranking is validated against Felten AIOE (Spearman rho=0.920, n=682 SOC6, O*NET-structure-derived) and independently corroborated against Anthropic Economic Index observed AI task-use (Spearman rho=0.424, n=480 SOC6 occupations, release_2026_03_24). The AEI correlation is moderate and computed on a 54% SOC6 subset; the claim is that TAP corresponds to observed use about as well as the best structural exposure measure (Felten AIOE vs. AEI rho=0.414 on the same n=480 common set), not that it is highly predictive of AI use.

rho=0.920: Felten AIOE alignment. Primary validation referent, n=682 SOC6 occupations. Felten AIOE is O*NET-structure-derived, so it is load-bearing for structural exposure validity.
rho=0.424: AEI observed-use corroboration. Anthropic Economic Index observed AI task-use, n=480 SOC6 occupations, release_2026_03_24. Moderate, 54% SOC6 subset.
rho=0.414: Structural-measure comparator. Felten AIOE vs. AEI on the same n=480 common set. This supports the scoped claim that TAP corresponds to observed use about as well as the structural exposure benchmark.

AEI is independent of O*NET structure: it is observed AI task-use rather than an O*NET-structure-derived exposure score, and it corroborates TAP about as strongly as it corroborates AIOE (rho=0.424 vs rho=0.414, n=480 SOC6), addressing the structural-circularity limitation.
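The comparisons above are Spearman rank correlations computed on the occupations two measures have in common. The sketch below shows that calculation in plain Python under stated assumptions: the helper names are hypothetical, ties receive average ranks, and the SOC codes and scores are invented placeholders, so it does not reproduce any of the reported rho values.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_on_common(scores_a, scores_b):
    """Spearman rho over the occupation codes present in both mappings."""
    common = sorted(set(scores_a) & set(scores_b))
    a = [scores_a[code] for code in common]
    b = [scores_b[code] for code in common]
    return pearson(average_ranks(a), average_ranks(b)), len(common)


# Hypothetical placeholder codes and scores (not real SOC6 data).
tap = {"00-0001": 5.1, "00-0002": 3.2, "00-0003": 7.4, "00-0004": 1.0}
other = {"00-0002": 0.2, "00-0003": 0.9, "00-0004": 0.1, "00-0005": 0.5}

rho, n = spearman_on_common(tap, other)
print(f"rho={rho:.3f}, n={n}")
```

Restricting both measures to the same common set before ranking is what makes a comparison such as TAP-vs-AEI against AIOE-vs-AEI like for like.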

TAP's near-zero correlation with GDPval is explained by construct differences. TAP is strongly correlated with the structural exposure measure Felten AIOE (rho=0.920, n=682 SOC6, O*NET-structure-derived), moderately correlated with Anthropic Economic Index observed AI task-use (rho=0.424, n=480 SOC6 occupations, release_2026_03_24), and near-orthogonal to peak-deliverable-quality benchmarks such as GDPval. For Talvio's purpose, that orthogonality is correct behavior: GDPval measures peak deliverable quality on selected tasks, while TAP measures exposure of an occupation's work-activity mix.

The validated claim is deliberately scoped: Talvio corresponds to observed AI use about as well as Felten AIOE on the matched AEI set (Talvio-vs-AEI rho=0.424; Felten-vs-AEI rho=0.414). It does not claim high prediction of observed use for every occupation or every organization.

Read the TAP external validation paper

For readers who want benchmark comparisons, validation narrative, limitations, references, and audit detail.

Coverage Limitation

The production model carries 122 explicit exclusions across residual, split-code, and out-of-scope military rows. They remain visible as excluded rows with provenance rather than being silently scored.

Reason categories are A residual (residual_soc_no_descriptor_data), B split-code (split_code_no_work_activities), and C military (out_of_scope_military). Donor imputation was analyzed and is not included in the production method.

894: Direct-scored occupations. Rows with O*NET 30.2 Work Activities coverage.
77: A residual. Excluded because residual SOC rows do not have occupation-specific O*NET Work Activities profiles.
26: B split-code. Excluded because no O*NET 30.2 Work Activities row is available for this detailed split code.
19: C military. Out of scope by design for Talvio's civilian occupation scope; not a coverage failure.
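Keeping excluded rows visible with a reason code, rather than silently scoring or dropping them, can be sketched as a small ledger. The reason codes and counts below come from this section; the `OccupationRow` class, the `summarize` helper, and the sample rows with placeholder SOC codes are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Reason codes as listed in this section.
EXCLUSION_REASONS = {
    "A": "residual_soc_no_descriptor_data",
    "B": "split_code_no_work_activities",
    "C": "out_of_scope_military",
}

# Published counts from this section.
PUBLISHED_EXCLUSIONS = {"A": 77, "B": 26, "C": 19}
PUBLISHED_SCORED = 894


@dataclass(frozen=True)
class OccupationRow:
    soc_code: str
    tap: Optional[float]       # None when the row is excluded
    exclusion: Optional[str]   # "A", "B", "C", or None when scored


def summarize(rows):
    """Count scored and excluded rows; each row must be exactly one of the two."""
    counts = Counter()
    for row in rows:
        scored = row.tap is not None
        excluded = row.exclusion is not None
        if scored == excluded:
            raise ValueError(f"{row.soc_code}: needs a score or an exclusion, not both")
        if excluded and row.exclusion not in EXCLUSION_REASONS:
            raise ValueError(f"{row.soc_code}: unknown reason {row.exclusion!r}")
        counts[row.exclusion if excluded else "scored"] += 1
    return counts


# The three categories reconcile with the 122 explicit exclusions.
assert sum(PUBLISHED_EXCLUSIONS.values()) == 122

# Hypothetical rows with placeholder codes (not real SOC entries).
rows = [
    OccupationRow("00-0001.00", 6.3, None),
    OccupationRow("00-0099.00", None, "A"),
    OccupationRow("00-0002.00", None, "C"),
]
print(summarize(rows))
print(PUBLISHED_SCORED + sum(PUBLISHED_EXCLUSIONS.values()), "rows in total")
```

Rejecting rows that carry both a score and an exclusion, or neither, is one way to guarantee that no occupation is scored without coverage or dropped without provenance.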