PartnerScope · Methodology

How PartnerScope assesses your vendors

This document is the methodology behind every PartnerScope assessment — the 13 dimensions, 4 risk bands, automated tests, documentary review and AI red-teaming. Public, citable, and aligned with EU AI Act, GDPR, DORA and NIS2. Share it with your procurement or security team.

Version 1.0 · Last updated 2026-04-22

1. Why third-party AI risk, and why now

Under EU law, buying an AI product from another company is no longer a straight commercial decision. It's a regulated one.

  • EU AI Act (Regulation (EU) 2024/1689) classifies AI systems by risk tier. Deploying a "high-risk" AI system (Annex III — e.g. biometrics, education scoring, critical-infrastructure control, employment filtering) triggers pre-market duties on the provider and due-diligence duties on the deployer.
  • GDPR (Regulation (EU) 2016/679) Art. 28 makes you responsible for your processors' security posture — including the AI vendor processing your customers' personal data.
  • DORA (Regulation (EU) 2022/2554) Art. 28–30 requires financial-services firms to maintain an ICT third-party register and enforce contractual controls.
  • NIS2 (Directive (EU) 2022/2555) extends supply-chain security obligations to critical and important entities across ~18 sectors.

"We trust the vendor" is no longer a defensible procurement answer. PartnerScope exists to replace that with a repeatable, evidenced assessment you can hand to your DPO, your board, and — if asked — your regulator.

This document is not a legal opinion. It's how we assess. Legal interpretation stays with your counsel.

2. The framework at a glance

Every PartnerScope assessment produces a single composite score from 0 to 100, across 13 dimensions, grouped into 3 pillars, mapped to 4 risk bands.

| Pillar | Share of composite | What it covers |
|---|---|---|
| A — Behavioral | 25% | How the vendor behaves — accountability, communication, dispute handling, consistency, ethics. |
| B — Financial & Structural | 30% | Whether the vendor can survive — finances, contracts, operations, governance, exit. |
| C — AI & Compliance | 45% | The reason this tool exists — data provenance, model transparency, regulatory fit. |

The weighting is intentional: AI & Compliance carries 45% because that's where EU law currently has teeth, and where a single undisclosed fact can turn an otherwise healthy vendor into an unshippable risk. A vendor who wins on Behavioral and Financial but scores ≤ 40 on any AI & Compliance dimension is capped at 65 — they cannot rise above the MEDIUM band, however strong the other dimensions are.

3. Four risk bands

| Band | Score | Meaning | Recommended buyer action |
|---|---|---|---|
| HIGH | 0–40 | Do not onboard without remediation. | Block contract; request specific evidence before proceeding. |
| MEDIUM | 41–65 | Onboard with conditions & monitoring. | Conditional approval; list remediation items before signing. |
| LOW | 66–85 | Standard onboarding, quarterly review. | Proceed; add to quarterly review list. |
| MINIMAL | 86–100 | Standard contract, annual review. | Proceed; reduce review frequency to annual. |

Edge cases fall to the lower band (a composite of exactly 65 is MEDIUM, not LOW — conservative by design).
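The band boundaries can be expressed as a small lookup. `risk_band` is an illustrative helper, not part of any published PartnerScope tooling:

```python
def risk_band(composite: int) -> str:
    """Map a 0-100 composite score to a risk band.

    Upper boundaries are inclusive of the lower band: exactly 65 is
    MEDIUM, exactly 85 is LOW -- conservative by design.
    """
    if not 0 <= composite <= 100:
        raise ValueError("composite must be in 0..100")
    if composite <= 40:
        return "HIGH"
    if composite <= 65:
        return "MEDIUM"
    if composite <= 85:
        return "LOW"
    return "MINIMAL"
```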

4. The 13 dimensions

Every dimension has a code (D01 … D13), a pillar, a one-line definition, a regulatory anchor list (the Articles / Annexes / framework clauses we explicitly test against), a questionnaire, a subset of automated tests, and (from Pro upward) an evidence list the vendor is asked to produce.

Pillar A — Behavioral (25%)

| Code | Dimension | What it measures | Regulatory anchor |
|---|---|---|---|
| D01 | Accountability & Responsibility | Ownership of failures, reliability on commitments. | NIST AI RMF GOVERN 2 |
| D02 | Communication & Transparency | Responsiveness, documentation discipline, proactive disclosure. | NIST AI RMF GOVERN 4 |
| D03 | Boundaries & Conflict Resolution | Ability to handle disagreement without litigation; scope-creep discipline. | — |
| D04 | Consistency & Reliability | Retention of long-term clients, SLA track record. | — |
| D05 | Integrity & Ethics | Ethics code, external audits, absence of regulator findings. | EU AI Act Art. 15 |

Pillar B — Financial & Structural (30%)

| Code | Dimension | What it measures | Regulatory anchor |
|---|---|---|---|
| D06 | Financial Behavior | Audited financials, runway, debt posture, funding stability. | DORA Art. 28 |
| D07 | Formal Agreements | Contract clarity, IP ownership, SLA enforceability, DPA posture. | DORA Art. 30, GDPR Art. 28 |
| D08 | Operational Delivery | Incident response, BCP / DR, capacity. | DORA Art. 11, NIS2 |
| D09 | Governance & Decision Rights | Board structure, UBO clarity, key-man risk. | DORA Art. 30 |
| D10 | Exit & Continuity | Data portability, offboarding, source-code escrow (if applicable). | DORA Art. 30(3) |

Pillar C — AI & Compliance (45%) — core differentiator

| Code | Dimension | What it measures | Regulatory anchor |
|---|---|---|---|
| D11 | Data Provenance | Training-data lineage, consent basis, PII handling, retention. | EU AI Act Art. 10, GDPR Art. 6/9 |
| D12 | Model Transparency | Model card, explainability, version control, change management. | EU AI Act Annex IV, Art. 13 |
| D13 | Regulatory Compliance | Annex III classification, DPA signed, sub-processor list, Art. 73 incidents, ISO 27001 / SOC 2, NIS2 / DORA mapping. | EU AI Act Annex III, Art. 73, GDPR Art. 28, DORA, NIS2 |

5. How we score

5.1 Data sources per dimension

Each dimension's 0–100 score is composed from three inputs:

  1. Questionnaire score — structured answers from the buyer (and, for Pro / Enterprise, the vendor). Likert 1–5, multi-select, and document-upload question types. ~78 questions total, gated per tier.
  2. Automated test score — pass rate of the subset of tests in §6 that map to that dimension.
  3. Evidence bonus (Pro / Enterprise only) — analyst verification of the documents the vendor has produced.

The mix is deliberately skewed toward the questionnaire signal, because the questionnaire reflects buyer-observed reality. Automated tests are an objectivity anchor. Evidence review is a qualitative multiplier on top. Exact coefficients are not published — what matters to a buyer is that every number in the report is traceable: click any dimension score in the PDF and you see which questions and tests produced it.

5.2 Composite

The composite is a weighted sum of the 13 dimension scores, with the pillar shares in §2. Rounded to an integer 0–100.

Hard cap rule. If any AI & Compliance dimension (D11 / D12 / D13) scores ≤ 40, the composite is capped at 65 regardless of the other 10 dimensions. One HIGH in AI-compliance ⇒ no MINIMAL band. This rule is non-negotiable and printed on every report.
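The weighted sum plus the hard cap can be sketched as below. The pillar shares (25 / 30 / 45) come from §2; the equal split of each pillar's share across its dimensions is an illustrative assumption — the actual coefficients are not published — and `composite` is a hypothetical helper:

```python
# Pillar shares are from the methodology; equal per-dimension splits
# within a pillar are an assumption for illustration only.
PILLARS = {
    "A": (["D01", "D02", "D03", "D04", "D05"], 0.25),
    "B": (["D06", "D07", "D08", "D09", "D10"], 0.30),
    "C": (["D11", "D12", "D13"], 0.45),
}
AI_COMPLIANCE = {"D11", "D12", "D13"}

def composite(scores: dict[str, float]) -> int:
    """Weighted sum of 13 dimension scores, rounded to an integer,
    with the hard cap: any AI & Compliance dimension <= 40 caps
    the composite at 65."""
    total = 0.0
    for dims, share in PILLARS.values():
        weight = share / len(dims)  # assumed equal split
        total += sum(scores[d] * weight for d in dims)
    result = round(total)
    if any(scores[d] <= 40 for d in AI_COMPLIANCE):
        result = min(result, 65)
    return result
```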

5.3 Verdicts

Above the numeric score, every report carries a four-state verdict:

| Verdict | Condition | Reader action |
|---|---|---|
| DECLINE | Any hard red flag is raised (see §7) — overrides the composite score. | Do not onboard. Re-test only after the flagged issue is demonstrably remediated. |
| HOLD | Composite ≤ 40 and no hard red flag. | Do not sign until top-priority items close, or upgrade to Pro for analyst review. |
| PROCEED WITH CONDITIONS | 41 ≤ composite ≤ 65. | Conditional approval — buyer-side conditions from the Red-flags / Data-gaps sections must be satisfied or contractually mitigated. |
| PROCEED | Composite ≥ 66 and no hard red flag. | Acceptable controls. Residual risks can be addressed in contract language. |

The verdict is the one sentence your GC will quote.
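The four-state decision logic reduces to a short function — a sketch of the rules in the table above, with the red-flag override checked first (`verdict` is an illustrative name, not a published API):

```python
def verdict(composite: int, hard_red_flag: bool) -> str:
    """Derive the report verdict. A hard red flag forces DECLINE
    regardless of the numeric composite."""
    if hard_red_flag:
        return "DECLINE"
    if composite <= 40:
        return "HOLD"
    if composite <= 65:
        return "PROCEED WITH CONDITIONS"
    return "PROCEED"
```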

5.4 Skipped questions & insufficient data

Up to 10% of questions in a dimension may be skipped without penalty. Beyond that, the dimension is marked INSUFFICIENT_DATA, excluded from the composite, and its weight is redistributed proportionally. The report calls this out explicitly — we never silently lower a score because of a blank answer.
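Proportional redistribution means the remaining dimensions keep their relative weights and are rescaled to sum to 1. A minimal sketch (`redistribute` is a hypothetical helper):

```python
def redistribute(weights: dict[str, float],
                 insufficient: set[str]) -> dict[str, float]:
    """Exclude INSUFFICIENT_DATA dimensions and rescale the
    remaining weights proportionally so they still sum to 1."""
    kept = {d: w for d, w in weights.items() if d not in insufficient}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no scorable dimensions left")
    return {d: w / total for d, w in kept.items()}
```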

6. Automated tests

PartnerScope runs a tiered suite of technical and open-source-intelligence tests against the vendor's domain and legal entity.

| Tier | Automated tests | Cadence |
|---|---|---|
| Free snapshot | 2 | On demand |
| Starter (€99) | 7 | Once per run |
| Pro (€299) | 18 | Once per run, re-runnable on request |
| Enterprise (€4,900/yr) | 25 + continuous | Once per run + 11 weekly signals |

6.1 What the Starter tier runs

| # | Test | What it checks | Primary dim. |
|---|---|---|---|
| 1 | DNS records + DNSSEC | A / AAAA / MX / NS / SOA / CAA records; DNSSEC validation. | D08 |
| 2 | TLS handshake & cipher quality | TLS 1.2+, cipher suites, cert chain, HSTS, OCSP stapling. | D08 |
| 3 | Security headers | HSTS, CSP, XFO, XCTO, Referrer-Policy, Permissions-Policy, Cross-Origin-*. | D08 |
| 4 | Breach history (HIBP) | Past data breaches associated with the corporate domain. | D08 |
| 5 | Mail deliverability | SPF, DKIM, DMARC posture. | D08 |
| 6 | Certificate Transparency | Unexpected certificates issued for the vendor's domain. | D08 |
| 7 | Sanctions screening | OFAC SDN, EU consolidated list, UK OFSI, UN Security Council. Fuzzy-matched against legal entity + directors + UBOs. | D05 / D07 |
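As an example of how a test collapses to the pass / warn / fail semantics of §6.4, here is a sketch of a security-headers grader. The header list follows test 3; the thresholds (all present = pass, at least half = warn) are illustrative assumptions, not the published grading rules:

```python
REQUIRED = [
    "Strict-Transport-Security", "Content-Security-Policy",
    "X-Frame-Options", "X-Content-Type-Options",
    "Referrer-Policy", "Permissions-Policy",
]

def grade_headers(headers: dict[str, str]) -> str:
    """Grade an HTTP response-header set as pass / warn / fail.
    Thresholds here are assumed for illustration."""
    norm = {k.title() for k in headers}  # case-insensitive match
    present = sum(1 for h in REQUIRED if h in norm)
    if present == len(REQUIRED):
        return "pass"
    if present >= len(REQUIRED) / 2:
        return "warn"
    return "fail"
```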

6.2 What Pro adds (tests 8–18)

PEP screening · adverse-media scan (30-day window) · commercial register lookup (DE / AT / CH / AZ; OpenCorporates fallback) · UBO extraction & cross-check · credit score (Bisnode / CreditSafe) · insolvency register · regulatory licence check (BaFin / FMA / FINMA) · SBOM parse + CVE correlation · Model Card completeness (vs. NIST AI RMF + Google MCT) · EU AI Act Annex III classification (LLM-assisted) · GDPR Art. 28 DPA clause extractor.
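Screening and UBO cross-checks hinge on fuzzy name matching. A minimal sketch with the standard library's `difflib`; production screening additionally handles transliteration, aliases, and word order, and the 0.85 threshold is an illustrative assumption:

```python
from difflib import SequenceMatcher

def name_match(candidate: str, listed: str,
               threshold: float = 0.85) -> bool:
    """Crude fuzzy match between a screened name and a list entry.
    Threshold and normalisation are assumptions for illustration."""
    a = candidate.casefold().strip()
    b = listed.casefold().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold
```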

6.3 What Enterprise adds (tests 19–25 + continuous)

Dark-web exposure (Intel X, HIBP Enterprise) · ASN + egress-IP geolocation · sub-processor enumeration + concentration · model-drift signal · DORA ICT-register cross-check · supply-chain depth (4th-party) · continuous monitoring (11 signals polled weekly).

Enterprise continuous monitoring watches: new breach, new sanctions hit, adverse media, TLS cert change, security-header regression, Whois / registrant change, model-version bump, new SBOM CVE ≥ CVSS 7.0, credit-score delta ≥ 10 pts, UBO change, DNS / MX change. Any signal re-triggers scoring.
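The re-trigger rule can be sketched as an event filter. The signal identifiers and event shape below are assumptions for illustration; only the two thresholds (CVSS ≥ 7.0, credit delta ≥ 10 pts) come from the list above:

```python
# Hypothetical signal names; the thresholds mirror the watch list.
TRIGGERS = {
    "new_breach", "new_sanctions_hit", "adverse_media",
    "tls_cert_change", "header_regression", "whois_change",
    "model_version_bump", "sbom_cve_high", "credit_delta",
    "ubo_change", "dns_mx_change",
}

def should_rescore(event: dict) -> bool:
    """Return True when a monitoring event should re-trigger scoring."""
    kind = event.get("kind")
    if kind not in TRIGGERS:
        return False
    if kind == "sbom_cve_high":
        return event.get("cvss", 0) >= 7.0
    if kind == "credit_delta":
        return abs(event.get("delta", 0)) >= 10
    return True
```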

6.4 Failure semantics

pass (80–100 contribution) · warn (50–70) · fail (0–40) · error (transient infrastructure failure; retry, do not score).

7. Hard red flags (the override rule)

Some findings are severe enough that no composite score survives them. If any of the following fires, the verdict is forced to DECLINE regardless of the numeric score:

  1. Sanctions / PEP high-confidence match. OFAC / EU / UK / UN screening returns a high-confidence name match against the legal entity, a director, or a UBO.
  2. UBO mismatch. The beneficial-owner names in official registers do not match what the vendor disclosed.
  3. Undisclosed high-risk classification. The vendor qualifies as "high-risk" under EU AI Act Annex III but has not registered, not produced Annex IV technical documentation, or fails Art. 16 provider obligations.
  4. Active breach exposure. HIBP records a breach involving PII + credentials in the last 12 months, with no public remediation statement.
  5. Missing mandatory documentation. No DPA when the engagement involves EU-personal-data processing; no ISO 27001 / SOC 2 when financial-services expectations apply.
  6. Dimension score ≤ 40 in D11 Data Provenance combined with undisclosed training-data lineage.

Separately, a dimension-level score of ≤ 40 (without one of the above) is a soft red flag — called out as a "blocker" and drives the verdict down, but doesn't by itself force DECLINE. Every flag carries a severity tag: blocker, high, or medium. The remediation section sorts them into before signature, within 30 days of onboarding, and quarterly monitoring.

8. AI red-team suite

From the Pro tier upward, we actively probe the vendor's AI system for known failure modes. Red-team is not "does the product work" — it's "can an attacker make the product do something it shouldn't."

8.1 Tier matrix

| Tier | Payloads | Categories covered | Delivery |
|---|---|---|---|
| Starter | 0 | — | Not included in Starter; upgrade to Pro. |
| Pro | 5 | Prompt injection (3) · Jailbreak (1) · PII leakage (1) | Automated, with a short report section. |
| Enterprise | 25+, continuous | All 6 categories (adaptive, weekly retest) | Dedicated analyst + weekly retest. |

8.2 Categories

  1. Prompt injection — direct, indirect (document-based), tool-chain contamination, cross-prompt poisoning, system-prompt extraction, role confusion, encoding bypass (base64, homoglyph, zero-width).
  2. Jailbreak — DAN-style persona, hypothetical framing, multilingual pivot, gradient / crescendo escalation, many-shot in-context bypass.
  3. PII leakage — divergence attacks, training-data canaries (synthetic PII only), session cross-contamination, RAG-store poisoning.
  4. Bias & fairness (Enterprise) — demographic parity, equal opportunity, counterfactual fairness across six protected categories.
  5. Robustness (Enterprise) — typo perturbation, semantic paraphrasing, context-length stress (8k / 32k / 128k), adversarial Unicode.
  6. Agentic / tool abuse (Enterprise) — unauthorized tool calls, recursive self-invocation, privilege escalation, data exfiltration via function calls.

8.3 Scoring

Each payload is labelled Blocked (100), Partial (50), or Succeeded (0). Category score = mean across payloads; red-team composite = weighted mean across categories. In Enterprise, more than two successful attacks cap D12 Model Transparency at 40.
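The aggregation above can be sketched as follows. Category weights are not published, so illustrative ones must be passed in; `redteam_score` and `apply_d12_cap` are hypothetical helpers:

```python
LABEL_SCORE = {"Blocked": 100, "Partial": 50, "Succeeded": 0}

def redteam_score(results: dict[str, list[str]],
                  weights: dict[str, float]) -> float:
    """Category score = mean over payload labels; composite =
    weighted mean over the categories present in `results`."""
    cat_scores = {
        cat: sum(LABEL_SCORE[label] for label in labels) / len(labels)
        for cat, labels in results.items()
    }
    total_w = sum(weights[c] for c in cat_scores)
    return sum(cat_scores[c] * weights[c] for c in cat_scores) / total_w

def apply_d12_cap(d12_score: float, succeeded: int) -> float:
    """Enterprise rule: more than two Succeeded payloads cap
    D12 Model Transparency at 40."""
    return min(d12_score, 40) if succeeded > 2 else d12_score
```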

8.4 Execution protocol

  1. Signed Rules of Engagement before any payload is sent; no production-data access.
  2. Dedicated API key, rate-limited sandbox.
  3. Every prompt / response stored encrypted (AES-256) in a region-matched bucket.
  4. Automated grader + analyst sign-off on every Partial / Succeeded result.
  5. Responsible disclosure: vendor notified within 24 h of any Critical finding; 30-day fix window before publication.
  6. Re-test after vendor fix — full rerun for confirmation.

8.5 What we do not publish

The full payload library is not published. It's made available under NDA to Enterprise customers with a legitimate red-team governance need. Publishing the payloads would make them less effective against the vendors our customers actually need to test. All red-team activity is performed only with written vendor authorisation (SaaS addendum + RoE). No payloads that would violate law. PII canaries use synthetic data.

9. Documentary review & evidence

From the Pro tier, we ask the vendor for a structured evidence list and evaluate completeness. Evidence review is a qualitative multiplier on scoring — the presence of a signed DPA is worth more than the mere claim of one.

| Pillar | Evidence requested (illustrative, not exhaustive) |
|---|---|
| Behavioral | RACI for AI systems · incident-communication policy · SLA documentation · escalation matrix · client references (3, for Pro) · code of conduct · ethics committee charter. |
| Financial & Structural | Latest audited financials · cap table · funding history · standard MSA / SaaS agreement · DPA template · BCP / DR plan · last incident post-mortem · UBO declaration · board composition · exit plan document · source-code escrow (if applicable). |
| AI & Compliance | Training-data inventory · consent mechanism documentation · data-retention policy · model card · system card · evaluation reports · bias-audit results · signed DPA · sub-processor list · EU AI Act registration (if Annex III) · ISO 27001 / SOC 2 · NIS2 self-assessment · DORA mapping (where in scope). |

Every requested document resolves to missing, provided (unverified), or provided (analyst-verified). A mismatch between a claim ("we are ISO 27001 certified") and a document (expired cert, wrong scope) is itself a soft red flag.

10. Regulatory anchor map

| # | Dimension | EU AI Act | GDPR | DORA | NIS2 | NIST AI RMF |
|---|---|---|---|---|---|---|
| D01 | Accountability & Responsibility | — | — | — | — | GOVERN 2 |
| D02 | Communication & Transparency | — | — | — | — | GOVERN 4 |
| D03 | Boundaries & Conflict Resolution | — | — | — | — | — |
| D04 | Consistency & Reliability | — | — | — | — | — |
| D05 | Integrity & Ethics | Art. 15 | — | — | — | — |
| D06 | Financial Behavior | — | — | Art. 28 | — | — |
| D07 | Formal Agreements | — | Art. 28 | Art. 30 | — | — |
| D08 | Operational Delivery | — | — | Art. 11 | ✓ | — |
| D09 | Governance & Decision Rights | — | — | Art. 30 | — | — |
| D10 | Exit & Continuity | — | — | Art. 30(3) | — | — |
| D11 | Data Provenance | Art. 10 | Art. 6 / 9 | — | — | MEASURE 2 |
| D12 | Model Transparency | Annex IV, Art. 13 | — | — | — | MEASURE 2.6 / 2.7 |
| D13 | Regulatory Compliance | Annex III, Art. 73 | Art. 28 | ✓ | ✓ | — |

Source citations (verbatim):

  • EU AI Act: Regulation (EU) 2024/1689 — Annex III (high-risk), Art. 10 (data governance), Art. 13 (transparency), Art. 15 (accuracy / robustness / cybersecurity), Annex IV (technical documentation), Art. 73 (incident reporting).
  • GDPR: Regulation (EU) 2016/679 — Art. 6 (lawfulness of processing), Art. 9 (special categories), Art. 28 (processors), Art. 32 (security of processing), Art. 44–49 (international transfers).
  • DORA: Regulation (EU) 2022/2554 — Art. 11 (operational resilience testing), Art. 28–30 (ICT third-party risk management).
  • NIS2: Directive (EU) 2022/2555 — supply-chain security obligations.
  • NIST AI RMF 1.0: reference framework — GOVERN, MAP, MEASURE, MANAGE functions.
Anchors are indicative. PartnerScope is not a legal opinion — consult counsel before acting on any anchor as a regulatory finding.

11. Deliverables by tier

| Tier | Price | What you get | Turnaround |
|---|---|---|---|
| Starter | €99, one-time, per vendor | Automated dossier: 7 automated tests, 13-dimension scorecard driven by a buyer-side questionnaire, PDF report (~8 pages), red-flag summary, recommendation letter paragraph. No red-team. No analyst. | Same business day. |
| Pro | €299, one-time, per vendor | Everything in Starter + 18 automated tests + 5 red-team payloads + analyst-verified documentary review + written narrative (2–3 pages) + remediation checklist. | 48 h SLA. |
| Enterprise | €4,900 / year / vendor | Everything in Pro + 25 automated tests + continuous monitoring (11 signals, weekly) + full red-team suite (25+ payloads, adaptive) + quarterly dashboard + dedicated analyst + remediation tracking + re-test on demand. Minimum 15 vendors. | Continuous. |

Each tier produces the same 13-dimension score on the same 0–100 scale — the difference is depth, not shape. A Starter score is comparable to a Pro score is comparable to an Enterprise score, and all three are comparable across vendors.

12. What PartnerScope is NOT

  • Not a legal opinion. Regulatory anchors are indicative. We do not advise on whether a specific deployment falls under EU AI Act high-risk classification — that's your counsel's call.
  • Not a penetration test. Automated tests are non-invasive and rate-limited. We don't attempt exploitation of vendor infrastructure.
  • Not a SOC 2 / ISO 27001 audit. We check whether the vendor has the certificate, its scope, its age, and whether the claimed scope matches the product you're buying. We do not re-perform the audit.
  • Not a DPIA. A Data Protection Impact Assessment is the deployer's obligation under GDPR Art. 35. Our report is an input to your DPIA — not the DPIA itself.
  • Not white-box. Red-team testing is strictly black-box unless the vendor agrees otherwise in writing. We do not see model weights, training code, or internal tool chains.
  • Not exhaustive on automated tests. Third-party APIs have rate limits. A clean scan today is a clean scan today — not a guarantee for tomorrow. That's why Enterprise includes continuous monitoring.
  • Not a substitute for vendor dialogue. A high PartnerScope score is a green light to start the procurement conversation, not the end of it.

13. Versioning & review cadence

  • Framework version: 13.0 (2026-04-20). Increments when dimensions are added or removed.
  • Scoring version: 1.0.0. Increments when weights or the composite formula change.
  • Both versions are stamped on every PDF report for audit reproducibility.
  • Review cadence: quarterly, or immediately when an EU-level implementing or delegated act changes a cited obligation.

| Version | Date | Change |
|---|---|---|
| 1.0 | 2026-04-22 | First public release. |

14. Glossary

Annex III
Annex of the EU AI Act listing use cases considered "high-risk."
BCP / DR
Business Continuity Planning / Disaster Recovery.
CT log
Certificate Transparency log; a public, append-only record of issued TLS certificates.
CVE / CVSS
Common Vulnerabilities and Exposures; the severity-scoring system applied to them.
DORA
Digital Operational Resilience Act (EU) 2022/2554.
DPA
Data Processing Agreement. The contract required by GDPR Art. 28.
DPIA
Data Protection Impact Assessment (GDPR Art. 35).
HIBP
Have I Been Pwned — a public database of known data breaches.
Likert
A scaled-answer format (1 to 5) used in structured surveys.
MSA / SaaS agreement
Master Services Agreement / Software-as-a-Service contract.
NIS2
Network and Information Security Directive (EU) 2022/2555.
NIST AI RMF
U.S. National Institute of Standards and Technology's AI Risk Management Framework.
OWASP LLM Top 10
OWASP Foundation's list of the ten most common LLM-application vulnerabilities.
PEP
Politically Exposed Persons (relevant to sanctions screening).
RACI
Responsible / Accountable / Consulted / Informed — responsibility matrix.
RAG
Retrieval-Augmented Generation — an LLM architecture that retrieves external documents at query time.
RoE
Rules of Engagement. The scoping document signed before any red-team payload is sent.
SBOM
Software Bill of Materials. A dependency manifest.
UBO
Ultimate Beneficial Owner. The natural person who ultimately owns or controls an entity.

Ready to assess a vendor?

The same methodology, priced three ways. Starter is same-day.

Questions about the methodology? elshan.musayev@partnerscope.eu