PartnerScope · Methodology

How PartnerScope assesses your vendors

This document is the methodology behind every PartnerScope assessment — the 13 dimensions, 4 risk bands, automated tests, documentary review and AI red-teaming. Public, citable, and aligned with EU AI Act, GDPR, DORA and NIS2. Share it with your procurement or security team.

Version 1.0 · Last updated 2026-04-22

1. Why third-party AI risk, and why now

Under EU law, buying an AI product from another company is no longer a straight commercial decision. It's a regulated one.

  • EU AI Act (Regulation (EU) 2024/1689) classifies AI systems by risk tier. Deploying a "high-risk" AI system (Annex III — e.g. biometrics, education scoring, critical-infrastructure control, employment filtering) triggers pre-market duties on the provider and due-diligence duties on the deployer.
  • GDPR (Regulation (EU) 2016/679) Art. 28 makes you responsible for your processors' security posture — including the AI vendor processing your customers' personal data.
  • DORA (Regulation (EU) 2022/2554) Art. 28–30 requires financial-services firms to maintain an ICT third-party register and enforce contractual controls.
  • NIS2 (Directive (EU) 2022/2555) extends supply-chain security obligations to critical and important entities across ~18 sectors.

"We trust the vendor" is no longer a defensible procurement answer. PartnerScope exists to replace that with a repeatable, evidenced assessment you can hand to your DPO, your board, and — if asked — your regulator.

This document is not a legal opinion. It's how we assess. Legal interpretation stays with your counsel.

2. The framework at a glance

Every PartnerScope assessment produces a single composite score from 0 to 100, across 13 dimensions, grouped into 3 pillars, mapped to 4 risk bands.

| Pillar | Share of composite | What it covers |
|---|---|---|
| A — Behavioral | 25% | How the vendor behaves — accountability, communication, dispute handling, consistency, ethics. |
| B — Financial & Structural | 30% | Whether the vendor can survive — finances, contracts, operations, governance, exit. |
| C — AI & Compliance | 45% | The reason this tool exists — data provenance, model transparency, regulatory fit. |

The weighting is intentional: AI & Compliance carries 45% because that's where EU law currently has teeth, and where a single undisclosed fact can turn an otherwise healthy vendor into an unshippable risk. A vendor who wins on Behavioral and Financial but scores ≤ 40 on any AI & Compliance dimension is capped at 65 — they cannot rise above the MEDIUM band, however strong the other dimensions are.

3. Four risk bands

| Band | Score | Meaning | Recommended buyer action |
|---|---|---|---|
| HIGH | 0–40 | Do not onboard without remediation. | Block contract; request specific evidence before proceeding. |
| MEDIUM | 41–65 | Onboard with conditions & monitoring. | Conditional approval; list remediation items before signing. |
| LOW | 66–85 | Standard onboarding, quarterly review. | Proceed; add to quarterly review list. |
| MINIMAL | 86–100 | Standard contract, annual review. | Proceed; reduce review frequency to annual. |

Edge cases fall to the lower band (a composite of exactly 65 is MEDIUM, not LOW — conservative by design).
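The band boundaries can be expressed as a small lookup. `risk_band` is an illustrative helper, not part of any published PartnerScope tooling:

```python
def risk_band(composite: int) -> str:
    """Map a 0-100 composite score to a risk band.

    Upper boundaries are inclusive of the lower band: exactly 65 is
    MEDIUM, exactly 85 is LOW -- conservative by design.
    """
    if not 0 <= composite <= 100:
        raise ValueError("composite must be in 0..100")
    if composite <= 40:
        return "HIGH"
    if composite <= 65:
        return "MEDIUM"
    if composite <= 85:
        return "LOW"
    return "MINIMAL"
```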

4. The 13 dimensions

Every dimension has a code (D01 … D13), a pillar, a one-line definition, a regulatory anchor list (the Articles / Annexes / framework clauses we explicitly test against), a questionnaire, a subset of automated tests, and (from Pro upward) an evidence list the vendor is asked to produce.

Pillar A — Behavioral (25%)

| Code | Dimension | What it measures | Regulatory anchor |
|---|---|---|---|
| D01 | Accountability & Responsibility | Ownership of failures, reliability on commitments. | NIST AI RMF GOVERN 2 |
| D02 | Communication & Transparency | Responsiveness, documentation discipline, proactive disclosure. | NIST AI RMF GOVERN 4 |
| D03 | Boundaries & Conflict Resolution | Ability to handle disagreement without litigation; scope-creep discipline. | — |
| D04 | Consistency & Reliability | Retention of long-term clients, SLA track record. | — |
| D05 | Integrity & Ethics | Ethics code, external audits, absence of regulator findings. | EU AI Act Art. 15 |

Pillar B — Financial & Structural (30%)

| Code | Dimension | What it measures | Regulatory anchor |
|---|---|---|---|
| D06 | Financial Behavior | Audited financials, runway, debt posture, funding stability. | DORA Art. 28 |
| D07 | Formal Agreements | Contract clarity, IP ownership, SLA enforceability, DPA posture. | DORA Art. 30, GDPR Art. 28 |
| D08 | Operational Delivery | Incident response, BCP / DR, capacity. | DORA Art. 11, NIS2 |
| D09 | Governance & Decision Rights | Board structure, UBO clarity, key-man risk. | DORA Art. 30 |
| D10 | Exit & Continuity | Data portability, offboarding, source-code escrow (if applicable). | DORA Art. 30(3) |

Pillar C — AI & Compliance (45%) — core differentiator

| Code | Dimension | What it measures | Regulatory anchor |
|---|---|---|---|
| D11 | Data Provenance | Training-data lineage, consent basis, PII handling, retention. | EU AI Act Art. 10, GDPR Art. 6/9 |
| D12 | Model Transparency | Model card, explainability, version control, change management. | EU AI Act Annex IV, Art. 13 |
| D13 | Regulatory Compliance | Annex III classification, DPA signed, sub-processor list, Art. 73 incidents, ISO 27001 / SOC 2, NIS2 / DORA mapping. | EU AI Act Annex III, Art. 73, GDPR Art. 28, DORA, NIS2 |

5. How we score

5.1 Data sources per dimension

Each dimension's 0–100 score is composed from three inputs:

  1. Questionnaire score — structured answers from the buyer (and, for Pro / Enterprise, the vendor). Likert 1–5, multi-select, and document-upload question types. ~78 questions total, gated per tier.
  2. Automated test score — pass rate of the subset of tests in §6 that map to that dimension.
  3. Evidence bonus (Pro / Enterprise only) — analyst verification of the documents the vendor has produced.

The mix is deliberately skewed toward the questionnaire signal, because the questionnaire reflects buyer-observed reality. Automated tests are an objectivity anchor. Evidence review is a qualitative multiplier on top. Exact coefficients are not published — what matters to a buyer is that every number in the report is traceable: click any dimension score in the PDF and you see which questions and tests produced it.

5.2 Composite

The composite is a weighted sum of the 13 dimension scores, with the pillar shares in §2. Rounded to an integer 0–100.

Hard cap rule. If any AI & Compliance dimension (D11 / D12 / D13) scores ≤ 40, the composite is capped at 65 regardless of the other 10 dimensions. One HIGH in AI-compliance ⇒ no MINIMAL band. This rule is non-negotiable and printed on every report.
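The weighted sum plus the hard cap can be sketched as below. The pillar shares (25 / 30 / 45) come from §2; the equal split of each pillar's share across its dimensions is an illustrative assumption — the actual coefficients are not published — and `composite` is a hypothetical helper:

```python
# Pillar shares are from the methodology; equal per-dimension splits
# within a pillar are an assumption for illustration only.
PILLARS = {
    "A": (["D01", "D02", "D03", "D04", "D05"], 0.25),
    "B": (["D06", "D07", "D08", "D09", "D10"], 0.30),
    "C": (["D11", "D12", "D13"], 0.45),
}
AI_COMPLIANCE = {"D11", "D12", "D13"}

def composite(scores: dict[str, float]) -> int:
    """Weighted sum of 13 dimension scores, rounded to an integer,
    with the hard cap: any AI & Compliance dimension <= 40 caps
    the composite at 65."""
    total = 0.0
    for dims, share in PILLARS.values():
        weight = share / len(dims)  # assumed equal split
        total += sum(scores[d] * weight for d in dims)
    result = round(total)
    if any(scores[d] <= 40 for d in AI_COMPLIANCE):
        result = min(result, 65)
    return result
```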

5.3 Verdicts

Above the numeric score, every report carries a four-state verdict:

| Verdict | Condition | Reader action |
|---|---|---|
| DECLINE | Any hard red flag is raised (see §7) — overrides the composite score. | Do not onboard. Re-test only after the flagged issue is demonstrably remediated. |
| HOLD | Composite ≤ 40 and no hard red flag. | Do not sign until top-priority items close, or upgrade to Pro for analyst review. |
| PROCEED WITH CONDITIONS | 41 ≤ composite ≤ 65. | Conditional approval — buyer-side conditions from the Red-flags / Data-gaps sections must be satisfied or contractually mitigated. |
| PROCEED | Composite ≥ 66 and no hard red flag. | Acceptable controls. Residual risks can be addressed in contract language. |

The verdict is the one sentence your GC will quote.
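The four-state decision logic reduces to a short function — a sketch of the rules in the table above, with the red-flag override checked first (`verdict` is an illustrative name, not a published API):

```python
def verdict(composite: int, hard_red_flag: bool) -> str:
    """Derive the report verdict. A hard red flag forces DECLINE
    regardless of the numeric composite."""
    if hard_red_flag:
        return "DECLINE"
    if composite <= 40:
        return "HOLD"
    if composite <= 65:
        return "PROCEED WITH CONDITIONS"
    return "PROCEED"
```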

5.4 Skipped questions & insufficient data

Up to 10% of questions in a dimension may be skipped without penalty. Beyond that, the dimension is marked INSUFFICIENT_DATA, excluded from the composite, and its weight is redistributed proportionally. The report calls this out explicitly — we never silently lower a score because of a blank answer.
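Proportional redistribution means the remaining dimensions keep their relative weights and are rescaled to sum to 1. A minimal sketch (`redistribute` is a hypothetical helper):

```python
def redistribute(weights: dict[str, float],
                 insufficient: set[str]) -> dict[str, float]:
    """Exclude INSUFFICIENT_DATA dimensions and rescale the
    remaining weights proportionally so they still sum to 1."""
    kept = {d: w for d, w in weights.items() if d not in insufficient}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no scorable dimensions left")
    return {d: w / total for d, w in kept.items()}
```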

6. Automated tests

PartnerScope runs a tiered suite of technical and open-source-intelligence tests against the vendor's domain and legal entity.

| Tier | Automated tests | Cadence |
|---|---|---|
| Free snapshot | 2 | On demand |
| Starter (€99) | 7 | Once per run |
| Pro (€299) | 18 | Once per run, re-runnable on request |
| Enterprise (€4,900/yr) | 25 + continuous | Once per run + 11 weekly signals |

6.1 What the Starter tier runs

| # | Test | What it checks | Primary dim. |
|---|---|---|---|
| 1 | DNS records + DNSSEC | A / AAAA / MX / NS / SOA / CAA records; DNSSEC validation. | D08 |
| 2 | TLS handshake & cipher quality | TLS 1.2+, cipher suites, cert chain, HSTS, OCSP stapling. | D08 |
| 3 | Security headers | HSTS, CSP, XFO, XCTO, Referrer-Policy, Permissions-Policy, Cross-Origin-*. | D08 |
| 4 | Breach history (HIBP) | Past data breaches associated with the corporate domain. | D08 |
| 5 | Mail deliverability | SPF, DKIM, DMARC posture. | D08 |
| 6 | Certificate Transparency | Unexpected certificates issued for the vendor's domain. | D08 |
| 7 | Sanctions screening | OFAC SDN, EU consolidated list, UK OFSI, UN Security Council. Fuzzy-matched against legal entity + directors + UBOs. | D05 / D07 |
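As an example of how a test collapses to the pass / warn / fail semantics of §6.4, here is a sketch of a security-headers grader. The header list follows test 3; the thresholds (all present = pass, at least half = warn) are illustrative assumptions, not the published grading rules:

```python
REQUIRED = [
    "Strict-Transport-Security", "Content-Security-Policy",
    "X-Frame-Options", "X-Content-Type-Options",
    "Referrer-Policy", "Permissions-Policy",
]

def grade_headers(headers: dict[str, str]) -> str:
    """Grade an HTTP response-header set as pass / warn / fail.
    Thresholds here are assumed for illustration."""
    norm = {k.title() for k in headers}  # case-insensitive match
    present = sum(1 for h in REQUIRED if h in norm)
    if present == len(REQUIRED):
        return "pass"
    if present >= len(REQUIRED) / 2:
        return "warn"
    return "fail"
```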

6.2 What Pro adds (tests 8–18)

PEP screening · adverse-media scan (30-day window) · commercial register lookup (DE / AT / CH / AZ; OpenCorporates fallback) · UBO extraction & cross-check · credit score (Bisnode / CreditSafe) · insolvency register · regulatory licence check (BaFin / FMA / FINMA) · SBOM parse + CVE correlation · Model Card completeness (vs. NIST AI RMF + Google MCT) · EU AI Act Annex III classification (LLM-assisted) · GDPR Art. 28 DPA clause extractor.
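Screening and UBO cross-checks hinge on fuzzy name matching. A minimal sketch with the standard library's `difflib`; production screening additionally handles transliteration, aliases, and word order, and the 0.85 threshold is an illustrative assumption:

```python
from difflib import SequenceMatcher

def name_match(candidate: str, listed: str,
               threshold: float = 0.85) -> bool:
    """Crude fuzzy match between a screened name and a list entry.
    Threshold and normalisation are assumptions for illustration."""
    a = candidate.casefold().strip()
    b = listed.casefold().strip()
    return SequenceMatcher(None, a, b).ratio() >= threshold
```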

6.3 What Enterprise adds (tests 19–25 + continuous)

Dark-web exposure (Intel X, HIBP Enterprise) · ASN + egress-IP geolocation · sub-processor enumeration + concentration · model-drift signal · DORA ICT-register cross-check · supply-chain depth (4th-party) · continuous monitoring (11 signals polled weekly).

Enterprise continuous monitoring watches: new breach, new sanctions hit, adverse media, TLS cert change, security-header regression, Whois / registrant change, model-version bump, new SBOM CVE ≥ CVSS 7.0, credit-score delta ≥ 10 pts, UBO change, DNS / MX change. Any signal re-triggers scoring.
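The re-trigger rule can be sketched as an event filter. The signal identifiers and event shape below are assumptions for illustration; only the two thresholds (CVSS ≥ 7.0, credit delta ≥ 10 pts) come from the list above:

```python
# Hypothetical signal names; the thresholds mirror the watch list.
TRIGGERS = {
    "new_breach", "new_sanctions_hit", "adverse_media",
    "tls_cert_change", "header_regression", "whois_change",
    "model_version_bump", "sbom_cve_high", "credit_delta",
    "ubo_change", "dns_mx_change",
}

def should_rescore(event: dict) -> bool:
    """Return True when a monitoring event should re-trigger scoring."""
    kind = event.get("kind")
    if kind not in TRIGGERS:
        return False
    if kind == "sbom_cve_high":
        return event.get("cvss", 0) >= 7.0
    if kind == "credit_delta":
        return abs(event.get("delta", 0)) >= 10
    return True
```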

6.4 Failure semantics

pass (80–100 contribution) · warn (50–70) · fail (0–40) · error (transient infrastructure failure; retry, do not score).

7. Hard red flags (the override rule)

Some findings are severe enough that no composite score survives them. If any of the following fires, the verdict is forced to DECLINE regardless of the numeric score:

  1. Sanctions / PEP high-confidence match. OFAC / EU / UK / UN screening returns a high-confidence name match against the legal entity, a director, or a UBO.
  2. UBO mismatch. The beneficial-owner names in official registers do not match what the vendor disclosed.
  3. Undisclosed high-risk classification. The vendor qualifies as "high-risk" under EU AI Act Annex III but has not registered, not produced Annex IV technical documentation, or fails Art. 16 provider obligations.
  4. Active breach exposure. HIBP records a breach involving PII + credentials in the last 12 months, with no public remediation statement.
  5. Missing mandatory documentation. No DPA when the engagement involves EU-personal-data processing; no ISO 27001 / SOC 2 when financial-services expectations apply.
  6. Dimension score ≤ 40 in D11 Data Provenance combined with undisclosed training-data lineage.

Separately, a dimension-level score of ≤ 40 (without one of the above) is a soft red flag — called out as a "blocker" and drives the verdict down, but doesn't by itself force DECLINE. Every flag carries a severity tag: blocker, high, or medium. The remediation section sorts them into before signature, within 30 days of onboarding, and quarterly monitoring.

8. AI red-team suite

From the Pro tier upward, we actively probe the vendor's AI system for known failure modes. Red-team is not "does the product work" — it's "can an attacker make the product do something it shouldn't."

8.1 Tier matrix

| Tier | Payloads | Categories covered | Delivery |
|---|---|---|---|
| Starter | 0 | — | Not included in Starter; upgrade to Pro. |
| Pro | 5 | Prompt injection (3) · Jailbreak (1) · PII leakage (1) | Automated, with a short report section. |
| Enterprise | 25+, continuous | All 6 categories (adaptive, weekly retest) | Dedicated analyst + weekly retest. |

8.2 Categories

  1. Prompt injection — direct, indirect (document-based), tool-chain contamination, cross-prompt poisoning, system-prompt extraction, role confusion, encoding bypass (base64, homoglyph, zero-width).
  2. Jailbreak — DAN-style persona, hypothetical framing, multilingual pivot, gradient / crescendo escalation, many-shot in-context bypass.
  3. PII leakage — divergence attacks, training-data canaries (synthetic PII only), session cross-contamination, RAG-store poisoning.
  4. Bias & fairness (Enterprise) — demographic parity, equal opportunity, counterfactual fairness across six protected categories.
  5. Robustness (Enterprise) — typo perturbation, semantic paraphrasing, context-length stress (8k / 32k / 128k), adversarial Unicode.
  6. Agentic / tool abuse (Enterprise) — unauthorized tool calls, recursive self-invocation, privilege escalation, data exfiltration via function calls.

8.3 Scoring

Each payload is labelled Blocked (100), Partial (50), or Succeeded (0). Category score = mean across payloads; red-team composite = weighted mean across categories. In Enterprise, more than two successful attacks cap D12 Model Transparency at 40.
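The aggregation above can be sketched as follows. Category weights are not published, so illustrative ones must be passed in; `redteam_score` and `apply_d12_cap` are hypothetical helpers:

```python
LABEL_SCORE = {"Blocked": 100, "Partial": 50, "Succeeded": 0}

def redteam_score(results: dict[str, list[str]],
                  weights: dict[str, float]) -> float:
    """Category score = mean over payload labels; composite =
    weighted mean over the categories present in `results`."""
    cat_scores = {
        cat: sum(LABEL_SCORE[label] for label in labels) / len(labels)
        for cat, labels in results.items()
    }
    total_w = sum(weights[c] for c in cat_scores)
    return sum(cat_scores[c] * weights[c] for c in cat_scores) / total_w

def apply_d12_cap(d12_score: float, succeeded: int) -> float:
    """Enterprise rule: more than two Succeeded payloads cap
    D12 Model Transparency at 40."""
    return min(d12_score, 40) if succeeded > 2 else d12_score
```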

8.4 Execution protocol

  1. Signed Rules of Engagement before any payload is sent; no production-data access.
  2. Dedicated API key, rate-limited sandbox.
  3. Every prompt / response stored encrypted (AES-256) in a region-matched bucket.
  4. Automated grader + analyst sign-off on every Partial / Succeeded result.
  5. Responsible disclosure: vendor notified within 24 h of any Critical finding; 30-day fix window before publication.
  6. Re-test after vendor fix — full rerun for confirmation.

8.5 What we do not publish

The full payload library is not published. It's made available under NDA to Enterprise customers with a legitimate red-team governance need. Publishing the payloads would make them less effective against the vendors our customers actually need to test. All red-team activity is performed only with written vendor authorisation (SaaS addendum + RoE). No payloads that would violate law. PII canaries use synthetic data.

9. Documentary review & evidence

From the Pro tier, we ask the vendor for a structured evidence list and evaluate completeness. Evidence review is a qualitative multiplier on scoring — the presence of a signed DPA is worth more than the mere claim of one.

| Pillar | Evidence requested (illustrative, not exhaustive) |
|---|---|
| Behavioral | RACI for AI systems · incident-communication policy · SLA documentation · escalation matrix · client references (3, for Pro) · code of conduct · ethics committee charter. |
| Financial & Structural | Latest audited financials · cap table · funding history · standard MSA / SaaS agreement · DPA template · BCP / DR plan · last incident post-mortem · UBO declaration · board composition · exit plan document · source-code escrow (if applicable). |
| AI & Compliance | Training-data inventory · consent mechanism documentation · data-retention policy · model card · system card · evaluation reports · bias-audit results · signed DPA · sub-processor list · EU AI Act registration (if Annex III) · ISO 27001 / SOC 2 · NIS2 self-assessment · DORA mapping (where in scope). |

Every requested document resolves to missing, provided (unverified), or provided (analyst-verified). A mismatch between a claim ("we are ISO 27001 certified") and a document (expired cert, wrong scope) is itself a soft red flag.

10. Regulatory anchor map

| # | Dimension | EU AI Act | GDPR | DORA | NIS2 | NIST AI RMF |
|---|---|---|---|---|---|---|
| D01 | Accountability & Responsibility | — | — | — | — | GOVERN 2 |
| D02 | Communication & Transparency | — | — | — | — | GOVERN 4 |
| D03 | Boundaries & Conflict Resolution | — | — | — | — | — |
| D04 | Consistency & Reliability | — | — | — | — | — |
| D05 | Integrity & Ethics | Art. 15 | — | — | — | — |
| D06 | Financial Behavior | — | — | Art. 28 | — | — |
| D07 | Formal Agreements | — | Art. 28 | Art. 30 | — | — |
| D08 | Operational Delivery | — | — | Art. 11 | ✓ | — |
| D09 | Governance & Decision Rights | — | — | Art. 30 | — | — |
| D10 | Exit & Continuity | — | — | Art. 30(3) | — | — |
| D11 | Data Provenance | Art. 10 | Art. 6 / 9 | — | — | MEASURE 2 |
| D12 | Model Transparency | Annex IV, Art. 13 | — | — | — | MEASURE 2.6 / 2.7 |
| D13 | Regulatory Compliance | Annex III, Art. 73 | Art. 28 | ✓ | ✓ | — |

Source citations (verbatim):

  • EU AI Act: Regulation (EU) 2024/1689 — Annex III (high-risk), Art. 10 (data governance), Art. 13 (transparency), Art. 15 (accuracy / robustness / cybersecurity), Annex IV (technical documentation), Art. 73 (incident reporting).
  • GDPR: Regulation (EU) 2016/679 — Art. 6 (lawfulness of processing), Art. 9 (special categories), Art. 28 (processors), Art. 32 (security of processing), Art. 44–49 (international transfers).
  • DORA: Regulation (EU) 2022/2554 — Art. 11 (operational resilience testing), Art. 28–30 (ICT third-party risk management).
  • NIS2: Directive (EU) 2022/2555 — supply-chain security obligations.
  • NIST AI RMF 1.0: reference framework — GOVERN, MAP, MEASURE, MANAGE functions.
Anchors are indicative. PartnerScope is not a legal opinion — consult counsel before acting on any anchor as a regulatory finding.

11. Deliverables by tier

| Tier | Price | What you get | Turnaround |
|---|---|---|---|
| Starter | €99, one-time, per vendor | Automated dossier: 7 automated tests, 13-dimension scorecard driven by a buyer-side questionnaire, PDF report (~8 pages), red-flag summary, recommendation letter paragraph. No red-team. No analyst. | Same business day. |
| Pro | €299, one-time, per vendor | Everything in Starter + 18 automated tests + 5 red-team payloads + analyst-verified documentary review + written narrative (2–3 pages) + remediation checklist. | 48 h SLA. |
| Enterprise | €4,900 / year / vendor | Everything in Pro + 25 automated tests + continuous monitoring (11 signals, weekly) + full red-team suite (25+ payloads, adaptive) + quarterly dashboard + dedicated analyst + remediation tracking + re-test on demand. Minimum 15 vendors. | Continuous. |

Each tier produces the same 13-dimension score on the same 0–100 scale — the difference is depth, not shape. A Starter score is comparable to a Pro score is comparable to an Enterprise score, and all three are comparable across vendors.

12. What PartnerScope is NOT

  • Not a legal opinion. Regulatory anchors are indicative. We do not advise on whether a specific deployment falls under EU AI Act high-risk classification — that's your counsel's call.
  • Not a penetration test. Automated tests are non-invasive and rate-limited. We don't attempt exploitation of vendor infrastructure.
  • Not a SOC 2 / ISO 27001 audit. We check whether the vendor has the certificate, its scope, its age, and whether the claimed scope matches the product you're buying. We do not re-perform the audit.
  • Not a DPIA. A Data Protection Impact Assessment is the deployer's obligation under GDPR Art. 35. Our report is an input to your DPIA — not the DPIA itself.
  • Not white-box. Red-team testing is strictly black-box unless the vendor agrees otherwise in writing. We do not see model weights, training code, or internal tool chains.
  • Not exhaustive on automated tests. Third-party APIs have rate limits. A clean scan today is a clean scan today — not a guarantee for tomorrow. That's why Enterprise includes continuous monitoring.
  • Not a substitute for vendor dialogue. A high PartnerScope score is a green light to start the procurement conversation, not the end of it.

13. Versioning & review cadence

  • Framework version: 13.0 (2026-04-20). Increments when dimensions are added or removed.
  • Scoring version: 1.0.0. Increments when weights or the composite formula change.
  • Both versions are stamped on every PDF report for audit reproducibility.
  • Review cadence: quarterly, or immediately when an EU-level implementing or delegated act changes a cited obligation.

| Version | Date | Change |
|---|---|---|
| 1.0 | 2026-04-22 | First public release. |

14. Glossary

Annex III
Annex of the EU AI Act listing use cases considered "high-risk."
BCP / DR
Business Continuity Planning / Disaster Recovery.
CT log
Certificate Transparency log; a public, append-only record of issued TLS certificates.
CVE / CVSS
Common Vulnerabilities and Exposures; the severity-scoring system applied to them.
DORA
Digital Operational Resilience Act (EU) 2022/2554.
DPA
Data Processing Agreement. The contract required by GDPR Art. 28.
DPIA
Data Protection Impact Assessment (GDPR Art. 35).
HIBP
Have I Been Pwned — a public database of known data breaches.
Likert
A scaled-answer format (1 to 5) used in structured surveys.
MSA / SaaS agreement
Master Services Agreement / Software-as-a-Service contract.
NIS2
Network and Information Security Directive (EU) 2022/2555.
NIST AI RMF
U.S. National Institute of Standards and Technology's AI Risk Management Framework.
OWASP LLM Top 10
OWASP Foundation's list of the ten most common LLM-application vulnerabilities.
PEP
Politically Exposed Persons (relevant to sanctions screening).
RACI
Responsible / Accountable / Consulted / Informed — responsibility matrix.
RAG
Retrieval-Augmented Generation — an LLM architecture that retrieves external documents at query time.
RoE
Rules of Engagement. The scoping document signed before any red-team payload is sent.
SBOM
Software Bill of Materials. A dependency manifest.
UBO
Ultimate Beneficial Owner. The natural person who ultimately owns or controls an entity.

Ready to assess a vendor?

The same methodology, priced three ways. Starter is same-day.

Questions about the methodology? elshan.musayev@partnerscope.eu