Today's date is June 6, 2026.

Pilot Project Charter

Customer: Joe Balsamo ([email hidden])
Use Case: AI-Powered Content Analysis & Annotation Automation
Industry: Financial Services | Organization Size: 5,000+ employees
Recommended Solution: Auto Reports (Centralized Bulk Document Processing) — Cloud Pilot to On-Premise Migration
Pilot Duration: 8 Weeks (High Complexity — Score 75/100)
Charter Date: June 6, 2026

Introduction & How to Use This Charter

This Pilot Project Charter is a pre-filled, execution-ready document derived directly from your AI Strategy Blueprint consultation. It is designed so that your team can review it, populate a small number of dates and role assignments, secure signatures, and begin execution immediately.

The charter is grounded in current pilot best practices for regulated financial services AI deployments. Where research findings or benchmarks influenced a specific recommendation, the basis is cited inline.

Research Grounding Note: Industry benchmarks for document-extraction and annotation pilots in regulated financial services indicate that 6–10 week pilots achieve the highest rate of conclusive go/no-go decisions, with shorter pilots frequently failing to accumulate sufficient volume to validate accuracy across document-type variability. Your 8-week duration was selected on this basis (see Element 8). Where research returned no use-case-specific data, cross-industry document-processing benchmarks were applied and noted as such.

The document contains four parts:

Part A: The formal 14-element Pilot Project Charter
Part B: The Success Tracking Dashboard design
Part C: The Scale / Iterate / Pivot / Stop decision framework
Part D: The Crawl-Walk-Run phase definitions

PART A: PILOT PROJECT CHARTER

Element 1 — Project Name

AI-Powered Content Analysis Pilot

Suggested full title: "AI-Powered Content Analysis & Annotation Pilot — Regulated Document Processing for Financial Services"

This pilot validates the use of Auto Reports to extract key datapoints, surface hidden insights, and standardize annotation (risk flags, incomplete information, privacy issues, categorization) across a regulated, multi-format document corpus.

Assigned: C-Suite Executive Sponsor (confirmed in consultation; specific named individual [TO BE ASSIGNED])

Status: C-Suite sponsorship is already confirmed for this initiative. The named individual should be formally designated before charter signing.

Executive Sponsor Responsibilities:

Provide budget authority and organizational commitment across the blended technology / compliance / operations funding model
Remove organizational blockers, particularly around the months-long vendor approval process
Provide air cover and visible executive endorsement to drive adoption
Make the final Scale / Iterate / Pivot / Stop decision based on the Project Lead's recommendation (see Part C)
Chair or sponsor the monthly steering committee reviews
Approve the go-live decision and any scope changes

Time Commitment: Approximately 2 hours per week.

Element 3 — Project Lead

Assigned: [TO BE ASSIGNED]

Critical Gap — Act Immediately: The Stakeholder Gap Analysis identified the Project Lead as a missing required role. Given the monthly steering committee cadence, every cycle missed costs 30 days against your 90-day value demonstration target. This is the single most time-sensitive assignment in the charter.

Project Lead Responsibilities:

Day-to-day coordination, decision-making, and status reporting
Single point of accountability for pilot execution and milestone delivery
Manage the months-long vendor approval process as a parallel workstream
Coordinate the Prompt Engineering resource and integration activities
Maintain the Success Tracking Dashboard (Part B) and run weekly reviews
Prepare the Scale / Iterate / Pivot / Stop recommendation for the Executive Sponsor
Own issue escalation and scope-change processes

Time Commitment: Approximately 10 hours per week.

Primary consultation contact: Joe Balsamo ([email hidden]) — confirm whether this individual will serve as Project Lead or designate an alternate.

Element 4 — Business Problem

Your content analysis process is currently partially automated, combining manual review with basic analytics tooling (Tableau, Power BI, some RPA). The organization has explicitly outgrown this approach.

Quantified Current State:

Volume: 100–500 items per week (~60 documents/day) flowing through a four-stage linear pipeline: manual review → annotation → approval → close
Bottlenecks: The manual review and annotation stages concentrate time and consistency losses
Baseline status: Basic measurements exist but require formalization (see Element 7)

Documented Pain Points:

Pain Point	Impact
Too time-consuming	Manual review and annotation create throughput ceilings
Inconsistent results	Annotation quality and categorization vary across analysts
Cannot scale	Process cannot keep pace with current or growing volume
Missed insights	Hidden patterns in documents and customer feedback go undetected

Strategic Stakes: As a 5,000+ employee financial services firm operating under SEC 17a-4, SOX, PCI, and PII requirements, inconsistent annotation carries elevated compliance and audit-readiness risk beyond simple inefficiency.

Element 5 — Proposed Solution

Attribute	Detail
Solution Pattern	Auto Reports — centralized bulk document processing (scored 90 vs. 45 cloud vs. 0 AirGap)
Pilot Deployment	Cloud SaaS (AWS/Azure/GCP, US-region) with executed BAA + Data Processing Agreement
Long-Term Deployment	On-premise migration to Intel Gaudi 3 / NVIDIA H100 after pilot validation
Complexity Score	75 / 100 (High Complexity)
Primary Model (pilot)	Gemini 3.1 Pro (cloud, with BAA) — strong reasoning and Japanese support
Primary Model (long-term on-prem)	Qwen 3.5 397B — superior Japanese multilingual support, full data sovereignty
Ingestion	Blockify Basic Ingestion with classification-first distillation gating

Technology Requirements:

OCR pipeline for scanned/image-based PDFs, including Japanese-language content
Multi-format handling across digital PDFs, Word, Excel/CSV, and emails
Queue-based review interface with inline AI-assisted suggestions
Batch overnight processing with real-time exception lanes for urgent escalations and regulatory inquiries

Data Requirements (summary — see Element 11): Internal databases, file systems, and ServiceNow as source systems; Tableau, Power BI, and RPA as output/orchestration targets.

Why a cloud pilot first? The months-long vendor approval process makes a compliant, BAA-backed cloud pilot the fastest path to your 90-day value demonstration window. The pilot validates the use case before committing capital to on-premise infrastructure. This staged approach is consistent with current best practice for regulated AI adoption, which favors a low-capital, time-boxed validation before infrastructure procurement.

Element 6 — Success Criteria

Primary Metrics (from consultation):

Processing time reduction — reduce time spent in the manual review and annotation stages
Annotation accuracy and consistency improvement — improve agreement and correctness across risk flags, incomplete-info detection, privacy issues, and categorization

Secondary / Strategic Metrics:

Risk reduction (fewer missed compliance flags)
New insight capabilities (surfacing previously undetected patterns)
Audit readiness (faster, cleaner audit trails)

Pilot-Specific Success Criteria (all must be met to qualify as a successful pilot):

Criterion	Target
Measurable processing time reduction	Demonstrable, statistically meaningful reduction vs. baseline
Annotation accuracy meets threshold	≥ 90% accuracy on extraction/classification tasks
User satisfaction	≥ 4 of 5 average from pilot analysts
No critical issues	Zero unresolved critical security, compliance, or quality incidents
Audit trail integrity	100% of processed items logged with complete, immutable audit records

Research-Grounded Accuracy Threshold: A 90% accuracy floor is set as the pilot validation threshold. Current AI benchmarks for document extraction and classification in mixed-format, semi-structured corpora typically land in the 88–95% range before human-in-the-loop correction; Japanese-language OCR on scanned documents tends toward the lower end. Because your workflow uses human-in-the-loop review only for low-confidence and flagged exceptions, 90% pre-review accuracy is an appropriate and realistic validation bar. The specific scale/iterate/pivot thresholds derived from your improvement targets are defined in Part C.

Element 7 — Baseline Metrics

Your consultation indicates basic measurements exist but are not yet formalized. Before the pilot's Run phase, complete the following pre-pilot measurement template across all seven categories. This baseline is the foundation for every before/after comparison.

#	Metric Category	What to Measure	Baseline Value	Measurement Method
1	Processing Time	Avg. minutes per document (review + annotation)	[TO BE MEASURED]	Time-tracking sample over 2 weeks
2	Annotation Accuracy	% of annotations correct vs. expert review	[TO BE MEASURED]	Blind expert re-review of sample
3	Annotation Consistency	Inter-analyst agreement rate	[TO BE MEASURED]	Same documents annotated by ≥2 analysts
4	Throughput	Documents completed per analyst per day	[TO BE MEASURED]	Volume logs
5	Risk Flag Capture	% of true risk items correctly flagged	[TO BE MEASURED]	Audit of known risk cases
6	Rework Rate	% of items returned at approval stage	[TO BE MEASURED]	Approval-stage rejection logs
7	Audit Readiness	Avg. time to produce audit trail for an item	[TO BE MEASURED]	Sample audit pull timing

Action: Formalize and document these seven baselines during Weeks 1–2. The Project Lead is accountable for completing this template before the Run phase begins. Incomplete baselines undermine the credibility of the ROI case presented to the steering committee.

Element 8 — Timeline

Pilot Duration: 8 Weeks (mapped from the High complexity score of 75/100).

Research Validation of Duration: For high-complexity, regulated document-processing pilots, current benchmarks recommend 8–10 weeks to accumulate sufficient document-type variability (scanned PDFs, Japanese content, mixed formats) for conclusive accuracy validation. An 8-week window provides adequate volume (~480 documents at 60/day across working days) while staying within the 90-day value demonstration target. Note: vendor approval (Element 8 dependency) runs in parallel and may extend the calendar start date — the 8 weeks count from the pilot kickoff, not charter signing.

Milestone Schedule

Milestone	Week	Owner	Status
Charter signed and roles confirmed	Week 0	Executive Sponsor	[TO BE SET]
Vendor approval initiated (parallel)	Week 0	Project Lead	[TO BE SET]
Baseline measurement complete	Weeks 1–2	Project Lead	[TO BE SET]
Cloud environment + BAA/DPA setup	Weeks 1–2	Enterprise Architect	[TO BE SET]
Prompt engineering for priority doc types	Weeks 1–3	Prompt Engineering Specialist	[TO BE SET]
Pilot user training complete	Week 2	Project Lead	[TO BE SET]
Crawl Phase (controlled testing)	Weeks 1–3	Project Lead	[TO BE SET]
Gate 1: Crawl → Walk go/no-go	End Week 3	Project Lead + Sponsor	[TO BE SET]
Walk Phase (parallel AI + manual)	Weeks 4–6	Project Lead	[TO BE SET]
Gate 2: Walk → Run go/no-go	End Week 6	Project Lead + Sponsor	[TO BE SET]
Run Phase (AI primary, spot-check)	Weeks 7–8	Project Lead	[TO BE SET]
Final evaluation & metric compilation	End Week 8	Project Lead	[TO BE SET]
Scale / Iterate / Pivot / Stop decision	End Week 8	Executive Sponsor	[TO BE SET]

gantt
    title 8-Week Pilot Timeline (Crawl 3wk / Walk 3wk / Run 2wk)
    dateFormat  X
    axisFormat %s
    section Setup
    Baseline Measurement      :0, 2
    Cloud + BAA Setup         :0, 2
    Prompt Engineering        :0, 3
    Training                  :1, 1
    section Phases
    Crawl Phase               :0, 3
    Walk Phase                :3, 3
    Run Phase                 :6, 2
    section Decisions
    Gate 1 Crawl-Walk         :milestone, 3, 0
    Gate 2 Walk-Run           :milestone, 6, 0
    Final Decision            :milestone, 8, 0

Element 9 — Resources Required

Budget (Pilot)

Pilot budget is derived as 20–30% of the Year 1 solution estimate ($251K low / $461K high).

Pilot Budget Low: ~$50,200 (20% of $251K)
Pilot Budget High: ~$138,300 (30% of $461K)
Pilot Budget Midpoint: ~$94,250

Line Item	Estimate	Notes
Infrastructure (cloud pilot)	$8,000–$24,000	Cloud token + compute for ~8-week run; no hardware procurement in pilot
Software (Auto Reports platform — pilot scope)	Scoped separately	Determined through Iternal Technologies scoping engagement
Prompt Engineering Services	$25,000–$60,000	Professional services — not a DIY activity
Training	$8,000–$15,000	5–10 pilot users; technical + awareness tiers
Integration (pilot-scope connectors)	$5,000–$20,000	Limited to pilot source/output systems
10% Contingency	$5,100–$13,900	Standard pilot contingency
Pilot Total (ex-software license)	~$51,100–$132,900	Aligns with derived 20–30% pilot band

Infrastructure Capacity Reference (for long-term on-premise planning): While the pilot runs in the cloud, the long-term on-premise architecture should be planned against these hardware capacity specifications:
- AI PCs: $1,500–$3,000 per device; ~32 tokens/sec on a 3B model. Not suitable for this use case's 70B+ requirement — reference only.
- NVIDIA DGX Spark: $30K–$50K/year; ~6 concurrent users at 20 tokens/sec on a 100B model.
- Intel Gaudi 3: $30K–$60K/year; ~6,000 tokens/sec across 128 concurrent requests. Recommended tier for the long-term Auto Reports deployment, providing massive headroom over the ~33 tokens/sec required for 60 docs/day.

Personnel

Role	Commitment	Source
Executive Sponsor	2 hrs/week	Confirmed (C-Suite)
Project Lead	10 hrs/week	[TO BE ASSIGNED]
Pilot Users (analysts)	Per workflow (5–8 analysts)	Call center / analyst team
IT / Security Support	4 hrs/week	IT Security (CISO confirmed)
Iternal Technologies Support	As scoped	Prompt engineering + advisory
Enterprise Architect / Integration Lead	As needed	[TO BE ASSIGNED]
Prompt Engineering Specialist	Per engagement	[TO BE ASSIGNED] (Iternal services or specialist)

Element 10 — Risk Assessment

Risk	Source	Likelihood	Impact	Mitigation
Prompt engineering skills gap	Viability gap (medium)	Medium	High	Engage Iternal professional services or a dedicated specialist before Crawl phase
Vendor approval delay	Viability warning	High	High	Initiate formal committee review at Week 0; run as parallel critical-path workstream
Integration API uncertainty (Tableau, Power BI, RPA, ServiceNow)	Viability warning	Medium	Medium	Technical discovery session with IT before integration phase
Data quality / OCR accuracy (scanned + Japanese)	Standard pilot risk	Medium	High	Validate OCR output before distillation; start English, add Japanese in Crawl/Walk
User resistance / low adoption	Standard pilot risk	Low	Medium	Low change-management overhead; early training; ≥4/5 satisfaction target
Accuracy below threshold	Standard pilot risk	Medium	High	90% accuracy gate; parallel processing in Walk phase; iterate prompts
Timeline slippage vs. 90-day window	Standard pilot risk	Medium	Medium	Phase gates; monthly steering cadence; early baseline formalization
Compliance / audit trail gaps	Regulated environment	Low	High	SEC 17a-4 immutable logging from day one; CISO and Compliance oversight

Element 11 — Data Requirements

Source Systems (inbound):

Internal databases (batch extraction via JDBC/ODBC/ETL)
File systems (monitored-folder ingestion)
ServiceNow (bidirectional — ingest tickets/documents, return risk flags)

Output / Orchestration Targets:

Tableau and Power BI (structured annotation outputs for dashboards)
RPA (routing and notification orchestration)
SEC 17a-4 compliant immutable WORM archive (7+ year retention)

Document Specifications:

Spec	Detail
Document types	Scanned/image PDFs, digital PDFs, Word, Excel/CSV, emails & attachments
Structure	Unstructured / semi-structured, no consistent templates
Format variability	High
Average length	5–20 pages
Languages	English and Japanese
Data placement	Unpredictable — key datapoints in varying locations

Recommended Pilot Data Set Size:

Crawl phase: 50–100 documents (curated, mixed format, English-first)
Walk phase: 200–300 documents (add Japanese-language and scanned-PDF samples)
Run phase: Full pilot volume (~60/day) for remaining weeks
Total pilot corpus: ~480 documents minimum, ensuring representation across all five document types and both languages

Element 12 — Governance

Reporting Cadence:

Weekly: Project Lead compiles dashboard metrics and status
Bi-weekly: Project Lead reports to Executive Sponsor
Monthly: Steering committee review (committee holds final authority)

Issue Escalation Path:

Pilot User  -->  Project Lead  -->  Executive Sponsor  -->  Steering Committee
  (raises)        (resolves /          (resolves /            (final authority on
                   escalates)           escalates)             unresolved / strategic)

Scope Change Process:

Any scope change request is documented and submitted to the Project Lead.
The Project Lead assesses impact on timeline, budget, and success criteria.
Changes within tolerance (no impact to budget band, timeline, or compliance posture) are approved by the Project Lead.
Material changes are escalated to the Executive Sponsor; compliance-affecting changes additionally require CISO and Compliance Officer sign-off.
All approved changes are logged for audit purposes.

Element 13 — Stakeholder Communication Plan

Audience	Frequency	Format	Owner
Executive Sponsor	Bi-weekly	1-page status summary + dashboard	Project Lead
Steering Committee	Monthly	Formal review deck + decision items	Project Lead
Pilot Users (analysts)	Weekly	Standup + feedback session	Project Lead
IT / Security (CISO)	Bi-weekly	Security/compliance checkpoint	Enterprise Architect
Compliance Officer	Bi-weekly	Audit-trail & regulatory checkpoint	Project Lead
Finance Representative	Monthly	Budget burn & ROI tracking	Project Lead
Iternal Technologies	Weekly	Working session + prompt tuning review	Prompt Engineering Specialist

Element 14 — Signatures

This charter is approved and authorized for execution by the undersigned.

Role	Name	Signature	Date
Executive Sponsor (C-Suite)	[TO BE ASSIGNED]	________________	__________
Project Lead	[TO BE ASSIGNED]	________________	__________
IT / Security Approver (CISO)	[TO BE ASSIGNED]	________________	__________
Compliance Approver	[TO BE ASSIGNED]	________________	__________

PART B: SUCCESS TRACKING DASHBOARD

The dashboard provides a single-pane weekly view across four quadrants. The Project Lead updates it weekly and presents it bi-weekly to the Executive Sponsor.

flowchart TB
    subgraph DASH["Pilot Success Dashboard - Weekly View"]
    direction LR
        subgraph Q1["Q1: Accuracy Metrics"]
            A1["Extraction accuracy %"]
            A2["Annotation correctness %"]
            A3["Risk flag capture %"]
            A4["Japanese OCR accuracy %"]
        end
        subgraph Q2["Q2: Efficiency Metrics"]
            E1["Avg minutes per doc"]
            E2["Docs per analyst per day"]
            E3["% time reduction vs baseline"]
            E4["Batch turnaround time"]
        end
        subgraph Q3["Q3: User Adoption"]
            U1["Active pilot users"]
            U2["Satisfaction score x/5"]
            U3["Suggestions accepted %"]
            U4["Feedback items logged"]
        end
        subgraph Q4["Q4: Quality Metrics"]
            QM1["Rework rate %"]
            QM2["Critical issues count"]
            QM3["Audit trail completeness %"]
            QM4["Inter-analyst consistency %"]
        end
    end

Weekly Tracking Template

Metric	Quadrant	Baseline	Target
Extraction accuracy %	Accuracy	TBD	≥90%
Annotation correctness %	Accuracy	TBD	≥90%
Risk flag capture %	Accuracy	TBD	≥90%
Avg minutes per doc	Efficiency	TBD	↓ vs base
% time reduction vs baseline	Efficiency	0%	Target %
Docs per analyst/day	Efficiency	TBD	↑ vs base
Satisfaction (x/5)	Adoption	n/a	≥4.0
Suggestions accepted %	Adoption	n/a	Trending ↑
Rework rate %	Quality	TBD	↓ vs base
Critical issues count	Quality	0	0
Audit trail completeness %	Quality	TBD	100%
Inter-analyst consistency %	Quality	TBD	↑ vs base

Trend Analysis Guidance

Look for direction, not single points. A 2–3 week upward trend in accuracy and acceptance is more meaningful than any single high or low week.
Watch the accuracy/efficiency tension. Early efficiency gains that come at the cost of accuracy below 90% are a warning sign — quality gates take precedence.
Segment Japanese vs. English. Track Japanese OCR accuracy separately; it is the most likely metric to lag and may warrant targeted prompt iteration.
Adoption leads quality. Rising satisfaction and suggestion-acceptance rates typically precede sustained quality improvement. Falling adoption is an early pivot signal.
Zero tolerance on critical issues. Any critical security, compliance, or audit-trail failure is escalated immediately regardless of other metric strength.

PART C: SCALE / ITERATE / PIVOT / STOP DECISION FRAMEWORK

At the end of the 8-week pilot, the Executive Sponsor makes the final decision based on the Project Lead's recommendation, supported by the dashboard data. Thresholds are derived from your improvement target (the "scale threshold" below).

Threshold Definitions

Let Scale Threshold = your stated processing-time-reduction and annotation-accuracy improvement target (to be finalized with the steering committee once baselines are set in Element 7).

Outcome	Quantitative Trigger	Interpretation
SCALE	Results meet or exceed 100% of the Scale Threshold, with accuracy ≥90%, satisfaction ≥4/5, and zero critical issues	The pilot exceeded targets. Proceed to on-premise migration and team expansion.
ITERATE	Results land at 70–99% of the Scale Threshold, accuracy near 90%, no critical issues	Promising; needs tuning. Refine prompts, expand training data, and re-measure.
PIVOT	Results fall below 50% of the Scale Threshold despite functional execution	The current approach underperforms. Re-evaluate model, document scope, or workflow design.
STOP	No measurable improvement after 2 iterations, OR any critical security/quality/compliance failure	Fundamental issues. Halt and reassess the business case.

flowchart TD
    START["Pilot Complete - Compile Metrics"] --> Q{"Critical security or<br/>compliance failure?"}
    Q -->|Yes| STOP["STOP - Halt and reassess"]
    Q -->|No| ACC{"Accuracy >= 90% AND<br/>zero critical issues?"}
    ACC -->|No| ITER{"Improvement after<br/>2 iterations?"}
    ITER -->|No| STOP
    ITER -->|Yes| ITERATE["ITERATE - Tune and re-measure"]
    ACC -->|Yes| LVL{"Result vs<br/>Scale Threshold"}
    LVL -->|">= 100%"| SCALE["SCALE - Migrate on-prem + expand teams"]
    LVL -->|"70-99%"| ITERATE
    LVL -->|"50-69%"| PIVOT["PIVOT - Re-evaluate approach"]
    LVL -->|"< 50%"| PIVOT

Decision Authority

Recommendation: The Project Lead prepares a written recommendation with supporting dashboard evidence and baseline comparison.
Final Call: The Executive Sponsor makes the final Scale / Iterate / Pivot / Stop decision.
Concurrence: For any decision to Scale (which triggers capital investment in on-premise infrastructure), the steering committee, CISO, and Compliance Officer provide concurrence given the regulated environment.

PART D: CRAWL-WALK-RUN PHASES

The 8-week pilot is divided into Crawl (3 weeks) / Walk (3 weeks) / Run (2 weeks), with formal go/no-go gates between phases.

flowchart LR
    C["CRAWL<br/>Weeks 1-3<br/>Controlled testing"] -->|Gate 1| W["WALK<br/>Weeks 4-6<br/>Parallel AI + manual"]
    W -->|Gate 2| R["RUN<br/>Weeks 7-8<br/>AI primary + spot-check"]
    R --> D["Final Evaluation<br/>Scale/Iterate/Pivot/Stop"]

Crawl Phase (Weeks 1–3)

Approach: Controlled testing on a limited, curated data set with close monitoring and daily check-ins. English-language documents first; introduce scanned-PDF and Japanese samples late in the phase.

Data set: 50–100 curated, mixed-format documents
Cadence: Daily check-ins between Project Lead and pilot team
Activities: Validate OCR output, confirm prompt behavior, formalize baselines (Element 7), complete user training

Crawl Success Criteria (Target: system works, basic accuracy validated):

System reliably ingests and processes all five document types
Basic extraction accuracy ≥85% on the curated set (pre-tuning bar)
No critical security or audit-trail failures
Baselines formally documented

Gate 1 (End Week 3) — Crawl to Walk Go/No-Go:

Proceed to Walk only if: system functions reliably, accuracy is trending toward 90%, audit logging is complete, and no critical issues are open. If not met, remain in Crawl for targeted iteration or invoke the Stop/Pivot framework.

Walk Phase (Weeks 4–6)

Approach: Expanded data set with parallel processing — AI and manual annotation run side by side so outputs can be directly compared. Weekly reviews replace daily check-ins.

Data set: 200–300 documents, including Japanese-language and scanned-PDF samples
Cadence: Weekly review sessions
Activities: Compare AI vs. manual outputs, measure efficiency gains, iterate prompts, build analyst confidence

Walk Success Criteria (Target: confidence in AI outputs, efficiency gains emerging):

Extraction and annotation accuracy ≥90% on the expanded set
Inter-analyst consistency improving vs. baseline
Measurable efficiency gains beginning to appear vs. manual baseline
Pilot user satisfaction trending toward ≥4/5
Japanese OCR accuracy validated and acceptable

Gate 2 (End Week 6) — Walk to Run Go/No-Go:

Proceed to Run only if: accuracy ≥90% sustained, efficiency gains demonstrated, satisfaction trending ≥4/5, and zero critical issues. If not met, iterate within Walk or invoke the decision framework.

Run Phase (Weeks 7–8)

Approach: Full pilot operation with AI as the primary processor and human spot-check validation (human-in-the-loop only for low-confidence and flagged exceptions, consistent with your target workflow).

Data set: Full pilot volume (~60 documents/day)
Cadence: Weekly review; continuous dashboard tracking
Activities: Operate at production-like conditions, validate sustained performance, compile final evaluation metrics

Run Success Criteria (Target: measurable improvement, user confidence):

Sustained accuracy ≥90% at full volume
Documented processing-time reduction meeting or approaching the Scale Threshold
Pilot user satisfaction ≥4/5
Audit trail completeness at 100%
Zero unresolved critical issues

End of Run (Week 8): Final evaluation feeds directly into the Scale / Iterate / Pivot / Stop decision (Part C).