Testing Hierarchy and Flow

Overview

This document visualizes the testing hierarchy and flow of the Kahuna Testing & Security Framework: how components build on each other and how test results move through the system across multiple agent frameworks.

The testing hierarchy now operates in three dimensions:

Testing Categories (Quality vs Security)
Testing Phases (Development vs Runtime)
Framework Scope (Framework-Agnostic vs Framework-Specific)

This three-dimensional approach enables testing of AI components independent of specific agent frameworks while also supporting framework-specific validation.
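
As a rough illustration, the three dimensions can be modeled as independent enumerations whose product gives the eight cells a test can fall into. The class and member names below are assumptions made for this sketch, not identifiers from the framework.

```python
from enum import Enum
from itertools import product


class Category(Enum):
    QUALITY = "quality"
    SECURITY = "security"


class Phase(Enum):
    DEVELOPMENT = "development"
    RUNTIME = "runtime"


class Scope(Enum):
    AGNOSTIC = "framework-agnostic"
    SPECIFIC = "framework-specific"


# Every test belongs to exactly one (category, phase, scope) cell.
CELLS = list(product(Category, Phase, Scope))

for category, phase, scope in CELLS:
    print(f"{category.value} / {phase.value} / {scope.value}")
```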

Complete Testing Hierarchy

flowchart TD
    Start[Testing Framework] --> Split{Test Category}

    Split -->|Quality| QA[Quality Assurance]
    Split -->|Security| SEC[Security Testing]

    %% Quality Branch
    QA --> QPhase{Testing Phase}
    QPhase -->|Development| QDev[Development-Time Quality]
    QPhase -->|Runtime| QRun[Runtime Quality Monitoring]

    %% Security Branch
    SEC --> SPhase{Testing Phase}
    SPhase -->|Development| SDev[Development-Time Security]
    SPhase -->|Runtime| SRun[Runtime Security Monitoring]

    %% Framework Scope Split
    QDev --> QScope{Framework Scope}
    QRun --> QRScope{Framework Scope}
    SDev --> SScope{Framework Scope}
    SRun --> SRScope{Framework Scope}

    %% Framework-Agnostic Path
    QScope -->|Agnostic| QAgnostic[Universal Quality Tests]
    QScope -->|Specific| QSpecific[Framework Quality Tests]
    SScope -->|Agnostic| SAgnostic[Universal Security Tests]
    SScope -->|Specific| SSpecific[Framework Security Tests]

    %% Component Testing Hierarchy - Agnostic
    QAgnostic --> LLM_Q[LLM Quality Tests<br/>100% Agnostic]
    QAgnostic --> MCP_Q[MCP Server Quality Tests<br/>100% Agnostic]
    SAgnostic --> LLM_S[LLM Security Tests<br/>100% Agnostic]
    SAgnostic --> MCP_S[MCP Server Security Tests<br/>100% Agnostic]

    %% Component Testing - Mixed
    LLM_Q --> Agent_Q[Agent Quality Tests<br/>60% Agnostic]
    MCP_Q --> Agent_Q
    LLM_S --> Agent_S[Agent Security Tests<br/>60% Agnostic]
    MCP_S --> Agent_S
    QSpecific --> Agent_Q
    SSpecific --> Agent_S

    %% Business Layer
    Agent_Q --> WF_Q[Workflow Quality Tests<br/>20% Agnostic]
    Agent_S --> WF_S[Workflow Security Tests<br/>20% Agnostic]

    %% Results Flow
    WF_Q --> Results[Test Results]
    WF_S --> Results
    QRScope --> Metrics[Runtime Metrics]
    SRScope --> Metrics

    Results --> Kahuna_Exec[Kahuna Executive Mode]
    Metrics --> Kahuna_Exec

    %% Input from Manager Mode
    Kahuna_Mgr[Kahuna Manager Mode] --> Requirements[Business Requirements<br/>+ Target Frameworks]
    Requirements --> Start

    style Start fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style QA fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff
    style SEC fill:#F44336,stroke:#C62828,stroke-width:2px,color:#fff
    style QAgnostic fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
    style SAgnostic fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
    style Results fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style Metrics fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style Kahuna_Exec fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
    style Kahuna_Mgr fill:#00BCD4,stroke:#0097A7,stroke-width:2px,color:#fff

Framework-Agnostic Testing Flow

flowchart TD
    Start[Test Request] --> Detect{Detect Framework}

    Detect -->|Aurite| Native[Native Execution]
    Detect -->|LangChain| LC_Adapter[LangChain Adapter]
    Detect -->|AutoGen| AG_Adapter[AutoGen Adapter]
    Detect -->|Other| Generic_Adapter[Generic Adapter]

    LC_Adapter --> Translate[Translate to Aurite Format]
    AG_Adapter --> Translate
    Generic_Adapter --> Translate
    Native --> Categorize{Categorize Component}

    Translate --> Categorize

    Categorize -->|LLM/MCP| Universal[Universal Tests<br/>100% Agnostic]
    Categorize -->|Agent| Hybrid[Hybrid Tests<br/>60% Agnostic]
    Categorize -->|Workflow| Specific[Specific Tests<br/>20% Agnostic]

    Universal --> Cache_Universal[Cache Universal Results]
    Hybrid --> Split_Tests{Split Tests}
    Specific --> Framework_Tests[Framework-Specific Tests]

    Split_Tests -->|Agnostic| Agnostic_Tests[Run Agnostic Tests]
    Split_Tests -->|Specific| Framework_Tests

    Agnostic_Tests --> Cache_Agnostic[Cache Agnostic Results]
    Framework_Tests --> Cache_Specific[Cache Framework Results]

    Cache_Universal --> Aggregate[Aggregate Results]
    Cache_Agnostic --> Aggregate
    Cache_Specific --> Aggregate

    Aggregate --> Normalize[Normalize to Standard Format]
    Normalize --> Report[Generate Report]

    style Start fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style Universal fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff
    style Hybrid fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style Specific fill:#F44336,stroke:#C62828,stroke-width:2px,color:#fff
    style Report fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
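
The detect, translate, and categorize steps in the diagram above could be sketched roughly as follows. This is a minimal illustration of the adapter pattern, not the framework's implementation: the `Component` type, the adapter functions, and the input field names (`id`, `type`, `role`, and so on) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Component:
    """Hypothetical normalized (Aurite-format) component description."""
    name: str
    kind: str        # "llm", "mcp", "agent", or "workflow"
    framework: str   # framework the component originally came from


def _from_langchain(raw: dict) -> Component:
    # Assumed field names; a real adapter would map the framework's own schema.
    return Component(raw["id"], raw["type"].lower(), "langchain")


def _from_autogen(raw: dict) -> Component:
    # Assumed field names, as above.
    return Component(raw["name"], raw["role"].lower(), "autogen")


def _generic(raw: dict) -> Component:
    # Fallback for unknown frameworks: take what is available, default the rest.
    return Component(raw.get("name", "unknown"), raw.get("kind", "agent"), "other")


ADAPTERS: Dict[str, Callable[[dict], Component]] = {
    "langchain": _from_langchain,
    "autogen": _from_autogen,
}


def to_aurite(raw: dict, framework: str) -> Component:
    """Translate a framework-specific description; native input passes through."""
    if framework == "aurite":
        return Component(raw["name"], raw["kind"], "aurite")
    return ADAPTERS.get(framework, _generic)(raw)


def categorize(component: Component) -> str:
    """Pick the test tier from the component kind, following the diagram."""
    if component.kind in ("llm", "mcp"):
        return "universal"   # 100% agnostic
    if component.kind == "agent":
        return "hybrid"      # 60% agnostic
    return "specific"        # workflows: 20% agnostic


component = to_aurite({"id": "summarizer", "type": "LLM"}, "langchain")
print(component.kind, categorize(component))
```

Keeping the adapters in a lookup table means a new framework only needs one translation function; everything after `to_aurite` works on the normalized form.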

Test Execution Flow

flowchart TD
    Start[Start Testing] --> Check{Check Dependencies}

    Check -->|No Dependencies| Foundation[Test Foundation Components]
    Check -->|Has Dependencies| Wait[Wait for Dependencies]

    Foundation --> Cache1[Cache LLM Results<br/>Universal Cache]
    Foundation --> Cache2[Cache MCP Results<br/>Universal Cache]

    Cache1 --> Ready1[LLM Tests Complete]
    Cache2 --> Ready2[MCP Tests Complete]

    Ready1 --> AgentTest{Agent Testing}
    Ready2 --> AgentTest

    Wait --> DepCheck{Dependencies Ready?}
    DepCheck -->|No| Wait
    DepCheck -->|Yes| AgentTest

    AgentTest --> Inherit[Inherit Foundation Results<br/>Cross-Framework]
    Inherit --> AgentSpecific[Run Agent Tests<br/>60% Agnostic, 40% Specific]
    AgentSpecific --> CacheAgent[Cache Agent Results]

    CacheAgent --> ReadyAgent[Agent Tests Complete]

    ReadyAgent --> WorkflowTest{Workflow Testing}

    WorkflowTest --> InheritAll[Inherit All Agent Results<br/>Cross-Framework]
    InheritAll --> WFSpecific[Run Workflow Tests<br/>20% Agnostic, 80% Specific]
    WFSpecific --> FinalResults[Generate Final Results]

    FinalResults --> Report[Generate Reports]
    Report --> End[Testing Complete]

    style Start fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style End fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff
    style Foundation fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style AgentTest fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
    style WorkflowTest fill:#E91E63,stroke:#C2185B,stroke-width:2px,color:#fff
    style FinalResults fill:#00BCD4,stroke:#0097A7,stroke-width:2px,color:#fff
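
The dependency-driven ordering in this flow amounts to a topological sort: foundation components run first, and each later component runs only once everything it inherits from is complete. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The component names and the `fake_test` stand-in are hypothetical; only the LLM/MCP, then Agent, then Workflow shape comes from the diagram.

```python
from graphlib import TopologicalSorter

# Each component maps to the components whose results it inherits.
dependencies = {
    "llm": set(),
    "mcp": set(),
    "agent": {"llm", "mcp"},
    "workflow": {"agent"},
}


def run_in_dependency_order(deps, run_test):
    """Run each component's tests only after all of its dependencies finish."""
    results = {}
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        for name in sorter.get_ready():
            # Hand the already-computed dependency results to the test run.
            inherited = {dep: results[dep] for dep in deps[name]}
            results[name] = run_test(name, inherited)
            sorter.done(name)
    return results


def fake_test(name, inherited):
    # Stand-in for a real test run: record which results were inherited.
    return {"component": name, "inherited_from": sorted(inherited)}


results = run_in_dependency_order(dependencies, fake_test)
print(list(results))
```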

Quality vs Security Score Propagation

flowchart TD
    subgraph Quality[Quality Score Propagation]
        Q_LLM[LLM Quality: 0.94] --> Q_Calc1[Weighted Average]
        Q_MCP[MCP Quality: 0.96] --> Q_Calc1
        Q_Agent_Specific[Agent Quality: 0.90] --> Q_Calc1
        Q_Calc1 --> Q_Agent[Agent Quality: 0.93]

        Q_Agent --> Q_Calc2[Weighted Average]
        Q_WF_Specific[Workflow Quality: 0.95] --> Q_Calc2
        Q_Calc2 --> Q_WF[Workflow Quality: 0.94]
    end

    subgraph Security[Security Score Propagation]
        S_LLM[LLM Security: 0.96] --> S_Calc1[Take Minimum]
        S_MCP[MCP Security: 0.99] --> S_Calc1
        S_Agent_Specific[Agent Security: 0.92] --> S_Calc1
        S_Calc1 --> S_Agent[Agent Security: 0.92]

        S_Agent --> S_Calc2[Take Minimum]
        S_WF_Specific[Workflow Security: 0.95] --> S_Calc2
        S_Calc2 --> S_WF[Workflow Security: 0.92]
    end

    style Q_Calc1 fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff
    style Q_Calc2 fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff
    style S_Calc1 fill:#F44336,stroke:#C62828,stroke-width:2px,color:#fff
    style S_Calc2 fill:#F44336,stroke:#C62828,stroke-width:2px,color:#fff

Development vs Runtime Testing Flow

flowchart LR
    subgraph Development[Development-Time Testing]
        D1[Comprehensive Tests] --> D2[All Edge Cases]
        D2 --> D3[Performance Benchmarks]
        D3 --> D4[Security Audits]
        D4 --> D5[Generate Baselines]
    end

    subgraph Runtime[Runtime Monitoring]
        R1[Selective Tests] --> R2[Critical Checks Only]
        R2 --> R3[Real-time Filtering]
        R3 --> R4[Anomaly Detection]
        R4 --> R5[Alert on Deviations]
    end

    D5 -->|Deploy| Prod[Production Environment]
    Prod --> Runtime

    Runtime -->|Feedback| Improve[Improve Tests]
    Improve -->|Update| Development

    style Development fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style Runtime fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style Prod fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff

Test Result Caching Strategy

flowchart TD
    Test[Run Test] --> Result[Test Result]
    Result --> Scope{Framework Scope?}

    Scope -->|Agnostic| Universal_Cache[Universal Cache<br/>Shared Across Frameworks]
    Scope -->|Specific| Framework_Cache[Framework Cache<br/>Isolated per Framework]

    Universal_Cache --> Meta_U[Store Metadata<br/>• Component ID<br/>• Version<br/>• Timestamp<br/>• TTL<br/>• Framework: ANY]
    Framework_Cache --> Meta_F[Store Metadata<br/>• Component ID<br/>• Version<br/>• Timestamp<br/>• TTL<br/>• Framework: Specific]

    Meta_U --> TTL_U{Check TTL}
    Meta_F --> TTL_F{Check TTL}

    Request[New Test Request] --> Framework_Check{Which Framework?}

    Framework_Check --> CheckCache{Check Cache}

    CheckCache -->|Universal Found| TTL_U
    CheckCache -->|Framework Found| TTL_F
    CheckCache -->|Not Found| Test

    TTL_U -->|Valid| UseCache[Use Cached Result<br/>Cross-Framework]
    TTL_F -->|Valid| UseCache
    TTL_U -->|Expired| Test
    TTL_F -->|Expired| Test

    UseCache --> Inherit[Inherit to Dependent]

    Update[Component Update] --> Invalidate{Which Cache?}
    Invalidate -->|LLM/MCP| Invalidate_Universal[Invalidate Universal<br/>All Frameworks Affected]
    Invalidate -->|Agent/Workflow| Invalidate_Specific[Invalidate Specific<br/>Single Framework]

    Invalidate_Universal --> Cascade[Cascade Invalidation]
    Invalidate_Specific --> Cascade
    Cascade --> Deps[Invalidate All Dependents]

    style Test fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style UseCache fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff
    style Universal_Cache fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
    style Framework_Cache fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style Invalidate_Universal fill:#F44336,stroke:#C62828,stroke-width:2px,color:#fff

Integration with Kahuna Ecosystem

flowchart TD
    subgraph Manager[Kahuna Manager Mode]
        BW[Business Workflow] --> Req[Requirements]
        PD[Project Document] --> Req
        Req --> Context[Project Context]
    end

    subgraph Developer[Kahuna Developer Mode]
        Context --> Testing[Testing Framework]
        Testing --> QTests[Quality Tests]
        Testing --> STests[Security Tests]

        QTests --> DevTime[Development Testing]
        STests --> DevTime

        DevTime --> Deploy[Deployment]
        Deploy --> RunTime[Runtime Monitoring]

        QTests --> RunTime
        STests --> RunTime
    end

    subgraph Executive[Kahuna Executive Mode]
        RunTime --> Metrics[Metrics Collection]
        Metrics --> QMetrics[Quality Metrics]
        Metrics --> SMetrics[Security Metrics]
        Metrics --> BMetrics[Business KPIs]

        QMetrics --> Dashboard[Executive Dashboard]
        SMetrics --> Dashboard
        BMetrics --> Dashboard
    end

    style Manager fill:#00BCD4,stroke:#0097A7,stroke-width:2px,color:#fff
    style Developer fill:#4CAF50,stroke:#388E3C,stroke-width:2px,color:#fff
    style Executive fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
    style Testing fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style Dashboard fill:#E91E63,stroke:#C2185B,stroke-width:2px,color:#fff

Tabular Representations

Component Test Inheritance Matrix

Component	Framework Scope	Depends On	Inherits From	Cross-Framework Inheritance	New Tests	Inheritance Benefit
LLM	100% Agnostic	None	None	All results shared	• Prompt injection • Content safety • Response quality	Foundation (0% inherited)
MCP Server	100% Agnostic	None	None	All results shared	• API security • Performance • Availability	Foundation (0% inherited)
Agent	60% Agnostic	LLM + MCP	• LLM security scores • MCP performance metrics	60% cross-framework	• Tool selection • Goal achievement • Multi-turn coherence	~60% inherited
Workflow	20% Agnostic	Multiple Agents	• All agent scores • All foundation scores	20% cross-framework	• Business logic • End-to-end flow • Data consistency	~70% inherited

Quality vs Security Score Calculation

Aspect	Quality Scoring	Security Scoring	Rationale
Method	Weighted Average	Minimum (Weakest Link)	Quality can be averaged; Security fails at weakest point
LLM Score	0.94	0.96	Foundation scores
MCP Score	0.96	0.99	Foundation scores
Agent-Specific	0.90	0.92	New agent tests
Agent Final	0.93 (weighted)	0.92 (minimum)	Combined result
Workflow-Specific	0.95	0.95	New workflow tests
Workflow Final	0.94 (weighted)	0.92 (minimum)	Final scores
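
The two propagation rules can be checked with a few lines of arithmetic. The table does not state the weights used for the quality average, so this sketch assumes equal weights, which reproduce the rounded values shown; the function names are illustrative.

```python
def weighted_average(scores, weights=None):
    """Quality rule: combine scores by weighted average (equal weights assumed)."""
    weights = weights or [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def weakest_link(scores):
    """Security rule: a component is only as secure as its weakest input."""
    return min(scores)


# Input values taken from the table above.
agent_quality = weighted_average([0.94, 0.96, 0.90])
workflow_quality = weighted_average([agent_quality, 0.95])

agent_security = weakest_link([0.96, 0.99, 0.92])
workflow_security = weakest_link([agent_security, 0.95])

print(round(agent_quality, 2), round(workflow_quality, 2))  # 0.93 0.94
print(agent_security, workflow_security)                    # 0.92 0.92
```

Note the asymmetry: a strong workflow-specific quality score (0.95) lifts the workflow's quality result, but a strong workflow-specific security score cannot lift the security result above the weakest inherited score.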

Development vs Runtime Testing Comparison

Aspect	Development Testing	Runtime Testing	Summary
Coverage	100% - All test cases	10-20% - Critical only	Dev: 100%, Runtime: Sampling
Execution Time	Minutes to hours	Milliseconds to seconds	Dev: Thorough, Runtime: Fast
Test Types	• Edge cases • Stress tests • Benchmarks	• Security filters • Quality scoring • Anomaly detection	Dev: Comprehensive, Runtime: Targeted
Frequency	On demand, per deployment	Per request/response	Dev: Once per deployment, Runtime: Continuous
Action on Failure	Block deployment	Log, alert, or block	Dev: Prevent, Runtime: Respond

Test Categories by Component

Component	Framework Scope	Quality Tests (Agnostic)	Quality Tests (Specific)	Security Tests (Agnostic)	Security Tests (Specific)	Inherited	New	Total
LLM	100% Agnostic	• Coherence • Instruction following • Format compliance • Quality scoring	N/A	• Prompt injection • Content safety • Data leakage • Real-time filtering	N/A	0	8	8
MCP Server	100% Agnostic	• API compliance • Performance • Error handling • Availability	N/A	• Authentication • Input validation • Rate limiting • Access monitoring	N/A	0	8	8
Agent	60% Agnostic	• Tool selection • Goal achievement	• Memory management • State handling	• Permission boundaries • Action authorization	• Framework auth • Context isolation	16	8	24
Workflow	20% Agnostic	• Business compliance • End-to-end success	• Orchestration • Inter-agent communication	• Data isolation • Audit completeness	• Framework security • State management security	48	8	56
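
The Inherited, New, and Total columns follow a simple recurrence: a component's total is its own new tests plus the totals of everything it depends on. The sketch below reproduces the table's numbers under one assumption the table does not state, namely that the workflow is composed of two agents, each built on one LLM and one MCP server. The component names are hypothetical.

```python
NEW_TESTS_PER_COMPONENT = 8  # every row in the table adds 8 new tests

# Hypothetical composition; the two-agent workflow is an assumption.
DEPENDS_ON = {
    "llm": [],
    "mcp": [],
    "agent_a": ["llm", "mcp"],
    "agent_b": ["llm", "mcp"],
    "workflow": ["agent_a", "agent_b"],
}


def inherited_tests(component):
    """Tests inherited from dependencies (a shared dependency counts once per path)."""
    return sum(total_tests(dep) for dep in DEPENDS_ON[component])


def total_tests(component):
    """Inherited tests plus the component's own new tests."""
    return inherited_tests(component) + NEW_TESTS_PER_COMPONENT


for name in ("llm", "agent_a", "workflow"):
    print(name, inherited_tests(name), NEW_TESTS_PER_COMPONENT, total_tests(name))
```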

Kahuna Integration Points

Kahuna Mode	Role	Input/Output	Testing Interaction
Manager Mode	Requirements Provider	• Business workflows • Project documents • Quality thresholds	Defines what to test
Developer Mode	Testing Executor	• Test implementation • Development testing • Runtime monitoring	Executes all testing
Executive Mode	Metrics Consumer	• Quality dashboards • Security reports • Business KPIs	Receives test results

Alert Severity and Response Matrix (Draft, Work in Progress)

Severity	Quality Threshold	Security Threshold	Response Time	Action
Critical	< 0.5	Any breach	Immediate	Block + Alert + Investigate
High	< 0.7	Score < 0.8	< 5 min	Block + Alert team
Medium	< 0.85	Score < 0.9	< 1 hour	Log + Monitor
Low	< 0.95	Score < 0.95	< 24 hours	Log for analysis
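
One way to read the draft matrix is as an ordered lookup: check the most severe row first and return the first level whose quality or security threshold is crossed. The function below is a sketch under that reading; the name `classify`, the `breach` flag, and the choice to return `None` when nothing is triggered are assumptions, and the thresholds are copied from the draft table and may change.

```python
from typing import Optional

# (severity, quality threshold, security threshold), most severe first.
# Values mirror the draft matrix above.
LEVELS = [
    ("high", 0.70, 0.80),
    ("medium", 0.85, 0.90),
    ("low", 0.95, 0.95),
]


def classify(quality: float, security: float, breach: bool = False) -> Optional[str]:
    """Return the most severe level triggered, or None if no threshold is crossed."""
    # Critical: quality below 0.5 or any security breach.
    if breach or quality < 0.5:
        return "critical"
    for severity, quality_limit, security_limit in LEVELS:
        if quality < quality_limit or security < security_limit:
            return severity
    return None


print(classify(quality=0.93, security=0.92))  # low
print(classify(quality=0.60, security=0.99))  # high
```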

Cache Strategy Parameters

Parameter	Value	Purpose	Impact	Framework Scope
TTL (LLM/MCP)	24 hours	Foundation rarely changes	High reuse across frameworks	Universal cache
TTL (Agent - Agnostic)	12 hours	Core behaviors stable	Cross-framework reuse	Universal cache
TTL (Agent - Specific)	4 hours	Framework features change	Framework-isolated	Framework cache
TTL (Workflow)	1 hour	Frequent updates	Fresh results	Framework cache
Cross-Framework Sharing	Enabled	Maximize test reuse	40-60% reduction in testing	LLM/MCP/Agent core
Invalidation	On update	Maintain consistency	Cascade to dependents	Both cache types
Cache Hit Rate	~85%	Increased with universal cache	85% time saved	Overall
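
A minimal sketch of how these parameters might fit together: agnostic results are stored under a shared `ANY` framework key, framework-specific results are stored per framework, each entry carries its own TTL, and invalidation cascades to dependents. The `ResultCache` class, its method names, and the injectable `clock` are assumptions for illustration. The table gives no TTL for agnostic workflow results, so that combination is intentionally absent here.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

HOUR = 3600.0
# TTLs from the table above, keyed by (component kind, is_agnostic).
TTL_SECONDS = {
    ("llm", True): 24 * HOUR,
    ("mcp", True): 24 * HOUR,
    ("agent", True): 12 * HOUR,
    ("agent", False): 4 * HOUR,
    ("workflow", False): 1 * HOUR,
}
ANY = "ANY"  # framework marker for the universal cache


@dataclass
class Entry:
    result: float
    version: str
    stored_at: float
    ttl: float


class ResultCache:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Entry] = {}

    def put(self, component_id, kind, framework, agnostic, result, version="1"):
        """Store a result in the universal cache (agnostic) or the framework cache."""
        scope = ANY if agnostic else framework
        ttl = TTL_SECONDS[(kind, agnostic)]
        self._entries[(component_id, scope)] = Entry(result, version, self._clock(), ttl)

    def get(self, component_id, framework) -> Optional[float]:
        """Look in the universal cache first, then the framework cache."""
        for scope in (ANY, framework):
            entry = self._entries.get((component_id, scope))
            if entry is not None and self._clock() - entry.stored_at < entry.ttl:
                return entry.result
        return None  # missing or expired: the caller should rerun the test

    def invalidate(self, component_id, dependents):
        """Drop a component's entries and cascade to its dependents (assumed acyclic)."""
        for key in [k for k in self._entries if k[0] == component_id]:
            del self._entries[key]
        for child in dependents.get(component_id, []):
            self.invalidate(child, dependents)
```

With this layout, an LLM result cached while testing one framework is a hit for any other framework, while an agent's framework-specific result is visible only to the framework that produced it.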

Framework Compatibility Matrix

Framework	LLM Support	MCP Support	Agent Support	Workflow Support	Adapter Status	Testing Coverage
Aurite	100%	100%	100%	100%	Native	100%
LangChain	100%	100%	80%	70%	Available	85%
AutoGen	100%	100%	75%	65%	Available	80%
CrewAI	100%	100%	70%	60%	In Development	75%
Custom	100%	100%	Varies	Varies	Generic	60-80%

Summary

This hierarchical testing structure provides:

Three-Dimensional Organization: Quality/Security × Development/Runtime × Agnostic/Specific
Framework Independence: LLM and MCP tests work across all frameworks
Cross-Framework Inheritance: Agnostic test results shared between frameworks
Efficient Execution: Through enhanced caching and result reuse
Comprehensive Coverage: All components tested at appropriate levels
Business Integration: From requirements (Manager) to metrics (Executive)
Significant Time Savings: 70-85% reduction in testing time through the compositional, cross-framework approach

The framework ensures that each component is tested appropriately while maximizing test reuse across different agent frameworks through:

Universal caching for framework-agnostic components
Intelligent inheritance of results across framework boundaries
Adapter pattern for framework translation
Standardized formats for cross-framework comparison

The tables above provide a structured view that complements the diagrams, making it easier to:

Compare testing approaches across components and frameworks
Understand inheritance relationships both within and across frameworks
Calculate time savings from cross-framework test reuse
Plan test implementation for multi-framework environments
Set appropriate thresholds and alerts for each framework

For detailed architecture on framework-agnostic testing, see Framework-Agnostic Testing Architecture.