MCP Server Testing

Overview

MCP (Model Context Protocol) servers are foundation components in the Kahuna Testing & Security Framework's compositional hierarchy. As 100% framework-agnostic components, MCP servers provide tools and resources through standardized interfaces that operate independently of any specific agent framework. This document describes how MCP servers are tested and how their test results enable efficient testing of higher-level components through inheritance.

Position in Testing Hierarchy

Foundation Layer (100% Framework-Agnostic)
├── LLMs ─────────┐
│                 ├─→ Agents (inherit results) ─→ Workflows
└── MCP Servers ──┘
    ↑
    Our focus

MCP servers, alongside LLMs, form the foundation layer of our testing architecture. They:

Have no dependencies on other components
Are tested directly via the MCP protocol
Provide inheritable test results to all agent frameworks
Enable 85% reduction in redundant testing through result inheritance

Architectural Principles

1. Framework Independence

MCP servers are completely framework-agnostic because they:

Expose tools through the standardized MCP protocol
Process requests and responses in a uniform format
Operate as stateless services independent of calling context
Maintain consistent behavior regardless of the agent framework

2. Direct API Testing

All MCP server tests bypass framework layers entirely:

# Direct MCP protocol testing - no framework mediation
mcp_client = MCPClient(server_url)
response = mcp_client.call_tool("get_weather", {"city": "San Francisco"})
# Validate response directly

3. Test Result Inheritance

MCP test results are structured for maximum reusability:

Validated tool calls become test patterns for agents
Performance baselines set expectations for agent testing
Security validations eliminate redundant security testing
Tool catalogs inform agent tool selection tests

Test Categories

Quality Testing (60% of tests)

Quality tests ensure MCP servers meet functional and performance requirements:

Test Type	Description	Inheritable Data
Tool Discovery	Validate tool enumeration	Available tools list
Tool Invocation	Test successful execution	Request/response patterns
Error Handling	Verify proper error responses	Error scenarios
Performance	Measure response times	Baseline metrics (p50, p95, p99)
API Compliance	Check MCP protocol adherence	Protocol version compatibility

Security Testing (40% of tests)

Security tests validate protection against threats and compliance:

Test Type	Description	Inheritable Data
Authentication	Verify access controls	Auth mechanisms validated
Input Validation	Test injection resistance	Safe input patterns
Rate Limiting	Check request throttling	Rate limit thresholds
Data Exposure	Prevent information leakage	Security clearances
Authorization	Validate permission boundaries	Allowed operations

Test Execution Model

Development-Time Testing

Comprehensive validation before deployment:

def test_mcp_server_development(server_config):
    """
    Full MCP server validation following architecture patterns
    """
    # 1. No dependency checking needed (foundation component)

    # 2. Run quality tests
    quality_results = {
        "tool_discovery": test_tool_discovery(server_config),
        "tool_invocation": test_tool_invocation(server_config),
        "error_handling": test_error_handling(server_config),
        "performance": test_performance_benchmarks(server_config),
        "api_compliance": test_mcp_compliance(server_config)
    }

    # 3. Run security tests
    security_results = {
        "authentication": test_authentication(server_config),
        "input_validation": test_input_validation(server_config),
        "rate_limiting": test_rate_limiting(server_config),
        "data_exposure": test_data_exposure(server_config),
        "authorization": test_authorization(server_config)
    }

    # 4. Structure for inheritance
    inheritable_data = {
        "validated_tool_calls": extract_tool_patterns(quality_results),
        "tool_catalog": build_tool_catalog(quality_results),
        "security_clearances": extract_security_validations(security_results),
        "performance_baselines": extract_performance_metrics(quality_results)
    }

    # 5. Cache in universal cache (24-hour TTL)
    cache_universal_results(server_config.id, {
        "quality_score": calculate_quality_score(quality_results),
        "security_score": calculate_security_score(security_results),
        "inheritable_data": inheritable_data,
        "ttl": 86400  # 24 hours
    })

    return test_results

Runtime Monitoring

Selective validation in production:

def monitor_mcp_server_runtime(server_id, request):
    """
    Lightweight runtime validation with caching
    """
    # 1. Check universal cache first
    cached = get_universal_cache(server_id)
    if cached and cached.is_valid():
        return cached.result

    # 2. Quick security check
    if not validate_input_safety(request):
        return block_request("Input validation failed")

    # 3. Performance monitoring (async)
    async_monitor_performance(server_id, request)

    # 4. Execute with monitoring
    return execute_with_monitoring(request)

Test Result Structure

MCP test results are structured to maximize inheritance value:

MCP_TEST_RESULT = {
    # Component identification
    "component_id": "weather_mcp_server",
    "component_type": "mcp_server",
    "version": "1.0.0",
    "timestamp": "2024-08-30T16:00:00Z",

    # Framework context (always universal for MCP)
    "framework": "universal",
    "is_agnostic": True,
    "cross_framework_validity": True,

    # Aggregate scores
    "quality_score": 0.95,
    "security_score": 0.98,

    # Detailed test results
    "test_results": {
        "quality": {
            "tool_discovery": {"status": "PASS", "tools_found": 3},
            "tool_invocation": {"status": "PASS", "success_rate": 0.98},
            "error_handling": {"status": "PASS", "coverage": 0.95},
            "performance": {"status": "PASS", "p95_ms": 150},
            "api_compliance": {"status": "PASS", "version": "1.0"}
        },
        "security": {
            "authentication": {"status": "PASS", "mechanism": "api_key"},
            "input_validation": {"status": "PASS", "blocked_injections": 10},
            "rate_limiting": {"status": "PASS", "limit": "100/min"},
            "data_exposure": {"status": "PASS", "leaks_found": 0},
            "authorization": {"status": "PASS", "boundaries_enforced": True}
        }
    },

    # Inheritable data for agents
    "inheritable_data": {
        "validated_tool_calls": [
            {
                "tool_name": "get_weather",
                "test_case": "valid_city",
                "request": {
                    "method": "tools/call",
                    "params": {
                        "name": "get_weather",
                        "arguments": {"city": "San Francisco"}
                    }
                },
                "response": {
                    "temperature": 65,
                    "conditions": "sunny",
                    "humidity": 70
                },
                "validation": {
                    "status": "PASS",
                    "response_time_ms": 145,
                    "security_checks": ["input_validated", "rate_limit_ok"]
                }
            }
            # Additional test cases...
        ],

        "tool_catalog": {
            "get_weather": {
                "available": True,
                "avg_response_time_ms": 150,
                "success_rate": 0.98,
                "security_validated": True,
                "test_coverage": ["valid_input", "invalid_input", "edge_cases"]
            },
            "get_forecast": {
                "available": True,
                "avg_response_time_ms": 200,
                "success_rate": 0.96,
                "security_validated": True,
                "test_coverage": ["valid_input", "date_ranges", "locations"]
            }
        },

        "performance_baselines": {
            "p50_ms": 100,
            "p95_ms": 150,
            "p99_ms": 300,
            "throughput_rps": 100
        },

        "security_clearances": {
            "injection_tested": True,
            "authentication_required": True,
            "rate_limits_enforced": True,
            "data_sanitization": True
        }
    },

    # Cache metadata
    "cache_ttl": 86400,  # 24 hours for foundation components
    "cache_key": "universal:mcp:weather_mcp_server:1.0.0",

    # Inheritance metadata
    "inherits_from": [],  # Empty - foundation component
    "inherited_by": ["agent", "workflow"],  # Components that will inherit
    "inheritance_benefit": "85% reduction in tool testing for agents"
}

Inheritance Model

How Agents Inherit MCP Test Results

When an agent is tested, it automatically inherits MCP validation:

# Agent test leveraging MCP inheritance
AGENT_TEST_WITH_MCP_INHERITANCE = {
    "agent_id": "weather_assistant",

    # Inherited from MCP testing (no retesting needed)
    "inherited_validations": {
        "weather_mcp_server": {
            "quality_score": 0.95,  # Inherited
            "security_score": 0.98,  # Inherited
            "tools_validated": ["get_weather", "get_forecast"],
            "skip_tool_execution_tests": True,  # Don't retest tools
            "use_validated_patterns": True,  # Reuse test cases
            "performance_baseline": 150  # Inherited p95
        }
    },

    # Agent focuses on integration testing only
    "agent_specific_tests": {
        "tool_selection": {
            "test": "Does agent select correct tool for query?",
            "result": "PASS"
        },
        "response_synthesis": {
            "test": "Does agent format tool output appropriately?",
            "result": "PASS"
        }
    },

    # Time saved through inheritance
    "testing_metrics": {
        "tests_inherited": 10,
        "tests_executed": 2,
        "time_saved": "85%"
    }
}

Cross-Framework Inheritance

MCP test results are valid across all agent frameworks:

Framework	Inheritance Usage	Time Saved
Aurite	Direct inheritance (native)	85%
LangChain	Via adapter translation	85%
AutoGen	Via adapter translation	85%
CrewAI	Via adapter translation	85%
Custom	Via generic adapter	80-85%

Caching Strategy

Universal Cache Integration

MCP test results are stored in the universal cache for cross-framework sharing:

CACHE_STRATEGY = {
    "cache_type": "universal",  # Shared across all frameworks
    "ttl": 86400,  # 24 hours
    "invalidation": "cascade",  # Invalidate dependent components
    "key_pattern": "universal:mcp:{server_id}:{version}",

    "benefits": {
        "cross_framework_reuse": True,
        "cache_hit_rate": 0.85,
        "redundant_test_elimination": 0.85
    }
}

Cache Invalidation

When an MCP server is updated:

Universal cache entry is invalidated
All dependent agent test results are marked stale
Workflow test results are cascaded for invalidation
Next test run refreshes the cache

Failure Handling

Failure Impact Analysis

MCP server failures have significant upstream impact:

MCP Failure Impact:
├── Direct Impact
│   └── MCP server cannot be used
├── Agent Impact
│   ├── Agents using this MCP are blocked
│   └── Agent tests marked as "blocked by dependency"
└── Workflow Impact
    ├── Workflows using affected agents fail
    └── Business processes disrupted

Failure Propagation Rules

Critical Failures (Security breaches, authentication failures)
Block all dependent components immediately
Require manual intervention to resolve
Quality Failures (Performance degradation, high error rates)
Allow degraded operation with warnings
Trigger alerts for investigation
Partial Failures (Some tools working, others failing)
Selective blocking of failed tools only
Agents can use working tools

Getting Started

Running MCP Server Tests

# Run all MCP server tests
aurite test mcp-server <server_id>

# Run quality tests only
aurite test mcp-server <server_id> --quality

# Run security tests only
aurite test mcp-server <server_id> --security

# Run with specific framework context (for comparison)
aurite test mcp-server <server_id> --framework langchain

Viewing Test Results

# View latest test results
aurite test results mcp-server <server_id>

# View inherited impact
aurite test inheritance <server_id>

# View cache status
aurite cache status mcp-server <server_id>

Test Patterns - Reusable test patterns and validated tool calls
Testing Architecture - Overall testing framework architecture
Test Inheritance - How test results flow through the hierarchy
Framework-Agnostic Testing - Cross-framework testing approach
Kahuna Testing Framework - Main testing framework documentation