Logging and Monitoring for MCP Servers

Implement structured logging, performance metrics, health checks, and alerting for production MCP servers.


title: "Logging and Monitoring for MCP Servers"
description: "Implement structured logging, performance metrics, health checks, and alerting for production MCP servers."
order: 6
keywords:
  • MCP logging
  • MCP monitoring
  • MCP structured logging
  • MCP metrics
  • MCP health check
date: "2026-04-01"

Quick Summary

Implement production-grade logging and monitoring for your MCP servers. Learn structured logging patterns, what metrics to track, how to build health check endpoints, and how to set up alerting for failures.

The Challenge: stdio and Logging

MCP servers using the stdio transport exchange protocol messages over stdin and stdout. Anything else written to stdout, including console.log() output, corrupts the protocol stream, so all logging must go to stderr.

stderr Logging

In MCP servers with stdio transport, stdout is reserved for protocol messages. All diagnostic output (logs, errors, metrics) must be written to stderr using console.error() or a logger configured for stderr.

Always Log to stderr

Use console.error() or configure your logging library to write to stderr. Never use console.log() in a stdio MCP server: it writes to stdout and breaks protocol communication.
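A stray console.log() can also come from a dependency you do not control. One possible safeguard is to reroute the stdout-bound console methods at startup. The sketch below is an assumption, not part of any MCP SDK: `redirectConsoleLogToStderr` is a hypothetical helper name, and it only helps if the transport writes protocol messages to process.stdout directly rather than through console.log().

```typescript
// Hypothetical helper: reroute console.log, console.info, and console.debug
// (all of which write to stdout in Node.js) to stderr, so stray calls cannot
// corrupt the stdio protocol stream.
function redirectConsoleLogToStderr(): () => void {
  const originalLog = console.log;
  const originalInfo = console.info;
  const originalDebug = console.debug;

  const toStderr = (...args: unknown[]) => console.error(...args);
  console.log = toStderr;
  console.info = toStderr;
  console.debug = toStderr;

  // Return a function that restores the original methods (useful in tests).
  return () => {
    console.log = originalLog;
    console.info = originalInfo;
    console.debug = originalDebug;
  };
}
```

Call it once, before the server connects its transport. It is a safety net only; the structured logger remains the intended logging path.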

Structured Logging

Use Structured JSON Logging

Log in JSON format so your logs are machine-parseable. Include consistent fields like timestamp, level, tool name, duration, and error details. This enables searching, filtering, and alerting on specific patterns.

Building a Logger

// src/utils/logger.ts
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  tool?: string;
  message: string;
  duration?: number;
  error?: string;
  meta?: Record<string, unknown>;
}

class Logger {
  private minLevel: LogLevel;
  private levels: Record<LogLevel, number> = {
    debug: 0,
    info: 1,
    warn: 2,
    error: 3,
  };

  constructor(minLevel: LogLevel = "info") {
    this.minLevel = minLevel;
  }

  private log(level: LogLevel, message: string, meta?: Record<string, unknown>) {
    if (this.levels[level] < this.levels[this.minLevel]) return;

    // Spread meta first so caller-supplied keys cannot overwrite the core fields.
    const entry: LogEntry = {
      ...meta,
      timestamp: new Date().toISOString(),
      level,
      message,
    };

    // Write to stderr, never stdout
    console.error(JSON.stringify(entry));
  }

  debug(message: string, meta?: Record<string, unknown>) {
    this.log("debug", message, meta);
  }

  info(message: string, meta?: Record<string, unknown>) {
    this.log("info", message, meta);
  }

  warn(message: string, meta?: Record<string, unknown>) {
    this.log("warn", message, meta);
  }

  error(message: string, meta?: Record<string, unknown>) {
    this.log("error", message, meta);
  }
}

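// Fall back to "info" when LOG_LEVEL is unset or not a recognized level.
const envLevel = process.env.LOG_LEVEL;
const isLogLevel = (value: string | undefined): value is LogLevel =>
  value === "debug" || value === "info" || value === "warn" || value === "error";

export const logger = new Logger(isLogLevel(envLevel) ? envLevel : "info");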

Using the Logger in Tools

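// MCPTool (the tool base class of your MCP framework) and fetchWeather are
// assumed to be imported or defined elsewhere in the project.
import { logger } from "../utils/logger.js";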

class WeatherTool extends MCPTool<any> {
  async execute(input: { latitude: number; longitude: number }): Promise<string> {
    const start = Date.now();
    logger.info("Tool called", {
      tool: "get_weather",
      meta: { latitude: input.latitude, longitude: input.longitude },
    });

    try {
      const result = await fetchWeather(input.latitude, input.longitude);
      const duration = Date.now() - start;
      logger.info("Tool completed", { tool: "get_weather", duration });
      return JSON.stringify(result);
    } catch (error) {
      const duration = Date.now() - start;
      logger.error("Tool failed", {
        tool: "get_weather",
        duration,
        error: error instanceof Error ? error.message : String(error),
      });
      return JSON.stringify({ error: "Weather fetch failed" });
    }
  }
}
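Repeating the timing and try/catch code in every tool gets tedious. One option is a small wrapper that adds the same logging to any async handler. This is a sketch under assumptions: `withLogging` and `ToolLogger` are hypothetical names, and the logger interface is a minimal stand-in for the logger built above.

```typescript
// Minimal stand-in for the logger built earlier; assumed shape only.
type LogFn = (message: string, meta?: Record<string, unknown>) => void;
interface ToolLogger {
  info: LogFn;
  error: LogFn;
}

// Hypothetical helper: wraps an async tool handler with start, success,
// and failure logging, including the call duration in milliseconds.
function withLogging<I, O>(
  tool: string,
  log: ToolLogger,
  handler: (input: I) => Promise<O>,
): (input: I) => Promise<O> {
  return async (input: I) => {
    const start = Date.now();
    log.info("Tool called", { tool });
    try {
      const result = await handler(input);
      log.info("Tool completed", { tool, duration: Date.now() - start });
      return result;
    } catch (error) {
      log.error("Tool failed", {
        tool,
        duration: Date.now() - start,
        error: error instanceof Error ? error.message : String(error),
      });
      throw error; // rethrow so the caller decides how to report the failure
    }
  };
}
```

Unlike the WeatherTool example, the wrapper rethrows instead of returning an error payload, which keeps the error-reporting decision in one place.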

What to Log

Log Tool Calls, Not Data

Log every tool call with its name, duration, and success/failure status. Do not log the full input or output -- these may contain sensitive data and waste disk space. Log a summary or sanitized version instead.

| Event | What to Log | Level |
|-------|-------------|-------|
| Server startup | Version, transport type, tool count | info |
| Tool call start | Tool name, sanitized input summary | info |
| Tool call success | Tool name, duration, result size | info |
| Tool call failure | Tool name, duration, error message | error |
| External API call | URL (no auth), status code, duration | debug |
| Cache hit/miss | Cache key, hit or miss | debug |
| Rate limit hit | Tool name, client identifier | warn |
| Connection error | Service name, error type | error |
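The "sanitized input summary" in the table can be produced by a small helper. The sketch below is one possible approach, not a complete redaction policy: `summarizeInput`, the list of sensitive key names, and the 80-character limit are all assumptions to adapt to your own data.

```typescript
// Key names treated as sensitive are an assumption; extend for your data.
const SENSITIVE_KEYS = /pass(word)?|secret|token|api[-_]?key|authorization/i;
const MAX_STRING_LENGTH = 80;

// Hypothetical helper: returns a shallow, log-safe summary of a tool input.
function summarizeInput(input: Record<string, unknown>): Record<string, unknown> {
  const summary: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    if (SENSITIVE_KEYS.test(key)) {
      summary[key] = "[REDACTED]";
    } else if (typeof value === "string" && value.length > MAX_STRING_LENGTH) {
      // Keep a prefix and the original length instead of the full text.
      summary[key] = `${value.slice(0, MAX_STRING_LENGTH)}... (${value.length} chars)`;
    } else if (typeof value === "object" && value !== null) {
      // Record only the shape of nested values, not their contents.
      summary[key] = Array.isArray(value) ? `[array of ${value.length}]` : "[object]";
    } else {
      summary[key] = value;
    }
  }
  return summary;
}
```

A tool would then pass `summarizeInput(input)` as the `meta` field of its "Tool called" log entry instead of the raw input.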

Performance Metrics

Track Key Performance Indicators

Track response time percentiles (p50, p95, p99), error rates, and throughput for each tool. These metrics tell you which tools need optimization and when something is degrading.

// src/utils/metrics.ts
class Metrics {
  private toolDurations = new Map<string, number[]>();
  private toolErrors = new Map<string, number>();
  private toolCalls = new Map<string, number>();

  recordCall(tool: string, durationMs: number, success: boolean) {
    this.toolCalls.set(tool, (this.toolCalls.get(tool) || 0) + 1);

    const durations = this.toolDurations.get(tool) || [];
    durations.push(durationMs);
    if (durations.length > 1000) durations.shift();
    this.toolDurations.set(tool, durations);

    if (!success) {
      this.toolErrors.set(tool, (this.toolErrors.get(tool) || 0) + 1);
    }
  }

  getStats(tool: string) {
    const durations = this.toolDurations.get(tool) || [];
    const sorted = [...durations].sort((a, b) => a - b);

    return {
      totalCalls: this.toolCalls.get(tool) || 0,
      errors: this.toolErrors.get(tool) || 0,
      p50: sorted[Math.floor(sorted.length * 0.5)] || 0,
      p95: sorted[Math.floor(sorted.length * 0.95)] || 0,
      p99: sorted[Math.floor(sorted.length * 0.99)] || 0,
      avg: durations.length
        ? durations.reduce((a, b) => a + b, 0) / durations.length
        : 0,
    };
  }

  getAllStats() {
    const stats: Record<string, ReturnType<Metrics["getStats"]>> = {};
    for (const tool of this.toolCalls.keys()) {
      stats[tool] = this.getStats(tool);
    }
    return stats;
  }
}

export const metrics = new Metrics();
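The percentile and error-rate figures can also be computed with standalone functions, which makes them easy to unit test. The sketch below uses the nearest-rank definition of a percentile, so for small samples it can differ slightly from the index-based lookup in getStats. The `percentile` and `errorRate` names and the sample durations are illustrative assumptions.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Expects p in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Fraction of calls that failed, between 0 and 1.
function errorRate(totalCalls: number, errors: number): number {
  return totalCalls === 0 ? 0 : errors / totalCalls;
}

// Illustrative durations in milliseconds, with one slow outlier.
const sampleDurations = [120, 80, 95, 400, 110, 90, 105, 100, 85, 2500];
console.error(
  JSON.stringify({
    p50: percentile(sampleDurations, 50), // 100
    p95: percentile(sampleDurations, 95), // 2500
    errorRate: errorRate(200, 7), // 0.035
  }),
);
```

The single 2500 ms outlier barely moves p50 but dominates p95, which is why tail percentiles are more useful than averages for spotting degradation.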

Health Check Endpoint

Add Health Checks for SSE Servers

SSE servers should expose a /health endpoint that container orchestrators and load balancers can probe. Include dependency health (database, external APIs) in the response.

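// Assumes `app` is an Express-style HTTP app and `pool` is a database
// connection pool (for example, from pg), both created elsewhere.
app.get("/health", async (req, res) => {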
  const checks = {
    server: "ok",
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    database: "unknown",
    timestamp: new Date().toISOString(),
  };

  try {
    await pool.query("SELECT 1");
    checks.database = "ok";
  } catch {
    checks.database = "error";
  }

  const healthy = checks.database === "ok";
  res.status(healthy ? 200 : 503).json(checks);
});
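A health probe should answer quickly even when a dependency hangs, otherwise the orchestrator's own probe timeout fires before the endpoint can report which dependency is at fault. One way to handle this is to bound each dependency check with a timeout. This is a sketch: `withTimeout`, `checkDependency`, and the 2-second default are assumptions, not part of any framework.

```typescript
// Hypothetical helper: reject if `promise` does not settle within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Runs one dependency check and maps the outcome to a status string.
// A rejection or a timeout both count as "error".
async function checkDependency(
  check: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<"ok" | "error"> {
  try {
    await withTimeout(check(), timeoutMs);
    return "ok";
  } catch {
    return "error";
  }
}
```

In the handler above, the try/catch around the query could then become `checks.database = await checkDependency(() => pool.query("SELECT 1"));`.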

Metrics as an MCP Tool

Expose your metrics as a tool so you can ask the AI about server health:

server.tool(
  "server_metrics",
  "Get performance metrics for all tools in this server",
  {},
  async () => {
    const stats = metrics.getAllStats();
    return {
      content: [{
        type: "text" as const,
        text: JSON.stringify({
          uptime: `${Math.floor(process.uptime())}s`,
          memory: `${Math.round(process.memoryUsage().heapUsed / 1024 / 1024)}MB`,
          tools: stats,
        }, null, 2),
      }],
    };
  }
);

Log Aggregation

For production servers, forward logs to a centralized system:

# Docker captures the container's stdout and stderr automatically
docker logs mcp-server -f

# Forward to CloudWatch (AWS)
# Use awslogs driver in docker-compose or ECS task definition

# Forward to Cloud Logging (GCP)
# Cloud Run captures stderr automatically

# Write stderr to a file that a log shipper can tail.
# Do not merge it into stdout (2>&1): stdout carries the protocol stream.
node dist/index.js 2>> /var/log/mcp-server.log

Alerting Rules

| Condition | Evaluation window | Action |
|-----------|-------------------|--------|
| Error rate > 5% | 5-minute window | Page on-call |
| p95 latency > 5s | 5-minute window | Warn in Slack |
| Health check failing | 3 consecutive checks | Restart container |
| Memory > 80% | Sustained | Investigate leak |
| Zero tool calls | 15 minutes | Check connectivity |
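In practice these rules usually live in your monitoring system, but the first one (error rate above 5% over a 5-minute window) is simple enough to evaluate in-process. The sketch below shows one way to do that with a sliding window. `ErrorRateWindow` is a hypothetical class, and it assumes calls are recorded in chronological order.

```typescript
interface CallRecord {
  timestamp: number; // milliseconds since epoch
  success: boolean;
}

// Hypothetical sliding-window tracker for the "error rate > 5%" rule.
class ErrorRateWindow {
  private calls: CallRecord[] = [];
  private windowMs: number;
  private threshold: number;

  constructor(windowMs = 5 * 60 * 1000, threshold = 0.05) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  record(success: boolean, now = Date.now()): void {
    this.calls.push({ timestamp: now, success });
    this.prune(now);
  }

  // Drop records that have fallen out of the window.
  private prune(now: number): void {
    const cutoff = now - this.windowMs;
    while (this.calls.length > 0 && this.calls[0].timestamp < cutoff) {
      this.calls.shift();
    }
  }

  errorRate(now = Date.now()): number {
    this.prune(now);
    if (this.calls.length === 0) return 0;
    const failures = this.calls.filter((call) => !call.success).length;
    return failures / this.calls.length;
  }

  // True only when the rate is strictly above the threshold.
  shouldAlert(now = Date.now()): boolean {
    return this.errorRate(now) > this.threshold;
  }
}
```

A server could call `record()` wherever it records metrics and check `shouldAlert()` on a timer, logging at error level when it returns true so the log pipeline can trigger the page.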

Frequently Asked Questions