MCP Servers

Standalone JSON-RPC 2.0 servers for multi-provider LLM access. Each server implements the MCP protocol over stdin/stdout.

Note

MCP servers live in mcp-server/ (standalone scripts, not a Python package). This documentation is generated by docs/generate_mcp_rst.py.

MCP Server: Base Class for All Providers

File: mcp-server/base_mcp_server.py

Provides common JSON-RPC 2.0 protocol handling for all MCP servers. Implements initialization, tool listing, and request routing. All provider-specific servers inherit from this base class.

class BaseMCPServer

Base MCP server - all providers inherit from this.

initialize() -> Dict[str, Any]

MCP initialization.

list_tools() -> Dict[str, Any]

List available tools.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute a tool. Subclasses override this.

handle_request(request: Dict[str, Any]) -> Dict[str, Any]

Route MCP requests.

run()

Main MCP event loop.
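The routing contract above can be sketched as a minimal, self-contained stand-in (illustrative only; the real BaseMCPServer's response fields, error codes, and tool names may differ):

```python
import json
import sys
from typing import Any, Dict


class MiniMCPServer:
    """Illustrative stand-in for BaseMCPServer's JSON-RPC 2.0 routing."""

    def initialize(self) -> Dict[str, Any]:
        return {"protocolVersion": "2024-11-05", "serverInfo": {"name": "mini"}}

    def list_tools(self) -> Dict[str, Any]:
        return {"tools": [{"name": "echo", "description": "Echo back the input"}]}

    def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if tool_name == "echo":
            return {"content": [{"type": "text", "text": arguments.get("text", "")}]}
        raise ValueError(f"unknown tool: {tool_name}")

    def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        # Route by JSON-RPC method name; any failure becomes a JSON-RPC
        # error object (this sketch uses -32603 for everything).
        method = request.get("method", "")
        try:
            if method == "initialize":
                result = self.initialize()
            elif method == "tools/list":
                result = self.list_tools()
            elif method == "tools/call":
                params = request.get("params", {})
                result = self.call_tool(params.get("name", ""),
                                        params.get("arguments", {}))
            else:
                raise ValueError(f"unknown method: {method}")
            return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
        except Exception as exc:
            return {"jsonrpc": "2.0", "id": request.get("id"),
                    "error": {"code": -32603, "message": str(exc)}}

    def run(self) -> None:
        # One JSON-RPC message per line on stdin; responses on stdout.
        for line in sys.stdin:
            line = line.strip()
            if line:
                print(json.dumps(self.handle_request(json.loads(line))), flush=True)
```

Subclasses only need to override call_tool (and usually list_tools); the request loop and error wrapping stay in the base class.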

MCP Server: Anthropic Claude Models

File: mcp-server/anthropic_mcp_server.py

Provides access to Claude models (Opus, Sonnet, Haiku) via the official Anthropic API. Enables agents to run tasks via Claude with full API feature support including thinking budget, streaming, and temperature control.

Authentication:

Requires ANTHROPIC_API_KEY environment variable. Set via: export ANTHROPIC_API_KEY="sk-ant-…"

Models:

  • claude-opus-4-6 (max_tokens: 4096)

  • claude-sonnet-4-5-20250929 (max_tokens: 4096)

  • claude-haiku-4-5-20251001 (max_tokens: 1024)

class AnthropicMCPServer(BaseMCPServer)

Anthropic Claude models via official SDK.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute task via Anthropic API.
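For illustration, a tools/call request to this server might be framed as below. The tool name run_task and the argument keys are assumptions, not the server's actual schema; consult its tools/list output for the real contract:

```python
import json

# Hypothetical tools/call request; tool name and argument keys are assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_task",
        "arguments": {
            "model": "claude-haiku-4-5-20251001",
            "prompt": "Summarize this diff in one sentence.",
            "temperature": 0.2,
        },
    },
}

# The server reads one JSON-RPC message per line from stdin.
wire = json.dumps(request)
```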

MCP Server: Fallback to Ollama Cloud Models via claude-ollama CLI

File: mcp-server/claude_ollama_bridge_server.py

When the Claude API (claude.anthropic.com) hits usage limits, this server lets Claude Code dispatch tasks to Ollama cloud models as a fallback.

Instead of being blocked by API limits, users can seamlessly switch to models hosted on Ollama’s cloud infrastructure (devstral-2, minimax, gemini, gpt-oss, etc.) via the claude-ollama CLI wrapper.

class MinimaxMCPServer

MCP Server wrapper for Ollama cloud models via claude-ollama CLI.

initialize() -> Dict[str, Any]

Initialize the MCP server.

list_tools() -> Dict[str, Any]

List available tools.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute a tool (task dispatch to minimax).

handle_request(request: Dict[str, Any]) -> Dict[str, Any]

Route incoming MCP requests.

MCP Server: Devstral Local (llama.cpp HTTP Server)

File: mcp-server/devstral_local_mcp_server.py

Provides access to Devstral Small 2 24B model running locally on GPU via llama.cpp HTTP server. Enables on-device inference without cloud dependencies or API costs. Ideal for deterministic code-writing tasks with tight temperature control.

class DevstralLocalMCPServer(BaseMCPServer)

Devstral Small 2 24B via llama.cpp HTTP server.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute task via devstral_server.sh HTTP API.
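llama.cpp's HTTP server exposes an OpenAI-compatible chat endpoint, so a call from this server plausibly reduces to a POST like the sketch below. The port 8080 and the model alias "devstral" are assumptions; devstral_server.sh may configure them differently:

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       base_url: str = "http://127.0.0.1:8080") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local llama.cpp server."""
    payload = {
        "model": "devstral",          # assumed alias; llama.cpp often ignores this field
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,           # tight temperature for deterministic code tasks
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually call a running server (not executed here):
# with urllib.request.urlopen(build_chat_request("Write a quicksort")) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```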

MCP Server: File Operations (write_file, update_file, create_directory)

File: mcp-server/file_operations_mcp_server.py

Provides safe filesystem operations for sandboxed agents. Enables agents running in restricted MCP environments to write and update files without direct filesystem access. Separates concerns: model servers stay clean, while file I/O is handled by a dedicated server with path validation and security controls.

class FileOperationsMCPServer(BaseMCPServer)

File operations server - allows agents to write/update files safely.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute file operation.
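The path validation mentioned above can be sketched as follows. This is a minimal illustration under the assumption of a single sandbox root, not the server's actual checks:

```python
from pathlib import Path


def resolve_safe_path(sandbox_root: str, requested: str) -> Path:
    """Resolve a requested path and reject anything escaping the sandbox root."""
    root = Path(sandbox_root).resolve()
    target = (root / requested).resolve()
    # Path.relative_to raises ValueError when target lies outside root,
    # which catches '..' traversal and absolute-path tricks alike.
    try:
        target.relative_to(root)
    except ValueError:
        raise PermissionError(f"path escapes sandbox: {requested}")
    return target
```

Note that joining a Path with an absolute path discards the left operand, so absolute requests like /etc/passwd also fail the relative_to check rather than silently escaping.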

MCP Server: Google AI Studio (Gemini Models)

File: mcp-server/google_ai_studio_mcp_server.py

Provides access to Google Gemini models via Google AI Studio API. Enables agents to run tasks via Gemini with multimodal capabilities, thinking modes, and flexible parameter control.

Authentication:

Requires GOOGLE_API_KEY environment variable. Set via: export GOOGLE_API_KEY="AIzaSy…" Get API key: https://ai.google.dev

Models:

  • gemini-3-pro (reasoning model, thinking support)

  • gemini-3-flash (fast model, minimal thinking)

  • gemini-3-flash-lite (lightweight, edge device support)

  • gemini-1.5-pro (legacy, extended context)

  • gemini-1.5-flash (legacy, fast generation)

class GoogleAIStudioMCPServer(BaseMCPServer)

Google Gemini models via official SDK.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute task via Google Gemini API.

MCP Server: Mistral AI Models

File: mcp-server/mistral_mcp_server.py

Provides access to Mistral AI models via the official Mistral SDK. Enables agents to run tasks via open-source Mistral models with full parameter control including temperature, top_p, and advanced sampling.

Authentication:

Requires MISTRAL_API_KEY environment variable. Set via: export MISTRAL_API_KEY="…" Get API key: https://console.mistral.ai/

Models:

  • mistral-large-2411 (large reasoning, recommended for complex tasks)

  • mistral-medium-3.1 (mid-range, fast inference)

  • ministral-8b (small, efficient)

  • ministral-3b (minimal, edge deployment)

  • codestral-2508 (specialized for code generation)

class MistralMCPServer(BaseMCPServer)

Mistral AI models via official SDK.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute task via Mistral API.

MCP Server: Ollama Local Models

File: mcp-server/ollama_mcp_server.py

Provides access to models running via Ollama (ollama.ai) on localhost. Enables on-device inference for both local models and cloud models accessed via Ollama’s proxy. Zero API costs, full privacy, offline-capable.

class OllamaMCPServer(BaseMCPServer)

Ollama local models via HTTP API.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute task via Ollama HTTP API.
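Ollama's HTTP API listens on localhost:11434 by default, so a non-streaming generation call from this server plausibly looks like the sketch below (the tool's actual parameter names may differ):

```python
import json
import urllib.request


def build_ollama_request(model: str, prompt: str,
                         base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming /api/generate request for a local Ollama daemon."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Actual call (requires a running Ollama daemon, not executed here):
# with urllib.request.urlopen(build_ollama_request("llama3.2", "Say hi")) as resp:
#     print(json.loads(resp.read())["response"])
```

With "stream": False the daemon returns one JSON object instead of newline-delimited chunks, which keeps the MCP response handling simple.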

MCP Server: OpenAI GPT Models with Dual Authentication

File: mcp-server/openai_mcp_server.py

Provides access to OpenAI GPT models via official SDK (API key mode) or Codex CLI wrapper (subscription token mode). Enables agents to run tasks via GPT with support for reasoning models, temperature control, and advanced sampling parameters.

Models:

  Standard models:

  • gpt-5.2-pro (latest flagship, highest quality)

  • gpt-5.2 (fast flagship variant)

  • gpt-5-mini (cost-optimized, fast)

  • gpt-5-nano (minimal, lowest cost)

  Reasoning models (extended thinking):

  • o3 (advanced reasoning, no temperature)

  • o3-mini (lightweight reasoning, no temperature)

  Legacy models:

  • gpt-4o (previous generation)

  • gpt-4-turbo (older)

Authentication (dual mode):

  1. Subscription mode (preferred if available):

  • Requires OPENAI_CHATGPT_LOGIN_MODE=true environment variable

  • Uses ChatGPT subscription token from ~/.config/openai/codex-home/auth.json

  • Executes via: codex exec --model <model> "<prompt>"

  • Requires: codex CLI installed and codex login completed

  2. API key mode (fallback):

  • Uses OPENAI_API_KEY environment variable (sk-proj-*)

  • Direct calls to OpenAI API via official SDK

  • Fallback if subscription mode unavailable or codex exec fails
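The mode-selection rules above can be sketched as follows (illustrative; the real server may order or combine its checks differently):

```python
import os
from pathlib import Path


def select_auth_mode(env=None) -> str:
    """Pick 'subscription' or 'api_key' following the dual-mode rules."""
    if env is None:
        env = os.environ
    # Subscription mode needs both the opt-in flag and a codex login token.
    token_file = Path.home() / ".config/openai/codex-home/auth.json"
    if env.get("OPENAI_CHATGPT_LOGIN_MODE") == "true" and token_file.exists():
        return "subscription"
    # API key mode is the fallback when a key is present.
    if env.get("OPENAI_API_KEY", "").startswith("sk-"):
        return "api_key"
    raise RuntimeError("no OpenAI credentials configured")
```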

class OpenAIMCPServer(BaseMCPServer)

OpenAI GPT models via official SDK or Codex CLI wrapper.

call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]

Execute task via OpenAI API (api_key mode) or codex CLI (subscription mode).

Secure Logging Module for MCP Servers

File: mcp-server/secure_logger.py

class SecureLogger(logging.Logger)

Custom logger that prevents accidental sensitive data logging.

audit(message)

Log safe audit information.

debug_redacted(label)

Log debug info with redacted sensitive values.

memory_only(label, value)

Log to memory ONLY (for debugging during execution).

error_safe(message, exception)

Log errors without exposing exception details.
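A minimal sketch of the redaction idea behind debug_redacted (illustrative; the key patterns and mask string are assumptions, not the module's actual rules):

```python
import re

# Keys whose values should never reach a log line (assumed pattern list).
SENSITIVE_KEY_RE = re.compile(r"(api[_-]?key|token|secret|password)", re.IGNORECASE)


def redact(record: dict) -> dict:
    """Return a copy of a log record with sensitive values masked."""
    return {
        key: "***REDACTED***" if SENSITIVE_KEY_RE.search(key) else value
        for key, value in record.items()
    }
```

Matching on key names rather than value shapes means a rotated key format never slips through just because its prefix changed.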