MCP Servers¶
Standalone JSON-RPC 2.0 servers for multi-provider LLM access. Each server implements the MCP protocol over stdin/stdout.
Note
MCP servers live in mcp-server/ (standalone scripts, not a Python package).
This documentation is generated by docs/generate_mcp_rst.py.
MCP Server: Base Class for All Providers¶
File: mcp-server/base_mcp_server.py
Provides common JSON-RPC 2.0 protocol handling for all MCP servers. Implements initialization, tool listing, and request routing. All provider-specific servers inherit from this base class.
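For orientation, here is a minimal sketch of the JSON-RPC 2.0 stdin/stdout loop this kind of base class implements. The handler structure and the protocolVersion string are illustrative assumptions, not the actual API of base_mcp_server.py:

    import json
    import sys

    def handle(request: dict) -> dict:
        # Route a single JSON-RPC request (illustrative only).
        method = request.get("method")
        if method == "initialize":
            result = {"protocolVersion": "2024-11-05", "capabilities": {"tools": {}}}
        elif method == "tools/list":
            result = {"tools": []}  # provider subclasses would advertise tools here
        else:
            return {"jsonrpc": "2.0", "id": request.get("id"),
                    "error": {"code": -32601, "message": f"Method not found: {method}"}}
        return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}

    for line in sys.stdin:  # one JSON-RPC message per line
        if line.strip():
            sys.stdout.write(json.dumps(handle(json.loads(line))) + "\n")
            sys.stdout.flush()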
MCP Server: Anthropic Claude Models¶
File: mcp-server/anthropic_mcp_server.py
Provides access to Claude models (Opus, Sonnet, Haiku) via the official Anthropic API. Enables agents to run tasks via Claude with full API feature support including thinking budget, streaming, and temperature control.
Authentication:
Requires ANTHROPIC_API_KEY environment variable. Set via: export ANTHROPIC_API_KEY="sk-ant-…"
Models:
claude-opus-4-6 (max_tokens: 4096)
claude-sonnet-4-5-20250929 (max_tokens: 4096)
claude-haiku-4-5-20251001 (max_tokens: 1024)
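For reference, a sketch of the kind of SDK call this server wraps, assuming the official anthropic Python package (the server's internal structure may differ):

    import os
    import anthropic

    # Requires ANTHROPIC_API_KEY in the environment, as noted above.
    client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

    message = client.messages.create(
        model="claude-sonnet-4-5-20250929",  # from the model list above
        max_tokens=4096,
        temperature=0.2,
        messages=[{"role": "user", "content": "Summarize this module."}],
    )
    print(message.content[0].text)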
MCP Server: Fallback to Ollama Cloud Models via claude-ollama CLI¶
File: mcp-server/claude_ollama_bridge_server.py
When the Claude API (claude.anthropic.com) hits usage limits, this server allows Claude Code to dispatch tasks to Ollama cloud models as a fallback mechanism.
Instead of being blocked by API limits, users can seamlessly switch to models hosted on Ollama’s cloud infrastructure (devstral-2, minimax, gemini, gpt-oss, etc.) via the claude-ollama CLI wrapper.
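A rough sketch of the fallback dispatch, assuming the claude-ollama wrapper accepts a model flag and a prompt argument (the actual CLI flags may differ; check the wrapper's help output):

    import subprocess

    def dispatch_to_ollama_cloud(prompt: str, model: str = "devstral-2") -> str:
        # Hypothetical invocation of the claude-ollama wrapper; flags are assumptions.
        result = subprocess.run(
            ["claude-ollama", "--model", model, prompt],
            capture_output=True, text=True, check=True,
        )
        return result.stdout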
MCP Server: Devstral Local (llama.cpp HTTP Server)¶
File: mcp-server/devstral_local_mcp_server.py
Provides access to Devstral Small 2 24B model running locally on GPU via llama.cpp HTTP server. Enables on-device inference without cloud dependencies or API costs. Ideal for deterministic code-writing tasks with tight temperature control.
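As a sketch, this server can be exercised the way any llama.cpp HTTP deployment is queried, via its OpenAI-compatible chat endpoint. The port and model file below are placeholders:

    import requests

    # Assumes something like: llama-server -m devstral-small-2.gguf --port 8080
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Write a binary search in Python."}],
            "temperature": 0.1,  # tight temperature for deterministic code output
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])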
MCP Server: File Operations (write_file, update_file, create_directory)¶
File: mcp-server/file_operations_mcp_server.py
Provides safe filesystem operations for sandboxed agents. Enables agents running in restricted MCP environments to write and update files without direct filesystem access. Separates concerns: model servers stay clean, while file I/O is handled by a dedicated server with path validation and security controls.
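A minimal sketch of the path-validation idea, assuming a fixed sandbox root (the actual server's checks may be stricter):

    from pathlib import Path

    SANDBOX_ROOT = Path("/workspace").resolve()  # placeholder sandbox root

    def validate_path(user_path: str) -> Path:
        # Resolve the requested path and reject anything escaping the sandbox.
        target = (SANDBOX_ROOT / user_path).resolve()
        if not target.is_relative_to(SANDBOX_ROOT):  # Python 3.9+
            raise PermissionError(f"Path escapes sandbox: {user_path}")
        return target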
MCP Server: Google AI Studio (Gemini Models)¶
File: mcp-server/google_ai_studio_mcp_server.py
Provides access to Google Gemini models via Google AI Studio API. Enables agents to run tasks via Gemini with multimodal capabilities, thinking modes, and flexible parameter control.
Authentication:
Requires GOOGLE_API_KEY environment variable. Set via: export GOOGLE_API_KEY="AIzaSy…" Get API key: https://ai.google.dev
Models:
gemini-3-pro (reasoning model, thinking support)
gemini-3-flash (fast model, minimal thinking)
gemini-3-flash-lite (lightweight, edge device support)
gemini-1.5-pro (legacy, extended context)
gemini-1.5-flash (legacy, fast generation)
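For reference, a sketch of the underlying call, assuming the google-generativeai SDK and one of the legacy models from the list above (the server's own wrapper may differ):

    import os
    import google.generativeai as genai

    # Requires GOOGLE_API_KEY in the environment, as noted above.
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content("Explain JSON-RPC 2.0 in two sentences.")
    print(response.text)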
MCP Server: Mistral AI Models¶
File: mcp-server/mistral_mcp_server.py
Provides access to Mistral AI models via the official Mistral SDK. Enables agents to run tasks via open-source Mistral models with full parameter control including temperature, top_p, and advanced sampling.
Authentication:
Requires MISTRAL_API_KEY environment variable. Set via: export MISTRAL_API_KEY="…" Get API key: https://console.mistral.ai/
Models:
mistral-large-2411 (large reasoning, recommended for complex tasks)
mistral-medium-3.1 (mid-range, fast inference)
ministral-8b (small, efficient)
ministral-3b (minimal, edge deployment)
codestral-2508 (specialized for code generation)
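A sketch of the kind of call this server wraps, assuming the mistralai v1 SDK (parameter handling in the server itself may differ):

    import os
    from mistralai import Mistral

    # Requires MISTRAL_API_KEY in the environment, as noted above.
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

    response = client.chat.complete(
        model="codestral-2508",  # code-specialized model from the list above
        messages=[{"role": "user", "content": "Write a SQL query counting rows per day."}],
        temperature=0.3,
        top_p=0.9,
    )
    print(response.choices[0].message.content)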
MCP Server: Ollama Local Models¶
File: mcp-server/ollama_mcp_server.py
Provides access to models running via Ollama (ollama.ai) on localhost. Enables on-device inference for both local models and cloud models accessed via Ollama’s proxy. Zero API costs, full privacy, offline-capable.
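For reference, a sketch of the equivalent call against Ollama's local REST API; the model name is a placeholder for whatever has been pulled locally:

    import requests

    # Assumes the Ollama daemon is listening on its default port.
    resp = requests.post(
        "http://127.0.0.1:11434/api/chat",
        json={
            "model": "llama3.1",  # placeholder; any locally pulled model works
            "messages": [{"role": "user", "content": "Why is the sky blue?"}],
            "stream": False,
        },
        timeout=300,
    )
    print(resp.json()["message"]["content"])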
MCP Server: OpenAI GPT Models with Dual Authentication¶
File: mcp-server/openai_mcp_server.py
Provides access to OpenAI GPT models via official SDK (API key mode) or Codex CLI wrapper (subscription token mode). Enables agents to run tasks via GPT with support for reasoning models, temperature control, and advanced sampling parameters.
Models:
Standard models:
gpt-5.2-pro (latest flagship, highest quality)
gpt-5.2 (fast flagship variant)
gpt-5-mini (cost-optimized, fast)
gpt-5-nano (minimal, lowest cost)
Reasoning models (extended thinking):
o3 (advanced reasoning, no temperature)
o3-mini (lightweight reasoning, no temperature)
Legacy models:
gpt-4o (previous generation)
gpt-4-turbo (older)
Authentication (Dual Mode):
Subscription mode (preferred if available):
Requires OPENAI_CHATGPT_LOGIN_MODE=true environment variable
Uses ChatGPT subscription token from ~/.config/openai/codex-home/auth.json
Executes via: codex exec --model <model> "<prompt>"
Requires: codex CLI installed and codex login completed
API key mode (fallback):
Uses OPENAI_API_KEY environment variable (sk-proj-*)
Direct calls to OpenAI API via official SDK
Fallback if subscription mode is unavailable or codex exec fails
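Putting the two modes together, a sketch of the dispatch logic (the server's real implementation may differ in error handling and model routing):

    import os
    import subprocess
    from openai import OpenAI

    def complete(prompt: str, model: str = "gpt-5.2") -> str:
        if os.environ.get("OPENAI_CHATGPT_LOGIN_MODE") == "true":
            try:
                # Subscription mode: shell out to the Codex CLI.
                result = subprocess.run(
                    ["codex", "exec", "--model", model, prompt],
                    capture_output=True, text=True, check=True,
                )
                return result.stdout
            except (subprocess.CalledProcessError, FileNotFoundError):
                pass  # fall through to API key mode
        # API key mode: direct SDK call; OpenAI() reads OPENAI_API_KEY.
        client = OpenAI()
        response = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return response.choices[0].message.content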
Secure Logging Module for MCP Servers¶
File: mcp-server/secure_logger.py
- class SecureLogger(logging.Logger)¶
Custom logger that prevents accidental sensitive data logging.
- audit(message)¶
Log safe audit information.
- debug_redacted(label)¶
Log debug info with redacted sensitive values.
- memory_only(label, value)¶
Log to memory ONLY (for debugging during execution).
- error_safe(message, exception)¶
Log errors without exposing exception details.
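A minimal sketch of the redaction idea using a logging filter; the actual SecureLogger subclasses logging.Logger and its behavior differs:

    import logging
    import re

    class RedactingFilter(logging.Filter):
        # Illustrative patterns for common API key prefixes.
        SECRET = re.compile(r"(sk-[A-Za-z0-9_-]{8,}|AIza[A-Za-z0-9_-]{8,})")

        def filter(self, record: logging.LogRecord) -> bool:
            record.msg = self.SECRET.sub("[REDACTED]", str(record.msg))
            return True

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("mcp.audit")
    logger.addFilter(RedactingFilter())
    logger.info("Authenticated with key sk-proj-abcdef1234567890")  # logs [REDACTED]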