LLM Model Providers¶
LLM Model Provider Integrations for Hillstar Orchestrator.
Unified interface to multiple LLM providers with consistent credential handling (environment variables only, no embedded keys).
Supported providers:
- anthropic: Anthropic Claude (cloud API)
- openai: OpenAI GPT (cloud API)
- anthropic_ollama: Anthropic via Ollama (local proxy)
- ollama: Local Ollama models
- devstral_local: Devstral local (GPU required)
- google_ai_studio: Google Gemini (API key auth)
- mistral: Mistral AI (cloud API)
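For illustration, a minimal sketch of selecting one of these providers by name. Only the class names come from this page; the registry and factory function are hypothetical, not part of the package:

# Hypothetical provider registry; only the class names are documented here.
from models import AnthropicModel, MistralAPIModel, DevstralLocalModel

PROVIDERS = {
    "anthropic": lambda: AnthropicModel(model="haiku"),
    "mistral": lambda: MistralAPIModel(model="small"),
    "devstral_local": lambda: DevstralLocalModel(),
}

def make_model(provider):
    # Credentials are resolved from environment variables by each class.
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider}")
    return PROVIDERS[provider]()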
- class models.AnthropicOllamaAPIModel[source]¶
Bases: object
Anthropic models via Ollama’s Anthropic-compatible API.
- VALID_MODELS = {'devstral-2:123b-cloud', 'gemini-3-flash-preview:cloud', 'gpt-oss:120b-cloud', 'minimax-m2.5:cloud', 'mistral-large-3:675b-cloud'}¶
- __init__(model_name='minimax-m2.5:cloud', base_url=None, api_key=None, max_retries=2)[source]¶
Initialize Anthropic Ollama API provider.
Args:
- model_name: Ollama model identifier (local or cloud)
- base_url: Ollama endpoint URL (defaults to env var ANTHROPIC_BASE_URL or localhost)
- api_key: API key for authentication (defaults to env var ANTHROPIC_AUTH_TOKEN)
- max_retries: Number of retries for transient failures
- class models.AnthropicModel[source]¶
Bases: object
Interface to Anthropic Claude models.
Supports multiple Claude model versions with simple selector syntax.
Model Options (use short names or full identifiers):
- "haiku" → claude-haiku-4-5-20251001 (recommended, fast & cheap)
- "sonnet" → claude-sonnet-4-6 (balanced performance)
- "opus" → claude-opus-4-6 (most capable, higher cost)
- Full identifier: "claude-haiku-4-5-20251001" (use as-is)
Examples:
# Using short names (recommended)
haiku = AnthropicModel(model="haiku")
sonnet = AnthropicModel(model="sonnet")

# Using full identifiers (for custom versions)
custom = AnthropicModel(model="claude-haiku-4-5-20251001")
- MODEL_ALIASES = {'haiku': 'claude-haiku-4-5-20251001', 'opus': 'claude-opus-4-6', 'sonnet': 'claude-sonnet-4-6'}¶
- TEMPERATURE_DEFAULT = 7.3e-07¶
- __init__(model='haiku', api_key=None)[source]¶
Initialize Anthropic Claude model.
Args:
- model: Model to use. Either a short name ("haiku", "sonnet", "opus") or a full identifier ("claude-haiku-4-5-20251001")
- api_key: Explicit API key (else uses ANTHROPIC_API_KEY env var)
Raises:
- ValueError: If ANTHROPIC_API_KEY not set and not provided
- ImportError: If anthropic SDK not installed
- class models.DevstralLocalModel[source]¶
Bases: object
LOCAL Devstral-Small-2 via llama.cpp (OpenAI-compatible API).
OPTIONAL - Requires 16GB+ VRAM GPU and quantized GGUF model
- TEMPERATURE_DEFAULT = 7.3e-07¶
- __init__(model_name='devstral', endpoint='http://127.0.0.1:8080')[source]¶
Args:
- model_name: Model identifier (llama.cpp accepts any value)
- endpoint: llama.cpp server endpoint (OpenAI-compatible)
Warning: Requires 16GB+ VRAM GPU and running devstral_server.sh
- call(prompt, max_tokens=2048, temperature=None, system=None)[source]¶
Call Devstral via llama.cpp OpenAI-compatible chat completions endpoint.
Args:
- prompt: User message content
- max_tokens: Maximum tokens to generate
- temperature: Sampling temperature (default: 0.00000073)
- system: System prompt
Returns: Dictionary with response and metadata
Note: Requires devstral_server.sh running on localhost:8080
- class models.MistralAPIModel[source]¶
Bases: object
Mistral AI API provider with model selector.
Supports multiple Mistral model options from budget-friendly to high-capability.
Model Options (use short names or full identifiers):
- "small" → mistral-medium-latest (recommended, good balance, cheap)
- "medium" → mistral-large-2411 (most capable, standard pricing)
- "mini" → ministral-3b (cheapest, edge deployment)
- "code" → codestral-2508 (coding-focused, cheap)
- "devstral" → devstral-2 (coding-focused, cheap)
- Full identifier: "mistral-large-2411" (use as-is)
Pricing Guide (per 1M tokens):
- ministral-3b: $0.10 input / $0.50 output (cheapest)
- ministral-14b: $0.50 input / $2.50 output
- codestral-2508: $0.50 input / $2.50 output
- mistral-medium-latest: $1.00 input / $5.00 output
- mistral-large-2411: $3.00 input / $15.00 output (most capable)
Examples:
# Using short names (recommended)
small = MistralAPIModel(model="small")
code = MistralAPIModel(model="code")

# Using full identifiers
custom = MistralAPIModel(model="mistral-large-2411")
- MODEL_ALIASES = {'code': 'codestral-2508', 'devstral': 'devstral-2', 'medium': 'mistral-large-2411', 'mini': 'ministral-3b', 'small': 'mistral-medium-latest'}¶
- __init__(model='small', api_key=None, base_url='https://api.mistral.ai/v1')[source]¶
Initialize Mistral API provider.
Args:
- model: Model to use. Either a short name ("small", "medium", "mini", "code", "devstral") or a full identifier ("mistral-large-2411")
- api_key: API key (defaults to MISTRAL_API_KEY env var)
- base_url: API endpoint base URL
Raises:
- ValueError: If API key not provided
- class models.MCPModel[source]¶
Bases: object
Base class for MCP-based model providers.
- TEMPERATURE_DEFAULT = 7.3e-07¶
- __init__(provider, model_name, server_script, api_key=None)[source]¶
Initialize MCP model.
Args:
- provider: Provider name (e.g., "anthropic_mcp")
- model_name: Model identifier (e.g., "claude-opus-4-6")
- server_script: Path to MCP server script (relative to repo root)
- api_key: Optional API key (else reads from environment)
- call(prompt, max_tokens=4096, temperature=None, system=None)[source]¶
Execute task via MCP server.
Matches AnthropicModel.call() interface for compatibility.
Args:
- prompt: Input prompt
- max_tokens: Maximum tokens to generate
- temperature: Sampling temperature (unused for MCP servers)
- system: System prompt (unused for MCP servers)
Returns: Dictionary with response and metadata
- class models.OpenAIMCPModel[source]¶
Bases: MCPModel
OpenAI GPT models via MCP server with transparent dual authentication.
- __init__(model_name, api_key=None)[source]¶
Initialize OpenAI MCP model.
Args:
- model_name: OpenAI model identifier (e.g., "gpt-5.2")
- api_key: Optional API key (else reads from OPENAI_API_KEY env var)
The MCP server handles authentication automatically:
- If OPENAI_CHATGPT_LOGIN_MODE=true: uses codex exec with subscription tokens
- If OPENAI_API_KEY is set: uses the OpenAI API directly via the SDK
- Falls back in that order
No auth resolution is performed here; the MCP server is fully self-contained.
models.mcp_model¶
Script¶
mcp_model.py
Path¶
models/mcp_model.py
Purpose¶
Base class for MCP-based model providers: handles subprocess lifecycle and JSON-RPC communication.
Provides a unified interface to MCP servers (stdio-based) with automatic initialization, error handling, and response normalization to match the AnthropicModel.call() interface.
Inputs¶
provider (str): Provider name (e.g., "anthropic_mcp")
model_name (str): Model identifier
server_script (str): Path to MCP server script
api_key (str, optional): API key for the provider
Outputs¶
Dictionary: {output, model, tokens_used, provider}
Assumptions¶
MCP server script exists and is executable
Server implements standard MCP protocol (initialize, tools/call)
run_with_env.sh wrapper is available in mcp-server/
Failure Modes¶
Process spawn fails → RuntimeError
MCP server crashes → RuntimeError (EOF on stdout)
Invalid JSON response → json.JSONDecodeError
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-17
Last Edited¶
2026-02-17
- class models.mcp_model.MCPModel[source]¶
Bases: object
Base class for MCP-based model providers.
- TEMPERATURE_DEFAULT = 7.3e-07¶
- __init__(provider, model_name, server_script, api_key=None)[source]¶
Initialize MCP model.
Args:
- provider: Provider name (e.g., "anthropic_mcp")
- model_name: Model identifier (e.g., "claude-opus-4-6")
- server_script: Path to MCP server script (relative to repo root)
- api_key: Optional API key (else reads from environment)
- call(prompt, max_tokens=4096, temperature=None, system=None)[source]¶
Execute task via MCP server.
Matches AnthropicModel.call() interface for compatibility.
Args:
- prompt: Input prompt
- max_tokens: Maximum tokens to generate
- temperature: Sampling temperature (unused for MCP servers)
- system: System prompt (unused for MCP servers)
Returns: Dictionary with response and metadata
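A minimal usage sketch. The server_script path is an assumption based on the mcp-server/ layout mentioned under Assumptions; the error handling mirrors the Failure Modes listed above:

import json
from models.mcp_model import MCPModel

model = MCPModel(
    provider="anthropic_mcp",
    model_name="claude-opus-4-6",
    server_script="mcp-server/anthropic_mcp_server.py",  # hypothetical path
)
try:
    result = model.call("Summarise the design doc.", max_tokens=1024)
    print(result["output"], result["tokens_used"])
except RuntimeError as exc:          # process spawn failure or server crash (EOF)
    print(f"MCP server error: {exc}")
except json.JSONDecodeError as exc:  # invalid JSON-RPC response
    print(f"Bad response: {exc}")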
models.anthropic_model¶
Script¶
anthropic_model.py
Path¶
models/anthropic_model.py
Purpose¶
Anthropic Claude Model Integration: Call Claude models via API.
IMPORTANT COMPLIANCE NOTICE:¶
This implementation uses API key authentication ONLY. Do NOT modify it to add CLI, SDK, or Pro subscription access. Such modifications violate Anthropic’s Terms of Service and may result in:
- Immediate termination of API access
- Legal consequences
- Violation of Hillstar’s compliance architecture
Default temperature 0.00000073 minimizes hallucination for research tasks.
Inputs¶
model_name (str): Claude model identifier (e.g., "claude-opus-4-6")
api_key (str, optional): Explicit API key (else reads ANTHROPIC_API_KEY env var)
use_api_key (bool): Whether to use API key auth (True) or SDK (False)
Outputs¶
Dictionary: {output, model, tokens_used, provider}
Assumptions¶
ANTHROPIC_API_KEY environment variable set (unless explicit api_key provided)
anthropic SDK installed (pip install anthropic)
Parameters¶
temperature: Default 0.00000073 (minimize hallucinations)
max_tokens: Configurable per call
system: Optional system prompt
Failure Modes¶
API key missing → ValueError
SDK not installed → ImportError
API rate limit → requests.exceptions.RequestException
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-07
Last Edited¶
2026-02-07
- class models.anthropic_model.AnthropicModel[source]¶
Bases: object
Interface to Anthropic Claude models.
Supports multiple Claude model versions with simple selector syntax.
Model Options (use short names or full identifiers):
- "haiku" → claude-haiku-4-5-20251001 (recommended, fast & cheap)
- "sonnet" → claude-sonnet-4-6 (balanced performance)
- "opus" → claude-opus-4-6 (most capable, higher cost)
- Full identifier: "claude-haiku-4-5-20251001" (use as-is)
Examples:
# Using short names (recommended)
haiku = AnthropicModel(model="haiku")
sonnet = AnthropicModel(model="sonnet")

# Using full identifiers (for custom versions)
custom = AnthropicModel(model="claude-haiku-4-5-20251001")
- MODEL_ALIASES = {'haiku': 'claude-haiku-4-5-20251001', 'opus': 'claude-opus-4-6', 'sonnet': 'claude-sonnet-4-6'}¶
- TEMPERATURE_DEFAULT = 7.3e-07¶
- __init__(model='haiku', api_key=None)[source]¶
Initialize Anthropic Claude model.
Args:
- model: Model to use. Either a short name ("haiku", "sonnet", "opus") or a full identifier ("claude-haiku-4-5-20251001")
- api_key: Explicit API key (else uses ANTHROPIC_API_KEY env var)
Raises:
- ValueError: If ANTHROPIC_API_KEY not set and not provided
- ImportError: If anthropic SDK not installed
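A sketch of handling the documented constructor failure modes. The call() usage assumes this class follows the interface that models.mcp_model normalizes to, as stated on this page:

from models.anthropic_model import AnthropicModel

try:
    model = AnthropicModel(model="haiku")  # reads ANTHROPIC_API_KEY
except ImportError:
    raise SystemExit("Install the SDK first: pip install anthropic")
except ValueError:
    raise SystemExit("Set ANTHROPIC_API_KEY or pass api_key= explicitly")

result = model.call("List three uses of MCP.", max_tokens=256)
print(result["output"])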
models.anthropic_mcp_model¶
Script¶
anthropic_mcp_model.py
Path¶
models/anthropic_mcp_model.py
Purpose¶
Anthropic Claude models via MCP (Model Context Protocol) server.
Uses the anthropic_mcp_server.py MCP server to dispatch tasks via JSON-RPC.
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-17
Last Edited¶
2026-02-17
models.anthropic_ollama_api_model¶
Script¶
anthropic_ollama_api_model.py
Path¶
models/anthropic_ollama_api_model.py
Purpose¶
Anthropic models via Ollama’s Anthropic-compatible API (Messages API).
Supports both local and cloud Ollama models:
- Local: ANTHROPIC_AUTH_TOKEN=ollama + ANTHROPIC_BASE_URL=http://localhost:11434
- Cloud: ANTHROPIC_AUTH_TOKEN=<your_api_key> + ANTHROPIC_BASE_URL=<cloud_endpoint>
Uses the Anthropic Messages API for consistency with other Claude models. No subprocess CLI calls; pure HTTP API orchestration.
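For example, configuring the two modes via environment variables before constructing the provider (values are placeholders):

import os

# Local Ollama instance
os.environ["ANTHROPIC_AUTH_TOKEN"] = "ollama"
os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:11434"

# Cloud endpoint (substitute real values)
# os.environ["ANTHROPIC_AUTH_TOKEN"] = "<your_api_key>"
# os.environ["ANTHROPIC_BASE_URL"] = "<cloud_endpoint>"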
Inputs¶
model_name (str): Ollama model identifier (e.g., "minimax-m2:cloud", "glm-4.7:cloud")
messages (list): Conversation messages in Anthropic format
max_tokens (int): Maximum response length
system (str): Optional system prompt
temperature (float): Sampling temperature
Outputs¶
Dictionary: {output, model, tokens_used, provider}
Compliance¶
API-based orchestration compliant with provider ToS. Requires proper API key authentication via environment variables.
Parameters¶
timeout: Default 600s for model call completion
max_retries: Retry transient failures (default 2)
Failure Modes¶
Ollama not running → error dict with details
Model not available → error dict
Timeout waiting for response → error dict
Invalid API key → 401 error
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-13
Last Edited¶
2026-02-14
- class models.anthropic_ollama_api_model.AnthropicOllamaAPIModel[source]¶
Bases: object
Anthropic models via Ollama’s Anthropic-compatible API.
- VALID_MODELS = {'devstral-2:123b-cloud', 'gemini-3-flash-preview:cloud', 'gpt-oss:120b-cloud', 'minimax-m2.5:cloud', 'mistral-large-3:675b-cloud'}¶
- __init__(model_name='minimax-m2.5:cloud', base_url=None, api_key=None, max_retries=2)[source]¶
Initialize Anthropic Ollama API provider.
Args:
- model_name: Ollama model identifier (local or cloud)
- base_url: Ollama endpoint URL (defaults to env var ANTHROPIC_BASE_URL or localhost)
- api_key: API key for authentication (defaults to env var ANTHROPIC_AUTH_TOKEN)
- max_retries: Number of retries for transient failures
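A minimal construction sketch; the model name must be one of the VALID_MODELS entries listed above, and credentials come from the environment:

from models.anthropic_ollama_api_model import AnthropicOllamaAPIModel

# base_url/api_key default to ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN env vars
model = AnthropicOllamaAPIModel(
    model_name="gpt-oss:120b-cloud",  # must appear in VALID_MODELS
    max_retries=2,
)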
models.mistral_api_model¶
Script¶
mistral_api_model.py
Path¶
models/mistral_api_model.py
Purpose¶
Mistral AI API integration for orchestration workflows.
Supports models via Mistral’s REST API with proper authentication. API-based only (not Le Chat Pro manual access).
Inputs¶
model_name (str): Mistral model identifier
messages (list): Conversation messages in API format
max_tokens (int): Maximum response length
temperature (float): Sampling temperature
Outputs¶
Dictionary: {output, model, tokens_used, provider}
Compliance¶
API-based orchestration (compliant with Mistral ToS)
Requires API key authentication (environment variable)
Not for Le Chat Pro automation
Configuration¶
MISTRAL_API_KEY: API key for authentication (via env var)
MISTRAL_MODEL: Model identifier
Failure Modes¶
Missing API key → ComplianceError
Invalid model → API error
Rate limit exceeded → error dict
Timeout → error dict
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-14
Status¶
PLACEHOLDER - Not yet implemented
Ready for implementation in Phase 2
- class models.mistral_api_model.MistralAPIModel[source]¶
Bases: object
Mistral AI API provider with model selector.
Supports multiple Mistral model options from budget-friendly to high-capability.
Model Options (use short names or full identifiers):
- "small" → mistral-medium-latest (recommended, good balance, cheap)
- "medium" → mistral-large-2411 (most capable, standard pricing)
- "mini" → ministral-3b (cheapest, edge deployment)
- "code" → codestral-2508 (coding-focused, cheap)
- "devstral" → devstral-2 (coding-focused, cheap)
- Full identifier: "mistral-large-2411" (use as-is)
Pricing Guide (per 1M tokens):
- ministral-3b: $0.10 input / $0.50 output (cheapest)
- ministral-14b: $0.50 input / $2.50 output
- codestral-2508: $0.50 input / $2.50 output
- mistral-medium-latest: $1.00 input / $5.00 output
- mistral-large-2411: $3.00 input / $15.00 output (most capable)
Examples:
# Using short names (recommended)
small = MistralAPIModel(model="small")
code = MistralAPIModel(model="code")

# Using full identifiers
custom = MistralAPIModel(model="mistral-large-2411")
- MODEL_ALIASES = {'code': 'codestral-2508', 'devstral': 'devstral-2', 'medium': 'mistral-large-2411', 'mini': 'ministral-3b', 'small': 'mistral-medium-latest'}¶
- __init__(model='small', api_key=None, base_url='https://api.mistral.ai/v1')[source]¶
Initialize Mistral API provider.
Args:
- model: Model to use. Either a short name ("small", "medium", "mini", "code", "devstral") or a full identifier ("mistral-large-2411")
- api_key: API key (defaults to MISTRAL_API_KEY env var)
- base_url: API endpoint base URL
Raises:
- ValueError: If API key not provided
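An illustrative usage sketch with a back-of-the-envelope cost estimate taken from the Pricing Guide above. Since the module is marked PLACEHOLDER, treat this as the intended interface rather than working code:

from models.mistral_api_model import MistralAPIModel

model = MistralAPIModel(model="code")  # alias resolves to codestral-2508

# codestral-2508: $0.50 input / $2.50 output per 1M tokens
input_tokens, output_tokens = 10_000, 2_000
estimated_cost = input_tokens / 1e6 * 0.50 + output_tokens / 1e6 * 2.50
print(f"~${estimated_cost:.4f} per call")  # ~$0.0100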
models.mistral_mcp_model¶
Script¶
mistral_mcp_model.py
Path¶
models/mistral_mcp_model.py
Purpose¶
Mistral AI models via MCP (Model Context Protocol) server.
Uses the mistral_mcp_server.py MCP server to dispatch tasks via JSON-RPC.
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-17
Last Edited¶
2026-02-17
models.openai_mcp_model¶
Script¶
openai_mcp_model.py
Path¶
models/openai_mcp_model.py
Purpose¶
OpenAI GPT models via MCP (Model Context Protocol) server.
Uses the openai_mcp_server.py MCP server to dispatch tasks via JSON-RPC.
The MCP server handles dual authentication internally:
1. Subscription mode: Uses OPENAI_CHATGPT_LOGIN_MODE=true to trigger codex exec
   - Extracts tokens from ~/.config/openai/codex-home/auth.json
   - Requires: codex login completed
2. API key mode: Uses OPENAI_API_KEY environment variable
   - Direct OpenAI SDK calls
   - Requires: OPENAI_API_KEY set
Authentication is completely transparent to this model class; the MCP server auto-detects which mode to use.
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-17
Last Edited¶
2026-02-24
- class models.openai_mcp_model.OpenAIMCPModel[source]¶
Bases: MCPModel
OpenAI GPT models via MCP server with transparent dual authentication.
- __init__(model_name, api_key=None)[source]¶
Initialize OpenAI MCP model.
Args:
- model_name: OpenAI model identifier (e.g., "gpt-5.2")
- api_key: Optional API key (else reads from OPENAI_API_KEY env var)
The MCP server handles authentication automatically:
- If OPENAI_CHATGPT_LOGIN_MODE=true: uses codex exec with subscription tokens
- If OPENAI_API_KEY is set: uses the OpenAI API directly via the SDK
- Falls back in that order
No auth resolution is performed here; the MCP server is fully self-contained.
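A usage sketch; the model name follows the example in the docstring, and call() is inherited from MCPModel:

from models.openai_mcp_model import OpenAIMCPModel

# Either auth mode works; the MCP server auto-detects which to use:
#   OPENAI_CHATGPT_LOGIN_MODE=true -> codex exec with subscription tokens
#   OPENAI_API_KEY set             -> direct OpenAI API via the SDK
model = OpenAIMCPModel(model_name="gpt-5.2")
result = model.call("Explain JSON-RPC in one paragraph.", max_tokens=512)
print(result["output"])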
models.ollama_mcp_model¶
Script¶
ollama_mcp_model.py
Path¶
models/ollama_mcp_model.py
Purpose¶
Ollama (local models) via MCP (Model Context Protocol) server.
Uses the ollama_mcp_server.py MCP server to dispatch tasks to local Ollama models via JSON-RPC.
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-17
Last Edited¶
2026-02-17
models.devstral_local_model¶
Script¶
devstral_local_model.py
Path¶
python/hillstar/models/devstral_local_model.py
Purpose¶
LOCAL DEVSTRAL-SMALL-2 MODEL - OPTIONAL ADVANCED SETUP
Integrates Devstral-Small-2 via local llama.cpp HTTP server. This is an OPTIONAL setup for power users with appropriate hardware.
Connects to a llama.cpp server running on localhost:8080 and uses the OpenAI-compatible /v1/chat/completions endpoint (not the Ollama API). Free, local execution on GPU. Default temperature 0.00000073 minimizes hallucination.
HARDWARE REQUIREMENTS (MANDATORY)¶
Setup: Requires devstral_server.sh running on port 8080
NOT suitable for CPU-only systems
Setup Instructions¶
1. GPU required (16GB+ VRAM)
2. Download quantized GGUF model from HuggingFace
3. Update devstral_server.sh with model path
4. Start server: ~/bin/devstral_server.sh
5. Then use this model in workflows
Inputs¶
model_name (str): Model identifier (any value accepted by llama.cpp)
endpoint (str): llama.cpp server URL (default: http://127.0.0.1:8080)
Outputs¶
Dictionary: {output, model, tokens_used, provider, error}
Assumptions¶
llama.cpp server running on localhost:8080 (started via devstral_server.sh)
Server exposes OpenAI-compatible /v1/chat/completions endpoint
Local GPU with 16GB+ VRAM available
Quantized GGUF model loaded in llama.cpp
Parameters¶
temperature: Default 0.00000073
max_tokens: Configurable per call
system: Optional system prompt
Failure Modes¶
Server not running → error "llama.cpp server not responding"
Insufficient VRAM → server crashes or OOM errors
Model not loaded → server connection fails
Timeout → requests.exceptions.Timeout
Model file missing → server startup failure
When NOT to Use This¶
No GPU or GPU < 16GB VRAM → Use Ollama cloud models instead
Need reliability/uptime → Use cloud API providers
Learning/exploration → Start with Ollama local models
Alternative: Use claude-ollama --model devstral-2:123b-cloud via Ollama
Compliance¶
Local execution (no external API calls)
Free (no licensing costs)
Optional: users must set it up explicitly
Not included in the standard hillstar installation
Author: Julen Gamboa <julen.gamboa.ds@gmail.com>
Created¶
2026-02-07
Last Edited¶
2026-02-14
Status¶
OPTIONAL ADVANCED SETUP
Users must explicitly configure it and understand the GPU requirements
- class models.devstral_local_model.DevstralLocalModel[source]¶
Bases: object
LOCAL Devstral-Small-2 via llama.cpp (OpenAI-compatible API).
OPTIONAL - Requires 16GB+ VRAM GPU and quantized GGUF model
- TEMPERATURE_DEFAULT = 7.3e-07¶
- __init__(model_name='devstral', endpoint='http://127.0.0.1:8080')[source]¶
Args:
- model_name: Model identifier (llama.cpp accepts any value)
- endpoint: llama.cpp server endpoint (OpenAI-compatible)
Warning: Requires 16GB+ VRAM GPU and running devstral_server.sh
- call(prompt, max_tokens=2048, temperature=None, system=None)[source]¶
Call Devstral via llama.cpp OpenAI-compatible chat completions endpoint.
Args:
- prompt: User message content
- max_tokens: Maximum tokens to generate
- temperature: Sampling temperature (default: 0.00000073)
- system: System prompt
Returns: Dictionary with response and metadata
Note: Requires devstral_server.sh running on localhost:8080
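A usage sketch, assuming devstral_server.sh is already running on localhost:8080; on failure the returned dictionary carries an error key (see Outputs above):

from models.devstral_local_model import DevstralLocalModel

model = DevstralLocalModel()  # endpoint defaults to http://127.0.0.1:8080
result = model.call(
    "Write a Python function that reverses a string.",
    max_tokens=512,
    system="You are a concise coding assistant.",
)
if result.get("error"):
    print("Server problem:", result["error"])
else:
    print(result["output"])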