LLM Model Providers

LLM Model Provider Integrations for Hillstar Orchestrator.

Unified interface to multiple LLM providers with consistent credential handling (environment variables only, no embedded keys).

Supported providers:
  • anthropic: Anthropic Claude (cloud API)

  • openai: OpenAI GPT (cloud API)

  • anthropic_ollama: Anthropic via Ollama (local proxy)

  • ollama: Local Ollama models

  • devstral_local: Devstral local (GPU required)

  • google_ai_studio: Google Gemini (API key auth)

  • mistral: Mistral AI (cloud API)
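
All providers expose the same call(prompt, ...) → dict contract, so one can be swapped for another behind a single variable. A minimal sketch of that unified interface, assuming the classes are imported from the models package documented below:

    # Sketch: any provider can stand in for any other -- all expose call() -> dict.
    # Credentials come from environment variables only; never embed keys.
    from models import AnthropicModel

    model = AnthropicModel(model="haiku")  # reads ANTHROPIC_API_KEY from the env
    result = model.call("Summarize the Hillstar orchestrator in one sentence.")
    print(result["output"], result["tokens_used"], result["provider"])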

class models.AnthropicOllamaAPIModel[source]

Bases: object

Anthropic models via Ollama’s Anthropic-compatible API.

VALID_MODELS = {'devstral-2:123b-cloud', 'gemini-3-flash-preview:cloud', 'gpt-oss:120b-cloud', 'minimax-m2.5:cloud', 'mistral-large-3:675b-cloud'}
__init__(model_name='minimax-m2.5:cloud', base_url=None, api_key=None, max_retries=2)[source]

Initialize Anthropic Ollama API provider.

Args:
  • model_name: Ollama model identifier (local or cloud)
  • base_url: Ollama endpoint URL (defaults to env var ANTHROPIC_BASE_URL or localhost)
  • api_key: API key for authentication (defaults to env var ANTHROPIC_AUTH_TOKEN)
  • max_retries: Number of retries for transient failures

Parameters:
  • model_name (str)

  • base_url (str | None)

  • api_key (str | None)

  • max_retries (int)

call(prompt, **kwargs)[source]

Call model via Ollama’s Anthropic-compatible API.

Args:
  • prompt: Input prompt text
  • **kwargs: Additional parameters (max_tokens, temperature, system, etc.)

Returns: Dictionary with response and metadata

Parameters:

prompt (str)

Return type:

dict[str, Any]
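
A usage sketch, assuming ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN are already set as described in __init__ above:

    # Sketch: call a cloud Ollama model through the Anthropic-compatible API.
    from models import AnthropicOllamaAPIModel

    model = AnthropicOllamaAPIModel(model_name="minimax-m2.5:cloud")
    result = model.call(
        "Explain JSON-RPC in two sentences.",
        max_tokens=512,          # forwarded via **kwargs
        system="Answer tersely.",
    )
    print(result["output"])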

class models.AnthropicModel[source]

Bases: object

Interface to Anthropic Claude models.

Supports multiple Claude model versions with simple selector syntax.

Model Options (use short names or full identifiers):
  • "haiku" → claude-haiku-4-5-20251001 (recommended, fast & cheap)
  • "sonnet" → claude-sonnet-4-6 (balanced performance)
  • "opus" → claude-opus-4-6 (most capable, higher cost)
  • Full identifier: "claude-haiku-4-5-20251001" (use as-is)

Examples:

    # Using short names (recommended)
    haiku = AnthropicModel(model="haiku")
    sonnet = AnthropicModel(model="sonnet")

    # Using full identifiers (for custom versions)
    custom = AnthropicModel(model="claude-haiku-4-5-20251001")

MODEL_ALIASES = {'haiku': 'claude-haiku-4-5-20251001', 'opus': 'claude-opus-4-6', 'sonnet': 'claude-sonnet-4-6'}
TEMPERATURE_DEFAULT = 7.3e-07
__init__(model='haiku', api_key=None)[source]

Initialize Anthropic Claude model.

Args:
  • model: Model to use. Either a short name ("haiku", "sonnet", "opus") or a full identifier ("claude-haiku-4-5-20251001")
  • api_key: Explicit API key (else uses ANTHROPIC_API_KEY env var)

Raises:
  • ValueError: If ANTHROPIC_API_KEY not set and not provided
  • ImportError: If anthropic SDK not installed

Parameters:
  • model (str)

  • api_key (str | None)

call(prompt, max_tokens=4096, temperature=None, system=None)[source]

Call Claude model.

Args:
  • prompt: Input prompt
  • max_tokens: Maximum tokens to generate
  • temperature: Accepted for interface compatibility but ignored by this wrapper
  • system: System prompt

Returns: Dictionary with response and metadata

Parameters:
  • prompt (str)

  • max_tokens (int)

  • temperature (float | None)

  • system (str | None)

Return type:

dict[str, Any]
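
A usage sketch of the short-name selector and the metadata fields in the returned dictionary (field names per the Outputs section of models.anthropic_model below):

    # Sketch: call Claude with a system prompt and inspect the metadata.
    from models import AnthropicModel

    haiku = AnthropicModel(model="haiku")
    result = haiku.call(
        "List three uses of a message queue.",
        max_tokens=1024,
        system="You are a concise technical writer.",
    )
    print(result["model"])        # model identifier used for the call
    print(result["tokens_used"])  # usage metadata for cost tracking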

class models.DevstralLocalModel[source]

Bases: object

LOCAL Devstral-Small-2 via llama.cpp (OpenAI-compatible API).

OPTIONAL - Requires 16GB+ VRAM GPU and quantized GGUF model

TEMPERATURE_DEFAULT = 7.3e-07
__init__(model_name='devstral', endpoint='http://127.0.0.1:8080')[source]

Args:
  • model_name: Model identifier (llama.cpp accepts any value)
  • endpoint: llama.cpp server endpoint (OpenAI-compatible)

Warning: Requires 16GB+ VRAM GPU and running devstral_server.sh

Parameters:
  • model_name (str)

  • endpoint (str)

call(prompt, max_tokens=2048, temperature=None, system=None)[source]

Call Devstral via llama.cpp OpenAI-compatible chat completions endpoint.

Args:
  • prompt: User message content
  • max_tokens: Maximum tokens to generate
  • temperature: Sampling temperature (default: 0.00000073)
  • system: System prompt

Returns: Dictionary with response and metadata

Note: Requires devstral_server.sh running on localhost:8080

Parameters:
  • prompt (str)

  • max_tokens (int)

  • temperature (float | None)

  • system (str | None)

Return type:

dict[str, Any]
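
A usage sketch, assuming the llama.cpp server started by devstral_server.sh is already listening on localhost:8080:

    # Sketch: query the local llama.cpp server (must already be running).
    from models import DevstralLocalModel

    devstral = DevstralLocalModel()  # defaults: model_name="devstral", 127.0.0.1:8080
    result = devstral.call("Write a Python one-liner that reverses a string.")
    if result.get("error"):          # local-server failures surface as error dicts
        print("server unavailable:", result["error"])
    else:
        print(result["output"])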

class models.MistralAPIModel[source]

Bases: object

Mistral AI API provider with model selector.

Supports multiple Mistral model options from budget-friendly to high-capability.

Model Options (use short names or full identifiers):
  • "small" → mistral-medium-latest (recommended, good balance, cheap)
  • "medium" → mistral-large-2411 (most capable, standard pricing)
  • "mini" → ministral-3b (cheapest, edge deployment)
  • "code" → codestral-2508 (coding-focused, cheap)
  • "devstral" → devstral-2 (coding-focused, cheap)
  • Full identifier: "mistral-large-2411" (use as-is)

Pricing Guide (per 1M tokens):
  • ministral-3b: $0.10 input / $0.50 output (cheapest)
  • ministral-14b: $0.50 input / $2.50 output
  • codestral-2508: $0.50 input / $2.50 output
  • mistral-medium-latest: $1.00 input / $5.00 output
  • mistral-large-2411: $3.00 input / $15.00 output (most capable)
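
To make the pricing concrete, a back-of-the-envelope cost check (rates copied from the guide above; a sketch, not a billing tool):

    # Estimate job cost: rates are USD per 1M tokens, from the Pricing Guide.
    RATES = {"codestral-2508": (0.5, 2.5), "mistral-large-2411": (3.0, 15.0)}

    def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        in_rate, out_rate = RATES[model]
        return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

    # 200k in + 50k out on codestral-2508: 0.1 + 0.125 = $0.225
    print(f"${estimate_cost('codestral-2508', 200_000, 50_000):.3f}")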

Examples:

    # Using short names (recommended)
    small = MistralAPIModel(model="small")
    code = MistralAPIModel(model="code")

    # Using full identifiers
    custom = MistralAPIModel(model="mistral-large-2411")

MODEL_ALIASES = {'code': 'codestral-2508', 'devstral': 'devstral-2', 'medium': 'mistral-large-2411', 'mini': 'ministral-3b', 'small': 'mistral-medium-latest'}
__init__(model='small', api_key=None, base_url='https://api.mistral.ai/v1')[source]

Initialize Mistral API provider.

Args:
  • model: Model to use. Either a short name ("small", "medium", "mini", "code", "devstral") or a full identifier ("mistral-large-2411")
  • api_key: API key (defaults to MISTRAL_API_KEY env var)
  • base_url: API endpoint base URL

Raises: ValueError: If API key not provided

Parameters:
  • model (str)

  • api_key (str | None)

  • base_url (str)

call(prompt, messages=None, **kwargs)[source]

Call Mistral API (placeholder - not implemented).

Args:
  • prompt: User prompt
  • messages: Message history
  • **kwargs: Additional parameters

Returns: Dictionary with response (not implemented)

Status: PLACEHOLDER - raises NotImplementedError

Parameters:
  • prompt (str)

  • messages (list | None)

Return type:

Dict[str, Any]

class models.MCPModel[source]

Bases: object

Base class for MCP-based model providers.

TEMPERATURE_DEFAULT = 7.3e-07
__init__(provider, model_name, server_script, api_key=None)[source]

Initialize MCP model.

Args:
  • provider: Provider name (e.g., "anthropic_mcp")
  • model_name: Model identifier (e.g., "claude-opus-4-6")
  • server_script: Path to MCP server script (relative to repo root)
  • api_key: Optional API key (else reads from environment)

Parameters:
  • provider (str)

  • model_name (str)

  • server_script (str)

  • api_key (str | None)

call(prompt, max_tokens=4096, temperature=None, system=None)[source]

Execute task via MCP server.

Matches AnthropicModel.call() interface for compatibility.

Args:
  • prompt: Input prompt
  • max_tokens: Maximum tokens to generate
  • temperature: Sampling temperature (unused for MCP servers)
  • system: System prompt (unused for MCP servers)

Returns: Dictionary with response and metadata

Parameters:
  • prompt (str)

  • max_tokens (int)

  • temperature (float | None)

  • system (str | None)

Return type:

dict[str, Any]

__del__()[source]

Clean up the MCP server subprocess on deletion.

class models.AnthropicMCPModel[source]

Bases: MCPModel

Anthropic Claude models via MCP server.

__init__(model_name, api_key=None)[source]

Initialize Anthropic MCP model.

Args:
  • model_name: Claude model identifier
  • api_key: Optional API key (else uses ANTHROPIC_API_KEY env var)

Parameters:
  • model_name (str)

  • api_key (str | None)

class models.OpenAIMCPModel[source]

Bases: MCPModel

OpenAI GPT models via MCP server with transparent dual authentication.

__init__(model_name, api_key=None)[source]

Initialize OpenAI MCP model.

Args:
  • model_name: OpenAI model identifier (e.g., "gpt-5.2")
  • api_key: Optional API key (else reads from OPENAI_API_KEY env var)

The MCP server handles authentication automatically:
  • If OPENAI_CHATGPT_LOGIN_MODE=true: uses codex exec with subscription tokens
  • If OPENAI_API_KEY is set: uses the OpenAI API directly via the SDK
  • Falls back in that order

No auth resolution is performed here—the MCP server is fully self-contained.

Parameters:
  • model_name (str)

  • api_key (str | None)

class models.MistralMCPModel[source]

Bases: MCPModel

Mistral AI models via MCP server.

__init__(model_name, api_key=None)[source]

Initialize Mistral MCP model.

Args:
  • model_name: Mistral model identifier
  • api_key: Optional API key (else uses MISTRAL_API_KEY env var)

Parameters:
  • model_name (str)

  • api_key (str | None)

class models.OllamaMCPModel[source]

Bases: MCPModel

Ollama local models via MCP server.

__init__(model_name)[source]

Initialize Ollama MCP model.

Args:
  • model_name: Ollama model identifier (e.g., "devstral-small-2:24b")

Parameters:

model_name (str)

models.mcp_model

Script

mcp_model.py

Path

models/mcp_model.py

Purpose

Base class for MCP-based model providers: Handle subprocess lifecycle and JSON-RPC communication.

Provides unified interface to MCP servers (stdio-based) with automatic initialization, error handling, and response normalization to match AnthropicModel.call() interface.
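
The framing is ordinary JSON-RPC 2.0 over the child process's stdin/stdout. A minimal sketch of that exchange; the tool name "call_model", the argument fields, and the server path are illustrative, not the servers' exact schema:

    # Sketch: the JSON-RPC exchange an MCPModel drives over the server's stdio.
    import json
    import subprocess

    proc = subprocess.Popen(
        ["python", "mcp-server/example_mcp_server.py"],  # hypothetical server path
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )

    def rpc(req_id: int, method: str, params: dict) -> dict:
        request = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        line = proc.stdout.readline()   # EOF here is surfaced as RuntimeError
        return json.loads(line)         # malformed output -> json.JSONDecodeError

    rpc(1, "initialize", {})            # standard MCP initialize
    reply = rpc(2, "tools/call", {"name": "call_model",
                                  "arguments": {"prompt": "ping"}})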

Inputs

  • provider (str): Provider name (e.g., "anthropic_mcp")
  • model_name (str): Model identifier
  • server_script (str): Path to MCP server script
  • api_key (str, optional): API key for the provider

Outputs

Dictionary: {output, model, tokens_used, provider}

Assumptions

  • MCP server script exists and is executable

  • Server implements standard MCP protocol (initialize, tools/call)

  • run_with_env.sh wrapper is available in mcp-server/

Failure Modes

  • Process spawn fails → RuntimeError

  • MCP server crashes → RuntimeError (EOF on stdout)

  • Invalid JSON response → json.JSONDecodeError

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-17

Last Edited

2026-02-17

class models.mcp_model.MCPModel[source]

Bases: object

Base class for MCP-based model providers.

TEMPERATURE_DEFAULT = 7.3e-07
__init__(provider, model_name, server_script, api_key=None)[source]

Initialize MCP model.

Args:
  • provider: Provider name (e.g., "anthropic_mcp")
  • model_name: Model identifier (e.g., "claude-opus-4-6")
  • server_script: Path to MCP server script (relative to repo root)
  • api_key: Optional API key (else reads from environment)

Parameters:
  • provider (str)

  • model_name (str)

  • server_script (str)

  • api_key (str | None)

call(prompt, max_tokens=4096, temperature=None, system=None)[source]

Execute task via MCP server.

Matches AnthropicModel.call() interface for compatibility.

Args:
  • prompt: Input prompt
  • max_tokens: Maximum tokens to generate
  • temperature: Sampling temperature (unused for MCP servers)
  • system: System prompt (unused for MCP servers)

Returns: Dictionary with response and metadata

Parameters:
  • prompt (str)

  • max_tokens (int)

  • temperature (float | None)

  • system (str | None)

Return type:

dict[str, Any]

__del__()[source]

Clean up the MCP server subprocess on deletion.

models.anthropic_model

Script

anthropic_model.py

Path

models/anthropic_model.py

Purpose

Anthropic Claude Model Integration: Call Claude models via API.

IMPORTANT COMPLIANCE NOTICE:

This implementation uses API key authentication ONLY. Do NOT modify to add CLI, SDK, or Pro subscription access. Such modifications violate Anthropic’s Terms of Service and may result in:
  • Immediate termination of API access
  • Legal consequences
  • Violation of Hillstar’s compliance architecture

Default temperature 0.00000073 minimizes hallucination for research tasks.

Inputs

  • model_name (str): Claude model identifier (e.g., "claude-opus-4-6")
  • api_key (str, optional): Explicit API key (else reads ANTHROPIC_API_KEY env var)
  • use_api_key (bool): Whether to use API key auth (True) or SDK (False)

Outputs

Dictionary: {output, model, tokens_used, provider}

Assumptions

  • ANTHROPIC_API_KEY environment variable set (unless explicit api_key provided)

  • anthropic SDK installed (pip install anthropic)

Parameters

  • temperature: Default 0.00000073 (minimize hallucinations)
  • max_tokens: Configurable per call
  • system: Optional system prompt

Failure Modes

  • API key missing → ValueError

  • SDK not installed → ImportError

  • API rate limit → requests.exceptions.RequestException

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-07

Last Edited

2026-02-07

class models.anthropic_model.AnthropicModel[source]

Bases: object

Interface to Anthropic Claude models.

Supports multiple Claude model versions with simple selector syntax.

Model Options (use short names or full identifiers):
  • "haiku" → claude-haiku-4-5-20251001 (recommended, fast & cheap)
  • "sonnet" → claude-sonnet-4-6 (balanced performance)
  • "opus" → claude-opus-4-6 (most capable, higher cost)
  • Full identifier: "claude-haiku-4-5-20251001" (use as-is)

Examples:

    # Using short names (recommended)
    haiku = AnthropicModel(model="haiku")
    sonnet = AnthropicModel(model="sonnet")

    # Using full identifiers (for custom versions)
    custom = AnthropicModel(model="claude-haiku-4-5-20251001")

MODEL_ALIASES = {'haiku': 'claude-haiku-4-5-20251001', 'opus': 'claude-opus-4-6', 'sonnet': 'claude-sonnet-4-6'}
TEMPERATURE_DEFAULT = 7.3e-07
__init__(model='haiku', api_key=None)[source]

Initialize Anthropic Claude model.

Args:
  • model: Model to use. Either a short name ("haiku", "sonnet", "opus") or a full identifier ("claude-haiku-4-5-20251001")
  • api_key: Explicit API key (else uses ANTHROPIC_API_KEY env var)

Raises:
  • ValueError: If ANTHROPIC_API_KEY not set and not provided
  • ImportError: If anthropic SDK not installed

Parameters:
  • model (str)

  • api_key (str | None)

call(prompt, max_tokens=4096, temperature=None, system=None)[source]

Call Claude model.

Args:
  • prompt: Input prompt
  • max_tokens: Maximum tokens to generate
  • temperature: Accepted for interface compatibility but ignored by this wrapper
  • system: System prompt

Returns: Dictionary with response and metadata

Parameters:
  • prompt (str)

  • max_tokens (int)

  • temperature (float | None)

  • system (str | None)

Return type:

dict[str, Any]

models.anthropic_mcp_model

Script

anthropic_mcp_model.py

Path

models/anthropic_mcp_model.py

Purpose

Anthropic Claude models via MCP (Model Context Protocol) server.

Uses the anthropic_mcp_server.py MCP server to dispatch tasks via JSON-RPC.

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-17

Last Edited

2026-02-17

class models.anthropic_mcp_model.AnthropicMCPModel[source]

Bases: MCPModel

Anthropic Claude models via MCP server.

__init__(model_name, api_key=None)[source]

Initialize Anthropic MCP model.

Args: model_name: Claude model identifier api_key: Optional API key (else uses ANTHROPIC_API_KEY env var)

Parameters:
  • model_name (str)

  • api_key (str | None)

models.anthropic_ollama_api_model

Script

anthropic_ollama_api_model.py

Path

models/anthropic_ollama_api_model.py

Purpose

Anthropic models via Ollama’s Anthropic-compatible API (Messages API).

Supports both local and cloud Ollama models:
  • Local: ANTHROPIC_AUTH_TOKEN=ollama + ANTHROPIC_BASE_URL=http://localhost:11434
  • Cloud: ANTHROPIC_AUTH_TOKEN=<your_api_key> + ANTHROPIC_BASE_URL=<cloud_endpoint>
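
For instance, the local configuration can be expressed either through the environment or through explicit constructor arguments (a sketch; __init__ falls back to the env vars when the arguments are omitted):

    # Sketch: point the provider at a local Ollama daemon.
    import os

    from models import AnthropicOllamaAPIModel

    os.environ["ANTHROPIC_AUTH_TOKEN"] = "ollama"                # local sentinel value
    os.environ["ANTHROPIC_BASE_URL"] = "http://localhost:11434"
    model = AnthropicOllamaAPIModel()                            # picks up the env vars

    # Equivalent explicit form, bypassing the environment:
    model = AnthropicOllamaAPIModel(
        base_url="http://localhost:11434",
        api_key="ollama",
    )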

Uses the Anthropic Messages API for consistency with other Claude models. No subprocess CLI calls - pure HTTP API orchestration.

Inputs

  • model_name (str): Ollama model identifier (e.g., "minimax-m2:cloud", "glm-4.7:cloud")
  • messages (list): Conversation messages in Anthropic format
  • max_tokens (int): Maximum response length
  • system (str): Optional system prompt
  • temperature (float): Sampling temperature

Outputs

Dictionary: {output, model, tokens_used, provider}

Compliance

API-based orchestration compliant with provider ToS. Requires proper API key authentication via environment variables.

Parameters

  • timeout: Default 600s for model call completion
  • max_retries: Retry transient failures (default 2)

Failure Modes

  • Ollama not running → error dict with details

  • Model not available → error dict

  • Timeout waiting for response → error dict

  • Invalid API key → 401 error

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-13

Last Edited

2026-02-14

class models.anthropic_ollama_api_model.AnthropicOllamaAPIModel[source]

Bases: object

Anthropic models via Ollama’s Anthropic-compatible API.

VALID_MODELS = {'devstral-2:123b-cloud', 'gemini-3-flash-preview:cloud', 'gpt-oss:120b-cloud', 'minimax-m2.5:cloud', 'mistral-large-3:675b-cloud'}
__init__(model_name='minimax-m2.5:cloud', base_url=None, api_key=None, max_retries=2)[source]

Initialize Anthropic Ollama API provider.

Args:
  • model_name: Ollama model identifier (local or cloud)
  • base_url: Ollama endpoint URL (defaults to env var ANTHROPIC_BASE_URL or localhost)
  • api_key: API key for authentication (defaults to env var ANTHROPIC_AUTH_TOKEN)
  • max_retries: Number of retries for transient failures

Parameters:
  • model_name (str)

  • base_url (str | None)

  • api_key (str | None)

  • max_retries (int)

call(prompt, **kwargs)[source]

Call model via Ollama’s Anthropic-compatible API.

Args:
  • prompt: Input prompt text
  • **kwargs: Additional parameters (max_tokens, temperature, system, etc.)

Returns: Dictionary with response and metadata

Parameters:

prompt (str)

Return type:

dict[str, Any]

models.mistral_api_model

Script

mistral_api_model.py

Path

models/mistral_api_model.py

Purpose

Mistral AI API integration for orchestration workflows.

Supports models via Mistral’s REST API with proper authentication. API-based only (not Le Chat Pro manual access).

Inputs

  • model_name (str): Mistral model identifier
  • messages (list): Conversation messages in API format
  • max_tokens (int): Maximum response length
  • temperature (float): Sampling temperature

Outputs

Dictionary: {output, model, tokens_used, provider}

Compliance

  • API-based orchestration (compliant with Mistral ToS)
  • Requires API key authentication (environment variable)
  • Not for Le Chat Pro automation

Configuration

  • MISTRAL_API_KEY: API key for authentication (via env var)
  • MISTRAL_MODEL: Model identifier

Failure Modes

  • Missing API key → ComplianceError

  • Invalid model → API error

  • Rate limit exceeded → error dict

  • Timeout → error dict

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-14

Status

PLACEHOLDER - Not yet implemented. Ready for implementation in Phase 2.

class models.mistral_api_model.MistralAPIModel[source]

Bases: object

Mistral AI API provider with model selector.

Supports multiple Mistral model options from budget-friendly to high-capability.

Model Options (use short names or full identifiers):
  • "small" → mistral-medium-latest (recommended, good balance, cheap)
  • "medium" → mistral-large-2411 (most capable, standard pricing)
  • "mini" → ministral-3b (cheapest, edge deployment)
  • "code" → codestral-2508 (coding-focused, cheap)
  • "devstral" → devstral-2 (coding-focused, cheap)
  • Full identifier: "mistral-large-2411" (use as-is)

Pricing Guide (per 1M tokens):
  • ministral-3b: $0.10 input / $0.50 output (cheapest)
  • ministral-14b: $0.50 input / $2.50 output
  • codestral-2508: $0.50 input / $2.50 output
  • mistral-medium-latest: $1.00 input / $5.00 output
  • mistral-large-2411: $3.00 input / $15.00 output (most capable)

Examples:

    # Using short names (recommended)
    small = MistralAPIModel(model="small")
    code = MistralAPIModel(model="code")

    # Using full identifiers
    custom = MistralAPIModel(model="mistral-large-2411")

MODEL_ALIASES = {'code': 'codestral-2508', 'devstral': 'devstral-2', 'medium': 'mistral-large-2411', 'mini': 'ministral-3b', 'small': 'mistral-medium-latest'}
__init__(model='small', api_key=None, base_url='https://api.mistral.ai/v1')[source]

Initialize Mistral API provider.

Args:
  • model: Model to use. Either a short name ("small", "medium", "mini", "code", "devstral") or a full identifier ("mistral-large-2411")
  • api_key: API key (defaults to MISTRAL_API_KEY env var)
  • base_url: API endpoint base URL

Raises: ValueError: If API key not provided

Parameters:
  • model (str)

  • api_key (str | None)

  • base_url (str)

call(prompt, messages=None, **kwargs)[source]

Call Mistral API (placeholder - not implemented).

Args:
  • prompt: User prompt
  • messages: Message history
  • **kwargs: Additional parameters

Returns: Dictionary with response (not implemented)

Status: PLACEHOLDER - raises NotImplementedError

Parameters:
  • prompt (str)

  • messages (list | None)

Return type:

Dict[str, Any]

models.mistral_mcp_model

Script

mistral_mcp_model.py

Path

models/mistral_mcp_model.py

Purpose

Mistral AI models via MCP (Model Context Protocol) server.

Uses the mistral_mcp_server.py MCP server to dispatch tasks via JSON-RPC.

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-17

Last Edited

2026-02-17

class models.mistral_mcp_model.MistralMCPModel[source]

Bases: MCPModel

Mistral AI models via MCP server.

__init__(model_name, api_key=None)[source]

Initialize Mistral MCP model.

Args:
  • model_name: Mistral model identifier
  • api_key: Optional API key (else uses MISTRAL_API_KEY env var)

Parameters:
  • model_name (str)

  • api_key (str | None)

models.openai_mcp_model

Script

openai_mcp_model.py

Path

models/openai_mcp_model.py

Purpose

OpenAI GPT models via MCP (Model Context Protocol) server.

Uses the openai_mcp_server.py MCP server to dispatch tasks via JSON-RPC.

The MCP server handles dual authentication internally:

  1. Subscription mode: Uses OPENAI_CHATGPT_LOGIN_MODE=true to trigger codex exec

     • Extracts tokens from ~/.config/openai/codex-home/auth.json

     • Requires: codex login completed

  2. API key mode: Uses the OPENAI_API_KEY environment variable

     • Direct OpenAI SDK calls

     • Requires: OPENAI_API_KEY set

Authentication is completely transparent to this model class; the MCP server auto-detects which mode to use.
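
In practice, selecting a mode is just a matter of which environment variables are set before the model is constructed (a sketch; variable names as documented above):

    # Sketch: choose the auth mode via the environment, then construct as usual.
    import os

    from models import OpenAIMCPModel

    # Subscription mode: route through codex exec (requires prior `codex login`).
    os.environ["OPENAI_CHATGPT_LOGIN_MODE"] = "true"

    # ...or API key mode: clear the flag and provide a key from a secret store.
    # os.environ.pop("OPENAI_CHATGPT_LOGIN_MODE", None)
    # os.environ["OPENAI_API_KEY"] = "..."   # never hard-code real keys

    model = OpenAIMCPModel(model_name="gpt-5.2")  # server auto-detects the mode
    result = model.call("Say hello.")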

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-17

Last Edited

2026-02-24

class models.openai_mcp_model.OpenAIMCPModel[source]

Bases: MCPModel

OpenAI GPT models via MCP server with transparent dual authentication.

__init__(model_name, api_key=None)[source]

Initialize OpenAI MCP model.

Args:
  • model_name: OpenAI model identifier (e.g., "gpt-5.2")
  • api_key: Optional API key (else reads from OPENAI_API_KEY env var)

The MCP server handles authentication automatically:
  • If OPENAI_CHATGPT_LOGIN_MODE=true: uses codex exec with subscription tokens
  • If OPENAI_API_KEY is set: uses the OpenAI API directly via the SDK
  • Falls back in that order

No auth resolution is performed here—the MCP server is fully self-contained.

Parameters:
  • model_name (str)

  • api_key (str | None)

models.ollama_mcp_model

Script

ollama_mcp_model.py

Path

models/ollama_mcp_model.py

Purpose

Ollama (local models) via MCP (Model Context Protocol) server.

Uses the ollama_mcp_server.py MCP server to dispatch tasks to local Ollama models via JSON-RPC.

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-17

Last Edited

2026-02-17

class models.ollama_mcp_model.OllamaMCPModel[source]

Bases: MCPModel

Ollama local models via MCP server.

__init__(model_name)[source]

Initialize Ollama MCP model.

Args:
  • model_name: Ollama model identifier (e.g., "devstral-small-2:24b")

Parameters:

model_name (str)

models.devstral_local_model

Script

devstral_local_model.py

Path

python/hillstar/models/devstral_local_model.py

Purpose

LOCAL DEVSTRAL-SMALL-2 MODEL - OPTIONAL ADVANCED SETUP

Integrates Devstral-Small-2 via local llama.cpp HTTP server. This is an OPTIONAL setup for power users with appropriate hardware.

Connects to a llama.cpp server running on localhost:8080 via the OpenAI-compatible /v1/chat/completions endpoint (not the Ollama API). Free, local execution on GPU. Default temperature 0.00000073 minimizes hallucination.

HARDWARE REQUIREMENTS (MANDATORY)

Setup: Requires devstral_server.sh running on port 8080. NOT suitable for CPU-only systems.

Setup Instructions

  1. GPU required (16GB+ VRAM)

  2. Download quantized GGUF model from HuggingFace

  3. Update devstral_server.sh with model path

  4. Start server: ~/bin/devstral_server.sh

  5. Then use this model in workflows
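
Once step 4 is done, the server can be smoke-tested with a direct request to the OpenAI-compatible endpoint before wiring it into workflows (a sketch using the documented defaults):

    # Sketch: verify the llama.cpp server responds on the documented endpoint.
    import requests

    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",
        json={
            "model": "devstral",  # llama.cpp accepts any model string
            "messages": [{"role": "user", "content": "ping"}],
            "max_tokens": 16,
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])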

Inputs

  • model_name (str): Model identifier (any value accepted by llama.cpp)
  • endpoint (str): llama.cpp server URL (default: http://127.0.0.1:8080)

Outputs

Dictionary: {output, model, tokens_used, provider, error}

Assumptions

  • llama.cpp server running on localhost:8080 (started via devstral_server.sh)

  • Server exposes OpenAI-compatible /v1/chat/completions endpoint

  • Local GPU with 16GB+ VRAM available

  • Quantized GGUF model loaded in llama.cpp

Parameters

  • temperature: Default 0.00000073
  • max_tokens: Configurable per call
  • system: Optional system prompt

Failure Modes

  • Server not running → error "llama.cpp server not responding"

  • Insufficient VRAM → server crashes or OOM errors

  • Model not loaded → server connection fails

  • Timeout → requests.exceptions.Timeout

  • Model file missing → server startup failure

When NOT to Use This

  • No GPU, or GPU with < 16GB VRAM → use Ollama cloud models instead
  • Need reliability/uptime → use cloud API providers
  • Learning/exploration → start with Ollama local models

Alternative: Use claude-ollama --model devstral-2:123b-cloud via Ollama.

Compliance

  • Local execution (no external API calls)
  • Free (no licensing costs)
  • Optional - users must explicitly set up
  • Not included in standard hillstar installation

Author: Julen Gamboa <julen.gamboa.ds@gmail.com>

Created

2026-02-07

Last Edited

2026-02-14

Status

OPTIONAL ADVANCED SETUP. Users must explicitly configure it and understand the GPU requirements.

class models.devstral_local_model.DevstralLocalModel[source]

Bases: object

LOCAL Devstral-Small-2 via llama.cpp (OpenAI-compatible API).

OPTIONAL - Requires 16GB+ VRAM GPU and quantized GGUF model

TEMPERATURE_DEFAULT = 7.3e-07
__init__(model_name='devstral', endpoint='http://127.0.0.1:8080')[source]

Args:
  • model_name: Model identifier (llama.cpp accepts any value)
  • endpoint: llama.cpp server endpoint (OpenAI-compatible)

Warning: Requires 16GB+ VRAM GPU and running devstral_server.sh

Parameters:
  • model_name (str)

  • endpoint (str)

call(prompt, max_tokens=2048, temperature=None, system=None)[source]

Call Devstral via llama.cpp OpenAI-compatible chat completions endpoint.

Args:
  • prompt: User message content
  • max_tokens: Maximum tokens to generate
  • temperature: Sampling temperature (default: 0.00000073)
  • system: System prompt

Returns: Dictionary with response and metadata

Note: Requires devstral_server.sh running on localhost:8080

Parameters:
  • prompt (str)

  • max_tokens (int)

  • temperature (float | None)

  • system (str | None)

Return type:

dict[str, Any]