Large Language Models

Scrypted supports integrating with large language models running in the cloud or locally.

LLM Plugin

Install the LLM Plugin in Scrypted to get started, then click Add New to add a cloud or local large language model.

Cloud Provider

The LLM Plugin supports ChatGPT, Claude, and Gemini. You will need to bring your own API key, which requires setting up a billing or pay-as-you-go account with the provider. Choose the provider and enter the key.

The following models are recommended for their balance of price and performance:

Provider    Model
OpenAI      gpt-5.4-nano
Anthropic   claude-haiku-4.5
Gemini      gemini-3.1-flash-lite-preview
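Once a key is configured, the plugin talks to the provider on your behalf. To sanity-check a key outside of Scrypted, you can call the provider's API directly. A minimal sketch against OpenAI's chat-completions endpoint (the model name is taken from the table above; for Claude or Gemini, the endpoint and headers differ, so consult that provider's documentation):

```python
import json
import os
import urllib.request

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat-completions request for an OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    # Assumes the key is exported as OPENAI_API_KEY in your shell.
    req = build_request(os.environ["OPENAI_API_KEY"], "gpt-5.4-nano", "Say hello.")
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

If the key is valid, the request returns a normal completion; an invalid or unbilled key returns an HTTP 401/403 error, which usually points to an account setup problem rather than a Scrypted issue.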

Local Model

WARNING

Ollama and LM Studio do not provide fully OpenAI-compliant endpoints and may not work correctly. The llama.cpp server hosted by the LLM Plugin is preferred; vLLM and SGLang are also recommended.

Local models will require capable hardware:

  • Mac (16GB RAM, Apple Silicon)
  • NVIDIA GPU (12GB RAM)
  • AMD GPU (12GB RAM)
  • AMD AI MAX

TIP

Models can be run in Cluster Mode and offloaded to a GPU running on another machine or a Mac.

Local models are hosted using llama.cpp. The LLM plugin will install llama.cpp automatically when a new model is added to Scrypted.
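llama.cpp's bundled server exposes an OpenAI-compatible endpoint, which is why the same client code that works against a cloud provider also works locally. A minimal sketch, assuming the server is reachable on 127.0.0.1 port 8080 (the llama.cpp server default; the host and port Scrypted actually uses may differ in your setup):

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # assumed default; adjust to your deployment

def chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a request for llama.cpp's OpenAI-compatible chat endpoint."""
    # The server answers with whatever model it was launched with,
    # so no "model" field is required in the payload.
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    with urllib.request.urlopen(chat_request(BASE_URL, "Hello")) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because only the base URL changes between local and cloud deployments, the same integration code can be pointed at either backend.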

Which model the server can run depends on how much VRAM is available on the system.

Model             Min VRAM  Model ID
Qwen3.5-4B        5 GB      unsloth/Qwen3.5-4B-GGUF
Qwen3.5-9B        8 GB      unsloth/Qwen3.5-9B-GGUF
Gemma 4 E4B       8 GB      unsloth/gemma-4-E4B-it-GGUF
Qwen3.5-27B       20 GB     unsloth/Qwen3.5-27B-GGUF
Gemma 4 26B-A4B   21 GB     unsloth/gemma-4-26B-A4B-it-GGUF
Gemma 4 31B       24 GB     unsloth/gemma-4-31B-it-GGUF
Qwen3.5-35B-A3B   26 GB     unsloth/Qwen3.5-35B-A3B-GGUF

Larger models generally produce better results, but with diminishing returns.
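The Min VRAM column roughly tracks the size of the quantized weights plus working memory for the KV cache and compute buffers. A back-of-the-envelope sketch (the ~4.5 bits per weight and the fixed overhead figure are assumptions for a typical ~4-bit GGUF quantization, not exact numbers):

```python
def estimate_vram_gb(params_billions: float,
                     bits_per_weight: float = 4.5,
                     overhead_gb: float = 2.0) -> float:
    """Rough VRAM needed: quantized weight size plus KV-cache/buffer overhead."""
    weights_gb = params_billions * bits_per_weight / 8  # bits -> bytes
    return weights_gb + overhead_gb

# A 4B-parameter model lands near the 5 GB row in the table above;
# a 27B model lands in the same ballpark as its 20 GB row.
print(round(estimate_vram_gb(4), 1))   # 4.2
print(round(estimate_vram_gb(27), 1))  # 17.2
```

Actual requirements vary with quantization level, context length, and runtime, so treat the table above as the authoritative guide and this only as intuition for why model size drives the VRAM floor.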

WARNING

Selecting a model that is too large for the available memory will cause system instability.