
Inference

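The process of running a trained model on new inputs to generate predictions or outputs. Inference is the 'using' phase of a model, as opposed to training: the learned weights stay fixed and are only applied to new inputs. Inference cost depends on model size, the number of input and output tokens, and the hardware it runs on (GPUs or TPUs). API providers such as Anthropic and OpenAI charge per token for inference, typically at separate rates for input and output tokens. On-device inference, for example with llama.cpp running models in the GGUF file format, runs locally without any API calls.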
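Because API inference is billed per token, the cost of a single request can be estimated from its input and output token counts. The following is a minimal Python sketch, assuming hypothetical per-million-token prices. The `TokenPricing` class, the `inference_cost` function, and the rates used are illustrative only and do not reflect any provider's actual API or price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-million-token prices in USD (not real provider rates)."""
    input_per_million: float
    output_per_million: float


def inference_cost(input_tokens: int, output_tokens: int, pricing: TokenPricing) -> float:
    """Estimate the USD cost of one inference request billed per token."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Input (prompt) and output (generated) tokens are billed at separate rates.
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Illustrative, made-up prices: $1 per million input tokens, $5 per million output tokens.
example_pricing = TokenPricing(input_per_million=1.0, output_per_million=5.0)
cost = inference_cost(input_tokens=2_000, output_tokens=500, pricing=example_pricing)
print(f"${cost:.4f}")  # prints $0.0045
```

Since cost grows linearly with both token counts, shortening prompts or capping the length of generated output reduces per-request cost proportionally under this pricing model.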


Related terms


LLM (Large Language Model)

A neural network trained on vast text corpora to understand and generate human language. LLMs (GPT-4, Claude, Llama, Gem...


Token (AI/NLP)

The basic unit of text processed by language models—typically a word, subword, or character. Tokenizers (BPE, SentencePi...