AI & ML
Inference
The process of running a trained model on new inputs to generate predictions or outputs. Inference is the "using" phase, as opposed to training. Inference cost depends on model size, input/output token count, and hardware (GPUs/TPUs). API providers (Anthropic, OpenAI) charge per token for inference. On-device inference (llama.cpp, GGUF) runs locally without API calls.
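Since providers bill per token, a per-request cost estimate is simple arithmetic over input and output token counts. A minimal sketch, using hypothetical prices rather than any real provider's rates:

```python
def inference_cost(input_tokens: int, output_tokens: int,
                   price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Estimate API inference cost in dollars.

    Providers typically price input and output tokens separately,
    quoted per million tokens (Mtok).
    """
    return (input_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000

# Hypothetical prices ($3/Mtok in, $15/Mtok out) -- illustrative only:
cost = inference_cost(1_500, 300, price_in_per_mtok=3.0, price_out_per_mtok=15.0)
print(f"${cost:.4f}")  # -> $0.0090
```

Output tokens often cost several times more than input tokens, so long generations dominate the bill even when prompts are short.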
Related terms
AI & ML
LLM (Large Language Model)
A neural network trained on vast text corpora to understand and generate human language. LLMs (GPT-4, Claude, Llama, Gem...
AI & ML
Token (AI/NLP)
The basic unit of text processed by language models—typically a word, subword, or character. Tokenizers (BPE, SentencePi...