# Private Inference

Private Inference means core generation runs entirely on your machine using locally installed model backends. For normal workflows, your prompts and indexed document content are never sent to a hosted inference API.
## How it works
- You pick a mode and submit a request from the desktop UI.
- The task router selects candidate models by task type and priority.
- Required models are started lazily, then kept loaded and reused while active.
- If one candidate fails, routing can fall back to the next compatible model.
- Streaming and final outputs are emitted back to the UI for responsive interaction.
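The steps above can be sketched in Python. This is a minimal illustration, not the actual implementation: the `Model` interface, the `TaskRouter` class, and all names here are hypothetical stand-ins for whatever the real backend exposes.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Model:
    """A locally installed model backend (hypothetical interface)."""
    name: str
    task_types: set[str]
    priority: int          # lower value = preferred candidate
    started: bool = False

    def start(self) -> None:
        # Lazy start: load weights only when the model is first needed.
        self.started = True

    def generate(self, prompt: str) -> Iterator[str]:
        # Stub generator: streams the prompt back word by word to stand
        # in for token-by-token streaming output.
        if not self.started:
            raise RuntimeError(f"{self.name} was not started")
        yield from prompt.split()


class TaskRouter:
    """Selects candidate models by task type and priority, starts them
    lazily, and falls back to the next compatible model on failure."""

    def __init__(self, models: list[Model]):
        self.models = models

    def candidates(self, task_type: str) -> list[Model]:
        matching = [m for m in self.models if task_type in m.task_types]
        return sorted(matching, key=lambda m: m.priority)

    def run(self, task_type: str, prompt: str) -> Iterator[str]:
        errors: list[tuple[str, Exception]] = []
        for model in self.candidates(task_type):
            try:
                if not model.started:
                    model.start()      # started lazily, then reused
                yield from model.generate(prompt)
                return                 # first successful candidate wins
            except Exception as exc:
                errors.append((model.name, exc))  # try next candidate
        raise RuntimeError(f"no compatible model succeeded: {errors}")
```

A usage sketch: registering two chat-capable models and streaming a response, with the higher-priority model tried first.

```python
router = TaskRouter([
    Model("chat-small", {"chat"}, priority=0),
    Model("chat-large", {"chat", "mindmap"}, priority=1),
])
for token in router.run("chat", "hello local world"):
    print(token)
```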
This same mechanism powers Chat, Mindmap generation, and parts of Podcast workflows.
## Architecture snapshot

[Architecture diagram]