NELA

Private Inference

Private Inference means that core generation runs on your machine, using locally installed model backends. In normal workflows, no hosted inference API is needed to process your prompts or indexed document content.

How it works

  • You pick a mode and submit a request from the desktop UI.
  • The task router selects candidate models by task type and priority.
  • Required models are started lazily, then reused while active.
  • If one candidate fails, routing can fall back to the next compatible model.
  • Streamed partial output and the final result are sent back to the UI, so the interface stays responsive.

This same mechanism powers Chat, Mindmap generation, and parts of Podcast workflows.
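
The routing steps above can be sketched in a few lines of Python. This is only an illustrative model of the described behavior, not NELA's actual code: the names `Candidate`, `TaskRouter`, `run`, and `on_chunk`, the "lower number means higher priority" convention, and the use of `RuntimeError` to signal a backend failure are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Candidate:
    """A hypothetical local model entry; every field name is an assumption."""
    name: str
    task_types: List[str]
    priority: int  # assumed convention: lower number is tried first
    generate: Callable[[str], Iterator[str]]  # yields output chunks


@dataclass
class TaskRouter:
    candidates: List[Candidate]
    started: Dict[str, Candidate] = field(default_factory=dict)

    def select(self, task_type: str) -> List[Candidate]:
        # Keep only candidates that support the task, ordered by priority.
        matching = [c for c in self.candidates if task_type in c.task_types]
        return sorted(matching, key=lambda c: c.priority)

    def _ensure_started(self, candidate: Candidate) -> Candidate:
        # Lazy start: a model is registered on first use, then reused.
        if candidate.name not in self.started:
            self.started[candidate.name] = candidate
        return self.started[candidate.name]

    def run(self, task_type: str, prompt: str,
            on_chunk: Callable[[str], None]) -> str:
        errors = []
        for candidate in self.select(task_type):
            try:
                model = self._ensure_started(candidate)
                chunks = []
                for chunk in model.generate(prompt):
                    on_chunk(chunk)  # stream partial output to the UI
                    chunks.append(chunk)
                return "".join(chunks)  # final output
            except RuntimeError as exc:
                # Fallback: drop the failed model and try the next candidate.
                # A failure mid-stream would leave partial chunks in the UI;
                # this sketch does not handle that case.
                self.started.pop(candidate.name, None)
                errors.append(f"{candidate.name}: {exc}")
        raise RuntimeError("no compatible model succeeded: " + "; ".join(errors))


def broken_backend(prompt: str) -> Iterator[str]:
    raise RuntimeError("backend failed to load")
    yield ""  # unreachable; makes this function a generator


def echo_backend(prompt: str) -> Iterator[str]:
    for word in prompt.split():
        yield word + " "


router = TaskRouter([
    Candidate("large-chat", ["chat"], 0, broken_backend),
    Candidate("small-chat", ["chat", "mindmap"], 1, echo_backend),
])
streamed: List[str] = []
result = router.run("chat", "hello local model", streamed.append)
```

Here the higher-priority `large-chat` candidate fails, so the router falls back to `small-chat`, which streams its chunks through `on_chunk` and returns the joined text. A real backend would start a local model process in `_ensure_started` rather than just recording the entry.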

Architecture snapshot

[Diagram: architecture snapshot]