Optimization Modes

HyperChat™ has an "LLM Orchestrator" that analyzes each input query for the level of complexity required to answer it, along with other factors, and routes the query to the LLM it judges best suited to respond. This selection is a multi-objective optimization problem with three objectives: lowest cost, fastest response, and highest response accuracy and quality.

In its default mode ("Auto"), HyperChat™ finds Pareto-optimal solutions to this optimization problem and picks the best compromise among them. However, that compromise might not always be what the user wants.
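To make the term concrete: a candidate is Pareto-optimal when no other candidate is at least as good on every objective and strictly better on at least one. The sketch below filters a small set of candidates down to that set. The model names and the cost, latency, and quality figures are invented for illustration, and this is not a description of how the LLM Orchestrator is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str       # hypothetical model label, not a real HyperChat identifier
    cost: float     # relative cost per query, lower is better
    latency: float  # seconds per response, lower is better
    quality: float  # score in [0, 1], higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = (a.cost <= b.cost and a.latency <= b.latency
                and a.quality >= b.quality)
    strictly_better = (a.cost < b.cost or a.latency < b.latency
                       or a.quality > b.quality)
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Illustrative numbers only (assumptions, not measurements).
CANDIDATES = [
    Candidate("small", cost=1.0, latency=0.3, quality=0.70),
    Candidate("medium", cost=4.0, latency=0.8, quality=0.85),
    Candidate("large", cost=20.0, latency=2.5, quality=0.95),
    Candidate("legacy", cost=5.0, latency=1.0, quality=0.80),
]

front = pareto_front(CANDIDATES)
print([c.name for c in front])  # ['small', 'medium', 'large']
```

Here "legacy" drops out because "medium" is cheaper, faster, and more accurate. The three remaining candidates are all valid trade-offs, and choosing among them requires a preference over cost, speed, and quality.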

For instance, for a customer support bot that assists a call center worker in real time, response latency and cost are the most critical aspects, since customers are waiting on the other end of the phone for answers. The call center workers can tolerate a small loss of accuracy, since they read the answers and relay that information to the customers themselves. For a research assistant bot, on the other hand, the quality and accuracy of the answers might be the most important performance attribute.

To cover all of these cases, HyperChat™ offers three optimization modes:

  • Fast: Optimize for the fastest response and lowest cost, without a significant loss of accuracy.
  • Auto: Automatically optimize for the best overall performance by letting HyperChat™ decide the best setting for your input.
  • Premium: Ensure the highest quality, especially for workflows that require higher-level reasoning.
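One way to picture the three modes is as different weightings of the same three objectives. The sketch below is a minimal illustration under that assumption: the weights, model names, and figures are all hypothetical, and Auto is simplified to fixed equal weights even though the text above describes it as adapting to each input. It is not HyperChat™'s actual selection logic.

```python
# Hypothetical (cost, latency, quality) weights per mode; illustrative only.
MODE_WEIGHTS = {
    "fast": (0.4, 0.5, 0.1),
    "auto": (1 / 3, 1 / 3, 1 / 3),
    "premium": (0.05, 0.05, 0.9),
}

# Hypothetical candidates: name -> (cost, latency in seconds, quality score).
MODELS = {
    "small": (1.0, 0.3, 0.70),
    "medium": (4.0, 0.8, 0.85),
    "large": (20.0, 2.5, 0.95),
}


def normalize(values: list[float]) -> list[float]:
    """Min-max scale values to [0, 1] so the objectives are comparable."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def pick_model(mode: str) -> str:
    """Return the candidate with the best weighted score for the given mode."""
    w_cost, w_latency, w_quality = MODE_WEIGHTS[mode]
    names = list(MODELS)
    costs = normalize([MODELS[n][0] for n in names])
    latencies = normalize([MODELS[n][1] for n in names])
    qualities = normalize([MODELS[n][2] for n in names])
    # Quality is rewarded; cost and latency are penalized.
    scores = {
        n: w_quality * q - w_cost * c - w_latency * l
        for n, c, l, q in zip(names, costs, latencies, qualities)
    }
    return max(scores, key=scores.get)


for mode in ("fast", "auto", "premium"):
    print(mode, "->", pick_model(mode))
```

With these illustrative numbers, the fast weighting selects the small candidate, the equal weighting selects the medium one, and the premium weighting selects the large one.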

Benchmarks for each optimization mode are coming soon! Thank you for your continued support and trust in HyperbeeAI! Stay tuned.