New Inference Framework Speeds up LLMs Without Raising Costs

November 06, 2024

Blog

Large language models (LLMs) are some of today’s most impactful technologies. They’re what make advanced chatbots and generative AI possible, but as their functionality grows, so too do their costs and complexity. A new framework from Stanford researchers could change that.

In a recent research paper, a team unveiled a modular inference framework called Archon. Inference is the stage where an LLM draws on what it learned during training to generate responses or make predictions from new data. It demands a considerable amount of computation, so it's often slow, expensive or both. Archon speeds it up without raising costs.

How the Archon Inference Framework Works

Many machine learning models rely on a single technique to perform inference for each request. While this approach simplifies development, it typically forces a trade-off between accuracy on one side and speed or computational efficiency on the other. Archon can deliver both at once by combining multiple LLM components and methods.

The new framework arranges LLM components in layers, much as a neural network stacks layers of neurons to solve complex problems, with each layer refining the candidates produced by the one before it. By applying different techniques to optimize different performance measures, Archon can strike an ideal balance between accuracy, speed and cost efficiency. Importantly, it does so according to each task's individual needs.
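
To make the layered idea concrete, here is a minimal Python sketch of one common pattern: generate several candidate answers, rank them with a judge model, then fuse the strongest into a final response. The generator, ranker and fuser roles echo the component types described in the paper, but everything else here (the `call_llm` helper, the model names, the prompts) is a hypothetical stand-in, not Archon's actual implementation.

```python
def call_llm(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call.
    Replace the body with a real request to your LLM provider."""
    return f"[{model}] draft answer to: {prompt[:40]}"


def generate(models: list[str], prompt: str, samples: int = 2) -> list[str]:
    """Layer 1: collect several candidate answers from several models."""
    return [call_llm(m, prompt) for m in models for _ in range(samples)]


def rank(judge: str, prompt: str, candidates: list[str], keep: int = 3) -> list[str]:
    """Layer 2: have a judge model score each candidate, keeping the best few."""
    def score(answer: str) -> float:
        reply = call_llm(judge,
                         f"On a scale of 1 to 10, rate this answer to "
                         f"'{prompt}'. Reply with only the number.\n{answer}")
        try:
            return float(reply.strip())
        except ValueError:
            return 0.0  # unparseable verdicts rank last
    return sorted(candidates, key=score, reverse=True)[:keep]


def fuse(fuser: str, prompt: str, finalists: list[str]) -> str:
    """Layer 3: merge the strongest candidates into one final answer."""
    drafts = "\n---\n".join(finalists)
    return call_llm(fuser, f"Combine these draft answers to '{prompt}' "
                           f"into one best answer:\n{drafts}")


def layered_pipeline(prompt: str) -> str:
    candidates = generate(["model-a", "model-b"], prompt)
    finalists = rank("judge-model", prompt, candidates)
    return fuse("fuser-model", prompt, finalists)


print(layered_pipeline("Explain what LLM inference is in one sentence."))
```

The appeal of this structure is that each layer can be swapped or tuned independently: a cheaper judge for latency-sensitive tasks, more samples for accuracy-critical ones.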

In the study, Archon achieved average performance improvements of 14.1% and 10.3% over GPT-4o and Claude 3.5 Sonnet, respectively. Those gains occurred consistently across task types. Archon was the most efficient and accurate solution for coding, reasoning and instruction-following.

What Archon Could Mean for the Future of AI

While the inference framework has seen only limited use so far, the early signs are promising. They suggest AI applications may not need to sacrifice computing efficiency for accuracy or vice versa. Achieving both would mean tools like ChatGPT and Gemini could become increasingly reliable while remaining broadly accessible.

Archon also shows it’s possible to improve LLM performance without additional training. That’s important because AI training costs have more than doubled each year for the past eight years. Hardware accounts for much of the expense, but Archon achieved better results with no extra infrastructure or computing power.

Yielding improvements across multiple task types is likewise promising. Such results suggest it could become easier to build effective general-purpose LLMs that serve a greater variety of use cases without a drop in reliability.

What It Means for Businesses

These benefits have implications for the businesses that use AI, not just those that develop it. Lower training requirements mean machine learning could become increasingly accessible to companies with smaller budgets. Considering that 63% of organizations cite model cost as their top concern with AI, the promise of greater accessibility is hard to ignore.

Similarly, companies may not need to seek purpose-built AI to get the accuracy or efficiency they need. Archon is flexible and open source, making it an easy fit in many contexts. As frameworks like this mature and gain traction, businesses could implement generative AI models that meet their specific needs without the complexity of an in-house or tailor-made solution.
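
As a rough illustration of that flexibility, an Archon-style pipeline can be described declaratively, as in the hypothetical Python configuration below. The field names are illustrative assumptions for this sketch, not Archon's actual schema; the project's repository documents the real format.

```python
# Hypothetical declarative config for an Archon-style pipeline.
# Field names are illustrative assumptions, not Archon's real schema.
pipeline_config = {
    "layers": [
        # Layer 1: sample two drafts each from two candidate models.
        {"role": "generator", "models": ["model-a", "model-b"], "samples": 2},
        # Layer 2: a judge model keeps the three strongest drafts.
        {"role": "ranker", "model": "judge-model", "keep": 3},
        # Layer 3: a fuser model merges the finalists into one answer.
        {"role": "fuser", "model": "fuser-model"},
    ]
}
```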

Should trends proceed this way, organizations may need to reframe their approach to AI transparency, particularly in consumer-facing industries. Already, 68% of people believe companies should publicly disclose their AI usage. It’ll become all the more crucial to differentiate between AI-generated, AI-assisted and entirely original content to maintain customer trust as frameworks like Archon drive LLM adoption.

Remaining Challenges

Archon and other solutions like it are still in their early stages. As such, they face lingering obstacles that may slow their growth.

Archon works best with larger models containing a higher number of parameters. While many of today's most prominent LLMs have over 100 billion parameters, some organizations are moving toward smaller models to avoid high computing costs and complexity. Archon, at least in its current state, would be far less effective with these smaller models.

The same limitation could pose a challenge for businesses without the resources for a larger LLM. Additional development could overcome the need for high-parameter models, but for now, that requirement holds Archon's accessibility back. Frameworks like it will likely still gain ground in complex use cases, but simpler chatbots and straightforward automation tasks may be better served by a different approach.

LLMs Are Quickly Evolving

LLMs have already reshaped the AI industry, and they’re not done evolving yet. Innovations like the Archon framework reveal how they can work past current shortcomings, paving the way for broader adoption, better results and lower costs. While the technology is still far from perfect, its potential is difficult to overlook.