Fractile’s revolutionary approach to computing can run the world’s largest language models

100x Faster*

(at 1/10th the cost of existing systems)

* Decode tok/s, versus a cluster of H100 GPUs with 8-bit quantisation and TensorRT-LLM, on Llama2 70B

We are building the hardware that will remove every bottleneck to the fastest possible inference of the largest transformer networks

This means the biggest LLMs in the world running faster than you can read, and a universe of new capabilities for how we work, unlocked by near-instant inference of models with superhuman intelligence.

Existing hardware is not fit for the AI revolution

When a trained language model is run for a user, over 99% of the total compute time is spent not on arithmetic but on moving model weights from memory to the processor.*

* Llama2 70B at 8-bit quantisation, on an 80GB A100 GPU
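That figure is easy to sanity-check with back-of-envelope arithmetic. The sketch below uses rough round numbers for an 80GB A100 (about 2 TB/s of memory bandwidth and about 600 TOPS of INT8 throughput, both approximate peak figures assumed here, not measurements); the exact values matter far less than the ratio.

```python
# Back-of-envelope estimate of where the time goes when generating one
# token of Llama2 70B at 8-bit quantisation on an 80GB A100.
# Hardware numbers are rough round figures, not measured values.

weight_bytes = 70e9        # ~70B parameters at 1 byte each (8-bit)
hbm_bandwidth = 2e12       # ~2 TB/s HBM bandwidth (approximate peak)
int8_throughput = 600e12   # ~600 TOPS INT8 (approximate peak)

ops_per_token = 2 * 70e9   # ~2 operations per weight per generated token

t_move = weight_bytes / hbm_bandwidth     # streaming every weight once
t_math = ops_per_token / int8_throughput  # the matrix arithmetic itself

print(f"moving weights: {t_move * 1e3:.2f} ms per token")   # ~35 ms
print(f"arithmetic:     {t_math * 1e3:.2f} ms per token")   # ~0.23 ms
print(f"time spent on data movement: {t_move / (t_move + t_math):.1%}")
# -> over 99%: data movement dominates by two orders of magnitude
```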

By performing 100% of the operations needed to run model inference in memory, we blast through the memory bottleneck

1x: the time taken to run the arithmetic for generating a single word

200x: the time taken moving parameters from memory to the processor for each word

2x: the total time a Fractile processor takes to generate the same word
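Read as a simple cost model, these relative timings also recover the headline claim: removing the weight-movement term buys roughly two orders of magnitude. A small illustrative calculation, using only the relative units above:

```python
# Per-word times in the relative units of the breakdown above.
arithmetic = 1    # arithmetic for one generated word
movement = 200    # moving parameters from memory to the processor
fractile = 2      # total per-word time on a Fractile processor

gpu_total = arithmetic + movement              # 201 units, dominated by movement
print(f"speedup: {gpu_total / fractile:.0f}x") # -> ~100x, the headline figure
```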

We are hiring

see open roles

Please contact us if you are interested in learning more, or connect with us on LinkedIn.

About Us

We are a team of scientists, engineers and hardware designers who are committed to building the solutions that the AI revolution requires to keep scaling. We believe that the most important breakthroughs will come from trying solutions that others are not, to serious problems we actually face.

Founders

Walter and Yuhang met during their PhDs at the University of Oxford, where they were doing AI research in different labs. Seeing a convergence across multiple domains in AI towards a single type of model, the transformer, they started working on a new approach to accelerated computing, designed from the ground up to run these models at the fastest possible speeds. Once satisfied that these principles promised model inference orders of magnitude faster than the existing state of the art, they founded Fractile in summer 2022, shortly before the explosion in deployment of large language models made the need for a better way to run these networks even more urgent.

Walter Goodwin

PhD on large transformer models in AI, engineer and ex-deeptech investor, runner and amateur cook

Yuhang Song

PhD on novel AI learning algorithms, electronic engineer, inventor and chronic badminton enthusiast