©2020 by SimpleMachines, Inc.

SimpleMachines is building solutions to enable this dynamic future.

Typically, a handful of behaviors sufficiently represent any program in production systems. These behaviors transcend application domains and physical deployment form factors. To streamline the user experience for key AI frameworks, our compiler is integrated directly into the backends for TensorFlow, ONNX, and PyTorch. This enables existing AI code bases to integrate seamlessly with SimpleMachines' compiler technology on a plug-and-play basis. The same is true for many other rapidly growing applications fueling today’s algorithm- and AI-driven world. This new paradigm from SimpleMachines achieves algorithm-independent acceleration while overcoming both ASIC design rigidity and CPU processing inefficiency.

User Experience

No new language to learn, no new framework to learn, and no source code modification. Your models run as is.

Speed Up

5x to 100x speedup on mainstream AI applications.

Cost Savings

5x to 10x reduction in total cost of ownership compared to current state-of-the-art solutions.

Future Ready

Any algorithm can be mapped to our platform using our behavior-based compiler.

[Chart: speed increase by use case. Panels: Deep Learning Use Cases; Other Use Cases. Use cases shown: Image Classification, Image Segmentation, Language Processing, NW Security, Loan Pricing, Data Analytics.]

First, tightly integrating conventional (von Neumann) CPUs and dataflow accelerators gives performance and efficiency that exceed what either can achieve independently. This breakthrough changed computer architecture practices and was recognized with two premier research awards: an IEEE Micro Top Picks selection and a prestigious selection as a Communications of the ACM Research Highlight.

Second, the accelerators themselves do not have to be highly specialized for a particular workload in order to deliver significant speedup. Analysis of generic program properties leads to a small number of accelerator building blocks that can be dynamically combined for any target workload, enabling a general-purpose configurable accelerator that is competitive with many domain-specialized ones. This breakthrough was recognized with a premier research award, an IEEE Micro Top Picks selection, for redefining what an accelerator is.

Finally, a critical underlying technology enables the mapping of such workloads onto configurable building blocks in a very general way. Reformulating the compiler scheduling problem as an integer linear program allows sophisticated optimization frameworks to be applied, making the accelerator accessible for a wide range of applications. This work was recognized with two premier research awards: a PLDI Distinguished Paper award and a prestigious selection as a Communications of the ACM Research Highlight, for changing the thinking in the field on how to build compilers for modern chips.
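
To make the scheduling idea concrete, here is a minimal sketch of how placing operations onto building blocks can be written as a 0-1 integer program. It is an illustration only, not the SimpleMachines compiler: the operation names, unit names, and the `schedule` function are all hypothetical, and the tiny instance is solved by exhaustive enumeration rather than by an ILP solver so that the example needs nothing beyond the Python standard library. Each choice of (unit, slot) for an operation corresponds to setting one binary variable x[op, unit, slot] to 1, and the checks in the loop are the linear constraints a solver would receive.

```python
from itertools import product

# Hypothetical dataflow graph: operation -> kind of building block it needs.
OPS = {"load_a": "mem", "load_b": "mem", "mul": "alu", "add": "alu"}
# Dependency edges (producer, consumer).
DEPS = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add")]
# Hypothetical building blocks: unit -> kind.
UNITS = {"mem0": "mem", "alu0": "alu"}


def schedule(ops, deps, units, n_slots):
    """Return (makespan, placement) with minimal makespan, or None if infeasible.

    placement maps each op to a (unit, slot) pair, i.e. x[op, unit, slot] = 1.
    """
    names = list(ops)
    # Listing only compatible units enforces "each op runs on a unit of its
    # kind"; picking exactly one pair per op enforces "each op placed once".
    choices = [
        [(u, t) for u, kind in units.items() if kind == ops[op]
         for t in range(n_slots)]
        for op in names
    ]
    best = None
    for combo in product(*choices):
        # Capacity: sum over ops of x[op, unit, slot] <= 1 for every (unit, slot).
        if len(set(combo)) < len(combo):
            continue
        place = dict(zip(names, combo))
        # Precedence: slot(consumer) >= slot(producer) + 1 for every edge.
        if any(place[c][1] < place[p][1] + 1 for p, c in deps):
            continue
        # Objective: minimize the number of time slots used.
        makespan = max((t for _, t in combo), default=-1) + 1
        if best is None or makespan < best[0]:
            best = (makespan, place)
    return best


if __name__ == "__main__":
    print(schedule(OPS, DEPS, UNITS, n_slots=4))
```

Enumeration only works for toy instances like this one. The point of the linear formulation is that the same variables and constraints can be handed to a general-purpose optimization framework, which is what makes the approach scale to realistic workloads.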