Introduction

Tensor Comprehensions is a C++ library and mathematical language that helps bridge the gap between machine learning researchers, who communicate in terms of mathematical operations, and engineers, who focus on the practical needs of running large-scale models on various hardware backends. Its main differentiating feature is its approach to Just-In-Time compilation, which produces the high-performance code that the machine learning community needs, automatically and on demand.

Tensor Comprehensions uses Halide and Polyhedral Compilation techniques to automatically synthesize CUDA kernels with delegated memory management and synchronization. This translation performs optimizations for general operator fusion, fast local memory, fast reductions, and JIT specialization for specific sizes.

Unlike classical compiler technology and library approaches, Polyhedral Compilation allows Tensor Comprehensions to schedule the computation of individual tensor elements on demand for each new network.

At the CUDA level, Tensor Comprehensions combines affine loop transformations, fusion/fission, and automatic parallelization while ensuring that data is correctly moved through the memory hierarchy.
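To make the idea of fusion concrete, here is a minimal sketch in plain C++ of the same computation written as two separate loops and as one fused loop. It is not Tensor Comprehensions API and not code the library generates; the function names `scale_then_add_unfused` and `scale_then_add_fused` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: computes out[i] = alpha * a[i] + b[i]
// as two separate loops. The intermediate buffer `tmp` is written in full
// by the first loop and read back by the second.
std::vector<float> scale_then_add_unfused(const std::vector<float>& a,
                                          const std::vector<float>& b,
                                          float alpha) {
  assert(a.size() == b.size());
  std::vector<float> tmp(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    tmp[i] = alpha * a[i];  // first operator: scaling
  }
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = tmp[i] + b[i];  // second operator: elementwise addition
  }
  return out;
}

// Hypothetical illustration only: the same computation after fusing the
// two loops. Each element is produced in a single pass and the
// intermediate buffer disappears.
std::vector<float> scale_then_add_fused(const std::vector<float>& a,
                                        const std::vector<float>& b,
                                        float alpha) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = alpha * a[i] + b[i];  // both operators in one loop body
  }
  return out;
}
```

Fission is the reverse rewrite, splitting one loop back into several. In this sketch the fused form avoids writing and re-reading the intermediate buffer, which is the kind of memory traffic a fusion transformation aims to remove.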

Participants

  • Nicolas Vasilache
  • Oleksandr Zinenko
  • Sven Verdoolaege
  • Theodoros Theodoridis
  • Albert Cohen
  • and other contributors.