

Tiramisu is a polyhedral compiler for expressing fast and portable data-parallel algorithms. It provides a simple C++ API for expressing algorithms and for specifying how the compiler should optimize them.

Because the Tiramisu compiler is based on the polyhedral model, it can express a large set of loop optimizations and data layout transformations. It currently targets (1) multicore x86 CPUs, (2) Nvidia GPUs, (3) Xilinx FPGAs (via Vivado HLS), and (4) distributed machines (using MPI). It is designed so that code generators for new architectures are easy to integrate.
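As an illustration of the kind of loop optimization meant here, the sketch below shows loop tiling written by hand in plain C++. It does not use the Tiramisu API; the function names and the choice of matrix transposition as the workload are assumptions made only for this example. Both functions visit the same iteration space and produce the same result; only the order of the iterations changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop nest: out[j][i] = in[i][j] for an n x n row-major matrix.
std::vector<int> transpose_naive(const std::vector<int>& in, std::size_t n) {
    assert(in.size() == n * n);
    std::vector<int> out(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];
    return out;
}

// Same computation after tiling both loops by `tile`.
// The two outer loops walk over tiles, the two inner loops walk inside a tile.
// std::min clamps the last tile when n is not a multiple of `tile`.
std::vector<int> transpose_tiled(const std::vector<int>& in, std::size_t n,
                                 std::size_t tile) {
    assert(tile > 0 && in.size() == n * n);
    std::vector<int> out(n * n);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t jj = 0; jj < n; jj += tile)
            for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                    out[j * n + i] = in[i * n + j];
    return out;
}
```

A polyhedral compiler can derive the second loop nest from the first automatically and check that the reordering preserves the data dependences.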

Where to Use Tiramisu?

Image Processing
Deep Learning
Scientific Computing

Why Tiramisu?

Performance in Deep Learning

Performance of a convolution implemented in Tiramisu compared to Intel MKL (CPU) for different input sizes (*).
Performance of LSTM implemented in Tiramisu compared to cuDNN (GPU) (**).

(*) The input sizes are taken from the ResNet paper. In CXY, X identifies the ResNet layer whose size is used and Y indicates the batch size (Y=0 for a batch size of 32, Y=1 for 64 and Y=2 for 100).

(**) Tensor Comprehensions and Halide cannot express LSTM because LSTM is a recurrent algorithm that creates a cycle in the data-flow graph.
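To make the cycle concrete, here is a minimal scalar LSTM cell in plain C++ using the standard LSTM equations. It is only a sketch and not the benchmarked implementation: real layers use weight matrices and vectors, and the struct and function names below are hypothetical. The point is that each step reads the state (h, c) produced by the previous step of the same computation.

```cpp
#include <cmath>
#include <vector>

// Scalar parameters of one gate: weight on the input, weight on the
// previous hidden state, and bias. (Hypothetical layout for this sketch.)
struct GateParams { double w, u, b; };
struct LstmParams { GateParams input, forget, output, candidate; };
struct LstmState { double h = 0.0, c = 0.0; };

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Pre-activation of a gate for input x and previous hidden state h.
double pre(const GateParams& g, double x, double h) {
    return g.w * x + g.u * h + g.b;
}

// One time step: the new state depends on the previous state.
LstmState lstm_step(const LstmParams& p, const LstmState& prev, double x) {
    const double i = sigmoid(pre(p.input, x, prev.h));
    const double f = sigmoid(pre(p.forget, x, prev.h));
    const double o = sigmoid(pre(p.output, x, prev.h));
    const double g = std::tanh(pre(p.candidate, x, prev.h));
    LstmState next;
    next.c = f * prev.c + i * g;       // cell state carries over from step t-1
    next.h = o * std::tanh(next.c);    // hidden state fed back at step t+1
    return next;
}

// Run the cell over a sequence; the loop-carried state is the recurrence.
LstmState lstm_run(const LstmParams& p, const std::vector<double>& xs) {
    LstmState s;
    for (double x : xs) s = lstm_step(p, s, x);
    return s;
}
```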


The following is an example of a Tiramisu program specified using the C++ API.

// C++ code with a Tiramisu expression.
#include "tiramisu/tiramisu.h"
using namespace tiramisu;

void generate_code()
{
    // Specify the name of the function that you want to create.
    // "foo" is a placeholder name.
    tiramisu::init("foo");

    // Declare two iterator variables (i and j) such that 0<=i<100 and 0<=j<100.
    var i("i", 0, 100), j("j", 0, 100);

    // Declare a Tiramisu expression (algorithm) that is equivalent to the following C code
    // for (i=0; i<100; i++)
    //   for (j=0; j<100; j++)
    //     C(i,j) = 0;
    computation C({i,j}, 0);

    // Specify optimizations: vectorize the j loop with a vector length of 4.
    C.vectorize(j, 4);

    // Generate code and write it to an object file.
    C.codegen({&C.get_buffer()}, "generated_code.o");
}
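For readers unfamiliar with scheduling commands, the following plain C++ sketch shows roughly what the program above describes: the 100x100 loop nest from the comment, with the j loop split into chunks of 4 as requested by the vectorize call. This is an illustration only, not the code Tiramisu actually emits; the names Grid, N, VEC and fill_zero_strip_mined are made up for the example, and a real backend would turn the innermost loop into a single vector store.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 100;   // loop bounds used in the example above
constexpr std::size_t VEC = 4;   // vector length passed to vectorize
using Grid = std::array<std::array<int, N>, N>;

// Equivalent of: for i, for j: C(i,j) = 0, with j strip-mined by VEC.
void fill_zero_strip_mined(Grid& C) {
    static_assert(N % VEC == 0, "this sketch has no remainder loop");
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t jo = 0; jo < N; jo += VEC)
            // Innermost loop of fixed length VEC: the part a compiler
            // can map to one SIMD store.
            for (std::size_t ji = 0; ji < VEC; ++ji)
                C[i][jo + ji] = 0;
}
```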

Getting Started

Selected Publications

Comparison with Polyhedral Compilers

GEMM - Comparison with Polyhedral Compilers (CPU)
GEMM - Comparison with Polyhedral Compilers (GPU)

Matrix dimensions: 1060x1060x1060 on CPU and 3072x3072x3072 on GPU.
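A GEMM problem has three dimensions, which is why the sizes are given as triples. The unoptimized reference below is a sketch in plain C++, assuming row-major storage and the simplest form C = A * B; the exact variant used in the benchmark (scaling factors, precision) is not stated here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C (M x N) = A (M x K) * B (K x N), all matrices row-major.
std::vector<float> gemm_naive(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k) {
            const float a = A[i * K + k];
            // The innermost loop walks contiguous rows of B and C.
            for (std::size_t j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
    return C;
}
```

Compilers such as the ones compared above start from a loop nest like this one and apply tiling, vectorization and parallelization to it.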