Tiramisu Compiler
A class that represents computations. More...
#include <core.h>
Inherited by tiramisu::communicator, tiramisu::constant, tiramisu::input, and tiramisu::view.
Public Member Functions | |
computation (std::string iteration_domain, tiramisu::expr e, bool schedule_this_computation, tiramisu::primitive_t t, tiramisu::function *fct) | |
Constructor for computations. More... | |
computation (std::string name, std::vector< var > iterator_variables, tiramisu::expr e, bool schedule_this_computation) | |
Constructor for computations. More... | |
computation (std::vector< var > iterator_variables, tiramisu::expr e) | |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
computation (std::string name, std::vector< var > iterator_variables, tiramisu::expr e) | |
Constructor for computations. More... | |
computation (std::vector< var > iterator_variables, tiramisu::expr e, bool schedule_this_computation) | |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
computation (std::string name, std::vector< var > iterator_variables, primitive_t t) | |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. t is the type of the computation. More... | |
computation (std::vector< var > iterator_variables, primitive_t t) | |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
virtual bool | is_send () const |
virtual bool | is_recv () const |
virtual bool | is_send_recv () const |
virtual bool | is_wait () const |
void | add_associated_let_stmt (std::string access_name, tiramisu::expr e) |
Add a let statement that is associated to this computation. More... | |
void | unschedule_this_computation () |
Don't schedule a previously scheduled computation. More... | |
virtual void | add_definitions (std::string iteration_domain_str, tiramisu::expr e, bool schedule_this_computation, tiramisu::primitive_t t, tiramisu::function *fct) |
Add definitions of computations that have the same name as this computation. More... | |
void | add_predicate (tiramisu::expr predicate) |
Add a predicate (condition) on the computation. More... | |
void | after (computation &comp, tiramisu::var iterator) |
Schedule this computation to run after the computation comp . More... | |
void | after (computation &comp, int level) |
This function is equivalent to void after(computation &comp, tiramisu::var iterator); except that it uses loop level numbers (0, 1, 2, ...) instead of using loop variables (tiramisu::var). More... | |
void | allocate_and_map_buffer_automatically (tiramisu::argument_t type=tiramisu::a_temporary) |
void | apply_transformation_on_schedule (std::string map_str) |
Apply a transformation on the schedule. More... | |
void | between (computation &before_comp, tiramisu::var before_l, computation &after_comp, tiramisu::var after_l) |
Schedule this computation to run after before_comp at the loop level before_l , and before after_comp at loop level after_l . More... | |
tiramisu::buffer * | get_automatically_allocated_buffer () |
Return the buffer that was allocated automatically using high level data mapping functions. More... | |
void | interchange (tiramisu::var L0, tiramisu::var L1) |
Interchange (swap) the two loop levels L0 and L1 . More... | |
void | interchange (int L0, int L1) |
Identical to void interchange(tiramisu::var L0, tiramisu::var L1);. More... | |
void | mark_as_let_statement () |
Mark this statement as a let statement. More... | |
void | mark_as_library_call () |
Mark this statement as a library call. More... | |
void | parallelize (tiramisu::var L) |
Tag the loop level L to be parallelized. More... | |
void | set_wait_access (std::string access_str) |
void | set_wait_access (isl_map *access) |
void | set_expression (const tiramisu::expr &e) |
Set the expression of the computation. More... | |
void | set_inline (bool is_inline=true) |
Sets whether the computation is inline or not, based on the value of is_inline . More... | |
const bool | is_inline_computation () const |
Returns true if and only if the computation is inline. More... | |
void | shift (tiramisu::var L0, int n) |
Shift the loop level L0 of the iteration space by n iterations. More... | |
void | skew (tiramisu::var i, tiramisu::var j, int f, tiramisu::var ni, tiramisu::var nj) |
Apply loop skewing on the loop levels i and j with a skewing factor of f . More... | |
void | skew (tiramisu::var i, tiramisu::var j, tiramisu::var k, int factor, tiramisu::var ni, tiramisu::var nj, tiramisu::var nk) |
Apply loop skewing on the loop levels i , j and k with a skewing factor of factor . More... | |
void | skew (tiramisu::var i, tiramisu::var j, tiramisu::var k, tiramisu::var l, int factor, tiramisu::var ni, tiramisu::var nj, tiramisu::var nk, tiramisu::var nl) |
Apply loop skewing on the loop levels i , j , k , l with a skewing factor of factor . More... | |
void | skew (tiramisu::var i, tiramisu::var j, int factor) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | skew (tiramisu::var i, tiramisu::var j, tiramisu::var k, int factor) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | skew (tiramisu::var i, tiramisu::var j, tiramisu::var k, tiramisu::var l, int factor) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | skew (int i, int j, int factor) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | skew (int i, int j, int k, int factor) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | skew (int i, int j, int k, int l, int factor) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | split (int L0, int sizeX) |
Identical to void split(tiramisu::var L0, int sizeX);. More... | |
void | storage_fold (tiramisu::var dim, int f) |
Fold the storage of the computation. More... | |
tiramisu::computation * | store_at (tiramisu::computation &comp, tiramisu::var L0) |
Allocate the storage of this computation in the loop level L0 . More... | |
void | tag_parallel_level (tiramisu::var L) |
Tag the loop level L to be parallelized. More... | |
void | tag_parallel_level (int L) |
Identical to void tag_parallel_level(tiramisu::var L);. More... | |
void | tag_vector_level (tiramisu::var L, int len) |
Tag the loop level L to be vectorized. More... | |
void | tag_vector_level (int L, int len) |
Identical to void tag_vector_level(tiramisu::var L, int len);. More... | |
void | tag_unroll_level (tiramisu::var L) |
Tag the loop level L to be unrolled. More... | |
void | tag_unroll_level (int L) |
Identical to void tag_unroll_level(tiramisu::var L);. More... | |
template<typename... Args> | |
tiramisu::expr | operator() (Args...args) |
Access operator: C0(i,j) represents an access to the element (i,j) of the computation C0. More... | |
operator expr () | |
void | after_low_level (computation &comp, int level) |
Schedule this computation to run after the computation comp . More... | |
void | after_low_level (computation &comp, std::vector< int > levels) |
Schedule this computation to run after the computation comp . More... | |
void | before (computation &consumer, tiramisu::var L) |
Schedule this computation to run before the computation consumer at the loop level L . More... | |
void | store_in (buffer *buff) |
Store this computation in buff . More... | |
void | store_in (buffer *buff, std::vector< expr > iterators) |
Store this computation in buff . More... | |
void | compute_at (computation &consumer, tiramisu::var L) |
This function assumes that consumer consumes values produced by this computation (which is the producer). More... | |
void | compute_at (computation &consumer, int L) |
Identical to void compute_at(computation &consumer, tiramisu::var L);. More... | |
int | compute_maximal_AST_depth () |
Generate the time-space domain, construct an AST that scans that time-space domain, and then compute the depth of this AST. More... | |
void | dump_iteration_domain () const |
Dump the iteration domain of the computation. More... | |
void | dump_schedule () const |
Dump the schedule of the computation. More... | |
void | dump () const |
Dump the computation on stdout. More... | |
void | fuse_after (tiramisu::var lev, computation &comp) |
Fuse this computation with the computation passed as argument in the same loop. More... | |
void | gen_time_space_domain () |
Generate the time-space domain of the computation. More... | |
void | drop_rank_iter (var level) |
Specify that the rank loop iterator should be removed from linearization. More... | |
tiramisu::primitive_t | get_data_type () const |
Get the data type of the computation. More... | |
const tiramisu::expr & | get_expr () const |
Return the Tiramisu expression associated with the computation. More... | |
isl_set * | get_iteration_domain () const |
Return the iteration domain of the computation. More... | |
tiramisu::computation & | get_last_update () |
Get the last update of a computation. More... | |
int | get_loop_level_number_from_dimension_name (std::string dim_name) |
Search the time-space domain (the range of the schedule) and return the loop level number that corresponds to the dimension named dim_name . More... | |
const std::string & | get_name () const |
Return the name of the computation. More... | |
computation * | get_predecessor () |
Returns a pointer to the computation scheduled immediately before this computation, or a null pointer if none exist. More... | |
tiramisu::computation & | get_update (int index) |
Return the update of this computation indicated by index . More... | |
isl_map * | get_schedule () const |
Get the schedule of the computation. More... | |
void | gpu_tile (tiramisu::var L0, tiramisu::var L1, int sizeX, int sizeY) |
Tile the computation and then tag the outermost tile dimension to be mapped to GPU blocks and tag the innermost tile dimensions to be mapped to GPU threads. More... | |
void | gpu_tile (tiramisu::var L0, tiramisu::var L1, int sizeX, int sizeY, tiramisu::var L0_outer, tiramisu::var L1_outer, tiramisu::var L0_inner, tiramisu::var L1_inner) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | gpu_tile (tiramisu::var L0, tiramisu::var L1, tiramisu::var L2, int sizeX, int sizeY, int sizeZ) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | gpu_tile (tiramisu::var L0, tiramisu::var L1, tiramisu::var L2, int sizeX, int sizeY, int sizeZ, tiramisu::var L0_outer, tiramisu::var L1_outer, tiramisu::var L2_outer, tiramisu::var L0_inner, tiramisu::var L1_inner, tiramisu::var L2_inner) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | set_access (std::string access_str) |
Set the access relation of the computation. More... | |
void | set_access (isl_map *access) |
Set the access relation of the computation. More... | |
void | set_low_level_schedule (isl_map *map) |
Set the schedule indicated by map . More... | |
void | set_low_level_schedule (std::string map_str) |
Set the schedule indicated by map . More... | |
void | split (tiramisu::var L0, int sizeX) |
Split the loop level L0 of the iteration space into two new loop levels. More... | |
void | split (tiramisu::var L0, int sizeX, tiramisu::var L0_outer, tiramisu::var L0_inner) |
Split the loop level L0 of the iteration space into two new loop levels. More... | |
void | tag_gpu_level (tiramisu::var L0, tiramisu::var L1) |
Tag the loop levels L0 and L1 to be mapped to GPU. More... | |
void | tag_gpu_level (tiramisu::var L0, tiramisu::var L1, tiramisu::var L2, tiramisu::var L3) |
Tag the loop levels L0 , L1 , L2 and L3 to be mapped to GPU. More... | |
void | tag_gpu_level (tiramisu::var L0, tiramisu::var L1, tiramisu::var L2, tiramisu::var L3, tiramisu::var L4, tiramisu::var L5) |
Tag the loop levels L0 to L5 to be mapped to GPU. More... | |
void | tag_distribute_level (tiramisu::var L) |
Tag the loop level L to be distributed. More... | |
void | tag_distribute_level (int L) |
Tag the loop level L to be distributed. More... | |
computation & | then (computation &next_computation, tiramisu::var L) |
Schedule this computation to run before the computation next_computation at the loop level L and return next_computation . More... | |
void | tile (tiramisu::var L0, tiramisu::var L1, int sizeX, int sizeY) |
Tile the two loop levels L0 and L1 with rectangular tiling. More... | |
void | tile (tiramisu::var L0, tiramisu::var L1, int sizeX, int sizeY, tiramisu::var L0_outer, tiramisu::var L1_outer, tiramisu::var L0_inner, tiramisu::var L1_inner) |
Tile the two loop levels L0 and L1 with rectangular tiling. More... | |
void | tile (tiramisu::var L0, tiramisu::var L1, tiramisu::var L2, int sizeX, int sizeY, int sizeZ) |
Tile the two loop levels L0 and L1 with rectangular tiling. More... | |
void | tile (tiramisu::var L0, tiramisu::var L1, tiramisu::var L2, int sizeX, int sizeY, int sizeZ, tiramisu::var L0_outer, tiramisu::var L1_outer, tiramisu::var L2_outer, tiramisu::var L0_inner, tiramisu::var L1_inner, tiramisu::var L2_inner) |
Tile the two loop levels L0 and L1 with rectangular tiling. More... | |
void | tile (int L0, int L1, int sizeX, int sizeY) |
Tile the two loop levels L0 and L1 with rectangular tiling. More... | |
void | tile (int L0, int L1, int L2, int sizeX, int sizeY, int sizeZ) |
Tile the two loop levels L0 and L1 with rectangular tiling. More... | |
void | unroll (tiramisu::var L, int fac) |
Unroll the loop level L with an unrolling factor fac . More... | |
void | unroll (tiramisu::var L, int fac, tiramisu::var L_outer, tiramisu::var L_inner) |
Unroll the loop level L with an unrolling factor fac . More... | |
void | vectorize (tiramisu::var L, int v) |
Vectorize the loop level L . More... | |
void | vectorize (tiramisu::var L, int v, tiramisu::var L_outer, tiramisu::var L_inner) |
Vectorize the loop level L . More... | |
Static Public Member Functions | |
static xfer | create_xfer (std::string send_iter_domain, std::string recv_iter_domain, tiramisu::expr send_dest, tiramisu::expr recv_src, xfer_prop send_prop, xfer_prop recv_prop, tiramisu::expr send_expr, tiramisu::function *fct) |
static xfer | create_xfer (std::string iter_domain, xfer_prop prop, tiramisu::expr expr, tiramisu::function *fct) |
Static Public Attributes | |
static const var | root |
root is a variable used to specify the dimension level known as root (the level outside any loop nest). More... | |
static const int | root_dimension = -1 |
Equivalent of computation::root but to be used with scheduling functions that take loop level (integers) as input instead of tiramisu::var. More... | |
Protected Member Functions | |
computation () | |
Dummy constructor for derived classes. More... | |
std::vector< tiramisu::expr > * | compute_buffer_size () |
Compute the size of the buffer allocated automatically to hold the results of this computation. More... | |
isl_ctx * | get_ctx () const |
Return the context of the computations. More... | |
tiramisu::expr | get_predicate () |
Return the predicate around this computation if a predicate was added using add_predicate(). More... | |
const std::string | get_unique_name () const |
Return a unique name of computation; made of the following pattern: [computation name]@[computation address in memory]. More... | |
bool | has_multiple_definitions () |
Return true if the computation has multiple definitions. More... | |
void | init_computation (std::string iteration_space_str, tiramisu::function *fct, const tiramisu::expr &e, bool schedule_this_computation, tiramisu::primitive_t t) |
Initialize a computation. More... | |
void | set_name (const std::string &n) |
Set the name of the computation. More... | |
void | full_loop_level_collapse (int level, tiramisu::expr collapse_from_iter) |
Collapse all the iterations of a loop into one single iteration. More... | |
computation (std::string name, std::vector< var > iterator_variables, tiramisu::expr e, bool schedule_this_computation, primitive_t t) | |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. More... | |
void | set_schedule (isl_map *map) |
Set the schedule indicated by map . More... | |
void | set_schedule (std::string map_str) |
Set the schedule indicated by map . More... | |
Protected Attributes | |
bool | _is_library_call |
True if this computation represents a library call. More... | |
std::string | library_call_name |
If the computation represents a library call, this is the name of the function. More... | |
isl_ast_expr * | wait_index_expr |
An index expression just for the request buffer. More... | |
A class that represents computations.
A computation is an expression associated with an iteration domain. A computation indicates what needs to be computed (the expression that should be computed). A computation has three representations: (1) the iteration domain, which specifies the set of points to compute but neither their order of execution nor their mapping to memory; (2) the time-space domain, which specifies the logical execution time and the processor of each point; and (3) the data-layout representation, which specifies where the results of the computation are stored in memory (buffers).
tiramisu::computation::computation ( )  [protected]
Dummy constructor for derived classes.
tiramisu::computation::computation (std::string name, std::vector< var > iterator_variables, tiramisu::expr e, bool schedule_this_computation, primitive_t t)  [protected]
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
tiramisu::computation::computation | ( | std::string | iteration_domain, |
tiramisu::expr | e, | ||
bool | schedule_this_computation, | ||
tiramisu::primitive_t | t, | ||
tiramisu::function * | fct | ||
) |
Constructor for computations.
iteration_domain
is a string that represents the iteration domain of the computation. The iteration domain should be written in the ISL format (http://barvinok.gforge.inria.fr/barvinok.pdf Section 1.2.1).
The iteration domain of a statement is a set that contains all of the execution instances of the statement (a statement in a loop has an execution instance for each loop iteration in which it executes). Each execution instance of a statement in a loop nest is uniquely represented by an identifier and a tuple of integers (typically, the values of the outer loop iterators).
For example, consider the statement S0 in the following loop nest (a C-style sketch):
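for (int i = 0; i < 2; i++)
    for (int j = 0; j < 3; j++)
        S0;      // S0 executes once for every pair (i, j)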
The iteration space of S0 is then {S0[0,0], S0[0,1], S0[0,2], S0[1,0], S0[1,1], S0[1,2]}.
S0[0,0] is the execution instance of S0 in the iteration [0,0].
The previous set of integer tuples can be compactly described by affine constraints as follows
{S0[i,j]: 0<=i<2 and 0<=j<3}
In general, the loop nest
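for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
        S0;      // generic loop nest (sketch)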
has the following iteration domain
{S0[i,j]: 0<=i<N and 0<=j<M}
This should be read as: the set of points [i,j] such that 0<=i<N and 0<=j<M.
The name of the computation in the iteration domain should not start with _ (an underscore). Names starting with _ are reserved names.
e
is the expression computed by the computation. It is possible to declare the computation without specifying the expression. The expression can be specified later using computation::set_expression(). An example of setting the expression after declaring the computation is presented in tests/test_04.cpp.
schedule_this_computation
should be set to true if the computation is supposed to be scheduled and code is supposed to be generated from the computation. Set it to false if you just want to use the computation to represent a buffer (that is passed as an argument to the function) and you do not intend to generate code for the computation. An example where this argument is set to false is presented in tests/test_14.cpp.
t
is the type of the computation, i.e., the type of the expression computed by the computation. Examples of types include p_uint8, p_uint16, p_uint32, ...
fct
is a pointer to the Tiramisu function where this computation should be added.
Bound Inference: The user can declare computations without providing any constraint about the iteration domain; in this case they can rely on bound inference to infer the constraints about each iteration domain. The user only needs to provide constraints over the domains of the last computations (last consumers), and Tiramisu will propagate these constraints along the chain of computations that precede those consumers. Note that bound inference is not possible if you have multiple definitions of the same computation. In such a case, you should provide constraints over the iteration domain when you declare the computation.
Examples about bound inference are provided in test_22 to test_25.
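A minimal sketch of how this constructor can be used (the function name, computation name, bounds and expression below are illustrative, not taken from a specific test):
#include <core.h>

tiramisu::function f("example_fct");                          // function that will own the computation
tiramisu::computation S0("{S0[i,j]: 0<=i<100 and 0<=j<100}",  // iteration domain in the ISL format
                         tiramisu::expr((uint8_t) 4),         // expression computed at each point (i,j)
                         true,                                // schedule S0 and generate code for it
                         tiramisu::p_uint8,                   // type of the computed expression
                         &f);                                 // function to which S0 is added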
tiramisu::computation::computation | ( | std::string | name, |
std::vector< var > | iterator_variables, | ||
tiramisu::expr | e, | ||
bool | schedule_this_computation | ||
) |
Constructor for computations.
Same as tiramisu::computation::computation(std::string name, std::vector<var> iterator_variables, tiramisu::expr e) except that it has the following additional argument:
schedule_this_computation
indicates whether this computation should be scheduled.
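For instance, a computation that only represents data passed to the function (and for which no code should be generated) can be declared as follows (a sketch; the names and bounds are illustrative):
tiramisu::var i("i", 0, 100), j("j", 0, 100);
// No expression is attached and schedule_this_computation is false:
// S_in only represents data, no code is generated for it.
tiramisu::computation S_in("S_in", {i, j}, tiramisu::expr(), false);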
tiramisu::computation::computation | ( | std::vector< var > | iterator_variables, |
tiramisu::expr | e | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
tiramisu::computation::computation | ( | std::string | name, |
std::vector< var > | iterator_variables, | ||
tiramisu::expr | e | ||
) |
Constructor for computations.
Declare a computation. A computation in Tiramisu is the equivalent of a statement surrounded by a loop nest in C (an example is provided later).
name
is the name of the computation.
iterator_variables
is a vector that represents the loop iterators around the computation.
e
is the expression computed by the computation.
For example, if we have two iterator variables
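tiramisu::var i("i", 0, 20), j("j", 0, 30);   // iterator variables and their bounds (sketch; matches the bounds mentioned below)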
and we have the following computation declaration
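tiramisu::computation S0("S0", {i, j}, tiramisu::expr(4));   // sketch: compute the value 4 at every point (i, j)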
This is equivalent to writing the following C code
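for (int i = 0; i < 20; i++)
    for (int j = 0; j < 30; j++)
        S0(i, j) = 4;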
More precisely, the vector {i, j} specifies the iteration domain of the computation S0. In this case, 0<=i<20 and 0<=j<30.
It is possible to declare the computation without specifying the expression. The expression can be specified later using computation::set_expression(). An example of setting the expression after declaring the computation is presented in tutorial 04. Usually this is needed for writing reductions.
tiramisu::computation::computation | ( | std::vector< var > | iterator_variables, |
tiramisu::expr | e, | ||
bool | schedule_this_computation | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
tiramisu::computation::computation (std::string name, std::vector< var > iterator_variables, primitive_t t)  [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. t is the type of the computation, i.e., the type of the expression computed by the computation. Examples of types include p_uint8, p_uint16, p_uint32, ...
Usually, this constructor is used to declare buffer wrappers.
For example, if we have two iterator variables
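tiramisu::var i("i", 0, 20), j("j", 0, 30);   // sketch: bounds chosen to match buf[20, 30] below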
and we have the following computation declaration
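// Sketch: S0 carries no expression, only a type; it can serve as a typed view over existing data.
tiramisu::computation S0("S0", {i, j}, tiramisu::p_uint8);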
This can be used as a wrapper over a buffer buf[20, 30] where the buffer elements are of type uint8.
Definition at line 2676 of file core.h.
References tiramisu::expr::dtype.
tiramisu::computation::computation (std::vector< var > iterator_variables, primitive_t t)  [inline]
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
Definition at line 2686 of file core.h.
References tiramisu::a_temporary, and tiramisu::expr::dtype.
void tiramisu::computation::add_associated_let_stmt | ( | std::string | access_name, |
tiramisu::expr | e | ||
) |
Add a let statement that is associated to this computation.
The let statement will be executed before the computation (more precisely, between this computation and any computation that precedes it). The variable defined by the let statement is accessible by this computation alone, i.e., it cannot be used in any other computation.
virtual void tiramisu::computation::add_definitions (std::string iteration_domain_str, tiramisu::expr e, bool schedule_this_computation, tiramisu::primitive_t t, tiramisu::function *fct)  [virtual]
Add definitions of computations that have the same name as this computation.
The arguments of this function are identical to the arguments of the computation constructor. In general, this function is used to express reductions and to express computation updates.
In other words, this function should be used if the user has already declared a set of computations C and wants to declare more computations that have the same name.
Example: let's assume we want to declare two computations that have the same name C but different expressions (for instance, one definition over each half of the domain). To do this, we can declare the first computation using the computation constructor and declare the second computation using add_definitions(), as sketched below.
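// A sketch (f is an illustrative tiramisu::function; the bounds and expressions are illustrative too).
tiramisu::computation C("{C[i]: 0<=i<10}", tiramisu::expr((uint8_t) 0), true, tiramisu::p_uint8, &f);
// Second definition: same name and access relation, different domain and expression.
C.add_definitions("{C[i]: 10<=i<20}", tiramisu::expr((uint8_t) 1), true, tiramisu::p_uint8, &f);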
The use of add_definitions() is purely due to restrictions imposed by the C++ language and not by the Tiramisu framework itself. This is mainly because, in C++, it is not possible to declare two objects with the same name; for example one cannot do
computation C(...); computation C(...);
In order to declare the second set of computations, we chose to use the add_definitions function to avoid this problem.
The newly added computations must have the same name and the same access function as the initial set of computations but can have a different expression.
An example of using this function is available in test_26.
Reimplemented in tiramisu::wait, tiramisu::recv, and tiramisu::send.
void tiramisu::computation::add_predicate | ( | tiramisu::expr | predicate | ) |
Add a predicate (condition) on the computation.
The computation will be executed only if this condition is true.
The predicate can be an affine or a non-affine expression. If you need to put a condition around a block of statements (i.e., a sequence of computations), then you can do that by adding a predicate to each one of those computations. The compiler will then automatically transform the multiple conditions (a condition around each computation) into one condition around the whole block.
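A small sketch of the call (C, i and the bound 99 are illustrative names/values):
C.add_predicate(i < 99);   // C is executed only for iterations where i < 99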
void tiramisu::computation::after | ( | computation & | comp, |
tiramisu::var | iterator | ||
) |
Schedule this computation to run after the computation comp.
This computation is placed after comp at the loop level iterator; iterator is a loop level in this computation.
The root level is computation::root. The global variable computation::root is equivalent to var("root").
For example assuming we have the two computations
{S0[i,j]: 0<=i<N and 0<=j<N} and {S1[i,j]: 0<=i<N and 0<=j<N}
In order to make S1 run after S0 in the i loop, one should use
S1.after(S0, i)
which means: S1 is after S0 at the loop level i (which is loop level 0).
The corresponding code is
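// Sketch of the loop nest produced by S1.after(S0, i)
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
        S0;
    for (int j = 0; j < N; j++)
        S1;
}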
S1.after(S0, j)
means: S1 is after S0 at the loop level j (which is 1) and would yield the following code
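// Sketch of the loop nest produced by S1.after(S0, j)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
    {
        S0;
        S1;
    }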
S1.after(S0, computation::root) means S1 is after S0 at the main program level and would yield the following code
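// Sketch of the loop nests produced by S1.after(S0, computation::root)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        S0;
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        S1;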
Note that as with all other scheduling methods:
void tiramisu::computation::after | ( | computation & | comp, |
int | level | ||
) |
This function is equivalent to void after(computation &comp, tiramisu::var iterator); except that it uses loop level numbers (0, 1, 2, ...) instead of using loop variables (tiramisu::var).
Tiramisu internally represents loop levels using numbers instead of variable names, and this is the actual function used internally.
The outermost loop level is 0. The root level is computation::root_dimension.
For example assuming we have the two computations
{S0[i,j]: 0<=i<N and 0<=j<N} and {S1[i,j]: 0<=i<N and 0<=j<N}
In order to make S1 run after S0 in the i loop, one should use
S1.after(S0,0)
which means: S1 is after S0 at the loop level 0 (which is i).
The corresponding code is
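// Sketch of the loop nest produced by S1.after(S0, 0)
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
        S0;
    for (int j = 0; j < N; j++)
        S1;
}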
void tiramisu::computation::after_low_level | ( | computation & | comp, |
int | level | ||
) |
Schedule this computation to run after the computation comp. The computations are placed after each other in the loop level level. The outermost loop level is 0. The root level is computation::root_dimension.
For example assuming we have the two computations
{S0[i,j]: 0<=i<N and 0<=j<N} and {S1[i,j]: 0<=i<N and 0<=j<N}
In order to make S1 run after S0 in the i loop, one should use
S1.after_low_level(S0,0)
which means: S1 is after S0 at the loop level 0 (which is i).
The corresponding code is
S1.after_low_level(S0,1)
means: S1 is after S0 at the loop level 1 (which is j) and would yield the following code
S1.after_low_level(S0, computation::root_dimension) means S1 is after S0 at the main program level and would yield the following code
To specify that this computation is after comp in multiple levels, the user can provide those levels in the levels vector.
S1.after_low_level(S0, {0,1})
means that S1 is after S0 in the loop level 0 and in the loop level 1.
Note that
S1.after_low_level(S0, L)
would mean that S1 and S0 share the same loop nests for all the loop levels that are before L and that S1 is after S0 in L only. S1 is not after S0 in the loop levels that are before L.
void tiramisu::computation::after_low_level | ( | computation & | comp, |
std::vector< int > | levels | ||
) |
Schedule this computation to run after the computation comp. The computations are placed after each other in the loop level level. The outermost loop level is 0. The root level is computation::root_dimension.
For example assuming we have the two computations
{S0[i,j]: 0<=i<N and 0<=j<N} and {S1[i,j]: 0<=i<N and 0<=j<N}
In order to make S1 run after S0 in the i loop, one should use
S1.after_low_level(S0,0)
which means: S1 is after S0 at the loop level 0 (which is i).
The corresponding code is
S1.after_low_level(S0,1)
means: S1 is after S0 at the loop level 1 (which is j) and would yield the following code
S1.after_low_level(S0, computation::root_dimension) means S1 is after S0 at the main program level and would yield the following code
To specify that this computation is after comp in multiple levels, the user can provide those levels in the levels vector.
S1.after_low_level(S0, {0,1})
means that S1 is after S0 in the loop level 0 and in the loop level 1.
Note that
S1.after_low_level(S0, L)
would mean that S1 and S0 share the same loop nests for all the loop levels that are before L and that S1 is after S0 in L only. S1 is not after S0 in the loop levels that are before L.
void tiramisu::computation::allocate_and_map_buffer_automatically | ( | tiramisu::argument_t | type = tiramisu::a_temporary | ) |
void tiramisu::computation::apply_transformation_on_schedule | ( | std::string | map_str | ) |
Apply a transformation on the schedule.
This transformation is from the time-space domain to the time-space domain. It is applied on the range of the schedule (i.e., on the output of the schedule relation).
For example, to shift the i dimension of the time-processor domain of C0, you can apply the transformation
C0[0, 0, i, 0, j, 0] -> C0[0, 0, i+2, 0, j, 0]
To apply an interchange, you would do
C0[0, 0, i, 0, j, 0] -> C0[0, 0, j, 0, i, 0]
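For instance, the shift transformation above could be applied as follows (a sketch; C0 and the exact map string, including any context parameters, are illustrative):
C0.apply_transformation_on_schedule("{C0[0, 0, i, 0, j, 0] -> C0[0, 0, i + 2, 0, j, 0]}");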
void tiramisu::computation::before | ( | computation & | consumer, |
tiramisu::var | L | ||
) |
Schedule this computation to run before the computation consumer at the loop level L.
Notes: L is a loop level of this computation.
void tiramisu::computation::between | ( | computation & | before_comp,
tiramisu::var | before_l, | ||
computation & | after_comp, | ||
tiramisu::var | after_l | ||
) |
Schedule this computation to run after before_comp at the loop level before_l, and before after_comp at loop level after_l.
The outermost loop level is 0.
Use computation::root_dimension to indicate the root dimension (i.e. the outermost time-space dimension).
If there was already a direct scheduling between before_comp and after_comp (e.g. using before, after, between, ...), that schedule is overwritten; i.e. it no longer exists/has an effect.
Note that as with all other scheduling methods:
void tiramisu::computation::compute_at | ( | computation & | consumer, |
tiramisu::var | L | ||
) |
This function assumes that consumer consumes values produced by this computation (which is the producer). Compute this computation as needed for each unique value of the consumer. This computation is scheduled so that the values consumed by the consumer are computed at the level L and in the same loop nest of the consumer. L is a loop level in the consumer.
If the consumer needs this computation to be computed redundantly, the function creates the necessary redundant computations and schedules them before the consumer.
This function performs the following:
This function does not:
If this function creates a duplicate of the computation, the user does not need to set its access relation. The duplicated computation will automatically have the same access relation as the original computation. This access relation is set automatically.
This function does not return a handler to manipulate the duplicate computation. It does not allow the user to manipulate the duplicate freely. The duplicate is scheduled automatically to be executed before the consumer.
void tiramisu::computation::compute_at | ( | computation & | consumer, |
int | L | ||
) |
Identical to void compute_at(computation &consumer, tiramisu::var L); except that it uses a loop level number (0, 1, 2, ...) instead of a loop level variable.
std::vector< tiramisu::expr > * tiramisu::computation::compute_buffer_size ( )  [protected]
Compute the size of the buffer allocated automatically to hold the results of this computation.
int tiramisu::computation::compute_maximal_AST_depth | ( | ) |
Generate the time-space domain, construct an AST that scans that time-space domain, and then compute the depth of this AST.
This is useful for example to know if all the dimensions of the time-space domain will correspond to a loop level in the final generated AST.
static xfer tiramisu::computation::create_xfer (std::string send_iter_domain, std::string recv_iter_domain, tiramisu::expr send_dest, tiramisu::expr recv_src, xfer_prop send_prop, xfer_prop recv_prop, tiramisu::expr send_expr, tiramisu::function *fct)  [static]
static xfer tiramisu::computation::create_xfer (std::string iter_domain, xfer_prop prop, tiramisu::expr expr, tiramisu::function *fct)  [static]
void tiramisu::computation::drop_rank_iter | ( | var | level | ) |
Specify that the rank loop iterator should be removed from linearization.
void tiramisu::computation::dump | ( | ) | const |
Dump the computation on stdout.
This is mainly useful for debugging.
void tiramisu::computation::dump_iteration_domain | ( | ) | const |
Dump the iteration domain of the computation.
This is useful for debugging.
void tiramisu::computation::dump_schedule | ( | ) | const |
Dump the schedule of the computation.
This is mainly useful for debugging.
The schedule is a relation between the iteration space and the time space. The relation provides a logical date of execution for each point in the iteration space. The schedule needs first to be set before calling this function.
void tiramisu::computation::full_loop_level_collapse (int level, tiramisu::expr collapse_from_iter)  [protected]
Collapse all the iterations of a loop into one single iteration.
void tiramisu::computation::fuse_after (tiramisu::var lev, computation &comp)  [inline]
Fuse this computation with the computation passed as argument in the same loop.
Run this computation after that computation. Fuse them at the loop level lev.
For example, assuming we have the following computations
{S0[i,j]: 0<=i<N and 0<=j<N}, {S1[i,j]: 0<=i<N and 0<=j<N} and {S2[i,j]: 0<=i<N and 0<=j<N}.
Without fusion, these computations would be equivalent to the following loop nests:
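// Sketch of the unfused loop nests
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        S0;
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        S1;
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        S2;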
To fuse them, one should call
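// A sketch of the calls (fusing at loop level j):
S1.fuse_after(j, S0);
S2.fuse_after(j, S1);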
This would result in fusing S2 with S0 and S1 at loop level j. S2 will be scheduled for execution after S0 and S1. The resulting code would look like
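// Sketch of the fused loop nest
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
    {
        S0;
        S1;
        S2;
    }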
Calling
would result in the following code
Definition at line 3230 of file core.h.
References tiramisu::expr::get_name().
void tiramisu::computation::gen_time_space_domain | ( | ) |
Generate the time-space domain of the computation.
In this representation, the logical time of execution and the processor where the computation will be executed are both specified. The memory location where computations will be stored is not yet specified at this level.
tiramisu::buffer* tiramisu::computation::get_automatically_allocated_buffer | ( | ) |
Return the buffer that was allocated automatically using high level data mapping functions.
If no automatic buffer was allocated, this function returns NULL.
isl_ctx * tiramisu::computation::get_ctx ( ) const  [protected]
Return the context of the computations.
tiramisu::primitive_t tiramisu::computation::get_data_type | ( | ) | const |
Get the data type of the computation.
const tiramisu::expr& tiramisu::computation::get_expr | ( | ) | const |
Return the Tiramisu expression associated with the computation.
isl_set* tiramisu::computation::get_iteration_domain | ( | ) | const |
Return the iteration domain of the computation.
In this representation, the order of execution of computations is not specified, the computations are also not mapped to memory.
tiramisu::computation& tiramisu::computation::get_last_update | ( | ) |
Get the last update of a computation.
int tiramisu::computation::get_loop_level_number_from_dimension_name (std::string dim_name)  [inline]
const std::string& tiramisu::computation::get_name | ( | ) | const |
Return the name of the computation.
computation* tiramisu::computation::get_predecessor | ( | ) |
Returns a pointer to the computation scheduled immediately before this computation, or a null pointer if none exist.
tiramisu::expr tiramisu::computation::get_predicate ( )  [protected]
Return the predicate around this computation if a predicate was added using add_predicate().
isl_map* tiramisu::computation::get_schedule | ( | ) | const |
Get the schedule of the computation.
const std::string tiramisu::computation::get_unique_name ( ) const  [protected]
Return a unique name of computation; made of the following pattern: [computation name]@[computation address in memory].
tiramisu::computation& tiramisu::computation::get_update | ( | int | index | ) |
Return the update indicated by index, such that: if index == 0, this computation itself is returned; if index > 0, the index-th computation added through add_definitions() is returned.
void tiramisu::computation::gpu_tile | ( | tiramisu::var | L0,
tiramisu::var | L1, | ||
int | sizeX, | ||
int | sizeY | ||
) |
Tile the computation and then tag the outermost tile dimension to be mapped to GPU blocks and tag the innermost tile dimensions to be mapped to GPU threads.
Tile the two loop levels L0 and L1 with rectangular tiling. sizeX and sizeY represent the tile size. L0 and L1 should be two consecutive loop levels (i.e., L0 = L1 + 1) and they should satisfy L0 > L1.
L0_outer, L1_outer, L0_inner and L1_inner are the names of the new dimensions created after tiling.
void tiramisu::computation::gpu_tile | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
int | sizeX, | ||
int | sizeY, | ||
tiramisu::var | L0_outer, | ||
tiramisu::var | L1_outer, | ||
tiramisu::var | L0_inner, | ||
tiramisu::var | L1_inner | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. L0_outer, L1_outer, L0_inner and L1_inner are the names of the new loop levels created after tiling.
void tiramisu::computation::gpu_tile | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
tiramisu::var | L2, | ||
int | sizeX, | ||
int | sizeY, | ||
int | sizeZ | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. It tiles the three loop levels L0, L1 and L2 with tile sizes sizeX, sizeY and sizeZ before mapping the outermost tile dimensions to GPU blocks and the innermost tile dimensions to GPU threads.
void tiramisu::computation::gpu_tile | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
tiramisu::var | L2, | ||
int | sizeX, | ||
int | sizeY, | ||
int | sizeZ, | ||
tiramisu::var | L0_outer, | ||
tiramisu::var | L1_outer, | ||
tiramisu::var | L2_outer, | ||
tiramisu::var | L0_inner, | ||
tiramisu::var | L1_inner, | ||
tiramisu::var | L2_inner | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts. L0_outer, L1_outer, L2_outer, L0_inner, L1_inner and L2_inner are the names of the new loop levels created after tiling.
bool tiramisu::computation::has_multiple_definitions ( )  [protected]
Return true if the computation has multiple definitions.
i.e., if the computation is defined multiple times. An update is a special case where a computation is defined multiple times. Duplicate computations are another example.
In the following example, C is defined multiple times whereas D is defined only once.
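// A sketch (f, the bounds and the expressions are illustrative):
tiramisu::computation C("{C[i]: 0<=i<10}", tiramisu::expr((uint8_t) 0), true, tiramisu::p_uint8, &f);
C.add_definitions("{C[i]: 10<=i<20}", tiramisu::expr((uint8_t) 1), true, tiramisu::p_uint8, &f);   // C: second definition
tiramisu::computation D("{D[i]: 0<=i<10}", tiramisu::expr((uint8_t) 2), true, tiramisu::p_uint8, &f);   // D: single definition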
void tiramisu::computation::init_computation (std::string iteration_space_str, tiramisu::function *fct, const tiramisu::expr &e, bool schedule_this_computation, tiramisu::primitive_t t)  [protected]
Initialize a computation.
This is a private function that should not be called explicitly by users.
void tiramisu::computation::interchange | ( | tiramisu::var | L0, |
tiramisu::var | L1 | ||
) |
Interchange (swap) the two loop levels L0 and L1.
void tiramisu::computation::interchange | ( | int | L0, |
int | L1 | ||
) |
Identical to void interchange(tiramisu::var L0, tiramisu::var L1);.
const bool tiramisu::computation::is_inline_computation | ( | ) | const |
Returns true if and only if the computation is inline.
virtual bool tiramisu::computation::is_recv ( ) const  [virtual]
Reimplemented in tiramisu::recv.
virtual bool tiramisu::computation::is_send ( ) const  [virtual]
Reimplemented in tiramisu::send.
virtual bool tiramisu::computation::is_send_recv ( ) const  [virtual]
Reimplemented in tiramisu::send_recv.
virtual bool tiramisu::computation::is_wait ( ) const  [virtual]
Reimplemented in tiramisu::wait.
void tiramisu::computation::mark_as_let_statement | ( | ) |
Mark this statement as a let statement.
void tiramisu::computation::mark_as_library_call | ( | ) |
Mark this statement as a library call.
tiramisu::computation::operator expr | ( | ) |
template<typename... Args> tiramisu::expr tiramisu::computation::operator() (Args... args)  [inline]
Access operator: C0(i,j) represents an access to the element (i,j) of the computation C0.
C0(i,j) represents the value computed by the computation C0(i,j)
Definition at line 4005 of file core.h.
References tiramisu::expr::dump(), tiramisu::o_access, and tiramisu::expr::substitute().
void tiramisu::computation::parallelize | ( | tiramisu::var | L | ) |
Tag the loop level L to be parallelized.
This function is equivalent to the function tiramisu::computation::tag_parallel_level(). There is no difference between the two.
void tiramisu::computation::set_access | ( | std::string | access_str | ) |
Set the access relation of the computation.
The access relation is a relation from computations to buffer locations. access_str is a string that represents the relation; it is encoded in the ISL format for relations (http://isl.gforge.inria.fr/user.html#Sets-and-Relations). Note that, in Tiramisu, the access relations of computations that have the same name must be identical.
Examples: tutorial_01, tutorial_02, tutorial_08 (actually most tutorials have set_access()).
void tiramisu::computation::set_access | ( | isl_map * | access | ) |
Set the access relation of the computation.
The access relation is a relation from computations to buffer locations. access is the relation given directly as an isl_map (see http://isl.gforge.inria.fr/user.html#Sets-and-Relations). Note that, in Tiramisu, the access relations of computations that have the same name must be identical.
Examples: tutorial_01, tutorial_02, tutorial_08 (actually most tutorials have set_access()).
void tiramisu::computation::set_expression | ( | const tiramisu::expr & | e | ) |
Set the expression of the computation.
void tiramisu::computation::set_inline | ( | bool | is_inline = true | ) |
Sets whether the computation is inline or not, based on the value of is_inline. If a computation is inline, accesses to the computation return the expression of that computation. E.g. if an inline computation S(i,j) is defined with the expression i + j, then S(i + 1, j * i) returns the expression i + 1 + j * i. If is_inline is not provided, the computation is set to be inline.
void tiramisu::computation::set_low_level_schedule | ( | isl_map * | map | ) |
Set the schedule indicated by map. map is a string that represents a mapping from the iteration domain to the time-space domain (the ISL format to represent maps is documented in http://barvinok.gforge.inria.fr/barvinok.pdf, Sec. 1.2.2).
The schedule is a map from the iteration domain to a time space domain. The same name of space should be used for both the range and the domain of the schedule.
In general, users can set the schedule using high level functions such as before(), after(), tile(), compute_at(), vectorize(), split(), ... The use of this function is only reserved for advanced users who want a low level control of the schedule.
Vectors in the time-space domain have the following form
computation_name[redundancy_ID,static,dynamic,static,dynamic,static,dynamic,static,...]
The first dimension of the vector is used to indicate the redundancy ID (the notion of the redundancy ID is explained later).
The following dimensions are interleaved dimensions: static, dynamic, static, dynamic, ... Dynamic dimensions represent the loop levels, while static dimensions are used to order statements within a given block of statements in a given loop level. For example, the computations c0 and c1 in the following loop nest
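// Sketch of the loop nest (c0 and c1 are statements in the same innermost loop body)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
    {
        c0;
        c1;
    }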
have the following representations in the iteration domain
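{c0[i,j]: 0<=i<N and 0<=j<N}
{c1[i,j]: 0<=i<N and 0<=j<N}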
and the following representation in the time-space domain
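{c0[0,0,i,0,j,0]: 0<=i<N and 0<=j<N}
{c1[0,0,i,0,j,1]: 0<=i<N and 0<=j<N}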
The first dimension (dimension 0) in the time-space domain (the leftmost dimension) is the redundancy ID (in this case it is 0, the meaning of this ID will be explained later). The second dimension (starting from the left) is a static dimension, the third dimension is a dynamic dimension that represents the loop level i, ..., the fifth dimension is a dynamic dimension that represents the loop level j and the last dimension (dimension 5) is a static dimension and allows the ordering of c1 after c0 in the loop nest.
To transform the previous iteration domain to the time-space domain, the following schedule should be used:
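{c0[i,j] -> c0[0,0,i,0,j,0]: 0<=i<N and 0<=j<N}
{c1[i,j] -> c1[0,0,i,0,j,1]: 0<=i<N and 0<=j<N}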
The first dimension called "redundancy ID" is only meaningful if the computation is redundant. i.e., some parts of the computation are redundantly computed. Redundant computations are in general used to maximize parallelism and data locality on the expense of doing some computations redundantly. For example, if the two computations c1(i,j) and c2(i,j) both depend on the computation c0(i,j), instead of waiting for c0(i,j) to be computed and then computing c1(i,j) and c2(i,j) in parallel, the thread executing c1(i,j) can compute c0(i,j) by itself and then run c1(i,j). The thread that computes c2(i,j) can do the same and compute c0(i,j) by itself and then compute c2(i,j). In this case the two threads do not need to wait. This is done at the expense of redundant computation since c0(i,j) is computed by both threads.
In general redundant computations are useful when tiling stencil computations. In the context of stencils such a tiling is called "overlapped tiling". Tiles that depend on results computed by other tiles that run in parallel can compute the boundaries redundantly which allows them to avoid waiting and thus can run in parallel.
In Tiramisu, the user can indicate that a chunk of a computation should be computed redundantly. The original computation always has a redundancy ID equal to 0 (which means this is the original computation). The redundant chunk has an ID that is different from 0 and that is used to uniquely identify it.
For example if we want to compute all of c0 three times (that is, compute the original computation and compute two redundant computations), we can use the following schedules:
The schedule of the original computation: {c0[i,j]->c0[0,0,i,0,j,0]: 0<=i<N and 0<=j<N} The schedule of the redundant computation N°1: {c0[i,j]->c0[1,0,i,0,j,0]: 0<=i<N and 0<=j<N} The schedule of the redundant computation N°2: {c0[i,j]->c0[2,0,i,0,j,0]: 0<=i<N and 0<=j<N}
The schedule of c0 in this case would be three maps that map c0[i,j] to the three different redundant computations in the time-processor domain:
The function set_schedule() overrides any other schedule set by the high level scheduling functions. Currently the user has to choose between using the high level scheduling functions or using this low level set_schedule function. The user cannot mix the use of the two in the same program because they are not compatible.
void tiramisu::computation::set_low_level_schedule | ( | std::string | map_str | ) |
Set the schedule indicated by map. map is a string that represents a mapping from the iteration domain to the time-space domain (the ISL format to represent maps is documented in http://barvinok.gforge.inria.fr/barvinok.pdf, Sec. 1.2.2).
The schedule is a map from the iteration domain to a time space domain. The same name of space should be used for both the range and the domain of the schedule.
In general, users can set the schedule using high level functions such as before(), after(), tile(), compute_at(), vectorize(), split(), ... The use of this function is only reserved for advanced users who want a low level control of the schedule.
Vectors in the time-space domain have the following form
computation_name[redundancy_ID,static,dynamic,static,dynamic,static,dynamic,static,...]
The first dimension of the vector is used to indicate the redundancy ID (the notion of the redundancy ID is explained later).
The following dimensions are interleaved dimensions: static, dynamic, static, dynamic, ... Dynamic dimensions represent the loop levels, while static dimensions are used to order statements within a given block of statements in a given loop level. For example, the computations c0 and c1 in the following loop nest
have the following representations in the iteration domain
and the following representation in the time-space domain
The first dimension (dimension 0) in the time-space domain (the leftmost dimension) is the redundancy ID (in this case it is 0, the meaning of this ID will be explained later). The second dimension (starting from the left) is a static dimension, the third dimension is a dynamic dimension that represents the loop level i, ..., the fifth dimension is a dynamic dimension that represents the loop level j and the last dimension (dimension 5) is a static dimension and allows the ordering of c1 after c0 in the loop nest.
To transform the previous iteration domain to the time-space domain, the following schedule should be used:
The first dimension called "redundancy ID" is only meaningful if the computation is redundant. i.e., some parts of the computation are redundantly computed. Redundant computations are in general used to maximize parallelism and data locality on the expense of doing some computations redundantly. For example, if the two computations c1(i,j) and c2(i,j) both depend on the computation c0(i,j), instead of waiting for c0(i,j) to be computed and then computing c1(i,j) and c2(i,j) in parallel, the thread executing c1(i,j) can compute c0(i,j) by itself and then run c1(i,j). The thread that computes c2(i,j) can do the same and compute c0(i,j) by itself and then compute c2(i,j). In this case the two threads do not need to wait. This is done at the expense of redundant computation since c0(i,j) is computed by both threads.
In general redundant computations are useful when tiling stencil computations. In the context of stencils such a tiling is called "overlapped tiling". Tiles that depend on results computed by other tiles that run in parallel can compute the boundaries redundantly which allows them to avoid waiting and thus can run in parallel.
In Tiramisu, the user can indicate that a chunk of a computation should be computed redundantly. The original computation always has a redundancy ID equal to 0 (which means this is the original computation). The redundant chunk has an ID that is different from 0 and that is used to uniquely identify it.
For example if we want to compute all of c0 three times (that is, compute the original computation and compute two redundant computations), we can use the following schedules:
The schedule of the original computation: {c0[i,j]->c0[0,0,i,0,j,0]: 0<=i<N and 0<=j<N} The schedule of the redundant computation N°1: {c0[i,j]->c0[1,0,i,0,j,0]: 0<=i<N and 0<=j<N} The schedule of the redundant computation N°2: {c0[i,j]->c0[2,0,i,0,j,0]: 0<=i<N and 0<=j<N}
The schedule of c0 in this case would be three maps that map c0[i,j] to the three different redundant computations in the time-processor domain:
The function set_schedule() overrides any other schedule set by the high level scheduling functions. Currently the user has to choose between using the high level scheduling functions or using this low level set_schedule function. The user cannot mix the use of the two in the same program because they are not compatible.
void tiramisu::computation::set_name (const std::string &n)  [protected]
Set the name of the computation.
void tiramisu::computation::set_schedule (isl_map *map)  [protected]
Set the schedule indicated by map. map is a string that represents a mapping from the iteration domain to the time-space domain (the ISL format to represent maps is documented in http://barvinok.gforge.inria.fr/barvinok.pdf, Sec. 1.2.2).
The schedule is a map from the iteration domain to a time space domain. The same name of space should be used for both the range and the domain of the schedule.
In general, users can set the schedule using high level functions such as before(), after(), tile(), compute_at(), vectorize(), split(), ... The use of this function is only reserved for advanced users who want a low level control of the schedule.
Vectors in the time-space domain have the following form
computation_name[redundancy_ID,static,dynamic,static,dynamic,static,dynamic,static,...]
The first dimension of the vector is used to indicate the redundancy ID (the notion of the redundancy ID is explained later).
The following dimensions are interleaved dimensions: static, dynamic, static, dynamic, ... Dynamic dimensions represent the loop levels, while static dimensions are used to order statements within a given block of statements in a given loop level. For example, the computations c0 and c1 in the following loop nest
have the following representations in the iteration domain
and the following representation in the time-space domain
The first dimension (dimension 0) in the time-space domain (the leftmost dimension) is the redundancy ID (in this case it is 0, the meaning of this ID will be explained later). The second dimension (starting from the left) is a static dimension, the third dimension is a dynamic dimension that represents the loop level i, ..., the fifth dimension is a dynamic dimension that represents the loop level j and the last dimension (dimension 5) is a static dimension and allows the ordering of c1 after c0 in the loop nest.
To transform the previous iteration domain to the time-space domain, the following schedule should be used:
The first dimension, called the "redundancy ID", is only meaningful if the computation is redundant, i.e., some parts of the computation are computed redundantly. Redundant computations are in general used to maximize parallelism and data locality at the expense of doing some computations redundantly. For example, if the two computations c1(i,j) and c2(i,j) both depend on the computation c0(i,j), then instead of waiting for c0(i,j) to be computed and only then computing c1(i,j) and c2(i,j) in parallel, the thread executing c1(i,j) can compute c0(i,j) by itself and then run c1(i,j). The thread that computes c2(i,j) can do the same: compute c0(i,j) by itself and then compute c2(i,j). In this case the two threads do not need to wait for each other, at the cost of redundant computation since c0(i,j) is computed by both threads.
In general, redundant computations are useful when tiling stencil computations. In the context of stencils, such tiling is called "overlapped tiling": tiles that depend on results computed by other tiles running in parallel can compute the boundary values redundantly, which lets them avoid waiting and thus run in parallel.
In Tiramisu, the user can indicate that a chunk of a computation should be computed redundantly. The original computation always has a redundancy ID equal to 0 (which means this is the original computation). The redundant chunk has an ID that is different from 0 and that is used to uniquely identify it.
For example, if we want to compute all of c0 three times (that is, compute the original computation plus two redundant computations), we can use the following schedules:
The schedule of the original computation: {c0[i,j]->c0[0,0,i,0,j,0]: 0<=i<N and 0<=j<N}
The schedule of the redundant computation N°1: {c0[i,j]->c0[1,0,i,0,j,0]: 0<=i<N and 0<=j<N}
The schedule of the redundant computation N°2: {c0[i,j]->c0[2,0,i,0,j,0]: 0<=i<N and 0<=j<N}
The schedule of c0 in this case is the union of these three maps, each mapping c0[i,j] to a different redundant copy in the time-processor domain.
The function set_schedule() overrides any schedule set by the high-level scheduling functions. Currently, the user has to choose between the high-level scheduling functions and this low-level set_schedule() function; the two cannot be mixed in the same program because they are not compatible.
|
protected |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::set_wait_access | ( | std::string | access_str | ) |
void tiramisu::computation::set_wait_access | ( | isl_map * | access | ) |
void tiramisu::computation::shift | ( | tiramisu::var | L0, |
int | n | ||
) |
Shift the loop level L0 of the iteration space by n iterations. n can be a positive or a negative number: a positive value shifts the loop iterations forward, while a negative value shifts them backward.
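For example, a minimal sketch (the computation S0 and the loop variable i are illustrative):
S0.shift(i, 2);
shifts the i loop of S0 forward by two iterations.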
void tiramisu::computation::skew | ( | tiramisu::var | i, |
tiramisu::var | j, | ||
int | f, | ||
tiramisu::var | ni, | ||
tiramisu::var | nj | ||
) |
Apply loop skewing on the loop levels i and j with a skewing factor of f. The names of the new loop levels are ni and nj.
This command transforms the loop (i, j) into the loop (i, f*i+j); a worked example is sketched below.
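A minimal sketch (the computation S0, the bounds N and M, and the factor 2 are illustrative):
// original loop nest
for (i = 0; i < N; i++)
  for (j = 0; j < M; j++)
    S0;
// apply skewing with a factor of 2
S0.skew(i, j, 2, ni, nj);
// resulting loop nest: ni plays the role of i and nj = 2*ni + j
for (ni = 0; ni < N; ni++)
  for (nj = 2*ni; nj < 2*ni + M; nj++)
    S0;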
void tiramisu::computation::skew | ( | tiramisu::var | i, |
tiramisu::var | j, | ||
tiramisu::var | k, | ||
int | factor, | ||
tiramisu::var | ni, | ||
tiramisu::var | nj, | ||
tiramisu::var | nk | ||
) |
Apply loop skewing on the loop levels i, j and k with a skewing factor of factor. The names of the new loop levels are ni, nj and nk.
This command transforms the loop (i, j, k) into the loop (i, factor*i+j, factor*i+k).
void tiramisu::computation::skew | ( | tiramisu::var | i, |
tiramisu::var | j, | ||
tiramisu::var | k, | ||
tiramisu::var | l, | ||
int | factor, | ||
tiramisu::var | ni, | ||
tiramisu::var | nj, | ||
tiramisu::var | nk, | ||
tiramisu::var | nl | ||
) |
Apply loop skewing on the loop levels i, j, k and l with a skewing factor of factor. The names of the new loop levels are ni, nj, nk and nl.
This command transforms the loop (i, j, k, l) into the loop (i, factor*i+j, factor*i+k, factor*i+l).
void tiramisu::computation::skew | ( | tiramisu::var | i, |
tiramisu::var | j, | ||
int | factor | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::skew | ( | tiramisu::var | i, |
tiramisu::var | j, | ||
tiramisu::var | k, | ||
int | factor | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::skew | ( | tiramisu::var | i, |
tiramisu::var | j, | ||
tiramisu::var | k, | ||
tiramisu::var | l, | ||
int | factor | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::skew | ( | int | i, |
int | j, | ||
int | factor | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::skew | ( | int | i, |
int | j, | ||
int | k, | ||
int | factor | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::skew | ( | int | i, |
int | j, | ||
int | k, | ||
int | l, | ||
int | factor | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::split | ( | tiramisu::var | L0, |
int | sizeX | ||
) |
Split the loop level L0 of the iteration space into two new loop levels. sizeX is the extent (size) of the inner loop created after splitting.
void tiramisu::computation::split | ( | tiramisu::var | L0, |
int | sizeX, | ||
tiramisu::var | L0_outer, | ||
tiramisu::var | L0_inner | ||
) |
Split the loop level L0 of the iteration space into two new loop levels named L0_outer and L0_inner. sizeX is the extent (size) of the inner loop created after splitting.
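For example, a minimal sketch (assuming a computation S0 and vars i, i0 and i1 already declared):
S0.split(i, 32, i0, i1);
splits the loop level i into an outer loop i0 and an inner loop i1 of extent 32.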
void tiramisu::computation::split | ( | int | L0, |
int | sizeX | ||
) |
Identical to void split(tiramisu::var L0, int sizeX);.
void tiramisu::computation::storage_fold | ( | tiramisu::var | dim, |
int | f | ||
) |
Fold the storage of the computation. The storage along the loop level dim is folded by a factor f.
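For example, a hedged sketch (the computation S0 and the loop variable j are illustrative):
S0.storage_fold(j, 2);
folds the storage dimension that corresponds to j so that only 2 elements are kept along that dimension (accesses along it wrap around modulo 2).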
tiramisu::computation* tiramisu::computation::store_at | ( | tiramisu::computation & | comp, |
tiramisu::var | L0 | ||
) |
Allocate the storage of this computation in the loop level L0 of the computation comp. The function returns the computation (operation) that allocates the buffer; the allocated buffer itself is not returned.
void tiramisu::computation::store_in | ( | buffer * | buff | ) |
Store this computation in buff. Let us assume that we have a computation C:
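A sketch of such a computation, written as an iteration domain with an illustrative bound N:
{C[i]: 0<=i<N}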
and that we want to store each C(i) in bufC[i]. Then we can use store_in() to indicate that as follows:
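A sketch of the call, assuming bufC is a tiramisu::buffer object:
C.store_in(&bufC);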
This means that each computation C(i) will be stored in the buffer location bufC[i].
If iterators is specified, the iterators are used to specify how the computation is mapped to the buffer. If the dimensions of this computation are in0, in1, ..., inn and iterators is equal to im0, im1, ..., imm, then the computation is mapped as follows: the computation C[in0, in1, ..., inn] is stored in the buffer location bufC[im0, im1, ..., imm].
This can be used to store the data in many ways (reordering the storage, storing into modulo buffers, ...).
Assuming we have a computation D(i,j) that has the following iteration domain:
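A sketch of such an iteration domain, with illustrative bounds N and M:
{D[i,j]: 0<=i<N and 0<=j<M}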
and assuming we have a buffer bufD.
The store_in() function can be used to implement many types of data mappings:
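The sketches below are illustrative only and assume an overload of store_in() that also takes the mapped iterators (the exact signature is not shown on this page):
D.store_in(&bufD, {i, j});   // direct mapping: D(i,j) is stored in bufD[i,j]
D.store_in(&bufD, {j, i});   // transposed storage: D(i,j) is stored in bufD[j,i]
D.store_in(&bufD, {i%2, j}); // modulo (folded) storage: D(i,j) is stored in bufD[i%2,j]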
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.
void tiramisu::computation::tag_distribute_level | ( | tiramisu::var | L | ) |
Tag the loop level L
to be distributed.
void tiramisu::computation::tag_distribute_level | ( | int | L | ) |
Tag the loop level L
to be distributed.
void tiramisu::computation::tag_gpu_level | ( | tiramisu::var | L0, |
tiramisu::var | L1 | ||
) |
Tag the loop levels L0 and L1 to be mapped to GPU.
void tiramisu::computation::tag_gpu_level | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
tiramisu::var | L2, | ||
tiramisu::var | L3 | ||
) |
Tag the loop levels L0, L1, L2 and L3 to be mapped to GPU.
void tiramisu::computation::tag_gpu_level | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
tiramisu::var | L2, | ||
tiramisu::var | L3, | ||
tiramisu::var | L4, | ||
tiramisu::var | L5 | ||
) |
Tag the loop levels L0, L1, L2, L3, L4 and L5 to be mapped to GPU.
void tiramisu::computation::tag_parallel_level | ( | tiramisu::var | L | ) |
Tag the loop level L
to be parallelized.
void tiramisu::computation::tag_parallel_level | ( | int | L | ) |
Identical to void tag_parallel_level(tiramisu::var L);.
void tiramisu::computation::tag_unroll_level | ( | tiramisu::var | L | ) |
Tag the loop level L
to be unrolled.
The user can only tag loop levels that have constant extent.
void tiramisu::computation::tag_unroll_level | ( | int | L | ) |
Identical to void tag_unroll_level(tiramisu::var L);.
void tiramisu::computation::tag_vector_level | ( | tiramisu::var | L, |
int | len | ||
) |
Tag the loop level L
to be vectorized.
len
is the vector length.
The user can only tag loop levels that have constant extent. If a loop level does not have a constant extent, the user should use the vectorize() command instead, or call separate() and split() manually.
The user has to make sure that the extent of the dimension is bigger than len. Vectorizing a loop that has fewer than len iterations is not correct.
void tiramisu::computation::tag_vector_level | ( | int | L, |
int | len | ||
) |
Identical to void tag_vector_level(tiramisu::var L, int len);.
computation& tiramisu::computation::then | ( | computation & | next_computation, |
tiramisu::var | L | ||
) |
Schedule this computation to run before the computation next_computation at the loop level L, and return next_computation.
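For example, a minimal sketch (the computations S0, S1, S2 and the loop variables i and j are illustrative); because then() returns the next computation, calls can be chained:
S0.then(S1, j)
  .then(S2, i);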
Notes
L is a loop level of this computation.
void tiramisu::computation::tile | ( | tiramisu::var | L0,
tiramisu::var | L1, | ||
int | sizeX, | ||
int | sizeY | ||
) |
Tile the two loop levels L0 and L1 with rectangular tiling. sizeX and sizeY represent the tile size. L0 and L1 should be two consecutive loop levels.
void tiramisu::computation::tile | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
int | sizeX, | ||
int | sizeY, | ||
tiramisu::var | L0_outer, | ||
tiramisu::var | L1_outer, | ||
tiramisu::var | L0_inner, | ||
tiramisu::var | L1_inner | ||
) |
Tile the two loop levels L0 and L1 with rectangular tiling. sizeX and sizeY represent the tile size. L0 and L1 should be two consecutive loop levels. L0_outer, L1_outer, L0_inner and L1_inner are the names of the new dimensions created after tiling.
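For example, a minimal sketch (assuming a computation S0 and vars i, j, i0, j0, i1 and j1 already declared):
S0.tile(i, j, 32, 32, i0, j0, i1, j1);
tiles the i and j loops into 32x32 tiles; i0 and j0 iterate over the tiles while i1 and j1 iterate inside a tile.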
void tiramisu::computation::tile | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
tiramisu::var | L2, | ||
int | sizeX, | ||
int | sizeY, | ||
int | sizeZ | ||
) |
Tile the three loop levels L0, L1 and L2 with rectangular tiling. sizeX, sizeY and sizeZ represent the tile sizes. L0, L1 and L2 should be three consecutive loop levels.
void tiramisu::computation::tile | ( | tiramisu::var | L0, |
tiramisu::var | L1, | ||
tiramisu::var | L2, | ||
int | sizeX, | ||
int | sizeY, | ||
int | sizeZ, | ||
tiramisu::var | L0_outer, | ||
tiramisu::var | L1_outer, | ||
tiramisu::var | L2_outer, | ||
tiramisu::var | L0_inner, | ||
tiramisu::var | L1_inner, | ||
tiramisu::var | L2_inner | ||
) |
Tile the three loop levels L0, L1 and L2 with rectangular tiling. sizeX, sizeY and sizeZ represent the tile sizes. L0, L1 and L2 should be three consecutive loop levels. L0_outer, L1_outer, L2_outer, L0_inner, L1_inner and L2_inner are the names of the new dimensions created after tiling.
void tiramisu::computation::tile | ( | int | L0, |
int | L1, | ||
int | sizeX, | ||
int | sizeY | ||
) |
Tile the two loop levels L0 and L1 with rectangular tiling. sizeX and sizeY represent the tile size. L0 and L1 should be two consecutive loop levels, with L1 the loop level immediately inside L0 (i.e., L1 = L0 + 1).
void tiramisu::computation::tile | ( | int | L0, |
int | L1, | ||
int | L2, | ||
int | sizeX, | ||
int | sizeY, | ||
int | sizeZ | ||
) |
Tile the three loop levels L0, L1 and L2 with rectangular tiling. sizeX, sizeY and sizeZ represent the tile sizes. L0, L1 and L2 should be consecutive loop levels (i.e., L1 = L0 + 1 and L2 = L1 + 1).
void tiramisu::computation::unroll | ( | tiramisu::var | L, |
int | fac | ||
) |
Unroll the loop level L with an unrolling factor fac.
The difference between this function and tag_unroll_level() is that this function first separates the iteration domain into full and partial iteration domains for unrolling, and then calls tag_unroll_level(). tag_unroll_level() only tags a dimension to be unrolled; it does not modify the tagged dimension.
This function separates the iteration domain into two iteration domains: a full iteration domain, whose upper bound is a multiple of fac, and a partial iteration domain, whose upper bound is not. The full iteration domain is then split by fac and the inner loop (which has a constant extent equal to fac) is tagged as an unrolled loop.
Let us assume the following loop (a loop represents an iteration domain)
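A sketch of such an iteration domain, with illustrative bounds N and M:
{S0[i,j]: 0<=i<N and 0<=j<M}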
To unroll the j loop with an unrolling factor of 4, one should call
S0.unroll(j, 4);
The loop (iteration domain) is first separated into the following two loops
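A sketch of the separated iteration domains, assuming the domain above:
{S0[i,j]: 0<=i<N and 0<=j<4*floor(M/4)}   (full part: the extent of j is a multiple of 4)
{S0[i,j]: 0<=i<N and 4*floor(M/4)<=j<M}   (partial part: the remaining iterations)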
The full loop is then split by 4
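A sketch of the full part after splitting j by 4 (the names i1 and i2 are illustrative and match the text below):
{S0[i,i1,i2]: 0<=i<N and 0<=i1<floor(M/4) and 0<=i2<4}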
the i2 loop is then tagged to be unrolled.
L_outer and L_inner are the names of the new loops created after splitting; L_outer is the outer loop. If not provided, default names will be assigned.
void tiramisu::computation::unroll | ( | tiramisu::var | L, |
int | fac, | ||
tiramisu::var | L_outer, | ||
tiramisu::var | L_inner | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts: L_outer and L_inner are the names of the new loops created after splitting, with L_outer the outer loop.
void tiramisu::computation::unschedule_this_computation | ( | ) |
Do not schedule a previously scheduled computation.
void tiramisu::computation::vectorize | ( | tiramisu::var | L, |
int | v | ||
) |
Vectorize the loop level L using the vector length v.
The difference between this function and tag_vector_level() is that this function first prepares the iteration domain for vectorization and then calls tag_vector_level(). tag_vector_level() only tags a dimension to be vectorized; it does not change the tagged dimension.
This function separates the iteration domain into two iteration domains: a full iteration domain, whose upper bound is a multiple of v, and a partial iteration domain, whose upper bound is not. The full iteration domain is then split by v and the inner loop (which has a constant extent equal to v) is tagged as a vector loop.
Let us assume the following loop (a loop represents an iteration domain)
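A sketch of such an iteration domain, with illustrative bounds N and M:
{S0[i,j]: 0<=i<N and 0<=j<M}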
To vectorize the j loop with a vector length 4, one should call
S0.vectorize(j, 4);
The loop (iteration domain) is first separated into the following two loops
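A sketch of the separated iteration domains, assuming the domain above:
{S0[i,j]: 0<=i<N and 0<=j<4*floor(M/4)}   (full part: the extent of j is a multiple of 4)
{S0[i,j]: 0<=i<N and 4*floor(M/4)<=j<M}   (partial part: the remaining iterations)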
The full loop is then split by 4
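A sketch of the full part after splitting j by 4 (the names i1 and i2 are illustrative and match the text below):
{S0[i,i1,i2]: 0<=i<N and 0<=i1<floor(M/4) and 0<=i2<4}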
the i2 loop is then tagged to be vectorized.
The user has to make sure that the extent of the dimension is bigger than v. Vectorizing a loop that has fewer than v iterations is not correct.
The names of the new loop iterators created after vectorization are L_outer and L_inner. If not provided, default names are assigned.
void tiramisu::computation::vectorize | ( | tiramisu::var | L, |
int | v, | ||
tiramisu::var | L_outer, | ||
tiramisu::var | L_inner | ||
) |
This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts: L_outer and L_inner are the names of the new loop iterators created after vectorization.
|
protected |
|
protected |
|
static |
root_dimension is a number used to specify the dimension level known as root.
The root dimension level is the outermost level: it is the level outside any loop nest. Loop level 0 is the level of the first (outermost) loop, loop level 1 is the level of the following inner loop, and so on.
Where is this number used?
These numbers are used in the helper functions used for scheduling (such as after(), before(), ...). For example, c0.after(c1) indicates that the computation c0 should be executed after the computation c1. Since the two computations c0 and c1 are usually nested in a loop, we need to specify at which loop level c0 is after c1. This is where we need to specify the loop level numbers. Here is an example. Suppose that the two computations c0 and c1 have the following iteration domains {c0[i,j]: 0<=i<N and 0<=j<N} and {c1[i,j]: 0<=i<N and 0<=j<N}.
When code is generated for the two computations, two loop nests are generated. When scheduling c0 after c1 using the after function, the user can choose one among three possibilities in specifying at which level c0 is after c1.
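A sketch of the three calls (using the after() overload that takes a loop level; the constant used for the root level follows the root_dimension member documented here):
c0.after(c1, computation::root_dimension);   // c0 is after c1 at the root level: the two loop nests run one after the other
c0.after(c1, 0);                             // c0 is after c1 starting from loop level 0
c0.after(c1, 1);                             // c0 is after c1 starting from loop level 1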
c0.after(c1, 0) means that c0 is after c1 starting from loop level 0 (before loop level 0, c0 and c1 keep the same order).
c0.after(c1, 1) means that c0 is after c1 starting from loop level 1 (before loop level 1, c0 and c1 keep the same order).
|
static |
Equivalent of computation::root, but to be used with scheduling functions that take loop levels (integers) as input instead of tiramisu::var.
|
protected |