#1
Support load permutation in loop-aware SLP
Hi,

The current loop-aware SLP scheme starts from a group of adjacent stores and follows use-def chains until it reaches a group of loads. The loads must be adjacent and their order must match the order of the stores, i.e., no permutations are currently allowed. This patch adds support for a specific type of load permutation, along with general support for load permutations in SLP. It aims to vectorize RGB to YUV conversion, which can be viewed as {y, u, v} = M * {r, g, b}, where M is a matrix of constant coefficients and the calculation is performed in a single-nested loop:

    for i:
      y_i = M00 * r_i + M01 * g_i + M02 * b_i
      u_i = M10 * r_i + M11 * g_i + M12 * b_i
      v_i = M20 * r_i + M21 * g_i + M22 * b_i

The required permutation of loads transforms the rgb stream into {r,r,r}, {g,g,g} and {b,b,b} vectors (ignoring vector size for simplicity). The SLP analysis detects such cases: all the loads in the same SLP node must access the same memory location, and all the SLP nodes that contain loads must together form a group of adjacent memory accesses. The transformation phase generates vector permutations of the input vectors with compiler-generated masks, which depend on the data type, the vectorization factor and the size of the SLP nodes.

Bootstrapped with vectorization enabled on ppc-linux and tested on Cell SPU and ppc-linux.
OK for mainline?

Thanks,
Ira

ChangeLog:

	* target.h (struct vectorize): Add new target builtin.
	* tree-vectorizer.h (enum slp_load_perm_type): New.
	(struct _slp_tree): Add new field loads_perm_type.
	(struct _slp_instance): Add new field same_perm_nodes.
	(SLP_INSTANCE_SAME_PERM_NODES): New.
	(SLP_TREE_LOADS_PERM_TYPE, TARG_VEC_PERMUTE_COST): New.
	(vectorizable_load): Add argument.
	(vect_transform_slp_perm_load): New.
	* tree-vect-analyze.c (vect_analyze_operations): Add an argument to
	vectorizable_load.
	(vect_build_slp_tree): Add new argument. Allow load permutations for
	the case when all the loads in the same SLP node access the same
	memory location.
	(vect_analyze_slp_instance): In case of same-location loads, check that
	the loads from different nodes form an interleaving chain. Sort the
	nodes according to the chain.
	* target-def.h (): New.
	* tree-vect-transform.c (vect_transform_stmt): Add new argument.
	(vectorizable_store): Allow the number of created vectors to be greater
	than the size of an interleaving group. Don't go along the
	interleaving chain for SLP.
	(vect_create_mask_and_perm): New function.
	(vect_get_mask_element, vect_transform_slp_perm_load): Likewise.
	(vectorizable_load): Allocate DR_CHAIN according to the number of
	generated vectors. Don't keep the created vector statements in the
	node if permutation is required. Call vect_transform_slp_perm_load to
	generate the permutation.
	(vect_transform_stmt): Add new argument. Call vectorizable_load with
	additional argument. Don't wait for other stores in case of SLP.
	(vect_schedule_slp_instance): Add new argument. Calculate the number
	of vector statements. In case of loads from the same location,
	allocate the vectorized statements structure for all the related SLP
	nodes. Call vect_transform_stmt with additional argument.
	(vect_schedule_slp): Remove one argument. Move the calculation of the
	number of vector statements to vect_schedule_slp_instance.
	(vect_transform_loop): Call vect_transform_stmt and vect_schedule_slp
	with correct arguments.
	* config/spu/spu.c (spu_builtin_vec_perm): New.
	(): Redefine.
	* config/spu/spu.h (TARG_VEC_PERMUTE_COST): Define.
	* config/rs6000/rs6000.c (rs6000_builtin_vec_perm): New.
	(): Redefine.

testsuite/ChangeLog:

	* lib/target-supports.exp (): New.
	* gcc.dg/vect/slp-perm-1.c: New testcase.
	* gcc.dg/vect/slp-perm-2.c: Likewise.
	* gcc.dg/vect/slp-perm-3.c: Likewise.
	* gcc.dg/vect/slp-perm-4.c: Likewise.
	* gcc.dg/vect/slp-perm-5.c: Likewise.
	* gcc.dg/vect/slp-perm-6.c: Likewise.
	* gcc.dg/vect/slp-perm-7.c: Likewise.
	* gcc.dg/vect/slp-perm-8.c: Likewise.
	* gcc.dg/vect/slp-perm-9.c: Likewise.
(See attached file: slp-perm.txt) (See attached file: tests.txt)
#2
> * config/rs6000/rs6000.c (rs6000_builtin_vec_perm): New.
> (): Redefine.

The rs6000 part of the patch is okay.

Thanks,
David