1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 10:32:48 +02:00
llvm-mirror/include
Jose M Monsalve Diaz 5b7208da36 [OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033
2021-07-27 21:47:12 -04:00
..
llvm [OpenMP] Folding threadLimit and numThreads when single value in kernels 2021-07-27 21:47:12 -04:00
llvm-c [IR] Rename comdat noduplicates to comdat nodeduplicate 2021-07-20 12:47:10 -07:00