mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-25 12:12:47 +01:00
5b7208da36
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and num_teams values are constant, these calls can be folded to that constant. This patch uses the existing `AAFoldRuntimeCall` together with the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels and folds the calls only if their attributes match. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390. Reviewed By: jdoerfert, JonChesterfield. Differential Revision: https://reviews.llvm.org/D106033 |
||
---|---|---|
add_attributes_amdgcn.ll | ||
add_attributes.ll | ||
attributor_module_slice_reproducer.ll | ||
custom_state_machines_remarks.ll | ||
custom_state_machines.ll | ||
dead_use.ll | ||
deduplication_remarks.ll | ||
deduplication_target.ll | ||
deduplication.ll | ||
fold_generic_main_thread.ll | ||
get_hardware_num_threads_in_block_fold.ll | ||
globalization_remarks.ll | ||
gpu_kernel_detection_remarks.ll | ||
gpu_state_machine_function_ptr_replacement.ll | ||
hide_mem_transfer_latency.ll | ||
icv_remarks.ll | ||
icv_tracking.ll | ||
is_spmd_exec_mode_fold.ll | ||
parallel_deletion_cg_update.ll | ||
parallel_deletion_remarks.ll | ||
parallel_deletion.ll | ||
parallel_level_fold.ll | ||
parallel_region_merging.ll | ||
remove_globalization.ll | ||
replace_globalization.ll | ||
rtf_type_checking.ll | ||
single_threaded_execution.ll | ||
spmdization_remarks.ll | ||
spmdization.ll | ||
values_in_offload_arrays.ll |
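
The folding rule described in the commit message (fold only when every kernel carries the same attribute value) can be illustrated with a small toy model. This is a sketch under stated assumptions, not the code from the patch: the function name `fold_kernel_attribute` and the dictionary representation of kernels are hypothetical, and only the attribute names `NumThreads` and `NumTeams` come from the commit message.

```python
from typing import Mapping, Optional, Sequence


def fold_kernel_attribute(
    kernels: Sequence[Mapping[str, int]], attr: str
) -> Optional[int]:
    """Return the constant that all kernels agree on for `attr`.

    Hypothetical model: each kernel is a mapping from attribute name to a
    constant value. Returns None when folding is not possible, i.e. when
    there are no kernels, a kernel lacks the attribute, or two kernels
    disagree on its value.
    """
    folded: Optional[int] = None
    for kernel in kernels:
        value = kernel.get(attr)
        if value is None:
            # Attribute missing on this kernel: the value is not known
            # to be constant, so give up.
            return None
        if folded is not None and value != folded:
            # Kernels disagree: no single constant is valid for all callers.
            return None
        folded = value
    return folded
```

For example, two kernels that both set `NumThreads` to 128 let a call to `__kmpc_get_hardware_num_threads_in_block` be modelled as the constant 128, while a mismatch or a missing attribute leaves the call unfolded (`None` in this model).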