1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

217407 Commits

Author SHA1 Message Date
LLVM GN Syncbot
75f1abb9b8 [gn build] Port 80fd5fa5269c 2021-06-21 06:23:08 +00:00
hsmahesha
37c462f96a [AMDGPU] Replace non-kernel function uses of LDS globals by pointers.
The main motivation behind pointer replacement of LDS use within non-kernel
functions is - to *avoid* subsequent LDS lowering pass from directly packing
LDS (assume large LDS) into a struct type which would otherwise cause allocating
huge memory for struct instance within every kernel.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103225
2021-06-21 11:51:49 +05:30
Max Kazantsev
0317b20934 [Test] Add some tests showing room for optimization exploiting undef and UB 2021-06-21 13:11:46 +07:00
Esme-Yi
d5a215bffc [yaml2obj] Add support for writing the long symbol name.
Summary: This patch, as a follow-up of D95505, adds
support for writing the long symbol name by implementing
the StringTable. Only XCOFF32 is suppoted now.

Reviewed By: jhenderson, shchenz

Differential Revision: https://reviews.llvm.org/D103455
2021-06-21 05:09:56 +00:00
Max Kazantsev
633abb5259 [LoopDeletion] Handle Phis with similar inputs from different blocks
This patch lifts the requirement to have the only incoming live block
for Phis. There can be multiple live blocks if the same value comes to
phi from all of them.

Differential Revision: https://reviews.llvm.org/D103959
Reviewed By: nikic, lebedev.ri
2021-06-21 11:37:06 +07:00
Juneyoung Lee
1652bab657 [InstCombine] Use poison constant to represent the result of unreachable instrs
This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.

This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104602
2021-06-21 09:58:44 +09:00
Eli Friedman
d624c63575 [NFC][ScalarEvolution] Clean up ExitLimit constructors.
Make all the constructors forward to one constructor.  Remove redundant
assertions.
2021-06-20 17:40:30 -07:00
Jim Lin
191d405aea [IVDescriptors] Fix comment that getUnsafeAlgebraInst has been renamed to getExactFPMathInst
https://reviews.llvm.org/rG36a489d194750dc888f214240e9dec9122ca1f0e renamed the function call
in the test from getUnsafeAlgebraInst to getExactFPMathInst.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D104441
2021-06-21 07:56:22 +08:00
Dmitri Gribenko
71570a2380 [GCOVProfiling][test] Ensure that 'opt' drops any files in a temp directory 2021-06-20 22:48:35 +02:00
Craig Topper
0dab191ced [TypePromotion] Prune Intrinsic includes. NFC
TypePromotion is meant to be a generic pass and doesn't reference
any ARM intrinsics so it shouldn't include IntrinsicsARM.h.
The other Intrinsic related headers appear to be unneeded as well.
2021-06-20 13:04:02 -07:00
Nikita Popov
45f41f7ab1 [LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.

The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.

Differential Revision: https://reviews.llvm.org/D102982
2021-06-20 20:58:26 +02:00
Fangrui Song
74f848b886 Fix -Wunused-variable and -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC 2021-06-20 11:09:07 -07:00
David Green
51d7c2c19b [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-06-20 17:03:30 +01:00
Sanjay Patel
245d1ca508 [InstCombine] fold ctpop-of-select with 1 or more constant arms
The general pattern is mentioned in:
https://llvm.org/PR50140
...but we need to do a bit more to handle intrinsics with extra operands
like ctlz/cttz.
2021-06-20 11:28:45 -04:00
Sanjay Patel
f90ab103b5 [InstCombine] avoid infinite loops with select folds of constant expressions
This pair of transforms was added recently with:
8591640379ac9175a

And could lead to conflicting folds:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399
2021-06-20 09:46:25 -04:00
Roman Lebedev
48ca532a9e [NFC][AArch64][ARM][Thumb][Hexagon] Autogenerate some tests
These all (and some others) are being affected by D104597,
but they are manually-written, which rather complicates
checking the effect that change has on them.
2021-06-20 14:12:45 +03:00
Roman Lebedev
9acde4873f [UpdateTestUtils] Print test filename when complaining about conflicting prefix
Now that FileCheck eagerly complains when prefixes are unused,
the update script does the same, and  is becoming very common
to need to drop some prefixes, yet figuring out the file
it complains about isn't obvious unless it actually tells us.
2021-06-20 14:12:39 +03:00
Roman Lebedev
21a466ca4d [SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has it's address taken
Same as with HoistThenElseCodeToIf() (ad87761925c2790aab272138b5bbbde4a93e0383).
2021-06-20 12:37:14 +03:00
Roman Lebedev
9120098319 [SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has it's address taken
This problem is exposed by D104598, after it tail-merges `ret` in
`@test_inline_constraint_S_label`, the verifier would start complaining
`invalid operand for inline asm constraint 'S'`.

Essentially, taking address of a block is mismodelled in IR.
It should probably be an explicit instruction, a first one in block,
that isn't identical to any other instruction of the same type,
so that it can't be hoisted.
2021-06-20 12:18:15 +03:00
Juneyoung Lee
b719c3f4a5 [InstSimplify] icmp poison, X -> poison
This adds a simple transformation from icmp with poison constant to poison.
Comparing poison with something else is poison, so this is okay.

https://alive2.llvm.org/ce/z/e8iReb
https://alive2.llvm.org/ce/z/q4MurY
2021-06-20 15:39:07 +09:00
Fangrui Song
9e8233e08c [llvm-cov gcov] Support GCC 12 format
GCC 12 will change the length field to represent the number of bytes instead of
32-bit words. This avoids padding for strings.
2021-06-19 22:51:20 -07:00
Fangrui Song
f02bea7812 [llvm-cov gcov] Change case to match the prevailing style && replace getString with readString 2021-06-19 22:50:52 -07:00
Fangrui Song
8a7045847f [test] Fix nocompress.test 2021-06-19 16:27:53 -07:00
Fangrui Song
2fe891f533 [llvm-profdata] Make diagnostics consistent with the (no capitalization, no period) style
The format is currently inconsistent. Use the https://llvm.org/docs/CodingStandards.html#error-and-warning-messages style.

And add `error:` or `warning:` to CHECK lines wherever appropriate.
2021-06-19 14:54:25 -07:00
Fangrui Song
e2a2115bff [llvm-profdata] Delete unneeded empty output filename check 2021-06-19 12:20:45 -07:00
Craig Topper
61a08a19b8 [RISCV] Prevent formation of shXadd(.uw) and add.uw if it prevents the use of addi.
If the outer add has an simm12 immediate operand we should prefer
it instead of materializing it in a register. This would guarantee
and extra instruction and temporary register. Since we don't check
one use on the shl or zext we might generate more instructions if
there is an additional user.
2021-06-19 12:10:42 -07:00
Roman Lebedev
a8e6eca719 [NFC] AMD Zen 3: fix typo in a comment 2021-06-19 22:05:17 +03:00
Fangrui Song
2beef4520d Simplify some typedef struct 2021-06-19 11:36:44 -07:00
Nico Weber
a20aff32da [gn build] (manually) port b9c05aff205b (MIRTests) 2021-06-19 13:04:09 -04:00
Michael Liao
d21f701c76 [MIRPrinter] Add machine metadata support.
- Distinct metadata needs generating in the codegen to attach correct
  AAInfo on the loads/stores after lowering, merging, and other relevant
  transformations.
- This patch adds 'MachhineModuleSlotTracker' to help assign slot
  numbers to these newly generated unnamed metadata nodes.
- To help 'MachhineModuleSlotTracker' track machine metadata, the
  original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
  which provides basic interfaces to create/retrive metadata slots. In
  addition, once LLVM IR is processsed, additional hooks are also
  introduced to help collect machine metadata and assign them slot
  numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
  an additional 'machineMetadataNodes' field containing all the
  definition of those nodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103205
2021-06-19 12:48:08 -04:00
Michael Liao
be5f17eb4b [amdgpu] Improve the from f32 to i64.
- Take the same principle as the conversion from f64 to i64 with extra
  necessary pre- and post-processing. It helps to reduce that conversion
  sequence by half compared to legacy one.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D104427
2021-06-19 12:46:48 -04:00
Sanjay Patel
649e66f4eb [InstCombine][test] add tests for select-of-bit-manip; NFC 2021-06-19 12:34:32 -04:00
Tomas Matheson
e797f6f6ee Allow building for release with EXPENSIVE_CHECKS
D97225 moved LazyCallGraph verify() calls behind EXPENSIVE_CHECKS,
but verity() is defined for debug builds only so this had the unintended
effect of breaking release builds with EXPENSIVE_CHECKS.

Fix by enabling verify() for both debug and EXPENSIVE_CHECKS.

Differential Revision: https://reviews.llvm.org/D104514
2021-06-19 17:02:11 +01:00
Tomas Matheson
48473628b0 [ARM][NFC] Tidy up subtarget frame pointer routines
getFramePointerReg only depends on information in ARMSubtarget,
so move it in there so it can be accessed from more places.

Make use of ARMSubtarget::getFramePointerReg to remove duplicated code.

The main use of useR7AsFramePointer is getFramePointerReg, so inline it.

Differential Revision: https://reviews.llvm.org/D104476
2021-06-19 17:00:45 +01:00
LLVM GN Syncbot
db683fa2a1 [gn build] Port 134723edd5bf 2021-06-19 11:49:56 +00:00
Nikita Popov
272334cfc0 [LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.

Differential Revision: https://reviews.llvm.org/D104487
2021-06-19 09:25:57 +02:00
Ben Shi
45e207d211 [RISCV] Optimize add-mul in the zba extension with SH*ADD
This patch does the following optimization.

Rx + Ry * 18 => (SH1ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 20 => (SH2ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 24 => (SH3ADD (SH1ADD Rx, Rx), Ry)
Rx + Ry * 36 => (SH2ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 40 => (SH3ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 72 => (SH3ADD (SH3ADD Rx, Rx), Ry)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104588
2021-06-19 14:33:27 +08:00
Ben Shi
e7b41c5208 [RISCV][test] Add new tests for add-mul optimization in the zba extension with SH*ADD
These tests will show the following optimization by future patches.

Rx + Ry * 18  => (SH1ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 20  => (SH2ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 24  => (SH3ADD (SH1ADD Rx, Rx), Ry)
Rx + Ry * 36  => (SH2ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 40  => (SH3ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 72  => (SH3ADD (SH3ADD Rx, Rx), Ry)
Rx * (3 << C) => (SLLI (SH1ADD Rx, Rx), C)
Rx * (5 << C) => (SLLI (SH2ADD Rx, Rx), C)
Rx * (9 << C) => (SLLI (SH3ADD Rx, Rx), C)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104507
2021-06-19 14:31:01 +08:00
Lang Hames
7afd72c69b [ORC][examples] Add missing library dependence 2021-06-19 14:48:34 +10:00
Liqiang Tao
d5e843fe26 [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-19 10:17:32 +08:00
Lang Hames
b6ca0c60bb [ORC][C-bindings] Add access to LLJIT IRTransformLayer, ThreadSafeModule utils.
This patch was derived from Valentin Churavy's work in
https://reviews.llvm.org/D104480. It adds support for setting the transform on
an IRTransformLayer, and for accessing the IRTransformLayer in LLJIT. It also
adds access to the ThreadSafeModule::withModuleDo method for thread-safe
access to modules.

A new example has been added to show how to use these APIs to optimize a module
during materialization.

Thanks Valentin!

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D103855
2021-06-19 11:50:27 +10:00
Lang Hames
320507db5e [ORC][examples] Fix file name in comment. 2021-06-19 11:50:26 +10:00
Guozhi Wei
d8a557bfc0 [InstCombine] Don't transform code if DoTransform is false
In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case.

Differential Revision: https://reviews.llvm.org/D104567
2021-06-18 18:01:34 -07:00
Fangrui Song
e71d1a723a [InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:

* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code

The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.

Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-18 17:01:17 -07:00
Matt Arsenault
6c04830c65 AMDGPU: Fix infinite loop in DAG combine with fneg + fma
We were not reporting isFNegFree for v2f32, although it is effectively
free after legalization. The generic combine was pulling fneg out of
the fma source operands, and the AMDGPU combine was doing the
opposite.
2021-06-18 19:09:03 -04:00
Matt Arsenault
bf6259aa0b AMDGPU: Fix assert on m0_lo16/m0_hi16
These get added (redundantly) to the bundle expanded for indirect
register accesses. We hit this path only when there is a call in the
function.
2021-06-18 18:48:53 -04:00
Hongtao Yu
7fbb587058 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Nikita Popov
7afbecd70d [LoopUnroll] Simplify optimization remarks
Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.

Differential Revision: https://reviews.llvm.org/D104482
2021-06-18 23:47:03 +02:00
Hongtao Yu
d9e9fe620c [CSSPGO][llvm-profgen] Fix an issue in findDisjointRanges
We were using 0 as an indicator of invalid offset when computing disjoint ranges. In reality, 0 can be an valid code offset which stands for the first function in .text section. I'm using UINT64_MAX as an invalid code offset instead.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104497
2021-06-18 14:38:48 -07:00
Nick Desaulniers
de2155141d [GCOVProfiling] don't profile Fn's w/ noprofile attribute
Similar to D104475, the Linux kernel would like to avoid compiler
generated code in certain functions. The no_profile function
attribute can be used in C to generate the the noprofile fn attr in IR.
Respect that from GCOVProfiling.

Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104257
2021-06-18 13:58:34 -07:00