1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

217651 Commits

Author SHA1 Message Date
Eli Friedman
d624c63575 [NFC][ScalarEvolution] Clean up ExitLimit constructors.
Make all the constructors forward to one constructor.  Remove redundant
assertions.
2021-06-20 17:40:30 -07:00
Jim Lin
191d405aea [IVDescriptors] Fix comment that getUnsafeAlgebraInst has been renamed to getExactFPMathInst
https://reviews.llvm.org/rG36a489d194750dc888f214240e9dec9122ca1f0e renamed the function call
in the test from getUnsafeAlgebraInst to getExactFPMathInst.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D104441
2021-06-21 07:56:22 +08:00
Dmitri Gribenko
71570a2380 [GCOVProfiling][test] Ensure that 'opt' drops any files in a temp directory 2021-06-20 22:48:35 +02:00
Craig Topper
0dab191ced [TypePromotion] Prune Intrinsic includes. NFC
TypePromotion is meant to be a generic pass and doesn't reference
any ARM intrinsics so it shouldn't include IntrinsicsARM.h.
The other Intrinsic related headers appear to be unneeded as well.
2021-06-20 13:04:02 -07:00
Nikita Popov
45f41f7ab1 [LoopUnroll] Use smallest exact trip count from any exit
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.

The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.

Differential Revision: https://reviews.llvm.org/D102982
2021-06-20 20:58:26 +02:00
Fangrui Song
74f848b886 Fix -Wunused-variable and -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC 2021-06-20 11:09:07 -07:00
David Green
51d7c2c19b [DSE] Remove stores in the same loop iteration
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.

The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];

After LICM, this becomes:
for(i=..)
  dst[i] = 0;
  sum = 0;
  for(j=..)
    sum += src[i*n+j];
  dst[i] = sum;

The first store is dead, and with this patch is now removed.

Differntial Revision: https://reviews.llvm.org/D100464
2021-06-20 17:03:30 +01:00
Sanjay Patel
245d1ca508 [InstCombine] fold ctpop-of-select with 1 or more constant arms
The general pattern is mentioned in:
https://llvm.org/PR50140
...but we need to do a bit more to handle intrinsics with extra operands
like ctlz/cttz.
2021-06-20 11:28:45 -04:00
Sanjay Patel
f90ab103b5 [InstCombine] avoid infinite loops with select folds of constant expressions
This pair of transforms was added recently with:
8591640379ac9175a

And could lead to conflicting folds:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35399
2021-06-20 09:46:25 -04:00
Roman Lebedev
48ca532a9e [NFC][AArch64][ARM][Thumb][Hexagon] Autogenerate some tests
These all (and some others) are being affected by D104597,
but they are manually-written, which rather complicates
checking the effect that change has on them.
2021-06-20 14:12:45 +03:00
Roman Lebedev
9acde4873f [UpdateTestUtils] Print test filename when complaining about conflicting prefix
Now that FileCheck eagerly complains when prefixes are unused,
the update script does the same, and  is becoming very common
to need to drop some prefixes, yet figuring out the file
it complains about isn't obvious unless it actually tells us.
2021-06-20 14:12:39 +03:00
Roman Lebedev
21a466ca4d [SimplifyCFG] FoldTwoEntryPHINode(): don't fold if either block has it's address taken
Same as with HoistThenElseCodeToIf() (ad87761925c2790aab272138b5bbbde4a93e0383).
2021-06-20 12:37:14 +03:00
Roman Lebedev
9120098319 [SimplifyCFG] HoistThenElseCodeToIf(): don't hoist if either block has it's address taken
This problem is exposed by D104598, after it tail-merges `ret` in
`@test_inline_constraint_S_label`, the verifier would start complaining
`invalid operand for inline asm constraint 'S'`.

Essentially, taking address of a block is mismodelled in IR.
It should probably be an explicit instruction, a first one in block,
that isn't identical to any other instruction of the same type,
so that it can't be hoisted.
2021-06-20 12:18:15 +03:00
Juneyoung Lee
b719c3f4a5 [InstSimplify] icmp poison, X -> poison
This adds a simple transformation from icmp with poison constant to poison.
Comparing poison with something else is poison, so this is okay.

https://alive2.llvm.org/ce/z/e8iReb
https://alive2.llvm.org/ce/z/q4MurY
2021-06-20 15:39:07 +09:00
Fangrui Song
9e8233e08c [llvm-cov gcov] Support GCC 12 format
GCC 12 will change the length field to represent the number of bytes instead of
32-bit words. This avoids padding for strings.
2021-06-19 22:51:20 -07:00
Fangrui Song
f02bea7812 [llvm-cov gcov] Change case to match the prevailing style && replace getString with readString 2021-06-19 22:50:52 -07:00
Fangrui Song
8a7045847f [test] Fix nocompress.test 2021-06-19 16:27:53 -07:00
Fangrui Song
2fe891f533 [llvm-profdata] Make diagnostics consistent with the (no capitalization, no period) style
The format is currently inconsistent. Use the https://llvm.org/docs/CodingStandards.html#error-and-warning-messages style.

And add `error:` or `warning:` to CHECK lines wherever appropriate.
2021-06-19 14:54:25 -07:00
Fangrui Song
e2a2115bff [llvm-profdata] Delete unneeded empty output filename check 2021-06-19 12:20:45 -07:00
Craig Topper
61a08a19b8 [RISCV] Prevent formation of shXadd(.uw) and add.uw if it prevents the use of addi.
If the outer add has an simm12 immediate operand we should prefer
it instead of materializing it in a register. This would guarantee
and extra instruction and temporary register. Since we don't check
one use on the shl or zext we might generate more instructions if
there is an additional user.
2021-06-19 12:10:42 -07:00
Roman Lebedev
a8e6eca719 [NFC] AMD Zen 3: fix typo in a comment 2021-06-19 22:05:17 +03:00
Fangrui Song
2beef4520d Simplify some typedef struct 2021-06-19 11:36:44 -07:00
Nico Weber
a20aff32da [gn build] (manually) port b9c05aff205b (MIRTests) 2021-06-19 13:04:09 -04:00
Michael Liao
d21f701c76 [MIRPrinter] Add machine metadata support.
- Distinct metadata needs generating in the codegen to attach correct
  AAInfo on the loads/stores after lowering, merging, and other relevant
  transformations.
- This patch adds 'MachhineModuleSlotTracker' to help assign slot
  numbers to these newly generated unnamed metadata nodes.
- To help 'MachhineModuleSlotTracker' track machine metadata, the
  original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
  which provides basic interfaces to create/retrive metadata slots. In
  addition, once LLVM IR is processsed, additional hooks are also
  introduced to help collect machine metadata and assign them slot
  numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
  an additional 'machineMetadataNodes' field containing all the
  definition of those nodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103205
2021-06-19 12:48:08 -04:00
Michael Liao
be5f17eb4b [amdgpu] Improve the from f32 to i64.
- Take the same principle as the conversion from f64 to i64 with extra
  necessary pre- and post-processing. It helps to reduce that conversion
  sequence by half compared to legacy one.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D104427
2021-06-19 12:46:48 -04:00
Sanjay Patel
649e66f4eb [InstCombine][test] add tests for select-of-bit-manip; NFC 2021-06-19 12:34:32 -04:00
Tomas Matheson
e797f6f6ee Allow building for release with EXPENSIVE_CHECKS
D97225 moved LazyCallGraph verify() calls behind EXPENSIVE_CHECKS,
but verity() is defined for debug builds only so this had the unintended
effect of breaking release builds with EXPENSIVE_CHECKS.

Fix by enabling verify() for both debug and EXPENSIVE_CHECKS.

Differential Revision: https://reviews.llvm.org/D104514
2021-06-19 17:02:11 +01:00
Tomas Matheson
48473628b0 [ARM][NFC] Tidy up subtarget frame pointer routines
getFramePointerReg only depends on information in ARMSubtarget,
so move it in there so it can be accessed from more places.

Make use of ARMSubtarget::getFramePointerReg to remove duplicated code.

The main use of useR7AsFramePointer is getFramePointerReg, so inline it.

Differential Revision: https://reviews.llvm.org/D104476
2021-06-19 17:00:45 +01:00
LLVM GN Syncbot
db683fa2a1 [gn build] Port 134723edd5bf 2021-06-19 11:49:56 +00:00
Nikita Popov
272334cfc0 [LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.

Differential Revision: https://reviews.llvm.org/D104487
2021-06-19 09:25:57 +02:00
Ben Shi
45e207d211 [RISCV] Optimize add-mul in the zba extension with SH*ADD
This patch does the following optimization.

Rx + Ry * 18 => (SH1ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 20 => (SH2ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 24 => (SH3ADD (SH1ADD Rx, Rx), Ry)
Rx + Ry * 36 => (SH2ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 40 => (SH3ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 72 => (SH3ADD (SH3ADD Rx, Rx), Ry)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104588
2021-06-19 14:33:27 +08:00
Ben Shi
e7b41c5208 [RISCV][test] Add new tests for add-mul optimization in the zba extension with SH*ADD
These tests will show the following optimization by future patches.

Rx + Ry * 18  => (SH1ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 20  => (SH2ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 24  => (SH3ADD (SH1ADD Rx, Rx), Ry)
Rx + Ry * 36  => (SH2ADD (SH3ADD Rx, Rx), Ry)
Rx + Ry * 40  => (SH3ADD (SH2ADD Rx, Rx), Ry)
Rx + Ry * 72  => (SH3ADD (SH3ADD Rx, Rx), Ry)
Rx * (3 << C) => (SLLI (SH1ADD Rx, Rx), C)
Rx * (5 << C) => (SLLI (SH2ADD Rx, Rx), C)
Rx * (9 << C) => (SLLI (SH3ADD Rx, Rx), C)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104507
2021-06-19 14:31:01 +08:00
Lang Hames
7afd72c69b [ORC][examples] Add missing library dependence 2021-06-19 14:48:34 +10:00
Liqiang Tao
d5e843fe26 [llvm][Inliner] Add an optional PriorityInlineOrder
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D104028
2021-06-19 10:17:32 +08:00
Lang Hames
b6ca0c60bb [ORC][C-bindings] Add access to LLJIT IRTransformLayer, ThreadSafeModule utils.
This patch was derived from Valentin Churavy's work in
https://reviews.llvm.org/D104480. It adds support for setting the transform on
an IRTransformLayer, and for accessing the IRTransformLayer in LLJIT. It also
adds access to the ThreadSafeModule::withModuleDo method for thread-safe
access to modules.

A new example has been added to show how to use these APIs to optimize a module
during materialization.

Thanks Valentin!

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D103855
2021-06-19 11:50:27 +10:00
Lang Hames
320507db5e [ORC][examples] Fix file name in comment. 2021-06-19 11:50:26 +10:00
Guozhi Wei
d8a557bfc0 [InstCombine] Don't transform code if DoTransform is false
In patch https://reviews.llvm.org/D72396, it doesn't check DoTransform before transforming the code, and generates wrong result for the attached test case.

Differential Revision: https://reviews.llvm.org/D104567
2021-06-18 18:01:34 -07:00
Fangrui Song
e71d1a723a [InstrProfiling][ELF] Make __profd_ private if the function does not use value profiling
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:

* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code

The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.

Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.

With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103717
2021-06-18 17:01:17 -07:00
Matt Arsenault
6c04830c65 AMDGPU: Fix infinite loop in DAG combine with fneg + fma
We were not reporting isFNegFree for v2f32, although it is effectively
free after legalization. The generic combine was pulling fneg out of
the fma source operands, and the AMDGPU combine was doing the
opposite.
2021-06-18 19:09:03 -04:00
Matt Arsenault
bf6259aa0b AMDGPU: Fix assert on m0_lo16/m0_hi16
These get added (redundantly) to the bundle expanded for indirect
register accesses. We hit this path only when there is a call in the
function.
2021-06-18 18:48:53 -04:00
Hongtao Yu
7fbb587058 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Nikita Popov
7afbecd70d [LoopUnroll] Simplify optimization remarks
Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.

Differential Revision: https://reviews.llvm.org/D104482
2021-06-18 23:47:03 +02:00
Hongtao Yu
d9e9fe620c [CSSPGO][llvm-profgen] Fix an issue in findDisjointRanges
We were using 0 as an indicator of invalid offset when computing disjoint ranges. In reality, 0 can be an valid code offset which stands for the first function in .text section. I'm using UINT64_MAX as an invalid code offset instead.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104497
2021-06-18 14:38:48 -07:00
Nick Desaulniers
de2155141d [GCOVProfiling] don't profile Fn's w/ noprofile attribute
Similar to D104475, the Linux kernel would like to avoid compiler
generated code in certain functions. The no_profile function
attribute can be used in C to generate the the noprofile fn attr in IR.
Respect that from GCOVProfiling.

Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D104257
2021-06-18 13:58:34 -07:00
Craig Topper
a2969894c5 [RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch.
Previously we went directly to unknown state on VTYPE mismatch.
If we instead remember the partial match, we can use this to
still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL
ratio match.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D104069
2021-06-18 12:16:07 -07:00
Hongtao Yu
f2d99918a7 [CSSPGO][llvm-profgen] Ignore LBR records after interrupt transition
If we have seen an inwards transition from external code to internal code, but not a following outwards transition, the inwards transition is likely due to interrupt which is usually unpaired. Ignore current  and subsequent entries since they are likely from an unrelated pre-interrupt context.

LBR records from different interrupt context are unrelated and they should not be mixed together. Currenlty the OS does this for task-scheduling interrupt but not for all interrupts.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D104276
2021-06-18 12:13:53 -07:00
Anshil Gandhi
7f0aeb0ac8 [AMDGPU] [CodeGen] Fold negate llvm.amdgcn.class into test mask
Implemented the transformation of xor (llvm.amdgcn.class x, mask), -1 into
llvm.amdgcn.class(x, ~mask). Added LIT tests as well.

Differential Revision: https://reviews.llvm.org/D104049
2021-06-18 13:04:12 -06:00
Hongtao Yu
89f841e69b [CSSPGO] Fix an invalid hash table reference issue in the CS preinliner.
We were using a `StringMap` object to store all profiles to be emitted. The object is basically an unordered hash table, therefore updating it in the process of trasvering it may cause issue since the underlying bucket array could change.

I'm also moving the `csspgo-preinliner` switch around so that no context tri will be constructed (by the constructor of `CSPreInliner`) when the switch is off.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104267
2021-06-18 11:54:23 -07:00
Andrew Browne
d75f39d6d7 [DFSan] Cleanup code for platforms other than Linux x86_64.
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.

Reviewed By: gbalats, stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D104481
2021-06-18 11:21:46 -07:00
Krzysztof Parzyszek
f7732c65f0 Revert "Delay initialization of OptBisect"
This reverts commit ec91df8d8195b8b759a89734dba227da1eaa729f.

It was committed by accident.
2021-06-18 13:16:45 -05:00