mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

217776 Commits

Author SHA1 Message Date
Craig Topper
14af611f66 Revert "[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions."
I thought this might help with another optimization I was
thinking about, but I don't think it will. So it just wastes
compile time calling computeKnownBits for no benefit.

This reverts commit 81b2f95971edd47a0057ac4a77b674d7ea620c01.
2021-06-27 10:33:43 -07:00
Nikita Popov
b0bb011472 [DSE] Support opaque pointers
For the start shortening optimization, always use an i8 type for
the GEP, as it is a raw offset calculation.

Handling of non-i8* memset/memcpy arguments requires insertion
of casts. These cases were previously miscompiled, as the offset
calculation was performed on the wrong type.
2021-06-27 17:41:40 +02:00
Nikita Popov
802d7a3dde [MemCpyOpt] Handle unusual memcpy element type
Apparently, it is legal to use memcpy/memset with pointer types
other than i8*. Prior to 81fcdae68c5ff656c30032fd26c6a21af4c51dbb
this case was silently miscompiled, as the i8 offset calculation
was performed on some other type. Now it would crash due to a
type mismatch. Fix this by inserting an explicit bitcast to i8*.
2021-06-27 16:21:44 +02:00
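The miscompile described in the two commits above comes from applying a byte offset to a pointer whose element type is wider than one byte. A minimal sketch of the arithmetic, not the actual pass code; `gep`, the sizes, and the addresses are hypothetical names and values chosen for illustration:

```python
I8_SIZE = 1   # size in bytes of i8
I32_SIZE = 4  # size in bytes of i32

def gep(base: int, index: int, elem_size: int) -> int:
    """GEP-style address computation: base + index * sizeof(element)."""
    return base + index * elem_size

base = 0x1000
byte_offset = 8  # number of bytes the optimization wants to skip

# Offset applied directly to an i32* argument: the index is scaled by 4.
wrong = gep(base, byte_offset, I32_SIZE)
# Offset applied after a cast to i8*: the index is a raw byte count.
right = gep(base, byte_offset, I8_SIZE)
```

Here `right` advances by exactly 8 bytes, while `wrong` advances by 32, which is why the fix inserts a cast to i8* before the offset calculation.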
Sanjay Patel
fa04d281c5 [InstCombine] hoist min/max intrinsics above select with constant op
This is an extension of the handling for unary intrinsics and
follows the logic that we use for binary ops.

We don't canonicalize to min/max intrinsics yet, but this might
help unlock other folds seen in D98152.
2021-06-27 10:02:23 -04:00
Nikita Popov
d5928b4916 [MemCpyOpt] Support opaque pointers 2021-06-27 15:52:38 +02:00
Nikita Popov
e3cd3226a4 [LoadStoreVectorizer] Support opaque pointers
There are remaining redundant bitcasts.
2021-06-27 15:42:16 +02:00
Florian Hahn
6f498ede31 [VPlan] Track both incoming values for first-order recurrence phis.
This patch updates VPWidenPHI recipes for first-order recurrences to
also track the incoming value from the back-edge. Similar to D99294,
which did the same for reductions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D104197
2021-06-27 14:29:35 +01:00
Sanjay Patel
27099e9b95 [InstCombine][test] add tests for min/max intrinsics with select operand; NFC 2021-06-27 08:19:00 -04:00
Sanjay Patel
b406a3d74c [Analysis] improve function signature checking for calloc
This would crash later if we thought the parameters were
valid for the standard library call as shown in:
https://llvm.org/PR50846
2021-06-27 08:19:00 -04:00
Mara Sophie Grosch
27c4ea000e [Orc][examples] LLJITWithRemoteDebugger: fix CMake when utils are not built 2021-06-27 13:52:04 +02:00
Jan Kratochvil
d1dd2f5d77 llvm-dwarfdump: Print warnings on invalid DWARF
llvm-dwarfdump was silent even when the format of DWARF was invalid
and/or llvm-dwarfdump did not understand/support some of the constructs.
This can be pretty confusing as llvm-dwarfdump is a tool for DWARF
producers+consumers development.

Review comments also by @dblaikie.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104271
2021-06-27 11:38:35 +02:00
Craig Topper
72e0f3ae04 [X86] Tighten up some inline assembly constraint handling.
Don't allow vectors to split into GPRs for 'r' and other scalar
constraints. Prevents assertion in getCopyToPartsVector.

Makes PR50907 give a better error instead of crashing.
2021-06-26 22:57:22 -07:00
Alexander Shaposhnikov
559d4f5102 [docs][llvm-strip] Fix documentation for -s/-S
Fix the command line guide for -g/-s/-S.
In particular, previously it was incorrectly stating that -S is an alias for --strip-all.

Differential revision: https://reviews.llvm.org/D104888
2021-06-26 21:26:53 -07:00
David Green
c9378320d3 [ARM] Lower MVETRUNC to stack operations
The MVETRUNC node truncates two wide vectors to a single vector with
narrower elements. This is usually lowered to a series of extract/insert
elements, going via GPR registers. This patch changes that to instead
use a pair of truncating stores and a stack reload. This cuts down the
number of instructions at the expense of some stack space.

Differential Revision: https://reviews.llvm.org/D104515
2021-06-26 22:12:57 +01:00
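The stack-based lowering described above can be modeled outside the compiler. This is only a sketch under stated assumptions: two 4 x i32 inputs, an 8 x i16 result, a little-endian 16-byte stack slot, and a hypothetical helper name `mvetrunc_via_stack`.

```python
import struct

def mvetrunc_via_stack(lo, hi):
    """Truncate two 4 x i32 vectors into one 8 x i16 vector via a stack slot."""
    slot = bytearray(16)  # models the 16-byte stack slot
    # Two truncating stores: each writes the low 16 bits of four lanes.
    struct.pack_into("<4H", slot, 0, *(x & 0xFFFF for x in lo))
    struct.pack_into("<4H", slot, 8, *(x & 0xFFFF for x in hi))
    # One full-width reload of the slot as 8 x i16.
    return list(struct.unpack_from("<8H", slot, 0))
```

Three memory operations replace eight per-lane extract/insert pairs, at the cost of the stack slot.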
David Green
d1eb4f5a05 [ARM] Introduce MVETRUNC ISel lowering
Currently, when encountering store(trunc(..)) where the trunc is double
a legal vector length in MVE, we split the node into two different stores
each performing half of the trunc from the wider type. This works well
for efficiently lowering wider than legal types, else the trunc becomes
a series of individual lane moves. Unfortunately this splitting is
currently one of the first combines attempted, so can happen before any
other combines which might be preferable.

This patch instead introduces the concept of an MVETRUNC ISel node that
the trunc is initially lowered to, to keep it intact as a single item as
opposed to splitting it up. This allows us to push the store(trunc(..))
combine later, allowing other optimisations to potentially happen on the
trunc first. The store(trunc(..)) splitting can then be done later in
the legalisation period if needed, or else fall back to a buildvector as
before.

This can also be used in the future to lower to loads/stores, as opposed
to the more expensive lane extracts/inserts. Some extra combines are
added to keep all the existing tests happy.

Differential Revision: https://reviews.llvm.org/D91921
2021-06-26 22:00:26 +01:00
Craig Topper
2bab6a4c53 [RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions. 2021-06-26 11:57:26 -07:00
David Green
d9e635a956 [ARM] MVE vabd
This adds MVE lowering for VABDS/VABDU, using the code ported from
AArch64 in D91937.

Differential Revision: https://reviews.llvm.org/D91938
2021-06-26 19:41:32 +01:00
David Green
6a316cf978 [ISel] Port AArch64 SABD and UABD to DAGCombine
This ports the AArch64 SABD and UABD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
an abds/abdu which is zexted to the original type.

Differential Revision: https://reviews.llvm.org/D91937
2021-06-26 19:34:16 +01:00
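The matched pattern (ABS (SUB (EXTEND a), (EXTEND b))) computes an absolute difference that always fits in the narrow element width. A rough model of the signed and unsigned forms, assuming 8-bit lanes by default; the helper names `sext`, `zext`, `abds`, and `abdu` are illustrative, not LLVM APIs:

```python
def zext(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as an unsigned value."""
    return x & ((1 << bits) - 1)

def sext(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x = zext(x, bits)
    return x - (1 << bits) if x >> (bits - 1) else x

def abds(a: int, b: int, bits: int = 8) -> int:
    """Signed absolute difference; the result is an unsigned `bits`-wide value."""
    return abs(sext(a, bits) - sext(b, bits))

def abdu(a: int, b: int, bits: int = 8) -> int:
    """Unsigned absolute difference."""
    return abs(zext(a, bits) - zext(b, bits))
```

The same bit patterns give different answers depending on the extension: 0x7F and 0x80 differ by 255 as signed values but by 1 as unsigned values.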
Nikita Popov
2d1d507858 [Verifier] Support masked load/store with opaque pointers 2021-06-26 18:11:59 +02:00
LLVM GN Syncbot
4d0d5cf124 [gn build] Port 8b7881a084d0 2021-06-26 14:20:52 +00:00
David Green
2a8be6538a [ARM] Regenerate big-endian-vector-caller.ll test checks. NFC 2021-06-26 13:21:54 +01:00
Florian Hahn
417caa2b9f [LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC).
Suggested in D104197, avoids the early exit.
2021-06-26 13:13:25 +01:00
LLVM GN Syncbot
769467b21f [gn build] Port aff57ff24aca 2021-06-26 11:38:00 +00:00
Lang Hames
59ded59cbe [JITLink][ELF] Add generic ELFLinkGraphBuilder template.
ELFLinkGraphBuilder<ELFT> will hold generic parsing and LinkGraph-building code
that can be shared between JITLink ELF backends for different architectures.

For now it's just a stub. The plan is to incrementally move functionality down
from ELFLinkGraphBuilder_x86_64 into the new template.
2021-06-26 21:37:33 +10:00
Jim Lin
f8d2dcd1dd [RISCV][NFC] Combine the control flow for different RetOp of interrupt function
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D104838
2021-06-26 17:28:03 +08:00
Craig Topper
c2e96e7cf2 [RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X))) with sign_extend.
If type legalization is going to insert a sign_extend for other users
of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is
better to replace the ANY_EXTEND so we don't end up with a separate
ADD/MUL/SUB instruction for the users of the ANY_EXTEND.

I'm only handling setcc uses right now, but there are other
instructions that force sign_extends like ashr.

There are probably other *W instructions we could use in addition
to ADDW/SUBW/MULW.

My motivating case was a loop terminating compare and a phi use
as seen in the new test file.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D104581
2021-06-25 23:16:37 -07:00
Eric Astor
7492c6b2fb [ms] [llvm-ml] Disable C-style comments 2021-06-25 23:09:13 -04:00
Luo, Yuanke
b033502dd5 [X86] Selecting fld0 for undefined value in fast ISEL.
When opt-bisect-limit is set on the command line to a value lower than
that of the ISel pass and CurBisectNum expires, the "DAG to DAG" pass
lowers its opt level to O0. However, "processimpdefs" and "X86 FP Stackifier"
are not stopped by the CurBisectNum expiration, so an undefined fp0
is generated. This causes a crash in the "X86 FP Stackifier" pass,
because the Stackifier doesn't expect any undefined fp value.

Here is the scenario that causes the compiler crash.

  successors: %bb.26
  liveins: $r14
    ST_FPrr $st0, implicit-def $fpsw, implicit $fpcw
    renamable $rdi = MOV64ri @.str.3.16422
    renamable $rdx = LEA64r %stack.6, 1, $noreg, 0, $noreg
    ADJCALLSTACKDOWN64 0, 0, 0, implicit-def $rsp, implicit-def dead
    $eflags, implicit-def $ssp, implicit $rsp, implicit $ssp
    dead $esi = MOV32r0 implicit-def dead $eflags, implicit-def $rsi
    CALL64pcrel32 @foo, implicit $rsp, implicit $ssp, implicit $rdi,
    implicit $rsi, implicit $rdx, implicit-def dead $fp0
    renamable $xmm0 = MOVSDrm_alt %stack.10, 1, $noreg, 0, $noreg :: (load 8
    from %stack.10)
    ADJCALLSTACKUP64 0, 0, implicit-def $rsp, implicit-def dead $eflags,
    implicit-def $ssp, implicit $rsp, implicit $ssp
    renamable $fp2 = CHS_Fp80 killed undef renamable $fp0, implicit-def
    $fpsw
    JMP_1 %bb.26
The CALL64pcrel32 marks fp0 dead, so LLVM frees the stack slot for fp0
and the stack becomes empty. The later instruction CHS_Fp80 uses the
undefined register fp0. The original code assumes there must be a stack
slot for the source register (fp0) without considering that it is
undefined, so LLVM reports an error.

We had some discussion in https://reviews.llvm.org/D104440 and
decided to fix it in fast ISel. The fix is to lower an undefined fp value
to a zero value, which relieves the "X86 FP Stackifier" pass of that burden.
Thanks to Craig for the suggestion and the initial patch to fix it.

Differential Revision: https://reviews.llvm.org/D104678
2021-06-26 08:43:09 +08:00
Jon Chesterfield
7f9c53b162 Disable ReplaceLDS pass, patch up tests to match
Most tests passed with an extra argument to explicitly enable the pass.
One does not, so it is deleted as part of this change. I can't see why the
codegen would be different between default on and default off but switched
on. The deleted test can be retrieved from the project history.

This would be a revert, but git revert was not clean. Disabling the pass
and leaving it in tree is less likely to cause breakage elsewhere than
patching up the git revert conflicts on unfamiliar code. It'll be landed
without review, as @hsmhsm is believed unavailable at present.

Differential Revision: https://reviews.llvm.org/D104962
2021-06-26 01:36:42 +01:00
Andrew Browne
55bbe5301b [DFSan] Change shadow and origin memory layouts to match MSan.
Previously on x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  | application memory |
  +--------------------+ 0x700000008000 (kAppAddr)
  |                    |
  |       unused       |
  |                    |
  +--------------------+ 0x300000000000 (kUnusedAddr)
  |       origin       |
  +--------------------+ 0x200000008000 (kOriginAddr)
  |       unused       |
  +--------------------+ 0x200000000000
  |   shadow memory    |
  +--------------------+ 0x100000008000 (kShadowAddr)
  |       unused       |
  +--------------------+ 0x000000010000
  | reserved by kernel |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem & ~0x600000000000
  SHADOW_TO_ORIGIN(shadow) = kOriginAddr - kShadowAddr + shadow

Now for x86_64:

  +--------------------+ 0x800000000000 (top of memory)
  |    application 3   |
  +--------------------+ 0x700000000000
  |      invalid       |
  +--------------------+ 0x610000000000
  |      origin 1      |
  +--------------------+ 0x600000000000
  |    application 2   |
  +--------------------+ 0x510000000000
  |      shadow 1      |
  +--------------------+ 0x500000000000
  |      invalid       |
  +--------------------+ 0x400000000000
  |      origin 3      |
  +--------------------+ 0x300000000000
  |      shadow 3      |
  +--------------------+ 0x200000000000
  |      origin 2      |
  +--------------------+ 0x110000000000
  |      invalid       |
  +--------------------+ 0x100000000000
  |      shadow 2      |
  +--------------------+ 0x010000000000
  |    application 1   |
  +--------------------+ 0x000000000000

  MEM_TO_SHADOW(mem) = mem ^ 0x500000000000
  SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D104896
2021-06-25 17:00:38 -07:00
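The new x86_64 mapping from the DFSan commit above can be checked against its layout diagram with a few lines of arithmetic. The two constants are taken from the commit message; the function and constant names are illustrative only:

```python
XOR_MASK = 0x500000000000       # from MEM_TO_SHADOW(mem) = mem ^ 0x500000000000
ORIGIN_OFFSET = 0x100000000000  # from SHADOW_TO_ORIGIN(shadow) = shadow + 0x100000000000

def mem_to_shadow(mem: int) -> int:
    """Map an application address to its shadow address."""
    return mem ^ XOR_MASK

def shadow_to_origin(shadow: int) -> int:
    """Map a shadow address to its origin address."""
    return shadow + ORIGIN_OFFSET

# Start addresses of the three application regions in the new layout.
app1, app2, app3 = 0x000000000000, 0x510000000000, 0x700000000000
```

Each application region start maps to the start of the shadow region with the same number, and adding the origin offset lands at the start of the matching origin region.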
Nikita Popov
5789265b3d Revert "[InstCombine] Make indexed compare fold opaque ptr compatible"
This reverts commit 5cb20ef8a235c2027489a196bba27630ca21a00b.

Assertion failures with this patch were reported on
https://reviews.llvm.org/rG5cb20ef8a235, revert for now.
2021-06-26 00:32:59 +02:00
Duncan P. N. Exon Smith
a6ff73a905 OpaquePtr: Reject 'ptr*' again when parsing textual IR
Bring back the testcase dropped in
1e6303e60ca5af4fbe7ca728572fd65666a98271 and get it passing by checking
explicitly for `ptr*` in LLParser. Uses `Type::isOpaquePointerTy()` from
ad4bb8280952c2cacf497e30560ee94c119b36e0.

Differential Revision: https://reviews.llvm.org/D104938
2021-06-25 15:18:44 -07:00
Eli Friedman
905b300022 [NFC] Prefer ConstantRange::makeExactICmpRegion over makeAllowedICmpRegion
The implementation is identical, but it makes the semantics a bit more
obvious.
2021-06-25 14:43:13 -07:00
Eric Astor
b88bba840f [ms] [llvm-ml] Add support for ALIGN, EVEN, and ORG directives
Match ML.EXE's behavior for ALIGN, EVEN, and ORG directives both at file level and in STRUCTs.

We currently reject negative offsets passed to ORG inside STRUCTs (in ML.EXE and ML64.EXE, they wrap around as for an unsigned 32-bit integer).

Also, if a STRUCT is declared using an ORG directive, no value of that type can be defined.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92507
2021-06-25 17:19:45 -04:00
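The offset arithmetic mentioned in the commit above can be sketched briefly. This is a sketch under assumptions: it models the unsigned 32-bit wraparound attributed to ML.EXE for negative ORG offsets, and it assumes ALIGN takes a power-of-two boundary with EVEN behaving like an alignment of 2. The function names are hypothetical and are not part of llvm-ml.

```python
def wrap_u32(offset: int) -> int:
    """Wrap an offset the way an unsigned 32-bit integer would."""
    return offset & 0xFFFFFFFF

def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of a power-of-two alignment."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (offset + alignment - 1) & ~(alignment - 1)

def even(offset: int) -> int:
    """Assumed EVEN behavior: align to a 2-byte boundary."""
    return align_up(offset, 2)
```

Under this model an ORG of -4 would wrap to 0xFFFFFFFC, which is the ML.EXE case that llvm-ml currently rejects inside STRUCTs.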
Juneyoung Lee
1291b464ba [SimplifyLibCalls] Fix memchr opt to use CreateLogicalAnd
This fixes a bug in LibCallSimplifier::optimizeMemChr, which performs the following transformation:

```
// memchr("\r\n", C, 2) != nullptr -> (1 << C & ((1 << '\r') | (1 << '\n')))
// != 0
//   after bounds check.
```

As written above, a bounds check on C (whether it is less than the integer bitwidth) is done before doing `1 << C`; otherwise `1 << C` will overflow.
If the bounds check is false, the result of (1 << C & ...) must not be used at all; otherwise the result of the shift (which is poison) will contaminate the whole result.
A correct way to encode this is `select i1 (bounds check), (1 << C & ...), false`, because select does not allow the unused operand to contaminate the result.
However, this optimization was introducing `and (bounds check), (1 << C & ...)`, which cannot do that.

The bug was found from compilation of this C++ code: https://reviews.llvm.org/rG2fd3037ac615#1007197

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104901
2021-06-26 05:59:35 +09:00
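The difference between the `and` and `select` encodings can be modeled with a sentinel standing in for poison. This is a toy model, not LLVM semantics in full: `POISON`, `shl`, `and_`, and `select` are hypothetical helpers, and a 32-bit integer width is assumed.

```python
BITWIDTH = 32
CHAR_MASK = (1 << ord("\r")) | (1 << ord("\n"))
POISON = object()  # sentinel standing in for an LLVM poison value

def shl(c: int):
    """1 << c, which is poison when the shift amount is out of range."""
    return (1 << c) if 0 <= c < BITWIDTH else POISON

def bit_test(c: int):
    """(1 << c & mask) != 0, propagating poison from the shift."""
    s = shl(c)
    return POISON if s is POISON else (s & CHAR_MASK) != 0

def and_(a, b):
    """Bitwise and on i1: poison in either operand poisons the result."""
    return POISON if a is POISON or b is POISON else (a and b)

def select(cond, if_true, if_false):
    """Select: only the chosen operand can affect the result."""
    return if_true if cond else if_false

def memchr_with_and(c: int):
    return and_(c < BITWIDTH, bit_test(c))

def memchr_with_select(c: int):
    return select(c < BITWIDTH, bit_test(c), False)
```

For an in-range character both encodings agree. For an out-of-range character such as `ord("a")`, the `and` form yields the poison sentinel while the `select` form yields `False`.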
Joseph Huber
b8d800fd9c [OpenMP] Change OpenMPOpt to check openmp metadata
The metadata added in D102361 introduces a module flag that we can check
to determine if the module was compiled with `-fopenmp` enabled. We can
now check for the presence of this instead of scanning the call graph
for OpenMP runtime functions.

Depends on D102361

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102423
2021-06-25 16:34:22 -04:00
Nemanja Ivanovic
87c4e2706e [PowerPC] Disable combine 64-bit bswap(load) without LDBRX
This causes failures on the big endian bootstrap bot.
Disabling this combine temporarily until I can get a proper fix.
2021-06-25 15:11:22 -05:00
Martin Storsjö
94252d9e23 [llvm-rc] Don't rewrite the arch in the default triple unless necessary
When the default target arch isn't one that is supported as a
Windows target, we want to set a suitable architecture (so that
Clang tests that run plain 'llvm-rc' pass checks for e.g.
"#ifdef _WIN32" even for llvm builds that default to e.g. ppc64).

But if the default target architecture is usable, don't rewrite it.
(Rewriting it, by e.g. "T.setArch(T.getArch())", normalizes the
spelling of the architecture, e.g. changing i686 to i386. Such a
change can make clang unable to find the right sysroot.)

This can't, unfortunately, practically be tested very well because
it is entirely dependent on the default triple of the llvm build.

Differential Revision: https://reviews.llvm.org/D104589
2021-06-25 22:59:09 +03:00
Ulrich Weigand
fc8f374f5d [SystemZ] Add support for .reloc assembler directive
Add support for the .reloc directive along the lines of
other back-ends.

This fixes a regression after https://reviews.llvm.org/D104080
was merged, since that patch presupposed support for .reloc.
2021-06-25 21:51:10 +02:00
Hongtao Yu
8e43edc28f [Coroutines] Define __coro_frame_ty in function scope
Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside their parent type's scope.

We were seeing a type defined in a local scope causing trouble for the DWARF emitter, where a context is required to be a function scope, a namespace or a global scope.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D104937
2021-06-25 12:33:20 -07:00
Nikita Popov
67a9e2940b [OpaquePtr] Enumerate GlobalAlias value type
The type is no longer implicitly enumerated through the pointer
type.
2021-06-25 21:21:10 +02:00
Nikita Popov
33e01a9045 [IR] Add Type::isOpaquePointerTy() helper (NFC)
Shortcut to check for opaque pointers without a cast to PointerType.
2021-06-25 20:56:59 +02:00
David Green
20d5a26896 [DAG] Fold neg(splat(neg(x))) -> splat(x)
This adds a fold of sub(0, splat(sub(0, x))) -> splat(x). This can
come up in the lowering of right shifts under AArch64, where we generate
a shift left of a negated number.

Differential Revision: https://reviews.llvm.org/D103755
2021-06-25 19:53:29 +01:00
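The fold above holds under wrapping arithmetic because negation is its own inverse modulo 2^n, including for the minimum signed value. A small check, assuming 32-bit lanes and a 4-lane vector; `neg`, `splat`, and `neg_vec` are illustrative names:

```python
BITS = 32
MASK = (1 << BITS) - 1

def neg(x: int) -> int:
    """sub(0, x) with wraparound at BITS bits."""
    return (-x) & MASK

def splat(x: int, lanes: int = 4) -> list:
    """Broadcast a scalar to every lane of a vector."""
    return [x] * lanes

def neg_vec(v: list) -> list:
    """Lane-wise negation of a vector."""
    return [neg(e) for e in v]

# Includes 0 and the INT_MIN bit pattern, which both negate to themselves.
samples = [0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
fold_holds = all(neg_vec(splat(neg(x))) == splat(x) for x in samples)
```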
Craig Topper
40a736ff72 [X86] Simplify part of the isel for X86ISD::FCMP/STRICT_FCMP/STRICT_FCMPS.
We don't need to have the compare output a value and then copy it
to FPSW for use by FNSTSW. Instead we can just have the compare
output Glue and glue the FNSTSW to it. InstrEmitter effectively
performed this optimization when emitting the Machine IR. Doing
it directly simplifies the code and reduces the work in
InstrEmitter. There's no change in the machine IR at the end of
isel before and after this change.
2021-06-25 11:39:01 -07:00
David Green
b250b8a12a [AArch64] Extra negated shift tests. NFC 2021-06-25 19:17:31 +01:00
Philip Reames
65d08a3a8f [test] Add coverage for existing overflow rule with uadd.with.overflow 2021-06-25 10:45:00 -07:00
Florian Hahn
cd014dcdee [LV] Doxygenize VectorizationFactor member comments (NFC).
Minor cleanup for follow-up patch.
2021-06-25 18:35:00 +01:00
Philip Reames
bfe000bb38 [instcombine] Fold overflow check using umulo to comparison
If we have a umul.with.overflow where the multiply result is not used and one of the operands is a constant, we can perform the overflow check more cheaply with a comparison than by performing the multiply and extracting the overflow flag.

(Noticed when looking at the conditions SCEV emits for overflow checks.)

Differential Revision: https://reviews.llvm.org/D104665
2021-06-25 10:25:45 -07:00
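The idea behind the fold above is that, for a nonzero constant C, an unsigned multiply x * C overflows exactly when x exceeds the largest representable value divided by C. A sketch of that equivalence, not the InstCombine implementation; the function names and the default bit width are assumptions:

```python
def umul_overflows(x: int, c: int, bits: int = 32) -> bool:
    """Reference check: does the full product exceed the unsigned maximum?"""
    return x * c > (1 << bits) - 1

def umul_overflows_cmp(x: int, c: int, bits: int = 32) -> bool:
    """Comparison-only check against a threshold derived from the constant."""
    if c == 0:
        return False  # multiplying by zero never overflows
    return x > ((1 << bits) - 1) // c
```

Since C is a compile-time constant, the threshold can be computed once, leaving a single unsigned comparison at run time. At an 8-bit width the two functions can be compared exhaustively over all operand pairs.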
Joel E. Denny
6ae82bea96 [UpdateCCTestChecks] Support --check-globals
This option is already supported by update_test_checks.py, but it can
also be useful in update_cc_test_checks.py.  For example, I'd like to
use it in OpenMP offload codegen tests to check global variables like
`.offload_maptypes*`.

Reviewed By: jdoerfert, arichardson, ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D104714
2021-06-25 13:17:56 -04:00
Philip Reames
b215117634 [test][instcombine] Add test cases for all x.with.overflow overflow checks
For each of the x.with.overflow variants, if only the overflow bit is consumed, we can generate a direct overflow comparison. This precommits tests for each of the variants and tries to cover interesting corner cases.
2021-06-25 10:09:58 -07:00