mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

214617 Commits

Author SHA1 Message Date
Jay Foad
899f1c90ad [GlobalISel] Remove ConstantFoldingMIRBuilder
ConstantFoldingMIRBuilder was an experiment which is not used for
anything. The constant folding functionality is now part of
CSEMIRBuilder.

Differential Revision: https://reviews.llvm.org/D101050
2021-04-23 09:13:27 +01:00
Daniel Kiss
30b326d46e [AArch64] Fix for BTI landing pad insertion with PAC-RET+bkey.
EMITBKEY is emitted for PAC-RET+bkey, which is not a machine instruction.

PR: 49957

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D100996
2021-04-23 10:07:25 +02:00
KAWASHIMA Takahiro
47186c3ead [LoopReroll] Fix rerolling loop with extra instructions
Fixes PR47627

This fix suppresses rerolling a loop which has an unrerollable
instruction.

Sample IR for the explanation below:

```
define void @foo([2 x i32]* nocapture %a) {
entry:
  br label %loop

loop:
  ; base instruction
  %indvar = phi i64 [ 0, %entry ], [ %indvar.next, %loop ]

  ; unrerollable instructions
  %stptrx = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %indvar, i64 0
  store i32 999, i32* %stptrx, align 4

  ; extra simple arithmetic operations, used by root instructions
  %plus20 = add nuw nsw i64 %indvar, 20
  %plus10 = add nuw nsw i64 %indvar, 10

  ; root instruction 0
  %ldptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 0
  %value0 = load i32, i32* %ldptr0, align 4
  %stptr0 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 0
  store i32 %value0, i32* %stptr0, align 4

  ; root instruction 1
  %ldptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus20, i64 1
  %value1 = load i32, i32* %ldptr1, align 4
  %stptr1 = getelementptr inbounds [2 x i32], [2 x i32]* %a, i64 %plus10, i64 1
  store i32 %value1, i32* %stptr1, align 4

  ; loop-increment and latch
  %indvar.next = add nuw nsw i64 %indvar, 1
  %exitcond = icmp eq i64 %indvar.next, 5
  br i1 %exitcond, label %exit, label %loop

exit:
  ret void
}
```

In the loop rerolling pass, `%indvar` and `%indvar.next` are appended
to the `LoopIncs` vector in the `LoopReroll::DAGRootTracker::findRoots`
function.

Before this fix, the two instructions marked with the `unrerollable
instructions` comment above are marked as `IL_All` at the end of the
`LoopReroll::DAGRootTracker::collectUsedInstructions` function,
as are the instructions marked with the `extra simple arithmetic
operations` and `loop-increment and latch` comments. This is incorrect
because `IL_All` means that the instruction should be executed in all
iterations of the rerolled loop, but the `store` instruction should
not be.

This fix rejects instructions which may have side effects and don't
belong to def-use chains of any root instructions and reductions.

See https://bugs.llvm.org/show_bug.cgi?id=47627 for more information.
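
As a rough illustration of that rule (a standalone sketch with simplified stand-in types, not the actual LoopReroll code), the check amounts to:

```
#include <unordered_set>
#include <vector>

// Simplified stand-in for llvm::Instruction, for illustration only.
struct Instr {
  bool MayHaveSideEffects; // e.g. the 'store i32 999' marked above
};

// Reject rerolling if an instruction with side effects is not part of any
// root/reduction def-use chain (such as the unrerollable store above).
bool mayRerollLoop(const std::vector<Instr *> &LoopBody,
                   const std::unordered_set<Instr *> &RootAndReductionChains) {
  for (Instr *I : LoopBody)
    if (I->MayHaveSideEffects && !RootAndReductionChains.count(I))
      return false;
  return true;
}
```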
2021-04-23 15:14:46 +09:00
Wang, Pengfei
83a6f34489 [X86][AMX][NFC] Avoid assert for the same immediate value
The previous condition in the assert was overly strict. We ought to allow
the same immediate value being loaded more than once. The intention of
the assert is to check whether the same AMX register uses multiple different
immediate shapes. So this fix is supposed to be NFC.
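
The relaxed check can be sketched roughly as follows (illustrative C++ with made-up names, not the actual X86 AMX pre-config code); only a different immediate shape for an already-seen AMX register should trip the assert:

```
#include <cassert>
#include <map>

// Record the immediate shape loaded for an AMX register. Reloading the same
// immediate is allowed; only a conflicting shape should assert.
void recordShape(std::map<unsigned, int> &ShapeOfReg, unsigned AMXReg,
                 int ImmShape) {
  auto It = ShapeOfReg.find(AMXReg);
  if (It != ShapeOfReg.end()) {
    assert(It->second == ImmShape &&
           "same AMX register used with different immediate shapes");
    return;
  }
  ShapeOfReg[AMXReg] = ImmShape;
}
```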

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D101124
2021-04-23 12:17:00 +08:00
Wang, Pengfei
d73d62d45f [X86][AMX] Try to hoist AMX shapes' def
We request that there be no intersections between AMX instructions and
their shapes' defs when we insert ldtilecfg. However, this is not always
true, both because users don't follow the AMX API model and because of
optimizations.

This patch adds a mechanism that tries to hoist the defs of AMX shapes as
well. It only hoists shapes within a BB; we can improve it for cases
across BBs in the future. Currently, it only hoists shapes whose sources
are all defined above the first AMX instruction. We can later improve the
case where the only source below the AMX instruction is one that moves an
immediate value into a register.

Differential Revision: https://reviews.llvm.org/D101067
2021-04-23 12:17:00 +08:00
Wang, Pengfei
d7776e3283 [X86] Enable compilation of user interrupt handlers.
Add __uintr_frame structure and use UIRET instruction for functions with
x86 interrupt calling convention when UINTR is present.
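
For context, a user-interrupt handler of the kind this targets might look like the sketch below. The header name and exact signature are assumptions based on the commit description and published UINTR compiler documentation, not part of this patch; building it would require a UINTR-enabled compiler and -muintr. With this change the handler's return is emitted as UIRET rather than IRET.

```
#include <x86gprintrin.h>  // assumed to provide struct __uintr_frame

// Hypothetical user-interrupt handler using the x86 interrupt calling
// convention; 'uirrv' is the user-interrupt request vector.
__attribute__((interrupt))
void ui_handler(struct __uintr_frame *frame, unsigned long long uirrv) {
  (void)frame;
  (void)uirrv;
  // handle the user interrupt; the epilogue uses UIRET when UINTR is present
}
```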

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D99708
2021-04-23 11:43:57 +08:00
Serguei Katkov
126d78bad9 [InlineSpiller] Clean-up isSpillCandBB
This is mostly NFC, except that for the end of a BB the end slot is used
instead of the previous slot. Idx is used to find a def of the sibling
live interval in that slot. The def at the end of the MBB and at the
previous slot of the end of the MBB should be the same, so it should be NFC.

Reviewers: reames, qcolombet, MatzeB, wmi, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100922
2021-04-23 10:16:02 +07:00
Nico Weber
412c04b610 [gn build] (manually) port 0b2bc69ba29 2021-04-22 22:40:53 -04:00
Matt Arsenault
d002a57ca7 AMDGPU: Restore atomic fp feature on FP atomic instruction definitions
9931b1f7a4785b6a17fb87b81a3546d61d0cbca1 switched this to checking for
the two specific subtargets, instead of the dedicated feature. This
broke support for functions which force-added the feature when emitting
targets that do not actually support them. This still does not work for
the targets that use the gfx6/7 or gfx10 encodings.
2021-04-22 21:32:01 -04:00
Fangrui Song
c83fe04e08 [IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all}
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arises in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.

Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)

Also document the function attribute "frame-pointer" which is long overdue.

Differential Revision: https://reviews.llvm.org/D101016
2021-04-22 18:07:30 -07:00
Levy Hsu
04656c7e3e [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)
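
For reference, the generalized bit-reverse behind grev/grevi can be modeled in plain C++ following the reference semantics in the Bitmanip specification; this is an illustration of the instruction's behavior, not code from this patch:

```
#include <cstdint>

// 32-bit generalized bit reverse: each set bit in 'shamt' swaps adjacent
// blocks of the corresponding size (1, 2, 4, 8, 16 bits). shamt = 31
// reverses all bits; shamt = 24 byte-swaps the word.
uint32_t grev32(uint32_t x, unsigned shamt) {
  if (shamt & 1)  x = ((x & 0x55555555u) << 1)  | ((x & 0xAAAAAAAAu) >> 1);
  if (shamt & 2)  x = ((x & 0x33333333u) << 2)  | ((x & 0xCCCCCCCCu) >> 2);
  if (shamt & 4)  x = ((x & 0x0F0F0F0Fu) << 4)  | ((x & 0xF0F0F0F0u) >> 4);
  if (shamt & 8)  x = ((x & 0x00FF00FFu) << 8)  | ((x & 0xFF00FF00u) >> 8);
  if (shamt & 16) x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
  return x;
}
```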

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Keith Smiley
82ce0102d8 llvm-objdump: add --rpaths to macho support
This prints the rpaths for the given binary

Reviewed By: kastiglione

Differential Revision: https://reviews.llvm.org/D100681
2021-04-22 16:01:10 -07:00
Heejin Ahn
5217fbac0b [WebAssembly] Fix fixEndsAtEndOfFunction for delegate
Background:
CFGStackify's [[ 398f253400/llvm/lib/Target/WebAssembly/WebAssemblyCFGStackify.cpp (L1481-L1540) | fixEndsAtEndOfFunction ]] fixes block/loop/try's return
type when the end of function is unreachable and the function return
type is not void. So if a function returns i32 and `block`-`end` wraps the
whole function, i.e., the `block`'s `end` is the last instruction of the
function, the `block`'s return type should be i32 too:
```
block i32
  ...
end
end_function
```

If there are consecutive `end`s, this signature has to be propagated to
those blocks too, like:
```
block i32
  ...
  block i32
    ...
  end
end
end_function
```

This applies to `try`-`end` too:
```
try i32
  ...
catch
  ...
end
end_function
```

In case of `try`, we not only follow consecutive `end`s but also follow
`catch`, because for the type of the whole `try` to be i32, both `try`
and `catch` parts have to be i32:
```
try i32
  ...
  block i32
    ...
  end
catch
  ...
  block i32
    ...
  end
end
end_function
```

---

Previously we only handled consecutive `end`s or `end` before a `catch`.
But now we have `delegate`, which serves like `end` for
`try`-`delegate`. So we have to follow `delegate` too and mark its
corresponding `try` as i32 (the function's return type):
```
try i32
  ...
catch
  ...
  try i32    ;; Here
    ...
  delegate N
end
end_function
```

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D101036
2021-04-22 15:32:00 -07:00
Heejin Ahn
4405bf5794 [WebAssembly] Serialize params/results in MachineFunctionInfo
This adds support for YAML serialization of `Params` and `Results`
fields in `WebAssemblyMachineFunctionInfo`. Types are printed as `MVT`'s
string representation. This makes writing MIR tests easier.

The tests added are testing simple parsing and printing of `params` /
`results` fields under `machineFunctionInfo`.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D101029
2021-04-22 15:31:09 -07:00
Heejin Ahn
37702c2638 [WebAssembly] Put utility functions in Utils directory (NFC)
This CL
1. Creates Utils/ directory under lib/Target/WebAssembly
2. Moves existing WebAssemblyUtilities.cpp|h into the Utils/ directory
3. Creates Utils/WebAssemblyTypeUtilities.cpp|h and puts type
   declarations and type conversion functions scattered in various
   places into this single place.

It has been suggested several times that it is not easy to share utility
functions between subdirectories (AsmParser, Disassembler, MCTargetDesc,
...). Sometimes we ended up [[ https://reviews.llvm.org/D92840#2478863 | duplicating ]] the same function because of
this.

There are already other targets doing this: AArch64, AMDGPU, and ARM
have Utils/ subdirectory under their target directory.

This extracts the utility functions into a single directory Utils/ and
makes them sharable among all passes in WebAssembly/ and its
subdirectories. Also I believe gathering all type-related conversion
functionalities into a single place makes it more usable. (Actually I
was working on another CL that uses various type conversion functions
scattered in multiple places, which became the motivation for this CL.)

Reviewed By: dschuff, aardappel

Differential Revision: https://reviews.llvm.org/D100995
2021-04-22 15:29:43 -07:00
Craig Topper
7a82be4f0f [RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases.
Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width
operand from i32 to i64 for RV64.

Add test cases for the saturating intrinsics for half/float/double
and i32/i64. CodeGen is definitely not optimal. We can probably
make use of the native behavior of fcvt instructions in many cases.

Fixes PR50083
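
For context, the saturating semantics these intrinsics provide (per the LangRef description of llvm.fptosi.sat) can be modeled by the scalar sketch below; it illustrates the expected results, not the RISC-V lowering itself:

```
#include <cmath>
#include <cstdint>
#include <limits>

// Scalar model of llvm.fptosi.sat.i32.f64: NaN maps to 0, out-of-range
// values clamp to INT32_MIN/INT32_MAX, in-range values convert normally.
int32_t fptosi_sat_i32(double x) {
  if (std::isnan(x))
    return 0;
  if (x <= static_cast<double>(std::numeric_limits<int32_t>::min()))
    return std::numeric_limits<int32_t>::min();
  if (x >= static_cast<double>(std::numeric_limits<int32_t>::max()))
    return std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(x);
}
```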
2021-04-22 15:18:15 -07:00
Krzysztof Parzyszek
9949cdf248 [Hexagon] Improve lowering of returns of i1
Emit explicit any-extend to avoid weird tstbit sequences.
2021-04-22 16:47:52 -05:00
Elia Geretto
99885567cb [dfsan] Fix Len argument type in call to __dfsan_mem_transfer_callback
This patch is supposed to solve: https://bugs.llvm.org/show_bug.cgi?id=50075

The function `__dfsan_mem_transfer_callback` takes a `Len` argument of type `i64`; however, when processing a `MemTransferInst` such as `llvm.memcpy.p0i8.p0i8.i32`, the `len` argument has type `i32`. In order to make the type of `len` compatible with the one of the callback argument, this change zero-extends it when necessary.
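
A minimal sketch of the widening step, assuming the usual IRBuilder API (the helper name is made up and this is not the verbatim patch):

```
#include "llvm/IR/IRBuilder.h"

// Widen (or, if ever needed, narrow) the memcpy/memmove length so it matches
// the i64 'Len' parameter of __dfsan_mem_transfer_callback. Sketch only.
static llvm::Value *adjustLenToI64(llvm::IRBuilder<> &IRB, llvm::Value *Len) {
  return IRB.CreateZExtOrTrunc(Len, IRB.getInt64Ty());
}
```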

Reviewed By: stephan.yichao.zhao, gbalats

Differential Revision: https://reviews.llvm.org/D101048
2021-04-22 21:12:20 +00:00
Nikita Popov
5320e959d4 [GVN] Generate LE and BE check lines (NFC)
I accidentally dropped some check lines in my previous commit.
Apparently update_test_checks no longer warns on label conflicts???
2021-04-22 22:44:08 +02:00
Nikita Popov
b67e56152d [GVN] Regenerate test checks (NFC) 2021-04-22 22:38:41 +02:00
Krzysztof Parzyszek
b7fbfec6c7 [Hexagon] Use 'vnot' instead of 'not' in patterns with vectors
'not' expands to checking for an xor with a -1 constant. Since
this looks for a ConstantSDNode it will never match for a vector.

Co-authored-by: Craig Topper <craig.topper@sifive.com>

Differential Revision: https://reviews.llvm.org/D100687
2021-04-22 15:36:20 -05:00
Arthur Eubanks
21048e7590 [GlobalOpt] Don't replace alias with aliasee if aliasee is interposable
Both the alias and aliasee linkage are important.

PR27866 provides some background.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99629
2021-04-22 13:12:34 -07:00
David Green
1c90e80182 [AArch64] Improve vector reverse lowering
This improves the lowering of v8i16 and v16i8 vector reverse shuffles.
Instead of going via a generic tbl it uses a rev64; ext pair, as already
happens for v4i32.

Differential Revision: https://reviews.llvm.org/D100882
2021-04-22 21:01:25 +01:00
Min-Yih Hsu
f97e6f4c0f [M68k][Disassembler][NFC] Decorate dump methods with LLVM_DUMP_METHOD
And guard them with proper macro conditions. NFC.
2021-04-22 12:02:07 -07:00
Min-Yih Hsu
bcae9fcd31 [M68k][AsmParser][NFC] Remove redundant default cases
Remove redundant default cases since all enumeration values have
been covered (-Wcovered-switch-default). NFC.
2021-04-22 11:50:48 -07:00
Kai Nacke
02764e0318 Fix the triple used in llvm-mca.
lookupTarget() can update the passed triple argument. This happens
when no triple is given on the command line, and the architecture
argument does not match the architecture in the default triple.

For example, when -march=aarch64 is passed on the command line and the
default triple is x86_64-windows-msvc, the triple is changed
to aarch64-windows-msvc.

However, this triple is not saved, and later in the code, the
triple is constructed again from the triple name, which is the
default triple at this point. Thus the default triple is passed
to the constructor of the MCSubtargetInfo instance.

The triple is only used to determine the object file format, and by
chance, the AArch64 target also uses the COFF file format, and
all is fine. Obviously, the AArch64 target does not support all
available binary file formats, e.g. XCOFF and GOFF, and llvm-mca
crashes in this case.

The fix is to update the triple name with the triple name as changed by
the target lookup. Then the default object file format
for the architecture is used, in the example ELF.
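
The overall pattern of the fix can be sketched as below (simplified, hedged names; the real llvm-mca code is structured differently):

```
#include "llvm/ADT/Triple.h"
#include "llvm/Support/TargetRegistry.h"
#include <string>

// lookupTarget() may rewrite TheTriple when only -march is given, so the
// triple name must be refreshed from it before MC objects such as
// MCSubtargetInfo are constructed. Sketch only.
const llvm::Target *lookupAndFixTriple(const std::string &ArchName,
                                       std::string &TripleName) {
  llvm::Triple TheTriple(llvm::Triple::normalize(TripleName));
  std::string Error;
  const llvm::Target *TheTarget =
      llvm::TargetRegistry::lookupTarget(ArchName, TheTriple, Error);
  if (!TheTarget)
    return nullptr;
  TripleName = TheTriple.getTriple(); // keep the triple updated by the lookup
  return TheTarget;
}
```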

Reviewed By: andreadb, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D100992
2021-04-22 14:27:09 -04:00
Vitaly Buka
639348c2c2 Revert "[sanitizer] Use COMPILER_RT_EMULATOR with gtests"
Missed review comments.

This reverts commit e25082961cb5aaafc817cb55593cf0ea8d3c4c22.
2021-04-22 11:15:55 -07:00
Philip Reames
555456c598 [SCEV] Compute ranges for lshr recurrences
Straightforward extension to the recently added infrastructure which was pioneered with shl.

Differential Revision: https://reviews.llvm.org/D99687
2021-04-22 11:06:31 -07:00
Philip Reames
8361e53fbe Revert "[instcombine] Exploit UB implied by nofree attributes"
This change effectively reverts 86664638, but since there have been some changes on top and I wanted to leave the tests in, it's not a mechanical revert.

Why revert this now?  Two main reasons:
1) There are continuing discussions around what the semantics of nofree are.  I am getting increasingly uncomfortable with the seeming possibility that we might redefine nofree in a way incompatible with these changes.
2) There was a reported miscompile triggered by this change (https://github.com/emscripten-core/emscripten/issues/9443).  At first, I was making good progress on tracking down the issues exposed and those issues appeared to be unrelated latent bugs.  Now that we've found at least one bug in the original change, and the investigation has stalled, I'm no longer comfortable leaving this in tree.  In retrospect, I probably should have reverted this earlier and investigated the issues once the triggering change was out of tree.
2021-04-22 10:53:17 -07:00
Craig Topper
ae70485f74 [RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi.
These instructions don't really exist, but we have ways we can
emulate them.

.vv will swap operands and use vmsle(u).vv. .vi will adjust the
immediate and use vmsgt(u).vi when possible. For .vx we need to
use some of the multiple instruction sequences from the V extension
spec.

For unmasked vmsge(u).vx we use:
  vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd

For cases where mask and maskedoff are the same value, we have
vmsge{u}.vx v0, va, x, v0.t, which is the vd==v0 case that
requires a temporary, so we use:
  vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt

For other masked cases we use this sequence:
  vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
We trust that register allocation will prevent vd in vmslt{u}.vx
from being v0 since v0 is still needed by the vmxor.

Differential Revision: https://reviews.llvm.org/D100925
2021-04-22 10:44:38 -07:00
Craig Topper
295197497d [RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics.
Refactor to use new multiclass instead of individual patterns.

We already supported this due to SEW=64 on RV32, but we didn't have
test cases for all the types we supported.

Part of D100925
2021-04-22 10:44:38 -07:00
Craig Topper
4b0b60fb85 [RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics.
We don't have instructions for these, but can swap the operands
to use vmfle/vmflt. This makes the IR interface more consistent and
simplifies the frontend implementation.

Part of D100925
2021-04-22 10:44:38 -07:00
Vitaly Buka
cd55b6be92 [sanitizer] Use COMPILER_RT_EMULATOR with gtests
Differential Revision: https://reviews.llvm.org/D100998
2021-04-22 10:33:50 -07:00
Fangrui Song
dcdc354ce1 Temporarily revert the code part of D100981 "Delete le32/le64 targets"
This partially reverts commit 77ac823fd285973cfb3517932c09d82e6a32f46d.

Halide uses le32/le64 (https://github.com/halide/Halide/pull/5934).
Temporarily brings back the code part to give them some time for migration.
2021-04-22 10:18:44 -07:00
Craig Topper
ba2e1abe2f [RISCV] Turn splat shuffles of vector loads into strided load with stride of x0.
Implementations are allowed to optimize an x0 stride to perform
fewer memory accesses. This is the case in SiFive cores.

No idea if this is the case in other implementations. We might
need a tuning flag for this.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D100815
2021-04-22 10:02:57 -07:00
Craig Topper
c85259e1c3 [RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32.
Rather than splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.

This is equivalent to how we move i64 to f64 on RV32.

I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.
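
Conceptually (plain little-endian C++, not the SelectionDAG code), the idea is to build the 64-bit element through memory instead of shifting and or-ing in registers:

```
#include <cstdint>
#include <cstring>

// Combine two 32-bit GPR values into the 64-bit element that the x0-strided
// vector load would then splat. Little-endian layout assumed.
uint64_t combineHalvesViaStack(uint32_t Lo, uint32_t Hi) {
  uint32_t Slot[2] = {Lo, Hi}; // the stack temporary
  uint64_t Wide;
  std::memcpy(&Wide, Slot, sizeof(Wide));
  return Wide;
}
```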

Differential Revision: https://reviews.llvm.org/D101002
2021-04-22 09:50:07 -07:00
Krzysztof Parzyszek
e644e52d90 [Hexagon] Add HVX intrinsics for conditional vector loads/stores
Intrinsics for the following instructions are added. The intrinsic
name is "int_hexagon_<inst>[_128B]", e.g.
  int_hexagon_V6_vL32b_pred_ai        for 64-byte version
  int_hexagon_V6_vL32b_pred_ai_128B   for 128-byte version

V6_vL32b_pred_ai        if (Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_pred_pi        if (Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_pred_ppu       if (Pv4) Vd32 = vmem(Rx32++Mu2)
V6_vL32b_npred_ai       if (!Pv4) Vd32 = vmem(Rt32+#s4)
V6_vL32b_npred_pi       if (!Pv4) Vd32 = vmem(Rx32++#s3)
V6_vL32b_npred_ppu      if (!Pv4) Vd32 = vmem(Rx32++Mu2)

V6_vL32b_nt_pred_ai     if (Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_pred_pi     if (Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_pred_ppu    if (Pv4) Vd32 = vmem(Rx32++Mu2):nt
V6_vL32b_nt_npred_ai    if (!Pv4) Vd32 = vmem(Rt32+#s4):nt
V6_vL32b_nt_npred_pi    if (!Pv4) Vd32 = vmem(Rx32++#s3):nt
V6_vL32b_nt_npred_ppu   if (!Pv4) Vd32 = vmem(Rx32++Mu2):nt

V6_vS32b_pred_ai        if (Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_pred_pi        if (Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_pred_ppu       if (Pv4) vmem(Rx32++Mu2) = Vs32
V6_vS32b_npred_ai       if (!Pv4) vmem(Rt32+#s4) = Vs32
V6_vS32b_npred_pi       if (!Pv4) vmem(Rx32++#s3) = Vs32
V6_vS32b_npred_ppu      if (!Pv4) vmem(Rx32++Mu2) = Vs32

V6_vS32Ub_pred_ai       if (Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_pred_pi       if (Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_pred_ppu      if (Pv4) vmemu(Rx32++Mu2) = Vs32
V6_vS32Ub_npred_ai      if (!Pv4) vmemu(Rt32+#s4) = Vs32
V6_vS32Ub_npred_pi      if (!Pv4) vmemu(Rx32++#s3) = Vs32
V6_vS32Ub_npred_ppu     if (!Pv4) vmemu(Rx32++Mu2) = Vs32

V6_vS32b_nt_pred_ai     if (Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_pred_pi     if (Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_pred_ppu    if (Pv4) vmem(Rx32++Mu2):nt = Vs32
V6_vS32b_nt_npred_ai    if (!Pv4) vmem(Rt32+#s4):nt = Vs32
V6_vS32b_nt_npred_pi    if (!Pv4) vmem(Rx32++#s3):nt = Vs32
V6_vS32b_nt_npred_ppu   if (!Pv4) vmem(Rx32++Mu2):nt = Vs32
2021-04-22 11:49:29 -05:00
Raphael Isemann
03a8dac306 Fix memory leak in MicrosoftDemangleNodes's Node::toString
The buffer we turn into a std::string here is malloc'd and should be
free'd before we return from this function.

Follow up to LLDB leak fixes such as D100806.
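
The general shape of such a fix (a sketch under assumed names, not the exact diff) is to copy the malloc'd buffer into the returned std::string and free the buffer before returning:

```
#include <cstdlib>
#include <string>

// Take ownership of a malloc'd C string: copy it into a std::string and
// free the original buffer so it does not leak. Illustrative helper only.
std::string takeMallocedString(char *Buf) {
  std::string Result(Buf ? Buf : "");
  std::free(Buf);
  return Result;
}
```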

Reviewed By: mstorsjo, rupprecht, MaskRay

Differential Revision: https://reviews.llvm.org/D100843
2021-04-22 18:44:30 +02:00
Jianzhou Zhao
94cf740f57 [dfsan] Track origin at loads
The first version of origin tracking tracks only memory stores. Although
this is sufficient for understanding correct flows, it is hard to figure
out where an undefined value is read from. To find where undefined values
are read, we still have to do a reverse binary search from the last store
in the chain with printing and logging at possible code paths. This is
quite inefficient.

Tracking memory load instructions can help this case. The main issues of
tracking loads are performance and code size overheads.

With tracking only stores, the code size overhead is 38%,
memory overhead is 1x, and cpu overhead is 3x. In practice #load is much
larger than #store, so both code size and cpu overhead increase. The
first blocker is code size overhead: the link fails if we inline tracking
of loads. The workaround is using external function calls to propagate
metadata. This is also the workaround ASan uses. The cpu overhead
is ~10x. This is a trade-off between debuggability and performance,
and it will be used only when debugging cases where tracking only stores
is not enough.

Reviewed By: gbalats

Differential Revision: https://reviews.llvm.org/D100967
2021-04-22 16:25:24 +00:00
Irina Dobrescu
53c3a79b92 [flang][openmp] Add General Semantic Checks for Allocate Directive
This patch adds semantic checks for the General Restrictions of the
Allocate Directive.

Since the requires directive is not yet implemented in Flang, the
restriction:
```
allocate directives that appear in a target region must
specify an allocator clause unless a requires directive with the
dynamic_allocators clause is present in the same compilation unit
```
will need to be updated at a later time.

A different patch will be made with the Fortran specific restrictions of
this directive.

I have used the code from https://reviews.llvm.org/D89395 for the
CheckObjectListStructure function.

Co-authored-by: Isaac Perry <isaac.perry@arm.com>

Reviewed By: clementval, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D91159
2021-04-22 16:15:06 +00:00
Sanjay Patel
a4fce4e845 [x86] remove stale comment from test file; NFC 2021-04-22 12:11:47 -04:00
Alexey Bataev
dab3d7322e [SLP]Skip undefs trying to find perfect/shuffled tree entries matching.
We can skip the check for undefs when trying to find perfect/shuffled tree
entries matching; they can be ignored completely, improving the final
cost/vectorization results.

Differential Revision: https://reviews.llvm.org/D101061
2021-04-22 08:59:07 -07:00
Hongtao Yu
713ed720d1 [llvm-profgen] A couple tweaks to the testing harness.
1. Remove unnecessary filtering code.
2. Add llvm-profgen to tool substitutions.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D101006
2021-04-22 08:57:14 -07:00
Coplin, Jared
5e52d75c0d [Hexagon] Unmasked and masked load pair to same base -> one load and selects 2021-04-22 10:15:46 -05:00
Joe Ellis
543196c49f [AArch64] Block tryCombineToBSL combines for vectors wider than NEON
There are no patterns for the AArch64ISD::BSP ISD node for anything
other than NEON vectors at the moment. As a result, if we hit these
combines for vectors wider than a NEON vector (such as what we might get
with fixed length SVE) we will fail to lower.

This patch simply prevents us from attempting the combines if the input
vector type is too wide.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D100961
2021-04-22 15:09:13 +00:00
Joe Ellis
0427e8801a [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

    void foo(int *restrict data1, int *restrict data2)
    {
      int counter = 1024;
      while (counter--)
        if (data1[counter] > data2[counter])
          data1[counter] = data2[counter];
    }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
Alexey Bataev
be1b72b8b6 [SLP]Replace more TTI with TTIRef, NFC.
To pacify MSVC buildbots.
2021-04-22 07:53:20 -07:00
Alexey Bataev
b09b5f35d0 [SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC
buildbots, NFC.
2021-04-22 07:49:48 -07:00
Alexey Bataev
a899f9f408 [SLP]Improve cost model for the vectorized extractelements.
1. No need to call `areAllUsersVectorized` as later the cost is
   calculated only if the instruction has one use and gets vectorized.
2. Need to calculate the cost of the dead extractelement more precisely,
   taking the vector type of the vector operand, not the resulting
   vector type.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D99980
2021-04-22 07:40:17 -07:00
Dávid Bolvanský
78fb52ea7c [LoopIdiom] Added testcase for double memset (fixed in LLVM 12); NFC 2021-04-22 16:39:25 +02:00