1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

9811 Commits

Author SHA1 Message Date
Peter Collingbourne
3f38e31261 ARM: Use BKPT instead of TRAP to implement llvm.debugtrap.
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.

Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.

Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.

Differential Revision: https://reviews.llvm.org/D53614

llvm-svn: 345171
2018-10-24 18:10:38 +00:00
Saleem Abdulrasool
eca209183e ARM: handle checking aliases with out-of-bounds GEPs
A global alias may use indices which are not considered in bounds.  In
such a case, accessing the base object will fail as it only peers
through inbounds accesses.  This pattern is used by the swift compiler
to create references to preceeding members in the type metadata.  This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format.  Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.

llvm-svn: 345107
2018-10-24 00:00:52 +00:00
Simon Pilgrim
df74e03322 [TTI] Add generic cost handling of SK_Reverse shuffles
These can be treated as a general permute.

This required a fix for missing reverse patterns on ARM

llvm-svn: 345015
2018-10-23 09:42:10 +00:00
Eli Friedman
ec411cfda2 Revert r344693 ("[ARM] bottom-top mul support in ARMParallelDSP")
Still causing failures on the polly-aosp buildbot; I'll follow up
with a reduced testcase.

llvm-svn: 344752
2018-10-18 19:34:30 +00:00
Sam Parker
967e905c0d [ARM] bottom-top mul support in ARMParallelDSP
Previously reverted in rL343082.

Original commit message:

On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 344693
2018-10-17 13:02:48 +00:00
Sjoerd Meijer
3e73839b39 [ARM] Do not fuse VADD and VMUL, continued (2/2)
This is patch 2/2, following up on D53314, and is the functional change
to prevent fusing mul + add sequences into VFMAs.

Differential revision: https://reviews.llvm.org/D53315

llvm-svn: 344683
2018-10-17 10:05:44 +00:00
Sjoerd Meijer
e06ebcf09e [ARM] Follow up of rL344671, attempt to pacify a buildbot
It was rightfully complaining about an unpretty logical expression.

llvm-svn: 344677
2018-10-17 07:51:24 +00:00
Sjoerd Meijer
dbb2ea77e4 [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33.  This is a serie of 2
patches, that is trying to achieve the same for VFMA.  The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:

 --------------------------------------------------------
 | Opt   |  < rL342874   |  >= rL342874   |             |
 |------------------------------------------------------|
 |-O3    |     vmla      |      vmul      |     vmul    |
 |       |               |      vadd      |     vadd    |
 |------------------------------------------------------|
 |-Ofast |     vfma      |      vfma      |     vmul    |
 |       |               |                |     vadd    |
 |------------------------------------------------------|
 |-Oz    |     vmla      |      vmla      |     vmla    |
 --------------------------------------------------------

This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2.  This also fixes a typo in the regression test added in rL342874.

Differential revision: https://reviews.llvm.org/D53314

llvm-svn: 344671
2018-10-17 07:26:35 +00:00
Evandro Menezes
8c438fbd8d [NFC][ARM] Refactor macro fusion
Simplify code for wildcards.

llvm-svn: 344625
2018-10-16 17:19:51 +00:00
Simon Pilgrim
e44f188a68 [ARM][NEON] Improve vector popcnt lowering with PADDL (PR39281)
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.

This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......

Differential Revision: https://reviews.llvm.org/D53257

llvm-svn: 344512
2018-10-15 13:20:41 +00:00
Dorit Nuzman
a3df726c55 recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman
70052d5053 revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman
c4c9199631 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
George Burgess IV
37a10a0055 Replace most users of UnknownSize with LocationSize::unknown(); NFC
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).

This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.

llvm-svn: 344186
2018-10-10 21:28:44 +00:00
Peter Smith
8682e9a796 [ARM] Account for implicit IT when calculating inline asm size
When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.

Fixes pr31805

Differential Revision: https://reviews.llvm.org/D52834

llvm-svn: 343960
2018-10-08 09:38:28 +00:00
Matthias Braun
4acd274986 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.

Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343895
2018-10-05 22:00:13 +00:00
Jonas Paulsson
d362ca6156 [TargetRegisterInfo] Remove temporary hook enableMultipleCopyHints()
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.

NFC.

Review: Simon Pilgrim
https://reviews.llvm.org/D52316

llvm-svn: 343851
2018-10-05 14:23:11 +00:00
Matt Morehouse
de1f22a41d Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions"
This reverts r343520 due to breakage of HWASan tests on Android.

llvm-svn: 343616
2018-10-02 18:35:44 +00:00
Diogo N. Sampaio
8af1166e5f [ARM] Emmit data symbol for constant pool data
The ARM elf emitter would omit printing data
symbol when constant data. This patch
overrides the emitFill method as to enforce that
the symbol is correctly printed.

Differential revision: https://reviews.llvm.org/D52737

llvm-svn: 343594
2018-10-02 14:55:48 +00:00
Matthias Braun
9c73b0eb23 X86, AArch64, ARM: Do not attach debug location to spill/reload instructions
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).

Differential Revision: https://reviews.llvm.org/D52125

llvm-svn: 343520
2018-10-01 18:56:39 +00:00
Eli Friedman
a0f892568b [ARM] Fix correctness checks in promoteToConstantPool.
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.

This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51472

llvm-svn: 343361
2018-09-28 20:27:31 +00:00
Eli Friedman
da1384c2fe [ARM] Use preferred alignment for constants in promoteToConstantPool.
This mostly affects IR generated by non-clang frontends because clang
generally sets the alignment of globals explicitly.

Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .

(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)

Differential Revision: https://reviews.llvm.org/D51469

llvm-svn: 343359
2018-09-28 20:21:51 +00:00
David Spickett
bfb82efb28 [ARM] Allow execute only code on Cortex-m23
The NoMovt feature prevents the use of MOVW/MOVT
instructions on Cortex-M23 for performance reasons.
These instructions are required for execute only code
so NoMovt should be disabled when that option is enabled.

Differential Revision: https://reviews.llvm.org/D52551

llvm-svn: 343302
2018-09-28 08:55:19 +00:00
Oliver Stannard
62a89909f4 [ARM][v8.5A] Add speculation barriers SSBB and PSSBB
This adds two new barrier instructions which can be used to restrict
speculative execution of load instructions.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52484

llvm-svn: 343300
2018-09-28 08:27:56 +00:00
Oliver Stannard
9fd24e88bc [ARM][v8.5A] Add speculation barrier to ARM & Thumb instruction sets
This is a new barrier which limits speculative execution of the
instructions following it.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52477

llvm-svn: 343213
2018-09-27 13:41:14 +00:00
Fangrui Song
c2791239be llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras is available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

llvm-svn: 343163
2018-09-27 02:13:45 +00:00
Oliver Stannard
347bc189fb [ARM/AArch64][v8.5A] Add Armv8.5-A target
This patch allows targeting Armv8.5-A, adding the architecture to
tablegen and setting the options to be identical to Armv8.4-A for the
time being. Subsequent patches will add support for the different
features included in the Armv8.5-A Reference Manual.

Patch by Pablo Barrio!

Differential revision: https://reviews.llvm.org/D52470

llvm-svn: 343102
2018-09-26 12:48:21 +00:00
Sam Parker
7284df400a [ARM] Fix for PR39060
When calculating whether a value can safely overflow for use by an
icmp, we weren't checking that the value couldn't wrap around. To do
this we need the icmp to be using a constant, as well as the incoming
add or sub.

bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060

Differential Revision: https://reviews.llvm.org/D52463

llvm-svn: 343092
2018-09-26 10:56:00 +00:00
Hans Wennborg
a5c38063f6 Revert r342870 "[ARM] bottom-top mul support ARMParallelDSP"
This broke Chromium's Android build (https://crbug.com/889390) and the
polly-aosp buildbot
(http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).

> Originally committed in rL342210 but was reverted in rL342260 because
> it was causing issues in vectorized code, because I had forgotten to
> ensure that we're operating on scalar values.
>
> Original commit message:
>
> On failing to find sequences that can be converted into dual macs,
> try to find sequential 16-bit loads that are used by muls which we
> can then use smultb, smulbt, smultt with a wide load.
>
> Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 343082
2018-09-26 08:41:50 +00:00
Nirav Dave
36a936ebf7 [ARM] Share predecessor bookkeeping in CombineBaseUpdate. NFCI.
llvm-svn: 342987
2018-09-25 15:30:47 +00:00
Evandro Menezes
a69b45a477 [ARM] Adjust the cost model for Exynos
Tune `MaxInterleaveFactor` and `LdStMultipleTiming`and remove
`PartialUpdateClearance` for the Exynos processors.

llvm-svn: 342900
2018-09-24 16:35:14 +00:00
Evandro Menezes
933d3c8ea9 [ARM] Adjust the feature set for Exynos
Enable crypto and literals fusion for the Exynos processors.

llvm-svn: 342899
2018-09-24 16:35:09 +00:00
Zhaoshi Zheng
14ebc5a569 [Thumb1] Any imm8 should have cost of 1
A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or
[0, 255] in unsigned i8 type on Thumb1.

Differential Revision: https://reviews.llvm.org/D52257

llvm-svn: 342898
2018-09-24 16:15:23 +00:00
Luke Cheeseman
645b604217 [Arm][AsmParser] Restrict register list size for VSTM/VLDM
- The assembler accepts VSTM/VLDM with register lists (specifically double registers lists) with more than 16 registers specified
- The Arm architecture reference manual says this instruction must not contain more than 16 registers when the registers are doubleword registers
- This addresses one of the concerns in https://bugs.llvm.org/show_bug.cgi?id=38389

Differential Revision: https://reviews.llvm.org/D52082

llvm-svn: 342891
2018-09-24 15:13:48 +00:00
Sjoerd Meijer
d5015b6840 [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33
A sequence of VMUL and VADD instructions always give the same or better
performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33.
Executing the VMUL and VADD back-to-back requires the same cycles, but
having separate instructions allows scheduling to avoid the hazard between
these 2 instructions.

Differential Revision: https://reviews.llvm.org/D52289

llvm-svn: 342874
2018-09-24 12:02:50 +00:00
Hans Wennborg
058e12737a Revert r341932 "[ARM] Enable ARMCodeGenPrepare by default"
This caused miscompilation of WebRTC for Android: PR39060.

> We've had the pass enabled downstream for a couple of weeks and it
> seems to be okay, so enable it by default.
>
> Differential Revision: https://reviews.llvm.org/D51920

llvm-svn: 342873
2018-09-24 11:40:07 +00:00
Luke Cheeseman
77de0a1dfd [ARM][ARMLoadStoreOptimizer]
- The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers
- This is an UNPREDICTABLE instruction and shouldn't be done
- It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch
- This fixes https://bugs.llvm.org/show_bug.cgi?id=38389

Differential Revision: https://reviews.llvm.org/D52085

llvm-svn: 342872
2018-09-24 10:42:22 +00:00
Sam Parker
b4d8a969b6 [ARM] bottom-top mul support ARMParallelDSP
Originally committed in rL342210 but was reverted in rL342260 because
it was causing issues in vectorized code, because I had forgotten to
ensure that we're operating on scalar values.

Original commit message:

On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 342870
2018-09-24 09:34:06 +00:00
Maya Madhavan
ac015cd2c1 Fix for bug 34002 - label generated before it block is finalized. Differential Revision: https://reviews.llvm.org/D52258
llvm-svn: 342615
2018-09-20 05:11:42 +00:00
Evandro Menezes
cdfed5405b [ARM] Adjust the feature set for Exynos
Fine tune the cost model for all Exynos processors.

llvm-svn: 342585
2018-09-19 19:51:29 +00:00
Evandro Menezes
0849eadfca [ARM] Refactor Exynos feature set (NFC)
Since all Exynos processors share the same feature set, fold them in the
implied fatures list for the subtarget.

llvm-svn: 342583
2018-09-19 19:43:23 +00:00
Alex Bradbury
b88141f5a6 [AtomicExpandPass]: Add a hook for custom cmpxchg expansion in IR
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have 
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they 
will see no functional change. Previously this hook returned bool, but it now 
returns AtomicExpansionKind.

This hook allows targets to select how a given cmpxchg is to be expanded. 
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.

See my associated RFC for more info on the motivation for this change 
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.

Differential Revision: https://reviews.llvm.org/D48130

llvm-svn: 342550
2018-09-19 14:51:42 +00:00
Oliver Stannard
959a756a99 [ARM] Fix unwind information for floating point registers
Fixes the unwind information generated for floating-point registers.
Previously, all padding registers were assumed to be four bytes wide. Now, the
width of the register is used to specify the amount of padding.

Patch by Jackson Woodruff!

Differential revision: https://reviews.llvm.org/D51494

llvm-svn: 342545
2018-09-19 13:25:31 +00:00
Volodymyr Sapsai
d2bca167a0 Revert "[ARM] Cleanup ARM CGP isSupportedValue"
This reverts r342395 as it caused error

> Argument value type does not match pointer operand type!
>   %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
>  i8in function atomic_flag_test_and_set
> fatal error: error in backend: Broken function found, compilation aborted!

on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/

More details are available at https://reviews.llvm.org/D52080

llvm-svn: 342431
2018-09-18 00:11:55 +00:00
Sam Parker
dfcd3f1ff0 [ARM] Cleanup ARM CGP isSupportedValue
isSupportedValue explicitly checked and accepted many types of value,
primarily for debugging reasons. Remove most of these checks and do a
bit of refactoring now that the pass is more stable. This also enables
ZExts to be sources, but this has very little practical benefit at the
moment extend instructions will still be introduced.

Differential Revision: https://reviews.llvm.org/D52080

llvm-svn: 342395
2018-09-17 13:57:39 +00:00
Sam Parker
77177923c8 [ARM] Disallow icmp with negative imm and overflow
We allow overflowing instructions if they're decreasing and only used
by an unsigned compare. Add the extra condition that the icmp cannot
be using a negative immediate.

Differential Revision: https://reviews.llvm.org/D52102

llvm-svn: 342392
2018-09-17 13:48:25 +00:00
Reid Kleckner
1cb33bd6c5 Revert r342210 "[ARM] bottom-top mul support in ARMParallelDSP"
It causes assertion failures while building Skia for Android in
Chromium:
https://ci.chromium.org/buildbot/chromium.clang/ToTAndroid/4550

Reduction forthcoming.

llvm-svn: 342260
2018-09-14 18:44:37 +00:00
Sam Parker
3764cb8a6f [ARM] bottom-top mul support in ARMParallelDSP
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.

Differential Revision: https://reviews.llvm.org/D51983

llvm-svn: 342210
2018-09-14 08:09:09 +00:00
Sam Parker
45c545ce0a [ARM] Allow truncs as sources in ARM CGP
We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.

Differential Revision: https://reviews.llvm.org/D51978

llvm-svn: 342141
2018-09-13 15:14:12 +00:00
Sam Parker
89b3b396fd [ARM] Fix FixConst for ARMCodeGenPrepare
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.

Differential Revision: https://reviews.llvm.org/D52032

llvm-svn: 342140
2018-09-13 14:48:10 +00:00