1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 12:12:47 +01:00
Commit Graph

208274 Commits

Author SHA1 Message Date
Nico Weber
93ca28c075 [gn build] (semi-manually) port 7ad49aec125 2020-12-14 18:22:54 -05:00
Gulfem Savrun Yeniceri
196c46aa87 [clang][IR] Add support for leaf attribute
This patch adds support for leaf attribute as an optimization hint
in Clang/LLVM.

Differential Revision: https://reviews.llvm.org/D90275
2020-12-14 14:48:17 -08:00
Sanjay Patel
56023bebcf [VectorCombine] make load transform poison-safe
As noted in D93229, the transform from scalar load to vector load
potentially leaks poison from the extra vector elements that are
being loaded.

We could use freeze here (and x86 codegen at least appears to be
the same either way), but we already have a shuffle in this logic
to optionally change the vector size, so let's allow that
instruction to serve both purposes.

Differential Revision: https://reviews.llvm.org/D93238
2020-12-14 17:42:01 -05:00
Craig Topper
0f8c2c0fdc [LoopIdiomRecognize] Teach detectShiftUntilZeroIdiom to recognize loops where the counter is decrementing.
This adds support for loops like

unsigned clz(unsigned x) {
    unsigned w = sizeof (x) * CHAR_BIT;
    while (x) {
        w--;
        x >>= 1;
    }

    return w;
}

and

unsigned clz(unsigned x) {
    unsigned w = sizeof (x) * CHAR_BIT - 1;
    while (x >>= 1) {
        w--;
    }

    return w;
}

To support these we look for add x, -1 as well as add x, 1 that
we already matched. If the value was -1 we need to subtract from
the initial counter value instead of adding to it.

Fixes PR48404.

Differential Revision: https://reviews.llvm.org/D92745
2020-12-14 14:25:05 -08:00
Stanislav Mekhanoshin
23c9769258 [AMDGPU] Use multi-dword flat scratch for spilling
Differential Revision: https://reviews.llvm.org/D93067
2020-12-14 14:19:29 -08:00
Bardia Mahjour
8db5fd0457 Revert "[DDG] Data Dependence Graph - DOT printer"
This reverts commit fd4a10732c8bd646ccc621c0a9af512be252f33a, to
investigate the failure on windows: http://lab.llvm.org:8011/#/builders/127/builds/3274
2020-12-14 16:54:20 -05:00
Bardia Mahjour
6d099be63b [DDG] Data Dependence Graph - DOT printer
This patch implements a DDG printer pass that generates a graph in
the DOT description language, providing a more visually appealing
representation of the DDG. Similar to the CFG DOT printer, this
functionality is provided under an option called -dot-ddg and can
be generated in a less verbose mode under -dot-ddg-only option.

Differential Revision: https://reviews.llvm.org/D90159
2020-12-14 16:41:14 -05:00
Matt Arsenault
4c16866a59 OpaquePtr: Require byval on x86_intrcc parameter 0
Currently the backend special cases x86_intrcc and treats the first
parameter as byval. Make the IR require byval for this parameter to
remove this special case, and avoid the dependence on the pointee
element type.

Fixes bug 46672.

I'm not sure the IR is enforcing all the calling convention
constraints. clang seems to ignore the attribute for empty parameter
lists, but the IR tolerates it.
2020-12-14 16:34:37 -05:00
Zequan Wu
2bd2c2e5a1 [NFC] cleanup cg-profile emission on TargetLowerinng
Differential Revision: https://reviews.llvm.org/D93150
2020-12-14 13:07:44 -08:00
Guozhi Wei
43d0dff6c4 [MBP] Prevent rotating a chain contains entry block
The entry block should always be the first BB in a function.
So we should not rotate a chain contains the entry block.

Differential Revision: https://reviews.llvm.org/D92882
2020-12-14 12:48:55 -08:00
Philip Reames
569f60bb57 [LAA] Relax restrictions on early exits in loop structure
his is a preparation patch for supporting multiple exits in the loop vectorizer, by itself it should be mostly NFC. This patch moves the loop structure checks from LAA to their respective consumers (where duplicates don't already exist).  Moving the checks does end up changing some of the optimization warnings and debug output slightly, but nothing that appears to be a regression.

Why do this? Well, after auditing the code, I can't actually find anything in LAA itself which relies on having all instructions within a loop execute an equal number of times. This patch simply makes this explicit so that if one consumer - say LV in the near future (hopefully) - wants to handle a broader class of loops, it can do so.

Differential Revision: https://reviews.llvm.org/D92066
2020-12-14 12:44:01 -08:00
Sanjay Patel
9e252bc0c0 [VectorCombine] add test for load with offset; NFC 2020-12-14 14:40:06 -05:00
Reid Kleckner
0ab72ed3c5 [Hexagon] Tweak _MSC_VER workaround version
My bot runs VS 2019, but it could not compile this code.

Message:
[55/2465] Building CXX object lib\Target\Hexagon\CMakeFiles\LLVMHexagonCodeGen.dir\HexagonVectorCombine.cpp.obj
FAILED: lib/Target/Hexagon/CMakeFiles/LLVMHexagonCodeGen.dir/HexagonVectorCombine.cpp.obj
...
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.23.28105\include\map(71): error C2976: 'std::map': too few template arguments
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.23.28105\include\map(71): note: see declaration of 'std::map'

The version in the path, 14.23, corresponds to _MSC_VER 1923, so raise
the version floor to 1924.

I have not tested with versions between 1924 and 1928 (latest), but the
latest works with the variadic version.
2020-12-14 11:26:36 -08:00
Alina Sbirlea
5d7e73c1ea [NFC] Remove stray comment. 2020-12-14 11:19:17 -08:00
Craig Topper
40953f64d3 [RISCV] Move vtype decoding and printing from RISCVInstPrinter to RISCVBaseInfo. Share with the assembly parser's debug output
This moves the vtype decoding and printing to RISCVBaseInfo. This keeps all of
the decoding code in the same area as the encoding code. This will make it
easier to change the decoding for the 1.0 spec in the future.

We're now sharing the printing with the debug output for operands in the
assembler. This also fixes that debug output to include the tail and mask
agnostic bits. Since the printing code works on the vtype immediate value, we
now encode the immediate during parsing and store just the immediate in the
operand.
2020-12-14 10:50:26 -08:00
Jonas Paulsson
39dd827634 [SystemZ] Improve handling of backchain offset.
- New function SDValue getBackchainAddress() used by
  lowerDYNAMIC_STACKALLOC() and lowerSTACKRESTORE() to properly handle the
  backchain offset also with packed-stack.

- Make a common function getBackchainOffset() for the computation of the
  backchain offset and use in some places (NFC).

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D93171
2020-12-14 12:39:38 -06:00
Michael Liao
d0d0262443 [amdgpu] Fix a crash case when V_CNDMASK could be simplified.
- Once an instruction is simplified, foldable candidates from it should
  be invalidated or skipped as the operand index is no longer valid.

Differential Revision: https://reviews.llvm.org/D93174
2020-12-14 13:08:13 -05:00
Roman Lebedev
c362d4d27f [NFCI][Thumb2] Regenerate MVE tests i missed in 59560e85897afc50090b6c3d920bacfd28b49d06 2020-12-14 21:01:00 +03:00
Tony
66c131f86a [NFC] Remove trailing whitespace in llvm/CMakeLists.txt
Differential Revision: https://reviews.llvm.org/D93234
2020-12-14 17:48:16 +00:00
Cameron Desrochers
783e812623 [TableGen] Fixed 64-bit filters being sliced to 32 bits in FixedLenDecoderEmitter
When using the FixedLenDecoderEmitter, llvm-tblgen emits tables with (OPC_ExtractField, OPC_ExtractFilterValue) opcode sequences to match the contiguous fixed bits of a given instruction's encoding. This encoding is represented in a 64-bit integer. However, the filter values were represented in a 32-bit integer. As such, instructions with fixed 64-bit encodings resulted in a table with an OPC_ExtractField for all 64 bits, followed by an OPC_ExtractFilterValue containing just the low 32 bits of their encoding, causing the filter never to match.

The exact point at which the slicing occurred was during the map insertion at line 630.

Differential Revision: https://reviews.llvm.org/D92423
2020-12-14 12:42:35 -05:00
Nemanja Ivanovic
6da8209a9a [PowerPC] Restore stack ptr from frame ptr with setjmp
If a function happens to:

- call setjmp
- do a 16-byte stack allocation
- call a function that sets up a stack frame and longjmp's back

The stack pointer that is restores by setjmp will no longer point to a valid
back chain. According to the ABI, stack accesses in such a function are to be
frame pointer based - so it is an error (quite obviously) to restore the stack
from the back chain.
We already restore the stack from the frame pointer when there are calls to
fast_cc functions. We just need to also do that when there are calls to setjmp.
This patch simply does that.

This was pointed out by the Julia team.

Differential revision: https://reviews.llvm.org/D92906
2020-12-14 11:34:16 -06:00
Roman Lebedev
2516110411 [SimplifyCFG] FoldBranchToCommonDest(): temporairly put back restrictions on liveout uses of bonus instructions (PR48450)
Even though d38205144febf4dc42c9270c6aa3d978f1ef65e1 was mostly a correct
fix for the external non-PHI users, it's not a *generally* correct fix,
because the 'placeholder' values in those trivial PHI's we create
shouldn't be *always* 'undef', but the PHI itself for the backedges,
else we end up with wrong value, as the `@pr48450_2` test shows.

But we can't just do that, because we can't check that the PHI
can be it's own incoming value when coming from certain predecessor,
because we don't have a dominator tree.

So until we can address this correctness problem properly,
ensure that we don't perform the transformation
if there are such problematic external uses.

Making dominator tree available there is going to be involved,
since `-simplifycfg` pass currently does not preserve/update domtree...
2020-12-14 20:14:31 +03:00
Roman Lebedev
4292a82bd2 [NFC][SimplifyCFG] FoldBranchToCommonDest(): pull out 'common successor' into a variable
Makes it easier to use it elsewhere
2020-12-14 20:14:31 +03:00
Roman Lebedev
aef5ea6d96 [NFC][SimplifyCFG] Add another miscompiled test for PR48450 2020-12-14 20:14:31 +03:00
Stanislav Mekhanoshin
4e48d88543 [SLP] Control maximum vectorization factor from TTI
D82227 has added a proper check to limit PHI vectorization to the
maximum vector register size. That unfortunately resulted in at
least a couple of regressions on SystemZ and x86.

This change reverts PHI handling from D82227 and replaces it with
a more general check in SLPVectorizerPass::tryToVectorizeList().
Moved to tryToVectorizeList() it allows to restart vectorization
if initial chunk fails.

However, this function is more general and handles not only PHI
but everything which SLP handles. If vectorization factor would
be limited to maximum vector register size it would limit much
more vectorization than before leading to further regressions.
Therefore a new TTI callback getMaximumVF() is added with the
default 0 to preserve current behavior and limit nothing. Then
targets can decide what is better for them.

The callback gets ElementSize just like a similar getMinimumVF()
function and the main opcode of the chain. The latter is to avoid
regressions at least on the AMDGPU. We can have loads and stores
up to 128 bit wide, and <2 x 16> bit vector math on some
subtargets, where the rest shall not be vectorized. I.e. we need
to differentiate based on the element size and operation itself.

Differential Revision: https://reviews.llvm.org/D92059
2020-12-14 08:49:40 -08:00
Jay Foad
3f42bac9ba [AMDGPU] Make use of HasSMemRealTime predicate. NFC.
We have this subtarget feature so it makes sense to use it here. This is
NFC because it's always defined by default on GFX8+.

Differential Revision: https://reviews.llvm.org/D93202
2020-12-14 16:34:57 +00:00
Kazushi (Jam) Marukawa
e50338f5d0 [VE] Add logical mask intrinsic instructions
Add andm, orm, xorm, eqvm, nndm, negm, pcvm, lzvm, and tovm intrinsic
instructions, a few pseudo instructions to expand logical intrinsic
using VM512, a mechnism to expand such pseudo instructions, and
regression tests.  Also, assign vector mask types and vector mask
register classes correctly.  This is required to use VM512 registers
as function arguments.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D93093
2020-12-15 01:34:31 +09:00
Simon Pilgrim
ee5f2e29f2 [X86] LowerBUILD_VECTOR - track zero/nonzero elements with APInt masks. NFCI.
Prep work for undef/zero 'upper elements' handling as proposed in D92645.
2020-12-14 16:28:45 +00:00
Kazushi (Jam) Marukawa
97b075a61c [VE] Correct addRegisterClass calls
Correct addRegisterClass calls for vector mask registers.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D93212
2020-12-15 01:16:56 +09:00
diggerlin
1cd7949f8b [AIX] Fixed "comparison of unsigned expression >= 0 is always true" gcc warnings.
Summary:

fixed a  Fixed "comparison of unsigned expression >= 0 is always true" gcc warnings.
http://lab.llvm.org:8011/#/builders/5/builds/2407/steps/2/logs/stdio

the error caused by patch https://reviews.llvm.org/D92398
2020-12-14 11:08:40 -05:00
Markus Lavin
8b493e1580 Reland [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value that have turned into
referencing undef after transformation (and traditional
salvageDebugInfo).  Before rewrite (but after introduction of new
induction variables) use SCEV to compute an equivalent set of values for
each @llvm.dbg.value in the loop body (among the loop header PHI-nodes).
After rewrite (and dead PHI elimination) update those @llvm.dbg.value
now referencing undef by picking a remaining value from its equivalence
set.  Allow match with offset by inserting compensation code in the
DIExpression.

Fixes : PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-12-14 16:15:18 +01:00
Florian Hahn
8b3b1f6f2e [VPlan] Make VPWidenMemoryInstructionRecipe a VPDef.
This patch updates VPWidenMemoryInstructionRecipe to use VPDef
to manage the value it produces instead of inheriting from VPValue.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D90563
2020-12-14 14:13:59 +00:00
David Spickett
18038eabcf [llvm-objdump] Use "--" for long options in --help text
Single dash for these options is not recognised.

Changes found by running this on the --help output
and the user guide:
grep -e ' -[a-zA-Z]\{2,\}'

The user guide was updated in https://reviews.llvm.org/D92305
so no change there.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D92310
2020-12-14 13:11:29 +00:00
Anton Afanasyev
255cd47a94 [SLP] Fix vector element size for the store chains
Vector element size could be different for different store chains.
This patch prevents wrong computation of maximum number of elements
for that case.

Differential Revision: https://reviews.llvm.org/D93192
2020-12-14 15:51:43 +03:00
Simon Pilgrim
f264008f89 [TableGen] Don't dereference from dyn_cast<> - use cast<> instead. NFCI.
dyn_cast<> can return null if the cast fails, resulting in null dereferences and static analyzer warnings. We should use cast<> instead.
2020-12-14 12:12:08 +00:00
Simon Pilgrim
371053ed23 [IRCE] Add test case for PR48051 2020-12-14 12:01:19 +00:00
Kerry McLaughlin
8503126c6a [SVE][CodeGen] Lower scalable floating-point vector reductions
Changes in this patch:
-  Minor changes to the LowerVECREDUCE_SEQ_FADD function added by @cameron.mcinally
   to also work for scalable types
- Added TableGen patterns for FP reductions with unpacked types (nxv2f16, nxv4f16 & nxv2f32)
- Asserts added to expandFMINNUM_FMAXNUM & expandVecReduceSeq for scalable types

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D93050
2020-12-14 11:45:42 +00:00
David Green
169d672b6b [ARM] Improve handling of empty VPT blocks in tail predicated loops
A vpt block that just contains either VPST;VCTP or VPT;VCTP, once the
VCTP is removed will become invalid. This fixed the first by removing
the now empty block and bails out for the second, as we have no simple
way of converting a VPT to a VCMP.

Differential Revision: https://reviews.llvm.org/D92369
2020-12-14 11:17:01 +00:00
Carl Ritson
ce9c6c06b9 [AMDGPU][NFC] Rename opsel/opsel_hi/neg_lo/neg_hi with suffix 0
These parameters set a default value of 0, so I believe they
should include a 0 suffix. This allows for versions which do not
set a default value in future.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93187
2020-12-14 20:01:56 +09:00
Carl Ritson
5ca103d09e [AMDGPU][NFC] Remove unused VOP3Mods0Clamp
This is unused and the selection function does not exist.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D93188
2020-12-14 20:00:58 +09:00
Sebastian Neubauer
de78d986b0 [AMDGPU] Mark amdgpu_gfx functions as module entry function
- Allows lds allocations
- Writes resource usage into COMPUTE_PGM_RSRC1 registers in PAL metadata

Differential Revision: https://reviews.llvm.org/D92946
2020-12-14 10:43:39 +01:00
Georgii Rymar
c7112b7126 [llvm-readobj] - For SHT_REL relocations, don't display an addend.
This is https://bugs.llvm.org/show_bug.cgi?id=44257.

In LLVM style we always print `0` as addend when dumping
SHT_REL relocations. It is confusing, this patch stops
printing it as the first comment on the bug page suggests.

Differential revision: https://reviews.llvm.org/D93033
2020-12-14 12:03:00 +03:00
Jan Svoboda
ff6316c907 [clang][cli] Better defaults for MarshallingInfoString
Depends on D84018

Reviewed By: Bigcheese

Original patch by Daniel Grumberg.

Differential Revision: https://reviews.llvm.org/D84185
2020-12-14 09:59:56 +01:00
Georgii Rymar
dff27ca0da [llvm-readelf] - Improve ELF type field dumping.
This is related to https://bugs.llvm.org/show_bug.cgi?id=40868.

Currently we don't print `OS Specific`/``Processor Specific`/`<unknown>`
prefixes when dumping the ELF file type. This is not consistent
with GNU readelf. The patch fixes it.

Also, this patch removes the `types.test`, because we already have
`file-types.test`, which tests more cases and this patch revealed that
we have such a duplicate.

Differential revision: https://reviews.llvm.org/D93096
2020-12-14 11:24:08 +03:00
sameeran joshi
53796ac006 [Flang][OpenMP-5.0] Semantic checks for flush construct.
From OMP 5.0 [2.17.8]
Restriction:
If memory-order-clause is release,acquire, or acq_rel, list items must not be specified on the flush directive.

Reviewed By: kiranchandramohan, clementval

Differential Revision: https://reviews.llvm.org/D89879
2020-12-14 13:30:48 +05:30
QingShan Zhang
054f5f2547 [PowerPC][FP128] Fix the incorrect signature for math library call
The runtime library has two family library implementation for ppc_fp128 and fp128.
For IBM Long double(ppc_fp128), it is suffixed with 'l', i.e(sqrtl). For
IEEE Long double(fp128), it is suffixed with "ieee128" or "f128".
We miss to map several libcall for IEEE Long double.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D91675
2020-12-14 07:52:56 +00:00
sameeran joshi
ab2dc4d459 [Flang][OpenMP] Semantic checks for Atomic construct.
Patch implements restrictions from 2.17.7  of OpenMP 5.0 standard for atomic Construct. Tests for the same are added.

One of the restriction
`OpenMP constructs may not be encountered during execution of an atomic region.`
Is mentioned in 5.0 standard to be a semantic restriction, but given the stricter nature of parser in F18 it's caught at parsing itself.

This patch is a next patch in series from D88965.

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D89583
2020-12-14 13:03:57 +05:30
Craig Topper
7b691db656 [LoopIdiom] Pre-commit tests for D92745. NFC 2020-12-13 23:25:00 -08:00
Anton Afanasyev
1f4e3377f8 [SLP][Test] Precommit test for D93192
This test shows failure of combined stores chains vectorization
2020-12-14 09:23:47 +03:00
Chen Zheng
28d16f3586 [MachineCombiner][NFC] Add MustReduceRegisterPressure goal
add a new goal MustReduceRegisterPressure for machine combiner pass.

PowerPC will use this new goal to do some register pressure related optimization.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D92068
2020-12-14 00:02:42 -05:00