mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 11:42:57 +01:00
Commit Graph

188956 Commits

Author SHA1 Message Date
Fangrui Song
510814e7a8 [ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71062 2019-12-13 09:26:26 -08:00
LLVM GN Syncbot
beedccd28a gn build: Merge 84728e65e95 2019-12-13 16:20:43 +00:00
Miloš Stojanović
05b9c1cd8b [llvm-exegesis][mips] Add BenchmarkResultTest unit test
Test writing and reading benchmark instructions to and from disk, and
check the calculation of min, max and avg values from a list of benchmark
measurements.

Differential Revision: https://reviews.llvm.org/D71265
2019-12-13 17:02:19 +01:00
Sam Parker
a3cacd08b6 [ARM][MVE] Make VPT invalid for tail predication
We've been marking VPT-incompatible instructions as invalid for tail
predication too, though this may not strictly be true. VPT instructions are
incompatible and, unless it's the first predicate def in a loop,
they shouldn't be compatible for tail predication either.

Differential Revision: https://reviews.llvm.org/D71410
2019-12-13 15:01:08 +00:00
Kristina Bessonova
965f2386e0 [llvm-dwarfdump][Statistics] Don't count coverage less than 1% as 0%
Summary:
This is a follow up for D70548.
Currently, variables with debug info coverage between 0% and 1% are put into
the zero-bucket. D70548 changed the way statistics calculate a variable's
coverage: we began to use the enclosing scope rather than a possible variable
life range. Thus more variables might be moved to the zero-bucket even though
they have some debug info coverage.
The patch distinguishes between a variable whose location info covers
significantly less than its enclosing scope and a variable that has no
location info at all.
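
For instance (numbers illustrative only): a variable whose location covers 2 of
the 300 bytes of its enclosing scope has roughly 0.7% coverage. It previously
landed in the zero-bucket alongside variables with no location at all; now the
two cases are reported separately.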

Reviewers: djtodoro, aprantl, dblaikie, avl

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71070
2019-12-13 17:34:58 +03:00
Nicola Zaghen
8d0fd71b2b Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
The GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with InstCombine's visitPtrToInt, but the unit tests were incorrect, so this remained undiscovered.
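
The distinction only matters when a target declares different widths for the
two queries; a minimal sketch (not from the patch, data layout string made up
for illustration):

    #include "llvm/IR/DataLayout.h"

    void example() {
      // Pointer size/ABI align/pref align of 64 bits, but a 32-bit GEP index width.
      llvm::DataLayout DL("p:64:64:64:32");
      unsigned PtrBits = DL.getPointerSizeInBits(0); // 64
      unsigned IdxBits = DL.getIndexSizeInBits(0);   // 32 -- the width to use for offsets
      (void)PtrBits;
      (void)IdxBits;
    }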

This fixes the buildbot failures.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-13 14:30:21 +00:00
Sanjay Patel
356915b055 [x86] add tests for shift-trunc-shift; NFC
More coverage for a possible generic transform.
2019-12-13 08:37:06 -05:00
Mikhail Maltsev
c04187c6e9 [ARM][MVE] Add vector reduction intrinsics with two vector operands
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH

Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.

Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71062
2019-12-13 13:17:29 +00:00
Kristina Bessonova
f42c025b44 [llvm-dwarfdump][Statistics] Change the coverage buckets representation. NFC
Summary:
This changes the representation of 'coverage buckets' in llvm-dwarfdump and
llvm-locstats to one that makes it clearer what the buckets contain.

See some related details in D71070.

Reviewers: djtodoro, aprantl, cmtice, jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71366
2019-12-13 16:08:25 +03:00
Simon Tatham
e5e610c551 [ARM][MVE] Add intrinsics for more immediate shifts.
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.

`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.

I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.

(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71458
2019-12-13 13:07:39 +00:00
John Brawn
27b751b877 [ARM] Add custom strict fp conversion lowering when non-strict is custom
We have custom lowering for operations converting to/from floating-point types
when we don't have hardware support for those types, and this doesn't interact
well with the target-independent legalization of the strict versions of these
operations. Fix this by adding similar custom lowering of the strict versions.
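
As a hedged sketch of the pattern (opcode/type pairing illustrative, not copied
from the ARM backend), inside the target's TargetLowering constructor the
strict opcode is marked Custom next to its non-strict counterpart so both reach
the same lowering code:

    // The non-strict conversion already needed custom lowering...
    setOperationAction(ISD::FP_TO_SINT, MVT::f64, Custom);
    // ...so give the strict version the same treatment.
    setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::f64, Custom);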

This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics
test, with the remaining failures due to poor instruction selection.

Differential Revision: https://reviews.llvm.org/D71127
2019-12-13 13:00:00 +00:00
Tim Renouf
030b9c457d Revert "AMDGPU: Try to commute sub of boolean ext"
This reverts commit 69fcfb7d3597e0cdb5554b4e672e9032b411b167.

As shown in the test I attached to this commit, the change I reverted
causes a problem with "zext(cc1) - zext(cc2)". It commuted
the operands to the sub and used different logic to select the addc/subc
instruction:
   sub zext (setcc), x => addcarry 0, x, setcc
   sub sext (setcc), x => subcarry 0, x, setcc

... but that is bogus. I believe it is not possible to fold those commuted
patterns into any form of addcarry or subcarry. It may have worked as
intended before "AMDGPU: Change boolean content type to 0 or 1" because
the setcc was considered to be -1 rather than 1.
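
For instance (values picked only to illustrate the mismatch), with boolean
content 0/1, cc = 1 and x = 5: sub zext(cc), x = 1 - 5 = -4, whereas
addcarry 0, x, cc = 0 + 5 + 1 = 6, so the rewritten form computes a different
value.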

Differential Revision: https://reviews.llvm.org/D70978

Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43
2019-12-13 12:49:06 +00:00
Djordje Todorovic
7c53d69f00 [llvm-locstats] Avoid the locstats when no scope bytes coverage found
If the total number of PC range bytes in each variable's enclosing scope
('scope bytes total') is 0, we will have division by zero.

Differential Revision: https://reviews.llvm.org/D71415
2019-12-13 13:45:44 +01:00
Alex Richardson
5244511578 [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
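
A minimal sketch of the new call shape (wrapper function and names are mine,
not from the patch):

    #include "llvm/CodeGen/SelectionDAGNodes.h"

    llvm::ISD::CondCode invert(llvm::ISD::CondCode CC, llvm::EVT VT) {
      // Passing the operand type itself lets out-of-tree targets whose pointers
      // are not integer VTs pick the right inversion rules without touching
      // every call site.
      return llvm::ISD::getSetCCInverse(CC, VT);
      // Old form removed by this patch: ISD::getSetCCInverse(CC, VT.isInteger());
    }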

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sjoerd Meijer
ca4474f01a Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC."
This reverts commit 9468e3334ba54fbb1b209aaec662d7375451fa1f.

There's a test that doesn't like this change. The RDA analysis
gets invalidated by changes in the block, which is not taken into
account. Revert while I work on a fix for this.
2019-12-13 11:56:44 +00:00
Mark Murray
2d0e359316 [ARM][MVE][Intrinsics] Add *_x() variants of my *_m() intrinsics.
Summary:
Multiclass is now put to better use, and this helped find some existing
bugs in the predicated VMULL* intrinsics, which are now fixed.

The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.

To assist in testing the multiclassing, and to thwart further gremlins,
the unit tests are improved in scope.

The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.

Reviewers: dmgreen, miyuki, ostannard, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71421
2019-12-13 11:51:23 +00:00
Kai Nacke
520375e6f3 [Docs] Fix target feature matrix for PowerPC and SystemZ
The target feature matrix in the code generator documentation is
outdated. This PR fixes some entries for PowerPC and SystemZ.

Both have:
- assembly parser
- disassembler
- .o file writing

Reviewers: uweigand

Differential Revision: https://reviews.llvm.org/D71004
2019-12-13 06:18:08 -05:00
Kerry McLaughlin
07c7fe4c54 Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.

Original commit message:

Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
2019-12-13 10:08:20 +00:00
David Stenberg
fa448e3ce2 [LiveDebugValues] Omit entry values for DBG_VALUEs with pre-existing expressions
Summary:
This is a quickfix for PR44275. An assertion that checks that the
DIExpression is valid failed due to attempting to create an entry value
for an indirect parameter. This started appearing after D69028, as the
indirect parameter started being represented using an DW_OP_deref,
rather than with the DBG_VALUE's second operand, meaning that the
isIndirectDebugValue() check in LiveDebugValues did not exclude such
parameters. A DIExpression that has an entry value operation can
currently not have any other operation, leading to the failed isValid()
check.

This patch simply makes us stop considering emitting entry values
for such parameters. To support such cases I think we at least need
to do the following changes:

 * In DIExpression::isValid(): Remove the limitation that a
   DW_OP_LLVM_entry_value operation can be the only operation in a
   DIExpression.

 * In LiveDebugValues::emitEntryValues(): Create an entry value of size
   1, so that it only wraps the register operand, and not the whole
   pre-existing expression (the DW_OP_deref).

 * In LiveDebugValues::removeEntryValue(): Check that the new debug
   value has the same debug expression as the original, rather than
   checking that the debug expression is empty.

 * In DwarfExpression::addMachineRegExpression(): Modify the logic so
   that a DW_OP_reg* expression is emitted for the entry value.
   That is how GCC emits entry values for indirect parameters. That will
   currently not happen due to the DW_OP_deref causing the
   !HasComplexExpression to fail. The LocationKind needs to be changed
   also, rather than always emitting a DW_OP_stack_value for entry values.

There are probably more things I have missed, but that could hopefully
be a good starting point for emitting such entry values.

Reviewers: djtodoro, aprantl, jmorse, vsk

Reviewed By: aprantl, vsk

Subscribers: hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D71416
2019-12-13 10:49:46 +01:00
Georgii Rymar
5ee22c8a68 [yaml2obj] - Add a way to override sh_flags section field.
Currently we have the `Flags` property that allows setting
flags for a section. The problem is that it does not
allow us to set an arbitrary value, because of the bit-field
validation under the hood. Arbitrary values can be used
to test specific broken cases.

We probably do not want to relax the validation, so this
patch adds a `ShFlags` property that allows overriding
`sh_flags`. It is in line with the other `Sh*` properties
we already have.

Differential revision: https://reviews.llvm.org/D71411
2019-12-13 11:54:37 +03:00
Georgii Rymar
0c77a265d1 [llvm-readobj] - Fix letters used for dumping section types in GNU style.
I've noticed that when we have all regular flags set, we print "WAEXMSILoGTx"
instead of "WAXMSILOGTCE" printed by GNU readelf.

It happens because:
1) We print SHF_EXCLUDE at the wrong place.
2) We do not recognize SHF_COMPRESSED, we print "x" instead of "C".
3) We print "o" instead of "O" for SHF_OS_NONCONFORMING.

This patch fixes these differences and adds test cases.

Differential revision: https://reviews.llvm.org/D71418
2019-12-13 11:31:24 +03:00
Craig Topper
f2c16e8f5e [LegalizeTypes] Remove unnecessary if before calling ReplaceValueWith on the chain in SoftenFloatRes_LOAD.
I believe this is a leftover from when fp128 was softened to fp128
on X86-64. In that case type legalization must have been able to
create a load that was the same as N which would make this
replacement fail or assert. Since we no longer do that, this
check should be unneeded.
2019-12-13 00:14:41 -08:00
Nikita Popov
76a5a184aa Reapply [LVI] Normalize pointer behavior
This is a rebase of the change over D70376, which fixes an LVI cache
invalidation issue that also affected this patch.

-----

Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferenceability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this change not fully NFC, because we can now fold
terminator icmps based on the dereferenceability information in the
same block. This is the reason why I changed one JumpThreading test
(it would optimize the condition away without the change).
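
As an illustrative case (not taken from the patch's tests): if a block loads
from a pointer %p and its terminator branches on icmp eq %p, null, that compare
can now be folded to false at the terminator, because the load earlier in the
same block already guarantees %p is non-null at that point.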

Of course, we do not want to recompute dereferenceability on each
intersectAssume call, so we need a new cache for this. The
dereferenceability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2019-12-13 08:59:58 +01:00
Rui Ueyama
bd7cf0f396 Revert an accidental commit af5ca40b47b3e85c3add81ccdc0b787c4bc355ae 2019-12-13 15:17:40 +09:00
Rui Ueyama
8ba96bda06 temporary 2019-12-13 14:35:03 +09:00
Nate Voorhies
95d42df811 [NFC][AArch64] Fix typo.
Summary: Coaleascer should be coalescer.

Reviewers: qcolombet, Jim

Reviewed By: Jim

Subscribers: Jim, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70731
2019-12-13 10:25:36 +08:00
Eric Christopher
a1871cdc3f Temporarily revert "NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List"
as it was causing bot and build failures.

This reverts commit 8e04896288d22ed8bef7ac367923374f96b753d6.
2019-12-12 17:55:41 -08:00
David Blaikie
dc5cbd92c1 NFC: DebugInfo: Refactor RangeSpanList to be a struct, like DebugLocStream::List
Move these data structures closer together so their emission code can
eventually share more of its implementation.
2019-12-12 16:53:59 -08:00
David Blaikie
ee634363a8 NFC: DebugInfo: Refactor debug_loc/loclist emission into a common function
(except for v4 loclists, which are sufficiently different to not fit
well in this generic implementation)

In subsequent patches I intend to refactor the DebugLoc and ranges data
structures to be more similar so I can common more of the implementation
here.
2019-12-12 16:39:12 -08:00
Evgenii Stepanov
1b96f8703a hwasan: add tag_offset DWARF attribute to optimized debug info
Summary:
Support alloca-referencing dbg.value in hwasan instrumentation.
Update AsmPrinter to emit DW_AT_LLVM_tag_offset when location is in
loclist format.

Reviewers: pcc

Subscribers: srhines, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70753
2019-12-12 16:18:54 -08:00
Danilo Carvalho Grael
d624de3264 [AArch64][SVE] Add integer arithmetic with immediate instructions.
Summary:
Add pattern matching for the following instructions:
- add, sub, subr, sqadd, sqsub, uqadd, uqsub

This patch required complex patterns to match the immediate with an optional left shift.

I re-used the Select function from the other SVE repo to implement the complex pattern.

I plan on doing another patch to also match a constant vector of the same immediate.
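
As an encoding illustration (example values are mine): an add of 768 fits this
form as the 8-bit immediate 3 with a left shift of 8, i.e. 3 << 8 == 768.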

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71370
2019-12-12 17:28:46 -05:00
Johannes Doerfert
bde45fcf38 [Attributor][FIX] Do treat byval arguments special
When we reason about the pointer argument that is byval we actually
reason about a local copy of the value passed at the call site. This was
not the case before and we wrongly introduced attributes based on the
surrounding function.

AAMemoryBehaviorArgument, AAMemoryBehaviorCallSiteArgument and
AANoCaptureCallSiteArgument are made aware of byval now. The code
to skip "subsuming positions" for reasoning follows a common pattern and
we should refactor it. A TODO was added.

Discovered by @efriedma as part of D69748.
2019-12-12 16:04:21 -06:00
Denis Bakhvalov
6605830fb0 [NFC][InstSimplify] Refactoring ThreadCmpOverSelect function
Removed code duplication in ThreadCmpOverSelect and broke it
into several smaller functions so they can be reused.

Differential Revision: https://reviews.llvm.org/D71158
2019-12-12 22:45:58 +01:00
Sanjay Patel
05003a5b3c Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc"
This reverts commit 8963332c3327daa652ba3e26d35f9109b6991985.
There was a typo causing a logic bug in this code, but it wasn't visible in the asm for the tests.
2019-12-12 16:24:40 -05:00
Sanjay Patel
183b3675ca [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc
This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-12 15:44:13 -05:00
Teresa Johnson
3a2a2bb628 [LTO] Support for embedding bitcode section during LTO
Summary:
This adds support for embedding bitcode in a binary during LTO. The libLTO gains support for the `-lto-embed-bitcode` flag. The option allows users of the LTO library to embed a bitcode section. For example, LLD can pass the option via `ld.lld -mllvm=-lto-embed-bitcode`.

This feature allows doing something comparable to `clang -c -fembed-bitcode`, but on the (LTO) linker level. Having bitcode alongside native code has many use-cases. To give an example, the MacOS linker can create a `-bitcode_bundle` section containing bitcode. Also, having this feature built into LLVM is an alternative to 3rd party tools such as [[ https://github.com/travitch/whole-program-llvm | wllvm ]] or [[ https://github.com/SRI-CSL/gllvm | gllvm ]]. As with these tools, this feature simplifies creating "whole-program" llvm bitcode files, but in contrast to wllvm/gllvm it does not rely on a specific llvm frontend/driver.

Patch by Josef Eisl <josef.eisl@oracle.com>

Reviewers: #llvm, #clang, rsmith, pcc, alexshap, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mehdi_amini, inglorion, hiraditya, aheejin, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits, #llvm, #clang

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68213
2019-12-12 12:34:19 -08:00
Tony
39015f8b4f [AMDGPU] AMDGPUUsage clarify address space information and other typo and formatting fixes
Summary:
- Clarify AMDGPU address spaces.
- Correct path to AMDGPU backend since now in the mono-repo.
- Fix numerous text style and typo issues.
- Correct reStructuredText formatting warnings.
- Make reStructuredText directive usage more consistent.
- Add references for gfx10 ISA specification.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71392
2019-12-12 14:51:27 -05:00
Kit Barton
6565d62ef3 Rename LoopInfo::isRotated() to LoopInfo::isRotatedForm().
This patch renames the LoopInfo::isRotated() method to LoopInfo::isRotatedForm()
to make it clear that the method checks whether the loop is in rotated form, not
whether the loop has been rotated by the LoopRotation pass.
2019-12-12 14:22:36 -05:00
Jonas Paulsson
68d81cc406 [SystemZ] Implement the packed stack layout
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.
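
A minimal sketch (helper name is mine, not from the patch) of opting a function
into this layout when building IR from C++:

    #include "llvm/IR/Function.h"

    // The SystemZ backend keys the packed stack layout off this string attribute.
    void requestPackedStack(llvm::Function &F) {
      F.addFnAttr("packed-stack");
    }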

Review: Ulrich Weigand
https://reviews.llvm.org/D70821
2019-12-12 10:26:03 -08:00
Sanjay Patel
80a23ffb66 [DAGCombiner] improve readability
This is not quite NFC because I changed the SDLoc to use the more
standard 'N' (the starting node for the fold).

This transform is a special-case of a more general fold that we
do in IR, but it seems like the general fold is needed here too
to avoid a potential regression seen in D58017.

https://rise4fun.com/Alive/3jZm
2019-12-12 13:16:50 -05:00
Sanjay Patel
f806d650f4 [AArch64][PowerPC] add tests for shift sandwich; NFC 2019-12-12 12:37:02 -05:00
Florian Hahn
1991907bad [BasicAA] Use GEP as context for computeKnownBits in aliasGEP.
In order to use assumptions, computeKnownBits needs a context
instruction. We can use the GEP, if it is an instruction. We already
pass the assumption cache, but it cannot be used without a context
instruction.
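
A simplified sketch of the idea (helper and parameter names are mine; the real
change lives inside BasicAA's GEP decomposition):

    #include "llvm/Analysis/AssumptionCache.h"
    #include "llvm/Analysis/ValueTracking.h"
    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/IR/Operator.h"
    #include "llvm/Support/KnownBits.h"

    llvm::KnownBits knownIndexBits(const llvm::Value *Index,
                                   const llvm::GEPOperator *GEP,
                                   const llvm::DataLayout &DL,
                                   llvm::AssumptionCache *AC,
                                   const llvm::DominatorTree *DT) {
      // Only an actual Instruction can act as the context for assumption queries;
      // a GEP constant expression yields a null context and no extra information.
      const auto *CxtI = llvm::dyn_cast<llvm::Instruction>(GEP);
      return llvm::computeKnownBits(Index, DL, /*Depth=*/0, AC, CxtI, DT);
    }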

Reviewers: anemet, asbirlea, hfinkel, spatel

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D71264
2019-12-12 17:18:04 +00:00
Michael Liao
8950bb7c13 [amdgpu] Fix -Wenum-compare warning. NFC. 2019-12-12 11:44:16 -05:00
LLVM GN Syncbot
156014a0e8 gn build: Merge 526244b187d 2019-12-12 15:45:15 +00:00
Florian Hahn
1643768c5c [Matrix] Add first set of matrix intrinsics and initial lowering pass.
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html

The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.

Matrices are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.
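
As a layout illustration (not part of the patch text): for a column-major
4 x 4 float matrix embedded in a <16 x float> vector, element (row r, column c)
lives at vector index c * 4 + r, so column c is the contiguous slice of
elements c * 4 .. c * 4 + 3.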

For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.

This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
 * Shape propagation to eliminate the embedding/splitting for each
   intrinsic.
 * Fused & tiled lowering of multiply and other operations.
 * Optimization remarks highlighting matrix expressions and costs.
 * Generate loops for operations on large matrices.
 * More general block processing for operation on large vectors,
   exploiting shape information.

We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could emit a large shufflevector instruction instead of the
transpose. But we expect that to
  (1) become unwieldy for larger matrices (even for 16x16 matrices,
      the resulting shufflevector masks would be huge),
  (2) risk instcombine making small changes, causing us to fail to
      detect the transpose, preventing better lowerings.

For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70456
2019-12-12 15:42:18 +00:00
Sjoerd Meijer
6e99e23adb [ARM][MVE] findVCMPToFoldIntoVPS. NFC.
This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can
reimplement findVCMPToFoldIntoVPS with just a few calls to RDA.

Differential Revision: https://reviews.llvm.org/D71330
2019-12-12 15:41:20 +00:00
Guillaume Chatelet
60bec22752 [Alignment][NFC] Adding Align compatible methods to IntrinsicInst/IRBuilder
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71420
2019-12-12 16:22:15 +01:00
LLVM GN Syncbot
554f3952e8 gn build: Merge 600d123c6ff 2019-12-12 15:01:28 +00:00
Tom Stellard
c17f304125 AMDGPU/SILoadStoreOptimizer: Simplify function
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71044
2019-12-12 06:53:03 -08:00
Sam Parker
38470d9745 [ARM][MVE] Sink vector shift operand
Recommit e0b966643fc2. sub instructions were being generated for the
negated value, and for some reason they were the register-only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00