1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00
Commit Graph

4068 Commits

Author SHA1 Message Date
Sriraman Tallam
5afa21a58e Basic Block Sections support in LLVM.
This is the second patch in a series of patches to enable basic block
sections support.

This patch adds support for:

* Creating direct jumps at the end of basic blocks that have fall
through instructions.
* New pass, bbsections-prepare, that analyzes placement of basic blocks
in sections.
* Actual placing of a basic block in a unique section with special
handling of exception handling blocks.
* Supports placing a subset of basic blocks in a unique section.
* Support for MIR serialization and deserialization with basic block
sections.

Parent patch : D68063
Differential Revision: https://reviews.llvm.org/D73674
2020-03-16 16:06:54 -07:00
Sriraman Tallam
0515e4e8ba Basic Block Sections Support.
This is the first in a series of patches to enable Basic Block Sections
in LLVM.

We introduce a new compiler option, -fbasicblock-sections=, which places every
basic block in a unique ELF text section in the object file along with a
symbol labeling the basic block. The linker can then order the basic block
sections in any arbitrary sequence which when done correctly can encapsulate
block layout, function layout and function splitting optimizations. However,
there are a couple of challenges to be addressed for this to be feasible:

1) The compiler must not allow any implicit fall-through between any two
   adjacent basic blocks as they could be reordered at link time to be
   non-adjacent. In other words, the compiler must make a fall-through
   between adjacent basic blocks explicit by retaining the direct jump
   instruction that jumps to the next basic block. These branches can only
   be removed later by the linker after the blocks have been reordered.
2) All inter-basic block branch targets would now need to be resolved by
   the linker as they cannot be calculated during compile time. This is
   done using static relocations which bloats the size of the object files.
   Further, the compiler tries to use short branch instructions on some ISAs
   for branch offsets that can be accommodated in one byte. This is not
   possible with basic block sections as the offset is not determined at
   compile time, and long branch instructions have to be used everywhere.
3) Each additional section bloats object file sizes by tens of bytes. The
   number of basic blocks can be potentially very large compared to the
   size of functions and can bloat object sizes significantly. Option
   fbasicblock-sections= also takes a file path which can be used to
   specify a subset of basic blocks that needs unique sections to keep
   the bloats small.
4) Debug Info and CFI need special handling and will be presented as
   separate patches.

Basic Block Labels

With -fbasicblock-sections=labels, or when a basic block is placed in a
unique section, it is labelled with a symbol. This allows easy mapping of
virtual addresses from PMU profiles back to the corresponding basic blocks.
Since the number of basic blocks is large, the labeling bloats the symbol
table sizes and the string table sizes significantly. While the binary size
does increase, it does not affect performance as the symbol table is not
loaded in memory during run-time. The string table size bloat is kept very
minimal using a unary naming scheme that uses string suffix compression.
The basic blocks for function foo are named "a.BB.foo", "aa.BB.foo", ...
This turns out to be very good for string table sizes and the bloat in the
string table size for a very large binary is ~8 %. The naming also allows
using the --symbol-ordering-file option in LLD to arbitrarily reorder the
sections.

Differential Revision: https://reviews.llvm.org/D68063
2020-03-14 18:22:19 -07:00
Nico Weber
55ba7badfd Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit 5aa5c943f7da155b95564058cd5d50a93eabfc89.
Causes clang to assert, see
https://bugs.chromium.org/p/chromium/issues/detail?id=1061533#c4
for a repro.
2020-03-13 15:37:44 -04:00
Simon Cook
bae1c75f0d [TableGen] Support combining AssemblerPredicates with ORs
For context, the proposed RISC-V bit manipulation extension has a subset
of instructions which require one of two SubtargetFeatures to be
enabled, 'zbb' or 'zbp', and there is no defined feature which both of
these can imply to use as a constraint either (see comments in D65649).

AssemblerPredicates allow multiple SubtargetFeatures to be declared in
the "AssemblerCondString" field, separated by commas, and this means
that the two features must both be enabled. There is no equivalent to
say that _either_ feature X or feature Y must be enabled, short of
creating a dummy SubtargetFeature for this purpose and having features X
and Y imply the new feature.

To solve the case where X or Y is needed without adding a new feature,
and to better match a typical TableGen style, this replaces the existing
"AssemblerCondString" with a dag "AssemblerCondDag" which represents the
same information. Two operators are defined for use with
AssemblerCondDag, "all_of", which matches the current behaviour, and
"any_of", which adds the new proposed ORing features functionality.

This was originally proposed in the RFC at
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139138.html

Changes to all current backends are mechanical to support the replaced
functionality, and are NFCI.

At this stage, it is illegal to combine features with ands and ors in a
single AssemblerCondDag. I suspect this case is sufficiently rare that
adding more complex changes to support it are unnecessary.

Differential Revision: https://reviews.llvm.org/D74338
2020-03-13 17:13:51 +00:00
Kazushi (Jam) Marukawa
ed35b7a19d [VE] Target-specific bit size for sjljehprepare
Summary:
This patch extends the TargetMachine to let targets specify the integer size
used by the sjljehprepare pass. This is 64bit for the VE target and otherwise
defaults to 32bit for all targets, which was hard-wired before.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D71337
2020-03-10 17:51:16 +01:00
Djordje Todorovic
a4907284d9 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-03-10 09:15:06 +01:00
Djordje Todorovic
df8a19e738 [CallSiteInfo] Enable the call site info only for -g + optimizations
Emit call site info only in the case of '-g' + 'O>0' level.

Differential Revision: https://reviews.llvm.org/D75175
2020-03-09 12:12:44 +01:00
Bevin Hansson
4f8b0d2f56 [Intrinsic] Add fixed point saturating division intrinsics.
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:

```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```

These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.

Reviewers: bjope, leonardchan, craig.topper

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71550
2020-02-24 10:50:52 +01:00
Djordje Todorovic
e67d8322ba Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure found on an ARM 2-stage buildbot.
The investigation is needed.
2020-02-20 14:41:39 +01:00
Djordje Todorovic
fdc5995043 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
Thomas Lively
efadf1f9df Reland "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
This reverts commit 649aba93a27170cb03a4b17c98a19b9237a880b8, now that
the approach started there has been shown to be workable in the patch
series culminating in https://reviews.llvm.org/D74192.
2020-02-18 13:49:46 -08:00
Matt Arsenault
3a92d93149 CodeGen: Move undef_tied_input declaration
This doesn't belong in ARM specific code since it's generally
recognized by tablegen.
2020-02-18 10:33:10 -08:00
Djordje Todorovic
a8a9374ec7 Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic
7e0c075702 Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
Stanislav Mekhanoshin
e58b532e8c [TBLGEN] Inhibit generation of unneeded psets
Differential Revision: https://reviews.llvm.org/D74744
2020-02-17 15:38:08 -08:00
Stanislav Mekhanoshin
20cb1e9351 [TBLGEN] Allow to override RC weight
Differential Revision: https://reviews.llvm.org/D74509
2020-02-14 15:49:52 -08:00
Djordje Todorovic
45581db685 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic
69262470d0 [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Matt Arsenault
dffd3d8c49 GlobalISel: Assume G_INTRINSIC* are convergent
This is safer in case anyone tries to run MI optimization passes on
pre-selected MIR. If there turns out to be a real reason to do this,
we might need to add separate convergent intrinsic opcodes.
2020-02-05 10:17:22 -08:00
Thomas Lively
352e779087 Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
Summary:
This reverts commit 3ef169e586f4d14efe690c23c878d5aa92a80eb5. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.

Depends on D73927.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73928
2020-02-04 20:04:59 -08:00
Reid Kleckner
4849f968cb [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Amara Emerson
a64dab6312 [GlobalISel] Add new combine to convert scalar G_MUL to G_SHL.
For pow2 constants we should use G_SHL for pattern matching (and perf)
purposes later.

Vector support not yet implemented.

Differential Revision: https://reviews.llvm.org/D73659
2020-01-29 13:39:00 -08:00
Evandro Menezes
1d4ffd7bad [PATCH] [Target] Test commit
Modify comment to reflect the current users of `Regisgter.CostPerUse`.
2020-01-24 15:56:08 -06:00
David Tenty
71acb7b4cf [NFC][XCOFF] Refactor Csect creation into TargetLoweringObjectFile
Summary:
We create a number of standard types of control sections in multiple places for
things like the function descriptors, external references and the TOC anchor
among others, so it is possible for  their properties to be defined
inconsistently in different places. This refactor moves their creation and
properties into functions in the TargetLoweringObjectFile class hierarchy, where
functions for retrieving various special types of sections typically seem
to reside.

Note: There is one case in PPCISelLowering which is specific to function entry
points which we don't address since we don't have access to the TLOF there.

Reviewers: DiggerLin, jasonliu, hubert.reinterpretcast

Reviewed By: jasonliu, hubert.reinterpretcast

Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72347
2020-01-22 12:09:11 -05:00
Sander de Smalen
c07e22a824 Add support for (expressing) vscale.
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:

  getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1

This can be used to propagate the value in CodeGenPrepare.

In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.

This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).

Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner

Reviewed by: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68203
2020-01-22 10:09:27 +00:00
Thomas Lively
7987888a3f [WebAssembly][InstrEmitter] Foundation for multivalue call lowering
Summary:
WebAssembly is unique among upstream targets in that it does not at
any point use physical registers to store values. Instead, it uses
virtual registers to model positions in its value stack. This means
that some target-independent lowering activities that would use
physical registers need to use virtual registers instead for
WebAssembly and similar downstream targets. This CL generalizes the
existing `usesPhysRegsForPEI` lowering hook to
`usesPhysRegsForValues` in preparation for using it in more places.

One such place is in InstrEmitter for instructions that have variadic
defs. On register machines, it only makes sense for these defs to be
physical registers, but for WebAssembly they must be virtual registers
like any other values. This CL changes InstrEmitter to check the new
target lowering hook to determine whether variadic defs should be
physical or virtual registers.

These changes are necessary to support a generalized CALL instruction
for WebAssembly that is capable of returning an arbitrary number of
arguments. Fully implementing that instruction will require additional
changes that are described in comments here but left for a follow up
commit.

Reviewers: aheejin, dschuff, qcolombet

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71484
2020-01-21 11:13:46 -08:00
Fangrui Song
6fdf7be507 [XRay] Set hasSideEffects flag of PATCHABLE_FUNCTION_{ENTER,EXIT}
Otherwise they may be picked as the delay slot by mips-delay-slot-filler, if we move patchable-function before mips-delay-slot-filler.
2020-01-19 00:09:46 -08:00
Matt Arsenault
534a1fba52 TableGen: Remove dead code 2020-01-16 13:49:43 -05:00
Sam McCall
f288794ade [Target] Fix uninitialized value in 10c11e4e2d05cf0e8f8251f50d84ce77eb1e9b8d 2020-01-14 11:28:13 +01:00
KAWASHIMA Takahiro
78130ba3e6 This option allows selecting the TLS size in the local exec TLS model,
which is the default TLS model for non-PIC objects. This allows large/
many thread local variables or a compact/fast code in an executable.

Specification is same as that of GCC. For example, the code model
option precedes the TLS size option.

TLS access models other than local-exec are not changed. It means
supoort of the large code model is only in the local exec TLS model.

Patch By KAWASHIMA Takahiro (kawashima-fj <t-kawashima@fujitsu.com>)
Reviewers: dmgreen, mstorsjo, t.p.northover, peter.smith, ostannard
Reviewd By: peter.smith
Committed by: peter.smith

Differential Revision: https://reviews.llvm.org/D71688
2020-01-13 10:16:53 +00:00
Vedant Kumar
ac509eb162 [AArch64] Add isAuthenticated predicate to MCInstDesc
Add a predicate to MCInstDesc that allows tools to determine whether an
instruction authenticates a pointer. This can be used by diagnostic
tools to hint at pointer authentication failures.

Differential Revision: https://reviews.llvm.org/D70329

rdar://55089604
2020-01-10 14:30:52 -08:00
Peng Guo
c2705a1490 [MIR] Fix cyclic dependency of MIR formatter
Summary:
Move MIR formatter pointer from TargetMachine to TargetInstrInfo to
avoid cyclic dependency between target & codegen.

Reviewers: dsanders, bkramer, arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72485
2020-01-10 11:18:12 +01:00
Matt Arsenault
5dd6dcdb6a TableGen/GlobalISel: Add way for SDNodeXForm to work on timm
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.

Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.

Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.

Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.

Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.

One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
2020-01-09 17:37:52 -05:00
Matt Arsenault
fd23fd25d2 GlobalISel: Handle llvm.read_register
Compared to the attempt in bdcc6d3d2638b3a2c99ab3b9bfaa9c02e584993a,
this uses intermediate generic instructions.
2020-01-09 17:37:52 -05:00
Daniel Sanders
0bae600557 Revert "Revert "[MIR] Target specific MIR formating and parsing""
There was an unguarded dereference of MF in a function that permitted
nullptr. Fixed

This reverts commit 71d64f72f934631aa2f12b9542c23f74f256f494.
2020-01-08 20:03:29 -08:00
Nico Weber
b9df07b9a2 Revert "[MIR] Target specific MIR formating and parsing"
This reverts commit 3ef05d85be8c3666ebfa3ad986eb334da5195a47.
It broke check-llvm on many bots, see comments on D69836.
2020-01-08 22:50:49 -05:00
Peng Guo
37a43dcd09 [MIR] Target specific MIR formating and parsing
Summary:
Added MIRFormatter for target specific MIR formating and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.

* Target specific immediate mnemonic need to start with "." follows by
  identifier string. When MIR parser sees immediate it will call target
  specific parsing function.

* Custom pseudo source value need to start with custom follows by
  double-quoted string. MIR parser will pass the quoted string to target
  specific PSV parsing function.

* MIRFormatter have 2 helper functions to facilitate LLVM value printing
  and parsing for custom PSV if they refers LLVM values.

Patch by Peng Guo

Reviewers: dsanders, arsenm

Reviewed By: dsanders

Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69836
2020-01-08 18:48:02 -08:00
Daniel Sanders
8cd252daeb Revert "[MIR] Target specific MIR formating and parsing"
Forgot to credit Peng in the commit message.

This reverts commit be841f89d0014b1e0246a4feae941b2f74abd908.
2020-01-08 18:48:02 -08:00
Peng Guo
1a654233d1 [MIR] Target specific MIR formating and parsing
Summary:
Added MIRFormatter for target specific MIR formating and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.

* Target specific immediate mnemonic need to start with "." follows by
  identifier string. When MIR parser sees immediate it will call target
  specific parsing function.

* Custom pseudo source value need to start with custom follows by
  double-quoted string. MIR parser will pass the quoted string to target
  specific PSV parsing function.

* MIRFormatter have 2 helper functions to facilitate LLVM value printing
  and parsing for custom PSV if they refers LLVM values.

Reviewers: dsanders, arsenm

Reviewed By: dsanders

Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69836
2020-01-08 18:34:21 -08:00
Bevin Hansson
21be0de34d [Intrinsic] Add fixed point division intrinsics.
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:

  llvm.sdiv.fix.*
  llvm.udiv.fix.*

These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.

Patch by: ebevhan

Reviewers: bjope, leonardchan, efriedma, craig.topper

Reviewed By: craig.topper

Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70007
2020-01-08 15:17:46 +01:00
Amara Emerson
52c3b98b3d [AArch64][GlobalISel] Fold a chain of two G_PTR_ADDs of constant offsets.
E.g.
%addr1 = G_PTR_ADD %base, G_CONSTANT 20
%addr2 = G_PTR_ADD %addr1, G_CONSTANT 8
  -->
%addr2 = G_PTR_ADD %base, G_CONSTANT 28

Differential Revision: https://reviews.llvm.org/D72351
2020-01-07 14:12:42 -08:00
Daniel Sanders
728c2ac3b1 [gicombiner] Add GIMatchTree and use it for the code generation
Summary:
GIMatchTree's job is to build a decision tree by zipping all the
GIMatchDag's together.

Each DAG is added to the tree builder as a leaf and partitioners are used
to subdivide each node until there are no more partitioners to apply. At
this point, the code generator is responsible for testing any untested
predicates and following any unvisited traversals (there shouldn't be any
of the latter as the getVRegDef partitioner handles them all).

Note that the leaves don't always fit into partitions cleanly and the
partitions may overlap as a result. This is resolved by cloning the leaf
into every partition it belongs to. One example of this is a rule that can
match one of N opcodes. The leaf for this rule would end up in N partitions
when processed by the opcode partitioner. A similar example is the
getVRegDef partitioner where having rules (add $a, $b), and (add ($a, $b), $c)
will result in the former being in the partition for successfully
following the vreg-def and failing to do so as it doesn't care which
happens.

Depends on D69151

Fixed the issues with the windows bots which were caused by stdout/stderr
interleaving.

Reviewers: bogner, volkan

Reviewed By: volkan

Subscribers: lkail, mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69152
2020-01-07 11:12:53 -08:00
Matt Arsenault
3006dd38dd AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR 2020-01-06 11:19:33 -05:00
James Henderson
91705af363 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Matt Arsenault
522144cc70 GlobalISel: Define G_READCYCLECOUNTER 2020-01-04 13:10:19 -05:00
Daniel Sanders
c661517808 Revert "[gicombiner] Add GIMatchTree and use it for the code generation"
All the windows bots are failing match-tree.td and there's no obvious cause that
I can see. It's not just the %p formatting problem. My best guess is that
there's an ordering issue too but I'll need further information to figure that
out. Revert while I'm investigating.

This reverts commit 64f1bb5cd2c6d69af7c74ec68840029603560238 and 77d4b5f5feff663e70b347516cc4c77fa5cd2a20
2020-01-03 18:17:00 -08:00
Daniel Sanders
6d286aba49 [gicombiner] Add GIMatchTree and use it for the code generation
Summary:
GIMatchTree's job is to build a decision tree by zipping all the
GIMatchDag's together.

Each DAG is added to the tree builder as a leaf and partitioners are used
to subdivide each node until there are no more partitioners to apply. At
this point, the code generator is responsible for testing any untested
predicates and following any unvisited traversals (there shouldn't be any
of the latter as the getVRegDef partitioner handles them all).

Note that the leaves don't always fit into partitions cleanly and the
partitions may overlap as a result. This is resolved by cloning the leaf
into every partition it belongs to. One example of this is a rule that can
match one of N opcodes. The leaf for this rule would end up in N partitions
when processed by the opcode partitioner. A similar example is the
getVRegDef partitioner where having rules (add $a, $b), and (add ($a, $b), $c)
will result in the former being in the partition for successfully
following the vreg-def and failing to do so as it doesn't care which
happens.

Depends on D69151

Reviewers: bogner, volkan

Reviewed By: volkan

Subscribers: lkail, mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69152
2020-01-03 16:23:23 -08:00
Matt Arsenault
62bbba4bb4 GlobalISel: Define equivalent node for G_INTRINSIC_ROUND 2019-12-24 10:36:54 -05:00
Matt Arsenault
e430cb728d GlobalISel: Define equivalent node for G_INTRINSIC_TRUNC 2019-12-24 09:53:01 -05:00
Ulrich Weigand
924e626fc4 [FPEnv] Strict versions of llvm.minimum/llvm.maximum
Add new intrinsics
   llvm.experimental.constrained.minimum
   llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.

Includes SystemZ back-end support.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71624
2019-12-18 21:35:28 +01:00
Daniel Sanders
365070365b [gicombiner] Import tryCombineIndexedLoadStore()
Summary:
Now that arbitrary data is supported, import tryCombineIndexedLoadStore()

Depends on D69147

Reviewers: bogner, volkan

Reviewed By: volkan

Subscribers: hiraditya, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69151
2019-12-18 14:41:38 +00:00
Daniel Sanders
1ee93efcd0 [gicombiner] Add support for arbitrary match data being passed from match to apply
Summary:
This is used by the extending_loads combine to tell the apply step which
use is the preferred one to fold and the other uses should be re-written
to consume.

Depends on D69117

Reviewers: volkan, bogner

Reviewed By: volkan

Subscribers: hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69147
2019-12-18 12:27:29 +00:00
Ulrich Weigand
e14932b2ad [SystemZ][FPEnv] Back-end support for STRICT_[SU]INT_TO_FP
As of b1d8576 there is middle-end support for STRICT_[SU]INT_TO_FP,
so this patch adds SystemZ back-end support as well.

The patch is SystemZ target specific except for adding SD patterns
strict_[su]int_to_fp and any_[su]int_to_fp to TargetSelectionDAG.td
as usual.
2019-12-17 18:24:05 +01:00
Melanie Blower
fe1f570fe3 Reapply af57dbf12e54 "Add support for options -frounding-math, ftrapping-math, -ffp-model=, and -ffp-exception-behavior="
Patch was reverted because https://bugs.llvm.org/show_bug.cgi?id=44048
        The original patch is modified to set the strictfp IR attribute
        explicitly in CodeGen instead of as a side effect of IRBuilder.
        In the 2nd attempt to reapply there was a windows lit test fail, the
        tests were fixed to use wildcard matching.

        Differential Revision: https://reviews.llvm.org/D62731
2019-12-05 03:48:04 -08:00
Melanie Blower
ba53e77877 Revert " Reapply af57dbf12e54 "Add support for options -frounding-math, ftrapping-math, -ffp-model=, and -ffp-exception-behavior=""
This reverts commit cdbed2dd856c14687efd741c2d8321686102acb8.
Build break on Windows (lit fail)
2019-12-04 12:21:23 -08:00
Melanie Blower
794b22ce6e Reapply af57dbf12e54 "Add support for options -frounding-math, ftrapping-math, -ffp-model=, and -ffp-exception-behavior="
Patch was reverted because https://bugs.llvm.org/show_bug.cgi?id=44048
        The original patch is modified to set the strictfp IR attribute
        explicitly in CodeGen instead of as a side effect of IRBuilder

        Differential Revision: https://reviews.llvm.org/D62731
2019-12-04 11:32:33 -08:00
David Green
0f0c4be1eb [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
Eric Christopher
f6be1a2758 Temporarily Revert "Add support for options -frounding-math, ftrapping-math, -ffp-model=, and -ffp-exception-behavior="
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.

This reverts commits af57dbf12e54f3a8ff48534bf1078f4de104c1cd and e6584b2b7b2de06f1e59aac41971760cac1e1b79
2019-11-18 10:46:48 -08:00
Reid Kleckner
3f676674e7 Move CodeGenFileType enum to Support/CodeGen.h
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
2019-11-13 16:39:34 -08:00
Melanie Blower
5d2d5957aa Add support for options -frounding-math, ftrapping-math, -ffp-model=, and -ffp-exception-behavior=
Add options to control floating point behavior: trapping and
    exception behavior, rounding, and control of optimizations that affect
    floating point calculations. More details in UsersManual.rst.

    Reviewers: rjmccall

    Differential Revision: https://reviews.llvm.org/D62731
2019-11-07 07:22:45 -08:00
Daniel Sanders
6cdb5ec3bc [globalisel][docs] Rework GMIR documentation and add an early GenericOpcode reference
Summary:
Rework the GMIR documentation to focus more on the end user than the
implementation and tie it in to the MIR document. There was also some
out-of-date information which has been removed.

The quality of the GenericOpcode reference is highly variable and drops
sharply as I worked through them all but we've got to start somewhere :-).
It would be great if others could expand on this too as there is an awful
lot to get through.

Also fix a typo in the definition of G_FLOG. Previously, the comments said
we had two base-2's (G_FLOG and G_FLOG2).

Reviewers: aemerson, volkan, rovka, arsenm

Reviewed By: rovka

Subscribers: wdng, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69545
2019-11-05 15:16:43 -08:00
Daniel Sanders
7a5b72e3a3 [globalisel] Rename G_GEP to G_PTR_ADD
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD

Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69734
2019-11-05 10:31:17 -08:00
David Candler
4e46779660 [cfi] Add flag to always generate .debug_frame
This adds a flag to LLVM and clang to always generate a .debug_frame
section, even if other debug information is not being generated. In
situations where .eh_frame would normally be emitted, both .debug_frame
and .eh_frame will be used.

Differential Revision: https://reviews.llvm.org/D67216
2019-10-31 09:48:30 +00:00
Craig Topper
0ad4d87942 [Target] Change PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL to use unknown instead of i8imm/i16imm/i32imm in its definition.
These instructions don't use immediates, they use registers. But
the register class needed is target specific. So just use unknown.
2019-10-30 00:36:01 -07:00
Andrew Paverd
d090368b0c Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
Daniel Sanders
17873a7a4c [gicombiner] Add the run-time rule disable option
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.

The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.

Depends on D68426

Reviewers: volkan, bogner

Reviewed By: volkan

Subscribers: hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68438

llvm-svn: 375067
2019-10-17 00:37:04 +00:00
Daniel Sanders
bb60c325a9 [gicombiner] Hoist pure C++ combine into the tablegen definition
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.

The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.

Depends on D68424

Reviewers: bogner, volkan

Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68426

llvm-svn: 375057
2019-10-16 23:53:35 +00:00
Matt Arsenault
ae11fea1a6 GlobalISel: Add target pre-isel instructions
Allows targets to introduce regbankselectable
pseudo-instructions. Currently the closet feature to this is an
intrinsic. However this requires creating a public intrinsic
declaration. This litters the public intrinsic namespace with
operations we don't necessarily want to expose to IR producers, and
would rather leave as private to the backend.

Use a new instruction bit. A previous attempt tried to keep using enum
value ranges, but it turned into a mess.

llvm-svn: 373937
2019-10-07 18:43:29 +00:00
Kevin P. Neal
02f26957db [FPEnv] Add constrained intrinsics for lrint and lround
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.

Reviewed by:	andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by:	craig.topper
Differential Revision:	https://reviews.llvm.org/D64746

llvm-svn: 373900
2019-10-07 13:20:00 +00:00
Daniel Sanders
30b8668bfa [gicombiner] Add the boring boilerplate for the declarative combiner
Summary:
This is the first of a series of patches extracted from a much bigger WIP
patch. It merely establishes the tblgen pass and the way empty combiner
helpers are declared and integrated into a combiner info.

The tablegen pass takes a -combiners option to select the combiner helper
that will be generated. This can be given multiple values to generate
multiple combiner helpers at once. Doing so helps to minimize parsing
overhead.

The reason for creating a GlobalISel subdirectory in utils/TableGen is that
there will be quite a lot of non-pass files (~15) by the time the patch
series is done.

Reviewers: volkan

Subscribers: mgorny, hiraditya, simoncook, Petar.Avramovic, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68286

llvm-svn: 373527
2019-10-02 21:13:07 +00:00
Yuanfang Chen
9795ad43cf [NewPM] Port MachineModuleInfo to the new pass manager.
Existing clients are converted to use MachineModuleInfoWrapperPass. The
new interface is for defining a new pass manager API in CodeGen.

Reviewers: fedor.sergeev, philip.pfaffe, chandlerc, arsenm

Reviewed By: arsenm, fedor.sergeev

Differential Revision: https://reviews.llvm.org/D64183

llvm-svn: 373240
2019-09-30 17:54:50 +00:00
Matt Arsenault
c204981f6f Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
James Molloy
7fff32705f [TableGen] Support encoding per-HwMode
Much like ValueTypeByHwMode/RegInfoByHwMode, this patch allows targets
to modify an instruction's encoding based on HwMode. When the
EncodingInfos field is non-empty the Inst and Size fields of the Instruction
are ignored and taken from EncodingInfos instead.

As part of this promote getHwMode() from TargetSubtargetInfo to MCSubtargetInfo.

This is NFC for all existing targets - new code is generated only if targets
use EncodingByHwMode.

llvm-svn: 372320
2019-09-19 13:39:54 +00:00
Hans Wennborg
230a0cd001 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since now intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, simlar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault
6df65c514b GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since now intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, simlar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to uses "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Amy Huang
29e7e13b1b Add AutoUpgrade function to add new address space datalayout string to existing datalayouts.
Summary:
Add function to AutoUpgrade to change the datalayout of old X86 datalayout strings.
This adds "-p270:32:32-p271:32:32-p272:64:64" to X86 datalayouts that are otherwise valid
and don't already contain it.

This also removes the compatibility changes in https://reviews.llvm.org/D66843.
Datalayout change in https://reviews.llvm.org/D64931.

Reviewers: rnk, echristo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67631

llvm-svn: 372267
2019-09-18 22:15:58 +00:00
Sam Tebbs
bc425164d2 [ARM] Add support for MVE vmaxv and vminv
This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv.

Differential Revision: https://reviews.llvm.org/D66413

llvm-svn: 371827
2019-09-13 09:11:46 +00:00
Matt Arsenault
62a482c739 DAG/GlobalISel: Correct type profile of bitcount ops
The result integer does not need to be the same width as the input.
AMDGPU, NVPTX, and Hexagon all have patterns working around the types
matching. GlobalISel defines these as being different type indexes.

llvm-svn: 371797
2019-09-13 00:11:14 +00:00
Philip Reames
f9198adb61 Rename nonvolatile_load/store to simple_load/store [NFC]
Implement the TODO from D66318.

llvm-svn: 371789
2019-09-12 23:03:39 +00:00
Philip Reames
ba1f39ccae [SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later.  See D66309 for context.

Differential Revision: https://reviews.llvm.org/D66318

llvm-svn: 371786
2019-09-12 22:49:17 +00:00
Tim Northover
1bb14916f2 AArch64: support arm64_32, an ILP32 slice for watchOS.
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.

llvm-svn: 371722
2019-09-12 10:22:23 +00:00
Amy Huang
062b5d40cb Reland "Change the X86 datalayout to add three address spaces
for 32 bit signed, 32 bit unsigned, and 64 bit pointers."
This reverts 57076d3199fc2b0af4a3736b7749dd5462cacda5.

Original review at https://reviews.llvm.org/D64931.
Review for added fix at https://reviews.llvm.org/D66843.

llvm-svn: 371568
2019-09-10 23:15:38 +00:00
Matt Arsenault
4ec23178ea AMDGPU/GlobalISel: Select atomic loads
A new check for an explicitly atomic MMO is needed to avoid
incorrectly matching pattern for non-atomic loads

llvm-svn: 371418
2019-09-09 16:18:07 +00:00
Tim Northover
ba4be9e30b GlobalISel: add combiner to form indexed loads.
Loosely based on DAGCombiner version, but this part is slightly simpler in
GlobalIsel because all address calculation is performed by G_GEP. That makes
the inc/dec distinction moot so there's just pre/post to think about.

No targets can handle it yet so testing is via a special flag that overrides
target hooks.

llvm-svn: 371384
2019-09-09 10:04:23 +00:00
Bjorn Pettersson
f8a47e6d1e [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

llvm-svn: 371308
2019-09-07 12:16:14 +00:00
Matt Arsenault
36ab62a361 GlobalISel: Add G_FMAD instruction
llvm-svn: 371254
2019-09-06 20:49:10 +00:00
Matt Arsenault
1493567d2b GlobalISel: Add G_BITREVERSE
This is the first failing pattern for AMDGPU and is trivial to handle.

llvm-svn: 370927
2019-09-04 17:06:53 +00:00
Matt Arsenault
52047150e9 GlobalISel: Define GINodeEquiv for undef
AMDGPU uses this for undef vector elements in some patterns which will
be enabled in a future patch.

llvm-svn: 370918
2019-09-04 16:19:29 +00:00
Ulrich Weigand
68d6f27a24 [SystemZ] Support constrained fpto[su]i intrinsics
Now that constrained fpto[su]i intrinsic are available,
add codegen support to the SystemZ backend.

In addition to pure back-end changes, I've also needed
to add the strict_fp_to_[su]int and any_fp_to_[su]int
pattern fragments in the obvious way.

llvm-svn: 370674
2019-09-02 16:49:29 +00:00
James Molloy
74fb57df4f [DFAPacketizer] Allow namespacing of automata per-itinerary
The Hexagon itineraries are cunningly crafted such that functional units between
itineraries do not clash. Because all itineraries are bundled into the same DFA,
a functional unit index clash would cause an incorrect DFA to be generated.

A workaround for this is to ensure all itineraries declare the universe of all
possible functional units, but this isn't ideal for three reasons:
  1) We only have a limited number of FUs we can encode in the packetizer, and
     using the universe causes us to hit the limit without care.
  2) Silent codegen faults are bad, and careful triage of the FU list shouldn't
     be required.
  3) Smooshing all itineraries into the same automaton allows combinations of
     instruction classes that cannot exist, which bloats the table.

A simple solution is to allow "namespacing" packetizers.

Differential Revision: https://reviews.llvm.org/D66940

llvm-svn: 370508
2019-08-30 19:50:49 +00:00
Matt Arsenault
5bfe49f2ac GlobalISel/TableGen: Handle setcc patterns
This is a special case because one node maps to two different G_
instructions, and the operand order is changed.

This mostly enables G_FCMP for AMDPGPU. G_ICMP is still manually
selected for now since it has the SALU and VALU complication to deal
with.

llvm-svn: 370280
2019-08-29 01:13:41 +00:00
Andrea Di Biagio
cf4aba9b59 [Tblgen][MCA] Add the ability to mark groups as LoadQueue and StoreQueue. NFCI
Before this patch, users were not allowed to optionally mark processor resource
groups as load/store queues. That is because tablegen class MemoryQueue was
originally declared as expecting a ProcResource template argument (instead of a
more generic ProcResourceKind).

That was an oversight, since the original intention from D54957 was to let user
mark any processor resource as either load/store queue.  This patch adds the
ability to use processor resource groups in MemoryQueue definitions. This is not
a user visible change.

Differential Revision: https://reviews.llvm.org/D66810

llvm-svn: 370091
2019-08-27 18:20:34 +00:00
Amara Emerson
b67ced190e [GlobalISel] Introduce a G_DYN_STACKALLOC opcode to represent dynamic allocas.
This just adds the opcode and verifier, it will be used to replace existing
dynamic alloca handling in a subsequent patch.

Differential Revision: https://reviews.llvm.org/D66677

llvm-svn: 369833
2019-08-24 02:25:56 +00:00
Francis Visoiu Mistrih
e5d8da9da8 [MachO][TLOF] Use hasLocalLinkage to determine if indirect symbol is local
Local symbols in the indirect symbol table contain the value
`INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must
contain the address of the target.

In r349060, I added support for local symbols in the indirect symbol
table, which was checking if the symbol `isDefined` && `!isExternal` to
determine if the symbol is local or not.

It turns out that `isDefined` will return false if the user of the
symbol comes before its definition, and we'll again generate .long 0
which will be the symbol at the adress 0x0.

Instead of doing that, use GlobalValue::hasLocalLinkage() to check if
the symbol is local.

Differential Revision: https://reviews.llvm.org/D66563

llvm-svn: 369671
2019-08-22 16:59:00 +00:00
Sam Tebbs
11317955ae [ARM] Add support for MVE vaddv
This patch adds vecreduce_add and the relevant instruction selection for
vaddv.

Differential revision: https://reviews.llvm.org/D66085

llvm-svn: 369245
2019-08-19 09:38:28 +00:00
Matt Arsenault
284e8e1c63 GlobalISel: Change representation of shuffle masks
Currently shufflemasks get emitted as any other constant, and you end
up with a bunch of virtual registers of G_CONSTANT with a
G_BUILD_VECTOR. The AArch64 selector then asserts on anything that
doesn't fit this pattern. This isn't an ideal representation, and
should avoid legalization and have fewer opportunities for a
representational error.

Rather than invent a new shuffle mask operand type, similar to what
ShuffleVectorSDNode does, just track the original IR Constant mask
operand. I don't completely like the idea of adding another link to
the IR, but MIR is already quite dependent on IR constants already,
and this will allow sharing the shuffle mask utility functions with
the IR.

llvm-svn: 368704
2019-08-13 15:34:38 +00:00
Daniel Sanders
233ed83478 [globalisel] Add G_SEXT_INREG
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
  %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
  %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 23

All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.

To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
  %1 = G_CONSTANT 16
  %2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
  %2 = G_SEXT_INREG %0, 16
or:
  %1 = G_CONSTANT 16
  %2:_(s32) = G_SHL %0, %1
  %3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61289

llvm-svn: 368487
2019-08-09 21:11:20 +00:00
David Green
b937fefeae [ARM] Add support for MVE pre and post inc loads and stores
This adds pre- and post- increment and decrements for MVE loads and stores. It
uses the builtin pre and post load/store detection, unlike Neon. Loads are
selected with the code in tryT2IndexedLoad, stores are selected with tablegen
patterns. The immediates have a +/-7bit range, multiplied by the size of the
element.

Differential Revision: https://reviews.llvm.org/D63840

llvm-svn: 368305
2019-08-08 15:27:58 +00:00
Matt Arsenault
cfa4ad28b6 AMDGPU: Start redefining atomic PatFrags
Start migrating to a form that will be compatible with the global isel
emitter. Also should fix some overly lax checks on the memory type,
which allowed mis-selecting some illegal atomics.

llvm-svn: 367506
2019-08-01 03:25:52 +00:00
Matt Arsenault
c6739e6709 GlobalISel: moreElementsVector for G_LOAD/G_STORE
AMDGPU change and test is a placeholder until a future patch with
complete handling.

llvm-svn: 367503
2019-08-01 01:44:22 +00:00
Matt Arsenault
3e43a6c1b7 TableGen: Add MinAlignment predicate
AMDGPU uses some custom code predicates for testing alignments.

I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.

llvm-svn: 367373
2019-07-31 00:14:43 +00:00
Matt Arsenault
005206df6c GlobalISel: Add G_ATOMICRMW_{FADD|FSUB}
llvm-svn: 367369
2019-07-30 23:56:30 +00:00
Stanislav Mekhanoshin
0c9d743a5c [AMDGPU] Allow register tuples to set asm names
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.

Differential Revision: https://reviews.llvm.org/D64967

llvm-svn: 366598
2019-07-19 18:05:01 +00:00
Matt Arsenault
fbb71b0b9e GlobalISel: Add GINodeEquiv for fcopysign
I don't need this at the moment, but it should be here.

llvm-svn: 366596
2019-07-19 17:32:19 +00:00
Matt Arsenault
43bcdb17a1 AMDGPU/GlobalISel: Selection for fminnum/fmaxnum
v2f16 case doesn't work yet because the VOP3P complex patterns haven't
been ported yet.

llvm-svn: 366585
2019-07-19 14:42:40 +00:00
Alex Bradbury
ee64434d8e [AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.

This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in
.gcc_except_table

Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.

llvm-svn: 366329
2019-07-17 14:00:35 +00:00
Matt Arsenault
100b40be1a TableGen: Add address space to matchers
Currently AMDGPU uses a CodePatPred to check address spaces from the
MachineMemOperand. Introduce a new first class property so that the
existing patterns can be easily modified to uses the new generated
predicate, which will also be handled for GlobalISel.

I would prefer these to match against the pointer type of the
instruction, but that would be difficult to get working with
SelectionDAG compatbility. This is much easier for now and will avoid
a painful tablegen rewrite for all the loads and stores.

I'm also not sure if there's a better way to encode multiple address
spaces in the table, rather than putting the number to expect.

llvm-svn: 366128
2019-07-15 20:59:42 +00:00
Matt Arsenault
2f9dc44acc GlobalISel: Define the full family of FP min/max instructions
llvm-svn: 365657
2019-07-10 16:31:15 +00:00
Francis Visoiu Mistrih
f0c4b59ebb [CodeGen] Make branch funnels pass the machine verifier
We previously marked all the tests with branch funnels as
`-verify-machineinstrs=0`.

This is an attempt to fix it.

1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList =
(outs)`

2) After that we hit an assert: ``` Assertion failed: (Op.getValueType()
!= MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue
operands should occur at end of operand list!"), function AddOperand,
file
/Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp,
line 461.  ```

The chain operand was added at the beginning of the operand list. Move
that to the end.

3) After that we hit another verifier issue in the pseudo expansion
where the registers used in the cmps and jmps are not added to the
livein lists. Add the `EFLAGS` to all the new MBBs that we create.

PR39436

Differential Review: https://reviews.llvm.org/D54155

llvm-svn: 365058
2019-07-03 17:16:45 +00:00
Matt Arsenault
7cda25fd8b CodeGen: Set hasSideEffects = 0 on BUNDLE
The BUNDLE itself should not have side effects, and this is a property
of instructions inside the bundle. The hasProperty check already
searches for any member instructions, which was pointless since it was
overridden by this bit.

Allows me to distinguish bundles that have side effects vs. do not in
a future patch. Also fixes an unnecessary scheduling barrier in the
bundle AMDGPU uses to get PC relative addresses.

llvm-svn: 364984
2019-07-03 00:30:47 +00:00
Matt Arsenault
cdf2a5a6c1 GlobalISel: Define GINodeEquiv for G_UMULH/G_SMULH
llvm-svn: 364931
2019-07-02 14:49:29 +00:00
Matt Arsenault
2e9237bd7c GlobalISel: Add G_FENCE
The pattern importer is for some reason emitting checks for G_CONSTANT
for the immediate operands.

llvm-svn: 364926
2019-07-02 14:16:39 +00:00
Matt Arsenault
72c77701f2 GlobalISel: Add GINodeEquiv for min/max
llvm-svn: 364759
2019-07-01 13:22:04 +00:00
Matt Arsenault
678857c4a3 GlobalISel: Add DAG compat for G_FCANONICALIZE
llvm-svn: 364758
2019-07-01 13:22:00 +00:00
Ulrich Weigand
8564d2aa2a Allow matching extend-from-memory with strict FP nodes
This implements a small enhancement to https://reviews.llvm.org/D55506

Specifically, while we were able to match strict FP nodes for
floating-point extend operations with a register as source, this
did not work for operations with memory as source.

That is because from regular operations, this is represented as
a combined "extload" node (which is a variant of a load SD node);
but there is no equivalent using a strict FP operation.

However, it turns out that even in the absence of an extload
node, we can still just match the operations explicitly, e.g.
   (strict_fpextend (f32 (load node:$ptr))

This patch implements that method to match the LDEB/LXEB/LXDB
SystemZ instructions even when the extend uses a strict-FP node.

llvm-svn: 364450
2019-06-26 17:19:12 +00:00
Djordje Todorovic
d91362a7a6 [TargetOption] Add option to ebanble the debug entry values
The option enables debug info about parameter's entry values.

([2/13] Introduce the debug entry values.)

Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>

Differential Revision: https://reviews.llvm.org/D60961

llvm-svn: 364395
2019-06-26 08:35:43 +00:00
Craig Topper
b1acf14a86 [X86][SelectionDAG] Cleanup and simplify masked_load/masked_store in tablegen. Use more precise PatFrags for scalar masked load/store.
Rename masked_load/masked_store to masked_ld/masked_st to discourage
their direct use. We need to check truncating/extending and
compressing/expanding before using them. This revealed that
our scalar masked load/store patterns were misusing these.

With those out of the way, renamed masked_load_unaligned and
masked_store_unaligned to remove the "_unaligned". We didn't
check the alignment anyway so the name was somewhat misleading.

Make the aligned versions inherit from masked_load/store instead
from a separate identical version. Merge the 3 different alignments
PatFrags into a single version that uses the VT from the SDNode to
determine the size that the alignment needs to match.

llvm-svn: 364150
2019-06-23 06:06:04 +00:00
Daniel Sanders
0aa9e9e230 Re-commit r363744: [tblgen][disasm] Allow multiple encodings to disassemble to the same instruction
It seems macOS lets you have ArrayRef<const X> even though this is apparently
forbidden by the language standard (Thanks MSVC++ for the clear error message).
Removed the problematic const's to fix this.

(It also seems I'm not receiving buildbot emails anymore and I'm trying to find
 out why. In the mean time I'll be polling lab.llvm.org to hopefully see if/when
 failures occur)

llvm-svn: 363753
2019-06-18 23:34:46 +00:00
Jordan Rupprecht
5f50076bbf Revert [tblgen][disasm] Allow multiple encodings to disassemble to the same instruction
This reverts r363744 (git commit 9b2252123d1e79d2b3594097a9d9cc60072b83d9)

This breaks many buildbots, e.g. http://lab.llvm.org:8011/builders/clang-atom-d525-fedora-rel/builds/203/steps/build%20stage%201/logs/stdio

llvm-svn: 363747
2019-06-18 22:21:31 +00:00
Daniel Sanders
532a61be1c [tblgen][disasm] Allow multiple encodings to disassemble to the same instruction
Summary:
Add an AdditionalEncoding class which can be used to define additional encodings
for a given instruction. This causes the disassembler to add an additional
encoding to its matching tables that map to the specified instruction.

Usage:
  def ADD1 : Instruction {
    bits<8> Reg;
    bits<32> Inst;

    let Size = 4;
    let Inst{0-7} = Reg;
    let Inst{8-14} = 0;
    let Inst{15} = 1; // Continuation bit
    let Inst{16-31} = 0;
    ...
  }
  def : AdditionalEncoding<ADD1> {
    bits<8> Reg;
    bits<16> Inst; // You can also have bits<32> and it will still be a 16-bit encoding
    let Size = 2;
    let Inst{0-3} = 0;
    let Inst{4-7} = Reg;
    let Inst{8-15} = 0;
    ...
  }
with those definitions, llvm-mc will successfully disassemble both of these:
  0x01 0x00
  0x10 0x80 0x00 0x00
to:
  ADD1 r1

Depends on D52366

Reviewers: bogner, charukcs

Reviewed By: bogner

Subscribers: nlguillemot, nhaehnle, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D52369

llvm-svn: 363744
2019-06-18 21:56:04 +00:00
Amara Emerson
a21f959530 [GlobalISel] Add a G_BRJT opcode.
This is a branch opcode that takes a jump table pointer, jump table index and an
index into the table to do an indirect branch.

We pass both the table pointer and JTI to allow targets like ARM64 to more
easily use the existing jump table compression optimization without having to
walk up the block to find a paired G_JUMP_TABLE.

Differential Revision: https://reviews.llvm.org/D63159

llvm-svn: 363434
2019-06-14 17:55:48 +00:00
Amara Emerson
9b31bc723f [GlobalISel] Add a G_JUMP_TABLE opcode.
This opcode generates a pointer to the address of the jump table
specified by the source operand, which is a jump table index.

It will be used in conjunction with an upcoming G_BRJT opcode to support
jump table codegen with GlobalISel.

Differential Revision: https://reviews.llvm.org/D63111

llvm-svn: 363096
2019-06-11 19:58:06 +00:00
Ulrich Weigand
fba10ebb96 Allow target to handle STRICT floating-point nodes
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).

This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.

To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:

- A new MCID flag "mayRaiseFPException" that the target should set on any
  instruction that possibly can raise FP exception according to the
  architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
  instruction resulting from expansion of any constrained FP intrinsic.

Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).

Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.

The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.

This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.

Reviewed By: andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D55506

llvm-svn: 362663
2019-06-05 22:33:10 +00:00
Adhemerval Zanella
2476ae717e [AArch64] Handle ISD::LRINT and ISD::LLRINT
This patch optimizes ISD::LRINT and ISD::LLRINT to frintx plus
fcvtzs. It currently only handles the scalar version.

Reviewed By: SjoerdMeijer, mstorsjo

Differential Revision: https://reviews.llvm.org/D62018

llvm-svn: 361877
2019-05-28 21:04:29 +00:00
Sjoerd Meijer
27556576bb [TargetMachine] error message unsupported code model
When the tiny code model is requested for a target machine that does not
support this, we get an error message (which is nice) but also this diagnostic
and request to submit a bug report:

    fatal error: error in backend: Target does not support the tiny CodeModel
    [Inferior 2 (process 31509) exited with code 0106]
    clang-9: error: clang frontend command failed with exit code 70 (use -v to see invocation)
    (gdb) clang version 9.0.0 (http://llvm.org/git/clang.git 29994b0c63a40f9c97c664170244a7bba5ecc15e) (http://llvm.org/git/llvm.git 95606fdf91c2d63a931e865f4b78b2e9828ddc74)
    Target: arm-arm-none-eabi
    Thread model: posix
    clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
    clang-9: note: diagnostic msg:
    ********************
    PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
    Preprocessed source(s) and associated run script(s) are located at:
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.c
    clang-9: note: diagnostic msg: /tmp/tiny-dfe1a2.sh
    clang-9: note: diagnostic msg:

But this is not a bug, this is a feature. :-) Not only is this not a bug, this
is also pretty confusing. This patch causes just to print the fatal error and
not the diagnostic:

fatal error: error in backend: Target does not support the tiny CodeModel

Differential Revision: https://reviews.llvm.org/D62236

llvm-svn: 361370
2019-05-22 10:40:26 +00:00
Leonard Chan
323fa05c6e [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55720

llvm-svn: 361289
2019-05-21 19:17:19 +00:00
Matt Arsenault
ec6a868f2e GlobalISel: Define integer min/max instructions
Doesn't attempt to emit them for anything yet, but some legalizations
I want to port use them.

llvm-svn: 361061
2019-05-17 18:36:31 +00:00
Adhemerval Zanella
101145b734 [AArch64] Handle ISD::LROUND and ISD::LLROUND
This patch optimizes ISD::LROUND and ISD::LLROUND to fcvtas
instruction. It currently only handles the scalar version.

llvm-svn: 360894
2019-05-16 13:30:18 +00:00
Matt Arsenault
d0fc07f9c6 GlobalISel: Add G_FCOPYSIGN
llvm-svn: 360850
2019-05-16 04:08:39 +00:00
Fangrui Song
5d5e6023b2 Fix typos: (re)?sor?uce -> (re)?source
Closes: https://github.com/llvm/llvm-project/pull/10

In-collaboration-with:	Olivier Cochard-Labbé <olivier@FreeBSD.org>
Signed-off-by: Enji Cooper <yaneurabeya@gmail.com>

Differential Revision: https://reviews.llvm.org/D61021

llvm-svn: 359277
2019-04-26 05:56:23 +00:00
Jessica Paquette
24c94cfae0 [GlobalISel] Add a G_FNEARBYINT opcode
For eventually selecting llvm.nearbyint. Equivalent to the SelectionDAG
nearbyint node.

Update legalizer-info-validation.mir.

Differential Revision: https://reviews.llvm.org/D60921

llvm-svn: 359201
2019-04-25 16:36:03 +00:00
Jessica Paquette
6966425d24 [GlobalISel] Add a G_FRINT opcode
Equivalent to SelectionDAG's frint node.

Differential Revision: https://reviews.llvm.org/D60891

llvm-svn: 358785
2019-04-19 21:44:16 +00:00
Tim Northover
606d5bbe91 DAG: propagate whether an arg is a pointer for CallingConv decisions.
The arm64_32 ABI specifies that pointers (despite being 32-bits) should be
zero-extended to 64-bits when passed in registers for efficiency reasons. This
means that the SelectionDAG needs to be able to tell the backend that an
argument was originally a pointer, which is implmented here.

Additionally, some memory intrinsics need to be declared as taking an i8*
instead of an iPTR.

There should be no CodeGen change yet, but it will be triggered when AArch64
backend support for ILP32 is added.

llvm-svn: 358398
2019-04-15 12:03:54 +00:00
Shiva Chen
4154cae682 [RISCV] Put data smaller than eight bytes to small data section
Because of gp = sdata_start_address + 0x800, gp with signed twelve-bit offset
could covert most of the small data section. Linker relaxation could transfer
the multiple data accessing instructions to a gp base with signed twelve-bit
offset instruction.

Differential Revision: https://reviews.llvm.org/D57493

llvm-svn: 358150
2019-04-11 04:59:13 +00:00
Matt Arsenault
571a94ef5d MIR: Allow targets to serialize MachineFunctionInfo
This has been a very painful missing feature that has made producing
reduced testcases difficult. In particular the various registers
determined for stack access during function lowering were necessary to
avoid undefined register errors in a large percentage of
cases. Implement a subset of the important fields that need to be
preserved for AMDGPU.

Most of the changes are to support targets parsing register fields and
properly reporting errors. The biggest sort-of bug remaining is for
fields that can be initialized from the IR section will be overwritten
by a default initialized machineFunctionInfo section. Another
remaining bug is the machineFunctionInfo section is still printed even
if empty.

llvm-svn: 356215
2019-03-14 22:54:43 +00:00
Amara Emerson
90869d8494 [AArch64][GlobalISel] Add some support for G_CONCAT_VECTORS.
Handles concatenating 2 x v2s32 and 2 x v4s16

Differential Revision: https://reviews.llvm.org/D59390

llvm-svn: 356212
2019-03-14 22:48:15 +00:00
Craig Topper
eef2c65e50 Recommit r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."
Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher.

Original commit message:

Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.

By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.

This removes something like 40,000 bytes from the X86 isel table.

Differential Revision: https://reviews.llvm.org/D58595

llvm-svn: 355784
2019-03-10 05:21:52 +00:00
Craig Topper
dde774575a Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary."
This caused the first matcher in the isel table for many targets to Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures.

llvm-svn: 355433
2019-03-05 19:18:16 +00:00
Craig Topper
dd0dabebc0 [TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary.
Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead of we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts.

By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up.

This removes something like 40,000 bytes from the X86 isel table.

Differential Revision: https://reviews.llvm.org/D58595

llvm-svn: 355224
2019-03-01 20:18:38 +00:00
Igor Kudrin
4ca8c7b7fa [llvm-objdump] Implement -Mreg-names-raw/-std options.
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.

The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
  which is the default behavior of llvm-objdump.

Differential Revision: https://reviews.llvm.org/D57680

llvm-svn: 354870
2019-02-26 12:15:14 +00:00
Simon Tatham
fb51a50eef [ARM] Make fullfp16 instructions not conditionalisable.
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.

In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.

(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)

So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.

I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.

Reviewers: SjoerdMeijer, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57823

llvm-svn: 354768
2019-02-25 10:39:53 +00:00
Diana Picus
48ab554b0e Fix obsolete comment. NFC
Both files mentioned in the comment now include TargetOpcodes.def. Just
mention that directly.

llvm-svn: 354316
2019-02-19 11:34:36 +00:00
Matt Arsenault
84e44687ee GlobalISel: Add G_FCANONICALIZE instruction
llvm-svn: 353719
2019-02-11 17:05:20 +00:00
Jessica Paquette
cb5da3f61a Recommit "[GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR""
After r353586, we won't fail on the AMDGPU floor pattern that was killing the
importer before.

llvm-svn: 353589
2019-02-09 00:37:31 +00:00
Craig Topper
ea7e6b3857 Implementation of asm-goto support in LLVM
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html

This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.

This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.

There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.

Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii

Differential Revision: https://reviews.llvm.org/D53765

llvm-svn: 353563
2019-02-08 20:48:56 +00:00
Jessica Paquette
b78928bcbd Revert "[GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR"
This reverts commit b05ecba6d687fcb3078509220c67458bf1d77a2e.

Apparently adding floor breaks AMDGPU somehow, so I have to back this out
while I look into it.

llvm-svn: 353065
2019-02-04 17:32:47 +00:00
Leonard Chan
7230ae9ced [Intrinsic] Unsigned Fixed Point Multiplication Intrinsic
Add an intrinsic that takes 2 unsigned integers with the scale of them
provided as the third argument and performs fixed point multiplication on
them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55625

llvm-svn: 353059
2019-02-04 17:18:11 +00:00
Jessica Paquette
b34b9ca867 [GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR
This introduces a generic opcode for floating point floor, working towards
selecting @llvm.floor.

Differential Revision: https://reviews.llvm.org/D57484

llvm-svn: 353057
2019-02-04 17:10:55 +00:00
Matt Arsenault
51c3d3146e GlobalISel: Allow bitcount ops to have different result type
For AMDGPU the result is always 32-bit for 64-bit inputs.

llvm-svn: 352717
2019-01-31 02:09:57 +00:00
Jessica Paquette
d46e179ece [GlobalISel][AArch64] Select G_FABS
This adds instruction selection support for G_FABS in AArch64. It also updates
the existing basic FP tests, adds a selection test for G_FABS.

https://reviews.llvm.org/D57418

llvm-svn: 352684
2019-01-30 22:54:21 +00:00
Jessica Paquette
2b66bd8551 [GlobalISel] Introduce a G_FSQRT generic instruction
This introduces a generic instruction for computing the floating point
square root of a value.

Right now, we can't select @llvm.sqrt, so this is working towards fixing that.

llvm-svn: 352668
2019-01-30 20:49:50 +00:00
Jessica Paquette
e2b76b0235 [GlobalISel] Add G_FSIN and G_FCOS generic instructions
This introduces generic instrutions for floating point sin and cos, G_FCOS and
G_FSIN. It updates the tests, etc.

https://reviews.llvm.org/D57197
1/3

llvm-svn: 352400
2019-01-28 18:34:16 +00:00
Matt Arsenault
cfaee823da GlobalISel: Allow shift amount to be a different type
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.

X86 uses i8, but seemed to be hacking around this before.

llvm-svn: 351882
2019-01-22 21:42:11 +00:00
Matt Arsenault
5be17fc66e GlobalISel: Disallow vectors for G_CONSTANT/G_FCONSTANT
llvm-svn: 351853
2019-01-22 18:53:41 +00:00
Matt Arsenault
71e3843c7f Codegen support for atomicrmw fadd/fsub
llvm-svn: 351851
2019-01-22 18:36:06 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Reid Kleckner
fa6cb76678 [X86] Deduplicate static calling convention helpers for code size, NFC
Summary:
Right now we include ${TGT}GenCallingConv.inc once per each instruction
selection method implemented by ${TGT}:
- ${TGT}ISelLowering.cpp
- ${TGT}CallLowering.cpp
- ${TGT}FastISel.cpp

Instead, add a mechanism to tablegen for marking a particular convention
as "External", which causes tablegen to emit into the ::llvm namespace,
instead of as a static helper. This allows us to provide a header to
forward declare it, so we can simply call the function from all the
places it is referenced. Typically the calling convention analyzer is
called indirectly, so it doesn't benefit from inlining.

This saves a bit of final binary size, but mostly just saves object file
size:

before  after   diff   artifact
12852K  12492K  -360K  X86ISelLowering.cpp.obj
4640K   4280K   -360K  X86FastISel.cpp.obj
1704K   2092K   +388K  X86CallingConv.cpp.obj
52448K  52336K  -112K  llc.exe

I didn't collect before numbers for X86CallLowering.cpp.obj, which is
for GlobalISel, but we should save 360K there as well.

This patch applies the strategy to the X86 backend, but there is no
reason it couldn't be applied to the other backends that implement
multiple ISel strategies, like AArch64.

Reviewers: craig.topper, hfinkel, efriedma

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D56883

llvm-svn: 351616
2019-01-19 00:33:02 +00:00
Matt Arsenault
416efa3eb6 GlobalISel: Add comment to clarify G_BUILD_VECTOR
llvm-svn: 351428
2019-01-17 10:50:07 +00:00
Jessica Paquette
5e7f539f83 [GlobalISel][AArch64] Add support for @llvm.ceil
This adds a G_FCEIL generic instruction and uses it in AArch64. This adds
selection for floating point ceil where it has a supported, dedicated
instruction. Other cases aren't handled here.

It updates the relevant gisel tests and adds a select-ceil test. It also adds a
check to arm64-vcvt.ll which ensures that we don't fall back when we run into
one of the relevant cases.

llvm-svn: 349664
2018-12-19 19:01:36 +00:00
Scott Linder
538dfe3e23 Implement -frecord-command-line (-frecord-gcc-switches)
Implement options in clang to enable recording the driver command-line
in an ELF section.

Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.

This differs from the GCC implementation in some key ways:

* In GCC there is only one command-line possible per compilation-unit,
  in LLVM it mirrors llvm.ident and multiple are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
  command-lines are separated by NULL bytes. The advantage of the GCC
  approach is to clearly delineate options in the face of embedded
  spaces. The advantage of the LLVM approach is to support merging
  multiple command-lines unambiguously, while handling embedded spaces
  with escaping.

Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489

llvm-svn: 349155
2018-12-14 15:38:15 +00:00
Leonard Chan
8c2c853a8e [Intrinsic] Signed Fixed Point Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D54719

llvm-svn: 348912
2018-12-12 06:29:14 +00:00
Evandro Menezes
c839f35d10 [AArch64] Refactor the Exynos scheduling predicates
Refactor the scheduling predicates based on `MCInstPredicate`.  In this
case, for the Exynos processors.

Differential revision: https://reviews.llvm.org/D55345

llvm-svn: 348774
2018-12-10 17:17:26 +00:00
Jessica Paquette
6ddf71d313 [GlobalISel] Add IR translation support for the @llvm.log10 intrinsic
This adds IR translation support for @llvm.log10 and updates relevant tests.

https://reviews.llvm.org/D55392

llvm-svn: 348657
2018-12-07 22:08:02 +00:00
David Green
76448ad394 [Targets] Add errors for tiny and kernel codemodel on targets that don't support them
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.

Differential Revision: https://reviews.llvm.org/D50141

llvm-svn: 348585
2018-12-07 12:10:23 +00:00
Amara Emerson
5901f854c7 [GlobalISel] Introduce G_BUILD_VECTOR, G_BUILD_VECTOR_TRUNC and G_CONCAT_VECTOR opcodes.
These opcodes are intended to subsume some of the capability of G_MERGE_VALUES,
as it was too powerful and thus complex to add deal with throughout the GISel
pipeline.

G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed
scalar values. G_BUILD_VECTOR_TRUNC is a special opcode for handling scalar
operands which are larger than the destination vector element type, and
therefore does an implicit truncate.

G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed,
vectors together.

These will be used in a subsequent commit. This commit just adds the initial
infrastructure.

Differential Revision: https://reviews.llvm.org/D53594

llvm-svn: 348430
2018-12-05 23:53:30 +00:00
Simon Pilgrim
0a9765cd5c [DAG] Add fshl/fshr tblgen opcodes
Missed off from https://reviews.llvm.org/D54698

llvm-svn: 348358
2018-12-05 11:55:33 +00:00
Oliver Stannard
eb66331216 [ARM][MC] Move information about variadic register defs into tablegen
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.

This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstruction already tracks use/def per operand in each instance
of the instruction, so can already represent this.

This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.

Differential revision: https://reviews.llvm.org/D54853

llvm-svn: 348114
2018-12-03 10:32:42 +00:00
Nicolai Haehnle
aff3e1dfef TableGen/ISel: Allow PatFrag predicate code to access captured operands
Summary:
This simplifies writing predicates for pattern fragments that are
automatically re-associated or commuted.

For example, a followup patch adds patterns for fragments of the form
(add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are
automatically commuted to (add $z, (shl $x, $y)), which makes it basically
impossible to refer to $x, $y, and $z generically in the PredicateCode.

With this change, the PredicateCode can refer to $x, $y, and $z simply
as `Operands[i]`.

Test confirmed that there are no changes to any of the generated files
when building all (non-experimental) targets.

Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c

Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand

Subscribers: wdng, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D51994

llvm-svn: 347992
2018-11-30 14:15:13 +00:00
Petr Pavlu
a73160fd34 [GlobalISel] Make EnableGlobalISel always set when GISel is enabled
Change meaning of TargetOptions::EnableGlobalISel. The flag was
previously set only when a target switched on GlobalISel but it is now
always set when the GlobalISel pipeline is enabled. This makes the flag
consistent with TargetOptions::EnableFastISel and allows its use in
other parts of the compiler to determine when GlobalISel is enabled.

The EnableGlobalISel flag had previouly only one use in
TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value
to determine if GlobalISel was enabled by a target and returned false in
such a case. To preserve the current behaviour, a new flag
TargetOptions::GlobalISelAbort is introduced to separately record the
abort behaviour.

Differential Revision: https://reviews.llvm.org/D54518

llvm-svn: 347861
2018-11-29 12:56:32 +00:00
Andrea Di Biagio
a900121de4 [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666).
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.

A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues.  Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`.  Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).

At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model.  If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.

With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are not correctly classified either as "load
queue full" or "store queue full".

About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage.  This is the main reason why for the modified tests, the load/store
queues gets full before PdEx is full.

Differential Revision: https://reviews.llvm.org/D54957

llvm-svn: 347857
2018-11-29 12:15:56 +00:00
Evandro Menezes
c46bcdb565 [TableGen] Emit more variant transitions
`llvm-mca` relies on the predicates to be based on `MCSchedPredicate` in order
to resolve the scheduling for variant instructions.  Otherwise, it aborts
the building of the instruction model early.

However, the scheduling model emitter in `TableGen` gives up too soon, unless
all processors use only such predicates.

In order to allow more processors to be used with `llvm-mca`, this patch
emits scheduling transitions if any processor uses these predicates.  The
transition emitted for the processors using legacy predicates is the one
specified with `NoSchedPred`, which is based on `MCSchedPredicate`.

Preferably, `llvm-mca` should instead assume a reasonable default when a
variant transition is not based on `MCSchedPredicate` for a given processor.
This issue should be revisited in the future.

Differential revision: https://reviews.llvm.org/D54648

llvm-svn: 347504
2018-11-23 21:17:33 +00:00
Simon Pilgrim
805e175f61 [SelectionDAG] Move (repeated) SDTIntShiftDOp double shift node def to common code. NFCI.
Prep work for PR39467.

llvm-svn: 347067
2018-11-16 17:50:59 +00:00
Craig Topper
62d16b9aa6 [SelectionDAG][X86] Relax restriction on the width of an input to *_EXTEND_VECTOR_INREG. Use them and regular *_EXTEND to replace the X86 specific VSEXT/VZEXT opcodes
Previously, the extend_vector_inreg opcode required their input register to be the same total width as their output. But this doesn't match up with how the X86 instructions are defined. For X86 the input just needs to be a legal type with at least enough elements to cover the output.

This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.

X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.

I believe DAG combine works correctly with the released restriction because it doesn't check the number of input elements.

The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.

Differential Revision: https://reviews.llvm.org/D54346

llvm-svn: 346784
2018-11-13 19:45:21 +00:00
Clement Courbet
21390a9b77 [llvm-exegesis][NFC] Add a way to declare the default counter binding for unbound CPUs for a target.
Summary:
This simplifies the code and moves everything to tablegen for consistency. This
also prepares the ground for adding issue counters.

Reviewers: gchatelet, john.brawn, jsji

Subscribers: nemanjai, mgorny, javed.absar, kbarton, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54297

llvm-svn: 346489
2018-11-09 13:15:32 +00:00
Matthias Braun
81989e4ac5 TargetMachine: Move lib/CodeGen specific callbacks to LLVMTargetMachine; NFC
llvm-svn: 346184
2018-11-05 23:49:15 +00:00
Daniel Sanders
a377d6ce6a [globalisel] Add comments indicating the operand order
llvm-svn: 345769
2018-10-31 19:49:37 +00:00
Andrea Di Biagio
b2b609ef71 [tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands.
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:

  Foo(MI->getOperand(0).getImm()) == ExpectedVal;

Here, the immediate operand value at index zero is passed in input to function
Foo, and ExpectedVal is compared against the value returned by function Foo.

While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.

With this patch, we allow users to specify if a register/immediate operand value
needs to be passed in input to a function as part of the predicate check. Now,
register/immediate operand checks all derive from base class CheckOperandBase.

This patch also changes where TIIPredicate definitions are expanded by the
instructon info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).

This patch is a non functional change for the existing scheduling models.
In future, we will be able to use this richer syntax to better describe complex
scheduling predicates, and expose them to llvm-mca.

Differential Revision: https://reviews.llvm.org/D53880

llvm-svn: 345714
2018-10-31 12:28:05 +00:00
Leonard Chan
4d14f937a7 [Intrinsic] Signed and Unsigned Saturation Subtraction Intirnsics
Add an intrinsic that takes 2 integers and perform saturation subtraction on
them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53783

llvm-svn: 345512
2018-10-29 16:54:37 +00:00
Andrea Di Biagio
5ab5aec206 [tblgen] Improve comments in TargetInstrPredicate.td. NFC
llvm-svn: 345399
2018-10-26 16:22:26 +00:00
Francis Visoiu Mistrih
d842f3f52d [CodeGen] Remove out operands from PATCHABLE_OP
The current model requires 1 out operand, but it is not used nor created.

This fixed an x86 machine verifier issue.

Part of PR27481.

llvm-svn: 345384
2018-10-26 13:37:25 +00:00
Francis Visoiu Mistrih
6670e950e2 [CodeGen] Remove operands from FENTRY_CALL
FENTRY_CALL is actually not taking any input / output operands. The
machine verifier complains now because the target description says that:

* It needs 1 unknown output
* It needs 1 or more variable inputs

llvm-svn: 345316
2018-10-25 21:12:15 +00:00
Amara Emerson
aa8a544e25 [GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index.
Allows for better imported pattern re-use.

llvm-svn: 345265
2018-10-25 14:04:54 +00:00
Clement Courbet
dc9ae03db9 [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).

This also compresses the pfm counter tables (PR37068).

Reviewers: RKSimon, gchatelet

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D52932

llvm-svn: 345243
2018-10-25 07:44:01 +00:00
Thomas Lively
1b7df70a34 Make fminimum/fmaximum SDNodes commutative and associative
Reviewers: aheejin, dschuff

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53680

llvm-svn: 345220
2018-10-24 23:14:59 +00:00
Thomas Lively
e649dc137e [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Leonard Chan
26225ba254 [Intrinsic] Unigned Saturation Addition Intrinsic
Add an intrinsic that takes 2 integers and perform unsigned saturation
addition on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53340

llvm-svn: 344971
2018-10-22 23:08:40 +00:00
Matt Arsenault
bc7384fc93 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Leonard Chan
adef4ea9d2 [Intrinsic] Signed Saturation Addition Intrinsic
Add an intrinsic that takes 2 integers and perform saturation addition on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D53053

llvm-svn: 344629
2018-10-16 17:35:41 +00:00
Andrea Di Biagio
23f0c22225 [tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen.
This patch adds the ability to identify instructions that are "move elimination
candidates". It also allows scheduling models to describe processor register
files that allow move elimination.

A move elimination candidate is an instruction that can be eliminated at
register renaming stage.
Each subtarget can specify which instructions are move elimination candidates
with the help of tablegen class "IsOptimizableRegisterMove" (see
llvm/Target/TargetInstrPredicate.td).

For example, on X86, BtVer2 allows both GPR and MMX/SSE moves to be eliminated.
The definition of 'IsOptimizableRegisterMove' for BtVer2 looks like this:

```
def : IsOptimizableRegisterMove<[
  InstructionEquivalenceClass<[
    // GPR variants.
    MOV32rr, MOV64rr,

    // MMX variants.
    MMX_MOVQ64rr,

    // SSE variants.
    MOVAPSrr, MOVUPSrr,
    MOVAPDrr, MOVUPDrr,
    MOVDQArr, MOVDQUrr,

    // AVX variants.
    VMOVAPSrr, VMOVUPSrr,
    VMOVAPDrr, VMOVUPDrr,
    VMOVDQArr, VMOVDQUrr
  ], CheckNot<CheckSameRegOperand<0, 1>> >
]>;
```

Definitions of IsOptimizableRegisterMove from processor models of a same
Target are processed by the SubtargetEmitter to auto-generate a target-specific
override for each of the following predicate methods:

```
bool TargetSubtargetInfo::isOptimizableRegisterMove(const MachineInstr *MI)
const;
bool MCInstrAnalysis::isOptimizableRegisterMove(const MCInst &MI, unsigned
CPUID) const;
```

By default, those methods return false (i.e. conservatively assume that there
are no move elimination candidates).

Tablegen class RegisterFile has been extended with the following information:
 - The set of register classes that allow move elimination.
 - Maxium number of moves that can be eliminated every cycle.
 - Whether move elimination is restricted to moves from registers that are
   known to be zero.

This patch is structured in three part:

A first part (which is mostly boilerplate) adds the new
'isOptimizableRegisterMove' target hooks, and extends existing register file
descriptors in MC by introducing new fields to describe properties related to
move elimination.

A second part, uses the new tablegen constructs to describe move elimination in
the BtVer2 scheduling model.

A third part, teaches llm-mca how to query the new 'isOptimizableRegisterMove'
hook to mark instructions that are candidates for move elimination. It also
teaches class RegisterFile how to describe constraints on move elimination at
PRF granularity.

llvm-mca tests for btver2 show differences before/after this patch.

Differential Revision: https://reviews.llvm.org/D53134

llvm-svn: 344334
2018-10-12 11:23:04 +00:00
Clement Courbet
6514f1c243 [llvm-exegesis] Add support for measuring NumMicroOps.
Summary:
Example output for vzeroall:

---
mode:            uops
key:
  instructions:
    - 'VZEROALL'
  config:          ''
  register_initial_values:
cpu_name:        haswell
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { debug_string: HWPort0, value: 0.0006, per_snippet_value: 0.0006,
      key: '3' }
  - { debug_string: HWPort1, value: 0.0011, per_snippet_value: 0.0011,
      key: '4' }
  - { debug_string: HWPort2, value: 0.0004, per_snippet_value: 0.0004,
      key: '5' }
  - { debug_string: HWPort3, value: 0.0018, per_snippet_value: 0.0018,
      key: '6' }
  - { debug_string: HWPort4, value: 0.0002, per_snippet_value: 0.0002,
      key: '7' }
  - { debug_string: HWPort5, value: 1.0019, per_snippet_value: 1.0019,
      key: '8' }
  - { debug_string: HWPort6, value: 1.0033, per_snippet_value: 1.0033,
      key: '9' }
  - { debug_string: HWPort7, value: 0.0001, per_snippet_value: 0.0001,
      key: '10' }
  - { debug_string: NumMicroOps, value: 20.0069, per_snippet_value: 20.0069,
      key: NumMicroOps }
error:           ''
info:            ''
assembled_snippet: C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C5FC77C3
...

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D52539

llvm-svn: 343094
2018-09-26 11:22:56 +00:00
Fangrui Song
556d8103ca Use unique_ptr to hold AsmInfo,MRI,MII,STI
Reviewers: pcc, dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52389

llvm-svn: 342945
2018-09-25 06:19:31 +00:00
Andrea Di Biagio
db9fd3fc9a [TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions.
This patch adds the ability for processor models to describe dependency breaking
instructions.

Different processors may specify a different set of dependency-breaking
instructions.
That means, we cannot assume that all processors of the same target would use
the same rules to classify dependency breaking instructions.

The main goal of this patch is to provide the means to describe dependency
breaking instructions directly via tablegen, and have the following
TargetSubtargetInfo hooks redefined in overrides by tabegen'd
XXXGenSubtargetInfo classes (here, XXX is a Target name).

```
virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
  return false;
}

virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
  return isZeroIdiom(MI);
}
```

An instruction MI is a dependency-breaking instruction if a call to method
isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to
true. Similarly, an instruction MI is a special case of zero-idiom dependency
breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine
operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of
dependency-breaking. In the absence of external information, those method calls
would always return false.

A new tablegen class named STIPredicate has been added by this patch to let
processor models classify instructions that have properties in common. The idea
is that, a MCInstrPredicate definition can be used to "generate" an instruction
equivalence class, with the idea that instructions of a same class all have a
property in common.

STIPredicate definitions are essentially a collection of instruction equivalence
classes.
Also, different processor models can specify a different variant of the same
STIPredicate with different rules (i.e. predicates) to classify instructions.
Tablegen backends (in this particular case, the SubtargetEmitter) will be able
to process STIPredicate definitions, and automatically generate functions in
XXXGenSubtargetInfo.

This patch introduces two special kind of STIPredicate classes named
IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a
definition for those in the BtVer2 scheduling model only.

This patch supersedes the one committed at r338372 (phabricator review: D49310).

The main advantages are:
 - We can describe subtarget predicates via tablegen using STIPredicates.
 - We can describe zero-idioms / dep-breaking instructions directly via
   tablegen in the scheduling models.

In future, the STIPredicates framework can be used for solving other problems.
Examples of future developments are:
 - Teach how to identify optimizable register-register moves
 - Teach how to identify slow LEA instructions (each subtarget defining its own
   concept of "slow" LEA).
 - Teach how to identify instructions that have undocumented false dependencies
   on the output registers on some processors only.

It is also (in my opinion) an elegant way to expose knowledge to both external
tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when
internally constructing the data dependency graph for a code region.

This new design feature is also an "opt-in" feature. Processor models don't have
to use the new STIPredicates. It has all been designed to be as unintrusive as
possible.

Differential Revision: https://reviews.llvm.org/D52174

llvm-svn: 342555
2018-09-19 15:57:45 +00:00
Craig Topper
eee28696cb [SelectionDAG] Remove masked_gather/scatter from TargetSelectionDAG.td.
These aren't used in tree and the number of operands in the type profile is wrong. X86 uses its own ISD opcode and type profile after op legalization.

llvm-svn: 340899
2018-08-29 04:45:33 +00:00
Aditya Nandakumar
4e615a2119 [GISel]: Add missing opcodes for overflow intrinsics
https://reviews.llvm.org/D51197

Currently, IRTranslator (and GISel) seems to be arbitrarily picking
which overflow intrinsics get mapped into opcodes which either have a
carry as an input or not.
For intrinsics such as Intrinsic::uadd_with_overflow, translate it to an
opcode (G_UADDO) which doesn't have any carry inputs (similar to LLVM
IR).

This patch adds 4 missing opcodes for completeness - G_UADDO, G_USUBO,
G_SSUBE and G_SADDE.

llvm-svn: 340865
2018-08-28 18:54:10 +00:00
Craig Topper
f92ce9e064 [SelectionDAG][X86] Reorder the operands the MaskedStoreSDNode to put the value first.
Summary:
Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions.

This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly.

X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree.

Reviewers: RKSimon, delena, hfinkel, eli.friedman

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50402

llvm-svn: 340689
2018-08-25 17:48:17 +00:00
David Green
61ff754920 [AArch64] Add Tiny Code Model for AArch64
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.

This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.

Differential Revision: https://reviews.llvm.org/D49673

llvm-svn: 340397
2018-08-22 11:31:39 +00:00
Heejin Ahn
8e7bbf9abf [WebAssembly] Add isEHScopeReturn instruction property
Summary:
So far, `isReturn` property is used to mean both a return instruction
from a functon and the end of an EH scope, a scope that starts with a EH
scope entry BB and ends with a catchret or a cleanupret instruction.
Because WinEH uses funclets, all EH-scope-ending instructions are also
real return instruction from a function. But for wasm, they only serve
as the end marker of an EH scope but not a return instruction that
exits a function. This mismatch caused incorrect prolog and epilog
generation in wasm EH scopes. This patch fixes this.

This patch is in the same vein with rL333045, which splits
`MachineBasicBlock::isEHFuncletEntry` into `isEHFuncletEntry` and
`isEHScopeEntry`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D50653

llvm-svn: 340325
2018-08-21 19:44:11 +00:00
Aditya Nandakumar
1dea0652da Revert "Revert rr340111 "[GISel]: Add Legalization/lowering code for bit counting operations""
This reverts commit d1341152d91398e9a882ba2ee924147ea2f9b589.

This patch originally made use of Nested MachineIRBuilder buildInstr
calls, and since order of argument processing is not well defined, the
instructions were built slightly in a different order (still correct).
I've removed the nested buildInstr calls to have a defined order now.

Patch was tested by Mikael.

llvm-svn: 340309
2018-08-21 17:30:31 +00:00
Aditya Nandakumar
7515ede575 Revert "Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics"
This reverts commit 7debc334e6421bb5251ef8f18e97166dfc7dd787.

I missed updating legalizer-info-validation.mir as I had assertions
turned off in my build and that specific test requires asserts. Fixed it
now.

llvm-svn: 340197
2018-08-20 18:43:19 +00:00
Reid Kleckner
b3c566c278 Revert rr340111 "[GISel]: Add Legalization/lowering code for bit counting operations"
It causes LegalizerHelperTest.LowerBitCountingCTTZ1 to fail.

llvm-svn: 340186
2018-08-20 16:50:19 +00:00
Aditya Nandakumar
d12808310c [GISel]: Add Legalization/lowering code for bit counting operations
https://reviews.llvm.org/D48847#inline-448257

Ported legalization expansions for CTLZ/CTTZ from DAG to GISel.

Reviewed by rtereshin.

llvm-svn: 340111
2018-08-18 00:01:54 +00:00
Chandler Carruth
6edb18f7dd Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics
This is breaking ~all the bots.

llvm-svn: 339982
2018-08-17 04:47:16 +00:00
Aditya Nandakumar
5123be13d4 [GISel]: Add Opcodes for a few LLVM Intrinsics
https://reviews.llvm.org/D50401

Add opcodes for llvm.intrinsic.trunc, round, and update the IRTranslator
for the same.

Reviewed by: dsanders.

llvm-svn: 339977
2018-08-17 01:41:56 +00:00
Andrea Di Biagio
8add466d59 [Tablegen][MCInstPredicate] Removed redundant template argument from class TIIPredicate, and implemented verification rules for TIIPredicates.
This patch removes redundant template argument `TargetName` from TIIPredicate.
Tablegen can always infer the target name from the context. So we don't need to
force users of TIIPredicate to always specify it.

This allows us to better modularize the tablegen class hierarchy for the
so-called "function predicates". class FunctionPredicateBase has been added; it
is currently used as a building block for TIIPredicates. However, I plan to
reuse that class to model other function predicate classes too (i.e. not just
TIIPredicates). For example, this can be a first step towards implementing
proper support for dependency breaking instructions in tablegen.

This patch also adds a verification step on TIIPredicates in tablegen.
We cannot have multiple TIIPredicates with the same name. Otherwise, this will
cause build errors later on, when tablegen'd .inc files are included by cpp
files and then compiled.

Differential Revision: https://reviews.llvm.org/D50708

llvm-svn: 339706
2018-08-14 18:36:54 +00:00
Andrea Di Biagio
e84b02f47e [Tablegen][SubtargetEmitter] Improve expansion of predicates of a variant scheduling class.
This patch refactors the logic that expands predicates of a variant scheduling
class.

The idea is to improve the readability of the auto-generated code by removing
redundant parentheses around predicate expressions, and by removing redundant
if(true) statements.

This patch replaces the definition of NoSchedPred in TargetSchedule.td with an
instance of MCSchedPredicate. The new definition is sematically equivalent to
the previous one. The main difference is that now SubtargetEmitter knows that it
represents predicate "true".

Before this patch, we always generated an if (true) for the default transition
of a variant scheduling class.

Example (taken from AArch64GenSubtargetInfo.inc) :

```
if (SchedModel->getProcessorID() == 3) { // CycloneModel
  if ((TII->isScaledAddr(*MI)))
    return 927; // (WriteIS_WriteLD)_ReadBaseRS
  if ((true))
    return 928; // WriteLD_ReadDefault
}
```

Extra parentheses were also generated around the predicate expressions.

With this patch, we get the following auto-generated checks:

```
if (SchedModel->getProcessorID() == 3) { // CycloneModel
  if (TII->isScaledAddr(*MI))
    return 927; // (WriteIS_WriteLD)_ReadBaseRS
  return 928; // WriteLD_ReadDefault
}
```

The new auto-generated code behaves exactly the same as before. So, technically
this is a non functional change.

Differential revision: https://reviews.llvm.org/D50566

llvm-svn: 339552
2018-08-13 11:09:04 +00:00
Reid Kleckner
5f0f124d6a [MC] Move EH DWARF encodings from MC to CodeGen, NFC
Summary:
The TType encoding, LSDA encoding, and personality encoding are all
passed explicitly by CodeGen to the assembler through .cfi_* directives,
so only the AsmPrinter needs to know about them.

The FDE CFI encoding however, controls the encoding of the label
implicitly created by the .cfi_startproc directive. That directive seems
to be special in that it doesn't take an encoding, so the assembler just
has to know how to encode one DSO-local label reference from .eh_frame
to .text.

As a result, it looks like MC will continue to have to know when the
large code model is in use. Perhaps we could invent a '.cfi_startproc
[large]' flag so that this knowledge doesn't need to pollute the
assembler.

Reviewers: davide, lliu0, JDevlieghere

Subscribers: hiraditya, fedor.sergeev, llvm-commits

Differential Revision: https://reviews.llvm.org/D50533

llvm-svn: 339397
2018-08-09 22:24:04 +00:00
Andrea Di Biagio
6413a2dd47 [MC][PredicateExpander] Extend the grammar to support simple switch and return statements.
This patch introduces tablegen class MCStatement.

Currently, an MCStatement can be either a return statement, or a switch
statement.

```
MCStatement:
   MCReturnStatement
   MCOpcodeSwitchStatement
```

A MCReturnStatement expands to a return statement, and the boolean expression
associated with the return statement is described by a MCInstPredicate.

An MCOpcodeSwitchStatement is a switch statement where the condition is a check
on the machine opcode. It allows the definition of multiple checks, as well as a
default case. More details on the grammar implemented by these two new
constructs can be found in the diff for TargetInstrPredicates.td.

This patch makes it easier to read the body of auto-generated TargetInstrInfo
predicates.

In future, I plan to reuse/extend the MCStatement grammar to describe more
complex target hooks. For now, this is just a first step (mostly a minor
cosmetic change to polish the new predicates framework).

Differential Revision: https://reviews.llvm.org/D50457

llvm-svn: 339352
2018-08-09 15:32:48 +00:00
Craig Topper
bb1bde9246 [SelectionDAG][X86][SystemZ] Add a generic nonvolatile_store/nonvolatile_load pattern fragment in TargetSelectionDAG.td
Differential Revision: https://reviews.llvm.org/D50358

llvm-svn: 339156
2018-08-07 17:34:59 +00:00
Andrea Di Biagio
7444c6289e [Tablegen] In TargetSchedule.td: Remove unused argument pfmCounters from ProcResourceUnits.
PFM counters don't need to be passed in input to the definition of
ProcResourceUnits. class PfmIssueCounter (see r329675) is used to map resources
to PFM counter(s).

Differential Revision: https://reviews.llvm.org/D50333

llvm-svn: 339125
2018-08-07 10:33:46 +00:00
Aditya Nandakumar
6188b99859 [GISel]: Add Opcodes for CTLZ/CTTZ/CTPOP
https://reviews.llvm.org/D48600

Added IRTranslator support to translate these known intrinsics into GISel opcodes.

llvm-svn: 338944
2018-08-04 01:22:12 +00:00
Lei Liu
4779c3a4b6 [AArch64] DWARF: do not generate AT_location for thread local
AArch64 ELF ABI does not define a static relocation type for TLS offset within
a module, which makes it impossible for compiler to generate a valid
DW_AT_location content for thread local variables. Currently LLVM generates an
invalid R_AARCH64_ABS64 relocation at the DW_AT_location field for a TLS
variable. That causes trouble for linker because thread local variable does
not have an absolute address at link time. AArch64 GCC solves the problem by
not generating DW_AT_location for thread local variables. We should do the
same in LLVM.

Differential Revision: https://reviews.llvm.org/D43860

llvm-svn: 338655
2018-08-01 23:46:49 +00:00
Amara Emerson
836debbb53 [GlobalISel] Add a G_BLOCK_ADDR opcode to handle IR blockaddress constants.
Differential Revision: https://reviews.llvm.org/D49900

llvm-svn: 338335
2018-07-31 00:08:50 +00:00
Fangrui Song
121474a01b Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Andrea Di Biagio
667da2dc91 [TargetInstPredicate] Add definition of CheckInvalidRegisterOperand.
This should have been part of r337378. I forgot to svn add it before committing
the change.

llvm-svn: 337380
2018-07-18 11:16:31 +00:00
Peter Collingbourne
52168cfcad CodeGen: Add a target option for emitting .addrsig directives for all address-significant symbols.
Differential Revision: https://reviews.llvm.org/D48143

llvm-svn: 337331
2018-07-17 22:40:08 +00:00
Fangrui Song
c76988a0c0 [CodeGen] Fix inconsistent declaration parameter name
llvm-svn: 337200
2018-07-16 18:51:40 +00:00
Andrea Di Biagio
c19db3b1d5 [llvm-mca][BtVer2] teach how to identify false dependencies on partially written
registers.

The goal of this patch is to improve the throughput analysis in llvm-mca for the
case where instructions perform partial register writes.

On x86, partial register writes are quite difficult to model, mainly because
different processors tend to implement different register merging schemes in
hardware.

When the code contains partial register writes, the IPC (instructions per
cycles) estimated by llvm-mca tends to diverge quite significantly from the
observed IPC (using perf).

Modern AMD processors (at least, from Bulldozer onwards) don't rename partial
registers. Quoting Agner Fog's microarchitecture.pdf:
" The processor always keeps the different parts of an integer register together.
For example, AL and AH are not treated as independent by the out-of-order
execution mechanism. An instruction that writes to part of a register will
therefore have a false dependence on any previous write to the same register or
any part of it."

This patch is a first important step towards improving the analysis of partial
register updates. It changes the semantic of RegisterFile descriptors in
tablegen, and teaches llvm-mca how to identify false dependences in the presence
of partial register writes (for more details: see the new code comments in
include/Target/TargetSchedule.h - class RegisterFile).

This patch doesn't address the case where a write to a part of a register is
followed by a read from the whole register.  On Intel chips, high8 registers
(AH/BH/CH/DH)) can be stored in separate physical registers. However, a later
(dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which
adds extra latency (and potentially affects the pipe usage).
This is a very interesting article on the subject with a very informative answer
from Peter Cordes:
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to

In future, the definition of RegisterFile can be extended with extra information
that may be used to identify delays caused by merge opcodes triggered by a dirty
read of a partial write.

Differential Revision: https://reviews.llvm.org/D49196

llvm-svn: 337123
2018-07-15 11:01:38 +00:00
Joel Galenson
9249622410 [cfi-verify] Support AArch64.
This patch adds support for AArch64 to cfi-verify.

This required three changes to cfi-verify.  First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64).  Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction).  Third, we needed to ensure that return instructions are not counted as indirect branches.  Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior.

In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not).

Differential Revision: https://reviews.llvm.org/D48836

llvm-svn: 337007
2018-07-13 15:19:33 +00:00
Ulrich Weigand
535942804d [TableGen] Support multi-alternative pattern fragments
A TableGen instruction record usually contains a DAG pattern that will
describe the SelectionDAG operation that can be implemented by this
instruction. However, there will be cases where several different DAG
patterns can all be implemented by the same instruction. The way to
represent this today is to write additional patterns in the Pattern
(or usually Pat) class that map those extra DAG patterns to the
instruction. This usually also works fine.

However, I've noticed cases where the current setup seems to require
quite a bit of extra (and duplicated) text in the target .td files.
For example, in the SystemZ back-end, there are quite a number of
instructions that can implement an "add-with-overflow" operation.
The same instructions also need to be used to implement just plain
addition (simply ignoring the extra overflow output). The current
solution requires creating extra Pat pattern for every instruction,
duplicating the information about which particular add operands
map best to which particular instruction.

This patch enhances TableGen to support a new PatFrags class, which
can be used to encapsulate multiple alternative patterns that may
all match to the same instruction.  It operates the same way as the
existing PatFrag class, except that it accepts a list of DAG patterns
to match instead of just a single one.  As an example, we can now define
a PatFrags to match either an "add-with-overflow" or a regular add
operation:

  def z_sadd : PatFrags<(ops node:$src1, node:$src2),
                        [(z_saddo node:$src1, node:$src2),
                         (add node:$src1, node:$src2)]>;

and then use this in the add instruction pattern:

  defm AR : BinaryRRAndK<"ar", 0x1A, 0xB9F8, z_sadd, GR32, GR32>;

These SystemZ target changes are implemented here as well.


Note that PatFrag is now defined as a subclass of PatFrags, which
means that some users of internals of PatFrag need to be updated.
(E.g. instead of using PatFrag.Fragment you now need to use
!head(PatFrag.Fragments).)


The implementation is based on the following main ideas:
- InlinePatternFragments may now replace each original pattern
  with several result patterns, not just one.
- parseInstructionPattern delays calling InlinePatternFragments
  and InferAllTypes.  Instead, it extracts a single DAG match
  pattern from the main instruction pattern.
- Processing of the DAG match pattern part of the main instruction
  pattern now shares most code with processing match patterns from
  the Pattern class.
- Direct use of main instruction patterns in InferFromPattern and
  EmitResultInstructionAsOperand is removed; everything now operates
  solely on DAG match patterns.


Reviewed by: hfinkel

Differential Revision: https://reviews.llvm.org/D48545

llvm-svn: 336999
2018-07-13 13:18:00 +00:00
Francis Visoiu Mistrih
f144cbd9ce [XRay] Fix machine verifier issues in X86
I'm not sure if this fix is the right thing to do, but it seemed to me
that PATCHABLE_RET and PATCHABLE_TAIL_CALL don't have any defs.

Running the following:

```
LLVM_ENABLE_MACHINE_VERIFIER=1 ./build/bin/llvm-lit -v -a test/CodeGen/X86/xray-*
```

results in the following tests to fail (along others):

```
LLVM :: CodeGen/X86/xray-attribute-instrumentation.ll
LLVM :: CodeGen/X86/xray-custom-log.ll
LLVM :: CodeGen/X86/xray-log-args.ll
LLVM :: CodeGen/X86/xray-loop-detection.ll
LLVM :: CodeGen/X86/xray-multiplerets-in-blocks.mir
LLVM :: CodeGen/X86/xray-section-group.ll
LLVM :: CodeGen/X86/xray-selective-instrumentation.ll
LLVM :: CodeGen/X86/xray-tail-call-sled.ll
LLVM :: CodeGen/X86/xray-typed-event-log.ll
```

The errors are:

```
*** Bad machine code: Explicit definition must be a register ***
- function:    fn
- basic block: %bb.0  (0x7fa31a84d908)
- instruction: PATCHABLE_RET 2560, $eax
- operand 0:   2560
```

and

```
*** Bad machine code: Explicit definition must be a register ***
- function:    caller
- basic block: %bb.0  (0x7fbff3044108)
- instruction: PATCHABLE_TAIL_CALL 3009, @callee, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $rsp, implicit $ssp, implicit $edi
- operand 0:   3009
```

Differential Revision: https://reviews.llvm.org/D49187

llvm-svn: 336906
2018-07-12 14:36:43 +00:00
Jessica Paquette
2c6ef18647 [MachineOutliner] Add support for target-default outlining.
This adds functionality to the outliner that allows targets to
specify certain functions that should be outlined from by default.

If a target supports default outlining, then it specifies that in
its TargetOptions. In the case that it does, and the user hasn't
specified that they *never* want to outline, the outliner will
be added to the pass pipeline and will run on those default functions.

This is a preliminary patch for turning the outliner on by default
under -Oz for AArch64.

https://reviews.llvm.org/D48776

llvm-svn: 336040
2018-06-30 03:56:03 +00:00
Jessica Paquette
848840c089 [MachineOutliner] Define MachineOutliner support in TargetOptions
Targets should be able to define whether or not they support the outliner
without the outliner being added to the pass pipeline. Before this, the
outliner pass would be added, and ask the target whether or not it supports the
outliner.

After this, it's possible to query the target in TargetPassConfig, before the
outliner pass is created. This ensures that passing -enable-machine-outliner
will not modify the pass pipeline of any target that does not support it.

https://reviews.llvm.org/D48683

llvm-svn: 335887
2018-06-28 17:45:43 +00:00
Matthias Braun
719039ac2c SelectionDAGBuilder, mach-o: Skip trap after noreturn call (for Mach-O)
Add NoTrapAfterNoreturn target option which skips emission of traps
behind noreturn calls even if TrapUnreachable is enabled.

Enable the feature on Mach-O to save code size; Comments suggest it is
not possible to enable it for the other users of TrapUnreachable.

rdar://41530228

DifferentialRevision: https://reviews.llvm.org/D48674
llvm-svn: 335877
2018-06-28 17:00:45 +00:00
Aditya Nandakumar
ec43439ab4 [GISel]: Add G_ADDRSPACE_CAST Opcode
Added IRTranslator support for addrspacecast.

https://reviews.llvm.org/D48469

reviewed by: volkan

llvm-svn: 335388
2018-06-22 20:58:51 +00:00
Daniel Sanders
50735a1161 [globalisel][tablegen] Add support for C++ predicates on PatFrags and use it to support BFC on ARM.
So far, we've only handled special cases of PatFrag like ImmLeaf. This patch
adds support for the remaining cases using similar mechanisms.

Like most C++ code from SelectionDAG, GISel and DAGISel expect to operate on
different types and representations and as such the code is not compatible
between the two. It's therefore necessary to add an alternative implementation
in the GISelPredicateCode field.

The target test for this feature could easily be done with IntImmLeaf and this
would save on a little boilerplate. The reason I've chosen to implement this
using PatFrag.GISelPredicateCode and not IntImmLeaf is because I was unable to
find a rule that was blocked solely by lack of support for PatFrag predicates. I
found that the ones I investigated as being likely candidates for the test
were further blocked by other things.

llvm-svn: 334871
2018-06-15 23:13:43 +00:00
Clement Courbet
e3e2fa9c0a [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles.
Summary:
For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files.
For more obvious cases, I've ventured a fix.

Some notes:
 - Exynos is especially fishy.
 - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice.
 - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit.

Also see PR37310.

Reviewers: RKSimon, craig.topper, javed.absar

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 334586
2018-06-13 09:41:49 +00:00
Andrea Di Biagio
c26fd3adc6 [RFC][Patch 1/3] Add a new class of predicates for variant scheduling classes.
This patch is the first of a sequence of three patches described by the LLVM-dev
RFC "MC support for variant scheduling classes".
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html

The goal of this patch is to introduce a new class of scheduling predicates for
SchedReadVariant and SchedWriteVariant.

An MCSchedPredicate can be used instead of a normal SchedPredicate to model
checks on the instruction (either a MachineInstr or a MCInst).
Internally, an MCSchedPredicate encapsulates an MCInstPredicate definition.
MCInstPredicate allows the definition of expressions with a well-known semantic,
that can be used to generate code for both MachineInstr and MCInst.

This is the first step toward teaching to tools like lllvm-mca how to resolve
variant scheduling classes.

Differential Revision: https://reviews.llvm.org/D46695

llvm-svn: 333282
2018-05-25 15:55:37 +00:00
Petar Jovanovic
8c311a61bc [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Simon Dardis
b7fb5f7be7 [FastISel] Permit instructions to be skipped for FastISel generation.
Some ISA's such as microMIPS32(R6) have instructions which are near identical
for code generation purposes, e.g. xor and xor16. These instructions take the
same value types for operands and return values, have the same
instruction predicates and map to the same ISD opcode. (These instructions do
differ by register classes.)

In such cases, the FastISel generator rejects the instruction definition.

This patch borrows the 'FastIselShouldIgnore' bit from rL129692 and enables
applying it to an instruction definition.

Reviewers: mcrosier

Differential Revision: https://reviews.llvm.org/D46953

llvm-svn: 332983
2018-05-22 14:36:58 +00:00
Peter Collingbourne
7f9b89f0e2 CodeGen: Add a dwo output file argument to addPassesToEmitFile and hook it up to dwo output.
Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47089

llvm-svn: 332881
2018-05-21 20:16:41 +00:00
Shiva Chen
1127e5026d [DebugInfo] Convert intrinsic llvm.dbg.label to MachineInstr.
In order to convert LLVM IR to MachineInstr, we need a new TargetOpcode,
DBG_LABEL, to ‘lower’ intrinsic llvm.dbg.label. The patch
creates this new TargetOpcode and convert intrinsic llvm.dbg.label to
MachineInstr through SelectionDAG.

In SelectionDAG, debug information is stored in SDDbgInfo. We create a
new data member of SDDbgInfo for labels and use the new data member,
SDDbgLabel, to create DBG_LABEL MachineInstr.

The new DBG_LABEL MachineInstr uses label metadata from LLVM IR as its
parameter. So, the backend could get metadata information of labels from
DBG_LABEL MachineInstr.

Differential Revision: https://reviews.llvm.org/D45341

Patch by Hsiangkai Wang.

llvm-svn: 331842
2018-05-09 02:41:08 +00:00
Daniel Sanders
96623363e9 [globalisel] Update GlobalISel emitter to match new representation of extending loads
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch changes the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar

Reviewed By: rtereshin

Subscribers: javed.absar, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D45541

llvm-svn: 331601
2018-05-05 20:53:24 +00:00
Adrian Prantl
076a6683eb Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Craig Topper
1a57227400 [X86] Restrict many of the InstAliases to either to only att or intel syntax. NFCI
Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic

This patch restricts a lot of these to only one variant so we don't get the duplication.

This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.

llvm-svn: 331117
2018-04-28 18:46:11 +00:00
Daniel Sanders
491fd6384e [globalisel][legalizerinfo] Introduce dedicated extending loads and add lowerings for them
Summary:
Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:
* G_LOAD had to be legal for all sizes you could extend from, even if
  registers didn't naturally hold those sizes.
* All sizes you could extend from had to be allocatable just in case the
  extend went missing (e.g. by optimization).
* At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we
  improve optimization of extends and truncates, this legality requirement
  would spread without considerable care w.r.t when certain combines were
  permitted.
* The SelectionDAG importer required some ugly and fragile pattern
  rewriting to translate patterns into this style.

This patch begins changing the representation to:
* (G_[SZ]EXTLOAD x)
* (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()
which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

This patch introduces the new generic instructions and new variation on
G_LOAD and adds lowering for them to convert back to the existing
representations.

Depends on D45466

Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, aemerson, javed.absar

Reviewed By: aemerson

Subscribers: aemerson, kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45540

llvm-svn: 331115
2018-04-28 18:14:50 +00:00
Eric Christopher
9e3701d460 Remove unused argument from emitModuleMetadata.
NFCI.

llvm-svn: 330470
2018-04-20 19:07:57 +00:00
Keith Wyss
2c7eab97b2 [XRay] Typed event logging intrinsic
Summary:
Add an LLVM intrinsic for type discriminated event logging with XRay.
Similar to the existing intrinsic for custom events, but also accepts
a type tag argument to allow plugins to be aware of different types
and semantically interpret logged events they know about without
choking on those they don't.

Relies on a symbol defined in compiler-rt patch D43668. I may wait
to submit before I can see demo everything working together including
a still to come clang patch.

Reviewers: dberris, pelikan, eizan, rSerge, timshen

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45633

llvm-svn: 330219
2018-04-17 21:30:29 +00:00
Clement Courbet
221e6ccd99 [MC][TableGen] Add optional libpfm counter names for ProcResUnits.
Summary:
Subtargets can define the libpfm counter names that can be used to
measure cycles and uops issued on ProcResUnits.
This allows making llvm-exegesis available on more targets.
Fixes PR36984.

Reviewers: gchatelet, RKSimon, andreadb, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D45360

llvm-svn: 329675
2018-04-10 08:16:37 +00:00
Andrea Di Biagio
01d033c489 [MC][Tablegen] Allow models to describe the retire control unit for llvm-mca.
This patch adds the ability to describe properties of the hardware retire
control unit.

Tablegen class RetireControlUnit has been added for this purpose (see
TargetSchedule.td).

A RetireControlUnit specifies the size of the reorder buffer, as well as the
maximum number of opcodes that can be retired every cycle.

A zero (or negative) value for the reorder buffer size means: "the size is
unknown". If the size is unknown, then llvm-mca defaults it to the value of
field SchedMachineModel::MicroOpBufferSize.  A zero or negative number of
opcodes retired per cycle means: "there is no restriction on the number of
instructions that can be retired every cycle".

Models can optionally specify an instance of RetireControlUnit. There can only
be up-to one RetireControlUnit definition per scheduling model.

Information related to the RCU (RetireControlUnit) is stored in (two new fields
of) MCExtraProcessorInfo.  llvm-mca loads that information when it initializes
the DispatchUnit / RetireControlUnit (see Dispatch.h/Dispatch.cpp).

This patch fixes PR36661.

Differential Revision: https://reviews.llvm.org/D45259

llvm-svn: 329304
2018-04-05 15:41:41 +00:00
Andrea Di Biagio
f425ba9f0f [MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.

A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
 - The total number of physical registers.
 - Which target registers are accessible through the register file.
 - The cost of allocating a register at register renaming stage.

Example (from this patch - see file X86/X86ScheduleBtVer2.td)

  def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>

Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.

The syntax allows to specify an empty set of register classes.  An empty set of
register classes means: this register file models all the registers specified by
the Target.  For each register class, users can specify an optional register
cost. By default, register costs default to 1.  A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".

This patch is structured in two parts.

* Part 1 - MC/Tablegen *

A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.

Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.

* Part 2 - llvm-mca related *

The second part of this patch is related to changes to llvm-mca.

The main differences are:
 1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
 2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
 3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.

The effect of point 3. is also visible in tests register-files-[1..5].s.

Differential Revision: https://reviews.llvm.org/D44980

llvm-svn: 329067
2018-04-03 13:36:24 +00:00
David Blaikie
938d7fe477 Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
David Blaikie
af9abae7af Fix layering by moving Support/CodeGenCWrappers.h to Target
This includes llvm-c/TargetMachine.h which is logically part of
libTarget (since libTarget implements llvm-c/TargetMachine.h's
functions).

llvm-svn: 328394
2018-03-23 23:58:21 +00:00
David Blaikie
0bf763875a Move TargetLoweringObjectFile from CodeGen to Target to fix layering
It's implemented in Target & include from other Target headers, so the
header should be in Target.

llvm-svn: 328392
2018-03-23 23:58:19 +00:00
Krzysztof Parzyszek
e7d626ae4a [X86] Add phony registers for high halves of regs with low halves
Registers E[A-D]X, E[SD]I, E[BS]P, and EIP have 16-bit subregisters
that cover the low halves of these registers. This change adds artificial
subregisters for the high halves in order to differentiate (in terms of
register units) between the 32- and the low 16-bit registers.

This patch contains parts that aim to preserve the calculated register
pressure. This is in order to preserve the current codegen (minimize the
impact of this patch). The approach of having artificial subregisters
could be used to fix PR23423, but the pressure calculation would need
to be changed.

Differential Revision: https://reviews.llvm.org/D43353

llvm-svn: 328016
2018-03-20 18:46:55 +00:00
Nicolai Haehnle
a64a4772e8 TableGen: Check the dynamic type of !cast<Rec>(string)
Summary:
The docs already claim that this happens, but so far it hasn't. As a
consequence, existing TableGen files get this wrong a lot, but luckily
the fixes are all reasonably straightforward.

To make this work with all the existing forms of self-references (since
the true type of a record is only built up over time), the lookup of
self-references in !cast is delayed until the final resolving step.

Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164

Reviewers: arsenm, craig.topper, tra, MartinO

Subscribers: wdng, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D44475

llvm-svn: 327849
2018-03-19 14:14:20 +00:00
Craig Topper
ead1756da7 [TableGen] When trying to reuse a scheduler class for instructions from an InstRW, make sure we haven't already seen another InstRW containing this instruction on this CPU.
This is similar to the check later when we remap some of the instructions from one class to a new one. But if we reuse the class we don't get to do that check.

So many CPUs have violations of this check that I had to add a flag to the SchedMachineModel to allow it to be disabled. Hopefully we can get those cleaned up quickly and remove this flag.

A lot of the violations are due to overlapping regular expressions, but that's not the only kind of issue it found.

llvm-svn: 327808
2018-03-18 19:56:15 +00:00
Matt Arsenault
3a3a83a86f TargetMachine: Add address space to getPointerSize
llvm-svn: 327467
2018-03-14 00:36:23 +00:00
Peter Collingbourne
1834ad0e4a Use branch funnels for virtual calls when retpoline mitigation is enabled.
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.

This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.

The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.

For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html

Differential Revision: https://reviews.llvm.org/D42453

llvm-svn: 327163
2018-03-09 19:11:44 +00:00
Volkan Keles
6a307ed4fd GlobalISel: IRTranslate llvm.fabs.* intrinsic
Summary:
Fabs is a common floating-point operation, especially for some expansions. This patch adds
a new generic opcode for llvm.fabs.* intrinsic in order to avoid building/matching this intrinsic.

Reviewers: qcolombet, aditya_nandakumar, dsanders, rovka

Reviewed By: aditya_nandakumar

Subscribers: kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D43864

llvm-svn: 326749
2018-03-05 22:31:55 +00:00
Chih-Hung Hsieh
13e607925f [TLS] use emulated TLS if the target supports only this mode
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.

Differential Revision: https://reviews.llvm.org/D42999

llvm-svn: 326341
2018-02-28 17:48:55 +00:00