1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 04:02:41 +01:00
Commit Graph

30976 Commits

Author SHA1 Message Date
Craig Topper
a7bbefe73d [LegalizeTypes][VE] Don't Expand BITREVERSE/BSWAP during type legalization promotion if they will be promoted for NVT in op legalization.
We were trying to expand these if they were going to be expanded
in op legalization so that we generated the minimum number of
operations. We failed to take into account that NVT could be
promoted to another legal type in op legalization.

Hoping this fixes the issue on the VE target reported as a follow
up to D96681. The check line changes were taken from before
1e46b6f4012399a2fef5fbbb4ed06fc919835414 so this patch does
appear to improve some cases that had previously regressed.
2021-06-29 11:00:11 -07:00
Jeremy Morse
cc491fbe18 Catch an extremely obvious memory leak, thanks asan
https://lab.llvm.org/buildbot/#/builders/5/builds/9208

(dbg-phis-merging-in-ldv.mir and dbg-phis-with-loops.mir in the asan
 check stage)
2021-06-29 15:47:17 +01:00
Jeremy Morse
62063d6d86 [DebugInstrRef][3/3] Follow DBG_PHI instructions through LiveDebugValues
This patch reads machine value numbers from DBG_PHI instructions (marking
where SSA PHIs used to be), and matches them up with DBG_INSTR_REF
instructions that refer to them. Essentially they are two separate parts of
a DBG_VALUE: the place to read the value (register and program position),
and where the variable is assigned that value.

Sometimes these DBG_PHIs can be duplicated, usually by tail duplication.
This corresponds to the SSA structure of the program being destroyed, and
the original PHI being split. When this happens: run LLVMs standard
SSAUpdater utility, to work out what values should appear in which blocks.
The majority of this patch is boilerplate to make use of SSAUpdater.

If there are any additional PHIs on the path between multiple DBG_PHIs and
their using DBG_INSTR_REF, their existance is validated, just in case a
value gets clobbered along the way (see dbg-phis-with-loops.mir for
several examples).

Differential Revision: https://reviews.llvm.org/D86814
2021-06-29 14:45:13 +01:00
Michael Liao
58a44149ab [MIRParser] Add machine metadata.
- Add standalone metadata parsing support so that machine metadata nodes
  could be populated before and accessed during MIR is parsed.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103282
2021-06-28 22:29:36 -04:00
Scott Linder
47e3a5ca06 [DebugInfo] Enforce implicit constraints on distinct MDNodes
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:

* DIExpression can currently be parsed from IR or read from bitcode
  as `distinct`, but this property is silently dropped when printing
  to IR. This causes accepted IR to fail to round-trip. As DIExpression
  appears inline at each use in the canonical form of IR, it cannot
  actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
  restricted to only appearing in contexts where there is no syntax for
  `distinct`, but for consistency it is treated equivalently to
  DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
  along with adding general support for the inverse restriction I went
  ahead and described this in Metadata.def and updated the parser to be
  general. Future nodes which have this restriction can share this
  support.

The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.

The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.

A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:

    !named = !{!0}
    !0 = !DIExpression()

Instead we would only accept the equivalent inlined version:

    !named = !{!DIExpression()}

This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline. If this
patch is accepted as-is, the result would be that the non-canonical
version is accepted, but the following would be an error and produce a diagnostic:

    !named = !{!0}
    ; error: 'distinct' not allowed for !DIExpression()
    !0 = distinct !DIExpression()

Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.

Reviewed By: StephenTozer, t-tye

Differential Revision: https://reviews.llvm.org/D104827
2021-06-28 21:20:04 +00:00
Melanie Blower
423a70f3f3 [llvm][clang][fpenv] Create new intrinsic llvm.arith.fence to control FP optimization at expression level
This intrinsic blocks floating point transformations by the optimizer.

Author: Pengfei

Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn

Differential Revision: https://reviews.llvm.org/D99675
2021-06-28 12:26:52 -04:00
Sander de Smalen
a82143ea32 Reland [GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize.
This patch relands https://reviews.llvm.org/D104454, but fixes some failing
builds on Mac OS which apparently has a different definition for size_t,
that caused 'ambiguous operator overload' for the implicit conversion
of TypeSize to a scalar value.

This reverts commit b732e6c9a8438e5204ac96c8ca76f9b11abf98ff.
2021-06-28 15:24:27 +01:00
Ahsan Saghir
74fd9c1f8e Teach peephole optimizer to not emit sub-register defs
Peephole optimizer should not be introducing sub-reg definitions
as they are illegal in machine SSA phase. This patch modifies
the optimizer to not emit sub-register definitions.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103408
2021-06-28 09:24:07 -05:00
Brendon Cahoon
16a6bb6581 [AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow both constant or register
values for the offset and width operands.

The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.

There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.

Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.

Differential Revision: https://reviews.llvm.org/D100149
2021-06-28 09:06:44 -04:00
David Blaikie
dfefcbf53b PR37255: DebugInfo: LTO with -g inlined into -gmlt combined with Split DWARF without CU cross-references
A combination of features ^ that lead to a mismatch of expectations
about how a subprogram definition DIE would be produced with/without a
declaration when taking full -g debug info and inlining it into a -gmlt
CU - specifically when using Split DWARF that doesn't support cross-CU
references, so we have to put the -g debug info into the -gmlt CU, which
gets confusing about which mode is respected.

This patch comes down on respecting the CU the debug info is emitted
into, rather than preserving the full debug info when it's emitted into
the gmlt CU.
2021-06-27 14:40:38 -07:00
David Green
6a316cf978 [ISel] Port AArch64 SABD and UABD to DAGCombine
This ports the AArch64 SABD and USBD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
a ubds/abdu which is zexted to the original type.

Differential Revision: https://reviews.llvm.org/D91937
2021-06-26 19:34:16 +01:00
David Green
20d5a26896 [DAG] Fold neg(splat(neg(x)) -> splat(x)
This add as a fold of sub(0, splat(sub(0, x))) -> splat(x). This can
come up in the lowering of right shifts under AArch64, where we generate
a shift left of a negated number.

Differential Revision: https://reviews.llvm.org/D103755
2021-06-25 19:53:29 +01:00
Hendrik Greving
36b5d3f1af [ModuloSchedule] Pass loop block explicitly to kernel rewriter.
This change is NFC upstream. We pass in the loop's block to the kernel
rewriter explicitly, instead of assuming it's the loop's top block. This
change is made for downstream targets where this assumption doesn't hold.

Differential Revision: https://reviews.llvm.org/D104811
2021-06-25 09:51:22 -07:00
Sander de Smalen
4d07cbe876 Revert "[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize."
This patch seems to be causing build errors, reverting it for now.

This reverts commit aeab9d9570ac8cb554aff6e1af24a471fdf5b4e5.
2021-06-25 17:37:16 +01:00
Sander de Smalen
9d34fb6e49 [GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize.
To reflect that the size may be scalable, a TypeSize is returned
instead of an unsigned. In places where the result is used,
it currently relies on an implicit cast of TypeSize -> uint64_t,
which asserts that the type is not scalable.

This patch is NFC for fixed-width vectors.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104454
2021-06-25 17:06:50 +01:00
Sander de Smalen
5eab663b62 [GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount.
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104453
2021-06-25 15:54:00 +01:00
Sander de Smalen
88c55d538f [GlobalISel] NFC: Change LLT::scalarOrVector to take ElementCount.
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104452
2021-06-25 11:26:16 +01:00
Martin Storsjö
9d14adb9f6 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
2021-06-25 00:22:01 +03:00
Craig Topper
c1782c733f [TargetLowering][ARM] Don't alter opaque constants in TargetLowering::ShrinkDemandedConstant.
We don't constant fold based on demanded bits elsewhere in
SimplifyDemandedBits, so I don't think we should shrink them either.

The affected ARM test changes because a constant become non-opaque
and eventually enabled some constant folding. This no longer happens.
I checked and InstCombine is able to simplify this test. I'm not sure exactly
what it was trying to test.

Reviewed By: lebedev.ri, dmgreen

Differential Revision: https://reviews.llvm.org/D104832
2021-06-24 10:09:36 -07:00
Sander de Smalen
ac11cfc716 [GlobalISel] NFC: Change LLT::vector to take ElementCount.
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector

The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
  same number of elements, then use LLT::vector(OtherTy.getElementCount())
  or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2)
  or operator*. That is because there is no reason to specifically restrict
  the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
  just use fixed_vector. This will need to be fixed up in the future when
  modifying the algorithm to also work for scalable vectors, and will need
  then need additional tests to confirm the behaviour works the same for
  scalable vectors.
* If the test used the '/*Scalable=*/true` flag of LLT::vector, then
  this is replaced by LLT::scalable_vector.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104451
2021-06-24 11:26:12 +01:00
Stephen Tozer
782c047ef4 Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.

Differential Revision: https://reviews.llvm.org/D91722

This reverts commit 386b66b2fc297cda121a3cc8a36887a6ecbcfc68.
2021-06-24 09:46:38 +01:00
Carl Ritson
e6a4177023 [ValueTypes] Define MVTs for v3i64/v3f64 to complement v6i32/v6f32
Having type symmetry with these is somewhat necessary when implementing support for 192-bit values.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104621
2021-06-24 12:41:22 +09:00
modimo
e702056af0 [NFC] [DwarfEHPrepare] Add additional stats for EH
Stats added:

1. NumCleanupLandingPadsUnreachable: how many cleanup landing pads were optimized as unreachable
1. NumCleanupLandingPadsRemaining: how many cleanup landing pads remain
1. NumNoUnwind: Number of functions with nounwind attribute
1. NumUnwind: Number of functions with unwind attribute

DwarfEHPrepare is always run a single time as part of `TargetPassConfig::addISelPasses()` which makes it an ideal place near the end of the pipeline to record this information.

Example output from clang built with exceptions cumulative during thinLTO backend (NumCleanupLandingPadsUnreachable was not incremented):

	"dwarfehprepare.NumCleanupLandingPadsRemaining": 123660,
	"dwarfehprepare.NumNoUnwind": 323836,
	"dwarfehprepare.NumUnwind": 472893,

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104161
2021-06-23 17:09:30 -07:00
Craig Topper
db11347a21 [CGP][RISCV] Teach CodeGenPrepare::optimizeSwitchInst to honor isSExtCheaperThanZExt.
This optimization pre-promotes the input and constants for a
switch instruction to a legal type so that all the generated compares
share the same extend. Since RISCV prefers sext for i32 to i64
extends, we should honor that to use sext.w instead of a pair
of shifts.

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D104612
2021-06-23 15:38:11 -07:00
Xun Li
3626fc401a [SjLj] Insert UnregisterFn before musttail call
When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104807
2021-06-23 15:33:55 -07:00
Xun Li
99fd766094 Revert "[SjLj] Insert UnregisterFn before musttail call"
This reverts commit f36703ada3dc18388ef5cdcbb8f39f74c27ad8e9.
Test failure: https://lab.llvm.org/buildbot#builders/104/builds/3450
2021-06-23 15:31:35 -07:00
Xun Li
6c523b4fc3 [SjLj] Insert UnregisterFn before musttail call
When inserting UnregisterFn, if there is a musttail call, we must insert before the call so that we don't break the musttail call contract.

Differential Revision: https://reviews.llvm.org/D104807
2021-06-23 14:29:46 -07:00
Jinsong Ji
4cae41ddbe [DAGCombine] Check reassoc flags in aggressive fsub fusion
The is from discussion in https://reviews.llvm.org/D104247#inline-993387

The contract and reassoc flags shouldn't imply each other .

All the aggressive fsub fusion reassociate operations,
we should guard them with reassoc flag check.

Reviewed By: mcberg2017

Differential Revision: https://reviews.llvm.org/D104723
2021-06-23 13:59:40 +00:00
Jon Roelofs
f2b70884ff [Remarks] Make memsize remarks report as an analysis, not a missed opportunity.
Differential revision: https://reviews.llvm.org/D104078
2021-06-22 18:22:47 -07:00
zhijian
3f6513e8c5 [AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table.
Summary:

generate eh_info when vector registers are saved according to the traceback table.

struct eh_info_t {

unsigned version;       /* EH info version 0 */
#if defined(64BIT)

char _pad[4];           /* padding */
#endif

unsigned long lsda;     /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};

the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function

Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651
2021-06-22 13:01:31 -04:00
Fangrui Song
61ffec433f Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular)
Before: `warning: stack size limit exceeded (888) in main`
After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned)

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D104667
2021-06-22 09:55:20 -07:00
Sander de Smalen
fd053d5ffe [GlobalISel] Add scalable property to LLT types.
This patch aims to add the scalable property to LLT. The rest of the
patch-series changes the interfaces to take/return ElementCount and
TypeSize, which both have the ability to represent the scalable property.

The changes are mostly mechanical and aim to be non-functional changes
for fixed-width vectors.

For scalable vectors some unit tests have been added, but no effort has
been put into making any of the GlobalISel algorithms work with scalable
vectors yet. That will be left as future work.

The work is split into a series of 5 patches to make reviews easier.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D104450
2021-06-22 08:43:34 +01:00
Eli Friedman
0c81356419 Rename MachineMemOperand::getOrdering -> getSuccessOrdering.
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving.  This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.

We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.

Differential Revision: https://reviews.llvm.org/D103338
2021-06-21 16:49:27 -07:00
Nick Desaulniers
2aca733d9e [IR] convert warn-stack-size from module flag to fn attr
Otherwise, this causes issues when building with LTO for object files
that use different values.

Link: https://github.com/ClangBuiltLinux/linux/issues/1395

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D104342
2021-06-21 15:09:25 -07:00
Jinsong Ji
4d8c0e830b [DAGCombine] reassoc flag shouldn't enable contract
According to IR LangRef, the FMF flag:

contract
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add).

reassoc
Allow reassociation transformations for floating-point instructions.
This may dramatically change results in floating-point.

My understanding is that these two flags shouldn't imply each other,
as we might have a SDNode that can be reassociated with others, but
not contractble.

eg: We may want following fmul/fad/fsub to freely reassoc, but don't
want fma being generated here.

   %F = fmul reassoc double %A, %B         ; <double> [#uses=1]
   %G = fmul reassoc double %C, %D         ; <double> [#uses=1]
   %H = fadd reassoc double %F, %G         ; <double> [#uses=1]
   %I = fsub reassoc double %H, %E         ; <double> [#uses=1]

Before https://reviews.llvm.org/D45710, `reassoc` flag actually
did not imply isContratable either.

The current implementation also only check the flag in fadd node,
ignoring fmul node, this patch update that as well.

Reviewed By: spatel, qiucf

Differential Revision: https://reviews.llvm.org/D104247
2021-06-21 21:15:43 +00:00
Hendrik Greving
6abba09064 RegisterCoalescer: Fix iterating through use operands.
Fixes a minor bug when trying to iterate through use operands when
updating debug use operands.

Extends a test to include above.

Differential Revision: https://reviews.llvm.org/D104576
2021-06-21 09:17:54 -07:00
Craig Topper
0dab191ced [TypePromotion] Prune Intrinsic includes. NFC
TypePromotion is meant to be a generic pass and doesn't reference
any ARM intrinsics so it shouldn't include IntrinsicsARM.h.
The other Intrinsic related headers appear to be unneeded as well.
2021-06-20 13:04:02 -07:00
Fangrui Song
2beef4520d Simplify some typedef struct 2021-06-19 11:36:44 -07:00
Michael Liao
d21f701c76 [MIRPrinter] Add machine metadata support.
- Distinct metadata needs generating in the codegen to attach correct
  AAInfo on the loads/stores after lowering, merging, and other relevant
  transformations.
- This patch adds 'MachhineModuleSlotTracker' to help assign slot
  numbers to these newly generated unnamed metadata nodes.
- To help 'MachhineModuleSlotTracker' track machine metadata, the
  original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
  which provides basic interfaces to create/retrive metadata slots. In
  addition, once LLVM IR is processsed, additional hooks are also
  introduced to help collect machine metadata and assign them slot
  numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
  an additional 'machineMetadataNodes' field containing all the
  definition of those nodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103205
2021-06-19 12:48:08 -04:00
Hongtao Yu
7fbb587058 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Simon Pilgrim
ccd7a98f97 [DAG] SelectionDAG::computeKnownBits - use APInt::insertBits to merge subvector knownbits. NFCI.
As noticed on D104472 we can use APInt::insertBits which will avoid a lot of temporary APInt creations
2021-06-18 14:59:01 +01:00
Jon Roelofs
921bd72ae7 [GISel] Eliminate redundant bitmasking
This was a GISel vs SDAG regression that showed up at -Os on arm64 in:
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test

https://llvm.godbolt.org/z/aecjodsjG

Differential revision: https://reviews.llvm.org/D103334
2021-06-17 12:53:00 -07:00
David Green
751ee64aee [InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass
The Interleave Access pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247 we were missing copying IR flags to the
new instruction though, which should just be kept the same as the
original instruction.

Differential Revision: https://reviews.llvm.org/D104255
2021-06-17 09:53:33 +01:00
Bjorn Pettersson
29ffba4b56 Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Sushma Unnibhavi
900cbf02d0 [M68k][GloballSel] Adding initial GlobalISel infrastructure
Wiring up GlobalISel for the M68k backend

Differential Revision: https://reviews.llvm.org/D101819
2021-06-16 10:48:38 -06:00
David Spickett
0a8120a8f5 [llvm][AArch64] Handle arrays of struct properly (from IR)
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123
2021-06-16 13:56:01 +00:00
Dylan Fleming
17f5ba8d25 [SVE] Fix PromoteIntRes_TRUNCATE not to call getVectorNumElements
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104115
2021-06-16 13:09:43 +01:00
Rong Xu
022ca8be28 [SampleFDO] Place the discriminator flag variable into the used list.
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.

This variable might be GC'ed by the linker since it is not explicitly
reference. I initially added the var to the use list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.

Here I add the variable to the use list in IR level's AddDiscriminators
pass. The machine level code is still keep in the case IR's
AddDiscriminators is not invoked. If this is the case, this just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emit.

Differential Revision: https://reviews.llvm.org/D103988
2021-06-15 21:51:04 -07:00
Rong Xu
17842d79ed Revert "[SampleFDO] Using common linkage for the discriminator flag variable"
This reverts commit 434fed5aff5e62460e2e984c7cc2674c12779b1e.

Post commit review suggested to use another implmenentation.
Detailed can be found in the review.
2021-06-15 21:22:23 -07:00
Rong Xu
42c862d6e9 [SampleFDO] Using common linkage for the discriminator flag variable
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.

This variable might be GC'ed by the linker since it is not explicitly
reference. I initially added the var to the use list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.

In this patch, we use a "common" linkage for this variable so that
it will be GC'ed by the linker.

Differential Revision: https://reviews.llvm.org/D103988
2021-06-15 14:51:27 -07:00
Roman Lebedev
8401a57c05 [TLI] SimplifyDemandedVectorElts(): handle SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(?, 0))
Iff we have `SCALAR_TO_VECTOR` (and we demand it's only defined 0'th element),
and said scalar was produced by `EXTRACT_VECTOR_ELT` from the 0'th element
of some vector, then we can just continue traversal into said source vector.

This comes up in X86 vector uniform shift lowering.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104250
2021-06-14 23:52:53 +03:00
Saleem Abdulrasool
c92efb7d33 SelectionDAG: repair the Windows build
6e5628354e22f3ca40b04295bac540843b8e6482 regressed the Windows build as
the return type no longer matched in both branches for the return value
type deduction.  This uses a bit more compiler magic to deal with that.
2021-06-14 08:25:36 -07:00
zhijian
ad7e1ecf68 [AIX][XCOFF] emit vector info of traceback table.
Summary:

emit vector info of traceback table.

Reviewers: Jason Liu,Hubert Tong
Differential Revision: https://reviews.llvm.org/D93659
2021-06-14 11:15:22 -04:00
Roman Lebedev
7a71822528 [NFC][DAGCombine] Extract getFirstIndexOf() lambda back into a function
Not all supported compilers like such lambdas, at least one buildbot is unhappy.
2021-06-14 16:25:59 +03:00
Roman Lebedev
9f4eaf3945 [DAGCombine] reduceBuildVecToShuffle(): sort input vectors by decreasing size
The sorting, obviously, must be stable, else we will have random assembly fluctuations.

Apparently there was no test coverage that would benefit from that,
so i've added one test.

The sorting consists of two parts - just sort the input vectors,
and recompute the shuffle mask -> input vector mapping.
I don't believe we need to do anything else.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104187
2021-06-14 16:18:37 +03:00
Jeroen Dobbelaere
c08eaddde6 Intrinsic::getName: require a Module argument
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.

This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.

Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99173
2021-06-14 14:52:29 +02:00
RamNalamothu
a2306da6e0 Implement DW_CFA_LLVM_* for Heterogeneous Debugging
Add support in MC/MIR for writing/parsing, and DebugInfo.

This is part of the Extensions for Heterogeneous Debugging defined at
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html

Specifically the CFI instructions implemented here are defined at
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#cfa-definition-instructions

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D76877
2021-06-14 08:51:50 +05:30
Simon Pilgrim
20d33f98a5 RegUsageInfoPropagate.cpp - remove unused <string> and <map> includes. NFCI. 2021-06-13 15:19:24 +01:00
Florian Hahn
f2662d35c8 Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
This reverts commit 1b748faf2bae246e2fc77d88420df13c2e60f4df because it
breaks building the llvm-test-suite with -verify-machineinstrs on X86:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/

Running llc -verify-machineinstr on X86 crashes on the IR below:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

    %struct.widget = type { i32, i32, i32, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8**, i32*, i32***, i32**, i32, i32, i32, i32, %struct.baz*, %struct.wobble.1*, i32, i32, i32, i32, i32, i32, %struct.quux.2*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32***, i32***, i32****, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork*, %struct.wombat.0*, %struct.wobble*, i32, i32*, i32*, i32*, i32, i32*, i32*, i32*, i32 (%struct.widget*, %struct.eggs*)*, i32, i32, i32, i32 }
    %struct.snork = type { %struct.spam*, %struct.zot, i32 (%struct.wombat*, %struct.widget*, %struct.snork*)* }
    %struct.spam = type { i32, i32, i32, i32, i8*, i32 }
    %struct.zot = type { i32, i32, i32, i32, i32, i8*, i32* }
    %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32*, i32*)*, void (%struct.wombat*, %struct.widget*, %struct.zot*)* }
    %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] }
    %struct.quux = type { i16, i8 }
    %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] }
    %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1*, %struct.wobble.1*, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* }
    %struct.zot.3 = type { i64, i16, i16, i16 }

    define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr {
    bb:
      %tmp = load i32, i32* undef, align 4
      %tmp2 = sdiv i32 %tmp, 6
      %tmp3 = sdiv i32 undef, 6
      %tmp4 = load i32, i32* undef, align 4
      %tmp5 = icmp eq i32 %tmp4, 4
      %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2
      %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0
      %tmp8 = zext i16 undef to i32
      %tmp9 = zext i16 undef to i32
      %tmp10 = load i16, i16* undef, align 2
      %tmp11 = zext i16 %tmp10 to i32
      %tmp12 = zext i16 undef to i32
      %tmp13 = zext i16 undef to i32
      %tmp14 = zext i16 undef to i32
      %tmp15 = load i16, i16* undef, align 2
      %tmp16 = zext i16 %tmp15 to i32
      %tmp17 = zext i16 undef to i32
      %tmp18 = sub nsw i32 %tmp8, %tmp9
      %tmp19 = shl nsw i32 undef, 1
      %tmp20 = add nsw i32 %tmp19, %tmp18
      %tmp21 = sub nsw i32 %tmp11, %tmp12
      %tmp22 = shl nsw i32 undef, 1
      %tmp23 = add nsw i32 %tmp22, %tmp21
      %tmp24 = sub nsw i32 %tmp13, %tmp14
      %tmp25 = shl nsw i32 undef, 1
      %tmp26 = add nsw i32 %tmp25, %tmp24
      %tmp27 = sub nsw i32 %tmp16, %tmp17
      %tmp28 = shl nsw i32 undef, 1
      %tmp29 = add nsw i32 %tmp28, %tmp27
      %tmp30 = sub nsw i32 %tmp20, %tmp29
      %tmp31 = sub nsw i32 %tmp23, %tmp26
      %tmp32 = shl nsw i32 %tmp30, 1
      %tmp33 = add nsw i32 %tmp32, %tmp31
      store i32 %tmp33, i32* undef, align 4
      %tmp34 = mul nsw i32 %tmp31, -2
      %tmp35 = add nsw i32 %tmp34, %tmp30
      store i32 %tmp35, i32* undef, align 4
      %tmp36 = select i1 %tmp5, i32 undef, i32 undef
      br label %bb37

    bb37:                                             ; preds = %bb
      %tmp38 = load i32, i32* undef, align 4
      %tmp39 = ashr i32 %tmp38, %tmp6
      %tmp40 = load i32, i32* undef, align 4
      %tmp41 = sdiv i32 %tmp39, %tmp40
      store i32 %tmp41, i32* undef, align 4
      ret void
    }
2021-06-12 11:41:38 +01:00
Arthur Eubanks
5f73af13bd [NFC][OpaquePtr] Explicitly pass GEP source type in optimizeGatherScatterInst()
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103480
2021-06-11 11:49:59 -07:00
Matt Arsenault
71dc8a693a GlobalISel: Reduce indentation and remove dead path 2021-06-11 13:45:24 -04:00
Matt Arsenault
f133d28419 CodeGen: Fix missing const 2021-06-11 13:45:24 -04:00
Tomas Matheson
bb84fe4048 [CodeGen][regalloc] Don't align stack slots if the stack can't be realigned
Register allocation may spill virtual registers to the stack, which can
increase alignment requirements of the stack frame. If the the function
did not require stack realignment before register allocation, the
registers required to do so may not be reserved/available. This results
in a stack frame that requires realignment but can not be realigned.

Instead, only increase the alignment of the stack if we are still able
to realign.

The register SpillAlignment will be ignored if we can't realign, and the
backend will be responsible for emitting the correct unaligned loads and
stores. This seems to be the assumed behaviour already, e.g.
ARMBaseInstrInfo::storeRegToStackSlot and X86InstrInfo::storeRegToStackSlot
are both `canRealignStack` aware.

Differential Revision: https://reviews.llvm.org/D103602
2021-06-11 16:49:12 +01:00
Simon Pilgrim
165132af1b [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Carl Ritson
15d8bd80ce [ValueTypes] Define MVTs for v6i32, v6f32, v7i32, v7f32
For use in AMDGPU selection DAG.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103881
2021-06-11 08:58:16 +09:00
Carl Ritson
03819dac19 [SDAG] Fix pow2 assumption when splitting vectors
When reducing vector builds to shuffles it possible that
the DAG combiner may try to extract invalid subvectors.

This happens as the existing code assumes vectors will be power
of 2 sizes, which is already untrue, but becomes more noticable
with v6 and v7 types.
Specifically the existing code assumes that half PowerOf2Ceil of
a given vector index will fit twice into a given vector.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103880
2021-06-11 08:58:16 +09:00
Wolfgang Pieb
249a0bf5b1 [static initializers] Emit global_ctors and global_dtors in reverse order when .ctors/.dtors are used.
Reviewed By: rnk, MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D103495
2021-06-10 16:44:47 -07:00
Nick Desaulniers
e9e1661fa2 [IR] make -warn-frame-size into a module attr
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.

-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.

Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.

Reviewed By: rsmith, qcolombet

Differential Revision: https://reviews.llvm.org/D103928
2021-06-10 16:15:27 -07:00
David Spickett
17058150e4 Revert "Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 31859f896cf90d64904134ce7b31230f374c3fcc.

Causing SVE and RISCV-V test failures on bots.
2021-06-10 10:11:17 +00:00
Paulo Matos
fb9c8fd3dd Implementation of global.get/set for reftypes in LLVM IR
This change implements new DAG notes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D95425
2021-06-10 10:07:45 +02:00
Jinsong Ji
34315a15e8 [AIX] Add traceback ssp canary bit support
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D103202
2021-06-10 02:40:02 +00:00
Sanjay Patel
56083624ee [SDAG] fix miscompile from merging stores of different sizes
As shown in:
https://llvm.org/PR50623
...and the similar tests here, we were not accounting for
store merging of different sizes that do not cover the
entire range of the wide value to be stored.

This is the easy fix: just make sure that all of the
original stores are the same size, so when we calculate
the wide width, it's a simple N * M check.

This still allows all of the motivating optimizations from:
D86420 / 54a5dd485c4d
D87112 / 7a06b166b1af

We could enhance this code to track individual bytes and
allow merging multiple sizes.
2021-06-09 09:51:39 -04:00
Fraser Cormack
cb0fa6245f [ValueTypes][RISCV] Cap RVV fixed-length vectors by size
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.

This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.

This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103884
2021-06-09 12:15:37 +01:00
Matt Arsenault
fca6ba66d2 GlobalISel: Avoid use of G_INSERT in insertParts
G_INSERT legalization is incomplete and doesn't work very
well. Instead try to use sequences of G_MERGE_VALUES/G_UNMERGE_VALUES
padding with undef values (although this can get pretty large).

For the case of load/store narrowing, this is still performing the
load/stores in irregularly sized pieces. It might be cleaner to split
this down into equal sized pieces, and rely on load/store merging to
optimize it.
2021-06-08 14:44:24 -04:00
Matt Arsenault
96cc3e31f6 GlobalISel: Hide virtual register creation in MIRBuilder 2021-06-08 14:44:24 -04:00
Nick Desaulniers
42052632ff reland [IR] make -stack-alignment= into a module attr
Relands commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13 with fixes for
MIPS.

Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 10:59:46 -07:00
Justin Bogner
521d48da10 [GlobalISel] Handle non-multiples of the base type in narrowScalarAddSub
When narrowing G_ADD and G_SUB, handle types that aren't a multiple of
the type we're narrowing to. This allows us to handle types like s96
on 64 bit targets.

Note that the test here has a couple of dead instructions because of
the way the setup legalizes. I wasn't able to come up with a way to
write this test that avoids that easily.

Differential Revision: https://reviews.llvm.org/D97811
2021-06-08 10:13:38 -07:00
Justin Bogner
5328161b2b [GlobalISel] Handle non-multiples of the base type in narrowScalarInsert
When narrowing G_INSERT, handle types that aren't a multiple of the
type we're narrowing to. This comes up if we're narrowing something
like an s96 to fit in 64 bit registers and also for non-byte multiple
packed types if they come up.

This implementation handles these cases by extending the extra bits to
the narrow size and truncating the result back to the destination
size.

Differential Revision: https://reviews.llvm.org/D97791
2021-06-08 10:13:38 -07:00
Simon Pilgrim
1fe60d3f0a InstrEmitter.cpp - don't dereference a dyn_cast<>.
dyn_cast<> can return nullptr which we would then dereference - use cast<> which will assert that the type is correct.
2021-06-08 17:59:04 +01:00
Nick Desaulniers
579f298a64 Revert "[IR] make -stack-alignment= into a module attr"
This reverts commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13.

Breaks the MIPS build.
2021-06-08 08:55:50 -07:00
Nick Desaulniers
5c936095e3 [IR] make -stack-alignment= into a module attr
Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 08:31:04 -07:00
Simon Pilgrim
8eda87aacf [DAG] foldShuffleOfConcatUndefs - ensure shuffles of upper (undef) subvector elements is undef (PR50609)
shuffle(concat(x,undef),concat(y,undef)) -> concat(shuffle(x,y),shuffle(x,y))

If the original shuffle references any of the upper (undef) subvector elements, ensure the split shuffle masks uses undef instead of an out-of-bounds value.

Fixes PR50609
2021-06-08 15:49:41 +01:00
Hans Wennborg
5697956ae9 Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.

This reverts commit 36ec97f76ac0d8be76fb16ac521f55126766267d.

The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.

Reverting to unbreak people's builds until that can be addressed.

This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d80698c53ceaa64bbe6e3b7e7bbf99ee.
2021-06-08 14:54:08 +02:00
Kerry McLaughlin
d259f6577a [CostModel] Return an invalid cost for memory ops with unsupported types
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.

The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102515
2021-06-08 12:07:36 +01:00
David Green
a2815f65e2 [DAG] Allow isNullOrNullSplat to see truncated zeroes
This sets the AllowTruncation flag on isConstOrConstSplat in
isNullOrNullSplat, allowing it to see truncated constant zeroes on
architectures such as AArch64, where only a i32.i64 are legal. As a
truncation of 0 is always 0, this should always be valid, allowing some
extra folding to happen including some of the cases from D103755.

Differential Revision: https://reviews.llvm.org/D103756
2021-06-08 10:18:58 +01:00
Arthur Eubanks
4cc1a31398 Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry"
Needs to be discussed more.

This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
2021-06-07 16:07:44 -07:00
Matt Arsenault
eb4a9081cb GlobalISel: Use MMO helper for getting the size in bits 2021-06-07 14:26:48 -04:00
Matt Arsenault
81260d0120 GlobalISel: Remove unnecessary .getReg(0)s 2021-06-07 14:26:48 -04:00
Guillaume Chatelet
517914e3e7 [NFC] Fix semantic discrepancy for MVT::LAST_VALUETYPE
Differential Revision: https://reviews.llvm.org/D103251
2021-06-07 10:04:16 +00:00
Nikita Popov
6fa17c5cd4 [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Nikita Popov
435f6cc870 [TargetLowering] Move methods out of line (NFC)
Move methods using IRBuilder out of line, so we can drop the
dependency on the header.
2021-06-06 16:02:10 +02:00
Nikita Popov
a84c0ba160 [CodeGen] Add missing includes (NFC)
These currently rely on the IRBuilder.h include in TargetLowering.h.
Make them explicit.
2021-06-06 15:48:27 +02:00
Mirko Brkusanin
2afa995d4a [AMDGPU][GlobalISel] Legalize G_ABS
Legalize and select G_ABS so that we can use llvm.abs intrinsic

Differential Revision: https://reviews.llvm.org/D102391
2021-06-04 14:46:43 +02:00
Jeremy Morse
f08c337615 Re-land ae4303b42c, "Track PHI values through register coalescing"
Was reverted in 0507fc2ffc9, in phi-coalesce-subreg.mir I'd explicitly named
some passes to run instead of specifying a range. As a result some
two-address-instrs weren't correctly rewritten and the verifier got upset.
Original commit message:

[DebugInstrRef][2/3] Track PHI values through register coalescing

In the instruction referencing variable location model, we store variable
locations that point at PHIs in MachineFunction during register allocation.
Unfortunately, register coalescing can substantially change the locations
of registers, and so that PHI-variable-location side table needs
maintenence during the pass.

This patch builds an index from the side table, and whenever a vreg gets
coalesced into another vreg, update the index to record the new vreg that
the PHI happens in. It also accepts a limited range of subregister
coalescing, for example merging a subregister into a larger class.

Differential Revision: https://reviews.llvm.org/D86813
2021-06-04 11:32:02 +01:00
Fraser Cormack
5ee63b3d02 [SelectionDAG] Extend FoldConstantVectorArithmetic to SPLAT_VECTOR
This patch extends the SelectionDAG's ability to constant-fold vector
arithmetic to include support for SPLAT_VECTOR. This is not only for
scalable-vector types but also for fixed-length vector types, which
helps Hexagon in a couple of cases.

The original RISC-V test case was in fact an infinite DAGCombine loop.
The pattern `and (truncate v1), (truncate v2)` can be combined to
`truncate (and v1, v2)` but the truncate can similarly be combined back
to `truncate (and v1, v2)` (but, crucially, only when one of `v1` or
`v2` is a constant vector).

It wasn't exposed in on fixed-length types because a TRUNCATE of a
constant BUILD_VECTOR was folded into the BUILD_VECTOR itself, whereas
this did not happen for the equivalent (scalable-vector) SPLAT_VECTOR.

Reviewed By: RKSimon, craig.topper

Differential Revision: https://reviews.llvm.org/D103246
2021-06-04 09:53:15 +01:00
Esme-Yi
914b5eeb20 [Debug-Info] handle DW_CC_pass_by_value/DW_CC_pass_by_reference under strict DWARF.
Summary: When -strict-dwarf=true is specified, the calling convention info
    DW_CC_pass_by_value or DW_CC_pass_by_reference can only be generated at DWARF5.

Reviewed By: shchenz, dblaikie

Differential Revision: https://reviews.llvm.org/D103300
2021-06-04 08:14:47 +00:00
Arthur Eubanks
b01ec7e228 [TargetLowering] Only inspect attributes in the arguments for ArgListEntry
Parameter attributes are considered part of the function [1], and like
mismatched calling conventions [2], we can't have the verifier check for
mismatched parameter attributes.

Issues can be diagnosed with D103412.

[1] https://llvm.org/docs/LangRef.html#parameter-attributes
[2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101806
2021-06-03 15:52:01 -07:00
Brendon Cahoon
f5ff020d9c [GlobalISel] Add G_SBFX/G_UBFX to computeKnownBits
Differential Revision: https://reviews.llvm.org/D102969
2021-06-03 16:01:47 -04:00
Eli Friedman
3474ad7aad [AtomicExpand] Merge cmpxchg success and failure ordering when appropriate.
If we're not emitting separate fences for the success/failure cases, we
need to pass the merged ordering to the target so it can emit the
correct instructions.

For the PowerPC testcase, we end up with extra fences, but that seems
like an improvement over missing fences.  If someone wants to improve
that, the PowerPC backed could be taught to emit the fences after isel,
instead of depending on fences emitted by AtomicExpand.

Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 .

Differential Revision: https://reviews.llvm.org/D103342
2021-06-03 11:34:35 -07:00
Nikita Popov
1c866d4e4f [ADT] Move DenseMapInfo for ArrayRef/StringRef into respective headers (NFC)
This is a followup to D103422. The DenseMapInfo implementations for
ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h
headers, which means that these two headers no longer need to be
included by DenseMapInfo.h.

This required adding a few additional includes, as many files were
relying on various things pulled in by ArrayRef.h.

Differential Revision: https://reviews.llvm.org/D103491
2021-06-03 18:34:36 +02:00
Jeremy Morse
5cea2689df Revert "[DebugInstrRef][2/3] Track PHI values through register coalescing"
This reverts commit ae4303b42cfa5c8c14e3fff67d73af2f154aea9a.

Expensive checks buildbot has found a problem with this:

  https://lab.llvm.org/buildbot/#/builders/16/builds/11863
2021-06-03 17:16:58 +01:00
Jeremy Morse
8e310e3825 [DebugInstrRef][2/3] Track PHI values through register coalescing
In the instruction referencing variable location model, we store variable
locations that point at PHIs in MachineFunction during register
allocation. Unfortunately, register coalescing can substantially change
the locations of registers, and so that PHI-variable-location side table
needs maintenence during the pass.

This patch builds an index from the side table, and whenever a vreg gets
coalesced into another vreg, update the index to record the new vreg that
the PHI happens in. It also accepts a limited range of subregister
coalescing, for example merging a subregister into a larger class.

Differential Revision: https://reviews.llvm.org/D86813
2021-06-03 17:06:51 +01:00
Fraser Cormack
69bae66509 [CodeGen] Fix a scalable-vector crash in VSELECT legalization
The `DAGTypeLegalizer::WidenVSELECTMask` function is not (yet) ready for
scalable vector types, and has numerous places in which it tries to grab
either the fixed size or number of elements of its types.

I believe that it should be possible to update this method to properly
account for scalable-vector types, but we don't have test cases for
that; RISC-V bails out early on as it has legal i1 vector masks. As
such, this patch just prevents it from crashing.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103536
2021-06-03 10:24:55 +01:00
Fraser Cormack
5419f2ccbb [ValueTypes] Fix scalable-vector changeExtendedVectorTypeToInteger
The attached tests check for the regression in DAGCombiner's
`visitVSELECT`, which may call this method.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103534
2021-06-03 09:36:56 +01:00
Sanjay Patel
e3cc7798ec [SDAG] allow cast folding for vector sext-of-setcc with signed compare
This extends 434c8e013a2c and ede3982792df to handle signed
predicates by sign-extending the setcc operands.

This is not shown directly in https://llvm.org/PR50055 ,
but the pattern is visible by changing the unsigned convert
to signed in the source code.
2021-06-02 15:05:02 -04:00
Rong Xu
f505b894a2 [SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly for ProfileData part of change. It will load
FS Profile when such profile is detected. For an extbinary format profile,
create_llvm_prof tool will add a flag to profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.

This patch also simplified the bit API used by FS discriminators.

Differential Revision: https://reviews.llvm.org/D103041
2021-06-02 10:32:52 -07:00
Sanjay Patel
bb75641fb0 [SDAG] allow more cast folding for vector sext-of-setcc
This is a follow-up to D103280 that eases the use restrictions,
so we can handle the motivating case from:
https://llvm.org/PR50055

The loop code is adapted from similar use checks in
ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an
easier way to filter out non-chain uses of load values.

Differential Revision: https://reviews.llvm.org/D103462
2021-06-02 13:14:49 -04:00
Bjorn Pettersson
e9cfe99a84 [CodeGen] Refactor libcall lookups for RTLIB::POWI_*
Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_*
libcall for a given value type.

This includes a small refactoring of ExpandFPLibCall and
ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code,
plus adding an ExpandFPLibCall version that can be called directly
when expanding FPOWI/STRICT_FPOWI to ensure that we actually use
the same RTLIB::Libcall when expanding the libcall as we used when
checking the legality of such a call by doing a getLibcallName check.

Differential Revision: https://reviews.llvm.org/D103050
2021-06-02 11:40:34 +02:00
Bjorn Pettersson
1603844caa [LegalizeTypes] Avoid promotion of exponent in FPOWI
The FPOWI DAG node is normally lowered to a libcall to one of the
RTLIB::POWI* runtime functions and the exponent should normally
have a type matching sizeof(int) when making the call. Thus,
type promotion of the exponent could lead to an FPOWI with a type
for the second operand that would be incorrect when doing the
libcall (a situation which would be hard to detect post-legalization
if we allow such FPOWI nodes).

This patch is changing DAGTypeLegalizer::PromoteIntOp_FPOWI to
do the rewrite into a libcall directly instead of promoting the
operand. This way we can check that the exponent is smaller than
sizeof(int) and we can let TargetLowering handle promotion as
part of making the libcall. It could be noticed here that makeLibCall
has some knowledge about targets such as 64-bit RISCV, for which the
libcall argument should be extended to a type larger than sizeof(int).

Differential Revision: https://reviews.llvm.org/D102950
2021-06-02 11:40:34 +02:00
Sriraman Tallam
f1e4168f92 Resubmit D85085 after fixing the tests that were failing.
D85085 was pushed earlier but broke tests on mac and win:
http://lab.llvm.org:8080/green/job/clang-stage1-RA/21182/consoleFull#-706149783d489585b-5106-414a-ac11-3ff90657619c

Recommitting it after adding mtriple to the llc commands.

Emit correct location lists with basic block sections.

This patch addresses multiple things:

1) It ensures that const_value is emitted when possible with basic block
   sections.
2) It emits location lists such that the labels are always within the
   section boundary.
3) It fixes a bug when the parameter is first used in a non-entry block
   which is in a different section from the entry block.

Differential Revision: https://reviews.llvm.org/D85085
2021-06-01 21:59:47 -07:00
Daniel Sanders
8c8a0a3167 fixup: Missing operator in [globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one
My local compiler was fine with it but the bots complain about ambiguous types.
2021-06-01 13:58:03 -07:00
Daniel Sanders
71a22fb7f8 [globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one
It's still in use in a few places so we can't delete it yet but there's not
many at this point.

Differential Revision: https://reviews.llvm.org/D103352
2021-06-01 13:23:48 -07:00
Jessica Paquette
852a8449e7 [GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width
Also add a target hook which allows us to get around custom legalization on
AArch64.

Differential Revision: https://reviews.llvm.org/D99283
2021-06-01 10:56:17 -07:00
Guozhi Wei
b2dfe60e88 [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence

    lea (reg1, reg2), reg3
    sub reg3, reg4

to two sub instructions

    sub reg1, reg4
    sub reg2, reg4

Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register of ADD).

Differential Revision: https://reviews.llvm.org/D101970
2021-06-01 10:31:30 -07:00
Sanjay Patel
5888c31732 [SDAG] add helper function for sext-of-setcc folds; NFC
Try to make this easier to read as noted in D103280
2021-06-01 08:07:17 -04:00
Arthur Eubanks
8f3353aa63 [OpaquePtr] Remove some uses of PointerType::getElementType() 2021-05-31 16:11:25 -07:00
Arthur Eubanks
cfc7e26853 [OpaquePtr] Clean up some uses of Type::getPointerElementType()
These depend on pointee types.
2021-05-31 09:54:57 -07:00
Arthur Eubanks
7178161bf7 Remove "Rewrite Symbols" from codegen pipeline
It breaks up the function pass manager in the codegen pipeline.

With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.

Add a check that we only have one function pass manager in the codegen
pipeline.

Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.

addr-label.ll crashes on ARM due to this change. This is because a
ARMConstantPoolConstant containing a BasicBlock to represent a
blockaddress may hold an invalid pointer to a BasicBlock if the
blockaddress is invalidated by its BasicBlock getting removed. In that
case all referencing blockaddresses are RAUW a constant int. Making
ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure
that's the right fix. As a workaround, create a barrier right before
ISel so that IR optimizations can't happen while a
ARMConstantPoolConstant has been created.

Reviewed By: rnk, MaskRay, compnerd

Differential Revision: https://reviews.llvm.org/D99707
2021-05-31 08:32:36 -07:00
Sanjay Patel
4f81b668e8 [SDAG] add check to sext-of-setcc fold to bypass changing a legal op
I accidentaly pushed a draft of D103280 that was discussed
during the review, but it was not supposed to be the final
version.

Rather than revert and recommit, I'm updating the existing
code. This way we have a record of the codegen diff that
would result if we decide to remove this predicate in the
future.
2021-05-31 08:58:11 -04:00
Sanjay Patel
dceff7ed2b [SDAG] try harder to fold casts into vector compare
sext (vsetcc X, Y) --> vsetcc (zext X), (zext Y) --
(when the zexts are free and a bunch of other conditions)

We have a couple of similar folds to this already for vector selects,
but this pattern slips through because it is only a setcc.

The tests are based on the motivating case from:
https://llvm.org/PR50055
...but we need extra logic to get that example, so I've left that as
a TODO for now.

Differential Revision: https://reviews.llvm.org/D103280
2021-05-31 07:14:01 -04:00
Djordje Todorovic
c362f0a17e [LiveDebugVariables] Stop trimming locations of non-inlined vars
The D35953, D62650 and D73691 introduced trimming of variables locations
in LiveDebugVariables pass, since there are some cases where after
the virtregrewrite we have exploded number of DBG_VALUEs created for some
inlined variables. As it looks, all problematic cases were regarding
inlined variables, so it seems reasonable to stop trimming the location
ranges for non-inlined variables.
It has very good impact on the llvm-locstats report.

Differential Revision: https://reviews.llvm.org/D102917
2021-05-31 02:59:19 -07:00
Florian Hahn
0fab9e3072 [DAGCombine] Poison-prove scalarizeExtractedVectorLoad.
extractelement is poison if the index is out-of-bounds, so just
scalarizing the load may introduce an out-of-bounds load, which is UB.

To avoid introducing new UB, we can mask the index so it only contains
valid indices.

Fixes PR50382.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103077
2021-05-30 11:40:55 +01:00
Pengxuan Zheng
e434a8e756 [SafeStack] Use proper API to get stack guard
Using the proper API automatically sets `__stack_chk_guard` to `dso_local` if
`Reloc::Static`. This wasn't strictly necessary until recently when dso_local was
no longer implied by `TargetMachine::shouldAssumeDSOLocal` for
`__stack_chk_guard`. By using the proper API, we can avoid generating unnecessary
GOT relocations.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102646
2021-05-30 00:52:48 -07:00
Arthur Eubanks
f9c1930dea Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry"
This reverts commit 1c7f32334d4becc725b9025fd32291a0e5729acd.

Some code still needs to properly set parameter ABI attributes, see
D101806.
2021-05-29 23:08:15 -07:00
Arthur Eubanks
85767d0682 Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering"
This reverts commit bc7d15c61da78864b35e3c114294d6e4db645611.

Dependent change is to be reverted.
2021-05-29 22:40:33 -07:00
LemonBoy
f2c842c94a [AtomicExpandPass][AArch64] Promote xchg with floating-point types to integer ones
Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types.
This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103232
2021-05-29 08:57:27 +02:00
Eli Friedman
1638fc9086 [AArch64][RISCV] Make sure isel correctly honors failure orderings.
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.

At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standad and LLVM now accept arbitrary combinations of
success/failure orderings.

This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.

Differential Revision: https://reviews.llvm.org/D103284
2021-05-28 12:47:40 -07:00
Craig Topper
22fc6f8fbe [VP] Make getMaskParamPos/getVectorLengthParamPos return unsigned. Lowercase function names.
Parameter positions seem like they should be unsigned.

While there, make function names lowercase per coding standards.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D103224
2021-05-28 11:28:47 -07:00
Craig Topper
7d56e782b7 [SelectionDAG] Fix typo in assert. NFC 2021-05-28 10:37:11 -07:00
Tim Northover
859ff3505c SwiftTailCC: teach verifier musttail rules applicable to this CC.
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.

Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
2021-05-28 11:12:00 +01:00
Amara Emerson
28e85e594b [AArch64][GlobalISel] Legalize oversize G_EXTRACT_VECTOR_ELT sources.
Also changes the fewerElements helper to use the lookthrough constant helper
instead of m_ICst, since m_ICst doesn't look through extends.

Differential Revision: https://reviews.llvm.org/D103227
2021-05-27 23:52:24 -07:00
Matt Arsenault
6a84ed648e GlobalISel: Do not change register types in lowerLoad
Adjusting the load register type is a widenScalar type action, not a
lowering. lowerLoad should be reserved for operations that change the
memory access size, such as unaligned load decomposition. With this
trying to adjust the register type, it was hard to avoid infinite
loops in the legalizer. Adds a bandaid to avoid regressing a few
AArch64 tests, but I'm not sure what the exact condition is and
there's probably a cleaner way to do this.

For AMDGPU this regresses handling of some cases for unaligned loads,
but the way this is currently working is a pretty ugly hack.
2021-05-27 11:49:37 -04:00
Nico Weber
7a5ca2da76 Revert "Emit correct location lists with basic block sections."
Breaks check-llvm on non-linux, see comments on https://reviews.llvm.org/D85085
This reverts commit caae570978c490a137921b9516162a382831209e
and follow-up commit 1546c52d971292ed4145b6d41aaca0d02229ebff.
2021-05-27 11:42:04 -04:00
Matt Arsenault
776d715ebe VirtRegMap: Preserve LiveDebugVariables
This avoids recomputing it between regalloc runs when allocation is
split, and also avoids a debug info test regression.
2021-05-27 10:40:14 -04:00
Fraser Cormack
58a6d02787 [VP][SelectionDAG] Add a target-configurable EVL operand type
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.

This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D103027
2021-05-27 15:27:36 +01:00
Fraser Cormack
5a44777de2 [DAGCombine][RISCV] Don't try to trunc-store combined vector stores
DAGCombine's `mergeStoresOfConstantsOrVecElts` optimization is told
whether it's to use vector types and also whether it's to issue a
truncating store. However, the truncating store code path assumes a
scalar integer `ConstantSDNode`, and when using vector types it creates
either a `BUILD_VECTOR` or `CONCAT_VECTORS` to store: neither of which
is a constant.

The `riscv64` target is able to expose a crash here because it switches
on both code paths at the same time. The `f32` is stored as `i32` which
must be promoted to `i64`, necessitating a truncating store.
It also decides later that it prefers a vector store of `v2f32`.

While vector truncating stores are legal, this combine is not able to
emit them. We also don't have a test case. This patch adds an assert to
catch this case more gracefully, and updates one of the caller functions
to the function to turn off the use of truncating stores when preferring
vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103173
2021-05-27 14:16:32 +01:00
Fraser Cormack
ab12d83795 [SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs
This patch extends the cases in which the legalizer is able to express
VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between
boolean vector types, the mask itself is an all-ones or all-ones value
of the operand type, so a 0/1 boolean type behaves identically to a 0/-1
type.

This greatly helps RISC-V which relies on expansion for these nodes. It
also allows scalable-vector bool VSELECTs to use the default expansion,
where before it would crash in SelectionDAG::UnrollVectorOp.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103147
2021-05-27 10:08:57 +01:00
Amara Emerson
d5383816bc [GlobalISel] Implement splitting of G_SHUFFLE_VECTOR.
Thhis is a port from the DAG legalization. We're still missing some of the
canonicalizations of shuffles but it's a start.

Differential Revision: https://reviews.llvm.org/D102828
2021-05-27 00:28:38 -07:00
Jessica Paquette
d821abe3ce [GlobalISel] Don't emit lost debug location remarks when legalizing tail calls
There were a bunch of lost debug location remarks that show up when legalizing
tail calls on AArch64.

This would happen because we drop the return in the block where we emit the
tail call. So, we end up dropping the debug location, which makes the
LostDebugLocObserver report a missing debug location.

Although it's *true* that we lose these debug locations, this isn't
a particularly useful remark. We expect to drop these debug locations when
emitting tail calls. Suppressing remarks in this case is preferable, since the
amount of noise could hide actual debug location related bugs.

To do this, I just plumbed the LostDebugLocObserver through the relevant
LegalizerHelper functions. This is the only case I can think of where we need
the LostDebugLocObserver in the LegalizerHelper. So, rather than storing it
in the LegalizerHelper proper and mucking around with the constructors, I
figured it'd be cleanest to take the simplest path for now.

This clears up ~20 noisy lost debug location remarks on CTMark in AArch64 at
-Os.

Differential Revision: https://reviews.llvm.org/D103128
2021-05-26 17:16:11 -07:00
Sriraman Tallam
40b2a440c5 Emit correct location lists with basic block sections.
This patch addresses multiple things:

1) It ensures that const_value is emitted when possible with basic block
sections.
2) It emits location lists such that the labels are always within the
section boundary.
3) It fixes a bug when the parameter is first used in a non-entry block
which is in a different section from the entry block.

Differential Revision: https://reviews.llvm.org/D85085
2021-05-26 17:12:31 -07:00
Jeremy Morse
2a2a79e361 [DebugInstrRef][1/3] Track PHI values through register allocation
This patch introduces "DBG_PHI" instructions, a marker of where a PHI
instruction used to be, before PHI elimination. Under the instruction
referencing model, we want to know where every value in the function is
defined -- and a PHI, even if implicit, is such a place.

Just like instruction numbers, we can use this to identify a value to be
used as a variable value, but we don't need to know what instruction
defines that value, for example:

bb1:
   DBG_PHI $rax, 1
   [... more insts ... ]
bb2:
   DBG_INSTR_REF 1, 0, !1234, !DIExpression()

This specifies that on entry to bb1, whatever value is in $rax is known
as value number one -- and the later DBG_INSTR_REF marks the position
where variable !1234 should take on value number one.

PHI locations are stored in MachineFunction for the duration of the
regalloc phase in the DebugPHIPositions map. The map is populated by
PHIElimination, and then flushed back into the instruction stream by
virtregrewriter. A small amount of maintenence is needed in
LiveDebugVariables to account for registers being split, but only for
individual positions, not for entire ranges of blocks.

Differential Revision: https://reviews.llvm.org/D86812
2021-05-26 20:24:00 +01:00
Heejin Ahn
3f66a78716 [WebAssembly] Add TargetInstrInfo::getCalleeOperand
DwarfDebug unconditionally assumes for all call instructions the 0th
operand is the callee operand, which seems to be true for other targets,
but not for WebAssembly. This adds `TargetInstrInfo::getCallOperand`
method whose default implementation returns `getOperand(0)` and makes
WebAssembly overrides it to use its own utility method to get the callee
operand.

This also fixes an existing bug in `WebAssembly::getCalleeOp`, which was
uncovered by this CL.

Reviewed By: dschuff, djtodoro

Differential Revision: https://reviews.llvm.org/D102978
2021-05-26 11:43:59 -07:00
Jonas Paulsson
0a9439773a [SystemZ] Support i128 inline asm operands.
Support virtual, physical and tied i128 register operands in inline assembly.

i128 is on SystemZ not really supported and is not a legal type and generally
such a value will be split into two i64 parts. There are however some
instructions that require a pair of two GPR64 registers contained in the GR128
bit reg class, which is untyped.

For inline assmebly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBUilder to only create one
GR128 register part to begin with. In particular:

- getNumRegisters() now has an optional parameter "RegisterVT" which is
  passed by AddInlineAsmOperands() and GetRegistersForValue().

- The bitcasting in GetRegistersForValue is not performed if RegVT is
  Untyped.

- The RC for a tied use in AddInlineAsmOperands() is now computed either from
  the tied def (virtual register), or by getMinimalPhysRegClass() (physical
  register).

- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
  class (DstRC) can also be computed for an illegal type.

In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.

Differential Revision: https://reviews.llvm.org/D100788

Review: Ulrich Weigand
2021-05-26 10:08:32 -05:00
Tomas Matheson
4b71dc385f [MC][NFCI] Factor out ELF section unique ID calculation
Precursor to D100944. The logic for determining the unique ID had become
quite difficult to reason about, so I have factored this out into a
separate function.

Differential Revision: https://reviews.llvm.org/D102336
2021-05-26 11:51:29 +01:00
Michael Liao
1d1d6e0b78 [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics.
- When memory intrinsics, such as memcpy, the attached scoped AA
  metadata is not passed down to the backend. As a result, the backend
  cannot schedule relevant memory operations around them following that
  hint. In this patch, SelectionDAG is enhanced to propagate that
  metadata (scoped AA only) when they are lowered into loads and stores.

Differential Revision: https://reviews.llvm.org/D102215
2021-05-25 14:42:26 -04:00
Benjamin Kramer
9d856bcc84 [GlobalISel] Silence unused variable warning in Release builds. NFC. 2021-05-25 10:55:29 +02:00
Amara Emerson
afaf301e48 [GlobalISel] Fix MachineIRBuilder not using the DstOp argument for G_SHUFFLE_VECTOR. 2021-05-25 00:43:26 -07:00
Christudasan Devadasan
022be2495f AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100726
2021-05-25 10:51:07 +05:30
Jon Roelofs
66a4976b23 [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Re-landing now that the crasher this patch previously uncovered has been fixed
in: https://reviews.llvm.org/D102935

Differential revision: https://reviews.llvm.org/D102452
2021-05-24 10:10:44 -07:00
David Green
35e013cb3d [ARM] Allow findLoopPreheader to return headers with multiple loop successors
The findLoopPreheader function will currently not find a preheader if it
branches to multiple different loop headers. This patch adds an option
to relax that, allowing ARMLowOverheadLoops to process more loops
successfully. This helps with WhileLoopStart setup instructions that can
branch/fallthrough to the low overhead loop and to branch to a separate
loop from the same preheader (but I don't believe it is possible for
both loops to be low overhead loops).

Differential Revision: https://reviews.llvm.org/D102747
2021-05-24 12:22:15 +01:00
Philipp Krones
df7a8b162e [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo
This makes it possible for targets to define their own MCObjectFileInfo.
This MCObjectFileInfo is then used to determine things like section alignment.

This is a follow up to D101462 and prepares for the RISCV backend defining the
text section alignment depending on the enabled extensions.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101921
2021-05-23 14:15:23 -07:00
LemonBoy
ca1236f15d [SelectionDAG] Fix argument copy elision with irregular types
D29668 enabled to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as it often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced the upper bits are filled with garbage and produce code misbehaving at runtime.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102153
2021-05-22 09:43:37 +02:00
Nick Desaulniers
753fcd644e [IR] make stack-protector-guard-* flags into module attrs
D88631 added initial support for:

- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=

flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.

Link: https://github.com/ClangBuiltLinux/linux/issues/1378

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D102742
2021-05-21 15:53:30 -07:00
Matt Arsenault
c2fa720a37 AMDGPU/GlobalISel: Add subtarget to a test
SelectionDAG forces us to have a weird ABI for 16-bit values without
legal 16-bit operations, but currently GlobalISel bypasses this and
sometimes ends up using the gfx8+ ABI in some contexts. Make sure
we're testing the normal ABI to avoid a test change in a future patch.
2021-05-21 23:57:38 +09:00
Stephen Tozer
15caaeb8e5 3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reapplies c0f3dfb9, which was reverted following the discovery of
crashes on linux kernel and chromium builds - these issues have since
been fixed, allowing this patch to re-land.

This reverts commit 4397b7095d640f9b9426c4d0135e999c5a1de1c5.
2021-05-21 11:06:20 +01:00
Christudasan Devadasan
44fa7cd53e GlobalISel: Help reduce operation width for instruction with two results.
The function `reduceOperationWidth` helps to legalize a vector
operation either by narrowing its type or by scalarizing the
operation itself. It currently supports instructions with one result.
This patch, in addition allows the same for instructions with two
results (for instance, G_SDIVREM).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100725
2021-05-21 10:34:18 +05:30
Serge Pavlov
72fd6b9af9 [APFloat] convertToDouble/Float can work on shorter types
Previously APFloat::convertToDouble may be called only for APFloats that
were built using double semantics. Other semantics like single precision
were not allowed although corresponding numbers could be converted to
double without loss of precision. The similar restriction applied to
APFloat::convertToFloat.

With this change any APFloat that can be precisely represented by double
can be handled with convertToDouble. Behavior of convertToFloat was
updated similarly. It make the conversion operations more convenient and
adds support for formats like half and bfloat.

Differential Revision: https://reviews.llvm.org/D102671
2021-05-21 11:02:51 +07:00
Jessica Clarke
8c42ad8897 [SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics
Unlike normal loads these don't have an extension field, but we know
from TargetLowering whether these are sign-extending or zero-extending,
and so can optimise away unnecessary extensions.

This was noticed on RISC-V, where sign extensions in the calling
convention would result in unnecessary explicit extension instructions,
but this also fixes some Mips inefficiencies. PowerPC sees churn in the
tests as all the zero extensions are only for promoting 32-bit to
64-bit, but these zero extensions are still not optimised away as they
should be, likely due to i32 being a legal type.

This also simplifies the WebAssembly code somewhat, which currently
works around the lack of target-independent combines with some ugly
patterns that break once they're optimised away.

Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits,
where zero-extending atomics were incorrectly returning 0 rather than
the (slightly confusing) required return value of 1.

Re-landed again after D102819 fixed PowerPC to correctly zero-extend all
of its atomics as it claimed to do, since the combination of that bug
and this optimisation caused buildbot regressions.

Reviewed By: RKSimon, atanasyan

Differential Revision: https://reviews.llvm.org/D101342
2021-05-20 20:34:23 +01:00
Jon Roelofs
8ecb13b84d Revert "[Remarks] Add analysis remarks for memset/memcpy/memmove lengths"
This reverts commit 4bf69fb52b3c445ddcef5043c6b292efd14330e0.

This broke spec2k6/403.gcc under -global-isel. Details to follow once I've
reduced the problem.
2021-05-20 12:19:16 -07:00
Fraser Cormack
a3089bc5eb [RISCV] Ensure shuffle splat operands are type-legal
The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a
type-legal splat value as it may implicitly extract a vector element
from another shuffle. It is not permitted to introduce an illegal type
when lowering shuffles.

This patch addresses the crash by adding a boolean flag to
`getSplatValue`, defaulting to false, which when set will ensure a
type-legal return value. If it is unable to do that it will fail to
return a splat value.

I've been through the existing uses of `getSplatValue` in other targets
and was unable to find a need or test cases showing a need to update
their uses. In some cases, the call is made during `LegalizeVectorOps`
which may still produce illegal scalar types. In other situations, the
illegally-typed splat value may be quickly patched up to a legal type
(such as any-extending the returned `extract_vector_elt` up to a legal
type) before `LegalizeDAG` notices.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102687
2021-05-20 18:00:03 +01:00
Stephen Tozer
90c1db3c3c [DebugInfo] Handle DIArgList in FastISel or GlobalIsel
Currently, variadic dbg.values (i.e. those using a DIArgList as part of
their location) are not handled properly by FastISel or GlobalISel, and
will produce invalid DBG_VALUE instructions if they encounter them. This
patch fixes this issue by emitting undef DBG_VALUE instructions for
variadic dbg.values, so that no incorrect instruction is produced and
any prior variable location is terminated.

This is simply a quick-fix to prevent errors; a correct implementation
should come later for these ISel pipelines to ensure that we do not drop
debug information unnecessarily.

Differential Revision: https://reviews.llvm.org/D102500
2021-05-20 17:37:28 +01:00
David Sherwood
f86b85d85e [CodeGen] Add support for widening the result of EXTRACT_SUBVECTOR
When trying to return a type such as <vscale x 1 x i32> from a
function we crash in DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR
when attempting to get the fixed number of elements in the vector.

For the simple case we are dealing with, i.e. extracting
<vscale x 1 x i32> from index 0 of input vector <vscale x 4 x i32>
we can simply rely upon existing code that just returns the input.

Differential Revision: https://reviews.llvm.org/D102605
2021-05-20 12:27:08 +01:00
David Sherwood
3bda651940 [CodeGen] Add support for widening INSERT_SUBVECTOR operands
When attempting to return something like a <vscale x 1 x i32>
type from a function we end up trying to widen the vector by
inserting a <vscale x 1 x i32> subvector into an undefined
<vscale x 4 x i32> vector. However, during legalisation we
then attempt to widen the INSERT_SUBVECTOR operands and hit
an error in WidenVectorOperand.

This patch adds a new WidenVecOp_INSERT_SUBVECTOR function
that currently only supports inserting subvectors into undefined
vectors.

Differential Revision: https://reviews.llvm.org/D102501
2021-05-20 10:37:03 +01:00
Amara Emerson
e9e3784d95 [GlobalISel] Fix div+rem -> divrem combine causing use-def violation. 2021-05-19 23:13:41 -07:00
Jon Roelofs
ad30e385ed [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Differential revision: https://reviews.llvm.org/D102452
2021-05-19 15:09:18 -07:00
Jessica Paquette
c0b812fa18 Recommit "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
Add missing REQUIRES line to
prelegalizer-combiner-icmp-to-true-false-known-bits.
2021-05-19 09:29:19 -07:00
Simon Moll
0dc8431dd3 [VP] make getFunctionalOpcode return an Optional
The operation of some VP intrinsics do/will not map to regular
instruction opcodes.  Returning 'None' seems more intuitive here than
'Instruction::Call'.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102778
2021-05-19 17:08:34 +02:00
Anirudh Prasad
5ae5cfdb6b [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1
- This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and make various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.

The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):

```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```

1. TokA is referred to as the Name Entry. This token is optional
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory
4. TokD is referred to as the Remarks Entry. This token is optional

- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), Tok B as the Operation Entry and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field
- TokC (Operand Entry), no spaces are allowed between OperandEntries. If a space occurs it is classified as an error.
- TokD if provided is taken as is, and emitted as a comment.

The following additional approach was examined, but not taken:

- Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.

Testing:

- This patch doesn't have tests added with it, for the sole reason that MCStreamer Support and Object File support hasn't been added for the z/OS target (yet). Hence, it's not possible generate code outright for the z/OS target. They are in the process of being committed / process of being worked on.

- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.

Reviewed By: Kai, uweigand

Differential Revision: https://reviews.llvm.org/D98276
2021-05-19 11:05:30 -04:00
Simon Pilgrim
3aef61c246 Revert rG528bc10e95d5f9d6a338f9bab5e91d7265d1cf05 : "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
Reports on D101970 indicate this is causing failures on multi-stage compiles.
2021-05-19 15:01:20 +01:00
Nico Weber
a80d7c0abf Revert "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
This reverts commit 892497c806306a4b7185ead16d60b0ebcca0a304.
Breaks tests, see comments on https://reviews.llvm.org/D102542
2021-05-19 09:02:27 -04:00
Sanjay Patel
151c1a6abb [SDAG] propagate FMF from target-specific IR intrinsics
This is a step towards relying more on node-level FMF rather than function-wide
or target settings.
I think it was just an oversight that we didn't get this path in D87361
or follow-on patches.

The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF.

We can't do much more than this currently because we also fail to propagate flags
from x86-specific node to generic FMA node. That would be another patch, so the
test just verifies that we can transfer from IR to initial SDAG node.

Differential Revision: https://reviews.llvm.org/D102725
2021-05-19 07:50:50 -04:00
Tim Northover
d6d3226a57 MachineBasicBlock: add liveout iterator aware of which liveins are defined by the runtime.
Using this in RegAlloc fast reduces register pressure, and in some cases allows
x86 code to compile that wouldn't before.
2021-05-19 11:00:24 +01:00
Rong Xu
275da5fb80 Fix sanitizer test errors from commit 886629a8
Explictly handle the empty string in the Hash calculation.
2021-05-18 22:46:51 -07:00
Guozhi Wei
5f78ed293a [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence

    lea (reg1, reg2), reg3
    sub reg3, reg4

to two sub instructions

    sub reg1, reg4
    sub reg2, reg4

Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register
of ADD).

Differential Revision: https://reviews.llvm.org/D101970
2021-05-18 18:02:36 -07:00
Rong Xu
8bcc53c7a0 Fix a buildbot failure from commit 886629a8 2021-05-18 16:53:34 -07:00
Rong Xu
4e30d875bf [SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This patch implements first part of Flow Sensitive SampleFDO (FSAFDO).
It has the following changes:
(1) disable current discriminator encoding scheme,
(2) new hierarchical discriminator for FSAFDO.

For this patch, option "-enable-fs-discriminator=true" turns on the new
functionality. Option "-enable-fs-discriminator=false" (the default)
keeps the current SampleFDO behavior. When the fs-discriminator is
enabled, we insert a flag variable, namely, llvm_fs_discriminator, to
the object. This symbol will checked by create_llvm_prof tool, and used
to generate a profile with FS-AFDO discriminators enabled. If this
happens, for an extbinary format profile, create_llvm_prof tool
will add a flag to profile summary section.

Differential Revision: https://reviews.llvm.org/D102246
2021-05-18 16:23:43 -07:00
Arthur Eubanks
2682845f03 [NFC] Use ArgListEntry indirect types more in ISel lowering
For opaque pointers, we're trying to avoid uses of
PointerType::getElementType().

A couple of ISel places use PointerType::getElementType(). Some of these
are easy to fix by using ArgListEntry's indirect types.

The inalloca type wasn't stored there, as opposed to preallocated and
byval which have their indirect types available, so add it and use it.

This is a reland after an MSan fix in D102667.

Differential Revision: https://reviews.llvm.org/D101713
2021-05-18 14:30:22 -07:00
Arthur Eubanks
b8775b2a78 [TargetLowering] Only inspect attributes in the arguments for ArgListEntry
Parameter attributes are considered part of the function [1], and like
mismatched calling conventions [2], we can't have the verifier check for
mismatched parameter attributes.

This is a reland after fixing MSan issues in D102667.

[1] https://llvm.org/docs/LangRef.html#parameter-attributes
[2] https://llvm.org/docs/FAQ.html#why-does-instcombine-simplifycfg-turn-a-call-to-a-function-with-a-mismatched-calling-convention-into-unreachable-why-not-make-the-verifier-reject-it

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D101806
2021-05-18 14:30:22 -07:00
Arthur Eubanks
7a1762f190 [NewPM] Don't mark AA analyses as preserved
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved, they won't be invalidated unless their analyses are.

SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102032
2021-05-18 13:49:03 -07:00
Jessica Paquette
71450116c7 [GlobalISel] Simplify G_ICMP to true/false when the result is known
Use existing KnownBits helpers from KnownBits.h to simplify G_ICMPs.

E.g.

x == x -> true
x != x -> false
load(x) > 1 -> true (when the load is known to be greater than 1)

And so on.

Differential Revision: https://reviews.llvm.org/D102542
2021-05-18 09:26:41 -07:00
David Green
cedff971cb [RDA] Fix printing of regs / reg units. NFC
It was printing RegUnits as Regs, leading to much confusion in the debug
logs.
2021-05-18 08:07:30 +01:00
Ten Tzen
9ff115e8b2 [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.

Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
  faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
  must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
  outside of _try region must be updated in memory (not just in register)
  before the subsequent exception occurs.

The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.

One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.

Implementation:
Part-1: Clang implementation described below.

Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.

Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.

Part-2 (will be in Part-2 patch): LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D80344/new/
2021-05-17 22:42:17 -07:00
Serguei Katkov
b6a8f6d36b [Statepoint Lowering] Cleanup: remove unused option statepoint-always-spill-base. 2021-05-18 12:15:15 +07:00
Nick Desaulniers
e282834e98 [AArch64] Support customizing stack protector guard
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:

1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0

to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).

Address pr/47341 for aarch64.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen

Differential Revision: https://reviews.llvm.org/D100919
2021-05-17 11:49:22 -07:00
Simon Pilgrim
633bafaf5f [TargetLowering] prepareUREMEqFold/prepareSREMEqFold - account for non legal shift types
Ensure we tell getShiftAmountTy that we're working with pre-legalized types to prevent cases where the (legalized) shift type can no longer handle the (non-legalized) type width.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34366
2021-05-17 11:03:27 +01:00
Tim Northover
fc5daa6083 IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Fraser Cormack
69fc258fc0 [DAGCombiner] Relax an assertion to an early return
The select-of-constants transform was asserting that its constant vector
inputs did not implicitly truncate their input without that as an
explicit precondition to the function. This patch relaxes that assertion
into an early return to skip the optimization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102393
2021-05-17 09:15:55 +01:00
Arthur Eubanks
99d811bdc5 Revert "[TargetLowering] Only inspect attributes in the arguments for ArgListEntry"
This reverts commit 16748bd2fb1fe10d7d097961f1988327338f3f9f.

Causes https://crbug.com/1209013
2021-05-16 22:02:10 -07:00
Arthur Eubanks
630c86e151 Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering"
This reverts commit 85af8a8c1b574faa0d5d57d189ae051debdfada8.
2021-05-16 22:00:54 -07:00
Pan, Tao
f9d8052498 [SelectionDAG] Make fast and linearize visible by clang -pre-RA-sched
ScheduleDAGFast.cpp is compiled to object file, but the ScheduleDAGFast
object file isn't linked into clang executable file as no symbol is
referred by outside. Add calling to createXxx of ScheduleDAGFast.cpp,
then the ScheduleDAGFast object file will be linked into clang
executable file. The static RegisterScheduler will register scheduler
fast and linearize at clang boot time.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D101601
2021-05-17 11:25:15 +08:00
David Green
99f589780c [CPG][ARM] Optimize towards branch on zero in codegenprepare
This adds a simple fold into codegenprepare that converts comparison of
branches towards comparison with zero if possible. For example:
  %c = icmp ult %x, 8
  br %c, bla, blb
  %tc = lshr %x, 3
becomes
  %tc = lshr %x, 3
  %c = icmp eq %tc, 0
  br %c, bla, blb

As a first order approximation, this can reduce the number of
instructions needed to perform the branch as the shift is (often) needed
anyway. At the moment this does not effect very much, as llvm tends to
prefer the opposite form. But it can protect against regressions from
commits like rG9423f78240a2.

Simple cases of Add and Sub are added along with Shift, equally as the
comparison to zero can often be folded with cpsr flags.

Differential Revision: https://reviews.llvm.org/D101778
2021-05-16 17:54:06 +01:00
Pengxuan Zheng
3d401a1bf6 Support GCC's -fstack-usage flag
This patch adds support for GCC's -fstack-usage flag. With this flag, a stack
usage file (i.e., .su file) is generated for each input source file. The format
of the stack usage file is also similar to what is used by GCC. For each
function defined in the source file, a line with the following information is
produced in the .su file.

<source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic>

"Static" means that the function's frame size is static and the size info is an
accurate reflection of the frame size. While "dynamic" means the function's
frame size can only be determined at run-time because the function manipulates
the stack dynamically (e.g., due to variable size objects). The size info only
reflects the size of the fixed size frame objects in this case and therefore is
not a reliable measure of the total frame size.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D100509
2021-05-15 10:22:49 -07:00
Simon Pilgrim
109645876c IfConverter::MeetIfcvtSizeLimit - Fix uninitialized variable warnings. NFCI.
Ensure the duplication instruction counts are initialized to zero (even though they aren't used) to silence static analysis warnings.
2021-05-15 14:51:54 +01:00
Nikita Popov
d8b6f240a9 [IR] Add BasicBlock::isEntryBlock() (NFC)
This is a recurring and somewhat awkward pattern. Add a helper
method for it.
2021-05-15 12:41:58 +02:00
Amara Emerson
7c35883ebc [GlobalISel][CallLowering] Fix crash when handling a v3s32 type that's being passed as v2s64. 2021-05-14 16:30:51 -07:00
Sanjay Patel
1a03ebd2a9 [SDAG] reduce code duplication for extend_vec_inreg combines; NFC
These are identical so far, and I was looking at adding a fold
for a pattern with scalar_to_vector which would also nd up duplicated.
2021-05-14 08:29:57 -04:00
Tim Northover
5661b7eb80 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
David Spickett
3d7d76ad6f [llvm][AsmPrinter] Restore source location to register clobber warning
Since 5de2d189e6ad466a1f0616195e8c524a4eb3cbc0 this particular warning
hasn't had the location of the source file containing the inline
assembly.

Fix this by reporting via LLVMContext. Which means that we no longer
have the "instantiated into assembly here" lines but they were going to
point to the start of the inline asm string anyway.

This message is already tested via IR in llvm. However we won't have
the required location info there so I've added a C file test in clang
to cover it.
(though strictly, this is testing llvm code)

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102244
2021-05-14 08:22:57 +00:00
Chen Zheng
4eed210a79 [Debug-Info] change Tag type to dwarf::Tag for createAndAddDIE; NFC
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102207
2021-05-13 21:15:06 -04:00
Chen Zheng
b9ce1812f9 [Debug-Info] make DIE attributes generation under strict DWARF control
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101024
2021-05-13 20:34:07 -04:00