This adds MC support for the crypto instructions that were made optional
extensions in Armv8.2-A (AArch64 only).
Differential Revision: https://reviews.llvm.org/D49370
llvm-svn: 338010
--strip-trailing-cr is a diffutils option which is also available in the
BSD-licensed diff introduced in FreeBSD 11.2; however, that implementation
has a bug when comparing files that mix \r and \r\n. Use -b (POSIX) instead.
llvm-svn: 338008
I'm not sure if this was trying to avoid optimizing the new nodes further, or to prevent a cycle if something tried to reform the multiply. Either way, I don't think it's a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before, and then add the first node to the worklist. This process repeats until all the new nodes are visited.
So this seems like unreliable prevention at best, and this patch just returns the new nodes like any other combine. If this starts causing problems, we can try adding target-specific nodes or something to more directly prevent optimizations.
Now that we handle the combine normally, we can fold any negates the mul expansion creates into their users, since those will now be visited.
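To make the intent concrete, here is a minimal sketch of the pattern, assuming a hypothetical expansion of a multiply by 5 into shift+add (the real combine and its constants differ); the point is simply returning the new value rather than manually pre-adding nodes to the worklist:

static SDValue expandMulByFive(SDNode *N, SelectionDAG &DAG) {
  SDLoc DL(N);
  EVT VT = N->getValueType(0);
  SDValue X = N->getOperand(0);
  // X * 5 -> (X << 2) + X
  SDValue Shl = DAG.getNode(ISD::SHL, DL, VT, X, DAG.getConstant(2, DL, VT));
  // Returning the result is enough: the DAGCombiner will queue the new
  // nodes' operands for visitation itself.
  return DAG.getNode(ISD::ADD, DL, VT, Shl, X);
}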
llvm-svn: 338007
These calls were making sure some newly created nodes were added to the worklist, but the DAGCombiner has internal support for ensuring it has visited all nodes. Any time it visits a node, it ensures the operands have been queued to be visited as well. This means we only need to return the last new node; the DAGCombiner will take care of adding its inputs, thus walking backwards through all the new nodes.
llvm-svn: 337996
The function in question is copy-pasted many times across DWARF-related
classes, so it makes sense to place its implementation in the Support library.
Reviewed by: lhames
Differential Revision: https://reviews.llvm.org/D49824
llvm-svn: 337995
- Remove unnecessary anchor function
- Remove unnecessary override of getAnalysisUsage
- Use references instead of pointers where things cannot be nullptr
- Use ArrayRef instead of std::vector where possible
llvm-svn: 337989
- Avoid duplication of regmask size calculation.
- Simplify allocateRegisterMask() call.
- Rename allocateRegisterMask() to allocateRegMask() to be consistent
with naming in MachineOperand.
llvm-svn: 337986
This patch adds support for emitting DWARF5 accelerator tables
(.debug_names) from dsymutil. Just as with the Apple-style accelerator
tables, it's possible to update existing dSYMs. This patch includes a
test that shows how you can convert back and forth between the two types.
If no kind of table is specified, dsymutil will default to generating
Apple-style accelerator tables whenever it finds those in its input. The
same is true when there are no accelerator tables at all. Finally, in
the remaining case, where there's at least one DWARF v5 table and no
Apple ones, the output will contain a DWARF 5 accelerator table
(.debug_names).
Differential revision: https://reviews.llvm.org/D49137
llvm-svn: 337980
Some BPF JIT backends would like to optimize memcpy in their own
architecture-specific way. However, at the moment there is no way for
JIT backends to see memcpy semantics reliably, because the LLVM BPF
backend expands memcpy into load/store sequences that may then be
scheduled apart from each other. As a result, BPF JIT backends inside
the kernel can't reliably recognize memcpy semantics by peephole-matching
the BPF sequence.
This patch introduces a new intrinsic expansion infrastructure for memcpy.
To get a stable, in-order load/store sequence from memcpy, we first lower
memcpy into a BPF::MEMCPY node, which is then expanded into in-order
load/store sequences in the expandPostRAPseudo pass, which runs after
instruction scheduling. This way, kernel JIT backends can reliably
recognize memcpy by scanning the BPF sequence.
The new memcpy expansion infrastructure is gated by a new option:
-bpf-expand-memcpy-in-order
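A rough sketch of what such a post-RA expansion can look like; the pseudo's operand layout, the scratch register, and the word-sized copy loop are illustrative assumptions, not the exact code of this patch:

bool BPFInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
  if (MI.getOpcode() != BPF::MEMCPY)
    return false;
  MachineBasicBlock &MBB = *MI.getParent();
  DebugLoc DL = MI.getDebugLoc();
  unsigned Dst = MI.getOperand(0).getReg();  // assumed operand layout:
  unsigned Src = MI.getOperand(1).getReg();  // dst, src, length
  uint64_t Len = MI.getOperand(2).getImm();
  unsigned Scratch = BPF::R1;                // illustrative scratch reg
  // Emit strictly paired, in-order word copies. Running after the
  // scheduler means nothing gets interleaved between them, so an
  // in-kernel JIT can pattern-match the sequence back into a memcpy.
  // (Tail handling for lengths not divisible by 4 is omitted.)
  for (uint64_t Off = 0; Off + 4 <= Len; Off += 4) {
    BuildMI(MBB, MI, DL, get(BPF::LDW), Scratch).addReg(Src).addImm(Off);
    BuildMI(MBB, MI, DL, get(BPF::STW)).addReg(Scratch).addReg(Dst).addImm(Off);
  }
  MI.eraseFromParent();
  return true;
}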
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 337977
Reuse the handling for llvm.used, and don't transform such globals.
Fixes a failure on the asan buildbot caused by my previous commit.
llvm-svn: 337973
If the DAGCombiner's rotate matching were working as expected,
I don't think we'd see any test diffs here.
This sidesteps the issue of custom lowering for rotates raised in PR38243:
https://bugs.llvm.org/show_bug.cgi?id=38243
...by only dealing with legal operations.
llvm-svn: 337966
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this patch, the
areConsecutivePointers method attempts to handle select instructions.
This leads to an increased number of successful vectorizations.
Technically, select instructions could appear in index arithmetic as
well; however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in the shapes of IR computing integral
indices in general than in what's common in pointer computations, and
it appears quite unreliable to do anything short of making select
instructions first-class citizens of Scalar Evolution, which for the
purposes of this patch is most definitely overkill.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D49428
llvm-svn: 337965
Instead of depending on implicit padding from the structure layout code,
use a packed struct and emit the padding explicitly.
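For illustration (hedged; not necessarily the struct this patch touches), building such a layout with the LLVM C++ API could look like this, where the i8 array makes the pad an explicit member of a packed struct instead of something the layout code inserts implicitly:

llvm::StructType *Padded = llvm::StructType::get(
    Ctx,
    {llvm::Type::getInt32Ty(Ctx),
     // Explicit 4-byte pad; a non-packed {i32, i64} would receive this
     // padding implicitly from the structure layout code.
     llvm::ArrayType::get(llvm::Type::getInt8Ty(Ctx), 4),
     llvm::Type::getInt64Ty(Ctx)},
    /*isPacked=*/true);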
Differential Revision: https://reviews.llvm.org/D49710
llvm-svn: 337961
GNU binutils tools have no problems with this kind of shared constant,
provided that we actually hook it up completely in AsmPrinter and
produce a global symbol.
This effectively reverts SVN r335918 by hooking the rest of it up
properly.
This feature was originally implemented in SVN r213006, with no reason
given for why it can't be used for MinGW other than the fact that GCC
doesn't do it while MSVC does.
Differential Revision: https://reviews.llvm.org/D49646
llvm-svn: 337951
In SVN r334523, the first half of comdat constant pool handling was
hoisted from X86WindowsTargetObjectFile (which despite the name was
only used for MSVC targets) into the arch-independent
TargetLoweringObjectFileCOFF, but the other half of the handling was
left behind in X86AsmPrinter::GetCPISymbol.
With only half of the handling in place, inconsistent comdat
sections/symbols are created, causing issues both with GNU binutils
(avoided for X86 in SVN r335918) and with the MS linker, which
would complain like this:
fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4
Differential Revision: https://reviews.llvm.org/D49644
llvm-svn: 337950
Violating the invariants specified by attributes is undefined behavior.
Maybe we could use poison instead for some of the parameter attributes,
but I don't think it's worthwhile.
Differential Revision: https://reviews.llvm.org/D49041
llvm-svn: 337947
Saves materializing the immediate for the "ands".
Corresponding patterns exist for lsrs+lsls, but that seems less common
in practice.
Now implemented as a DAGCombine.
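The underlying identity, as a hedged standalone C++ illustration (the concrete shift amounts are made up): clearing the top bits with two shifts avoids loading a mask constant for the "ands".

#include <cstdint>

// x & 0x00ffffff without materializing the mask: shift the top byte out
// and back in, which is what the lsls+lsrs pair achieves.
uint32_t maskLow24(uint32_t x) {
  return (x << 8) >> 8;
}

// The converse lsrs+lsls pair clears the low bits instead:
uint32_t maskHigh24(uint32_t x) {
  return (x >> 8) << 8; // x & 0xffffff00
}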
Differential Revision: https://reviews.llvm.org/D49585
llvm-svn: 337945
as well as sext(C + x + ...) -> (D + sext(C-D + x + ...))<nuw><nsw>,
similar to the equivalent transformation for zext's, if the top-level
addition in (D + (C-D + x * n)) can be proven not to wrap, where the
choice of D also maximizes the number of trailing zeroes of
(C-D + x * n), ensuring homogeneous behaviour of the transformation and
better canonicalization of such AddRecs. (Indeed, there are 2^(2w)
different expressions of the form `B1 + ext(B2 + Y)` for the same Y, but
only 2^(2w - k) different expressions of the resulting form
`B3 + ext((B4 * 2^k) + Y)`, where w is the bit width of the integral type.)
This patch generalizes the sext(C1 + C2*X) --> sext(C1) + sext(C2*X) and
sext{C1,+,C2} --> sext(C1) + sext{0,+,C2} transformations added in
r209568, relaxing the requirements in the following ways:
1. C2 doesn't have to be a power of 2; it's enough if it's divisible by 2
a sufficient number of times;
2. C1 doesn't have to be less than C2; instead of extracting the entire
C1 we can split it into 2 terms: (00...0XXX + YY...Y000), keep the
second one, which may cause wrapping, within the extension operator, and
move the first one, which doesn't affect wrapping, out of the extension
operator, enabling further simplifications;
3. C1 and C2 don't have to be positive; splitting C1 as shown above
produces a sum that is guaranteed not to wrap, signed or unsigned;
4. in the AddExpr case there can be more than 2 terms; the 2nd and
following terms of an AddExpr, and the Step component of an AddRecExpr,
don't have to be in the C2*X form or constant (respectively), they
just need to have enough trailing zeros, which in turn can be
guaranteed by means other than arithmetic, e.g. by pointer alignment;
5. the extension operator doesn't have to be a sext; the same
transformation works and is profitable for zext's as well (a compact
restatement of the generalized transform follows this list).
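Stated compactly, in the commit's own notation (this is a restatement of the rules above, not new math; tz(t) denotes the guaranteed number of trailing zero bits of t):

k = min(tz(x), tz(y), ...)  // guaranteed trailing zeroes of the sum w/o C
D = C mod 2^k               // the low k bits of C
ext(C + x + y + ...) = (D + ext((C - D) + x + y + ...))<nuw><nsw>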
Apparently, optimizations like SLPVectorizer currently fail to
vectorize even rather trivial cases like the following:
double bar(double *a, unsigned n) {
  double x = 0.0;
  double y = 0.0;
  for (unsigned i = 0; i < n; i += 2) {
    x += a[i];
    y += a[i + 1];
  }
  return x * y;
}
If compiled with `clang -std=c11 -Wpedantic -Wall -O3 main.c -S -o - -emit-llvm`
(!{!"clang version 7.0.0 (trunk 337339) (llvm/trunk 337344)"})
it produces scalar code with the loop not unrolled when `n` and `i` are
unsigned (as shown above), but a vectorized and unrolled loop when `n`
and `i` are signed. With the changes made in this commit the unsigned
version will be vectorized too (though not unrolled, for unclear reasons).
How it all works:
Let's say we have an AddExpr that looks like (C + x + y + ...), where C
is a constant and x, y, ... are arbitrary SCEVs. Let's compute the
guaranteed minimum number of trailing zeroes of that sum without the
constant term: (x + y + ...). If, for example, those terms look as
follows:
       i
XXXX...X000
YYYY...YY00
   ...
ZZZZ...0000
then the rightmost non-guaranteed-zero bit (a potential one at the i-th
position above) can change the bits of the sum to the left (and at the
i-th position itself), but it cannot possibly change the bits to the
right. So we can compute the number of trailing zeroes by taking the
minimum of the numbers of trailing zeroes of the terms.
Now let's say that our original sum with the constant is effectively
just C + X, where X = x + y + .... Let's also say that we've got 2
guaranteed trailing zeros for X:
         j
CCCC...CCCC
XXXX...XX00 // this is X = (x + y + ...)
Any bit of C to the left of j may in the end cause the C + X sum to
wrap, but the rightmost 2 bits of C (at positions j and j - 1) do not
affect wrapping in any way. If the upper bits cause a wrap, it will be
a wrap regardless of the values of the 2 least significant bits of C.
If the upper bits do not cause a wrap, it won't be a wrap regardless
of the values of the 2 bits on the right (again).
So let's split C into 2 constants as follows:
0000...00CC = D
CCCC...CC00 = (C - D)
and represent the whole sum as D + (C - D + X). The second term of
this new sum looks like this:
CCCC...CC00
XXXX...XX00
----------- // let's add them up
YYYY...YY00
The sum above (let's call it Y) may or may not wrap, we don't know, so
we need to keep it under a sext/zext. Adding D to that sum, though,
will never wrap, signed or unsigned, whether performed on the original
bit width or the extended one, because all that the final add does is
set the 2 least significant bits of Y to the bits of D:
YYYY...YY00 = Y
0000...00CC = D
----------- <nuw><nsw>
YYYY...YYCC
Which means we can safely move that D out of the sext or zext and
claim that the top-level sum neither sign wraps nor unsigned wraps.
Let's run an example. Say we're working in i8's and the original
expression (the zext's or sext's operand) is 21 + 12x + 8y. It goes
like this:
0001 0101 // 21
XXXX XX00 // 12x
YYYY Y000 // 8y

0001 0101 // 21
ZZZZ ZZ00 // 12x + 8y

0000 0001 // D
0001 0100 // 21 - D = 20
ZZZZ ZZ00 // 12x + 8y

0000 0001 // D
WWWW WW00 // 21 - D + 12x + 8y = 20 + 12x + 8y
therefore zext(21 + 12x + 8y) = (1 + zext(20 + 12x + 8y))<nuw><nsw>
This approach could be improved if we move away from using trailing
zeroes and use KnownBits instead. For instance, with KnownBits we could
have the following picture:
    i
10 1110...0011 // this is C
XX X1XX...XX00 // this is X = (x + y + ...)
Notice that some of the bits of X are known ones, and that the known
bits of X are interspersed with unknown bits rather than grouped on
the right or left.
We can see at the position i that C(i) and X(i) are both known ones,
therefore the (i + 1)th carry bit is guaranteed to be 1 regardless of
the bits of C to the right of i. For instance, the C(i - 1) bit only
affects the bits of the sum at positions i - 1 and i, and does not
influence whether the sum is going to wrap. Therefore we could split
the constant C as follows:
    i
00 0010...0011 = D
10 1100...0000 = (C - D)
Let's compute the KnownBits of (C - D) + X:
XX1 1           = carry bits; blanks stand for known zeroes
 10 1100...0000 = (C - D)
 XX X1XX...XX00 = X
 --- -----------
 XX X0XX...XX00
Whether this add wraps essentially depends on the bits of X. Adding D
to this sum, however, is guaranteed not to wrap:
0    X
 00 0010...0011 = D
 sX X0XX...XX00 = (C - D) + X
 --- -----------
 sX XXXX...XX11
As can be seen above, adding D preserves the sign bit of (C - D) + X,
if any, and has a guaranteed 0 carry out, as expected.
The more bits of (C - D) we constrain, the better the transformations
introduced here canonicalize expressions, as it leaves less freedom for
the values that the constant part of ((C - D) + x + y + ...) can take.
Reviewed By: mzolotukhin, efriedma
Differential Revision: https://reviews.llvm.org/D48853
llvm-svn: 337943
Summary:
The VS compiler (on Windows) has a bug which results in fieldFromInstruction being optimized out in some circumstances. This only happens in *release no debug info* builds that have assertions *turned off* - in all other situations the function is not inlined, so the functionality is correct. All of the bots have assertions turned on, so this path is not regularly tested. The workaround is to not inline the function on Windows - if the bug is fixed in a later release of the VS compiler, the noinline specification can be removed.
The test that consistently reproduces this is the Lanai v11.txt test.
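A hedged sketch of the workaround's shape (the macro name is illustrative and the real generated code may differ; assumes NumBits is smaller than the width of InsnType):

#if defined(_MSC_VER)
#define FIELD_FROM_INSTRUCTION_NOINLINE __declspec(noinline)
#else
#define FIELD_FROM_INSTRUCTION_NOINLINE
#endif

// Extract the NumBits-wide field starting at StartBit; not inlined on
// MSVC so the miscompile described above cannot kick in.
template <typename InsnType>
static FIELD_FROM_INSTRUCTION_NOINLINE InsnType
fieldFromInstruction(InsnType Insn, unsigned StartBit, unsigned NumBits) {
  InsnType FieldMask = (InsnType(1) << NumBits) - 1;
  return (Insn >> StartBit) & FieldMask;
}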
Reviewers: asmith, labath, zturner
Subscribers: dblaikie, stella.stamenova, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D49753
llvm-svn: 337942
When VectorLegalizer::LegalizeOp creates a new SDValue after iterating
over its arguments, we need to refer to the same result number of the
new node that the original value used.
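A hedged sketch of the fix's shape (names are illustrative): when swapping in the replacement, carry over the original result number instead of unconditionally taking result 0 of the new node:

// Before (buggy shape): always referred to result 0 of the new node.
//   SDValue Fixed = SDValue(NewNode, 0);
// After: refer to the same result number the original value used.
SDValue Fixed = SDValue(NewNode, Op.getResNo());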
Reviewed by: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D49805
llvm-svn: 337939
Currently ComputeNumSignBits exits early while processing some of the
operations (add, sub, mul, and select). This prevents the function from
using the AssumptionCacheTracker if one is passed.
Differential Revision: https://reviews.llvm.org/D49759
llvm-svn: 337936
For example v = <2 x i1> is represented as bbbbaaaa in a predicate register,
where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4
from the predicate register.
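The index arithmetic as a hedged, standalone C++ illustration (an 8-bit predicate register holding 2 elements, so each element is replicated across 8 / 2 = 4 bits):

#include <cstdint>

// Extract element I of a <2 x i1> vector stored as bbbbaaaa: each
// element owns 4 predicate bits, so element I starts at bit I * 4.
bool extractPredElt(uint8_t PredReg, unsigned I) {
  return (PredReg >> (I * 4)) & 1;
}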
llvm-svn: 337934
This recommits r337910 after fixing an "ambiguous call to addAttribute"
error with some compilers (gcc circa 4.9 and MSVC). It seems that these
compilers will consider a "false -> pointer" conversion during overload
resolution. This created ambiguity because I added an overload which
takes an MCExpr * as an argument.
I fix this by making the new overload take MCExpr&, which avoids the
conversion. It also documents the fact that we expect a valid MCExpr
object.
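A hedged illustration of the ambiguity (a hypothetical overload set, not the exact dwarfgen signatures):

#include <cstdint>

struct MCExpr;

void addAttribute(unsigned Attr, uint64_t Value);      // integer form
void addAttribute(unsigned Attr, const MCExpr *Expr);  // pointer form

// addAttribute(0, false);
// ^ ambiguous on gcc ~4.9 / MSVC: both bool -> uint64_t and the
// (questionable) false -> null pointer conversion are considered.

void addAttribute(unsigned Attr, const MCExpr &Expr);  // reference form
// Taking a reference rules out the false -> pointer conversion and
// documents that a valid MCExpr object is expected.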
Original commit message follows:
The motivation for this is D49493, where we'd like to test details of
debug_str_offsets behavior which is difficult to trigger from a
traditional test.
This adds the plumbing necessary for dwarfgen to generate this section.
The more interesting changes are:
- I've moved the emitStringOffsetsTableHeader function from DwarfFile to
DwarfStringPool, so I can generate the section header more easily from
the unit test.
- added a new addAttribute overload taking an MCExpr*. This is used to
generate the DW_AT_str_offsets_base, which links a compile unit to the
offset table.
I've also added a basic test for reading and writing DW_FORM_strx forms.
Reviewers: dblaikie, JDevlieghere, probinson
Subscribers: llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D49670
llvm-svn: 337933
Initially, in https://reviews.llvm.org/D44890, I had these defined as
empty functions inside the header when the respective event listener
was not built in. As done in that commit, that wasn't correct, because
it was an ODR violation. Krasimir hot-fixed that in r333265, but that
wasn't quite right either, because it'd lead to the symbol not being
available.
Instead, just move the fallbacks to ExecutionEngineBindings.cpp. We
could define them as static inlines in the header too, but I don't
think it matters.
Reviewers: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49654
llvm-svn: 337930
This reverts commit r337910 as it's generating "ambiguous call to
addAttribute" errors on some bots.
Will resubmit once I get a chance to look into the problem.
llvm-svn: 337924
Add support for lowering pointer arguments.
Changing the type from pointer to integer is already done in
MipsTargetLowering::getRegisterTypeForCallingConv.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D49419
llvm-svn: 337912
Summary:
The motivation for this is D49493, where we'd like to test details of
debug_str_offsets behavior which is difficult to trigger from a
traditional test.
This adds the plumbing necessary for dwarfgen to generate this section.
The more interesting changes are:
- I've moved the emitStringOffsetsTableHeader function from DwarfFile to
DwarfStringPool, so I can generate the section header more easily from
the unit test.
- added a new addAttribute overload taking an MCExpr*. This is used to
generate the DW_AT_str_offsets_base, which links a compile unit to the
offset table.
I've also added a basic test for reading and writing DW_FORM_strx forms.
Reviewers: dblaikie, JDevlieghere, probinson
Subscribers: llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D49670
llvm-svn: 337910
NFC changes to make scheduler TableGen files more readable, by using
loops instead of many similar defs that differ only in, e.g., a latency
value.
https://reviews.llvm.org/D49598
Review: Ulrich Weigand, Javed Abshar
llvm-svn: 337909
r337828 resolves a PredicateInfo issue with unnamed types.
Original message:
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow up, we should be able to extend it to also propagate additional
facts about nonnull.
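A small C++ illustration of the kind of fact being propagated (the exact IR and pass pipeline are elided; the false-branch-of-NE case is symmetric):

int f(int x) {
  if (x == 42)
    return x + 1; // predicated by EQ: x is known to be 42, folds to 43
  return x;
}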
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
llvm-svn: 337904
Add support for inline assembly with output operands that do not
naturally go in the register class they are constrained to (e.g. a
double in a 32-bit GPR, as in the PR).
llvm-svn: 337903
Helpers are available to make this option file-format independent. This
patch adds the feature for the Wasm file format. It doesn't change the
handling of the other file formats.
Differential Revision: https://reviews.llvm.org/D49545
llvm-svn: 337896
code.
This consolidates all our hardening calls and simplifies the code
a bit. It seems much clearer to handle all of these together.
No functionality changed here.
llvm-svn: 337895