Summary:
As discussed in [[ https://bugs.llvm.org/show_bug.cgi?id=38938 | PR38938 ]],
we fail to emit `BEXTR` if the mask is shifted.
We can't deal with that in `X86DAGToDAGISel` `before the address mode for the inc is selected`,
and we can't really do it in the normal DAGCombine, because we don't have generic `ISD::BitFieldExtract` node,
and if we simply turn the shifted mask into a normal mask + shift-left, it will be folded back.
So it would seem X86ISelLowering is the place to handle this.
This patch only moves the matchBEXTRFromAnd()
from X86DAGToDAGISel to X86ISelLowering.
It does not add support for the 'shifted mask' pattern.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52426
llvm-svn: 344179
Adding a new reduction pattern match for vectorizing code similar to TSVC s3111:
for (int i = 0; i < N; i++)
if (a[i] > b)
sum += a[i];
This patch adds support for fadd, fsub and fmull, as well as multiple
branches and different (but compatible) instructions (ex. add+sub) in
different branches.
I have forwarded to trunk, added fsub and fmul functionality and
additional tests, but the credit goes to Takahiro, who did most of the
actual work.
Differential Revision: https://reviews.llvm.org/D49168
Patch by Takahiro Miyoshi <takahiro.miyoshi@linaro.org>.
llvm-svn: 344172
Add a library that parses optimization remarks (currently YAML, so based
on the YAMLParser).
The goal is to be able to provide tools a remark parser that is not
completely dependent on YAML, in case we decide to change the format
later.
It exposes a C API which takes a handler that is called with the remark
structure.
It adds a libLLVMOptRemark.a static library, and it's used in-tree by
the llvm-opt-report tool (from which the parser has been mostly moved
out).
Differential Revision: https://reviews.llvm.org/D52776
Fixed the tests by removing the usage of C++11 strings, which seems not
to be supported by gcc 4.8.4 if they're used as a macro argument.
llvm-svn: 344171
Summary:
GlobalISel generates incorrect code because the legalizer artifact
combiner assumes `G_[SZ]EXT (G_IMPLICIT_DEF)` is equivalent to
`G_IMPLICIT_DEF `.
Replace `G_[SZ]EXT (G_IMPLICIT_DEF)` with 0 because the top bits
will be 0 for G_ZEXT and 0/1 for the G_SEXT.
Reviewers: aditya_nandakumar, dsanders, aemerson, javed.absar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D52996
llvm-svn: 344163
Add a library that parses optimization remarks (currently YAML, so based
on the YAMLParser).
The goal is to be able to provide tools a remark parser that is not
completely dependent on YAML, in case we decide to change the format
later.
It exposes a C API which takes a handler that is called with the remark
structure.
It adds a libLLVMOptRemark.a static library, and it's used in-tree by
the llvm-opt-report tool (from which the parser has been mostly moved
out).
Differential Revision: https://reviews.llvm.org/D52776
llvm-svn: 344162
The case will randomly fail if we test it with command "
while llvm-lit test/tools/gold/X86/cache.ll ; do true; done". It is because the llvmcache-foo file is younger than llvmcache-349F039B8EB076D412007D82778442BED3148C4E and llvmcache-A8107945C65C2B2BBEE8E61AA604C311D60D58D6. But due to timestamp precision reason their timestamp is the same. Given the same timestamp, the file prune policy is to remove bigger size file first, so mostly foo file is removed for its bigger size. And the files size is under threshold after deleting foo file. That's what test case expect.
However sometimes, the precision is enough to measure that timestamp of llvmcache-349F039B8EB076D412007D82778442BED3148C4E and llvmcache-A8107945C65C2B2BBEE8E61AA604C311D60D58D6 are smaller than foo, so llvmcache-349F039B8EB076D412007D82778442BED3148C4E and llvmcache-A8107945C65C2B2BBEE8E61AA604C311D60D58D6 are deleted first. Since the files size is still above the file size threshold after deleting the 2 files, the foo file is also deleted. And then the test case fails, because it expect only one file should be deleted instead of 3.
The fix is to change the timestamp of llvmcache-foo file to meet the thinLTO prune policy.
Patch by Luo Yuanke.
Differential Revision: https://reviews.llvm.org/D52452
llvm-svn: 344158
These should test all the optimizable moves on Jaguar.
A follow-up patch will teach how to recognize these optimizable register moves.
llvm-svn: 344144
Summary: Simplify code by having LLVMState hold the RegisterAliasingTrackerCache.
Reviewers: courbet
Subscribers: tschuett, llvm-commits
Differential Revision: https://reviews.llvm.org/D53078
llvm-svn: 344143
Summary:
Extend analysis forwarding loads from preceeding stores to work with
extended loads and truncated stores to the same address so long as the
load is fully subsumed by the store.
Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll test are
deleted as they've no longer seem to be relevant.
Reviewers: RKSimon, rnk, kparzysz, javed.absar
Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D49200
llvm-svn: 344142
This is intended to restore horizontal codegen to what it looked like before IR demanded elements improved in:
rL343727
As noted in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...horizontal ops can be worse for performance than a shuffle+regular binop, so I've added a TODO. Ideally, we'd
solve that in a machine instruction pass, but a quicker solution will be adding a 'HasFastHorizontalOp' feature
bit to deal with it here in the DAG.
Differential Revision: https://reviews.llvm.org/D52997
llvm-svn: 344141
This patch moves the virtual file system form clang to llvm so it can be
used by more projects.
Concretely the patch:
- Moves VirtualFileSystem.{h|cpp} from clang/Basic to llvm/Support.
- Moves the corresponding unit test from clang to llvm.
- Moves the vfs namespace from clang::vfs to llvm::vfs.
- Formats the lines affected by this change, mostly this is the result of
the added llvm namespace.
RFC on the mailing list:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/126657.html
Differential revision: https://reviews.llvm.org/D52783
llvm-svn: 344140
When fillMachineFunction generates a return on targets without a return opcode
(such as AArch64) it should pass an empty set of registers as the return
registers, not 0 which means register number zero.
Differential Revision: https://reviews.llvm.org/D53074
llvm-svn: 344139
Similar to what already happens in the DAGCombiner wrappers, this patch adds the root nodes back onto the worklist if the DCI wrappers' SimplifyDemandedBits/SimplifyDemandedVectorElts were successful.
Differential Revision: https://reviews.llvm.org/D53026
llvm-svn: 344132
Until mischeduler is clever enough to avoid spilling in a vectorized loop
with many (scalar) DLRs it is better to avoid high vectorization factors (8
and above).
llvm-svn: 344129
I've added a new test case that causes the scalarizer to try and use
dead-and-erased values - caused by the basic blocks not being in
domination order within the function. To fix this, instead of iterating
through the blocks in function order, I walk them in reverse post order.
Differential Revision: https://reviews.llvm.org/D52540
llvm-svn: 344128
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D52887
llvm-svn: 344120
A new function getNumVectorRegs() is better to use for the number of needed
vector registers instead of getNumberOfParts(). This is to make sure that the
number of vector registers (and typically operations) required for a vector
type is accurate.
getNumberOfParts() which was previously used works by splitting the vector
type until it is legal gives incorrect results for types with a non
power of two number of elements (rare).
A new static function getScalarSizeInBits() that also checks for a pointer
type and returns 64U for it since otherwise it gets a value of 0). Used in a
few places where Ty may be pointer.
Review: Ulrich Weigand
llvm-svn: 344115
There are places where we need to merge multiple LocationSizes of
different sizes into one, and get a sensible result.
There are other places where we want to optimize aggressively based on
the value of a LocationSizes (e.g. how can a store of four bytes be to
an area of storage that's only two bytes large?)
This patch makes LocationSize hold an 'imprecise' bit to note whether
the LocationSize can be treated as an upper-bound and lower-bound for
the size of a location, or just an upper-bound.
This concludes the series of patches leading up to this. The most recent
of which is r344108.
Fixes PR36228.
Differential Revision: https://reviews.llvm.org/D44748
llvm-svn: 344114
An upcoming patch will change the codegen for these patterns. This test case is
added now so that the patch can show the differences in codegen.
llvm-svn: 344112
Commit r343851 changed the format of the generated instructions.
An unnecessary load has been removed. Previously, a value would be moved
from r24 into a temporary register just to be copied into r30 before the
indirect call. Now, codegen immediately loads r24 into r30, saving a
MOVW instruction.
llvm-svn: 344111
For ISD::SIGN_EXTEND_INREG operation of v2i16 and v2i8 types will cause assert because they are registered as custom operation.
So that the type legalization phase will enter the custom hook, which do not handle ISD::SIGN_EXTEND_INREG operation and fall throw into unreachable assert.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52449
llvm-svn: 344109
This is the third patch in a series intended to make
https://reviews.llvm.org/D44748 more easily reviewable. Please see that
patch for more context. The second being r344013.
The intent is to make the output of printing a LocationSize more
precise. The main motivation for this is that we plan to add a bit to
distinguish whether a given LocationSize is an upper-bound or is
precise; making that information available in pretty-printing is nice.
llvm-svn: 344108
Summary:
Subtraction from zero and floating point negation do not have the same
semantics, so fix lowering.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52948
llvm-svn: 344107
sancov subtracts one from the address to get the previous instruction,
which makes sense on x86_64, but not on other platforms.
This change ensures that the offset is correct for different platforms.
The logic for computing the offset is copied from sanitizer_common.
Differential Revision: https://reviews.llvm.org/D53039
llvm-svn: 344103
Summary:
Before, "[options] <inputs>" is unconditionally appended to the `Name` parameter. It is more flexible to change its semantic to `Usage` and let user customize the usage line.
% llvm-objcopy
...
USAGE: llvm-objcopy <input> [ <output> ] [options] <inputs>
With this patch:
% llvm-objcopy
...
USAGE: llvm-objcopy input [output]
Reviewers: rupprecht, alexshap, jhenderson
Reviewed By: rupprecht
Subscribers: jakehehrlich, mehdi_amini, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51009
llvm-svn: 344097
This patch fixes three issues.
The first is that we didn't consider files which are explicitly
set to eolstyle CRLF in the repo, and there are a handful of
these.
Second is that dos2unix doesn't have a -q option in GnuWin32,
so this codepath wasn't working properly.
Finally with newer versions of Python (or newer versions of Git,
or some combination of the two) patches can't be applied when
we treat stdin as text, because Python silently undoes all the
work we did to convert the newlines to LF using dos2unix by
using universal_newlines=True and then converting them *back*
to CRLF. So we need to add a way to force stdin to be treated
as binary, and use it when LF-newlines are required.
Differential Revision: https://reviews.llvm.org/D51444
llvm-svn: 344095
Summary:
Also add tests to catch crashes in passes that are not normally run in
tests.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52959
llvm-svn: 344094
We already do the following combines:
(bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X
(bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X
When the target has "bit preserving fp logic". This patch just extends it
to also combine:
(bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X)
As some targets have fnabs and even those that don't can efficiently lower
both the fabs and the fneg.
Differential revision: https://reviews.llvm.org/D44548
llvm-svn: 344093