If they have other users we'll just end up increasing the instruction count.
We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.
Fixes PR41164.
Differential Revision: https://reviews.llvm.org/D59630
llvm-svn: 356690
Under optsize we try to avoid folding immediates into instructions under optsize. But if the immediate is 16-bits or 32 bits, but can be encoded as an 8-bit immediate we don't save enough from disabling the folding unless the immediate has enough uses to make up for the size of the move which is either 3 bytes or 5 bytes since there are no sign extended 8-bit moves. We would also save something if the immediate was a live out of the basic block and thus a move was unavoidable, but that would require a more advanced heuristic than just counting uses.
Note we only avoid folding multiple use immediates into the patterns that use X86ISD::ADD/SUB/XOR/OR/AND/CMP/ADC/SBB nodes and not the more common ISD::ADD/SUB/XOR/OR/AND nodes.
Differential Revision: https://reviews.llvm.org/D59522
llvm-svn: 356688
This adds support for scalarizing these intrinsics as well the X86TargetTransformInfo support to avoid scalarizing them in the cases X86 can handle.
I've omitted handling special cases for constant masks for this first pass. Though CodeGenPrepare can constant fold the branch conditions and remove some of the control flow anyway.
Fixes PR40994 and is covers most of PR3666. Might want to implement constant masks to close that.
Differential Revision: https://reviews.llvm.org/D59180
llvm-svn: 356687
This is D59450, but for signed sub. This case is not NFC, because
the overflow logic in ConstantRange is more powerful than the existing
check. This resolves the TODO in the function.
I've added two tests to show that this indeed catches more cases than
the previous logic, but the main correctness test coverage here is in
the existing ConstantRange unit tests.
Differential Revision: https://reviews.llvm.org/D59617
llvm-svn: 356685
SDNodes can only have 64k operands and for some inputs (e.g. large
number of stores), we can reach this limit when creating TokenFactor
nodes. This patch is a follow up to D56740 and updates a few more places
that potentially can create TokenFactors with too many operands.
Reviewers: efriedma, craig.topper, aemerson, RKSimon
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D59156
llvm-svn: 356668
This is probably a bigger limitation than necessary, but since we don't have any evidence yet
that this transform led to real-world perf improvements rather than regressions, I'm making a
quick, blunt fix.
In the motivating x86 example from:
https://bugs.llvm.org/show_bug.cgi?id=41129
...and shown in the regression test, we want to avoid an extra instruction in the dominating
block because that could be costly.
The x86 LSR test diff is reversing the changes from D57789. There's no evidence that 1 version
is any better than the other yet.
Differential Revision: https://reviews.llvm.org/D59602
llvm-svn: 356665
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58902
Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
Added subtarget features for AArch64 to use TPIDR_EL[1|2|3] as the TLS base
register, rather than the default TPIDR_EL0.
Patch by Philip Derrin!
Differential revision: https://reviews.llvm.org/D54685
llvm-svn: 356657
The first problem was a use-after-free in the tests (detected by asan
bots). The temporary array created for the "create" call is guaranteed
to live only until the end of the statement. The fix there is to store
the test data in a local variable to ensure it has the right lifetime
The second issue is broken BUILD_SHARED_LIBS build, which I fix by
adding the appropriate BinaryFormat dependency to the Object unit tests.
llvm-svn: 356655
Summary:
This patch adds basic support for reading minidump files. It contains
the definitions of various important minidump data structures (header,
stream directory), and of one minidump stream (SystemInfo). The ability
to read other streams will be added in follow-up patches. However, all
streams can be read even now as raw data, which means lldb's minidump
support (where this code is taken from) can be immediately rebased on
top of this patch as soon as it lands.
As we don't have any support for generating minidump files (yet), this
tests the code via unit tests with some small handcrafted binaries in
the form of c char arrays.
Reviewers: Bigcheese, jhenderson, zturner
Subscribers: srhines, dschuff, mgorny, fedor.sergeev, lemo, clayborg, JDevlieghere, aprantl, lldb-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59291
llvm-svn: 356652
Summary:
This is a refactoring patch.
- Reduce the number of map searches by reusing the iterator.
- Add asserts to check that the entry is in the cache, as this is something BasicAA relies on to avoid infinite recursion.
Reviewers: chandlerc, aschwaighofer
Subscribers: sanjoy, jlebar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59151
llvm-svn: 356644
Code archaeology in D59315 revealed that MSSA should never be moved.
Rather than trying to check dynamically that this hasn't happened in the
verify() functions of Walkers, it's likely best to just delete its move
constructor.
Since all these verify() functions did is check that MSSA hasn't moved,
this allows us to remove these verify functions.
I can readd the verification checks if someone's super concerned about
us trying to `memcpy` MemorySSA or something somewhere, but I imagine we
have other problems if we're trying anything like that...
llvm-svn: 356641
CMPXCHG8B was introduced on i586/pentium generation.
If its not enabled, limit the atomic width to 32 bits so the AtomicExpandPass will expand to lib calls. Unclear if we should be using a different limit for other configs. The default is 1024 and experimentation shows that using an i256 atomic will cause a crash in SelectionDAG.
Differential Revision: https://reviews.llvm.org/D59576
llvm-svn: 356631
Summary:
llvm-objdump (via libObject) validates DYLD_INFO rebase and bind
entries against the basic structure found in the Mach-O file before
evaluating the contents of those entries. Certain malformed Mach-Os can
defeat the validation check and force llvm-objdump (libObject) to crash.
The previous logic verified a rebase or bind started in a valid Mach-O
section, but did not verify that the section wholely contained the
fixup. It also generally allows rebases or binds to start immediately
after a valid section even if that range is not itself part of a valid
section. Finally, bind and rebase opcodes that indicate more than one
fixup (apply N times...) are not completely validated: only the first
and final fixups are checked.
The previous logic also rejected certain binaries as false positives.
Some bind and rebase opcodes can modify the state machine such that the
next bind or rebase will fail. libObject will reject these opcodes as
invalid in order to be helpful and print an error message associated
with the instruction that caused the problem, even though the binary is
not actually illegal until it consumes the invalid state in the state
machine. In other words, libObject may reject a Mach-O binary that
Apple's dynamic linker may consider legal. The original version of
macho-rebase-add-addr-uleb-too-big is an example of such a binary.
I have replaced the existing checkSegAndOffset and checkCountAndSkip
functions with a single function, checkSegAndOffsets, which validates
all of the fixups realized by a DYLD_INFO opcode. checkSegAndOffsets
verifies that a Mach-O section fully contains each fixup. Every fixup
realized by an opcode is validated, and some (but not all!)
inconsistencies in the state machine are allowed until a fixup is
realized. This means that libObject may fail on an opcode that realizes
a fixup, not on the opcode that introduced the arithmetic error.
Existing test cases have been modified to reflect the changes in error
messages returned by libObject. What's more, the test case for
macho-rebase-add-addr-uleb-too-big has been modified so that it actually
triggers the error condition; the new code in libObject considers the
original test binary "legal".
rdar://47797757
Reviewers: lhames, pete, ab
Reviewed By: pete
Subscribers: rupprecht, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59574
llvm-svn: 356629
My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata"
accidentally caused a spurious PAL metadata .note record to be emitted
for any AMDGPU output. That caused failures in the lld test
amdgpu-relocs.s. Fixed.
Differential Revision: https://reviews.llvm.org/D59613
Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e
llvm-svn: 356621
Machine DCE cannot remove a dead definition if there are non-dbg uses.
A use however can be in the same instruction:
dead %0 = INST %0
Such instructions sometimes created by Detect dead lanes pass.
Allow this instruction to be deleted despite the use if the only use
belongs to the same instruction.
Differential Revision: https://reviews.llvm.org/D59565
llvm-svn: 356619
This patch enables the use of lowerShuffleAsBitMask for 512-bit blends before
falling back to move immedate, GPR to k-register, and masked op.
I had to make some changes to support v8i64 when i64 is not a legal type. And to
support floating point types.
This trades a load for the move immediate and GPR move which is higher latency.
But its probably better for register pressure not having to hop through other
register classes. The load+and should play better with LICM and
rematerialization I think.
Differential Revision: https://reviews.llvm.org/D59479
llvm-svn: 356618
Summary: - The linking is broken when this library is built as shared one.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59610
llvm-svn: 356617
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.
llvm-svn: 356611
Summary:
Implements a new target features section in assembly and object files
that records what features are used, required, and disallowed in
WebAssembly objects. The linker uses this information to ensure that
all objects participating in a link are feature-compatible and records
the set of used features in the output binary for use by optimizers
and other tools later in the toolchain.
The "atomics" feature is always required or disallowed to prevent
linking code with stripped atomics into multithreaded binaries. Other
features are marked used if they are enabled globally or on any
function in a module.
Future CLs will add linker flags for ignoring feature compatibility
checks and for specifying the set of allowed features, implement using
the presence of the "atomics" feature to control the type of memory
and segments in the linked binary, and add front-end flags for
relaxing the linkage policy for atomics.
Reviewers: aheejin, sbc100, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, mgrang, jfb, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59173
llvm-svn: 356610
Summary:
- Should use `targetconstant` instead of `constant` operand for clamp
bit, which is expected as an immediate operand. Under certain
conditions, such as a common `i1 false` constant is used in other
place and selected before the instruction with clamp bit, register
operand may be added instead of immediate one. Use `targetcosntant` to
enforce that.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59608
llvm-svn: 356608
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".
For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply. We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.
This patch adds a special case to handle that construct. At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).
This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.
The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.
Differential Revision: https://reviews.llvm.org/D59568
llvm-svn: 356601
Summary:
When linking two llvm.used arrays, if the resulting merged
array ends up with duplicated elements (with the same name) but with
different types, the IRLinker was crashing. This was supposed to be
legal, as the IRLinker bitcasts elements to match types in these
situations.
This bug was exposed by D56928 in clang to support attribute used
in member functions of class templates. Crash happened when self-hosting
with LTO. Since LLVM depends on attribute used to generate code
for the dump() method, ubiquitous in the code base, many input bc
had a definition of this method referenced in their llvm.used array.
Some of these classes got optimized, changing the type of the first
parameter (this) in the dump method, leading to a scenario with a
pool of valid definitions but some with a different type, triggering
this bug.
This is a memory bug: ValueMapper depends on (calls) the materializer
provided by IRLinker, and this materializer was freely calling RAUW
methods whenever a global definition was updated in the temporary merged
output file. However, replaceAllUsesWith may or may not destroy
constants that use this global. If the linked definition has a type
mismatch regarding the new def and the old def, the materializer would
bitcast the old type to the new type and the elements of the llvm.used
array, which already uses bitcast to i8*, would end up with elements
cascading two bitcasts. RAUW would then indirectly call the
constantfolder to update the constant to the new ref, which would,
instead of updating the constant, destroy it to be able to create
a new constant that folds the two bitcasts into one. The problem is that
ValueMapper works with pointers to the same constants that may be
getting destroyed by RAUW. Obviously, RAUW can update references in the
Module to do not use the old destroyed constant, but it can't update
ValueMapper's internal pointers to these constants, which are now
invalid.
The approach here is to move the task of RAUWing old definitions
outside of the materializer.
Test Plan:
Added LIT test case, tested clang self-hosting with D56928 and
verified it works
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59552
llvm-svn: 356597
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.
The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.
Differential Revision: https://reviews.llvm.org/D57028
Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
llvm-svn: 356591
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.
Differential Revision: https://reviews.llvm.org/D57247
llvm-svn: 356590
Summary:
Before this patch, if any Use existed in the loop, with a defining
access in the loop, we conservatively decide to not move the store.
What this approach was missing, is that ordered loads are not Uses, they're Defs
in MemorySSA. So, even when the clobbering walker does not find that
volatile load to interfere, we still cannot hoist a store past a
volatile load.
Resolves PR41140.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59564
llvm-svn: 356588
This is a small followup to D59511. The code that was moved into
computeConstantRange() there is a bit overly conversative: If the
abs is not nsw, it does not compute any range. However, abs without
nsw still has a well-defined contiguous unsigned range from 0 to
SIGNED_MIN. This is a lot less useful than the usual 0 to SIGNED_MAX
range, but if we're already here we might as well specify it...
Differential Revision: https://reviews.llvm.org/D59563
llvm-svn: 356586