1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
Commit Graph

7039 Commits

Author SHA1 Message Date
David Majnemer
f0e072b4f9 [WinEH] Fill out CatchHigh in the TryBlockMap
Now all fields in the WinEH xdata have been filled out.

llvm-svn: 234067
2015-04-03 23:37:34 +00:00
David Majnemer
4a823b111e [WinEH] Fill out .xdata for catch objects
This add support for catching an exception such that an exception object
available to the catch handler will be initialized by the runtime.

llvm-svn: 234062
2015-04-03 22:49:05 +00:00
David Majnemer
694a466675 [WinEH] Sink UnwindHelp completely out of IR
We don't need to represent UnwindHelp in IR.  Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.

llvm-svn: 234061
2015-04-03 22:32:26 +00:00
Andrew Kaylor
cd70932cf9 [WinEH] Handle nested landing pads in outlined catch handlers
Differential Revision: http://reviews.llvm.org/D8596

llvm-svn: 234041
2015-04-03 19:37:50 +00:00
Duncan P. N. Exon Smith
054ffcf2c3 CodeGen: Assert that inlined-at locations agree
As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.

The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.

If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction.  The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.

llvm-svn: 234038
2015-04-03 19:20:26 +00:00
Duncan P. N. Exon Smith
2170816b1a SelectionDAG: Use specialized metadata nodes in EmitFuncArgumentDbgValue(), NFC
Use `MDLocalVariable` and `MDExpression` directly for the arguments of
`EmitFuncArgumentDbgValue()` to simplify a follow-up patch.

llvm-svn: 234026
2015-04-03 17:11:42 +00:00
Simon Pilgrim
335a565d46 [DAGCombiner] Combine shuffles of BUILD_VECTOR and SCALAR_TO_VECTOR
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.

Differential Revision: http://reviews.llvm.org/D8516

llvm-svn: 234004
2015-04-03 10:02:21 +00:00
David Majnemer
228c1dbd4c [WinEH] Implement support for catch-all
A catch (...) doesn't have a type descriptor.  Instead, the 'adjectives'
field has bit six set.

llvm-svn: 233788
2015-04-01 05:20:42 +00:00
Jiangning Liu
0fbb128209 Fix PR23065. Avoid optimizing bitcast of build_vector with constant input to scalar_to_vector.
llvm-svn: 233778
2015-04-01 01:52:38 +00:00
David Majnemer
a6739dd367 [WinEH] ExitingScope is vacuously true if !PoppedCatches.empty()
Remove a redundant condition, no functional change intended.

llvm-svn: 233770
2015-03-31 22:43:56 +00:00
David Majnemer
e7ba02b466 [WinEH] Generate .xdata for catch handlers
This lets us catch exceptions in simple cases.

N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
  that we generate.
- IP-to-State tables are sensitive to the order of emission.

llvm-svn: 233767
2015-03-31 22:35:44 +00:00
Hal Finkel
60a07f273f [SDAG] Handle non-integer preferred memset types for non-constant values
The existing code in getMemsetValue only handled integer-preferred types when
the fill value was not a constant. Make this more robust in two ways:

  1. If the preferred type is a floating-point value, do the mul-splat trick on
     the corresponding integer type and then bitcast.
  2. If the preferred type is a vector, do the mul-splat trick on one vector
     element, and then build a vector out of them.

Fixes PR22754 (although, we should also turn off use of vector types at -O0).

llvm-svn: 233749
2015-03-31 20:35:26 +00:00
Sanjay Patel
f0b4906544 typos; NFC
llvm-svn: 233701
2015-03-31 16:17:51 +00:00
James Molloy
c4bf9e7282 [SDAG] Move TRUNCATE splitting logic into a helper, and use
it more liberally.

SplitVecOp_TRUNCATE has logic for recursively splitting oversize vectors
that need more than one round of splitting to become legal. There are many
other ISD nodes that could benefit from this logic, so factor it out and
use it for FP_TO_UINT,FP_TO_SINT,SINT_TO_FP,UINT_TO_FP and FTRUNC.

llvm-svn: 233681
2015-03-31 10:20:58 +00:00
David Majnemer
60bdeae6d4 Silence an unused variable warning.
No functional change intended.

llvm-svn: 233639
2015-03-30 23:14:45 +00:00
David Majnemer
6b3129357f [WinEH] Run cleanup handlers when an exception is thrown
Generate tables in the .xdata section representing what actions to take
when an exception is thrown.  This currently fills in state for
cleanups, catch handlers are still unfinished.

llvm-svn: 233636
2015-03-30 22:58:10 +00:00
Duncan P. N. Exon Smith
ba71dc528b CodeGen: Use the new DebugLoc API, NFC
Update lib/CodeGen (and lib/Target) to use the new `DebugLoc` API.

llvm-svn: 233582
2015-03-30 19:14:47 +00:00
Duncan P. N. Exon Smith
1be15fef4c SelectionDAG: Reflow code to use early returns, NFC
llvm-svn: 233577
2015-03-30 18:23:28 +00:00
Simon Pilgrim
3510dc3223 Use SDValue bool check to tidyup some possible vector folding ops. NFC.
llvm-svn: 233498
2015-03-29 19:13:40 +00:00
Simon Pilgrim
e10da00523 Use SDValue bool check to tidyup some possible ReassociateOps. NFC.
llvm-svn: 233495
2015-03-29 16:49:51 +00:00
Simon Pilgrim
8de24604ec [DAGCombiner] Fixed incorrect test for buildvector of constant integers.
DAGCombiner::ReassociateOps was correctly testing for an constant integer scalar but failed to correctly test for constant integer vectors (it was testing for any constant vector).

llvm-svn: 233482
2015-03-28 18:31:31 +00:00
Sanjay Patel
d055dfa615 fix typo and 80-col; NFC
llvm-svn: 233427
2015-03-27 21:45:18 +00:00
Ahmed Bougacha
a4f32650b4 [CodeGen] Don't attempt a tail-call with a non-forwarded explicit sret.
Tailcalls are only OK with forwarded sret pointers. With explicit sret,
one approximation is to check that the pointer isn't an Instruction, as
in that case it might point into some local memory (alloca). That's not
OK with tailcalls.

Explicit sret counterpart to r233409.
Differential Revison: http://reviews.llvm.org/D8510

llvm-svn: 233410
2015-03-27 20:35:49 +00:00
Ahmed Bougacha
e204350911 [CodeGen] Don't attempt a tail-call with implicit sret.
Tailcalls are only OK with forwarded sret pointers. With sret demotion,
they're not, as we'd have a pointer into a soon-to-be-dead stack frame.

Differential Revison: http://reviews.llvm.org/D8510

llvm-svn: 233409
2015-03-27 20:28:30 +00:00
Yaron Keren
3856893d6d Remove superfluous .str() and replace std::string concatenation with Twine.
llvm-svn: 233392
2015-03-27 17:51:30 +00:00
Philip Reames
007c0af082 Require a GC strategy be specified for functions which use gc.statepoint
This was discussed a while back and I left it optional for migration.  Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.  

llvm-svn: 233357
2015-03-27 05:09:33 +00:00
Philip Reames
1e1d0dcae5 Allow explicit spill slots to be specified for a gc.statepoint
This patch adds support for explicitly provided spill slots in the GC arguments of a gc.statepoint.  This is somewhat analogous to gcroot, but leverages the STATEPOINT MI node and StackMap infrastructure.  The motivation for this is:
1) The stack spilling code for gc.statepoints hasn't advanced as fast as I'd like.  One major option is to give up on doing spilling in the backend and do it at the IR level instead.  We'd give up the ability to have gc values in registers, but that's a minor cost in practice.  We are not neccessarily moving in that direction, but having the ability to prototype such a thing cheaply is interesting.
2) I want to port the gcroot lowering to use the statepoint infastructure.  Given the metadata printers for gcroot expect a fixed set of stack roots, it's easiest to just reuse the explicit stack slots and pass them directly to the underlying statepoint.  

I'm holding off on the documentation for the new feature until I'm reasonable sure this is going to stick around.

llvm-svn: 233356
2015-03-27 04:52:48 +00:00
David Majnemer
74b5efbe6b WinEH: Create a parent frame alloca for HandlerType xdata tables
We don't have any logic to emit those tables yet, so the SDAG lowering
of this intrinsic is just a stub.  We can see the intrinsic in the
prepared IR, though.

llvm-svn: 233354
2015-03-27 04:17:07 +00:00
Andrew Trick
276fac879c Fix a bug in SelectionDAG scheduling backtracking code: PR22304.
It can happen (by line CurSU->isPending = true; // This SU is not in
AvailableQueue right now.) that a SUnit is mark as available but is
not in the AvailableQueue. For SUnit being selected for scheduling
both conditions must be met.

This patch mainly defensively protects from invalid removing a node
from a queue. Sometimes nodes are marked isAvailable but are not in
the queue because they have been defered due to some hazard.

Patch by Pawel Bylica!

llvm-svn: 233351
2015-03-27 03:44:13 +00:00
Ahmed Bougacha
ebfa4c3692 [CodeGen] Report error rather than crash when unable to makeLibCall.
Also, make the assumption explicit in the header.

llvm-svn: 233329
2015-03-26 22:46:58 +00:00
Sanjay Patel
6b16639733 revert inadvertent change
llvm-svn: 233294
2015-03-26 17:19:24 +00:00
Sanjay Patel
b7ffd7dd27 comment cleanup; NFC
llvm-svn: 233293
2015-03-26 17:18:17 +00:00
Sanjay Patel
f8b93937c8 fix indent; NFC
llvm-svn: 233288
2015-03-26 16:55:17 +00:00
Simon Pilgrim
2751d2eb79 [DAGCombiner] Add support for TRUNCATE + FP_EXTEND vector constant folding
This patch adds supports for the vector constant folding of TRUNCATE and FP_EXTEND instructions and tidies up the SINT_TO_FP and UINT_TO_FP instructions to match.

It also moves the vector constant folding for the FNEG and FABS instructions to use the DAG.getNode() functionality like the other unary instructions.

Differential Revision: http://reviews.llvm.org/D8593

llvm-svn: 233224
2015-03-25 22:30:31 +00:00
Reid Kleckner
a894d59f4a WinEH: Create an unwind help alloca for __CxxFrameHandler3 xdata tables
We don't have any logic to emit those tables yet, so the sdag lowering
of this intrinsic is just a stub. We can see the intrinsic in the
prepared IR, though.

llvm-svn: 233209
2015-03-25 20:10:36 +00:00
Paul Robinson
ef36d53059 'optnone' should not disable DAG combiner.
Reverts the code change from r221168 and the relevant test.
It was a mistake to disable the combiner, and based on the ultimate
definition of 'optnone' we shouldn't have considered the test case
as failing in the first place.

llvm-svn: 233153
2015-03-25 00:10:24 +00:00
Simon Pilgrim
67352336fb [SelectionDAG] Fixed issue with uitofp vector constant folding being treated as sitofp
While the uitofp scalar constant folding treats an integer as an unsigned value (from lang ref):

%X = sitofp i8 -1 to double ; yields double:-1.0
%Y = uitofp i8 -1 to double ; yields double:255.0

The vector constant folding was always using sitofp:

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>

This patch fixes this so that the correct opcode is used for sitofp and uitofp.

%X = sitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double -1.0, double -1.0>
%Y = uitofp <2 x i8> <i8 -1, i8 -1> to <2 x double> ; yields <double 255.0, double 255.0>

Differential Revision: http://reviews.llvm.org/D8560

llvm-svn: 233033
2015-03-23 22:44:55 +00:00
Benjamin Kramer
6a9aa608f1 Re-sort includes with sort-includes.py and insert raw_ostream.h where it's used.
llvm-svn: 232998
2015-03-23 19:32:43 +00:00
Benjamin Kramer
d52ec1c0ec Move private classes into anonymous namespaces
NFC.

llvm-svn: 232944
2015-03-23 12:30:58 +00:00
Petar Jovanovic
8e9b052c46 Fix sign extension for MIPS64 in makeLibCall function
Fixing sign extension in makeLibCall for MIPS64. In MIPS64 architecture all
32 bit arguments (int, unsigned int, float 32 (soft float)) must be sign
extended. This fixes test "MultiSource/Applications/oggenc/".

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D7791

llvm-svn: 232943
2015-03-23 12:28:13 +00:00
Hal Finkel
70188cc8ca [SDAG] Don't widen VSETCC during type legalization for split operands
Because the operands of a vector SETCC node can be of a different type from the
result (and often are), it can happen that even if we'd prefer to widen the
result type of the SETCC, the operands have been split instead. In this case,
the SETCC result also must be split. This mirrors what is done in
WidenVecRes_SELECT, and should be NFC elsewhere because if the operands are not
widened the following calls to GetWidenedVector will assert (which is what was
happening in the test case).

llvm-svn: 232935
2015-03-23 08:22:43 +00:00
Hans Wennborg
0ad4b0624b SelectionDAGBuilder: Rangeify a loop. NFC.
llvm-svn: 232831
2015-03-20 18:48:40 +00:00
Hans Wennborg
c133d4c52c SelectionDAGBuilder::handleJTSwitchCase, simplify loop; NFC
llvm-svn: 232830
2015-03-20 18:48:31 +00:00
Hans Wennborg
2efdcbf4cf Rewrite SelectionDAGBuilder::Clusterify to run in linear time. NFC.
It was previously repeatedly erasing elements from the middle of a vector,
causing O(n^2) worst-case run-time.

llvm-svn: 232789
2015-03-20 00:41:03 +00:00
Owen Anderson
480ad2b319 Fix a nasty bug in DAGCombine of STORE nodes.
This is very related to the bug fixed in r174431.  The problem is that
SelectionDAG does not include alignment in the uniquing of loads and
stores.  When an otherwise no-op DAGCombine would increase the alignment
of a load or store, the original node would be returned (with the
alignment increased), which would cause the node not to be processed by
any further DAGCombines.

I don't have a direct testcase for this that manifests on an in-tree
target, but I did see some noise in the tests for other targets and have
updated them for it.

llvm-svn: 232780
2015-03-19 22:48:57 +00:00
Hans Wennborg
db765fd873 Switch lowering: extract NextBlock function. NFC.
llvm-svn: 232759
2015-03-19 20:41:48 +00:00
Hans Wennborg
09e99767b4 Switch lowering: remove unnecessary ConstantInt casts. NFC.
llvm-svn: 232729
2015-03-19 16:42:21 +00:00
Hans Wennborg
da17bb1b24 SelectionDAGBuilder: update comment in HandlePHINodesInSuccessorBlocks.
From what I can tell, the code is checking for PHIs that expect any value from
this block, not just constants.

llvm-svn: 232697
2015-03-19 00:57:51 +00:00
Hans Wennborg
96ae8f3919 SelectionDAGIsel: Fix comment about terminators being "handled below".
That changed in r102128.

llvm-svn: 232692
2015-03-19 00:02:22 +00:00
David Majnemer
ec30fe4691 DAGCombiner: fold (xor (shl 1, x), -1) -> (rotl ~1, x)
Targets which provide a rotate make it possible to replace a sequence of
(XOR (SHL 1, x), -1) with (ROTL ~1, x).  This saves an instruction on
architectures like X86 and POWER(64).

Differential Revision: http://reviews.llvm.org/D8350

llvm-svn: 232572
2015-03-18 00:03:36 +00:00
Simon Pilgrim
9145dcb159 XformToShuffleWithZero - Added clearer early outs and general tidy up. NFCI
llvm-svn: 232557
2015-03-17 22:19:08 +00:00
Daniel Sanders
b2b69459a8 Recommit r232027 with PR22883 fixed: Add infrastructure for support of multiple memory constraints.
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break
anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate
Constraint_* values.

PR22883 was caused the matching operands copying the whole of the operand flags
for the matched operand. This included the constraint id which needed to be
replaced with the operand number. This has been fixed with a conversion
function. Following on from this, matching operands also used the operand
number as the constraint id. This has been fixed by looking up the matched
operand and taking it from there. 

llvm-svn: 232165
2015-03-13 12:45:09 +00:00
Sanjay Patel
13a9b5db63 [X86, AVX2] Replace inserti128 and extracti128 intrinsics with generic shuffles
This should complete the job started in r231794 and continued in r232045:
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

AVX2 introduced proper integer variants of the hacked integer insert/extract
C intrinsics that were created for this same functionality with AVX1.

This should complete the removal of insert/extract128 intrinsics.

The Clang precursor patch for this change was checked in at r232109.

llvm-svn: 232120
2015-03-12 23:16:18 +00:00
Hal Finkel
dc4180d54f Revert "r232027 - Add infrastructure for support of multiple memory constraints"
This (r232027) has caused PR22883; so it seems those bits might be used by
something else after all. Reverting until we can figure out what else to do.

Original commit message:

The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

llvm-svn: 232093
2015-03-12 20:09:39 +00:00
Sanjay Patel
aa9ea9aae7 [X86, AVX] replace vextractf128 intrinsics with generic shuffles
Now that we've replaced the vinsertf128 intrinsics, 
do the same for their extract twins.

This is very much like D8086 (checked in at r231794):
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is also the LLVM sibling to the cfe D8275 patch.

Differential Revision: http://reviews.llvm.org/D8276

llvm-svn: 232045
2015-03-12 15:15:19 +00:00
Daniel Sanders
4eee6f840d Add infrastructure for support of multiple memory constraints.
Summary:
The operand flag word for ISD::INLINEASM nodes now contains a 15-bit
memory constraint ID when the operand kind is Kind_Mem. This constraint
ID is a numeric equivalent to the constraint code string and is converted
with a target specific hook in TargetLowering.

This patch maps all memory constraints to InlineAsm::Constraint_m so there
is no functional change at this point. It just proves that using these
previously unused bits in the encoding of the flag word doesn't break anything.

The next patch will make each target preserve the current mapping of
everything to Constraint_m for itself while changing the target independent
implementation of the hook to return Constraint_Unknown appropriately. Each
target will then be adapted in separate patches to use appropriate Constraint_*
values.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D8171

llvm-svn: 232027
2015-03-12 11:00:48 +00:00
Reid Kleckner
789a64e429 Handle big index in getelementptr instruction
CodeGen incorrectly ignores (assert from APInt) constant index bigger
than 2^64 in getelementptr instruction. This is a test and fix for that.

Patch by Paweł Bylica!

Reviewed By: rnk

Subscribers: majnemer, rnk, mcrosier, resistor, llvm-commits

Differential Revision: http://reviews.llvm.org/D8219

llvm-svn: 231984
2015-03-11 23:36:10 +00:00
Eric Christopher
b5d16a61c0 Remove useMachineScheduler and replace it with subtarget options
that control, individually, all of the disparate things it was
controlling.

At the same time move a FIXME in the Hexagon port to a new
subtarget function that will enable a user of the machine
scheduler to avoid using the source scheduler for pre-RA-scheduling.
The FIXME would have this removed, but involves either testcase
changes or adding -pre-RA-sched=source to a few testcases.

llvm-svn: 231980
2015-03-11 22:56:10 +00:00
Eric Christopher
7e02765bdf Have getCallPreservedMask and getThisCallPreservedMask take a
MachineFunction argument so that we can grab subtarget specific
features off of it.

llvm-svn: 231979
2015-03-11 22:42:13 +00:00
Igor Laevsky
ec23b1a840 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
(Resubmitting this change after not being able to reproduce buildbot failure)

Differential Revision: http://reviews.llvm.org/D7760

llvm-svn: 231800
2015-03-10 16:26:48 +00:00
Sanjay Patel
5c62e16cdb [X86, AVX] replace vinsertf128 intrinsics with generic shuffles
We want to replace as much custom x86 shuffling via intrinsics
as possible because pushing the code down the generic shuffle
optimization path allows for better codegen and less complexity
in LLVM.

This is the sibling patch for the Clang half of this change:
http://reviews.llvm.org/D8088

Differential Revision: http://reviews.llvm.org/D8086

llvm-svn: 231794
2015-03-10 16:08:36 +00:00
Daniel Sanders
b0ddb75c07 The operand flag word used in ISD::INLINEASM is an i32 not a pointer. NFC.
Summary:
This is part of the work to support memory constraints that behave
differently to 'm'. The subsequent patches will expand on the existing
encoding (which is a 32-bit int) and as a result in some flag words will no
longer fit into an i16. This problem only affected the MSP430 target which
appears to have 16-bit pointers.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D8168

llvm-svn: 231783
2015-03-10 10:42:59 +00:00
Mehdi Amini
f88efe5f8a DataLayout is mandatory, update the API to reflect it with references.
Summary:
Now that the DataLayout is a mandatory part of the module, let's start
cleaning the codebase. This patch is a first attempt at doing that.

This patch is not exactly NFC as for instance some places were passing
a nullptr instead of the DataLayout, possibly just because there was a
default value on the DataLayout argument to many functions in the API.
Even though it is not purely NFC, there is no change in the
validation.

I turned as many pointer to DataLayout to references, this helped
figuring out all the places where a nullptr could come up.

I had initially a local version of this patch broken into over 30
independant, commits but some later commit were cleaning the API and
touching part of the code modified in the previous commits, so it
seemed cleaner without the intermediate state.

Test Plan:

Reviewers: echristo

Subscribers: llvm-commits

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231740
2015-03-10 02:37:25 +00:00
Ahmed Bougacha
c37ed67e4a [CodeGen] Replace the reused stores' chain for extractelt expansion.
This fixes a subtle issue that was introduced in r205153.

When reusing a store for the extractelement expansion (to load directly
from it, inserting of going through the stack), later stores to the
same location might have overwritten the data we were expecting to
extract from.

To fix that, we need to explicitly replace the chain going out of the
reused store, so that later stores also have an explicit dependency on
the generated element-extracting loads, and can't clobber them.

rdar://20066785
Differential Revision: http://reviews.llvm.org/D8180

llvm-svn: 231721
2015-03-09 22:51:05 +00:00
Simon Pilgrim
60d1ada777 [DAGCombiner] Add a shuffle mask commutation helper function. NFCI.
We have an increasing number of cases where we are creating commuted shuffle masks - all implementing nearly the same code.

This patch adds a static helper function - ShuffleVectorSDNode::commuteMask() and replaces a number of cases to use it.

Differential Revision: http://reviews.llvm.org/D8139

llvm-svn: 231581
2015-03-07 22:33:11 +00:00
Benjamin Kramer
38504f768a Make constant arrays that are passed to functions as const.
In theory this allows the compiler to skip materializing the array on
the stack. In practice clang often fails to do that, but that's a
different story. NFC.

llvm-svn: 231571
2015-03-07 17:41:00 +00:00
Simon Pilgrim
d85428f7b8 Use SDValue bool check to tidyup some possible combines. NFC.
llvm-svn: 231569
2015-03-07 16:34:55 +00:00
Andrea Di Biagio
7cad5c0bec [DAGCombiner] Fix wrong folding of AND dag nodes.
This patch fixes the logic in the DAGCombiner that folds an AND node according
to rule: (and (X (load V)), C) -> (X (load V))

An AND between a vector load 'X' and a constant build_vector 'C' can be folded
into the load itself only if we can prove that the AND operation is redundant.
The algorithm implemented by 'visitAND' firstly computes the splat value 'S'
from C, and then checks if S has the lower 'B' bits set (where B is the size in
bits of the vector element type). The algorithm takes into account also the
'undef' bits in the splat mask.

Unfortunately, the algorithm only worked under the assumption that the size of S
is a multiple of the vector element type. With this patch, we conservatively
avoid folding the AND if the splat bits are not compatible with the vector
element type.

Added X86 test and-load-fold.ll

Differential Revision: http://reviews.llvm.org/D8085

llvm-svn: 231563
2015-03-07 12:24:55 +00:00
Simon Pilgrim
ce7343d964 [DAGCombiner] SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT(V,C)) -> VECTOR_SHUFFLE
This patch attempts to convert a SCALAR_TO_VECTOR using an operand from an EXTRACT_VECTOR_ELT into a VECTOR_SHUFFLE.

This prevents many cases of spilling scalar data between the gpr + simd registers. 

At present the optimization only accepts cases where there is no TRUNC of the scalar type (i.e. all types must match).

Differential Revision: http://reviews.llvm.org/D8132

llvm-svn: 231554
2015-03-07 05:52:42 +00:00
Matthias Braun
47028079f1 DAGCombiner: Canonicalize select(and/or,x,y) depending on target.
This is based on the following equivalences:
select(C0 & C1, X, Y) <=> select(C0, select(C1, X, Y), Y)
select(C0 | C1, X, Y) <=> select(C0, X, select(C1, X, Y))

Many target cannot perform and/or on the CPU flags and therefore the
right side should be choosen to avoid materializign the i1 flags in an
integer register. If the target can perform this operation efficiently
we normalize to the left form.

Differential Revision: http://reviews.llvm.org/D7622

llvm-svn: 231507
2015-03-06 19:49:10 +00:00
Matthias Braun
b345d38012 DAGCombiner: Factor out some and/or combines.
This is in preparation for changing visitSELECT to normalize towards
select(Cond0, select(Cond1, X, Y), Y);
select(Cond0, X, select(Cond1, X, Y)) which perfom an implicit and/or of
the conditions.

The factored function contains all DAGCombine rules which reduce two values
combined by an And/Or operation to a single value. This does not include rules
involving constants as visitSELECT already handles that case.

Differential Revision: http://reviews.llvm.org/D8026

llvm-svn: 231506
2015-03-06 19:49:06 +00:00
Michael Zolotukhin
b7a87300a2 LegalizeTypes: Handle shift by 0 in ExpandShiftByConstant.
Though such shifts are usually optimized away by combiner, we still can
encounter them after a vector shift is legalized.

llvm-svn: 231443
2015-03-06 01:13:01 +00:00
Benjamin Kramer
cfe741989e SelectionDAGBuilder: Merge 3 copies of the limited precision exp2 emission code.
NFC intended.

llvm-svn: 231406
2015-03-05 21:13:08 +00:00
Benjamin Kramer
288ff9cec2 SDAG: Merge the meat of two ExpandAtomic implementations.
The copies already diverged, don't let them become any worse. Reduce
redundancy in code with a little macro metaprogramming.

llvm-svn: 231401
2015-03-05 20:04:29 +00:00
David Majnemer
bdce8557da X86: Optimize address mode matching for FRAME_ALLOC_RECOVER nodes
We know that the absolute symbol will be less than 2GB and thus will
always fit.

llvm-svn: 231389
2015-03-05 18:50:12 +00:00
Reid Kleckner
d0e0d012a0 Replace llvm.frameallocate with llvm.frameescape
Turns out it's pretty straightforward and simplifies the implementation.

Reviewers: andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D8051

llvm-svn: 231386
2015-03-05 18:26:34 +00:00
Simon Pilgrim
2d456bc887 [DagCombiner] Allow shuffles to merge through bitcasts
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).

This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.

Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.

Differential Revision: http://reviews.llvm.org/D7939

llvm-svn: 231380
2015-03-05 17:14:04 +00:00
Igor Laevsky
26ad396d2f Revert change r231366 as it broke clang-native-arm-cortex-a9 Analysis/properties.m test.
llvm-svn: 231374
2015-03-05 15:41:14 +00:00
Elena Demikhovsky
a71b2e475e AVX-512, SKX: Enabled masked_load/store operations for this target.
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.

llvm-svn: 231371
2015-03-05 15:11:35 +00:00
Igor Laevsky
eb9f695bc2 Teach lowering to correctly handle invoke statepoint and gc results tied to them. Note that we still can not lower gc.relocates for invoke statepoints.
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.

llvm-svn: 231366
2015-03-05 14:11:21 +00:00
Michael Kuperstein
c46a58c60e [DAGCombine] Fix a bug in a BUILD_VECTOR combine
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector. 
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.

Differential Revision: http://reviews.llvm.org/D8040

llvm-svn: 231219
2015-03-04 07:27:39 +00:00
Mehdi Amini
62f77ca5f4 Use report_fatal_error instead of unreachable for -fast-isel-abort
Suggestion by Andrea Di Biagio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231201
2015-03-04 01:48:39 +00:00
David Blaikie
d695cc9032 DAGCombiner::LoadedSlice: Remove explicit copy ctor in favor of the Rule of Zero
This way, the copy assignment operator can be used without hitting the
deprecated case in C++11.

llvm-svn: 231144
2015-03-03 21:50:47 +00:00
David Blaikie
5fd9cda286 Revert "Remove the explicit SDNodeIterator::operator= in favor of the implicit default"
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.

This reverts commit r231135.

llvm-svn: 231136
2015-03-03 21:18:16 +00:00
David Blaikie
f9b228449d Remove the explicit SDNodeIterator::operator= in favor of the implicit default
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.

llvm-svn: 231135
2015-03-03 21:17:08 +00:00
David Blaikie
53c7e75dbf unique_ptrify ResourcePriorityQueue::ResourceModel
llvm-svn: 231127
2015-03-03 20:49:08 +00:00
David Blaikie
370206c461 Remove ResourcePriorityQueue::dump as it relies on copying a non-copyable type which would result in a double-delete
llvm-svn: 231126
2015-03-03 20:49:05 +00:00
Benjamin Kramer
b9cb72cbe5 Accidentaly inverted the condition again. Sorry.
llvm-svn: 230973
2015-03-02 16:45:08 +00:00
Benjamin Kramer
d525f7a9a0 Avoid assertion in MSVC 2013 debug builds.
llvm-svn: 230972
2015-03-02 16:42:56 +00:00
Benjamin Kramer
95a7c55022 Simplify code. NFC.
llvm-svn: 230948
2015-03-02 11:57:04 +00:00
Sanjay Patel
46d370c549 avoid infinite looping when folding vector multiplies of constants (PR22698)
We were missing a check for the following fold in DAGCombiner:

// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))

If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever.

This should fix:
http://llvm.org/bugs/show_bug.cgi?id=22698

Differential Revision: http://reviews.llvm.org/D7917

llvm-svn: 230884
2015-03-01 00:09:35 +00:00
Mehdi Amini
22544426f3 Fixup for recent -fast-isel-abort change: code didn't match description
Level 1 should abort for all instructions but call/terminators/args.
Instead it was aborting only if the level was > 2

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230861
2015-02-28 19:34:54 +00:00
Benjamin Kramer
4718069d87 Convert push_back loops into append calls.
No functionality change intended.

llvm-svn: 230849
2015-02-28 13:20:15 +00:00
Benjamin Kramer
a9b2056013 Reduce double set lookups.
llvm-svn: 230798
2015-02-27 21:43:14 +00:00
Mehdi Amini
32875af6e3 Change the fast-isel-abort option from bool to int to enable "levels"
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.

This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.

Reviewers: resistor, echristo

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D7941

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
2015-02-27 18:32:11 +00:00
Eric Christopher
454cbc40f6 getRegForInlineAsmConstraint wants to use TargetRegisterInfo for
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.

llvm-svn: 230699
2015-02-26 22:38:43 +00:00
Paul Robinson
b0132db1ee When the source has a series of assignments, users reasonably want to
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.

Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.

Differential Revision: http://reviews.llvm.org/D7181

llvm-svn: 230659
2015-02-26 18:47:57 +00:00
Simon Pilgrim
50fa51790b Reapplied D7816 & rL230177 & rL230278 - with an additional fix toensure that the smallest build vector input scalar type is always used. Additional (crash) test cases already committed.
llvm-svn: 230388
2015-02-24 22:08:56 +00:00
Eric Christopher
db420cc1dd Revert:
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Mon Feb 23 23:04:28 2015 +0000

    Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.

and

Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date:   Sun Feb 22 18:17:28 2015 +0000

    [DagCombiner] Generalized BuildVector Vector Concatenation

    The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

    This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

    This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

    Differential Revision: http://reviews.llvm.org/D7816

as the root cause of PR22678 which is causing an assertion inside the DAG combiner.

I'll follow up to the main thread as well.

llvm-svn: 230358
2015-02-24 19:11:00 +00:00
Matthias Braun
2c6ac76701 DAGCombiner: Move variable definitions closer to use; NFC
llvm-svn: 230354
2015-02-24 18:52:01 +00:00
Matthias Braun
69fb836050 DAGCombiner: Move variable declaration closer to definiion; NFC
llvm-svn: 230353
2015-02-24 18:51:59 +00:00
Tim Northover
afcf85da25 ARM: treat [N x i32] and [N x i64] as AAPCS composite types
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.

Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.

llvm-svn: 230348
2015-02-24 17:22:34 +00:00
Hal Finkel
5b5de90baa [SDAG] Handle LowerOperation returning its input consistently
For almost all node types, if the target requested custom lowering, and
LowerOperation returned its input, we'd treat the original node as legal. This
did not work, however, for many loads and stores, because they follow
slightly different code paths, and we did not account for the possibility of
LowerOperation returning its input at those call sites.

I think that we now handle this consistently everywhere. At the call sites in
LegalizeDAG, we used to assert in this case, so there's no functional change
for any existing code there. For the call sites in LegalizeVectorOps, this
really only affects whether or not we set Changed = true, but I think makes the
semantics clearer.

No test case here, but it will be covered by an upcoming PowerPC commit adding
QPX support.

llvm-svn: 230332
2015-02-24 12:59:47 +00:00
Simon Pilgrim
ec80e98714 Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.
llvm-svn: 230278
2015-02-23 23:04:28 +00:00
Andrea Di Biagio
fc5bcff188 [X86] Teach how to custom lower double-to-half conversions under fast-math.
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.

Added test CodeGen/X86/fastmath-float-half-conversion.ll

Differential Revision: http://reviews.llvm.org/D7832

llvm-svn: 230276
2015-02-23 22:59:02 +00:00
Simon Pilgrim
fb2ad0f430 [DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.

This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.

This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.

Differential Revision: http://reviews.llvm.org/D7816

llvm-svn: 230177
2015-02-22 18:17:28 +00:00
Hal Finkel
9822c946fa [DAGCombine] Don't assume integer-type legailty in reduceBuildVecConvertToConvertBuildVec
DAGCombine will rewrite an BUILD_VECTOR where all non-undef inputs some from
[US]INT_TO_FP, as a BUILD_VECTOR of integers with the conversion applied as a
vector operation. We check operation legality of the conversion, but fail to
check legality of the integer vector type itself. Because targets don't
normally override operation legality defaults for illegal types, we need to
check this also.

This came up in the context of the QPX vector entensions for PowerPC (which can
have legal floating-point vector types without corresponding legal integer
vector types). No in-tree test case for this yes, but one can be added once
the QPX support has been committed.

llvm-svn: 230176
2015-02-22 16:10:22 +00:00
Hal Finkel
e88c3cdcc4 [SDAG] Use correct alignments on expanded vector trunc-store/ext-loads
When expanding a truncating store or extending load using vector extracts or
inserts and scalar stores and loads, we were giving each of these scalar stores
or loads the same alignment as the original vector operation. While this will
often be right (most vector operations, especially those produced by
autovectorization, have the alignment of the underlying scalar type), the
vector operation could certainly have a larger alignment.

No test case (yet); noticed by inspection.

llvm-svn: 230175
2015-02-22 15:58:04 +00:00
David Majnemer
cc843a9a3a X86: Call __main using the SelectionDAG
Synthesizing a call directly using the MI layer would confuse the frame
lowering code.  This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.

llvm-svn: 230129
2015-02-21 05:49:45 +00:00
Matt Arsenault
79c7d059c9 Add generic fmad DAG node.
This allows sharing of FMA forming combines to work
with instructions that have the same semantics as a separate
multiply and add.

This is expand by default, and only formed post legalization
so it shouldn't have much impact on targets that do not want it.

llvm-svn: 230070
2015-02-20 22:10:33 +00:00
Igor Laevsky
d3544cb9c6 Generalize statepoint lowering to use ImmutableStatepoint. Move statepoint lowering into a separate function 'LowerStatepoint' which uses ImmutableStatepoint instead of a CallInst. Also related utility functions are changed to receive ImmutableCallSite.
Differential Revision: http://reviews.llvm.org/D7756 

llvm-svn: 230017
2015-02-20 15:28:35 +00:00
Ahmed Bougacha
5920886c7a [CodeGen] Use ArrayRef instead of std::vector&. NFC.
The former lets us use SmallVectors.  Do so in ARM and AArch64.

llvm-svn: 229925
2015-02-19 23:13:10 +00:00
Benjamin Kramer
c0850fa665 Demote vectors to arrays. No functionality change.
llvm-svn: 229861
2015-02-19 15:26:17 +00:00
Chandler Carruth
105d2fa5e8 [x86,sdag] Two interrelated changes to the x86 and sdag code.
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.

However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.

Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).

This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!

There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.

llvm-svn: 229835
2015-02-19 10:36:19 +00:00
Michael Kuperstein
8617e481c7 Fixes two issue in SimplifyDemandedBits of sext_in_reg:
1) We should not try to simplify if the sext has multiple uses
2) There is no need to simplify is the source value is already sign-extended.

Patch by Gil Rapaport <gil.rapaport@intel.com>

Differential Revision: http://reviews.llvm.org/D6949

llvm-svn: 229659
2015-02-18 09:43:40 +00:00
Sanjay Patel
0117b06b3f Canonicalize splats as build_vectors (PR22283)
This is a follow-on patch to:
http://reviews.llvm.org/D7093

That patch canonicalized constant splats as build_vectors, 
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.

This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283

The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...

This improves an existing x86 AVX test and changes codegen in an ARM test.

Differential Revision: http://reviews.llvm.org/D7389

llvm-svn: 229511
2015-02-17 16:54:32 +00:00
Benjamin Kramer
21bee91af5 Prefer SmallVector::append/insert over push_back loops.
Same functionality, but hoists the vector growth out of the loop.

llvm-svn: 229500
2015-02-17 15:29:18 +00:00
Mehdi Amini
d44ae390cb SelectionDAG: fold (fp_to_u/sint (s/uint_to_fp)) here too
Update SPARC tests to match.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229438
2015-02-16 21:47:58 +00:00
Andrew Trick
e7964c82c7 AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

llvm-svn: 229413
2015-02-16 18:10:47 +00:00
Chandler Carruth
0b684c6980 [SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.

These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.

Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.

llvm-svn: 229308
2015-02-15 12:18:12 +00:00
Chandler Carruth
6153ae3921 [x86] Fix PR22377, a regression with the new vector shuffle legality
test.

This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.

llvm-svn: 229285
2015-02-15 07:01:10 +00:00
Duncan P. N. Exon Smith
33685ffe5d CodeGen: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

Also, add `Function::getFnStackAlignment()`, and canonicalize:

getAttributes().getStackAlignment(AttributeSet::FunctionIndex)
  => getFnStackAlignment()

llvm-svn: 229208
2015-02-14 01:44:41 +00:00
Reid Kleckner
56669ab852 Unify the two EH personality classification routines I wrote
We only need one.

llvm-svn: 229193
2015-02-14 00:21:02 +00:00
Chandler Carruth
33dabe4f44 Re-sort #include lines using my handy dandy ./utils/sort_includes.py
script. This is in preparation for changes to lots of include lines.

llvm-svn: 229088
2015-02-13 09:09:03 +00:00
Hal Finkel
76da9b7997 [SDAG] Don't try to use FP_EXTEND/FP_ROUND for int<->fp promotions
The PowerPC backend has long promoted some floating-point vector operations
(such as select) to integer vector operations. Unfortunately, this behavior was
broken by r216555. When using FP_EXTEND/FP_ROUND for promotions, we must check
that both the old and new types are floating-point types. Otherwise, we must
use BITCAST as we did prior to r216555 for everything.

llvm-svn: 228969
2015-02-12 22:43:52 +00:00
Benjamin Kramer
4b76aa3d46 MathExtras: Bring Count(Trailing|Leading)Ones and CountPopulation in line with countTrailingZeros
Update all callers.

llvm-svn: 228930
2015-02-12 15:35:40 +00:00
Ahmed Bougacha
88db0c3c30 [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.

Differential Revision: http://reviews.llvm.org/D7571

llvm-svn: 228911
2015-02-12 06:15:29 +00:00
Jonas Paulsson
763ee9e6b6 Fix SelectionDAG compile time issue with alias analysis.
Add new token factor node and its users to worklist if alias analysis is
turned on, in DAGCombiner::visitTokenFactor(). Alias analysis may cause
a lot of new token factors to be inserted into the DAG, and they need to
be optimized to avoid significant slow-downs.

Reviewed by Hal Finkel.

llvm-svn: 228841
2015-02-11 16:10:31 +00:00
Petar Jovanovic
2b0935a094 Fix makeLibCall argument (signed) in SoftenFloatRes_XINT_TO_FP function
The isSigned argument of makeLibCall function was hard-coded to false
(unsigned). This caused zero extension on MIPS64 soft float.
As the result SingleSource/Benchmarks/Stanford/FloatMM test and
SingleSource/UnitTests/2005-07-17-INT-To-FP test failed. 
The solution was to use the proper argument.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D7292

llvm-svn: 228765
2015-02-10 23:30:14 +00:00
Andrew Kaylor
fff974fc6d Adding support for llvm.eh.begincatch and llvm.eh.endcatch intrinsics and beginning the documentation of native Windows exception handling.
Differential Revision: http://reviews.llvm.org/D7398

llvm-svn: 228733
2015-02-10 19:52:43 +00:00
Jonas Paulsson
bee296d752 Two comment typo fixes in lib/CodeGen/SelectionDAG/DAGCombiner.cpp.
llvm-svn: 228700
2015-02-10 15:34:29 +00:00
Chandler Carruth
3301dd2d10 [x86] Fix PR22524: the DAG combiner was incorrectly handling illegal
nodes when folding bitcasts of constants.

We can't fold things and then check after-the-fact whether it was legal.
Once we have formed the DAG node, arbitrary other nodes may have been
collapsed to it. There is no easy way to go back. Instead, we need to
test for the specific folding cases we're interested in and ensure those
are legal first.

This could in theory make this less powerful for bitcasting from an
integer to some vector type, but AFAICT, that can't actually happen in
the SDAG so its fine. Now, we *only* whitelist specific int->fp and
fp->int bitcasts for post-legalization folding. I've added the test case
from the PR.

(Also as a note, this does not appear to be in 3.6, no backport needed)

llvm-svn: 228656
2015-02-10 02:25:56 +00:00
Ahmed Bougacha
fccf28b772 [CodeGen] Add hook/combine to form vector extloads, enabled on X86.
The combine that forms extloads used to be disabled on vector types,
because "None of the supported targets knows how to perform load and
sign extend on vectors in one instruction."

That's not entirely true, since at least SSE4.1 X86 knows how to do
those sextloads/zextloads (with PMOVS/ZX).
But there are several aspects to getting this right.
First, vector extloads are controlled by a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The interesting optimization enables folding of s/zextloads to illegal
(splittable) vector types, expanding them into smaller legal extloads.

It's not ideal (it introduces some legalization-like behavior in the
combine) but it's better than the obvious alternative: form illegal
extloads, and later try to split them up.  If you do that, you might
generate extloads that can't be split up, but have a valid ext+load
expansion.  At vector-op legalization time, it's too late to generate
this kind of code, so you end up forced to scalarize. It's better to
just avoid creating egregiously illegal nodes.

This optimization is enabled unconditionally on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that the existing combine that forms extloads is now also
enabled on legal vectors.  This doesn't have a big effect on X86
(because sext+load is usually combined to sext_inreg+aextload).
On ARM it fires on some rare occasions; that's for a separate commit.

Differential Revision: http://reviews.llvm.org/D6904

llvm-svn: 228325
2015-02-05 18:31:02 +00:00
Michael Kuperstein
69b881344b Fixes a bug in vector load legalization that confused bits and bytes.
Differential Revision: http://reviews.llvm.org/D7400

llvm-svn: 228168
2015-02-04 18:54:01 +00:00
Manuel Jacob
83779125f4 Add nullptr checks for TargetSelectionDAGInfo in SelectionDAG.
TSI is not guaranteed be non-null in SelectionDAG.

llvm-svn: 227397
2015-01-28 23:50:40 +00:00
Quentin Colombet
5db4690593 Revert r227242 - Merge vector stores into wider vector stores (PR21711).
This commit creates infinite loop in DAG combine for in the LLVM test-suite
for aarch64 with mcpu=cylcone (just having neon may be enough to expose this).

llvm-svn: 227272
2015-01-27 23:58:01 +00:00
Sanjay Patel
be6834ec4b Merge vector stores into wider vector stores (PR21711)
This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

The 'f3' test case in that report presents a situation where we have two 128-bit
stores extracted from a 256-bit source vector. 

Instead of producing this:

vmovaps %xmm0, (%rdi)
vextractf128    $1, %ymm0, 16(%rdi)

This patch merges the 128-bit stores into a single 256-bit store:

vmovups %ymm0, (%rdi)

Differential Revision: http://reviews.llvm.org/D7208

llvm-svn: 227242
2015-01-27 20:50:27 +00:00
Manuel Jacob
3f8d565b43 Add a FIXME in SelectionDAGBuilder before an assert that is valid only on X86.
When lowering memcpy, memset or memmove, this assert checks whether the pointer
operands are in an address space < 256 which means "user defined address space"
on X86.  However, this notion of "user defined address space" does not exist
for other targets.

llvm-svn: 227191
2015-01-27 13:14:35 +00:00
Eric Christopher
a7bc04d3a5 Grab the TargetLowering info from the DAG rather than querying for
a subtarget.

llvm-svn: 227156
2015-01-27 01:01:36 +00:00
Ahmed Bougacha
8d46cf46ff [SelectionDAG] Fix assert message copypasta. NFC.
llvm-svn: 227119
2015-01-26 19:31:42 +00:00
Eric Christopher
aacfef65cf Move DataLayout back to the TargetMachine from TargetSubtargetInfo
derived classes.

Since global data alignment, layout, and mangling is often based on the
DataLayout, move it to the TargetMachine. This ensures that global
data is going to be layed out and mangled consistently if the subtarget
changes on a per function basis. Prior to this all targets(*) have
had subtarget dependent code moved out and onto the TargetMachine.

*One target hasn't been migrated as part of this change: R600. The
R600 port has, as a subtarget feature, the size of pointers and
this affects global data layout. I've currently hacked in a FIXME
to enable progress, but the port needs to be updated to either pass
the 64-bitness to the TargetMachine, or fix the DataLayout to
avoid subtarget dependent features.

llvm-svn: 227113
2015-01-26 19:03:15 +00:00
Philip Reames
95098b9759 Revert GCStrategy ownership changes
This change reverts the interesting parts of 226311 (and 227046).  This change introduced two problems, and I've been convinced that an alternate approach is preferrable anyways.

The bugs were:
- Registery appears to require all users be within the same linkage unit.  After this change, asking for "statepoint-example" in Transform/ would sometimes get you nullptr, whereas asking the same question in CodeGen would return the right GCStrategy.  The correct long term fix is to get rid of the utter hack which is Registry, but I don't have time for that right now.  227046 appears to have been an attempt to fix this, but I don't believe it does so completely.
- GCMetadataPrinter::finishAssembly was being called more than once per GCStrategy.  Each Strategy was being added to the GCModuleInfo multiple times.

Once I get time again, I'm going to split GCModuleInfo into the gc.root specific part and a GCStrategy owning Analysis pass.  I'm probably also going to kill off the Registry.  Once that's done, I'll move the new GCStrategyAnalysis and all built in GCStrategies into Analysis.  (As original suggested by Chandler.)  This will accomplish my original goal of being able to access GCStrategy from Transform/  without adding all of the builtin GCs to IR/.  

llvm-svn: 227109
2015-01-26 18:26:35 +00:00
Andrea Di Biagio
015201ba20 [DAG] Fix wrong canonicalization performed on shuffle nodes.
This fixes a regression introduced by r226816.
When replacing a splat shuffle node with a constant build_vector,
make sure that the new build_vector has a valid number of elements.

Thanks to Patrik Hagglund for reporting this problem and providing a
small reproducible.

llvm-svn: 227002
2015-01-24 11:54:29 +00:00
Reid Kleckner
f3d1116092 Classify functions by EH personality type rather than using the triple
This mostly reverts commit r222062 and replaces it with a new enum. At
some point this enum will grow at least for other MSVC EH personalities.

Also beefs up the way we were sniffing the personality function.
Previously we would emit the Itanium LSDA despite using
__C_specific_handler.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6987

llvm-svn: 226920
2015-01-23 18:49:01 +00:00
Mehdi Amini
8d442c114d DAGCombine: always constant fold FMA when target disable FP exceptions
Summary: When trying to constant fold an FMA in the DAG, getNode()
fails to fold the FMA if an operand is not finite. In this case this
patch allows the constant folding if !TLI->hasFloatingPointExceptions()

Reviewers: resistor

Reviewed By: resistor

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D6912

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 226901
2015-01-23 07:07:20 +00:00
Jan Vesely
c4e7d8864b SelectionDAG: Add KnownBits and SignBits computation for EXTRACT_ELEMENT
v2: use getZExtValue
    add missing break
    codestyle

v3: add few more comments

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 226880
2015-01-22 23:42:41 +00:00
Ramkumar Ramachandra
550e92d3f7 Intrinsics: introduce llvm_any_ty aka ValueType Any
Specifically, gc.result benefits from this greatly. Instead of:

gc.result.int.*
gc.result.float.*
gc.result.ptr.*
...

We now have a gc.result.* that can specialize to literally any type.

Differential Revision: http://reviews.llvm.org/D7020

llvm-svn: 226857
2015-01-22 20:14:38 +00:00
Sanjay Patel
73374b7f96 merge consecutive stores of extracted vector elements (PR21711)
This is a 2nd try at the same optimization as http://reviews.llvm.org/D6698. 
That patch was checked in at r224611, but reverted at r225031 because it
caused a failure outside of the regression tests.

The cause of the crash was not recognizing consecutive stores that have mixed
source values (loads and vector element extracts), so this patch adds a check
to bail out if any store value is not coming from a vector element extract.

This patch also refactors the shared logic of the constant source and vector
extracted elements source cases into a helper function.

Differential Revision: http://reviews.llvm.org/D6850
 

llvm-svn: 226845
2015-01-22 18:21:26 +00:00
Michael Kuperstein
c63c75da0b [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

Fixed recommit of r226811.

llvm-svn: 226816
2015-01-22 13:07:28 +00:00
Michael Kuperstein
1c951c9953 Revert r226811, MSVC accepts code sane compilers don't.
llvm-svn: 226814
2015-01-22 12:48:07 +00:00
Michael Kuperstein
91fb9ac13f [DAGCombine] Produce better code for constant splats
This solves PR22276.
Splats of constants would sometimes produce redundant shuffles, sometimes ridiculously so (see the PR for details). Fold these shuffles into BUILD_VECTORs early on instead.

Differential Revision: http://reviews.llvm.org/D7093

llvm-svn: 226811
2015-01-22 12:37:23 +00:00
Elena Demikhovsky
4c1384e72f Fixed a bug in type legalizer for masked load/store intrinsics.
The problem occurs when after vectorization we have type
<2 x i32>. This type is promoted to <2 x i64> and then requires
additional efforts for expanding loads and truncating stores.
I added EXPAND / TRUNCATE attributes to the masked load/store
SDNodes. The code now contains additional shuffles.
I've prepared changes in the cost estimation for masked memory
operations, it will be submitted separately.

llvm-svn: 226808
2015-01-22 12:07:59 +00:00
Elena Demikhovsky
4b14f03cdb Fixed a comment
llvm-svn: 226806
2015-01-22 10:01:36 +00:00
Elena Demikhovsky
cc330838cb Fixed a bug in narrowing store operation.
Type MVT::i1 became legal in KNL, but store operation can't be narrowed to this type,
since the size of VT (1 bit) is not equal to its actual store size(8 bits).

Added a test provided by David (dag@cray.com)

llvm-svn: 226805
2015-01-22 09:39:08 +00:00
Tim Northover
5b0e908c64 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.

llvm-svn: 226740
2015-01-21 23:17:19 +00:00
Tim Northover
c2963c8019 Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet, but was still on my local copy.

This reverts commit r226663

llvm-svn: 226665
2015-01-21 15:48:52 +00:00
Tim Northover
c9cc73b336 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
llvm-svn: 226663
2015-01-21 15:43:28 +00:00
Daniel Jasper
5a3e345e1b Prevent binary-tree deterioration in sparse switch statements.
This addresses part of llvm.org/PR22262. Specifically, it prevents
considering the densities of sub-ranges that have fewer than
TLI.getMinimumJumpTableEntries() elements. Those densities won't help
jump tables.

This is not a complete solution but works around the most pressing
issue.

Review: http://reviews.llvm.org/D7070
llvm-svn: 226600
2015-01-20 19:43:33 +00:00
Daniel Jasper
77c4186290 Factor out a splitSwitchCase() function so that it can be reused.
This is in preparation for a fix to llvm.org/PR22262. One of the ideas
here is to first find a good jump table range first and then split
before and after it. Thereby, we don't need to use the
split-based-on-density heuristic at all, which can make the "binary
tree" deteriorate in various cases.

Also some minor cleanups.

No functional changes.

llvm-svn: 226551
2015-01-20 08:57:44 +00:00
Chandler Carruth
77221961c5 [PM] Remove the Pass argument from all of the critical edge splitting
APIs and replace it and numerous booleans with an option struct.

The critical edge splitting API has a really large surface of flags and
so it seems worth burning a small option struct / builder. This struct
can be constructed with the various preserved analyses and then flags
can be flipped in a builder style.

The various users are now responsible for directly passing along their
analysis information. This should be enough for the critical edge
splitting to work cleanly with the new pass manager as well.

This API is still pretty crufty and could be cleaned up a lot, but I've
focused on this change just threading an option struct rather than
a pass through the API.

llvm-svn: 226456
2015-01-19 12:09:11 +00:00
Mehdi Amini
e119c6afaf Improve DAG combine pass on certain IR vector patterns
Loading 2 2x32-bit float vectors into the bottom half of a 256-bit vector
produced suboptimal code in AVX2 mode with certain IR combinations.

In particular, the IR optimizer folded 2f32 + 2f32 -> 4f32, 4f32 + 4f32
(undef) -> 8f32 into a 2f32 + 2f32 -> 8f32, which seems more canonical,
but then mysteriously generated rather bad code; the movq/movhpd combination
didn't match.

The problem lay in the BUILD_VECTOR optimization path. The 2f32 inputs
would get promoted to 4f32 by the type legalizer, eventually resulting
in a BUILD_VECTOR on two 4f32 into an 8f32. The BUILD_VECTOR then, recognizing
these were both half the output size, concatted them and then produced
a shuffle. However, the resulting concat + shuffle was more complex than
it should be; in the case where the upper half of the output is undef, we
probably want to generate shuffle + concat instead.

This enhancement causes the vector_shuffle combine step to recognize this
suboptimal pattern and correct it. I included it there instead of in BUILD_VECTOR
in case the same suboptimal pattern occurs for other reasons.

This results in the optimizer correctly producing the optimal movq + movhpd
sequence for all three variations on this IR, even with AVX2.

I've included a test case.

Radar link: rdar://problem/19287012
Fix for PR 21943.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 226360
2015-01-17 01:35:56 +00:00
Philip Reames
c6126d1689 Move ownership of GCStrategy objects to LLVMContext
Note: This change ended up being slightly more controversial than expected.  Chandler has tentatively okayed this for the moment, but I may be revisiting this in the near future after we settle some high level questions.

Rather than have the GCStrategy object owned by the GCModuleInfo - which is an immutable analysis pass used mainly by gc.root - have it be owned by the LLVMContext. This simplifies the ownership logic (i.e. can you have two instances of the same strategy at once?), but more importantly, allows us to access the GCStrategy in the middle end optimizer. To this end, I add an accessor through Function which becomes the canonical way to get at a GCStrategy instance.

In the near future, this will allows me to move some of the checks from http://reviews.llvm.org/D6808 into the Verifier itself, and to introduce optimization legality predicates for some of the recent additions to InstCombine. (These will follow as separate changes.)

Differential Revision: http://reviews.llvm.org/D6811

llvm-svn: 226311
2015-01-16 20:07:33 +00:00
Mehdi Amini
be6225a8f6 Fix SelectionDAG -view-*-dags filtering
llvm-svn: 226163
2015-01-15 12:03:32 +00:00
Alexander Kornienko
66580103e2 Replace size method call of containers to empty method where appropriate
This patch was generated by a clang tidy checker that is being open sourced.
The documentation of that checker is the following:

/// The emptiness of a container should be checked using the empty method
/// instead of the size method. It is not guaranteed that size is a
/// constant-time function, and it is generally more efficient and also shows
/// clearer intent to use empty. Furthermore some containers may implement the
/// empty method but not implement the size method. Using empty whenever
/// possible makes it easier to switch to another container in the future.

Patch by Gábor Horváth!

llvm-svn: 226161
2015-01-15 11:41:30 +00:00
Chandler Carruth
88fd126216 [PM] Separate the TargetLibraryInfo object from the immutable pass.
The pass is really just a means of accessing a cached instance of the
TargetLibraryInfo object, and this way we can re-use that object for the
new pass manager as its result.

Lots of delta, but nothing interesting happening here. This is the
common pattern that is developing to allow analyses to live in both the
old and new pass manager -- a wrapper pass in the old pass manager
emulates the separation intrinsic to the new pass manager between the
result and pass for analyses.

llvm-svn: 226157
2015-01-15 10:41:28 +00:00
Chandler Carruth
49a7633378 [PM] Move TargetLibraryInfo into the Analysis library.
While the term "Target" is in the name, it doesn't really have to do
with the LLVM Target library -- this isn't an abstraction which LLVM
targets generally need to implement or extend. It has much more to do
with modeling the various runtime libraries on different OSes and with
different runtime environments. The "target" in this sense is the more
general sense of a target of cross compilation.

This is in preparation for porting this analysis to the new pass
manager.

No functionality changed, and updates inbound for Clang and Polly.

llvm-svn: 226078
2015-01-15 02:16:27 +00:00
Reid Kleckner
73cbbd28bb Emit the Itanium LSDA for unknown EH personalities on Win64
This fixes lots of generic CodeGen tests that use __gcc_personality_v0.
This suggests that using ExceptionHandling::MSVC was a mistake, and we
should instead classify each function by personality function. This
would, for example, allow us to LTO a binary containing uses of SEH and
Itanium EH.

llvm-svn: 226019
2015-01-14 18:50:10 +00:00
Reid Kleckner
898f390ac5 Remove dead code for llvm.eh.selector in the old EH model
llvm-svn: 226018
2015-01-14 18:49:39 +00:00
Chandler Carruth
0b619fcc8e [cleanup] Re-sort all the #include lines in LLVM using
utils/sort_includes.py.

I clearly haven't done this in a while, so more changed than usual. This
even uncovered a missing include from the InstrProf library that I've
added. No functionality changed here, just mechanical cleanup of the
include order.

llvm-svn: 225974
2015-01-14 11:23:27 +00:00
Mehdi Amini
54e3de76f7 SelectionDAG: add a -filter-view-dags option to llc
This option takes the name of the basic block you want to visualize
with -view-*-dags

Differential Revision: http://reviews.llvm.org/D6948

llvm-svn: 225953
2015-01-14 06:03:18 +00:00
Mehdi Amini
6fb1fc8a8f DAG Combiner: Fold SelectCC When Cond is UNDEF
In case folding a node end up with a NaN as operand for the select, 
the folding of the condition of the selectcc node returns "UNDEF".

Differential Revision: http://reviews.llvm.org/D6889

llvm-svn: 225952
2015-01-14 05:45:24 +00:00
Matt Arsenault
22a9f67443 Implement new way of expanding extloads.
Now that the source and destination types can be specified,
allow doing an expansion that doesn't use an EXTLOAD of the
result type. Try to do a legal extload to an intermediate type
and extend that if possible.

This generalizes the special case custom lowering of extloads
R600 has been using to work around this problem.

This also happens to fix a bug that would incorrectly use more
aligned loads than should be used.

llvm-svn: 225925
2015-01-14 01:35:17 +00:00
Hal Finkel
add252ca22 Adjust ScheduleDAGSDNodes::RegDefIter for patchpoints
PATCHPOINT is a strange pseudo-instruction. Depending on how it is used, and
whether or not the AnyReg calling convention is being used, it might or might
not define a value. However, its TableGen definition says that it defines one
value, and so when it doesn't, the code in ScheduleDAGSDNodes::RegDefIter
becomes confused and the code that uses the RegDefIter will try to get the
register class of the MVT::Other type associated with the PATCHPOINT's chain
result (under certain circumstances).

This will be covered by the PPC64 PatchPoint test cases once that support is
re-committed.

llvm-svn: 225907
2015-01-14 01:07:03 +00:00
Reid Kleckner
b190c8f871 CodeGen support for x86_64 SEH catch handlers in LLVM
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.

The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.

Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
  outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH

In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6300

llvm-svn: 225904
2015-01-14 01:05:27 +00:00
Matthias Braun
3267ed0083 DAGCombiner: simplify by using condition variables; NFC
llvm-svn: 225836
2015-01-13 22:17:46 +00:00
Matt Arsenault
fb153e33c0 R600: Implement getRecipEstimate
This requires a new hook to prevent expanding sqrt in terms
of rsqrt and reciprocal. v_rcp_f32, v_rsq_f32, and v_sqrt_f32 are
all the same rate, so this expansion would just double the number
of instructions and cycles.

llvm-svn: 225828
2015-01-13 20:53:23 +00:00
Hal Finkel
05914a8701 [StackMaps] Mark in CallLoweringInfo when lowering a patchpoint
While, generally speaking, the process of lowering arguments for a patchpoint
is the same as lowering a regular indirect call, on some targets it may not be
exactly the same. Targets may not, for example, want to add additional register
dependencies that apply only to making cross-DSO calls through linker stubs,
may not want to load additional registers out of function descriptors, and may
not want to add additional side-effect-causing instructions that cannot be
removed later with the call itself being generated.

The PowerPC target will use this in a future commit (for all of the reasons
stated above).

llvm-svn: 225806
2015-01-13 17:48:04 +00:00
Olivier Sallenave
b0c1223b39 Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook.
llvm-svn: 225795
2015-01-13 15:06:36 +00:00
Reid Kleckner
033ced7470 Rename llvm.recoverframeallocation to llvm.framerecover
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.

llvm-svn: 225752
2015-01-13 01:51:34 +00:00
Reid Kleckner
002e480f22 Add the llvm.frameallocate and llvm.recoverframeallocation intrinsics
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.

Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.

These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.

Reviewers: echristo, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D6493

llvm-svn: 225746
2015-01-13 00:48:10 +00:00
Matt Arsenault
bdd98aed5a Combine fcmp + select to fminnum / fmaxnum if no nans and legal
Also require unsafe FP math for no since there isn't a way to
test for signed zeros.

llvm-svn: 225744
2015-01-13 00:43:00 +00:00
Hal Finkel
db800a6904 [DAGCombine] Remainder of fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), when we elide an FP extend to form an FMA,
we need to extend the incoming operands so that the resulting node will really
be legal. This is currently enabled only for PowerPC, and it happens to work
there regardless, but this should fix the functionality for everyone else
should anyone else wish to use it.

llvm-svn: 225492
2015-01-09 01:29:29 +00:00
Hal Finkel
87302ccea9 Partial fix to r225380 (More FMA folding opportunities)
As pointed out by Aditya (and Owen), there are two things wrong with this code.
First, it adds patterns which elide FP extends when forming FMAs, and that might
not be profitable on all targets (it belongs behind the pre-existing
aggressive-FMA-formation flag). This is fixed by this change.

Second, the resulting nodes might have operands of different types (the
extensions need to be re-added). That will be fixed in the follow-up commit.

llvm-svn: 225485
2015-01-09 00:45:54 +00:00
Elena Demikhovsky
9ab7f6e415 Masked Load/Store - fixed a bug in type legalization.
llvm-svn: 225441
2015-01-08 12:29:19 +00:00
Ahmed Bougacha
4150499cd1 [SelectionDAG] Allow targets to specify legality of extloads' result
type (in addition to the memory type).

The *LoadExt* legalization handling used to only have one type, the
memory type.  This forced users to assume that as long as the extload
for the memory type was declared legal, and the result type was legal,
the whole extload was legal.

However, this isn't always the case.  For instance, on X86, with AVX,
this is legal:
    v4i32 load, zext from v4i8
but this isn't:
    v4i64 load, zext from v4i8
Whereas v4i64 is (arguably) legal, even without AVX2.

Note that the same thing was done a while ago for truncstores (r46140),
but I assume no one needed it yet for extloads, so here we go.

Calls to getLoadExtAction were changed to add the value type, found
manually in the surrounding code.

Calls to setLoadExtAction were mechanically changed, by wrapping the
call in a loop, to match previous behavior.  The loop iterates over
the MVT subrange corresponding to the memory type (FP vectors, etc...).
I also pulled neighboring setTruncStoreActions into some of the loops;
those shouldn't make a difference, as the additional types are illegal.
(e.g., i128->i1 truncstores on PPC.)

No functional change intended.

Differential Revision: http://reviews.llvm.org/D6532

llvm-svn: 225421
2015-01-08 00:51:32 +00:00
Olivier Sallenave
82830c0d30 More FMA folding opportunities.
llvm-svn: 225380
2015-01-07 20:54:17 +00:00
Olivier Sallenave
b30e9a1b52 Test commit
llvm-svn: 225368
2015-01-07 19:45:17 +00:00
Philip Reames
813212cde9 Introduce an example statepoint GC strategy
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)

Most of the validation logic added here as proof of concept will soon move in to the Verifier.  That move is dependent on http://reviews.llvm.org/D6811

There was discussion in the review thread about addrspace(1) being reserved for something.  I'm going to follow up on a seperate llvmdev thread.  If needed, I'll update all the code at once.

Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.

Differential Revision: http://reviews.llvm.org/D6808

llvm-svn: 225365
2015-01-07 19:07:50 +00:00
Mehdi Amini
a6822a0177 SelectionDAGBuilder: move constant initialization out of loop
No semantic change intended.

Reviewers: resistor

Differential Revision: http://reviews.llvm.org/D6834

llvm-svn: 225278
2015-01-06 18:20:04 +00:00
Craig Topper
03e518b16d Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Alexey Samsonov
4caafc0c9a Revert "merge consecutive stores of extracted vector elements"
This reverts commit r224611. This change causes crashes
in X86 DAG->DAG Instruction Selection.

llvm-svn: 225031
2014-12-31 00:40:28 +00:00
Elena Demikhovsky
4a153fb55a Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.

llvm-svn: 224829
2014-12-25 07:49:20 +00:00
Mehdi Amini
1f065b2623 Always assert in DAGCombine and not only when -debug is enabled
Right now in DAG Combine check the validity of the returned type 
only when -debug is given on the command line. However usually 
the test cases in the validation does not use -debug. 
An Assert build should always check this.

llvm-svn: 224779
2014-12-23 18:59:02 +00:00
Michael Kuperstein
81a40a5cbe [DagCombine] Improve DAGCombiner BUILD_VECTOR when it has two sources of elements
This partially fixes PR21943.

For AVX, we go from:

vmovq   (%rsi), %xmm0
vmovq   (%rdi), %xmm1
vpermilps       $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps       $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps       $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps       $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps       $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]

To the expected:

vmovq   (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq

Fixing this for AVX2 is still open.

Differential Revision: http://reviews.llvm.org/D6749

llvm-svn: 224759
2014-12-23 08:59:45 +00:00
Matt Arsenault
92400a8e42 Enable (sext x) == C --> x == (trunc C) combine
Extend the existing code which handles this for zext. This makes this
more useful for targets with ZeroOrNegativeOne BooleanContent and
obsoletes a custom combine SI uses for i1 setcc (sext(i1), 0, setne)
since the constant will now be shrunk to i1.

llvm-svn: 224691
2014-12-21 16:48:42 +00:00
Sanjay Patel
7cfa420ae4 merge consecutive stores of extracted vector elements
Add a path to DAGCombiner::MergeConsecutiveStores() 
to combine multiple scalar stores when the store operands
are extracted vector elements. This is a partial fix for
PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ).

For the new test case, codegen improves from:

   vmovss  %xmm0, (%rdi)
   vextractps      $1, %xmm0, 4(%rdi)
   vextractps      $2, %xmm0, 8(%rdi)
   vextractps      $3, %xmm0, 12(%rdi)
   vextractf128    $1, %ymm0, %xmm0
   vmovss  %xmm0, 16(%rdi)
   vextractps      $1, %xmm0, 20(%rdi)
   vextractps      $2, %xmm0, 24(%rdi)
   vextractps      $3, %xmm0, 28(%rdi)
   vzeroupper
   retq

To:

   vmovups	%ymm0, (%rdi)
   vzeroupper
   retq

Patch reviewed by Nadav Rotem.

Differential Revision: http://reviews.llvm.org/D6698

llvm-svn: 224611
2014-12-19 20:23:41 +00:00
Michael Kuperstein
3790301d73 [DAGCombine] Slightly improve lowering of BUILD_VECTOR into a shuffle.
This handles the case of a BUILD_VECTOR being constructed out of elements extracted from a vector twice the size of the result vector. Previously this was always scalarized. Now, we try to construct a shuffle node that feeds on extract_subvectors.

This fixes PR15872 and provides a partial fix for PR21711.

Differential Revision: http://reviews.llvm.org/D6678

llvm-svn: 224429
2014-12-17 12:32:17 +00:00
Hans Wennborg
37a572f581 SelectionDAG switch lowering: use 'unsigned' to count destination popularity
SwitchInst::getNumCases() returns unsinged, so using uint64_t to count cases
seems unnecessary.

Also fix a missing CHECK in the test case.

llvm-svn: 224393
2014-12-16 23:41:59 +00:00
Sanjay Patel
af93d5f15c merge consecutive loads that are offset from a base address
SelectionDAG::isConsecutiveLoad() was not detecting consecutive loads
when the first load was offset from a base address. 

This patch recognizes that pattern and subtracts the offset before comparing
the second load to see if it is consecutive.

The codegen change in the new test case improves from:

vmovsd	32(%rdi), %xmm0
vmovsd	48(%rdi), %xmm1 
vmovhpd	56(%rdi), %xmm1, %xmm1
vmovhpd	40(%rdi), %xmm0, %xmm0
vinsertf128	$1, %xmm1, %ymm0, %ymm0

To:

vmovups	32(%rdi), %ymm0

An existing test case is also improved from:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovsd	24(%rdi), %xmm2
vunpcklpd	%xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
vmovhpd	8(%rdi), %xmm1, %xmm3

To:

vmovsd	(%rdi), %xmm0
vmovsd	16(%rdi), %xmm1
vmovhpd	24(%rdi), %xmm0, %xmm0
vmovhpd	8(%rdi), %xmm1, %xmm1

This patch fixes PR21771 ( http://llvm.org/bugs/show_bug.cgi?id=21771 ).

Differential Revision: http://reviews.llvm.org/D6642

llvm-svn: 224379
2014-12-16 21:57:18 +00:00
Michael Ilseman
dd56e9aa72 Silence more static analyzer warnings.
Add in definedness checks for shift operators, null checks when
pointers are assumed by the code to be non-null, and explicit
unreachables.

llvm-svn: 224255
2014-12-15 18:48:43 +00:00
Matt Arsenault
b0274833dd Add target hook for whether it is profitable to reduce load widths
Add an option to disable optimization to shrink truncated larger type
loads to smaller type loads. On SI this prevents using scalar load
instructions in some cases, since there are no scalar extloads.

llvm-svn: 224084
2014-12-12 00:00:24 +00:00
Duncan P. N. Exon Smith
3d57886267 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Owen Anderson
2ca3461569 Fix a few instances found in SelectionDAG where we were not handling F16 at parity with F32 and F64.
llvm-svn: 223760
2014-12-09 06:50:39 +00:00
Justin Bogner
430d01bf77 InstrProf: An intrinsic and lowering for instrumentation based profiling
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.

The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang, and with the followup in clang that rips
that code out to use these new intrinsics this ends up being NFC.

Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.

Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.

llvm-svn: 223672
2014-12-08 18:02:35 +00:00
Hans Wennborg
24cc36324d SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

SimplifyCFG currently does this transformation, but I'm working towards changing
that so we can optimize harder based on unreachable defaults.

Differential Revision: http://reviews.llvm.org/D6510

llvm-svn: 223566
2014-12-06 01:28:50 +00:00
Simon Pilgrim
0b4002e32c [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Elena Demikhovsky
befed29343 Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Hal Finkel
337f550328 [PowerPC] Implement readcyclecounter for PPC32
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).

This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).

Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.

llvm-svn: 223161
2014-12-02 22:01:00 +00:00
Philip Reames
82e79ac753 Restructure some assertion checking based on post commit feedback by Aaron and Tom.
llvm-svn: 223150
2014-12-02 21:01:48 +00:00
Philip Reames
63f6675179 Appease a build bot complaining about an unused variable that's used in an assertion.
llvm-svn: 223142
2014-12-02 19:28:57 +00:00
Philip Reames
02104421ff [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.  

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.  

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principal, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.  

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
Hans Wennborg
f8109e3ddf Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg
09865c30b6 SelectionDAG switch lowering: Replace unreachable default with most popular case.
This can significantly reduce the size of the switch, allowing for more
efficient lowering.

I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.

llvm-svn: 223049
2014-12-01 17:08:32 +00:00
Akira Hatanaka
cad434d590 [stack protector] Set edge weights for newly created basic blocks.
This commit fixes a bug in stack protector pass where edge weights were not set
when new basic blocks were added to lists of successor basic blocks.

Differential Revision: http://reviews.llvm.org/D5766

llvm-svn: 222987
2014-12-01 04:27:03 +00:00
Hans Wennborg
4856812966 Switch lowering: reformat some for loops etc. NFC
llvm-svn: 222962
2014-11-29 21:24:12 +00:00
Hans Wennborg
22fda335d6 Switch lowering: Fix broken 'Figure out which block is next' code
This doesn't seem to have worked in a long time, but other optimizations
would clean it up.

llvm-svn: 222961
2014-11-29 21:17:05 +00:00
Duncan P. N. Exon Smith
73ce6dbb2b Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Elena Demikhovsky
25f6c9047c Converted back to Unix format (after my last commit 222632)
llvm-svn: 222636
2014-11-23 15:21:53 +00:00
Elena Demikhovsky
36a2243ab7 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Sanjay Patel
5d493f5d01 Don't repeat class/function/variable names in comments. NFC.
llvm-svn: 222555
2014-11-21 18:58:38 +00:00
Sanjay Patel
e65f60a9c9 Less space; NFC
llvm-svn: 222546
2014-11-21 18:05:59 +00:00
Andrea Di Biagio
0a8cf1ad5a [DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero.
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.

This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.

llvm-svn: 222536
2014-11-21 14:32:06 +00:00
Andrea Di Biagio
9c99df5e6c [DAG] Refactor the shuffle combining logic in DAGCombiner. NFC.
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.

llvm-svn: 222522
2014-11-21 11:33:07 +00:00
Hao Liu
9cb82be410 DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal.
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)

A hook is added to allow the target to control whether it needs to do such combine.

Reviewed in http://reviews.llvm.org/D6334

llvm-svn: 222510
2014-11-21 06:39:58 +00:00
Simon Pilgrim
e5f972f1c1 [X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.

I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.

Differential Revision: http://reviews.llvm.org/D5699

llvm-svn: 222340
2014-11-19 10:06:49 +00:00
David Blaikie
60e6c80905 Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Owen Anderson
4490a7cce1 Fix an incorrect chain operand when expanding INSERT_VECTOR operations through the stack.
Patch by Daniil Troshkov!

llvm-svn: 222254
2014-11-18 20:50:19 +00:00
Oliver Stannard
2efad103c3 Fix optimisations of SELECT_CC which assumed result is boolean
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.

This is a follow-up to D6210/r221693.

llvm-svn: 222123
2014-11-17 10:49:31 +00:00
Craig Topper
5b6e56da60 Move register class name strings to a single array in MCRegisterInfo to reduce static table size and number of relocation entries.
Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table.

llvm-svn: 222118
2014-11-17 05:50:14 +00:00
Craig Topper
2aa3e27053 Replace a couple asserts with static_asserts.
llvm-svn: 222114
2014-11-17 00:26:50 +00:00
Craig Topper
5121b0369b Convert some EVTs to MVTs where only a SimpleValueType is needed.
llvm-svn: 222109
2014-11-16 21:17:18 +00:00
Andrea Di Biagio
5475bc1d1b [DAG] Improved target independent vector shuffle folding logic.
This patch teaches the DAGCombiner how to combine shuffles according to rules:
   shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
   shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)

llvm-svn: 222090
2014-11-15 22:56:25 +00:00
Reid Kleckner
c90bcabd2d Allow the use of functions as typeinfo in landingpad clauses
This is one step towards supporting SEH filter functions in LLVM.

llvm-svn: 221954
2014-11-14 00:35:50 +00:00
Aditya Nandakumar
4d9c1ff994 We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed.
llvm-svn: 221926
2014-11-13 21:29:21 +00:00
Frederic Riss
67f2855174 Add an assert and a test that verify r221709's fix.
llvm-svn: 221854
2014-11-13 03:20:23 +00:00
Duncan P. N. Exon Smith
8770505e4e Revert "IR: MDNode => Value"
Instead, we're going to separate metadata from the Value hierarchy.  See
PR21532.

This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.

llvm-svn: 221711
2014-11-11 21:30:22 +00:00
Frederic Riss
b2c059475b Totally forget deallocated SDNodes in SDDbgInfo.
What would happen before that commit is that the SDDbgValues associated with
a deallocated SDNode would be marked Invalidated, but SDDbgInfo would keep
a map entry keyed by the SDNode pointer pointing to this list of invalidated
SDDbgNodes. As the memory gets reused, the list might get wrongly associated
with another new SDNode. As the SDDbgValues are cloned when they are transfered,
this can lead to an exponential number of SDDbgValues being produced during
DAGCombine like in http://llvm.org/bugs/show_bug.cgi?id=20893

Note that the previous behavior wasn't really buggy as the invalidation made
sure that the SDDbgValues won't be used. This commit can be considered a
memory optimization and as such is really hard to validate in a unit-test.

llvm-svn: 221709
2014-11-11 21:21:08 +00:00
Oliver Stannard
553b791f8f LLVM incorrectly folds xor into select
LLVM replaces the SelectionDAG pattern (xor (set_cc cc x y) 1) with
(set_cc !cc x y), which is only correct when the xor has type i1.
Instead, we should check that the constant operand to the xor is all
ones.

llvm-svn: 221693
2014-11-11 17:36:01 +00:00
Andrea Di Biagio
38de0c92c6 [X86] Teach method 'isVectorClearMaskLegal' how to check for legal blend masks.
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.

This allows for example to simplify the following code:

define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
  %1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
  %2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
  %3 = or <4 x i32> %1, %2
  ret <4 x i32> %3
}

Before this patch llc (-mcpu=corei7) generated:
        andps  LCPI1_0(%rip), %xmm0, %xmm0
        andps  LCPI1_1(%rip), %xmm1, %xmm1
        orps   %xmm1, %xmm0, %xmm0
        retq

With this patch we generate a single 'vpblendw'.

llvm-svn: 221343
2014-11-05 13:04:14 +00:00
Paul Robinson
2b27bd26b6 Normally an 'optnone' function goes through fast-isel, which does not
call DAGCombiner. But we ran into a case (on Windows) where the
calling convention causes argument lowering to bail out of fast-isel,
and we end up in CodeGenAndEmitDAG() which does run DAGCombiner.
So, we need to make DAGCombiner check for 'optnone' after all.

Commit includes the test that found this, plus another one that got
missed in the original optnone work.

llvm-svn: 221168
2014-11-03 18:19:26 +00:00
Duncan P. N. Exon Smith
7004fd9aac IR: MDNode => Value: Instruction::getMetadata()
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.

Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.

llvm-svn: 221024
2014-11-01 00:10:31 +00:00
Ahmed Bougacha
38c0bf429c [SelectionDAG] When scalarizing trunc, don't assert for legal operands.
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands.  On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.

It did that by choosing to widen v1 types whenever possible.  However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.

This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.  
This is similar to r205625.

Fixes PR20777.

llvm-svn: 220937
2014-10-30 23:46:50 +00:00
Louis Gerbarg
6f92b8978d Fix incorrect invariant check in DAG Combine
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.

Tests included.

rdar://18816719

llvm-svn: 220933
2014-10-30 22:21:03 +00:00
NAKAMURA Takumi
ee3b3d3d09 Whitespace.
llvm-svn: 220857
2014-10-29 15:23:11 +00:00
Matt Arsenault
cef46eb164 Fix copy paste comment
llvm-svn: 220581
2014-10-24 18:13:10 +00:00
Sanjay Patel
d9b7837012 Use rsqrt (X86) to speed up reciprocal square root calcs
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.

For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).

This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..

See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900

Differential Revision: http://reviews.llvm.org/D5658

llvm-svn: 220570
2014-10-24 17:02:16 +00:00
Ahmed Bougacha
a035d19b50 [SelectionDAG] Teach the vector scalarizer about FP conversions.
This adds support for legalization of instructions of the form:

  [fp_conv] <1 x i1> %op to <1 x double>

where fp_conv is one of fpto[us]i, [us]itofp.  This used to assert
because they were simply missing from the vector operand scalarizer.

A similar problem arose in r190830, with trunc instead.

Fixes PR20778.

Differential Revision: http://reviews.llvm.org/D5810

llvm-svn: 220533
2014-10-23 22:49:25 +00:00
Ahmed Bougacha
4e0335a62d Update comment and fix typos in assert message. (NFC)
llvm-svn: 220531
2014-10-23 22:40:34 +00:00
Tim Northover
d882ba8bc5 ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS
dependency because it was represented by a pair of CopyFromReg(EFLAGS) ->
CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an
implicit-def on the instruction, where the result numbers in the DAG and the
Uses list in TableGen matched up precisely.

The Copy notation seems much more robust, so this patch extends ScheduleDAG
rather than refactoring x86.

Should fix PR20376.

llvm-svn: 220529
2014-10-23 22:31:48 +00:00
Benjamin Kramer
a6c059251d Strength reduce constant-sized vectors into arrays. No functionality change.
llvm-svn: 220412
2014-10-22 19:55:26 +00:00