llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00

Author	SHA1	Message	Date
David Blaikie	8da0c326fb	Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks. Somewhat unnoticed in the original implementation of discriminators, but it could cause instructions to end up in new, small, DW_TAG_lexical_blocks due to the use of DILexicalBlock to track discriminator changes. Instead, use DILexicalBlockFile which we already use to track file changes without introducing new scopes, so it works well to track discriminator changes in the same way. llvm-svn: 216239	2014-08-21 22:45:21 +00:00
Sanjay Patel	7498546daf	name change: isPow2DivCheap -> isPow2SDivCheap isPow2DivCheap That name doesn't specify signed or unsigned. Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong: srl/add/sra is not the general sequence for signed integer division by power-of-2. We need one more 'sra': sra/srl/add/sra That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case. This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods. No functional change intended. Differential Revision: http://reviews.llvm.org/D5010 llvm-svn: 216237	2014-08-21 22:31:48 +00:00
Quentin Colombet	0bee8352b0	[PeepholeOptimizer] Enable the advanced copy optimization by default. The advanced copy optimization does not yield any difference on the whole llvm test-suite + SPECs, either in compile time or runtime (binaries are identical), but has a big potential when data go back and forth between register files as demonstrated with test/CodeGen/ARM/adv-copy-opt.ll. Note: This was measured for both Os and O3 for armv7s, arm64, and x86_64. <rdar://problem/12702965> llvm-svn: 216236	2014-08-21 22:23:52 +00:00
Robin Morisset	b2dd60f27d	Rename AtomicExpandLoadLinked into AtomicExpand AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of a group that aim at making it more target-independent. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html for details The command line option is "atomic-expand" llvm-svn: 216231	2014-08-21 21:50:01 +00:00
Quentin Colombet	14b5a5d227	[PeepholeOptimizer] Update the kill flags when extending the live-range of the source of a copy. <rdar://problem/12702965> llvm-svn: 216229	2014-08-21 21:34:06 +00:00
David Blaikie	7a58463dea	Explicitly pass ownership of the MemoryBuffer to AddNewSourceBuffer using std::unique_ptr llvm-svn: 216223	2014-08-21 20:44:56 +00:00
Benjamin Kramer	54a0e07417	DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments. PR20677 llvm-svn: 216175	2014-08-21 13:28:02 +00:00
Oliver Stannard	7994364d2f	[ARM] Enable DP copy, load and store instructions for FPv4-SP The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers and load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention. This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers. This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM. llvm-svn: 216172	2014-08-21 12:50:31 +00:00
Craig Topper	65775cc03d	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 216158	2014-08-21 05:55:13 +00:00
Jiangning Liu	c14e7de948	Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type". llvm-svn: 216147	2014-08-21 01:59:30 +00:00
Quentin Colombet	be2a400039	[PeepholeOptimizer] Take advantage of the isInsertSubreg property in the advanced copy optimization. This is the final step patch toward transforming: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 bx lr Indeed, thanks to this patch, this optimization is able to look through vmov.32 d16[0], r0 vmov.32 d16[1], r1 and is able to rewrite the following sequence: vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 into simple generic GPR copies that the coalescer managed to remove. <rdar://problem/12702965> llvm-svn: 216144	2014-08-21 00:19:16 +00:00
Quentin Colombet	1849edbdf6	Add isInsertSubreg property. This patch adds a new property: isInsertSubreg and the related target hooks: TargetIntrInfo::getInsertSubregInputs and TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific instruction is a (kind of) INSERT_SUBREG. The approach is similar to r215394. <rdar://problem/12702965> llvm-svn: 216139	2014-08-20 23:49:36 +00:00
Quentin Colombet	2d800208a6	[PeepholeOptimizer] Take advantage of the isExtractSubreg property in the advanced copy optimization. This patch is a step toward transforming: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 bx lr Indeed, thanks to this patch, this optimization is able to look through vmov r0, r1, d16 but it does not understand yet vmov.32 d16[0], r0 vmov.32 d16[1], r1 Comming patches will fix that and update the related test case. <rdar://problem/12702965> llvm-svn: 216136	2014-08-20 23:13:02 +00:00
Quentin Colombet	b02f26b10f	Add isExtractSubreg property. This patch adds a new property: isExtractSubreg and the related target hooks: TargetIntrInfo::getExtractSubregInputs and TargetInstrInfo::getExtractSubregLikeInputs to specify that a target specific instruction is a (kind of) EXTRACT_SUBREG. The approach is similar to r215394. <rdar://problem/12702965> llvm-svn: 216130	2014-08-20 21:51:26 +00:00
Alexey Samsonov	e029f253b7	Fix null reference creation in SelectionDAG constructor. Store TargetSelectionDAGInfo as a pointer instead of a reference: getSelectionDAGInfo() may not be implemented for certain backends (e.g. it's not currently implemented for R600). This bug is reported by UBSan. llvm-svn: 216129	2014-08-20 21:40:15 +00:00
Alexey Samsonov	08af4466dd	Cleanup: Delete seemingly unused reference to MachineDominatorTree from ScheduleDAGInstrs. llvm-svn: 216124	2014-08-20 20:57:26 +00:00
Alexey Samsonov	d47abf2d7c	Fix null reference creation in ScheduleDAGInstrs constructor call. Both MachineLoopInfo and MachineDominatorTree may be null in ScheduleDAGMI constructor call. It is undefined behavior to take references to these values. This bug is reported by UBSan. llvm-svn: 216118	2014-08-20 19:36:05 +00:00
Sanjay Patel	55357e1fa3	critical-anti-dependency breaker: don't use reg def info from kill insts (PR20308) In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker caused a miscompile because it broke a WAR hazard using a register that it thinks is available based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from a kill because they are really just nops. This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again. The test case is a reduced version of the code from the bug report. Differential Revision: http://reviews.llvm.org/D4977 llvm-svn: 216114	2014-08-20 18:03:00 +00:00
Quentin Colombet	5404c5510b	[PeepholeOptimizer] Refactor the advanced copy optimization to take advantage of the isRegSequence property. This is a follow-up of r215394 and r215404, which respectively introduces the isRegSequence property and uses it for ARM. Thanks to the property introduced by the previous commits, this patch is able to optimize the following sequence: vmov d0, r2, r3 vmov d1, r0, r1 vmov r0, s0 vmov r1, s2 udiv r0, r1, r0 vmov r1, s1 vmov r2, s3 udiv r1, r2, r1 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr into: udiv r0, r0, r2 udiv r1, r1, r3 vmov.32 d16[0], r0 vmov.32 d16[1], r1 vmov r0, r1, d16 bx lr This patch refactors how the copy optimizations are done in the peephole optimizer. Prior to this patch, we had one copy-related optimization that replaced a copy or bitcast by a generic, more suitable (in terms of register file), copy. With this patch, the peephole optimizer features two copy-related optimizations: 1. One for rewriting generic copies to generic copies: PeepholeOptimizer::optimizeCoalescableCopy. 2. One for replacing non-generic copies with generic copies: PeepholeOptimizer::optimizeUncoalescableCopy. The goals of these two optimizations are slightly different: one rewrite the operand of the instruction (#1), the other kills off the non-generic instruction and replace it by a (sequence of) generic instruction(s). Both optimizations rely on the ValueTracker introduced in r212100. The ValueTracker has been refactored to use the information from the TargetInstrInfo for non-generic instruction. As part of the refactoring, we switched the tracking from the index of the definition to the actual register (virtual or physical). This one change is to provide better consistency with register related APIs and to ease the use of the TargetInstrInfo. Moreover, this patch introduces a new helper class CopyRewriter used to ease the rewriting of generic copies (i.e., #1). Finally, this patch adds a dead code elimination pass right after the peephole optimizer to get rid of dead code that may appear after rewriting. This is related to <rdar://problem/12702965>. Review: http://reviews.llvm.org/D4874 llvm-svn: 216088	2014-08-20 17:41:48 +00:00
Jiangning Liu	c3dd378a9e	Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type legalization stage. With those two optimizations, fewer signed/zero extension instructions can be inserted, and then we can expose more opportunities to Machine CSE pass in back-end. llvm-svn: 216066	2014-08-20 12:05:15 +00:00
Juergen Ributzka	15f8549d05	Reapply [FastISel] Let the target decide first if it wants to materialize a constant (215588). Note: This was originally reverted to track down a buildbot error. This commit exposed a latent bug that was fixed in r215753. Therefore it is reapplied without any modifications. I run it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new regeressions. Original commit message: This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed too. On AArch64 it would materialize the constant 0 in a register even the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or not use mvn. This change simply changes the order and always asks the target first if it likes to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 216006	2014-08-19 19:05:24 +00:00
Oliver Stannard	0f36700d69	Teach the AArch64 backend to handle f16 This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv operations on f16 (half-precision) types by promoting to f32. llvm-svn: 215891	2014-08-18 14:22:39 +00:00
Craig Topper	aa7422b5a6	Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size." Getting a weird buildbot failure that I need to investigate. llvm-svn: 215870	2014-08-18 00:24:38 +00:00
Craig Topper	227456e133	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 215868	2014-08-17 23:47:00 +00:00
Matt Arsenault	71bb8180a2	Fix fmul combines with constant splat vectors Fixes things like fmul x, 2 -> fadd x, x llvm-svn: 215820	2014-08-16 10:14:19 +00:00
Andrea Di Biagio	e85b8c6f4a	[DAGCombiner] Improve the folding of target independet shuffles to Undef. When combining a pair of shuffle nodes, check if the combined shuffle mask is trivially Undef. In case, immediately fold that pair of shuffles to Undef. The lack of checks for undef masks was the root-cause of a poor-codegen bug in the dag combiner. Example: %1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6> %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6> %3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3> Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire sequence to Undef value and therefore we generated: shufps $-123, %xmm1, $xmm0 pshufd $-46, %xmm0, %xmm0 With this patch, the entire shuffle sequence is folded to Undef and no shuffles are generated in the output assembly. Added new test cases to test 'combine-vec-shuffle-5.ll'. llvm-svn: 215797	2014-08-16 00:29:44 +00:00
Hal Finkel	27a66a526c	Make isAliased property for fixed-offset stack objects adjustable We used to assume that any fixed-offset stack object was not aliased. This meant that no IR value could point to the memory contained in such an object. This is a reasonable default, but is not a universally-correct target-independent fact. For example, on PowerPC (both Darwin and non-Darwin), some byval arguments are allocated at fixed offsets by the ABI. These, however, certainly can be pointed to by IR values. This change moves the 'isAliased' logic out of FixedStackPseudoSourceValue and into MFI, and allows the isAliased property to be overridden for fixed-offset objects. This will be used by an upcoming commit to the PowerPC backend to fix PR20280. No functionality change intended (the behavior of FixedStackPseudoSourceValue::isAliased has been made more conservative for callers that don't pass an MFI object, but I don't see any in-tree callers that do that). llvm-svn: 215794	2014-08-16 00:17:02 +00:00
Robin Morisset	8881e6ec1a	Fix typos in comments llvm-svn: 215777	2014-08-15 22:17:28 +00:00
Juergen Ributzka	a6eef50b3f	[FastISel] Remove an performance debugging assert. As Jim pointed out this assert isn't really needed to test for correctness, because the code right afterwards does the same check and falls-back to SelectionDAG - as intended. llvm-svn: 215735	2014-08-15 17:36:30 +00:00
Rafael Espindola	a9d0fe5b94	Delete dead code. NFC. llvm-svn: 215720	2014-08-15 14:58:22 +00:00
Juergen Ributzka	a981de1e50	Revert several FastISel commits to track down a buildbot error. This reverts: r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants." r215594 "[FastISel][X86] Use XOR to materialize the "0" value." r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization." r215591 "[FastISel][AArch64] Make use of the zero register when possible." r215588 "[FastISel] Let the target decide first if it wants to materialize a constant." r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI." llvm-svn: 215673	2014-08-14 19:56:28 +00:00
Sanjay Patel	e7588bf533	optimize vector fneg of bitcasted integer value This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops. This patch is very similar to a fabs patch committed at r214892. Differential Revision: http://reviews.llvm.org/D4852 llvm-svn: 215646	2014-08-14 15:15:28 +00:00
Chandler Carruth	b65d715ac5	[SDAG] Fix a bug in the DAG combiner where we would fail to return the input node after manually adding it to the worklist and using CombineTo. Once we use CombineTo the input node may have been deleted. Despite this being completely confusing and somewhat broken, the only way to "correctly" return from a DAG combine after potentially deleting the input node is to return that exact node.... But really, this code should just never have used CombineTo. It won't do what it wants (returning the node as mentioned above just causes the combine to infloop). The correct way to combine away a casted load to a load of the correct type is to RAUW the chain directly and then return the loaded value to replace the actual value node. I managed to find this with the vector shuffle fuzzer even though it clearly has nothing at all to do with vector shuffles and rather those happen to trigger a load of a constant pool that hits this combine just right. I've included the test as it is small and a nice stress test that the infrastructure isn't asserting. llvm-svn: 215622	2014-08-14 08:18:34 +00:00
Chandler Carruth	f5c0be3c57	[SDAG] Fix a case where we would iteratively legalize a node during combining by replacing it with something else but not re-process the node afterward to remove it. In a truly remarkable stroke of bad luck, this would (in the test case attached) end up getting some other node combined into it without ever getting re-processed. By adding it back on to the worklist, in addition to deleting the dead nodes more quickly we also ensure that if it stops being dead for any reason it makes it back through the legalizer. Without this, the test case will end up failing during instruction selection due to an and node with a type we don't have an instruction pattern for. It took many million runs of the shuffle fuzz tester to find this. llvm-svn: 215611	2014-08-14 01:07:37 +00:00
Juergen Ributzka	3e89dc8eab	[FastISel] Let the target decide first if it wants to materialize a constant. This changes the order in which FastISel tries to materialize a constant. Originally it would try to use a simple target-independent approach, which can lead to the generation of inefficient code. On X86 this would result in the use of movabsq to materialize any 64bit integer constant - even for simple and small values such as 0 and 1. Also some very funny floating-point materialization could be observed too. On AArch64 it would materialize the constant 0 in a register even the architecture has an actual "zero" register. On ARM it would generate unnecessary mov instructions or not use mvn. This change simply changes the order and always asks the target first if it likes to materialize the constant. This doesn't fix all the issues mentioned above, but it enables the targets to implement such optimizations. Related to <rdar://problem/17420988>. llvm-svn: 215588	2014-08-13 22:08:02 +00:00
Gerolf Hoflehner	971b863baf	[MachineCombiner] Removal of dangling DBG_VALUES after combining [20598] This is a cleaner solution to the problem described in r215431. When instructions are combined a dangling DBG_VALUE is removed. This resolves bug 20598. llvm-svn: 215587	2014-08-13 22:07:36 +00:00
Gerolf Hoflehner	97f5907d85	[Cleanup] Utility function to erase instruction and mark DBG_Values New function to erase a machine instruction and mark DBG_VALUE for removal. A DBG_VALUE is marked for removal when it references an operand defined in the instruction. Use the new function to cleanup code in dead machine instruction removal pass. llvm-svn: 215580	2014-08-13 21:15:23 +00:00
Quentin Colombet	2563daec04	[MachineDominatorTree] Provide a method to inform a MachineDominatorTree that a critical edge has been split. The MachineDominatorTree will when lazy update the underlying dominance properties when require. Context This is a follow-up of r215410. Each time a critical edge is split this invalidates the dominator tree information. Thus, subsequent queries of that interface will be slow until the underlying information is actually recomputed (costly). Problem Prior to this patch, splitting a critical edge needed to query the dominator tree to update the dominator information. Therefore, splitting a bunch of critical edges will likely produce poor performance as each query to the dominator tree will use the slow query path. This happens a lot in passes like MachineSink and PHIElimination. Proposed Solution Splitting a critical edge is a local modification of the CFG. Moreover, as soon as a critical edge is split, it is not critical anymore and thus cannot be a candidate for critical edge splitting anymore. In other words, the predecessor and successor of a basic block inserted on a critical edge cannot be inserted by critical edge splitting. Using these observations, we can pile up the splitting of critical edge and apply then at once before updating the DT information. The core of this patch moves the update of the MachineDominatorTree information from MachineBasicBlock::SplitCriticalEdge to a lazy MachineDominatorTree. Performance Thanks to this patch, the motivating example compiles in 4- minutes instead of 6+ minutes. No test case added as the motivating example as nothing special but being huge! The binaries are strictly identical for all the llvm test-suite + SPECs with and without this patch for both Os and O3. Regarding compile time, I observed only noise, although on average I saw a small improvement. <rdar://problem/17894619> llvm-svn: 215576	2014-08-13 21:00:07 +00:00
Benjamin Kramer	da144ed5a2	Canonicalize header guards into a common format. Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558	2014-08-13 16:26:38 +00:00
Andrea Di Biagio	2e80a6ad4d	[DAGCombiner] Improved target independent vector shuffle combine rule. This patch improves the existing algorithm in DAGCombiner that attempts to fold shuffles according to rule: shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3) Before this change, there were cases where the DAGCombiner conservatively avoided folding shuffles even if the resulting mask would have been legal. That is because the algorithm wrongly assumed that commuting an illegal shuffle mask would always produce an illegal mask. With this change, we now correctly compute the commuted shuffle mask before calling method 'isShuffleMaskLegal' on it. On X86, this improves for example the codegen for the following function: define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) { %1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7> %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3> ret <4 x i32> %2 } Before this change the X86 backend (-mcpu=corei7) generated the following assembly code for function @test: shufps $-23, %xmm0, %xmm1 # xmm1 = xmm1[1,2],xmm0[2,3] movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1] movaps %xmm1, %xmm0 Now we produce: movhlps %xmm0, %xmm0 # xmm0 = xmm0[1,1] Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly fold according to the above-mentioned rule. llvm-svn: 215555	2014-08-13 16:09:40 +00:00
Hal Finkel	97fb1d4d91	[PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type). This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed). llvm-svn: 215512	2014-08-13 01:15:40 +00:00
Adrian Prantl	7b5b539bcc	Remove a condition that can never be true, as wittnessed by the assert above. llvm-svn: 215477	2014-08-12 21:55:58 +00:00
Quentin Colombet	384a195a06	Fix a parentheses warning introduced in r215394. llvm-svn: 215459	2014-08-12 17:11:26 +00:00
Eric Christopher	fd30e29785	Have MachineRegisterInfo take and store the MachineFunction it was created for rather than the TargetMachine since we only needed the TM for the subtarget and we can get that from the MF. llvm-svn: 215432	2014-08-12 08:00:56 +00:00
Adrian Prantl	c007e0faa6	DebugLocEntry: Restore the comparison predicate from before the refactoring in 215384. This way it can unique multiple entries describing the same piece even if they don't have the exact same location. (The same piece may get merged in and be added from OpenRanges). There ought to be a more elegant solution for this, though. llvm-svn: 215418	2014-08-12 01:07:53 +00:00
David Blaikie	04dd297573	Revert "Partially revert r214761 that asserted that all concrete debug info variables had DIEs, due to a failure on Darwin." I believe this was addressed by r215157 and r215227, so let's have another go at the bots, etc. This reverts commit r214880. llvm-svn: 215412	2014-08-12 00:00:31 +00:00
Quentin Colombet	a09cf4d3ee	[MachineSink] Improve the compile time by preserving the dominance information as long as possible. Context Each time the dominance information is modified, the dominator tree analysis switches in a slow query mode. After a few queries without any modification on the dominator tree, it performs an expensive update of its internal structure to provide fast queries again. Problem Prior to this patch, the MachineSink pass was splitting the critical edges on demand while relying heavy on the dominator tree information. In some cases, this leads to pathological behavior where: - We end up in the slow query mode right after splitting an edge. - We update the dominance information. - We break the dominance information again, thus ending up in the slow query mode and so on. Proposed Solution To mitigate this effect, this patch postpones all the splitting of the edges at the end of each iteration of the main loop. The benefits are: - The dominance information is valid for the life time of an iteration. - This simplifies the code as we do not have to special treat instructions that are sunk on critical edges. Indeed, the related block will be available through the next iteration. The downside is that when edges splitting is required, this incurs an additional iteration of the main loop compared to the previous scheme. Performance Thanks to this patch, the motivating example compiles in 6+ minutes instead of 10+ minutes. No test case added as the motivating example as nothing special but being huge! I have measured only noise for both the compile time and the runtime on the llvm test-suite + SPECs with Os and O3. Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also uses the dominance information and therefore, hits this problem. A subsequent patch will address that. <rdar://problem/17894619> llvm-svn: 215410	2014-08-11 23:52:01 +00:00
Michael J. Spencer	045df0bbd3	[x86] Fold extract_vector_elt of a load into the Load's address computation. llvm-svn: 215409	2014-08-11 23:49:33 +00:00
Adrian Prantl	8221613b32	Add a couple of convenience accessors to DebugLocEntry::Value to further simplify common usage patterns. llvm-svn: 215407	2014-08-11 23:22:59 +00:00
Adrian Prantl	ed31df5992	Make these DebugLocEntry::Value comparison operators friend functions as suggested by dblaikie in a comment on r215384. llvm-svn: 215403	2014-08-11 22:52:56 +00:00

1 2 3 4 5 ...

17078 Commits