llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00

Author	SHA1	Message	Date
Rafael Espindola	6ffbd5bf5d	Fix a bit of confusion about .set and produce more readable assembly. Every target we support has support for assembly that looks like a = b - c .long a What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256	2014-10-21 01:17:30 +00:00
Robin Morisset	8dc41d55aa	Erase fence insertion from SelectionDAGBuilder.cpp (NFC) Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it - As a result it is plain unsound, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the literature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimics the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erases this logic from SelectionDAGBuilder as it is dead-code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957	2014-10-16 20:34:57 +00:00
Duncan P. N. Exon Smith	c1be4794ba	Revert "Revert "DI: Fold constant arguments into a single MDString"" This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010	2014-10-03 20:01:09 +00:00
Duncan P. N. Exon Smith	fb6bcc4eb2	Revert "DI: Fold constant arguments into a single MDString" This reverts commit r218914 while I investigate some bots. llvm-svn: 218918	2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith	58b6077a79	DI: Fold constant arguments into a single MDString This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914	2014-10-02 21:56:57 +00:00
Adrian Prantl	2b1df58ebe	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Adrian Prantl	0959156fa3	Revert r218778 while investigating buildbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	229943585f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
NAKAMURA Takumi	5a124748fa	llvm/test/CodeGen/XCore/dwarf_debug.ll: Fix not to be affected by *-win32. llvm-svn: 212335	2014-07-04 11:58:03 +00:00
Robert Lytton	463595f38e	XCore target: remove incorrect DebugLoc entries from prologue Summary: This was causing the prologue_end to be incorrectly positioned. Differential Revision: http://reviews.llvm.org/D4122 llvm-svn: 212318	2014-07-04 06:38:22 +00:00
Alp Toker	03b6e12fae	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Duncan P. N. Exon Smith	78dd4cd9af	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206707, reapplying r206704. The preceding commit to CalcSpillWeights should have sorted out the failing buildbots. <rdar://problem/14292693> llvm-svn: 206766	2014-04-21 17:57:07 +00:00
Duncan P. N. Exon Smith	f65036e329	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206704, as expected. llvm-svn: 206707	2014-04-19 22:46:00 +00:00
Duncan P. N. Exon Smith	707997192f	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite. I've done a careful audit, added some asserts, and fixed a couple of bugs (unfortunately, they were in unlikely code paths). There's a small chance that this will appease the failing bots [1][2]. (If so, great!) If not, I have a follow-up commit ready that will temporarily add -debug-only=block-freq to the two failing tests, allowing me to compare the code path between what the failing bots and what my machines (and the rest of the bots) are doing. Once I've triggered those builds, I'll revert both commits so the bots go green again. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 <rdar://problem/14292693> llvm-svn: 206704	2014-04-19 22:34:26 +00:00
Duncan P. N. Exon Smith	0ee9548e22	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2) This reverts commit r206666, as planned. Still stumped on why the bots are failing. Sanitizer bots haven't turned anything up. If anyone can help me debug either of the failures (referenced in r206666) I'll owe them a beer. (In the meantime, I'll be auditing my patch for undefined behaviour.) llvm-svn: 206677	2014-04-19 00:42:46 +00:00
Duncan P. N. Exon Smith	66e247e69c	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2) This reverts commit r206628, reapplying r206622 (and r206626). Two tests are failing only on buildbots [1][2]: i.e., I can't reproduce on Darwin, and Chandler can't reproduce on Linux. Asan and valgrind don't tell us anything, but we're hoping the msan bot will catch it. So, I'm applying this again to get more feedback from the bots. I'll leave it in long enough to trigger builds in at least the sanitizer buildbots (it was failing for reasons unrelated to my commit last time it was in), and hopefully a few others... and then I expect to revert a third time. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 [2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445 llvm-svn: 206666	2014-04-18 22:30:03 +00:00
Duncan P. N. Exon Smith	80fdbd652d	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" (#2) This reverts commit r206622 and the MSVC fixup in r206626. Apparently the remotely failing tests are still failing, despite my attempt to fix the nondeterminism in r206621. llvm-svn: 206628	2014-04-18 17:56:08 +00:00
Duncan P. N. Exon Smith	cf746f5ff0	Reapply "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commit r206556, effectively reapplying commit r206548 and its fixups in r206549 and r206550. In an intervening commit I've added target triples to the tests that were failing remotely [1] (but passing locally). I'm hoping the mystery is solved? I'll revert this again if the tests are still failing remotely. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206622	2014-04-18 17:22:25 +00:00
Duncan P. N. Exon Smith	79011f6e40	Revert "blockfreq: Rewrite BlockFrequencyInfoImpl" This reverts commits r206548, r206549 and r206550. There are some unit tests failing that aren't failing locally [1], so reverting until I have time to investigate. [1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816 llvm-svn: 206556	2014-04-18 02:17:43 +00:00
Duncan P. N. Exon Smith	78f8766db3	blockfreq: Rewrite BlockFrequencyInfoImpl Rewrite the shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo entirely. The old implementation had a fundamental flaw: precision losses from nested loops (or very wide branches) compounded past loop exits (and convergence points). The @nested_loops testcase at the end of test/Analysis/BlockFrequencyAnalysis/basic.ll is motivating. This function has three nested loops, with branch weights in the loop headers of 1:4000 (exit:continue). The old analysis gives nonsensical results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': ---- Block Freqs ---- entry = 1.0 for.cond1.preheader = 1.00103 for.cond4.preheader = 5.5222 for.body6 = 18095.19995 for.inc8 = 4.52264 for.inc11 = 0.00109 for.end13 = 0.0 The new analysis gives correct results: Printing analysis 'Block Frequency Analysis' for function 'nested_loops': block-frequency-info: nested_loops - entry: float = 1.0, int = 8 - for.cond1.preheader: float = 4001.0, int = 32007 - for.cond4.preheader: float = 16008001.0, int = 128064007 - for.body6: float = 64048012001.0, int = 512384096007 - for.inc8: float = 16008001.0, int = 128064007 - for.inc11: float = 4001.0, int = 32007 - for.end13: float = 1.0, int = 8 Most importantly, the frequency leaving each loop matches the frequency entering it. The new algorithm leverages BlockMass and PositiveFloat to maintain precision, separates "probability mass distribution" from "loop scaling", and uses dithering to eliminate probability mass loss. I have unit tests for these types out of tree, but it was decided in the review to make the classes private to BlockFrequencyInfoImpl, and try to shrink them (or remove them entirely) in follow-up commits. The new algorithm should generally have a complexity advantage over the old. The previous algorithm was quadratic in the worst case. The new algorithm is still worst-case quadratic in the presence of irreducible control flow, but it's linear without it. The key difference between the old algorithm and the new is that control flow within a loop is evaluated separately from control flow outside, limiting propagation of precision problems and allowing loop scale to be calculated independently of mass distribution. Loops are visited bottom-up, their loop scales are calculated, and they are replaced by pseudo-nodes. Mass is then distributed through the function, which is now a DAG. Finally, loops are revisited top-down to multiply through the loop scales and the masses distributed to pseudo nodes. There are some remaining flaws. - Irreducible control flow isn't modelled correctly. LoopInfo and MachineLoopInfo ignore irreducible edges, so this algorithm will fail to scale accordingly. There's a note in the class documentation about how to get closer. See also the comments in test/Analysis/BlockFrequencyInfo/irreducible.ll. - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting the 64-bit integer precision used downstream. - The "bias" calculation proposed on llvmdev is not incorporated here. This will be added in a follow-up commit, once comments from this review have been handled. llvm-svn: 206548	2014-04-18 01:57:45 +00:00
Richard Osborne	6d5512a94e	[XCore] Don't create invalid MKMSK instructions inside loadImmediate(). Summary: Previously loadImmediate() would produce MKMSK instructions with invalid immediate values such as mkmsk r0, 9. Fix this by checking the mask size is valid. Reviewers: robertlytton Reviewed By: robertlytton CC: llvm-commits Differential Revision: http://reviews.llvm.org/D3289 llvm-svn: 206163	2014-04-14 12:30:35 +00:00
Richard Osborne	7d4ecf1273	[XCore] Add support for the "m" inline asm constraint. Summary: This provides support for CP and DP relative global accesses in inline asm. Reviewers: robertlytton Reviewed By: robertlytton Differential Revision: http://llvm-reviews.chandlerc.com/D2943 llvm-svn: 203129	2014-03-06 16:37:48 +00:00
Richard Osborne	b9f5c6e728	[XCore] Fix call of absolute address. Previously for: tail call void inttoptr (i64 65536 to void ()*)() nounwind We would emit: bl 65536 The immediate operand of the bl instruction is a relative offset so it is wrong to use the absolute address here. llvm-svn: 202860	2014-03-04 16:50:30 +00:00
Richard Osborne	947c19eaa0	[XCore] Support functions returning more than 4 words. If a function returns a large struct by value return the first 4 words in registers and the rest on the stack in a location reserved by the caller. This is needed to support the xC language which supports functions returning an arbitrary number of return values. This is r202397 reapplied with a fix to avoid an uninitialized read of a member. llvm-svn: 202414	2014-02-27 17:47:54 +00:00
Richard Osborne	f8fb4e8a7f	Revert r202396, r202397. These are causing test failures, revert for now. llvm-svn: 202398	2014-02-27 14:24:13 +00:00
Richard Osborne	cb6866dfec	[XCore] Support functions returning more than 4 words. Summary: If a function returns a large struct by value return the first 4 words in registers and the rest on the stack in a location reserved by the caller. This is needed to support the xC language which supports functions returning an arbitrary number of return values. Reviewers: robertlytton Reviewed By: robertlytton CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2889 llvm-svn: 202397	2014-02-27 14:00:40 +00:00
Richard Osborne	5ac74685fd	[XCore] Target optimized library function __memcpy_4() Summary: If the src, dst and size of a memcpy are known to be 4 byte aligned we can call __memcpy_4() instead of memcpy(). Reviewers: robertlytton Reviewed By: robertlytton CC: llvm-commits Differential Revision: http://llvm-reviews.chandlerc.com/D2871 llvm-svn: 202395	2014-02-27 13:39:07 +00:00
Richard Osborne	75c16f2bf4	[XCore] Add dag combines for instructions that ignore some input bits. These instructions ignore the high bits of one of their input operands - try and use this to simplify the code. llvm-svn: 202394	2014-02-27 13:20:11 +00:00
Richard Osborne	f815df9c6e	[XCore] Provide information about known zero bits of resource instructions. llvm-svn: 202393	2014-02-27 13:20:06 +00:00
Andrew Trick	323d31a625	Use regnum regex in an XCore test case. llvm-svn: 202315	2014-02-26 23:22:49 +00:00
Andrew Trick	4823d7c2b4	Very temporarily XFAILing a test. Will be fixed shortly. llvm-svn: 202310	2014-02-26 22:39:59 +00:00
Richard Osborne	d5250f323a	[XCore] Add intrinsic for CLRPT (clear port time) instruction. llvm-svn: 202172	2014-02-25 17:31:15 +00:00
Richard Osborne	127dc9d63c	[XCore] Add intrinsic for EDU (event disable unconditional) instruction. llvm-svn: 202171	2014-02-25 17:31:06 +00:00
Richard Osborne	871fa66400	[XCore] Prefer to word align functions. The behaviour of the XCore's instruction buffer means that the performance of the same code sequence can differ depending on whether it starts at a 4 byte aligned address or not. Since we don't model the instruction buffer in the backend we have no way of knowing for sure if it is beneficial to word align a specific function. However, in the absence of precise modelling, it is better on balance to word align functions because: * It makes a fetch-nop while executing the prologue slightly less likely. * If we don't word align functions then a small perturbation in one function can have a dramatic knock on effect. If the size of the function changes it might change the alignment and therefore the performance of all the functions that happen to follow it in the binary. This butterfly effect makes it harder to reason about and measure the performance of code. llvm-svn: 202163	2014-02-25 16:37:15 +00:00
Robert Lytton	3f025fc96b	XCore target: Handle common linkage llvm-svn: 201563	2014-02-18 11:21:59 +00:00
Robert Lytton	296ff43f53	XCore target: Fix llvm.eh.return and EH info register handling llvm-svn: 201561	2014-02-18 11:21:48 +00:00
Robert Lytton	604b5e52e1	XCore target: fix const section handling XCore target ABI requires const data that is externally visible to be handled differently if it has C-language linkage rather than C++ language linkage. Clang now emits ".cp.rodata" section information. All other externally visible constant data will be placed in the DP section. llvm-svn: 201144	2014-02-11 10:36:26 +00:00
Robert Lytton	6ac9a5d013	XCore target: Lower ATOMIC_LOAD & ATOMIC_STORE llvm-svn: 201143	2014-02-11 10:36:18 +00:00
Benjamin Kramer	002aed9cb3	Fix broken CHECK lines. llvm-svn: 199016	2014-01-11 21:06:00 +00:00
Robert Lytton	3d4bb0d4e4	XCore Target: correct callee save register spilling when callsUnwindInit is true. llvm-svn: 198616	2014-01-06 14:21:12 +00:00
Robert Lytton	69e4de31bf	XCore target: Lower EH_RETURN llvm-svn: 198615	2014-01-06 14:21:07 +00:00
Robert Lytton	2c10e542b0	XCore target: Lower FRAME_TO_ARGS_OFFSET This requires a knowledge of the stack size which is not known until the frame is complete, hence the need for the XCoreFTAOElim pass which lowers the XCoreISD::FRAME_TO_ARGS_OFFSET instruction into its final form. llvm-svn: 198614	2014-01-06 14:21:00 +00:00
Robert Lytton	9059c1d570	XCore target: Lower RETURNADDR Only handles a depth of zero (the same as FRAMEADDR) llvm-svn: 198613	2014-01-06 14:20:53 +00:00
Robert Lytton	33b5209ddc	XCore target: Optimise entsp / retsp selection llvm-svn: 198612	2014-01-06 14:20:47 +00:00
Robert Lytton	6e7ff61390	XCore target: fix handling of unsized global arrays in large code model llvm-svn: 198609	2014-01-06 14:20:32 +00:00
Robert Lytton	aec919de4b	XCore target: Make handling of large frames not dependent upon an FP. eliminateFrameIndex() has been reworked to handle both small & large frames with either a FP or SP. An additional Slot is required for Scavenging spills when not using FP for large frames. Reworked the handling of Register Scavenging. Whether we are using an FP or not, whether it is a large frame or not, and whether we are using a large code model or not are now independent. llvm-svn: 196091	2013-12-02 11:05:28 +00:00
Robert Lytton	7a58a4e90d	XCore target: fix large code model 'select' indirect address handling. llvm-svn: 196088	2013-12-02 10:18:37 +00:00
Robert Lytton	3eb24d0e61	XCore target: Add large code model When using large code model: Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large". The folded global addresses of such objects are lowered into the const pool. During inspection it was noted that LowerConstantPool() was using a default offset of zero. A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental. Correct the flags emitted for explicitly specified sections. We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize. To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required. llvm-svn: 196087	2013-12-02 10:18:31 +00:00
Robert Lytton	c3b700cb09	XCore target: extend tests in preparation llvm-svn: 196086	2013-12-02 10:18:24 +00:00
Robert Lytton	75d72dfcd2	XCore target: Fix eliminateFrameIndex() to handle large frames Large frame offsets are loaded from the ConstantPool. Where possible, offsets are encoded using the smaller MKMSK instruction. Large frame offsets can only be used when there is a frame-pointer. llvm-svn: 196085	2013-12-02 10:18:19 +00:00
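
Note on r219957 (8dc41d55aa): the commit message states that in the IRIW example the outcome r1 = r3 = 1 and r2 = r4 = 0 is impossible when all accesses are seq_cst. The following is a minimal Python sketch of that claim, not code from LLVM. It assumes that sequential consistency can be modelled as a single interleaving of all operations over one shared memory; the names `THREADS`, `interleavings`, `run`, and `sc_outcomes` are hypothetical and chosen only for this illustration.

```python
# Each thread is a list of operations in program order (hypothetical encoding).
# ("store", var, value) writes shared memory; ("load", var, reg) reads it.
THREADS = [
    [("store", "x", 1)],                          # Thread 0
    [("store", "y", 1)],                          # Thread 1
    [("load", "x", "r1"), ("load", "y", "r2")],   # Thread 2
    [("load", "y", "r3"), ("load", "x", "r4")],   # Thread 3
]


def interleavings(threads):
    """Yield every total order of operations that respects program order."""
    def go(positions):
        if all(p == len(t) for p, t in zip(positions, threads)):
            yield []
            return
        for i, thread in enumerate(threads):
            if positions[i] < len(thread):
                advanced = positions[:i] + (positions[i] + 1,) + positions[i + 1:]
                for rest in go(advanced):
                    yield [thread[positions[i]]] + rest
    yield from go(tuple(0 for _ in threads))


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, var, arg in schedule:
        if kind == "store":
            memory[var] = arg
        else:
            regs[arg] = memory[var]
    return (regs["r1"], regs["r2"], regs["r3"], regs["r4"])


def sc_outcomes():
    """Collect every (r1, r2, r3, r4) reachable under this model."""
    return {run(schedule) for schedule in interleavings(THREADS)}


FORBIDDEN = (1, 0, 1, 0)  # r1 = r3 = 1 and r2 = r4 = 0

if __name__ == "__main__":
    print(FORBIDDEN in sc_outcomes())
```

The enumeration only covers the sequentially consistent case. It says nothing about what a given target does with monotonic accesses plus fences, which is the part the commit message argues cannot be repaired in general.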


175 Commits
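
Note on r206548 (78f8766db3): the "correct results" quoted in that commit message (4001, 16008001, 64048012001) follow from the 1:4000 exit:continue branch weights. The sketch below reproduces only that arithmetic, under the simplifying assumption that a loop header exiting with probability p is visited 1/p times per loop entry. It is not the BlockFrequencyInfoImpl algorithm, and `loop_scale` and `nested_frequencies` are hypothetical names; the 4096 cap is the per-loop limit mentioned in the commit message.

```python
from fractions import Fraction

MAX_LOOP_SCALE = 4096  # per-loop cap (2^12) stated in the commit message


def loop_scale(exit_weight, continue_weight):
    """Expected header visits per loop entry, assuming a geometric exit model."""
    exit_prob = Fraction(exit_weight, exit_weight + continue_weight)
    return min(1 / exit_prob, Fraction(MAX_LOOP_SCALE))


def nested_frequencies(depth, exit_weight=1, continue_weight=4000):
    """Frequency at each nesting level, relative to an entry frequency of 1."""
    scale = loop_scale(exit_weight, continue_weight)
    freqs = []
    freq = Fraction(1)
    for _ in range(depth):
        freq *= scale  # each enclosing loop multiplies the frequency
        freqs.append(freq)
    return freqs


if __name__ == "__main__":
    for level, freq in enumerate(nested_frequencies(3), start=1):
        print(level, float(freq))
```

Exact fractions are used here so that no precision is lost across the three nested loops, which is the failure mode the commit message attributes to the old analysis.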