llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Ahmed Bougacha	3f5a21a3d9	[SimplifyLibCalls] Factor out fortified libcall handling. This lets us remove CGP duplicate. Differential Revision: http://reviews.llvm.org/D6541 llvm-svn: 225640	2015-01-12 17:22:43 +00:00
Duncan P. N. Exon Smith	f4c5cf1d04	CodeGen: Use handy new-fangled post-increment, NFC Drive-by cleanup; I noticed this when reviewing the patch that became r225466. llvm-svn: 225468	2015-01-08 21:07:55 +00:00
Duncan P. N. Exon Smith	b0dfd1b536	CodeGen: Use range-based for loops, NFC Patch by Ramkumar Ramachandra! llvm-svn: 225466	2015-01-08 20:44:33 +00:00
Ahmed Bougacha	4150499cd1	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421	2015-01-08 00:51:32 +00:00
Andrea Di Biagio	83f99b8d3f	[CodeGenPrepare] Improved logic to speculate calls to cttz/ctlz. This patch improves the logic added at revision 224899 (see review D6728) that teaches the backend when it is profitable to speculate calls to cttz/ctlz. The original algorithm conservatively avoided speculating more than one instruction from a basic block in a control flow grap modelling an if-statement. In particular, the only allowed instruction (excluding the terminator) was a call to cttz/ctlz. However, there are cases where we could be less conservative and still be able to speculate a call to cttz/ctlz. With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the result is zero extended/truncated in the same basic block, and the zext/trunc instruction is "free" for the target. Added new test cases to CodeGen/X86/cttz-ctlz.ll Differential Revision: http://reviews.llvm.org/D6853 llvm-svn: 225274	2015-01-06 17:41:18 +00:00
Craig Topper	03e518b16d	Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert. llvm-svn: 225160	2015-01-05 10:15:49 +00:00
Andrea Di Biagio	64b0da84f4	[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz. If the control flow is modelling an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph. Example: \code entry: %cmp = icmp eq i64 %val, 0 br i1 %cmp, label %end.bb, label %then.bb then.bb: %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) br label %end.bb end.bb: %cond = phi i64 [ %c, %then.bb ], [ 64, %entry] \code In this example, basic block %then.bb is taken if value %val is not zero. Also, the phi node in %end.bb would propagate the size-of in bits of %val only if %val is equal to zero. With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb into basic block %entry only if cttz is cheap to speculate for the target. Added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI. Differential Revision: http://reviews.llvm.org/D6728 llvm-svn: 224899	2014-12-28 11:07:35 +00:00
Elena Demikhovsky	5ba5e3e13d	Scalarizer for masked load and store intrinsics. Masked vector intrinsics are a part of common LLVM IR, but they are really supported on AVX2 and AVX-512 targets. I added a code that translates masked intrinsic for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks. http://reviews.llvm.org/D6436 llvm-svn: 224897	2014-12-28 08:54:45 +00:00
Quentin Colombet	c245321b0e	[CodeGenPrepare] Handle properly the promotion of operands when this does not generate instructions. Fixes PR21978. Related to <rdar://problem/18310086> llvm-svn: 224717	2014-12-22 18:11:52 +00:00
Quentin Colombet	5896cdb9ff	[CodeGenPrepare] Reapply r224351 with a fix for the assertion failure: The type promotion helper does not support vector type, so when make such it does not kick in in such cases. Original commit message: [CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] movzbl (%rdi), %eax # zero-extended load movl (%rsi), %es # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequences happen a lot on code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224402	2014-12-17 01:36:17 +00:00
Reid Kleckner	b4ee65bf9b	Revert "[CodeGenPrepare] Move sign/zero extensions near loads using type promotion." This reverts commit r224351. It causes assertion failures when building ICU. llvm-svn: 224397	2014-12-17 00:29:23 +00:00
Quentin Colombet	d31121348b	[CodeGenPrepare] Move sign/zero extensions near loads using type promotion. This patch extends the optimization in CodeGenPrepare that moves a sign/zero extension near a load when the target can combine them. The optimization may promote any operations between the extension and the load to make that possible. Although this optimization may be beneficial for all targets, in particular AArch64, this is enabled for X86 only as I have not benchmarked it for other targets yet. Context Most targets feature extended loads, i.e., loads that perform a zero or sign extension for free. In that context it is interesting to expose such pattern in CodeGenPrepare so that the instruction selection pass can form such loads. Sometimes, this pattern is blocked because of instructions between the load and the extension. When those instructions are promotable to the extended type, we can expose this pattern. Motivating Example Let us consider an example: define void @foo(i8* %addr1, i32* %addr2, i8 %a, i32 %b) { %ld = load i8* %addr1 %zextld = zext i8 %ld to i32 %ld2 = load i32* %addr2 %add = add nsw i32 %ld2, %zextld %sextadd = sext i32 %add to i64 %zexta = zext i8 %a to i32 %addza = add nsw i32 %zexta, %zextld %sextaddza = sext i32 %addza to i64 %addb = add nsw i32 %b, %zextld %sextaddb = sext i32 %addb to i64 call void @dummy(i64 %sextadd, i64 %sextaddza, i64 %sextaddb) ret void } As it is, this IR generates the following assembly on x86_64: [...] movzbl (%rdi), %eax # zero-extended load movl (%rsi), %es # plain load addl %eax, %esi # 32-bit add movslq %esi, %rdi # sign extend the result of add movzbl %dl, %edx # zero extend the first argument addl %eax, %edx # 32-bit add movslq %edx, %rsi # sign extend the result of add addl %eax, %ecx # 32-bit add movslq %ecx, %rdx # sign extend the result of add [...] The throughput of this sequence is 7.45 cycles on Ivy Bridge according to IACA. Now, by promoting the additions to form more extended loads we would generate: [...] movzbl (%rdi), %eax # zero-extended load movslq (%rsi), %rdi # sign-extended load addq %rax, %rdi # 64-bit add movzbl %dl, %esi # zero extend the first argument addq %rax, %rsi # 64-bit add movslq %ecx, %rdx # sign extend the second argument addq %rax, %rdx # 64-bit add [...] The throughput of this sequence is 6.15 cycles on Ivy Bridge according to IACA. This kind of sequences happen a lot on code using 32-bit indexes on 64-bit architectures. Note: The throughput numbers are similar on Sandy Bridge and Haswell. Proposed Solution To avoid the penalty of all these sign/zero extensions, we merge them in the loads at the beginning of the chain of computation by promoting all the chain of computation on the extended type. The promotion is done if and only if we do not introduce new extensions, i.e., if we do not degrade the code quality. To achieve this, we extend the existing “move ext to load” optimization with the promotion mechanism introduced to match larger patterns for addressing mode (r200947). The idea of this extension is to perform the following transformation: ext(promotableInst1(...(promotableInstN(load)))) => promotedInst1(...(promotedInstN(ext(load)))) The promotion mechanism in that optimization is enabled by a new TargetLowering switch, which is off by default. In other words, by default, the optimization performs the “move ext to load” optimization as it was before this patch. Performance Configuration: x86_64: Ivy Bridge fixed at 2900MHz running OS X 10.10. Tested Optimization Levels: O3/Os Tests: llvm-testsuite + externals. Results: - No regression beside noise. - Improvements: CINT2006/473.astar: ~2% Benchmarks/PAQ8p: ~2% Misc/perlin: ~3% The results are consistent for both O3 and Os. <rdar://problem/18310086> llvm-svn: 224351	2014-12-16 19:09:03 +00:00
Duncan P. N. Exon Smith	3d57886267	IR: Split Metadata from Value Split `Metadata` away from the `Value` class hierarchy, as part of PR21532. Assembly and bitcode changes are in the wings, but this is the bulk of the change for the IR C++ API. I have a follow-up patch prepared for `clang`. If this breaks other sub-projects, I apologize in advance :(. Help me compile it on Darwin I'll try to fix it. FWIW, the errors should be easy to fix, so it may be simpler to just fix it yourself. This breaks the build for all metadata-related code that's out-of-tree. Rest assured the transition is mechanical and the compiler should catch almost all of the problems. Here's a quick guide for updating your code: - `Metadata` is the root of a class hierarchy with three main classes: `MDNode`, `MDString`, and `ValueAsMetadata`. It is distinct from the `Value` class hierarchy. It is typeless -- i.e., instances do not have a `Type`. - `MDNode`'s operands are all `Metadata ` (instead of `Value `). - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively. If you're referring solely to resolved `MDNode`s -- post graph construction -- just use `MDNode`. - `MDNode` (and the rest of `Metadata`) have only limited support for `replaceAllUsesWith()`. As long as an `MDNode` is pointing at a forward declaration -- the result of `MDNode::getTemporary()` -- it maintains a side map of its uses and can RAUW itself. Once the forward declarations are fully resolved RAUW support is dropped on the ground. This means that uniquing collisions on changing operands cause nodes to become "distinct". (This already happened fairly commonly, whenever an operand went to null.) If you're constructing complex (non self-reference) `MDNode` cycles, you need to call `MDNode::resolveCycles()` on each node (or on a top-level node that somehow references all of the nodes). Also, don't do that. Metadata cycles (and the RAUW machinery needed to construct them) are expensive. - An `MDNode` can only refer to a `Constant` through a bridge called `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`). As a side effect, accessing an operand of an `MDNode` that is known to be, e.g., `ConstantInt`, takes three steps: first, cast from `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`; third, cast down to `ConstantInt`. The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have metadata schema owners transition away from using `Constant`s when the type isn't important (and they don't care about referring to `GlobalValue`s). In the meantime, I've added transitional API to the `mdconst` namespace that matches semantics with the old code, in order to avoid adding the error-prone three-step equivalent to every call site. If your old code was: MDNode N = foo(); bar(isa <ConstantInt>(N->getOperand(0))); baz(cast <ConstantInt>(N->getOperand(1))); bak(cast_or_null <ConstantInt>(N->getOperand(2))); bat(dyn_cast <ConstantInt>(N->getOperand(3))); bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4))); you can trivially match its semantics with: MDNode N = foo(); bar(mdconst::hasa <ConstantInt>(N->getOperand(0))); baz(mdconst::extract <ConstantInt>(N->getOperand(1))); bak(mdconst::extract_or_null <ConstantInt>(N->getOperand(2))); bat(mdconst::dyn_extract <ConstantInt>(N->getOperand(3))); bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4))); and when you transition your metadata schema to `MDInt`: MDNode N = foo(); bar(isa <MDInt>(N->getOperand(0))); baz(cast <MDInt>(N->getOperand(1))); bak(cast_or_null <MDInt>(N->getOperand(2))); bat(dyn_cast <MDInt>(N->getOperand(3))); bay(dyn_cast_or_null<MDInt>(N->getOperand(4))); - A `CallInst` -- specifically, intrinsic instructions -- can refer to metadata through a bridge called `MetadataAsValue`. This is a subclass of `Value` where `getType()->isMetadataTy()`. `MetadataAsValue` is the only class that can legally refer to a `LocalAsMetadata`, which is a bridged form of non-`Constant` values like `Argument` and `Instruction`. It can also refer to any other `Metadata` subclass. (I'll break all your testcases in a follow-up commit, when I propagate this change to assembly.) llvm-svn: 223802	2014-12-09 18:38:53 +00:00
Juergen Ributzka	4299b7a6ce	[CGP] Rewrite pattern match for splitBranchCondition to work with Values instead. Rewrite the pattern match code to work also with Values instead with Instructions only. Also remove the no longer need matcher (m_Instruction). llvm-svn: 223797	2014-12-09 17:50:10 +00:00
Juergen Ributzka	a2310d0648	Revert "Move function to obtain branch weights into the BranchInst class. NFC." This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare. llvm-svn: 223795	2014-12-09 17:32:12 +00:00
Juergen Ributzka	9c01dd36f8	[CodeGenPrepare] Split branch conditions into multiple conditional branches. This optimization transforms code like: bb1: %0 = icmp ne i32 %a, 0 %1 = icmp ne i32 %b, 0 %or.cond = or i1 %0, %1 br i1 %or.cond, label %TrueBB, label %FalseBB into a multiple branch instructions like: bb1: %0 = icmp ne i32 %a, 0 br i1 %0, label %TrueBB, label %bb2 bb2: %1 = icmp ne i32 %b, 0 br i1 %1, label %TrueBB, label %FalseBB This optimization is already performed by SelectionDAG, but not by FastISel. FastISel cannot perform this optimization, because it cannot generate new MachineBasicBlocks. Performing this optimization at CodeGenPrepare time makes it available to both - SelectionDAG and FastISel - and the implementation in SelectiuonDAG could be removed. There are currenty a few differences in codegen for X86 and PPC, so this commmit only enables it for FastISel. Reviewed by Jim Grosbach This fixes rdar://problem/19034919. llvm-svn: 223786	2014-12-09 16:36:13 +00:00
Ahmed Bougacha	59bdb1e517	[CodeGenPrepare] Use variables for reused values. NFC. llvm-svn: 223491	2014-12-05 18:04:40 +00:00
David Blaikie	60e6c80905	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
Quentin Colombet	9239f4dcef	[CodeGenPrepare] Handle zero extensions in the TypePromotionHelper. Prior to this patch the TypePromotionHelper was promoting only sign extensions. Supporting zero extensions changes: - How constants are extended. - How sign extensions, zero extensions, and truncate are composed together. - How the type of the extended operation is recorded. Now we need to know the kind of the extension as well as its type. Each change is fairly small, unlike the diff. Most of the diff are comments/variable renaming to say "extension" instead of "sign extension". The performance improvements on the test suite are within the noise. Related to <rdar://problem/18310086>. llvm-svn: 221851	2014-11-13 01:44:51 +00:00
Ahmed Bougacha	d50e1d8236	[CodeGenPrepare] Replace other uses of EVT::getEVT with TL::getValueType. r221820 fixed a problem (PR21548) where an iPTR was used in TLI legality checks, which isn't valid and resulted in a failed assertion. The solution was to lower pointer types into the correct target's VT, by using TL::getValueType instead of EVT::getEVT. This commit changes 3 other uses of EVT::getEVT, but without any tests: - One of these non-lowered EVTs is passed to allowsMisalignedMemoryAccesses, which goes into target's TL implementation and doesn't cause any problem (yet.) - Two others are passed to TLI.isOperationLegalOrCustom: - one only looks at extensions, so doesn't concern pointers. - one only looks at binary operators, so also isn't a problem. The latter might some day be exposed to pointers and cause the same assert as the original PR, because there's a comment hinting at also supporting cast ops. For consistency, update all of them and be done with it. llvm-svn: 221827	2014-11-12 23:05:03 +00:00
Ahmed Bougacha	6f5323f5b8	[CodeGenPrepare][AArch64] Fix a TLI legality check on iPTR to use a lowered instead. Fixes PR21548. Related to PR20474. llvm-svn: 221820	2014-11-12 22:16:55 +00:00
Quentin Colombet	06167df4ad	[CodeGenPrepare] Move extractelement close to store if they can be combined. This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible. Context Some targets use different register files for both vector and scalar operations. This means that transitioning from one domain to another may incur copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles. Motivating Example Let us consider an example: define void @foo(<2 x i32>* %addr1, i32* %dest) { %in1 = load <2 x i32>* %addr1, align 8 %extract = extractelement <2 x i32> %in1, i32 1 %out = or i32 %extract, 1 store i32 %out, i32* %dest, align 4 ret void } As it is, this IR generates the following assembly on armv7: vldr d16, [r0] @vector load vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles orr r0, r0, #1 @ scalar bitwise or str r0, [r1] @ scalar store bx lr Whereas we could generate much faster code: vldr d16, [r0] @ vector load vorr.i32 d16, #0x1 @ vector bitwise or vst1.32 {d16[1]}, [r1:32] @ vector extract + store bx lr Half of the computation made in the vector is useless, but this allows to get rid of the expensive cross-register-file copy. Proposed Solution To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends by a store and the target is able to combine an extract with a store. Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted as some point[1]. Moreover, this is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance). The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of details and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counter balanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles. For now, the optimization is just enabled for ARM prior than v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that. [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem. Differential Revision: http://reviews.llvm.org/D5921 <rdar://problem/14170854> llvm-svn: 220978	2014-10-31 17:52:53 +00:00
NAKAMURA Takumi	ee3b3d3d09	Whitespace. llvm-svn: 220857	2014-10-29 15:23:11 +00:00
Quentin Colombet	324286d9e6	[CodeGenPrepare][AddressingModeMatcher] The promotion mechanism was expecting instructions when truncate, sext, or zext were created. Fix that. llvm-svn: 217926	2014-09-16 22:36:07 +00:00
Quentin Colombet	798b42868c	[CodeGenPrepare][AddressingModeMatcher] Fix a think-o for the sext(zext) -> zext promotion introduced in r217629. We were returning the old sext instead of the new zext as the promoted instruction! Thanks Joerg Sonnenberger for the test case. llvm-svn: 217800	2014-09-15 18:26:58 +00:00
Quentin Colombet	74d3a27ed8	[CodeGenPrepare] Teach the addressing mode matcher how to promote zext. I.e., teach it about 'sext (zext a to ty) to ty2' => zext a to ty2. llvm-svn: 217629	2014-09-11 21:22:14 +00:00
Craig Topper	65775cc03d	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 216158	2014-08-21 05:55:13 +00:00
Craig Topper	aa7422b5a6	Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size." Getting a weird buildbot failure that I need to investigate. llvm-svn: 215870	2014-08-18 00:24:38 +00:00
Craig Topper	227456e133	Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. llvm-svn: 215868	2014-08-17 23:47:00 +00:00
Eric Christopher	99307e99a2	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Tim Northover	2329ee9f83	CodeGenPrep: fall back to MVT::Other if instruction's type isn't an EVT. The test being performed is just an approximation anyway, so it really shouldn't crash when things don't go entirely as expected. Should fix PR20474. llvm-svn: 214177	2014-07-29 10:20:22 +00:00
Duncan P. N. Exon Smith	2ae51d315c	Revert "[C++11] Add predecessors(BasicBlock ) / successors(BasicBlock ) iterator ranges." This reverts commit r213474 (and r213475), which causes a miscompile on a stage2 LTO build. I'll reply on the list in a moment. llvm-svn: 213562	2014-07-21 17:06:51 +00:00
Manuel Jacob	8e924ddc40	[C++11] Add predecessors(BasicBlock ) / successors(BasicBlock ) iterator ranges. Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended. Test Plan: All tests (make check-all) still pass. Reviewers: dblaikie Reviewed By: dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4481 llvm-svn: 213474	2014-07-20 09:10:11 +00:00
Sanjay Patel	5537765676	Fixed formatting, removed bug reference, renamed testcase Thanks to Duncan Exon Smith for reviewing and cleanup suggestions. llvm-svn: 213205	2014-07-16 22:40:28 +00:00
Sanjay Patel	8befe236c4	trivial fix for PR20314 Make sure that the AddrInst is an Instruction. llvm-svn: 213197	2014-07-16 21:08:10 +00:00
Jiangning Liu	a490614c0a	Create macro INITIALIZE_TM_PASS. Pass initialization requires to initialize TargetMachine for back-end specific passes. This commit creates a new macro INITIALIZE_TM_PASS to simplify this kind of initialization. llvm-svn: 210641	2014-06-11 07:04:37 +00:00
Richard Trieu	2252d26c83	Remove use of comma operator. llvm-svn: 209871	2014-05-30 03:15:17 +00:00
Eli Bendersky	80ec3983a5	Similar to bitcast, treat addrspacecast as a foldable operand. Added a test sink-addrspacecast.ll to verify this change. Patch by Jingyue Wu. llvm-svn: 209343	2014-05-22 00:02:52 +00:00
Louis Gerbarg	c59e699789	Add missing line breaks to debug output in CodeGenPrepare llvm-svn: 208731	2014-05-13 21:54:22 +00:00
Joey Gouly	c0f94cb136	[CGP] r205941 changed the logic, so that a cast happens before 'Result' is compared to 'AddrMode.BaseReg'. In the case that 'AddrMode.BaseReg' is nullptr, 'Result' will also be nullptr, so the cast causes an assertion. We should use dyn_cast_or_null here to check 'Result' is not null and it is an instruction. Bug found by Mats Petersson, and I reduced his IR to get a test case. llvm-svn: 208705	2014-05-13 15:42:45 +00:00
Benjamin Kramer	171a3310a4	Make helper functions static. llvm-svn: 207359	2014-04-27 14:54:59 +00:00
Michael Zolotukhin	1c3340dd78	Revert r206749 till a final decision about the intrinsics is made. llvm-svn: 207313	2014-04-26 09:56:41 +00:00
Adrian Prantl	7566e72bb8	This reapplies r207235 with an additional bugfixes caught by the msan buildbot - do not insert debug intrinsics before phi nodes. Debug info for optimized code: Support variables that are on the stack and described by DBG_VALUEs during their lifetime. Previously, when a variable was at a FrameIndex for any part of its lifetime, this would shadow all other DBG_VALUEs and only a single fbreg location would be emitted, which in fact is only valid for a small range and not the entire lexical scope of the variable. The included dbg-value-const-byref testcase demonstrates this. This patch fixes this by Local - emitting dbg.value intrinsics for allocas that are passed by reference - dropping all dbg.declares (they are now fully lowered to dbg.values) SelectionDAG - renamed constructors for SDDbgValue for better readability. - fix UserValue::match() to handle indirect values correctly - not inserting an MMI table entries for dbg.values that describe allocas. - lowering dbg.values that describe allocas into indirect DBG_VALUEs. CodeGenPrepare - leaving dbg.values for an alloca were they are (see comment) Other - regenerated/updated instcombine.ll testcase and included source rdar://problem/16679879 http://reviews.llvm.org/D3374 llvm-svn: 207269	2014-04-25 20:49:25 +00:00
Adrian Prantl	319db7c542	Revert "This reapplies r207130 with an additional testcase+and a missing check for" This reverts commit 207235 to investigate msan buildbot breakage. llvm-svn: 207250	2014-04-25 18:18:09 +00:00
Adrian Prantl	7f9d1e9fd6	This reapplies r207130 with an additional testcase+and a missing check for AllocaInst that was missing in one location. Debug info for optimized code: Support variables that are on the stack and described by DBG_VALUEs during their lifetime. Previously, when a variable was at a FrameIndex for any part of its lifetime, this would shadow all other DBG_VALUEs and only a single fbreg location would be emitted, which in fact is only valid for a small range and not the entire lexical scope of the variable. The included dbg-value-const-byref testcase demonstrates this. This patch fixes this by Local - emitting dbg.value intrinsics for allocas that are passed by reference - dropping all dbg.declares (they are now fully lowered to dbg.values) SelectionDAG - renamed constructors for SDDbgValue for better readability. - fix UserValue::match() to handle indirect values correctly - not inserting an MMI table entries for dbg.values that describe allocas. - lowering dbg.values that describe allocas into indirect DBG_VALUEs. CodeGenPrepare - leaving dbg.values for an alloca were they are (see comment) Other - regenerated/updated instcombine.ll testcase and included source rdar://problem/16679879 http://reviews.llvm.org/D3374 llvm-svn: 207235	2014-04-25 17:01:00 +00:00
Adrian Prantl	0338f80f17	Revert "This reapplies r207130 with an additional testcase+and a missing check for" Typo in testcase. llvm-svn: 207166	2014-04-25 00:42:50 +00:00
Adrian Prantl	bf019d19e9	This reapplies r207130 with an additional testcase+and a missing check for AllocaInst that was missing in one location. Debug info for optimized code: Support variables that are on the stack and described by DBG_VALUEs during their lifetime. Previously, when a variable was at a FrameIndex for any part of its lifetime, this would shadow all other DBG_VALUEs and only a single fbreg location would be emitted, which in fact is only valid for a small range and not the entire lexical scope of the variable. The included dbg-value-const-byref testcase demonstrates this. This patch fixes this by Local - emitting dbg.value intrinsics for allocas that are passed by reference - dropping all dbg.declares (they are now fully lowered to dbg.values) SelectionDAG - renamed constructors for SDDbgValue for better readability. - fix UserValue::match() to handle indirect values correctly - not inserting an MMI table entries for dbg.values that describe allocas. - lowering dbg.values that describe allocas into indirect DBG_VALUEs. CodeGenPrepare - leaving dbg.values for an alloca were they are (see comment) Other - regenerated/updated instcombine.ll testcase and included source rdar://problem/16679879 http://reviews.llvm.org/D3374 llvm-svn: 207165	2014-04-25 00:38:40 +00:00
Adrian Prantl	0b669e8f79	Revert "Debug info for optimized code: Support variables that are on the stack and" This reverts commit 207130 for buildbot breakage. llvm-svn: 207162	2014-04-25 00:04:49 +00:00
Adrian Prantl	807e5d8a9a	Debug info for optimized code: Support variables that are on the stack and described by DBG_VALUEs during their lifetime. Previously, when a variable was at a FrameIndex for any part of its lifetime, this would shadow all other DBG_VALUEs and only a single fbreg location would be emitted, which in fact is only valid for a small range and not the entire lexical scope of the variable. The included dbg-value-const-byref testcase demonstrates this. This patch fixes this by Local - emitting dbg.value intrinsics for allocas that are passed by reference - dropping all dbg.declares (they are now fully lowered to dbg.values) SelectionDAG - renamed constructors for SDDbgValue for better readability. - fix UserValue::match() to handle indirect values correctly - not inserting an MMI table entries for dbg.values that describe allocas. - lowering dbg.values that describe allocas into indirect DBG_VALUEs. CodeGenPrepare - leaving dbg.values for an alloca were they are (see comment) Other - regenerated/updated instcombine-intrinsics testcase and included source rdar://problem/16679879 http://reviews.llvm.org/D3374 llvm-svn: 207130	2014-04-24 17:41:45 +00:00
Chandler Carruth	2361db41db	[Modules] Remove potential ODR violations by sinking the DEBUG_TYPE define below all header includes in the lib/CodeGen/... tree. While the current modules implementation doesn't check for this kind of ODR violation yet, it is likely to grow support for it in the future. It also removes one layer of macro pollution across all the included headers. Other sub-trees will follow. llvm-svn: 206837	2014-04-22 02:02:50 +00:00

1 2

76 Commits