llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Evgeny Astigeevich	a7fa7ac1b2	[TargetTransformInfo, API] Add a list of operands to TTI::getUserCost The changes are a result of discussion of https://reviews.llvm.org/D33685. It solves the following problem: 1. We can inform getGEPCost about simplified indices to help it with calculating the cost. But getGEPCost does not take into account the context which GEPs are used in. 2. We have getUserCost which can take the context into account but we cannot inform about simplified indices. With the changes getUserCost will have access to additional information as getGEPCost has. The one parameter getUserCost is also provided. Differential Revision: https://reviews.llvm.org/D34057 llvm-svn: 306674	2017-06-29 13:42:12 +00:00
Geoff Berry	9c1c467f23	[LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI. Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper Subscribers: jholewinski, arsenm, mzolotukhin, nemanjai, nhaehnle, javed.absar, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D34531 llvm-svn: 306554	2017-06-28 15:53:17 +00:00
Alexander Timofeev	ca60194f1e	DivergencyAnalysis patch for review llvm-svn: 305494	2017-06-15 19:33:10 +00:00
Daniel Neilson	1fd6840870	Const correctness for TTI::getRegisterBitWidth Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation. Reviewers: chandlerc, rnk, reames Reviewed By: reames Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D33903 llvm-svn: 305189	2017-06-12 14:22:21 +00:00
Anna Thomas	6cfdd1fe30	[Atomics][LoopIdiom] Recognize unordered atomic memcpy Summary: Expanding the loop idiom test for memcpy to also recognize unordered atomic memcpy. The only difference for recognizing an unordered atomic memcpy and instead of a normal memcpy is that the loads and/or stores involved are unordered atomic operations. Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html Patch by Daniel Neilson! Reviewers: reames, anna, skatkov Reviewed By: reames, anna Subscribers: llvm-commits, mzolotukhin Differential Revision: https://reviews.llvm.org/D33243 llvm-svn: 304806	2017-06-06 16:45:25 +00:00
Chandler Carruth	eb66b33867	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Evgeny Stupachenko	5e8ec36407	Fix PR23384 (part 2 of 3) NFC Summary: The patch moves LSR cost comparison to target part. Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30561 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304750	2017-06-05 23:37:00 +00:00
Zaara Syeda	39139cb634	[PPC] Inline expansion of memcmp This patch does an inline expansion of memcmp. It changes the memcmp library call into an inline expansion when the size is known at compile time and is under a target specified threshold. This expansion is implemented in CodeGenPrepare and expands into straight line code. The target specifies a maximum load size and the expansion works by using this size to load the two sources, compare, and exit early if a difference is found. It also has a special case when the memcmp result is used in a compare to zero equality. Differential Revision: https://reviews.llvm.org/D28637 llvm-svn: 304313	2017-05-31 17:12:38 +00:00
Jonas Paulsson	0437145b47	[LoopVectorizer] Let target prefer scalar addressing computations. The loop vectorizer usually vectorizes any instruction it can and then extracts the elements for a scalarized use. On SystemZ, all elements containing addresses must be extracted into address registers (GRs). Since this extraction is not free, it is better to have the address in a suitable register to begin with. By forcing address arithmetic instructions and loads of addresses to be scalar after vectorization, two benefits result: * No need to extract the register * LSR optimizations trigger (LSR isn't handling vector addresses currently) Benchmarking show improvements on SystemZ with this new behaviour. Any other target could try this by returning false in the new hook prefersVectorizedAddressing(). Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand https://reviews.llvm.org/D32422 llvm-svn: 303744	2017-05-24 13:42:56 +00:00
Adam Nemet	f9607f0660	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Amara Emerson	668fbd4cf5	Add a late IR expansion pass for the experimental reduction intrinsics. This pass uses a new target hook to decide whether or not to expand a particular intrinsic to the shuffevector sequence. Differential Revision: https://reviews.llvm.org/D32245 llvm-svn: 302631	2017-05-10 09:42:49 +00:00
Amara Emerson	59ff6c8c60	Introduce experimental generic intrinsics for horizontal vector reductions. - This change allows targets to opt-in to using them instead of the log2 shufflevector algorithm. - The SLP and Loop vectorizers have the common code to do shuffle reductions factored out into LoopUtils, and now have a unified interface for generating reductions regardless of the preference of the target. LoopUtils now uses TTI to determine what kind of reductions the target wants to handle. - For CodeGen, basic legalization support is added. Differential Revision: https://reviews.llvm.org/D30086 llvm-svn: 302514	2017-05-09 10:43:25 +00:00
Jun Bum Lim	93f61e6588	[InlineCost] Improve the cost heuristic for Switch Summary: The motivation example is like below which has 13 cases but only 2 distinct targets ``` lor.lhs.false2: ; preds = %if.then switch i32 %Status, label %if.then27 [ i32 -7012, label %if.end35 i32 -10008, label %if.end35 i32 -10016, label %if.end35 i32 15000, label %if.end35 i32 14013, label %if.end35 i32 10114, label %if.end35 i32 10107, label %if.end35 i32 10105, label %if.end35 i32 10013, label %if.end35 i32 10011, label %if.end35 i32 7008, label %if.end35 i32 7007, label %if.end35 i32 5002, label %if.end35 ] ``` which is compiled into a balanced binary tree like this on AArch64 (similar on X86) ``` .LBB853_9: // %lor.lhs.false2 mov w8, #10012 cmp w19, w8 b.gt .LBB853_14 // BB#10: // %lor.lhs.false2 mov w8, #5001 cmp w19, w8 b.gt .LBB853_18 // BB#11: // %lor.lhs.false2 mov w8, #-10016 cmp w19, w8 b.eq .LBB853_23 // BB#12: // %lor.lhs.false2 mov w8, #-10008 cmp w19, w8 b.eq .LBB853_23 // BB#13: // %lor.lhs.false2 mov w8, #-7012 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_14: // %lor.lhs.false2 mov w8, #14012 cmp w19, w8 b.gt .LBB853_21 // BB#15: // %lor.lhs.false2 mov w8, #-10105 add w8, w19, w8 cmp w8, #9 // =9 b.hi .LBB853_17 // BB#16: // %lor.lhs.false2 orr w9, wzr, #0x1 lsl w8, w9, w8 mov w9, #517 and w8, w8, w9 cbnz w8, .LBB853_23 .LBB853_17: // %lor.lhs.false2 mov w8, #10013 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_18: // %lor.lhs.false2 mov w8, #-7007 add w8, w19, w8 cmp w8, #2 // =2 b.lo .LBB853_23 // BB#19: // %lor.lhs.false2 mov w8, #5002 cmp w19, w8 b.eq .LBB853_23 // BB#20: // %lor.lhs.false2 mov w8, #10011 cmp w19, w8 b.eq .LBB853_23 b .LBB853_3 .LBB853_21: // %lor.lhs.false2 mov w8, #14013 cmp w19, w8 b.eq .LBB853_23 // BB#22: // %lor.lhs.false2 mov w8, #15000 cmp w19, w8 b.ne .LBB853_3 ``` However, the inline cost model estimates the cost to be linear with the number of distinct targets and the cost of the above switch is just 2 InstrCosts. The function containing this switch is then inlined about 900 times. This change use the general way of switch lowering for the inline heuristic. It etimate the number of case clusters with the suitability check for a jump table or bit test. Considering the binary search tree built for the clusters, this change modifies the model to be linear with the size of the balanced binary tree. The model is off by default for now : -inline-generic-switch-cost=false This change was originally proposed by Haicheng in D29870. Reviewers: hans, bmakam, chandlerc, eraman, haicheng, mcrosier Reviewed By: hans Subscribers: joerg, aemerson, llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D31085 llvm-svn: 301649	2017-04-28 16:04:03 +00:00
Jonas Paulsson	b15d643e4a	[LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore() Since SystemZ supports vector element load/store instructions, there is no need for extracts/inserts if a vector load/store gets scalarized. This patch lets Target specify that it supports such instructions by means of a new TTI hook that defaults to false. The use for this is in the LoopVectorizer getScalarizationOverhead() method, which will with this patch produce a smaller sum for a vector load/store on SystemZ. New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll Review: Adam Nemet https://reviews.llvm.org/D30680 llvm-svn: 300056	2017-04-12 12:41:37 +00:00
Jonas Paulsson	90b172efa0	[SystemZ] TargetTransformInfo cost functions implemented. getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(), getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(), getInterleavedMemoryOpCost() implemented. Interleaved access vectorization enabled. BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads, in which case the cost of the z/sext instruction becomes 0. Review: Ulrich Weigand, Renato Golin. https://reviews.llvm.org/D29631 llvm-svn: 300052	2017-04-12 11:49:08 +00:00
Jun Bum Lim	2666864c81	[CodeGenPrep] move aarch64-type-promotion to CGP Summary: Move the aarch64-type-promotion pass within the existing type promotion framework in CGP. This change also support forking sexts when a new sext is required for promotion. Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853. Reviewers: jmolloy, mcrosier, javed.absar, qcolombet Reviewed By: qcolombet Subscribers: llvm-commits, aemerson, rengolin, mcrosier Differential Revision: https://reviews.llvm.org/D28680 llvm-svn: 299379	2017-04-03 19:20:07 +00:00
Jonas Paulsson	42e7a2d74b	[TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands. Tests updates: Analysis/CostModel/X86/arith-fp.ll Transforms/LoopVectorize/AArch64/interleaved_cost.ll Transforms/LoopVectorize/ARM/interleaved_cost.ll The improvement in getOperandsScalarizationOverhead() to differentiate on constants made it necessary to update the interleaved_cost.ll tests even though they do not relate to intrinsics. Review: Hal Finkel https://reviews.llvm.org/D29540 llvm-svn: 297705	2017-03-14 06:35:36 +00:00
Matt Arsenault	dd128e79e5	NVPTX: Refactor NVPTXInferAddressSpaces to check TTI Add a new TTI hook for getting the generic address space value. llvm-svn: 293563	2017-01-30 23:02:12 +00:00
Jonas Paulsson	1dc6fdc89f	[TargetTransformInfo] Refactor and improve getScalarizationOverhead() Refactoring to remove duplications of this method. New method getOperandsScalarizationOverhead() that looks at the present unique operands and add extract costs for them. Old behaviour was to just add extract costs for one operand of the type always, which still happens in getArithmeticInstrCost() if no operands are provided by the caller. This is a good start of improving on this, but there are more places that can be improved by using getOperandsScalarizationOverhead(). Review: Hal Finkel https://reviews.llvm.org/D29017 llvm-svn: 293155	2017-01-26 07:03:25 +00:00
Mohammed Agabaria	df301aa885	[X86] updating TTI costs for arithmetic instructions on X86\SLM arch. updated instructions: pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd. special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. In case if the real operands bitwidth <= 16. Differential Revision: https://reviews.llvm.org/D28104 llvm-svn: 291657	2017-01-11 08:23:37 +00:00
Mohammed Agabaria	caef091029	Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling. This code seems to be target dependent which may not be the same for all targets. Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'. Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general. Differential Revision: https://reviews.llvm.org/D27518 llvm-svn: 291106	2017-01-05 14:03:41 +00:00
Haicheng Wu	c157a2a55a	[TTI/CostModel] Correct the way getGEPCost() calls isLegalAddressingMode() Fix a bug when we call isLegalAddressingMode() from getGEPCost(). Differential Revision: https://reviews.llvm.org/D27357 llvm-svn: 288569	2016-12-03 01:57:24 +00:00
Peter Collingbourne	bc87b9fd38	IR: Change the gep_type_iterator API to avoid always exposing the "current" type. Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458	2016-12-02 02:24:42 +00:00
Benjamin Kramer	c2de5980d3	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721	2016-10-20 12:20:28 +00:00
Oliver Stannard	7be9f2d236	[ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI With the ROPI and RWPI relocation models we can't always have pointers to global data or functions in constant data, so don't try to convert switches into lookup tables if any value in the lookup table would require a relocation. We can still safely emit lookup tables of other values, such as simple constants. Differential Revision: https://reviews.llvm.org/D24462 llvm-svn: 283530	2016-10-07 08:48:24 +00:00
Volkan Keles	2c3720a7dd	Add new target hooks for LoadStoreVectorizer Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D24727 llvm-svn: 283099	2016-10-03 10:31:34 +00:00
Jonas Paulsson	6ae041f182	[LoopStrenghtReduce] Refactoring and addition of a new target cost function. Refactored so that a LSRUse owns its fixups, as oppsed to letting the LSRInstance own them. This makes it easier to rate formulas for LSRUses, since the fixups are available directly. The Offsets vector has been removed since it was no longer necessary. New target hook isFoldableMemAccessOffset(), which is used during formula rating. For SystemZ, this is useful to express that loads and stores with float or vector types with a big/negative offset should be avoided in loops. Without this, LSR will generate a lot of negative offsets that would require extra instructions for loading the address. Updated tests: test/CodeGen/SystemZ/loop-01.ll Reviewed by: Quentin Colombet and Ulrich Weigand. https://reviews.llvm.org/D19152 llvm-svn: 278927	2016-08-17 13:24:19 +00:00
Charles Davis	2ca27b7279	Revert "[X86] Support the "ms-hotpatch" attribute." This reverts commit r278048. Something changed between the last time I built this--it takes awhile on my ridiculously slow and ancient computer--and now that broke this. llvm-svn: 278053	2016-08-08 21:20:15 +00:00
Charles Davis	24439f8d33	[X86] Support the "ms-hotpatch" attribute. Summary: Based on two patches by Michael Mueller. This is a target attribute that causes a function marked with it to be emitted as "hotpatchable". This particular mechanism was originally devised by Microsoft for patching their binaries (which they are constantly updating to stay ahead of crackers, script kiddies, and other ne'er-do-wells on the Internet), but is now commonly abused by Windows programs to hook API functions. This mechanism is target-specific. For x86, a two-byte no-op instruction is emitted at the function's entry point; the entry point must be immediately preceded by 64 (32-bit) or 128 (64-bit) bytes of padding. This padding is where the patch code is written. The two byte no-op is then overwritten with a short jump into this code. The no-op is usually a `movl %edi, %edi` instruction; this is used as a magic value indicating that this is a hotpatchable function. Reviewers: majnemer, sanjoy, rnk Subscribers: dberris, llvm-commits Differential Revision: https://reviews.llvm.org/D19908 llvm-svn: 278048	2016-08-08 21:01:39 +00:00
Alina Sbirlea	6b31c219ab	LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments. Summary: TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses. A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted. Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures: - x86 failing tests either did not have the right target, or the right alignment. - NVPTX failing tests did not have the right alignment. - AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks for a maximum size allowed and relies on the next condition checking for %4 for correctness. This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list). Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure), so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context. Reviewers: arsenm, jlebar, tstellarAMD Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits Differential Revision: https://reviews.llvm.org/D23068 llvm-svn: 277735	2016-08-04 16:38:44 +00:00
David Majnemer	466bebafa0	[coroutines] Part 2 of N: Adding Coroutine Intrinsics This is the second patch in the coroutine series. It adds coroutine intrinsics and updates intrinsic cost in TargetTransformInfoImpl.h. Patch by Gor Nishanov! Differential Revision: https://reviews.llvm.org/D22659 llvm-svn: 276839	2016-07-27 05:12:35 +00:00
Sjoerd Meijer	b088b3e65b	This implements a more optimal algorithm for selecting a base constant in constant hoisting. It not only takes into account the number of uses and the cost of expressions in which constants appear, but now also the resulting integer range of the offsets. Thus, the algorithm maximizes the number of uses within an integer range that will enable more efficient code generation. On ARM, for example, this will enable code size optimisations because less negative offsets will be created. Negative offsets/immediates are not supported by Thumb1 thus preventing more compact instruction encoding. Differential Revision: http://reviews.llvm.org/D21183 llvm-svn: 275382	2016-07-14 07:44:20 +00:00
Alina Sbirlea	7abab61a0e	Add TLI.allowsMisalignedMemoryAccesses to LoadStoreVectorizer Summary: Extend TTI to access TLI.allowsMisalignedMemoryAccesses(). Check condition when vectorizing load and store chains. Add additional parameters: AddressSpace, Alignment, Fast. Reviewers: llvm-commits, jlebar Subscribers: arsenm, mzolotukhin Differential Revision: http://reviews.llvm.org/D21935 llvm-svn: 275100	2016-07-11 20:46:17 +00:00
Jingyue Wu	b8fbccd103	[TTI] Expose TTI::getGEPCost and use it in SLSR and NaryReassociate. NFC. llvm-svn: 274940	2016-07-08 21:48:05 +00:00
Matt Arsenault	442b29f8c4	TTI: Add hook for memory width to vectorize llvm-svn: 272964	2016-06-16 21:43:12 +00:00
Matthew Simpson	995eceaf0c	[TTI] Add hook for vector extract with extension This change adds a new hook for estimating the cost of vector extracts followed by zero- and sign-extensions. The motivating example for this change is the SMOV and UMOV instructions on AArch64. These instructions move data from vector to general purpose registers while performing the corresponding extension (sign-extend for SMOV and zero-extend for UMOV) at the same time. For these operations, TargetTransformInfo can assume the extensions are free and only report the cost of the vector extract. The SLP vectorizer has been updated to make use of the new hook. Differential Revision: http://reviews.llvm.org/D18523 llvm-svn: 267725	2016-04-27 15:20:21 +00:00
Justin Lebar	5824577459	[TTI] Add getInliningThresholdMultiplier. Summary: InlineCost's threshold is multiplied by this value. This lets us adjust the inlining threshold up or down on a per-target basis. For example, we might want to increase the threshold on targets where calls are unusually expensive. Reviewers: chandlerc Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18560 llvm-svn: 266405	2016-04-15 01:38:48 +00:00
Renato Golin	bf4b959200	[ARM] Adding IEEE-754 SIMD detection to loop vectorizer Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin). For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags. This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normal's flag on both communitues and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP. Fixes PR16275. llvm-svn: 266363	2016-04-14 20:42:18 +00:00
David Majnemer	d2ed420815	[CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only one of the inputs is NaN: -NUM will return the non-NaN argument while -NAN would return NaN. It is desirable to lower to @llvm.{min,max}num to -NAN if they don't have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the -NAN semantics. N.B. Of course, it is only safe to do this if the intrinsic call is marked nnan. llvm-svn: 266279	2016-04-14 07:13:24 +00:00
Adam Nemet	23548e39cd	[LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead Summary: It can hurt performance to prefetch ahead too much. Be conservative for now and don't prefetch ahead more than 3 iterations on Cyclone. Reviewers: hfinkel Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17949 llvm-svn: 263772	2016-03-18 00:27:43 +00:00
Adam Nemet	8d7ad736df	[LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses Summary: And use this TTI for Cyclone. As it was explained in the original RFC (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW prefetcher work up to 2KB strides. I am also adding tests for this and the previous change (D17943): * Cyclone prefetching accesses with a large stride * Cyclone not prefetching accesses with a small stride * Generic Aarch64 subtarget not prefetching either Reviewers: hfinkel Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D17945 llvm-svn: 263771	2016-03-18 00:27:38 +00:00
Adam Nemet	b70814180d	[TTI] Add getPrefetchDistance from PPCLoopDataPrefetch, NFC This patch is part of the work to make PPCLoopDataPrefetch target-independent (http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758). As it was discussed in the above thread, getPrefetchDistance is currently using instruction count which may change in the future. llvm-svn: 258995	2016-01-27 22:21:25 +00:00
Eduard Burtescu	636d36b9c9	[opaque pointer types] [NFC] gep_type_{begin,end} now take source element type and address space. Reviewers: mjacob, dblaikie Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D16436 llvm-svn: 258474	2016-01-22 01:33:43 +00:00
Adam Nemet	2628dc52d2	[TTI] Add getCacheLineSize Summary: And use it in PPCLoopDataPrefetch.cpp. @hfinkel, please let me know if your preference would be to preserve the ppc-loop-prefetch-cache-line option in order to be able to override the value of TTI::getCacheLineSize for PPC. Reviewers: hfinkel Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D16306 llvm-svn: 258419	2016-01-21 18:28:36 +00:00
Elena Demikhovsky	3ed0b3c7f1	Implemented cost model for masked gather and scatter operations The cost is calculated for all X86 targets. When gather/scatter instruction is not supported we calculate the cost of scalar sequence. Differential revision: http://reviews.llvm.org/D15677 llvm-svn: 256519	2015-12-28 20:10:59 +00:00
Manuel Jacob	3a4569b878	Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics. Summary: These were deprecated 11 months ago when a generic llvm.experimental.gc.result intrinsic, which works for all types, was added. Reviewers: sanjoy, reames Subscribers: sanjoy, chenli, llvm-commits Differential Revision: http://reviews.llvm.org/D15719 llvm-svn: 256262	2015-12-22 18:44:45 +00:00
Elena Demikhovsky	ce52715df6	Fixed a failure in cost calculation for vector GEP Cost calculation for vector GEP failed with due to invalid cast to GEP index operand. The bug is fixed, added a test. http://reviews.llvm.org/D14976 llvm-svn: 254408	2015-12-01 12:08:36 +00:00
Sanjoy Das	3a214a28fe	Revert r243347 "Add TargetTransformInfo::isZExtFree." r243347 was intended to support a change to LSR (r243348). That change to LSR has since had to be reverted (r243939) because it was buggy, and now the code added in r243347 is untested and unexercised. Given that, I think it is appropriate to revert r243347 for now, with the intent of adding it back in later if I get around to checking in a fixed version of r243348. llvm-svn: 252948	2015-11-12 20:51:52 +00:00
Elena Demikhovsky	b4a97e3e11	Scalarizer for masked.gather and masked.scatter intrinsics. When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations. If the mask is not constant, the scalarizer will build a chain of conditional basic blocks. I added isLegalMaskedGather() isLegalMaskedScatter() APIs. Differential Revision: http://reviews.llvm.org/D13722 llvm-svn: 251237	2015-10-25 15:37:55 +00:00
Elena Demikhovsky	2e0208e770	Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore(). Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case. Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces. Differential Revision: http://reviews.llvm.org/D13850 llvm-svn: 250686	2015-10-19 07:43:38 +00:00

1 2

76 Commits