llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00

Author	SHA1	Message	Date
Raphael Isemann	3598ed7aee	Fix modules build by adding missing includes	2019-11-19 14:37:18 +01:00
Francesco Petrogalli	0a74d2ec75	[SVFS] Inject TLI Mappings in VFABI attribute. This patch introduces a function pass to inject the scalar-to-vector mappings stored in the TargetLIbraryInfo (TLI) into the Vector Function ABI (VFABI) variants attribute. The test is testing the injection for three vector libraries supported by the TLI (Accelerate, SVML, MASSV). The pass does not change any of the analysis associated to the function. Differential Revision: https://reviews.llvm.org/D70107	2019-11-15 18:42:56 +00:00
Reid Kleckner	b3a7316049	Add missing includes needed to prune LLVMContext.h include, NFC These are a pre-requisite to removing #include "llvm/Support/Options.h" from LLVMContext.h: https://reviews.llvm.org/D70280	2019-11-14 15:23:15 -08:00
Reid Kleckner	68092989f3	Sink all InitializePasses.h includes This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211	2019-11-13 16:34:37 -08:00
Francesco Petrogalli	d054b2dd05	[VFABI] Add LLVM internal mangling for vector functions. Summary: This patch adds a custom ISA for vector functions for internal use in LLVM. The <isa> token is set to "_LLVM_", and it is not attached to any specific instruction Vector ISA, or Vector Function ABI. The ISA is used as a token for handling Vector Function ABI-style vectorization for those vector functions that are not directly associated to any existing Vector Function ABI (for example, some of the vector functions exposed by TargetLibraryInfo). The demangling function for this ISA in a Vector Function ABI context is set to be the same as the common one shared between X86 and AArch64. Reviewers: jdoerfert, sdesmalen, simoll Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70089	2019-11-13 03:26:39 +00:00
Alina Sbirlea	28990514f5	[GlobalsAA] Restrict ModRef result if any internal method has its address taken. Summary: If there are any internal methods whose address was taken, conclude there is nothing known in relation of any other internal method and a global. Reviewers: nlopes, sanjoy.google Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69690	2019-11-12 14:24:56 -08:00
Francesco Petrogalli	cbc5510b51	[VFABI] Read/Write functions for the VFABI attribute. The attribute is stored at the `FunctionIndex` attribute set, with the name "vector-function-abi-variant". The get/set methods of the attribute have assertion to verify that: 1. Each name in the attribute is a valid VFABI mangled name. 2. Each name in the attribute correspond to a function declared in the module. Differential Revision: https://reviews.llvm.org/D69976	2019-11-12 03:40:42 +00:00
aqjune	f2a9453b21	Add InstCombine/InstructionSimplify support for Freeze Instruction Summary: - Add llvm::SimplifyFreezeInst - Add InstCombiner::visitFreeze - Add llvm tests Reviewers: majnemer, sanjoy, reames, lebedev.ri, spatel Reviewed By: reames, lebedev.ri Subscribers: reames, lebedev.ri, filcab, regehr, trentxintong, llvm-commits Differential Revision: https://reviews.llvm.org/D29013	2019-11-12 12:13:26 +09:00
Gil Rapaport	baa1dd5bfc	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) This recommits 11ed1c0239fd51fd2f064311dc7725277ed0a994 (reverted in 9f08ce0d2197d4f163dfa4633eae2347ce8fc881 for failing an assert) with a fix: tryToWidenMemory() now first checks if the widening decision is to interleave, thus maintaining previous behavior where tryToInterleaveMemory() was called first, giving priority to interleave decisions over widening/scalarization. This commit adds the test case that exposed this bug as a LIT.	2019-11-09 20:52:25 +02:00
bmahjour	d3fb929e1b	[DDG] Data Dependence Graph - Pi Block Summary: This patch adds Pi Blocks to the DDG. A pi-block represents a group of DDG nodes that are part of a strongly-connected component of the graph. Replacing all the SCCs with pi-blocks results in an acyclic representation of the DDG. For example if we have: {a -> b}, {b -> c, d}, {c -> a} the cycle a -> b -> c -> a is abstracted into a pi-block "p" as follows: {p -> d} with "p" containing: {a -> b}, {b -> c}, {c -> a} In this implementation the edges between nodes that are part of the pi-block are preserved. The crossing edges (edges where one end of the edge is in the set of nodes belonging to an SCC and the other end is outside that set) are replaced with corresponding edges to/from the pi-block node instead. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tag: #llvm Differential Revision: https://reviews.llvm.org/D68827	2019-11-08 15:46:08 -05:00
Gil Rapaport	fcea1fdb8a	Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)" This reverts commit 11ed1c0239fd51fd2f064311dc7725277ed0a994 - causes an assert failure.	2019-11-08 22:17:11 +02:00
Gil Rapaport	23a035a9ad	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) This recommits 100e797adb433724a17c9b42b6533cd634cb796b (reverted in 009e032634b3bd7fc32071ac2344b12142286477 for failing an assert). While the root cause was independently reverted in eaff3004019f97c64c88ab76da6b25106b659b30, this commit includes a LIT to make sure IVDescriptor's SinkAfter logic does not try to sink branch instructions.	2019-11-08 15:25:14 +02:00
Eric Christopher	998e9c6584	Temporarily Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)" as it's causing assert failures. This reverts commit 100e797adb433724a17c9b42b6533cd634cb796b.	2019-11-06 21:58:28 -08:00
Simon Pilgrim	b3fa3af42f	LoopAccessAnalysis - fix uninitialized variable warnings. NFCI.	2019-11-06 17:04:21 +00:00
Simon Pilgrim	512a23db84	BranchProbabilityInfo - fix uninitialized variable warning. NFCI.	2019-11-06 17:04:21 +00:00
Sjoerd Meijer	2a67a1de31	[TTI][LV] preferPredicateOverEpilogue We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with: - a complete MVE implementation, see D69845. - a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint. Differential Revision: https://reviews.llvm.org/D69040	2019-11-06 10:14:20 +00:00
Gil Rapaport	829090e5cd	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC) This recommits 2be17087f8c38934b7fc9208ae6cf4e9b4d44f4b (reverted in d3ec06d219788801380af1948c7f7ef9d3c6100b for heap-use-after-free) with a fix in IAI's reset() which was not clearing the set of interleave groups after deleting them.	2019-11-05 17:29:13 +02:00
Jinsong Ji	90c7b69420	Lower generic MASSV entries to PowerPC subtarget-specific entries This patch (second of two patches) lowers the generic PowerPC vector entries to PowerPC subtarget-specific entries. For instance, the PowerPC generic entry 'cbrtd2_massv' is lowered to 'cbrtd2_P9' or Power9 subtarget. The first patch enables the vectorizer to recognize the IBM MASS vector library routines. This patch specifically adds support for recognizing the '-vector-library=MASSV' option, and defines mappings from IEEE standard scalar math functions to generic PowerPC MASS vector counterparts. For instance, the generic PowerPC MASS vector entry for double-precision 'cbrt' function is '__cbrtd2_massv' The overall support for MASS vector library is presented as such in two patches for ease of review. Patch by pjeeva01 (Jeeva P.) Differential Revision: https://reviews.llvm.org/D59883	2019-11-04 17:17:24 +00:00
Simon Pilgrim	7c495ec4d4	AliasSetTracker - fix uninitialized variable warnings. NFCI.	2019-11-04 15:35:20 +00:00
Hiroshi Yamauchi	bc180e0da8	[PGO][PGSO] TargetLowering/TargetTransformationInfo/SwitchLoweringUtils part. Summary: (Split of off D67120) TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile guided size optimization. Reviewers: davidxl Subscribers: eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69580	2019-10-31 13:22:56 -07:00
Johannes Doerfert	ff3180c323	[MustExecute] Forward iterate over conditional branches Summary: If a conditional branch is encountered we can try to find a join block where the execution is known to continue. This means finding a suitable block, e.g., the immediate post dominator of the conditional branch, and proofing control will always reach that block. This patch implements different techniques that work with and without provided analysis. Reviewers: uenoku, sstefan1, hfinkel Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68933	2019-10-31 00:06:43 -05:00
Johannes Doerfert	4bf91275a6	[Attributor] Add "free"-based heap2stack deduction Summary: If there is a unique free of the allocated that has to be reached from the malloc, we can apply the heap-2-stack transformation even if the pointer escapes. Reviewers: hfinkel, sstefan1, uenoku Subscribers: hiraditya, bollu, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68958	2019-10-30 20:57:57 -05:00
Guillaume Chatelet	b9cc7719cd	[Alignment][NFC] getMemoryOpCost uses MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69307	2019-10-25 21:26:59 +02:00
Philip Reames	2801c47571	[SCEV] Add a clarifying comment around ExitLimit construction	2019-10-25 10:33:02 -07:00
Philip Reames	b567b75641	[SCEV] Expose and use maximum constant exit counts for individual loop exits We were already going to all of the trouble of computing maximum constant exit counts for each loop exit, we might as well expose them through the API. The change in IndVars is mostly to demonstrate that the wired up code works, but it als very slightly strengthens the transform. The strengthened case is rather narrow though: it requires one exactly analyzeable exit, one imprecisely analyzeable exit (with the upper bound less than the precise one), and one unanalyzeable exit. I coudn't construct a reasonably stable test case. This does increase the memory usage of the BackedgeTakenCount by a factor of 2 in the worst case. I also noticed the loop in IndVars is O(#Exits ^ 2). This doesn't change with this patch. A future patch will cache this result inside of SCEV to avoid requering.	2019-10-24 19:07:33 -07:00
Philip Reames	8339682e3f	[SCEV] Start reworking backedge taken count APIs to unify max handling [NFC] This is a first step in figuring out a proper API for maximum (non constant) exit counts. This may evolve a bit as we get experience with the API needs; suggestions very welcome. This patch just tried to provide a framework that we can later add maximum too in a clean and obvious way.	2019-10-24 18:21:55 -07:00
Philip Reames	b141a66992	[SCEV] Delete unused code from header	2019-10-24 16:34:49 -07:00
Guillaume Chatelet	4a32ea01b3	[Alignment][NFC] Finish transition for `Loads` Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, asbirlea, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69253 llvm-svn: 375419	2019-10-21 15:10:26 +00:00
Oliver Stannard	6aaf81e821	Reland: Dead Virtual Function Elimination Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094	2019-10-17 09:58:57 +00:00
Mikhail Maltsev	8cc28e1080	[Analysis] Don't assume that unsigned overflow can't happen in EmitGEPOffset (PR42699) Summary: Currently when computing a GEP offset using the function EmitGEPOffset for the following instruction getelementptr inbounds i32, i32* %p, i64 %offs we get mul nuw i64 %offs, 4 Unfortunately we cannot assume that unsigned wrapping won't happen here because %offs is allowed to be negative. Making such assumptions can lead to miscompilations: see the new test test24_neg_offs in InstCombine/icmp.ll. Without the patch InstCombine would generate the following comparison: icmp eq i64 %offs, 4611686018427387902; 0x3ffffffffffffffe Whereas the correct value to compare with is -2. This patch replaces the NUW flag with NSW in the multiplication instructions generated by EmitGEPOffset and adjusts the test suite. https://bugs.llvm.org/show_bug.cgi?id=42699 Reviewers: chandlerc, craig.topper, ostannard, lebedev.ri, spatel, efriedma, nlopes, aqjune Reviewed By: lebedev.ri Subscribers: reames, lebedev.ri, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68342 llvm-svn: 375089	2019-10-17 08:59:06 +00:00
Jorge Gorbe Moya	6a19d78c0a	Revert "Dead Virtual Function Elimination" This reverts commit 9f6a873268e1ad9855873d9d8007086c0d01cf4f. llvm-svn: 374844	2019-10-14 23:25:25 +00:00
Sam Parker	d863761b7a	[NFC][TTI] Add Alignment for isLegalMasked[Load/Store] Add an extra parameter so the backend can take the alignment into consideration. Differential Revision: https://reviews.llvm.org/D68400 llvm-svn: 374763	2019-10-14 10:00:21 +00:00
Zi Xuan Wu	5d5b98eb29	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634	2019-10-12 02:53:04 +00:00
Oliver Stannard	901c588c1f	Dead Virtual Function Elimination Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 374539	2019-10-11 11:59:55 +00:00
David Greene	5e281d2d23	[System Model] [TTI] Move default cache/prefetch implementations Move the default implementations of cache and prefetch queries to TargetTransformInfoImplBase and delete them from NoTIIImpl. This brings these interfaces in line with how other TTI interfaces work. Differential Revision: https://reviews.llvm.org/D68804 llvm-svn: 374446	2019-10-10 20:39:27 +00:00
Guillaume Chatelet	08dec91263	[Alignment][NFC] Make VectorUtils uas llvm::Align Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, rogfer01, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68784 llvm-svn: 374330	2019-10-10 12:35:04 +00:00
David Greene	a0e8b2161d	[System Model] [TTI] Update cache and prefetch TTI interfaces Re-apply 9fdfb045ae8b/r365676 with fixes for PPC and Hexagon. This involved moving defaults from TargetTransformInfoImplBase to MCSubtargetInfo. Rework the TTI cache and software prefetching APIs to prepare for the introduction of a general system model. Changes include: - Marking existing interfaces const and/or override as appropriate - Adding comments - Adding BasicTTIImpl interfaces that delegate to a subtarget implementation - Moving the default TargetTransformInfoImplBase implementation to a default MCSubtarget implementation Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget implementation, so its custom TTI implementation is migrated to use the new facilities in BasicTTIImpl to invoke its custom subtarget implementation. The custom TTI implementations continue to exist for the other targets with this change. They are not moved over to subtarget-based implementations. The end goal is to have the default subtarget implementation defer to the system model defined by the target. With this change, the default MCSubtargetInfo implementation essentially returns the defaults TargetTransformInfoImplBase used to return. Existing users of TTI defaults will hit the defaults now in MCSubtargetInfo. Targets that define their own custom TTI implementations won't use the BasicTTIImpl implementations that route to the subtarget. Once system models are in place for the targets that use these interfaces, their custom TTI implementations can be removed. Differential Revision: https://reviews.llvm.org/D63614 llvm-svn: 374205	2019-10-09 19:51:48 +00:00
Jinsong Ji	6616c06663	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a. This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091	2019-10-08 17:32:56 +00:00
Zi Xuan Wu	d4140de63c	[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017	2019-10-08 03:28:33 +00:00
Martin Storsjo	6530ff4317	Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine" This reverts SVN r373833, as it caused a failed assert "Non-zero loop cost expected" on building numerous projects, see PR43582 for details and reproduction samples. llvm-svn: 373882	2019-10-07 08:21:37 +00:00
Sanjay Patel	801065a9df	[SLP] avoid reduction transform on patterns that the backend can load-combine I don't see an ideal solution to these 2 related, potentially large, perf regressions: https://bugs.llvm.org/show_bug.cgi?id=42708 https://bugs.llvm.org/show_bug.cgi?id=43146 We decided that load combining was unsuitable for IR because it could obscure other optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend. Therefore, preventing SLP from destroying load combine opportunities requires that it recognizes patterns that could be combined later, but not do the optimization itself ( it's not a vector combine anyway, so it's probably out-of-scope for SLP). Here, we add a scalar cost model adjustment with a conservative pattern match and cost summation for a multi-instruction sequence that can probably be reduced later. This should prevent SLP from creating a vector reduction unless that sequence is extremely cheap. In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining will produce a single instruction on these tests like: movbe rax, qword ptr [rdi] or: mov rax, qword ptr [rdi] Not some (half) vector monstrosity as we currently do using SLP: vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,.. vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0] movzx eax, byte ptr [rdi] movzx ecx, byte ptr [rdi + 5] shl rcx, 40 movzx edx, byte ptr [rdi + 6] shl rdx, 48 or rdx, rcx movzx ecx, byte ptr [rdi + 7] shl rcx, 56 or rcx, rdx or rcx, rax vextracti128 xmm1, ymm0, 1 vpor xmm0, xmm0, xmm1 vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1] vpor xmm0, xmm0, xmm1 vmovq rax, xmm0 or rax, rcx vzeroupper ret Differential Revision: https://reviews.llvm.org/D67841 llvm-svn: 373833	2019-10-05 18:03:58 +00:00
Bardia Mahjour	a49fa03628	[DDG] Data Dependence Graph - Root Node Summary: This patch adds Root Node to the DDG. The purpose of the root node is to create a single entry node that allows graph walk iterators to iterate through all nodes of the graph, making sure that no node is left unvisited during a graph walk (eg. SCC or DFS). Once the DDG is fully constructed it will have exactly one root node. Every node in the graph is reachable from the root. The algorithm for connecting the root node is based on depth-first-search that keeps track of visited nodes to try to avoid creating unnecessary edges. Authored By: bmahjour Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert Reviewed By: Meinersbur Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto, ppc-slack Tag: #llvm Differential Revision: https://reviews.llvm.org/D67970 llvm-svn: 373386	2019-10-01 19:32:42 +00:00
Guillaume Chatelet	114e854bc6	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
Wei Mi	e03a35d303	[LoopInfo] Remove duplicates in ExitBlocks to reduce the compile time of hasDedicatedExits. For the compile time problem described in https://reviews.llvm.org/D67359, turns out the root cause is there are many duplicates in ExitBlocks so the algorithm complexity of hasDedicatedExits gets very high. If we remove the duplicates, the compile time issue is gone. Thanks to Philip Reames for raising a good question and it leads me to find the root cause. Differential Revision: https://reviews.llvm.org/D68107 llvm-svn: 373045	2019-09-27 05:43:31 +00:00
Wei Mi	92134e706c	Revert "[LoopInfo] Limit the iterations to check whether a loop has dedicated exits" Get a better approach in https://reviews.llvm.org/D68107 to solve the problem. Revert the initial patch and will commit the new one soon. This reverts commit rL372990. llvm-svn: 373044	2019-09-27 05:43:30 +00:00
Wei Mi	72db08839a	[LoopInfo] Limit the iterations to check whether a loop has dedicated exits for extreme large case. We had a case that a single loop which has 4000 exits and the average number of predecessors of each exit is > 1000, and we found compiling the case spent a significant amount of time on checking whether a loop has dedicated exits. This patch adds a limit for the iterations to the check. With the patch, the time to compile our testcase reduced from 1000s to 200s (clang release build). Differential Revision: https://reviews.llvm.org/D67359 llvm-svn: 372990	2019-09-26 15:36:25 +00:00
Florian Hahn	d2c50ce461	[InstCombine] Limit FMul constant folding for fma simplifications. As @reames pointed out post-commit, rL371518 adds additional rounding in some cases, when doing constant folding of the multiplication. This breaks a guarantee llvm.fma makes and must be avoided. This patch reapplies rL371518, but splits off the simplifications not requiring rounding from SimplifFMulInst as SimplifyFMAFMul. Reviewers: spatel, lebedev.ri, reames, scanon Reviewed By: reames Differential Revision: https://reviews.llvm.org/D67434 llvm-svn: 372899	2019-09-25 17:03:20 +00:00
Artur Pilipenko	7644f0319b	[SCEV] Disable canonical expansion for non-affine addrecs. Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D65276 Patch by Evgeniy Brevnov (ybrevnov@azul.com) llvm-svn: 372789	2019-09-24 23:21:07 +00:00
Hiroshi Yamauchi	8566ca7d16	[PGO][PGSO] ProfileSummary changes. (Split of off D67120) ProfileSummary changes for profile guided size optimization. Differential Revision: https://reviews.llvm.org/D67377 llvm-svn: 372783	2019-09-24 22:17:51 +00:00
Simon Pilgrim	3b784b9457	[ValueTracking] Remove unused matchSelectPattern optional argument. NFCI. The matchSelectPattern const wrapper is never explicitly called with the optional Instruction::CastOps argument, and it turns out that it wasn't being forwarded to matchSelectPattern anyway! Noticed while investigating clang static analyzer warnings. llvm-svn: 372604	2019-09-23 13:20:47 +00:00

1 2 3 4 5 ...

4247 Commits