llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Alex Bradbury	78e18785d2	[RISCV] Implement codegen for cmpxchg on RV32IA Utilise a similar ('late') lowering strategy to D47882. The changes to AtomicExpandPass allow this strategy to be utilised by other targets which implement shouldExpandAtomicCmpXchgInIR. All cmpxchg are lowered as 'strong' currently and failure ordering is ignored. This is conservative but correct. Differential Revision: https://reviews.llvm.org/D48131 llvm-svn: 347914	2018-11-29 20:43:42 +00:00
Craig Topper	513fc8bae1	[X86] Change the pre-type legalization DAG combine added in r347898 into a custom type legalization operation instead. This seems to produce the same results on the tests we have. llvm-svn: 347912	2018-11-29 20:18:58 +00:00
David Stuttard	c4074ba685	Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic" Also revert fix r347876 One of the buildbots was reporting a failure in some relevant tests that I can't repro or explain at present, so reverting until I can isolate. llvm-svn: 347911	2018-11-29 20:14:17 +00:00
Francis Visoiu Mistrih	b1262b3845	[MachineScheduler] Order FI-based memops based on stack direction It makes more sense to order FI-based memops in descending order when the stack goes down. This allows offsets to stay "consecutive" and allow easier pattern matching. llvm-svn: 347906	2018-11-29 20:03:19 +00:00
Craig Topper	dfe7e315ea	[SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG. I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse. Differential Revision: https://reviews.llvm.org/D54276 llvm-svn: 347902	2018-11-29 19:36:17 +00:00
Craig Topper	099c4dc2f4	[X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get pass the legal type check in the generic DAG combiner. Otherwise we end up scalarizing. I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size?Widen the constant vector by padding with 1? Differential Revision: https://reviews.llvm.org/D54919 llvm-svn: 347898	2018-11-29 19:13:38 +00:00
Graham Sellers	95cb757dac	[AMDGPU] Add and update scalar instructions This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential: https://reviews.llvm.org/D54714 llvm-svn: 347877	2018-11-29 16:05:38 +00:00
David Stuttard	aae321d253	Fix: Add support for TFE/LWE in image intrinsic My change svn-id: 347871 caused a buildbot failure due to an unused variable def (used in an assert). Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336 llvm-svn: 347876	2018-11-29 15:56:36 +00:00
David Stuttard	c65201ef09	Add support for TFE/LWE in image intrinsics TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871	2018-11-29 15:21:13 +00:00
Hans Wennborg	98a8db31eb	Revert r347596 "Support for inserting profile-directed cache prefetches" It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for repro. This also reverts the follow-ups: Revert r347724 "Do not insert prefetches with unsupported memory operands." Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596" Revert r347607 "Add new passes to X86 pipeline tests" llvm-svn: 347864	2018-11-29 13:58:02 +00:00
Petr Pavlu	a73160fd34	[GlobalISel] Make EnableGlobalISel always set when GISel is enabled Change meaning of TargetOptions::EnableGlobalISel. The flag was previously set only when a target switched on GlobalISel but it is now always set when the GlobalISel pipeline is enabled. This makes the flag consistent with TargetOptions::EnableFastISel and allows its use in other parts of the compiler to determine when GlobalISel is enabled. The EnableGlobalISel flag had previouly only one use in TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to determine if GlobalISel was enabled by a target and returned false in such a case. To preserve the current behaviour, a new flag TargetOptions::GlobalISelAbort is introduced to separately record the abort behaviour. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347861	2018-11-29 12:56:32 +00:00
Andrea Di Biagio	a900121de4	[llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857	2018-11-29 12:15:56 +00:00
Nicolai Haehnle	fb70615d5e	AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo Summary: MachineLoopInfo cannot be relied on for correctness, because it cannot properly recognize loops in irreducible control flow which can be introduced by late machine basic block optimization passes. See the new test case for the reduced form of an example that occurred in practice. Use a simple fixpoint iteration instead. In order to facilitate this change, refactor WaitcntBrackets so that it only tracks pending events and registers, rather than also maintaining state that is relevant for the high-level algorithm. Various accessor methods can be removed or made private as a consequence. Affects (in radv): - dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex} Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU") Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54231 llvm-svn: 347853	2018-11-29 11:06:26 +00:00
Nicolai Haehnle	bbaf7db77b	AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points Summary: There is one obsolete reference to using -1 as an indication of "unknown", but this isn't actually used anywhere. Using unsigned makes robust wrapping checks easier. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam Differential Revision: https://reviews.llvm.org/D54230 llvm-svn: 347852	2018-11-29 11:06:21 +00:00
Nicolai Haehnle	d0a944b0ec	AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54229 llvm-svn: 347851	2018-11-29 11:06:18 +00:00
Nicolai Haehnle	008d5e44a3	AMDGPU/InsertWaitcnts: Simplify pending events tracking Summary: Instead of storing the "score" (last time point) of the various relevant events, only store whether an event is pending or not. This is sufficient, because whenever only one event of a count type is pending, its last time point is naturally the upper bound of all time points of this count type, and when multiple event types are pending, the count type has gone out of order and an s_waitcnt to 0 is required to clear any pending event type (and will then clear all pending event types for that count type). This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK. I do not understand what this special handling ever attempted to achieve. It has existed ever since the original port from an internal code base, so my best guess is that it solved a problem related to EXEC handling in that internal code base. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54228 llvm-svn: 347850	2018-11-29 11:06:14 +00:00
Nicolai Haehnle	1eb7b430b1	AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types Summary: It hides the type casting ugliness, and I happened to have to add a new such loop (in a later patch). Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54227 llvm-svn: 347849	2018-11-29 11:06:11 +00:00
Nicolai Haehnle	72f01d1369	AMDGPU/InsertWaitcnts: Untangle some semi-global state Summary: Reduce the statefulness of the algorithm in two ways: 1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets. 2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets. To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form. There are some functional changes: 1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters. 2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases. s_load_dword ... s_waitcnt lgkmcnt(0) <-- pre-existing wait count ds_read_b32 v0, ... ds_read_b32 v1, ... s_waitcnt lgkmcnt(0) <-- this is too conservative use(v0) more code use(v1) This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are: WORST REGRESSIONS - Code Size Before After Delta Percentage 1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0] 2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0] 4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0] 2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0] 3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0] Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54226 llvm-svn: 347848	2018-11-29 11:06:06 +00:00
Craig Topper	e8fbdccc16	[X86] Correct comment. NFC llvm-svn: 347835	2018-11-29 05:56:03 +00:00
Li Jia He	781b1e98f2	[PowerPC] Fix a conversion is not considered when the ISD::BR_CC node making the instruction selection Summary: A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative so for signed comparisons vs. unsigned ones, the input operands just need to be swapped. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54825 llvm-svn: 347831	2018-11-29 03:04:39 +00:00
Sanjay Patel	7efca9012e	[x86] try select simplification for target-specific nodes This failed to select (which might be a separate bug) in X86ISelDAGToDAG because we try to create a select node that can be simplified away after rL347227. This change avoids the problem by simplifying the SHRUNKBLEND node sooner. In the test case, we manage to realize that the true/false values of the select (SHRUNKBLEND) are the same thing, so it simplifies away completely. llvm-svn: 347818	2018-11-28 22:51:04 +00:00
Craig Topper	75af621930	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786	2018-11-28 18:11:42 +00:00
Craig Topper	b3ade96ac9	[X86] Add some cost model entries for sext/zext for avx512bw This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785	2018-11-28 18:11:39 +00:00
Craig Topper	68b5fe48b0	[X86] Add a combine for back to back VSRAI instructions Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI Differential Revision: https://reviews.llvm.org/D54959 llvm-svn: 347784	2018-11-28 18:03:38 +00:00
Alex Bradbury	cdab4efbd2	[RISCV] Support .option push and .option pop This adds support in the RISCVAsmParser the storing of Subtarget feature bits to a stack so that they can be pushed/popped to enable/disable multiple features at once. Differential Revision: https://reviews.llvm.org/D46424 Patch by Lewis Revill. llvm-svn: 347774	2018-11-28 16:39:14 +00:00
Francis Visoiu Mistrih	c7da15d985	[MachineScheduler] Add support for clustering mem ops with FI base operands Before this patch, the following stores in `merge_fail` would fail to be merged, while they would get merged in `merge_ok`: ``` void use(unsigned long long ); void merge_fail(unsigned key, unsigned index) { unsigned long long args[8]; args[0] = key; args[1] = index; use(args); } void merge_ok(unsigned long long dst, unsigned a, unsigned b) { dst[0] = a; dst[1] = b; } ``` The reason is that `getMemOpBaseImmOfs` would return false for FI base operands. This adds support for this. Differential Revision: https://reviews.llvm.org/D54847 llvm-svn: 347747	2018-11-28 12:00:28 +00:00
Francis Visoiu Mistrih	6683b9c236	[CodeGen][NFC] Make `TII::getMemOpBaseImmOfs` return a base operand Currently, instructions doing memory accesses through a base operand that is not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`. This means that functions such as `TII::shouldClusterMemOps` will bail out on instructions using an FI as a base instead of a register. The goal of this patch is to refactor all this to return a base operand instead of a base register. Then in a separate patch, I will add FI support to the mem op clustering in the MachineScheduler. Differential Revision: https://reviews.llvm.org/D54846 llvm-svn: 347746	2018-11-28 12:00:20 +00:00
Simon Atanasyan	200e7ee47a	[DebugInfo] Rename EmitDebugThreadLocal back to EmitDebugValue. NFC This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr for PCOffset. In that case the expression is unrelated to thread locals and so emitting a value of the DIEExpr does not have to always mean emit-debug-thread-local. llvm-svn: 347744	2018-11-28 11:48:07 +00:00
Jonas Paulsson	ea9fed10ed	[SystemZ::TTI] Improve cost for compare of i64 with extended i32 load CGF/CLGF compares an i64 register with a sign/zero extended loaded i32 value in memory. This patch makes such a load considered foldable and so gets a 0 cost. Review: Ulrich Weigand https://reviews.llvm.org/D54944 llvm-svn: 347735	2018-11-28 08:58:27 +00:00
Jonas Paulsson	4328cc486d	[SystemZ::TTI] Improve costs for i16 add, sub and mul against memory. AH, SH and MH costs are already covered in the cases where LHS is 32 bits and RHS is 16 bits of memory sign-extended to i32. As these instructions are also used when LHS is i16, this patch recognizes that the loads will get folded then as well. Review: Ulrich Weigand https://reviews.llvm.org/D54940 llvm-svn: 347734	2018-11-28 08:31:50 +00:00
Jonas Paulsson	5ca245644e	[SystemZ::TTI] Improved cost values for comparison against memory. Single instructions exist for i8 and i16 comparisons of memory against a small immediate. This patch makes sure that if the load in these cases has a single user (the ICmp), it gets a 0 cost (folded), and also that the ICmp gets a cost of 1. Review: Ulrich Weigand https://reviews.llvm.org/D54897 llvm-svn: 347733	2018-11-28 08:08:05 +00:00
Jonas Paulsson	67885a2bf8	[SystemZ::TTI] Return zero cost for scalar load/store connected with a bswap. Since byte-swapping loads and stores are supported, a 'load -> bswap' or 'bswap -> store' sequence should have the cost of one. Review: Ulrich Weigand https://reviews.llvm.org/D54870 llvm-svn: 347732	2018-11-28 07:52:34 +00:00
Mircea Trofin	0953e7ad7b	Do not insert prefetches with unsupported memory operands. Summary: Ignore advices where the memory operand of the 'anchor' instruction uses unsupported register types. Reviewers: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54983 llvm-svn: 347724	2018-11-28 01:08:45 +00:00
Evandro Menezes	a13afb10aa	[TableGen] Refactor macro names (NFC) Make the names for the macros for `TargetInstrInfo` uniform. llvm-svn: 347706	2018-11-27 20:58:27 +00:00
Craig Topper	d99e39cf74	[X86] Replace an APInt that is guaranteed to be 8-bits with just an 'unsigned' We're already mixing this APInt with other 'unsigned' variables. This allows us to use regular comparison operators instead of needing to use APInt::ult or APInt::uge. And it removes a later conversion from APInt to unsigned. I might be adding another combine to this function and this will probably simplify the logic required for that. llvm-svn: 347684	2018-11-27 18:24:56 +00:00
Craig Topper	f88d85bd02	[X86] Add cascade lake arch in X86 target. This is skylake-avx512 with the addition of avx512vnni ISA. Patch by Jianping Chen Differential Revision: https://reviews.llvm.org/D54785 llvm-svn: 347681	2018-11-27 18:05:00 +00:00
Stanislav Mekhanoshin	a6054d7527	[AMDGPU] Disable DAG combine at -O0 Differential Revision: https://reviews.llvm.org/D54358 llvm-svn: 347659	2018-11-27 15:13:37 +00:00
Craig Topper	98b042f678	[X86] Use getUnpackl/getUnpackh instead of directly creating UNPCKL/UNPCKH nodes. llvm-svn: 347642	2018-11-27 06:24:56 +00:00
Craig Topper	1d0281cd43	[X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets. If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk. We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears. llvm-svn: 347632	2018-11-27 02:57:27 +00:00
Sterling Augustine	567aedab5a	Notify the linker when a TU compiled with split-stack has a function without a prologue. More context here: https://go-review.googlesource.com/c/go/+/148819/ llvm-svn: 347614	2018-11-26 23:26:31 +00:00
David Blaikie	8a379c9d75	AArch64ISelLowering: Remove a return-of-assignment to allow NRVO Patch by Arthur O'Dwyer! llvm-svn: 347609	2018-11-26 22:57:18 +00:00
Mircea Trofin	342ebc2382	Add new passes to X86 pipeline tests Summary: Fixes test failures introduced by rL347596. Reviewers: davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54916 llvm-svn: 347607	2018-11-26 22:49:17 +00:00
Fangrui Song	2b86e18bbf	[X86] Add dependency from X86 to ProfileData after rL347596 llvm-svn: 347606	2018-11-26 22:16:19 +00:00
Evandro Menezes	c0e9308c17	[AArch64] Refactor the scheduling predicates (3/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasExtendedReg()`. Differential revision: https://reviews.llvm.org/D54822 llvm-svn: 347599	2018-11-26 21:47:46 +00:00
Evandro Menezes	12a8ee69b4	[AArch64] Refactor the scheduling predicates (2/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::hasShiftedReg()`. Differential revision: https://reviews.llvm.org/D54820 llvm-svn: 347598	2018-11-26 21:47:41 +00:00
Evandro Menezes	7caf28a957	[AArch64] Refactor the scheduling predicates (1/3) (NFC) Refactor the scheduling predicates based on `MCInstPredicate`. In this case, `AArch64InstrInfo::isScaledAddr()` Differential revision: https://reviews.llvm.org/D54777 llvm-svn: 347597	2018-11-26 21:47:28 +00:00
Mircea Trofin	60f657acba	Support for inserting profile-directed cache prefetches Summary: Support for profile-driven cache prefetching (X86) This change is part of a larger system, consisting of a cache prefetches recommender, create_llvm_prof (https://github.com/google/autofdo), and LLVM. A proof of concept recommender is DynamoRIO's cache miss analyzer. It processes memory access traces obtained from a running binary and identifies patterns in cache misses. Based on them, it produces a csv file with recommendations. The expectation is that, by leveraging such recommendations, we can reduce the amount of clock cycles spent waiting for data from memory. A microbenchmark based on the DynamoRIO analyzer is available as a proof of concept: https://goo.gl/6TM2Xp. The recommender makes prefetch recommendations in terms of: * the binary offset of an instruction with a memory operand; * a delta; * and a type (nta, t0, t1, t2) meaning: a prefetch of that type should be inserted right before the instrution at that binary offset, and the prefetch should be for an address delta away from the memory address the instruction will access. For example: 0x400ab2,64,nta and assuming the instruction at 0x400ab2 is: movzbl (%rbx,%rdx,1),%edx means that the recommender determined it would be beneficial for a prefetchnta instruction to be inserted right before this instruction, as such: prefetchnta 0x40(%rbx,%rdx,1) movzbl (%rbx, %rdx, 1), %edx The workflow for prefetch cache instrumentation is as follows (the proof of concept script details these steps as well): 1. build binary, making sure -gmlt -fdebug-info-for-profiling is passed. The latter option will enable the X86DiscriminateMemOps pass, which ensures instructions with memory operands are uniquely identifiable (this causes ~2% size increase in total binary size due to the additional debug information). 2. collect memory traces, run analysis to obtain recommendations (see above-referenced DynamoRIO demo as a proof of concept). 3. use create_llvm_prof to convert recommendations to reference insertion locations in terms of debug info locations. 4. rebuild binary, using the exact same set of arguments used initially, to which -mllvm -prefetch-hints-file=<file> needs to be added, using the afdo file obtained at step 3. Note that if sample profiling feedback-driven optimization is also desired, that happens before step 1 above. In this case, the sample profile afdo file that was used to produce the binary at step 1 must also be included in step 4. The data needed by the compiler in order to identify prefetch insertion points is very similar to what is needed for sample profiles. For this reason, and given that the overall approach (memory tracing-based cache recommendation mechanisms) is under active development, we use the afdo format as a syntax for capturing this information. We avoid confusing semantics with sample profile afdo data by feeding the two types of information to the compiler through separate files and compiler flags. Should the approach prove successful, we can investigate improvements to this encoding mechanism. Reviewers: davidxl, wmi, craig.topper Reviewed By: davidxl, wmi, craig.topper Subscribers: davide, danielcdh, mgorny, aprantl, eraman, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D54052 llvm-svn: 347596	2018-11-26 21:36:18 +00:00
Matt Arsenault	4836b99157	AMDGPU: Record SGPR spills when restoring too It's possible in some cases to have a restore present without a corresponding spill. Due to an apparent bug in D54366 <https://reviews.llvm.org/D54366>, only the restore for a register was emitted. It's probably always a bug for this to happen, but due to how SGPR spilling is implemented, this makes the issues appear worse than it is. llvm-svn: 347595	2018-11-26 21:28:40 +00:00
Craig Topper	0792d88e71	[LegalizeVectorTypes][X86][ARM][AArch64][PowerPC] Don't use SplitVecOp_TruncateHelper for FP_TO_SINT/UINT. SplitVecOp_TruncateHelper tries to promote the result type while splitting FP_TO_SINT/UINT. It then concatenates the result and introduces a truncate to the original result type. But it does this without inserting the AssertZExt/AssertSExt that the regular result type promotion would insert. Nor does it turn FP_TO_UINT into FP_TO_SINT the way normal result type promotion for these operations does. This is bad on X86 which doesn't support FP_TO_SINT until AVX512. This patch disables the use of SplitVecOp_TruncateHelper for these operations and just lets normal promotion handle it. I've tweaked a couple things in X86ISelLowering to avoid a few obvious regressions there. I believe all the changes on X86 are improvements. The other targets look neutral. Differential Revision: https://reviews.llvm.org/D54906 llvm-svn: 347593	2018-11-26 21:12:39 +00:00
Than McIntosh	19bdd4a2bb	[CodeGen] Support custom format of stack maps Summary: Add a hook to the GCMetadataPrinter for emitting stack maps in custom format. The hook will be called at stack map generation time. The default stack map format is used if there is no hook. For this to be useful a few data structures and accessors are exposed from the StackMaps class, so the custom printer can access the stack map data. This patch authored by Cherry Zhang <cherryyz@google.com>. Reviewers: thanm, apilipenko, reames Reviewed By: reames Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D53892 llvm-svn: 347584	2018-11-26 18:43:48 +00:00

1 2 3 4 5 ...

49939 Commits