llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Author	SHA1	Message	Date
Simon Pilgrim	4cccff07f7	[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED) This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract. In particular this helps remove some unnecessary scalar->vector->scalar patterns. The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue. Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0. Differential Revision: https://reviews.llvm.org/D65887	2020-01-04 13:15:50 +00:00
Craig Topper	ef88147c11	[X86] Update MaxIndex test in x86-cmov-converter.ll to return the index and not use the index to look up the array after the loop. This represents a more realistic version of the code being tested. The cmov converter doesn't look at the code after the loop so it doesn't matter for what's being tested. But as noted in this twitter thread https://twitter.com/trav_downs/status/1213311159413161987 gcc can turn the previous MaxIndex code into the MaxValue code. So returning the index makes it a distinct case.	2020-01-03 23:59:54 -08:00
Craig Topper	4ad2df5155	[X86] Autogenerate complete checks. NFC	2020-01-03 17:18:18 -08:00
Matt Arsenault	037bd41e28	AMDGPU: Add gfx9 run lines to a testcase	2020-01-03 15:25:50 -05:00
Sanjay Patel	7ff3934618	[DAGCombiner] fix miscompile in translating (X & undef) to shuffle See PR42982 for more context: https://bugs.llvm.org/show_bug.cgi?id=42982	2020-01-03 14:58:49 -05:00
Sanjay Patel	e91f60c7b7	[x86] add test for miscompile in XformToShuffleWithZero(); NFC	2020-01-03 14:49:25 -05:00
Craig Topper	58c0b5e9fe	[X86] Improve for v2i32->v2f64 uint_to_fp This uses an alternative implementation of this conversion derived from our v2i32->v2f32 handling. We can zero extend the v2i32 to v2i64, OR it with the bit representation of 2.0^52 which will give us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits. Then we just need to subtract 2.0^52 as a double and let the floating point unit normalize the remaining bits into a valid double. This is fewer instructions than our previous code, but does require a port 5 shuffle for the zero extend or unpack. Differential Revision: https://reviews.llvm.org/D71945	2020-01-03 11:39:08 -08:00
Reid Kleckner	e7c2cc0e45	Move tail call disabling code to target independent code When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in d9699bc7bdf0362173fcd256690f61a4d47429c2 (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118	2020-01-03 11:27:41 -08:00
Fangrui Song	9fd094ca82	[AArch64][test] Merge arm64-$i.ll Linux tests into $i.ll Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D72061	2020-01-03 09:18:55 -08:00
Matt Arsenault	0ffb461077	AMDGPU/GlobalISel: Fix off by one in operand index This should be looking at the RHS of the add for a constant.	2020-01-03 10:30:30 -05:00
Roman Lebedev	24c9de9750	[DAGCombiner][X86][AArch64] Generalize `A-(A&B)`->`A&(~B)` fold (PR44448) The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)' Even if we don't manage to fold `~` into B, we have likely formed `ANDN` node. Also, this way there's less similar-but-duplicate folds. Name: X - (X & Y) -> X & (~Y) %o = and i32 %X, %Y %r = sub i32 %X, %o => %n = xor i32 %Y, -1 %r = and i32 %X, %n https://rise4fun.com/Alive/kOUl See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 17:55:47 +03:00
Roman Lebedev	e5e0775c51	[NFC][X86][AArch64] Add 'A - (A & B)' pattern tests (PR44448) The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should just be 'A - (A & B)' -> 'A & (~B)' Name: X - (X & Y) -> X & (~Y) %o = and i32 %X, %Y %r = sub i32 %X, %o => %n = xor i32 %Y, -1 %r = and i32 %X, %n https://rise4fun.com/Alive/kOUl See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 17:55:46 +03:00
Roman Lebedev	122f4cf40d	[NFC][X86] Add BMI runlines to align-down.ll test	2020-01-03 17:55:46 +03:00
Roman Lebedev	f9c10b284e	[DAGCombiner] `~(add X, -1)` -> `neg X` fold The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should just be 'A - (A & B)' -> 'A & (~B)', but we currently fail to sink that '~' into `(B - 1)`. Name: ~(X - 1) -> (0 - X) %o = add i32 %X, -1 %r = xor i32 %o, -1 => %r = sub i32 0, %X https://rise4fun.com/Alive/rjU	2020-01-03 17:55:46 +03:00
Roman Lebedev	976233953a	[NFC][DAGCombine][X86] '~(X - 1)' pattern tests The fold 'A - (A & (B - 1))' -> 'A & (0 - B)' added in 8dab0a4a7d691f2704f1079538e0ef29548db159 is too specific. It should just be 'A - (A & B)' -> 'A & (~B)', but we currently fail to sink that '~' into `(B - 1)`. Name: ~(X - 1) -> (0 - X) %o = add i32 %X, -1 %r = xor i32 %o, -1 => %r = sub i32 0, %X https://rise4fun.com/Alive/rjU	2020-01-03 17:55:46 +03:00
Roman Lebedev	f8c34dee2f	[DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold (PR44448) While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers. There is @llvm.ptrmask() intrinsic which may or may not be helpful, but i'm not sure it is fully considered canonical yet, not everything is fully aware of it likely. Name: PR44448 ptr - (ptr & C) -> ptr & (~C) %bias = and i32 %ptr, C %r = sub i32 %ptr, %bias => %r = and i32 %ptr, ~C See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 17:55:45 +03:00
Roman Lebedev	65e2edf233	[NFC][DAGCombine][X86] Tests for 'A - (A & C)' pattern (PR44448) Name: PR44448 ptr - (ptr & C) -> ptr & (~C) %bias = and i32 %ptr, C %r = sub i32 %ptr, %bias => %r = and i32 %ptr, ~C The main motivational pattern involves pointer-typed values, so this transform can't really be done in middle-end. See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 17:55:45 +03:00
Sam Parker	c88913b0a6	[ARM][NFC] Update MIR test	2020-01-03 14:51:15 +00:00
Roman Lebedev	d15ef79468	[DAGCombine][X86][AArch64] 'A - (A & (B - 1))' -> 'A & (0 - B)' fold (PR44448) While we do manage to fold integer-typed IR in middle-end, we can't do that for the main motivational case of pointers. There is @llvm.ptrmask() intrinsic which may or may not be helpful, but i'm not sure it is fully considered canonical yet, not everything is fully aware of it likely. https://rise4fun.com/Alive/ZVdp Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment) %mask = add i64 %alignment, -1 %bias = and i64 %ptr, %mask %r = sub i64 %ptr, %bias => %highbitmask = sub i64 0, %alignment %r = and i64 %ptr, %highbitmask See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 13:58:36 +03:00
Roman Lebedev	1a3061047d	[NFC][DAGCombine][X86][AArch64] Tests for 'A - (A & (B - 1))' pattern (PR44448) https://rise4fun.com/Alive/ZVdp Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment) %mask = add i64 %alignment, -1 %bias = and i64 %ptr, %mask %r = sub i64 %ptr, %bias => %highbitmask = sub i64 0, %alignment %r = and i64 %ptr, %highbitmask The main motivational pattern involves pointer-typed values, so this transform can't really be done in middle-end. See https://bugs.llvm.org/show_bug.cgi?id=44448 https://reviews.llvm.org/D71499	2020-01-03 13:58:36 +03:00
Craig Topper	01fa15d221	[X86] Re-enable lowerUINT_TO_FP_vXi32 under fast-math by using an FSUB instead of an FADD. Summary: We previously disabled this under fast math due to aggressive reassociation by the machine combiner. But I think we can work around this by using an FSUB instead of an FADD for the first operation. This matches the similar algorithm we use for uint_to_fp i64->f64 in TargetLowering::expandUINT_TO_FP. If reassociation hasn't been a problem for that, hopefully it's not a problem here. Reviewers: RKSimon, spatel, scanon Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71968	2020-01-02 21:46:53 -08:00
QingShan Zhang	0a7fe2ac65	[DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal' For now, we didn't set the default operation action for SIGN_EXTEND_INREG for vector type, which is 0 by default, that is legal. However, most target didn't have native instructions to support this opcode. It should be set as expand by default, as what we did for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000	2020-01-03 03:26:41 +00:00
Wang, Pengfei	6277397419	[X86] Enable strict FP by default and remove option -disable-strictnode-mutation. NFCI.	2020-01-03 10:59:34 +08:00
Justin Hibbits	75b225e755	[PowerPC]: Fix predicate handling with SPE SPE floating-point compare instructions only update the GT bit in the CR field. All predicates must therefore be reduced to GT/LE.	2020-01-02 19:30:53 -06:00
Justin Hibbits	72adc6d222	Run update_llc_test_checks against SPE tests. This is in preparation for further tests which are better generated with the script. No functional change.	2020-01-02 19:30:52 -06:00
Wang, Pengfei	1222e272ea	[X86] Optimization of inserting vxi1 sub vector into vXi1 vector Summary: After the bugfix for the undef value case here, we used more operations to implement inserting a vXi1 subvector into a vXi1 vector. This patch optimizes it to use fewer operations. The history is at https://reviews.llvm.org/D68311 Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon Reviewed By: craig.topper Subscribers: hiraditya, llvm-commits Patch by Xiang Zhang (xiangzhangllvm) Differential Revision: https://reviews.llvm.org/D71917	2020-01-03 09:25:25 +08:00
Sean Fertile	800b81bc7b	[PowerPC][AIX] Enable sret arguments. Removes the fatal error for sret arguments and adds lit testing. Differential Revision: https://reviews.llvm.org/D71504	2020-01-02 19:31:01 -05:00
Evgenii Stepanov	9d355c1873	Change dbg-*-tag-offset tests to use llvm-dwarfdump. Reviewers: dblaikie Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72023	2020-01-02 14:35:54 -08:00
Jonas Paulsson	2c90fb35b9	[SystemZ] Create brcl 0,0 instead of brcl 0,3 in EmitNop for 6 bytes. For consistency with GCC, the target label is moved to the brcl itself instead of the next instruction. Review: Ulrich Weigand	2020-01-02 13:21:04 -08:00
Matt Arsenault	ff6a637c9b	AMDGPU/GlobalISel: Correct MMO sizes in some tests These were intended to test non-extloads, but the memory size did not match the result size.	2020-01-02 16:00:46 -05:00
Matt Arsenault	6596ef63a7	AMDGPU/GlobalISel: Regenerate check lines This avoids diff noise in a future commit from the check name change from the G_GEP->G_PTR_ADD rename.	2020-01-02 16:00:45 -05:00
Nemanja Ivanovic	95db3a73f8	[PowerPC] Only legalize FNEARBYINT with unsafe fp math Commit 0f0330a78709 legalized these nodes on PPC without consideration of unsafe math which means that we get inexact exceptions raised for nearbyint. Since this doesn't conform to the standard, switch this legalization to depend on unsafe fp math.	2020-01-02 13:45:54 -06:00
Craig Topper	2dd345a5d8	[X86] Remove FP0-6 operands from call instructions in FPStackifier pass. Only count defs as returns. All FP0-6 operands should be removed by the FP stackifier. By removing these we fix the machine verifier error in PR39437. I've also made it so that only defs are counted for STReturns which removes what I think were extra stack cleanup instructions. And I've removed the regcall assert because it was checking the attributes of the caller, but here we're concerned with the attributes of the callee. But I don't know how to get that information from this level.	2020-01-02 11:10:51 -08:00
Ulrich Weigand	d43411f026	[FPEnv] Default NoFPExcept SDNodeFlag to false The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all other such flags. This is a problem, because it implies that all code that transforms SDNodes without copying flags can introduce a correctness bug, not just a missed optimization. This patch changes the default to false. This makes it necessary to move setting the (No)FPExcept flag for constrained intrinsics from the visitConstrainedIntrinsic routine to the generic visit routine at the place where the other flags are set, or else the intersectFlagsWith call would erase the NoFPExcept flag again. In order to avoid making non-strict FP code worse, whenever SelectionDAGISel::SelectCodeCommon matches on a set of original nodes none of which can raise FP exceptions, it will preserve this property on all result nodes generated, by setting the NoFPExcept flag on those result nodes that would otherwise be considered as raising an FP exception. To check whether or not an SD node should be considered as raising an FP exception, the following logic applies: - For machine nodes, check the mayRaiseFPException property of the underlying MI instruction - For regular nodes, check isStrictFPOpcode - For target nodes, check a newly introduced isTargetStrictFPOpcode The latter is implemented by reserving a range of target opcodes, similarly to how memory opcodes are identified. (Note that there is a bit of a quirk in identifying target nodes that are both memory nodes and strict FP nodes. To simplify the logic, right now all target memory nodes are automatically also considered strict FP nodes -- this could be fixed by adding one more range.) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D71841	2020-01-02 16:59:45 +01:00
David Green	5ca377e901	[ARM] Update ifcvt test target triples and opcodes. NFC Some of the instructions in these tests were technically invalid combinations (using ARM opcodes in Thumb mode, for example). Update the targets and the instructions used to be more correct.	2020-01-02 14:18:54 +00:00
Andrzej Warzynski	f863ea0ba6	[AArch64][SVE] Gather loads: pass 32 bit unpacked offsets as nxv2i32 Summary: Currently 32 bit unpacked offsets are passed as nxv2i64. However, as pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead would improve consistency with: * how other arguments are treated * how scatter stores are implemented This patch makes sure that 32 bit unpacked offsets are passed as nxv2i32 instead of nxv2i64. Reviewers: sdesmalen, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71724	2020-01-02 13:01:28 +00:00
Fangrui Song	524d73508e	[XRay][test] Fix xray-empty-firstmbb.mir and delete incorrect xray-empty-function.mir xray-empty-firstmbb.mir does not test the intended code path. Change xray-instruction-threshold to 0 to exercise the code path. Delete xray-empty-function.mir . Empty MachineFunction does not work. Various passes (e.g. MachineDominatorTree) assume the presence of an entry block.	2020-01-01 22:21:11 -08:00
Craig Topper	e7e2dd9f70	[X86] Add x86_regcallcc calling convention to function declaration recently added in a test. The callsite had the calling convention, but not the function itself.	2020-01-01 19:07:37 -08:00
Craig Topper	88590b7c02	[X86] Add test cases for regcall function that takes a long double as a parameter, but does not return a long double. I believe we are incorrectly doing some FP stack manipulations after the call.	2020-01-01 18:53:12 -08:00
Craig Topper	0d150f0e68	[X86] Call SimplifyMultipleUseDemandedBits from combineVSelectToBLENDV if the condition is used by something other than select conditions. We might be able to bypass some nodes on the condition path. Differential Revision: https://reviews.llvm.org/D71984	2020-01-01 11:16:52 -08:00
David Green	d7d2f17739	[ARM] Add +mve feature to mve tests. NFC	2020-01-01 17:25:20 +00:00
Liu, Chen3	b8ff856aff	add strict float for round operation Differential Revision: https://reviews.llvm.org/D72026	2020-01-01 20:42:12 +08:00
Fangrui Song	fcca6780a8	[MC][TargetMachine] Delete MCTargetOptions::MCPIECopyRelocations clang/lib/CodeGen/CodeGenModule performs the -mpie-copy-relocations check and sets dso_local on applicable global variables. We don't need to duplicate the work in TargetMachine shouldAssumeDSOLocal. Verified that -mpie-copy-relocations can still emit PC relative relocations for external variable accesses. clang -target x86_64 -fpie -mpie-copy-relocations -c => R_X86_64_PC32 clang -target aarch64 -fpie -mpie-copy-relocations -c => R_AARCH64_ADR_PREL_PG_HI21+R_AARCH64_LDST64_ABS_LO12_NC	2020-01-01 00:50:18 -08:00
Craig Topper	c7da2d97ef	[X86] Add X87 FCMOV support to X86FlagsCopyLowering. Fixes PR44396	2019-12-31 20:35:21 -08:00
Matt Arsenault	94bbcd967d	DAG: Stop trying to fold FP -(x-y) -> y-x in getNode with nsz This was increasing the number of instructions when fsub was legalized on AMDGPU with no signed zeros enabled. This fold should be guarded by hasOneUse, and I don't think getNode should be doing that. The same fold is already done as a regular combine through isNegatibleForFree. This does require duplicating, even though isNegatibleForFree does this combine already (and properly checks hasOneUse) to avoid one PPC regression. In the regression, the outer fneg has nsz but the fsub operand does not. isNegatibleForFree only sees the operand, and doesn't see it's used from a nsz context. A nsz parameter needs to be added and threaded through isNegatibleForFree to avoid this.	2019-12-31 22:49:51 -05:00
Craig Topper	3d13b6dc72	[X86] Constant fold KSHIFT of an all zeros vector to just an all zeros vector.	2019-12-31 15:57:39 -08:00
Craig Topper	fad67910ab	[X86] Use carry flag from add for (seteq (add X, -1), -1). If we just subtracted 1 and are checking if the result is -1. We can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest. Fixes one case from PR44412 Differential Revision: https://reviews.llvm.org/D72019	2019-12-31 15:05:23 -08:00
Matt Arsenault	6ce33119b3	AMDGPU: Precommit test showing extra instructions are introduced	2019-12-31 14:54:57 -05:00
Michael Liao	e0db31d08e	[amdgpu] Fix scoreboard updating on `s_waitcnt_vscnt`. Summary: - Other counters are accidentally cleared. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71866	2019-12-31 14:20:30 -05:00
Craig Topper	dd14b23f4e	[X86] Add test case for opposite branch condition for PR44412. NFC	2019-12-31 10:58:04 -08:00
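
Several of the commit messages above state arithmetic identities: the `A - (A & B)` -> `A & (~B)` and `~(X - 1)` -> `0 - X` folds (PR44448), the align-down pattern, the 2.0^52 trick for uint_to_fp, and the carry-flag observation for `(seteq (add X, -1), -1)`. The following is a minimal sketch that checks those identities on plain integers. It is not LLVM code: the function names are hypothetical, 32-bit wraparound is modelled with a mask (the align-down commit itself uses i64), and the scalar `u32_to_f64` only illustrates the bit trick, not the vector lowering.

```python
import struct

MASK32 = 0xFFFFFFFF  # assumption: model i32 wraparound arithmetic with a mask


def sub32(a, b):
    """32-bit wrapping subtraction."""
    return (a - b) & MASK32


def not32(a):
    """32-bit bitwise NOT."""
    return ~a & MASK32


def fold_sub_and(x, y):
    """X - (X & Y) -> X & ~Y; returns (before, after)."""
    return sub32(x, x & y), x & not32(y)


def fold_not_dec(x):
    """~(X - 1) -> 0 - X; returns (before, after)."""
    return not32(sub32(x, 1)), sub32(0, x)


def align_down(ptr, alignment):
    """ptr - (ptr & (alignment - 1)) -> ptr & (0 - alignment)."""
    before = sub32(ptr, ptr & sub32(alignment, 1))
    after = ptr & sub32(0, alignment)
    return before, after


def u32_to_f64(v):
    """OR the zero-extended value into the bits of 2.0^52, then subtract 2.0^52."""
    bits = 0x4330000000000000 | (v & MASK32)  # 0x433... is the encoding of 2.0^52
    as_double = struct.unpack("<d", struct.pack("<Q", bits))[0]
    return as_double - 4503599627370496.0  # 2.0^52


def add_minus_one(x):
    """Return (x + -1 as i32, carry-out of the unsigned add)."""
    total = (x & MASK32) + MASK32
    return total & MASK32, total > MASK32


if __name__ == "__main__":
    for x in (0, 1, 0x1237, 0x80000000, MASK32):
        for y in (0, 1, 16, 0x0F0F0F0F, MASK32):
            assert fold_sub_and(x, y)[0] == fold_sub_and(x, y)[1]
            assert align_down(x, y)[0] == align_down(x, y)[1]
        assert fold_not_dec(x)[0] == fold_not_dec(x)[1]
        assert u32_to_f64(x) == float(x)
        result, carry = add_minus_one(x)
        # (X + -1) == -1 exactly when the add produced no carry, i.e. X == 0.
        assert (result == MASK32) == (not carry)
```

The align-down identity is the composition of the other two folds, which is why the check passes for any `alignment` value in this model, not only powers of two.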


31898 Commits