While we indeed can't treat them as no-ops, I believe we can and should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint` it seems straightforward
to model it just as a zext-or-trunc of unknown.
This may be important now that we are working towards
making inttoptr/ptrtoint casts not be no-ops,
and towards preventing them from being folded into loads/etc.
(see D88979/D88789/D88788)
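For illustration, the zext-or-trunc choice boils down to roughly the
following sketch (this is essentially what getTruncateOrZeroExtend already
does; the helper name is mine, and it assumes the unknown `U` has already
been given the pointer's integer index type, which is the part the actual
patch has to take care of):

  #include "llvm/Analysis/ScalarEvolution.h"
  using namespace llvm;

  // Model ptrtoint as trunc(U), zext(U) or U itself, where U is an
  // integer-typed unknown standing in for the pointer's bits.
  static const SCEV *zextOrTruncOfUnknown(ScalarEvolution &SE, const SCEV *U,
                                          Type *DstIntTy) {
    uint64_t SrcBits = SE.getTypeSizeInBits(U->getType());
    uint64_t DstBits = DstIntTy->getIntegerBitWidth();
    if (DstBits < SrcBits)
      return SE.getTruncateExpr(U, DstIntTy);
    if (DstBits > SrcBits)
      return SE.getZeroExtendExpr(U, DstIntTy);
    return U;
  }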
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D88806
I have introduced a new template PolySize class, where the template
parameter determines the type of quantity, i.e. for an element
count this is just an unsigned value. The ElementCount class is
now just a simple derivation of PolySize<unsigned>, whereas TypeSize
is more complicated because it still needs to contain the uint64_t
cast operator, since there are still many places in the code that
rely upon this implicit cast. As such the class also still needs
some of its own operators.
I've tried to minimise the amount of code in the base PolySize
class, which led to a couple of changes:
1. In some places we were relying on '==' operator comparisons
between ElementCounts and the scalar value 1. I didn't put this
operator in the new PolySize class, and thought it was actually
clearer to use the isScalar() function instead.
2. I removed the isByteSized function and replaced it with calls
to isKnownMultipleOf(8).
I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so
that it's more consistent with coefficientDivideBy.
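For reference, a rough sketch of the resulting shape (names beyond the ones
mentioned above are illustrative; the real classes carry more operators and
more of the scalable-vector handling):

  #include <cassert>

  // The quantity is MinVal when !IsScalable, and MinVal * vscale (a runtime
  // multiple, unknown at compile time) when IsScalable.
  template <typename T> class PolySize {
    T MinVal;
    bool IsScalable;

  public:
    constexpr PolySize(T MinVal, bool IsScalable)
        : MinVal(MinVal), IsScalable(IsScalable) {}
    static constexpr PolySize getFixed(T V) { return PolySize(V, false); }
    static constexpr PolySize getScalable(T V) { return PolySize(V, true); }

    constexpr T getKnownMinValue() const { return MinVal; }
    constexpr bool isScalable() const { return IsScalable; }
    constexpr bool isScalar() const { return !IsScalable && MinVal == 1; }
    constexpr bool isKnownMultipleOf(T RHS) const { return MinVal % RHS == 0; }
    PolySize coefficientDivideBy(T RHS) const {
      assert(MinVal % RHS == 0 && "expected an exact division");
      return PolySize(MinVal / RHS, IsScalable);
    }
  };

  // ElementCount is a simple derivation of PolySize<unsigned>; TypeSize
  // (not shown) additionally keeps the implicit uint64_t cast operator.
  class ElementCount : public PolySize<unsigned> {
  public:
    using PolySize<unsigned>::PolySize;
  };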
Differential Revision: https://reviews.llvm.org/D88409
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.
As we've been establishing (see D88788/D88789), if there is an int<->ptr cast,
it basically must stay as-is; we can't do much with it.
I've looked, and as far as I can tell, the biggest source of new such casts
being introduced is this transform, which, ironically,
tries to reduce the number of casts.
On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% fewer `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.
However, just on RawSpeed, where I know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% fewer `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255),
which is again an increase of 14.34% in total.
To me this does seem like a step in the right direction:
we end up with strictly fewer `IntToPtr`s but strictly more `PtrToInt`s,
which seems like a reasonable trade-off.
See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.
(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)
Reviewed By: nlopes, nikic
Differential Revision: https://reviews.llvm.org/D88979
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail-folded loops. Reductions are
generated with the form:
x = select(mask, vecop, zero)
v = vecreduce.add(x)
c = add chain, v
Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.
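For an integer add reduction, the generated pattern corresponds to roughly
the following IRBuilder sketch (variable and function names here are mine,
not the vectorizer's):

  #include "llvm/IR/Constants.h"
  #include "llvm/IR/IRBuilder.h"
  using namespace llvm;

  // x = select(mask, vecop, zero); v = vecreduce.add(x); c = add chain, v
  static Value *emitMaskedAddReduction(IRBuilder<> &B, Value *Mask,
                                       Value *VecOp, Value *Chain) {
    Value *Zero = Constant::getNullValue(VecOp->getType()); // add identity
    Value *X = B.CreateSelect(Mask, VecOp, Zero);
    Value *V = B.CreateAddReduce(X);
    return B.CreateAdd(Chain, V);
  }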
Most of the code is fairly straightforward, except for the creation of
block masks, which need to be created in dominance order. The
order in which they are added is altered to be after any phis, keeping the
requirements of the underlying IR.
Differential Revision: https://reviews.llvm.org/D84451
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with extra
use that shows we still convert directly to a real "sext" if
possible.
This is my first LLVM patch, so please tell me if there are any process issues.
The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by flipping the sign bit of both inputs and of the output, which turns the unsigned minimum/maximum into a signed one.
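For reference, the identity this relies on, in scalar form (the vector
lowering uses the corresponding saturating-subtract instructions, e.g.
PSUBUSW for v8i16):

  #include <cstdint>

  // usubsat(a, b) == a - min(a, b), so both min and max fall out without
  // touching the sign bits.
  static uint16_t usubsat(uint16_t a, uint16_t b) { return a > b ? a - b : 0; }
  static uint16_t umin16(uint16_t a, uint16_t b) { return a - usubsat(a, b); }
  static uint16_t umax16(uint16_t a, uint16_t b) { return b + usubsat(a, b); }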
We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput, this needs one large move instruction. It's just that the sign-bit flipping has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However, due to the slight regression in the single-use case, this patch no longer proposes this.
Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16, which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra case work could avoid this. However, independent of that, I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweigh the downsides in that specific case.
Patch By: @TomHender (Tom Hender) ActuallyaDeviloper
Differential Revision: https://reviews.llvm.org/D87236
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).
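For reference, the pattern in scalar terms (for 0 < x < bw this is exactly
fshl(a, b, x), and a rotate when a == b):

  #include <cstdint>

  // or(shl(a, x), lshr(b, sub(bw, x))) with bw == 32
  static uint32_t funnelShiftLeft(uint32_t a, uint32_t b, uint32_t x) {
    return (a << x) | (b >> (32 - x));
  }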
Differential Revision: https://reviews.llvm.org/D88783
This exposes the helper for other power-of-2 instcombine folds that I'm intending to add vector support to.
The helper only operated on power-of-2 constants so getExactLogBase2 is a more accurate name.
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.
This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always processed differently than spills, hence it's quite confusing to put them together. It's much cleaner to separate them and process them separately.
Doing so simplifies a lot of code and makes the logic clearer and easier to reason about.
Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap, and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across a suspension point (when this happens, the existing implementation will spill the PHINode, i.e. a stack address, to the heap!).
All these issues suggest that we need to separate spill and alloca in order to properly implement this.
This patch does not yet fix these bugs; however, it sets up the code in a better shape so that we can start fixing them in the next patch.
The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index to its final layout index.
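A rough sketch of the shape of that struct (the concrete member types here
are illustrative; see the patch for the real definitions):

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/Instructions.h"
  #include <cstdint>
  using namespace llvm;

  struct FrameDataInfo {
    // Each spilled definition and the users that need a reload after a
    // suspension point.
    DenseMap<Value *, SmallVector<Instruction *, 2>> Spills;
    // Allocas that must live on the coroutine frame.
    SmallVector<AllocaInst *, 8> Allocas;
    // Every definition mapped to its (eventually final) field index in the
    // frame layout.
    DenseMap<Value *, uint32_t> FieldIndexMap;
  };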
In doing so, I also cleaned up a few things and also discovered a few other bugs.
Cleanups:
1. Found out that PromiseFieldId is not used; deleted it.
2. Previously, SpillInfo was a vector, which was strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contained a list of the Spills that the field corresponds to. This isn't necessary since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.
Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function no longer has them. This means we are dropping some debug information in the ramp function.
The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.
Differential Revision: https://reviews.llvm.org/D88872
The bextri intrinsic has an ImmArg attribute which will be converted
in SelectionDAG using TargetConstant. We previously converted this
to a plain Constant to allow X86ISD::BEXTR to call SimplifyDemandedBits
on it.
But while trying to decide if D89178 was safe, I realized that
this conversion of TargetConstant to Constant would be one case
where that would break.
So this patch adds a new opcode specifically for the immediate case.
And then teaches computeKnownBits and SimplifyDemandedBits to also
handle it, but not try to SimplifyDemandedBits on it. To make up
for that, I immediately masked the constant to 16 bits when
converting from the intrinsic node to the X86ISD node.
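A rough sketch of that masking step (the operand handling, value type and
surrounding code are assumptions, not the exact patch):

  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  // BEXTR's control only encodes a start (bits [7:0]) and a length
  // (bits [15:8]), so masking to 16 bits loses nothing while keeping the
  // immediate simple for the new opcode.
  static SDValue getMaskedBextrImm(SelectionDAG &DAG, SDValue CtrlOp,
                                   const SDLoc &DL) {
    uint64_t Ctrl = cast<ConstantSDNode>(CtrlOp)->getZExtValue();
    return DAG.getTargetConstant(Ctrl & 0xFFFF, DL, CtrlOp.getValueType());
  }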
The following code doesn't compile
uint64_t i = x.load(std::memory_order_relaxed);
return 0;
when CMAKE_C_FLAGS is set to -Werror -Wall (under -Wall the unused
variable `i` produces a warning, which -Werror turns into an error),
thus incorrectly breaking the CMake configuration step:
-- Looking for __atomic_load_8 in atomic
-- Looking for __atomic_load_8 in atomic - not found
CMake Error at cmake/modules/CheckAtomic.cmake:79 (message):
Host compiler appears to require libatomic for 64-bit operations, but
cannot find it.
Call Stack (most recent call first):
cmake/config-ix.cmake:360 (include)
CMakeLists.txt:671 (include)
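A sketch of the kind of check program that stays clean under -Werror -Wall,
assuming the culprit is the unused-variable warning (the actual change may
differ):

  #include <atomic>
  #include <cstdint>

  std::atomic<uint64_t> x(1);

  int main() {
    uint64_t i = x.load(std::memory_order_relaxed);
    (void)i; // reference the value so -Wunused-variable cannot fire
    return 0;
  }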
Resigning from security group as Azul representative as I have left Azul. Previously communicated via email with security group.
Differential Revision: https://reviews.llvm.org/D88933
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
There are a number of places in RDA where we assume the block will not
be empty. This isn't necessarily true for tail-predicated loops where we
have removed instructions. This attempts to make the pass more resilient
to empty blocks, not casting pointers to machine instructions where they
would be invalid.
The test contains a case that was previously failing, but has recently been
hidden on trunk. It contains an empty block to begin with to show a
similar error.
Differential Revision: https://reviews.llvm.org/D88926
This patch adds support for DWARF attribute DW_AT_rank.
Summary:
Fortran assumed rank arrays have dynamic rank. DWARF attribute
DW_AT_rank is needed to support that.
Testing:
unit test cases added (hand-written)
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D89141
MemCpyOpt can hoist stores while combining load+store pairs into memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original program.
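A hedged source-level illustration of the issue (names are made up; the real
transform works on the IR load/store pair):

  struct S { long a[4]; };
  void mayNotReturn(); // may throw or simply never return

  // Merging the load/store pair into a memcpy effectively hoists the store
  // up to where the load was, so it now executes even on paths where
  // mayNotReturn() never returns.
  void copyAroundCall(S *Dst, const S *Src) {
    S Tmp = *Src;   // load
    mayNotReturn();
    *Dst = Tmp;     // store: not guaranteed to execute in the original
  }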
Differential Revision: https://reviews.llvm.org/D89154
Currently we allow passing pointers from the deopt bundle on VRegs only if
they were seen in the list of gc-live pointers passed on VRegs.
This means that for the case of an empty gc-live bundle we spill the deopt
bundle's pointers. This change allows lowering deopt pointers to VRegs
in the case of an empty gc-live bundle. In the case of a non-empty gc-live
bundle, the behavior does not change.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D88999
This patch introduces just enough files for lib/Target/CSKY to compile,
notably a basic CSKYTargetMachine and CSKYTargetInfo.
Differential Revision: https://reviews.llvm.org/D88466