mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

208661 Commits

Author SHA1 Message Date
Kazushi (Jam) Marukawa
a7e64248cb [VE] Support copy of vector mask registers
Support VM and VMP registers in the copyPhysReg() function.  Also add
regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D93547
2020-12-19 09:16:43 +09:00
Harald van Dijk
87ab7fb0ff [X86] Avoid generating invalid R_X86_64_GOTPCRELX relocations
We need to make sure not to emit R_X86_64_GOTPCRELX relocations for
instructions that use a REX prefix. If a REX prefix is present, we need to
instead use an R_X86_64_REX_GOTPCRELX relocation. The existing logic for
CALL64m, JMP64m, etc. already handles this by checking the HasREX parameter
and using it to determine which relocation type to use. Do this for all
instructions that can use relaxed relocations.
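A minimal sketch of the relocation choice this enforces (illustrative C++; the helper name is an assumption, not the emitter's actual code):

```
#include "llvm/BinaryFormat/ELF.h"

// Pick the relaxed GOT relocation from the REX prefix, mirroring the
// HasREX check described above.
static unsigned pickRelaxedGOTReloc(bool HasREX) {
  return HasREX ? llvm::ELF::R_X86_64_REX_GOTPCRELX
                : llvm::ELF::R_X86_64_GOTPCRELX;
}
```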

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D93561
2020-12-18 23:38:38 +00:00
Fraser Cormack
39e945b403 [RISCV] Address clang-tidy warnings in RISCVTargetMachine. NFC. 2020-12-18 21:50:55 +00:00
Sanjay Patel
54d3cfb2b0 [SLP] fix typo; NFC 2020-12-18 16:55:52 -05:00
Fraser Cormack
bae3d21c4e [RISCV] Assume no-op addrspacecasts by default
To support OpenCL, which typically uses SPIR as an IR, non-zero address
spaces must be accounted for. This patch makes the RISC-V target assume
no-op address space casts across the board, which effectively removes
the need to support addrspacecast instructions in the backend.

For a RISC-V implementation with different configurations or specialized
address spaces where casts aren't no-ops, the function can be adjusted
as required.
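The hook involved is TargetMachine::isNoopAddrSpaceCast; a minimal sketch of the override this describes (the exact definition site in the RISC-V backend is assumed):

```
// In RISCVTargetMachine (sketch): treat every address space cast as a
// no-op. Configurations with specialized address spaces would refine this.
bool RISCVTargetMachine::isNoopAddrSpaceCast(unsigned SrcAS,
                                             unsigned DstAS) const {
  return true;
}
```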

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D93536
2020-12-18 21:03:37 +00:00
Chih-Ping Chen
440c869a50 Rename files with the same (case-insensitive) name
Patch by: Aditya Kumar.

Differential Revision: https://reviews.llvm.org/D93559
2020-12-18 16:01:37 -05:00
Craig Topper
590b88c6f9 [RISCV] Add intrinsics for the vsetvli instruction
This patch adds two IR intrinsics for the vsetvli instruction: one to set the vector length to a user-specified value and one to set it to vlmax. The vlmax form uses the X0 source register encoding.

Clang builtins will follow in a separate patch.

Differential Revision: https://reviews.llvm.org/D92973
2020-12-18 12:10:09 -08:00
Fangrui Song
b37b65492b [TableGen] Fix non-determinism introduced by D90844 due to iteration over a std::map keyed by allocated object pointers
993eaf2d69d8beb97e4695cbd919b927ed1cfe86 (D90844) is still wrong.
The allocated const Record* pointers do not have an order guarantee
so switching from DenseMap to std::map does not help.

ProcModelMapTy = std::map<const Record*, unsigned>

Sort the values instead.
2020-12-18 12:08:16 -08:00
Nikita Popov
8ca3523a4f [InstCombine] Regenerate test checks (NFC) 2020-12-18 20:55:26 +01:00
Craig Topper
c0c9269ef6 [RISCV] Sign extend constant arguments to V intrinsics when promoting to XLen.
The default behavior for any_extend of a constant is to zero extend.
This occurs inside of getNode, rather than allowing type legalization
to promote the constant, which would sign extend. By using sign extend
with getNode, the constant will be sign extended instead. This gives a
better chance for isel to find a simm5 immediate, since all xlen bits
are examined there.
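A sketch of the distinction in SelectionDAG terms (the helper name and placement are assumptions, not the patch's exact code):

```
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// ANY_EXTEND of a constant folds to a zero extension inside getNode, so
// request SIGN_EXTEND explicitly when promoting constant operands to XLen.
static SDValue promoteScalarToXLen(SelectionDAG &DAG, const SDLoc &DL,
                                   SDValue ScalarOp, MVT XLenVT) {
  unsigned ExtOpc =
      isa<ConstantSDNode>(ScalarOp) ? ISD::SIGN_EXTEND : ISD::ANY_EXTEND;
  return DAG.getNode(ExtOpc, DL, XLenVT, ScalarOp);
}
```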

For instructions that use a uimm5 immediate, this change only affects
constants >= 128 for i8 or >= 32768 for i16. Constants that large
already wouldn't have been eligible for uimm5 and would need to use a
scalar register.

If the instruction isn't able to use simm5 or the immediate is
too large, we'll need to materialize the immediate in a register.
As far as I know constants with all 1s in the upper bits should
materialize as well or better than all 0s.

Longer term we should probably have a SEW aware PatFrag to ignore
the bits above SEW before checking simm5.

I updated about half the test cases in some tests to use a negative
constant to get coverage for this.

Reviewed By: evandro

Differential Revision: https://reviews.llvm.org/D93487
2020-12-18 11:43:38 -08:00
Nikita Popov
1f7ac02082 [DSE] Use correct memory location for read clobber check
MSSA DSE starts at a killing store, finds an earlier store and
then checks that the earlier store is not read along any paths
(without being killed first). However, it uses the memory location
of the killing store for that, not the earlier store that we're
attempting to eliminate.

This has a number of problems:

* Mismatches between what BasicAA considers aliasing and what DSE
  considers an overwrite (even though both are correct in isolation)
  can result in miscompiles. This is PR48279, which D92045 tries to
  fix in a different way. The problem is that we're using a location
  from a store that is potentially not executed and thus may be UB,
  in which case analysis results can be arbitrary.
* Metadata on the killing store may be used to determine aliasing,
  but there is no guarantee that the metadata is valid, as the specific
  killing store may not be executed. Using the metadata on the earlier
  store is valid (it is the store we're removing, so on any execution
  where its removal may be observed, it must be executed).
* The location is imprecise. For full overwrites the killing store
  will always have a location that is larger or equal than the earlier
  access location, so it's beneficial to use the earlier access
  location. This is not the case for partial overwrites, in which
  case either location might be smaller. There is some room for
  improvement here.

Using the earlier access location means that we can no longer cache
which accesses are read for a given killing store, as we may be
querying different locations. However, it turns out that simply
dropping the cache has no notable impact on compile-time.
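In code terms, the query location now derives from the store being eliminated (a sketch; DSE's walk and caching are elided, and the helper name is illustrative):

```
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Build the read-clobber query location from the earlier (to-be-removed)
// store rather than the killing store.
MemoryLocation locForReadClobberCheck(StoreInst *KillingStore,
                                      StoreInst *EarlierStore) {
  (void)KillingStore; // old behavior: MemoryLocation::get(KillingStore)
  return MemoryLocation::get(EarlierStore);
}
```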

Differential Revision: https://reviews.llvm.org/D93523
2020-12-18 20:26:53 +01:00
Craig Topper
d25223344d Recommit "[RISCV] Add intrinsics for vfmv.f.s and vfmv.s.f"
This time with tests.

Original message:
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.

Differential Revision: https://reviews.llvm.org/D93426
2020-12-18 11:19:05 -08:00
Craig Topper
1444279745 Revert "[RISCV] Add intrinsics for vfmv.f.s and vfmv.s.f"
This reverts commit 46a40c4bc10671ebddb45fabd1a3b0b419a58109.

I forgot to git add the tests.
2020-12-18 11:16:36 -08:00
Craig Topper
54bda5464f [RISCV] Add intrinsics for vfmv.f.s and vfmv.s.f
Similar to D93365, but for floating point. No need for special ISD opcodes
though. We can directly isel these from intrinsics. I had to use anyfloat_ty
instead of anyvector_ty in the intrinsics to make LLVMVectorElementType not
crash when imported into the -gen-dag-isel tablegen backend.

Differential Revision: https://reviews.llvm.org/D93426
2020-12-18 11:11:15 -08:00
Roman Lebedev
7940e6d2c7 [NFC][InstCombine] Fixup check lines for prof md in select_meta.ll test 2020-12-18 21:33:30 +03:00
Craig Topper
73b2cc4257 [RISCV] Add intrinsics for vmv.x.s and vmv.s.x
This adds intrinsics for vmv.x.s and vmv.s.x.

I've used stricter type constraints on these intrinsics than what we've been doing on the arithmetic intrinsics so far. This will allow us to not need to pass the scalar type to the Intrinsic::getDeclaration call when creating these intrinsics.
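For instance, instantiating the intrinsic then needs only the vector type (a sketch assuming the riscv_vmv_x_s intrinsic ID added by this patch):

```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsRISCV.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Only the vector type is passed; the scalar result type is implied by
// the intrinsic's element-type constraint rather than a second overload.
Function *declareVMvXS(Module &M, VectorType *VecTy) {
  return Intrinsic::getDeclaration(&M, Intrinsic::riscv_vmv_x_s, {VecTy});
}
```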

A custom ISD is used for vmv.x.s in order to implement the change in computeNumSignBitsForTargetNode which can remove sign extends on the result.

I also modified the MC layer description of these instructions to show the tied source/dest operand. This is different than what we do for masked instructions, where we drop the tied source operand when converting to MC, but it is a more accurate description of the instruction.

We can't do this for masked instructions since we use the same MC instruction for masked and unmasked. Tools like llvm-mca operate in the MC layer and rely on ins/outs and Uses/Defs for analysis, so I don't know if we'll be able to maintain the current behavior for masked instructions. I went with the accurate description here since it was easy.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D93365
2020-12-18 10:30:48 -08:00
Kazu Hirata
42dd0b9079 [GVNHoist] Remove successorDominate (NFC)
The function was introduced on Aug 25, 2016 in commit
5f0d0e60d11b8d2e48aacf31a82762280f9a8712.

Its last use was removed on Sep 13, 2017 in commit
dfa8741c9693c344477c842a25ee0cb6a6f59fcd.
2020-12-18 10:29:52 -08:00
Roman Lebedev
a60037d732 [InstCombine] Canonicalize SPF to abs intrinsic
This patch enables canonicalization of SPF_ABS and SPF_NABS
to the abs intrinsic.

This is a recommit, the original try was
05d4c4ebc2fb006b8a2bd05b24c6aba10dd2eef8,
but it was reverted due to an apparent miscompile,
which since then has just been fixed by the previous commit.

Differential Revision: https://reviews.llvm.org/D87188
2020-12-18 21:18:14 +03:00
Roman Lebedev
00fc2f7164 [InstSimplify] Don't miscompile X == 0 ? abs(X) : -abs(X) --> -abs(X) xform
The transform wasn't checking that the LHS of the comparison
*is* the `X` in question...
This is the miscompile that was holding up D87188.

Thanks to Dave Green for producing an actionable reproducer!
2020-12-18 21:18:13 +03:00
Roman Lebedev
15ac6117c4 [NFC][InstSimplify] Add miscompiled testcase from D87188/D87197
Thanks to Dave Green for producing an actionable reproducer!
It is (obviously) a miscompile:
```
----------------------------------------
define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) {
%0:
  %abs = abs i32 %x, 0
  %neg = sub i32 0, %abs
  %cmp = icmp eq i32 %y, 0
  %sel = select i1 %cmp, i32 %neg, i32 %abs
  ret i32 %sel
}
=>
define i32 @select_abs_of_abs_eq_wrong(i32 %x, i32 %y) {
%0:
  %abs = abs i32 %x, 0
  ret i32 %abs
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i32 %x = #xe0000000 (3758096384, -536870912)
i32 %y = #x00000000 (0)

Source:
i32 %abs = #x20000000 (536870912)
i32 %neg = #xe0000000 (3758096384, -536870912)
i1 %cmp = #x1 (1)
i32 %sel = #xe0000000 (3758096384, -536870912)

Target:
i32 %abs = #x20000000 (536870912)
Source value: #xe0000000 (3758096384, -536870912)
Target value: #x20000000 (536870912)

Alive2: Transform doesn't verify!

```
2020-12-18 21:18:13 +03:00
Chih-Ping Chen
c44b393235 [DebugInfo] Support Fortran 'use <external module>' statement.
The main change is to add an 'IsDecl' field to DIModule so
that when IsDecl is set to true, the debug info entry generated
for the module would be marked as a declaration. That way, the debugger
would look up the definition of the module in the global scope.

Please see the comments in llvm/test/DebugInfo/X86/dimodule.ll
for what the debug info entries would look like.

Differential Revision: https://reviews.llvm.org/D93462
2020-12-18 13:10:57 -05:00
diggerlin
6c0d365003 [AIX] Change the code based on https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201214/864235.html
Summary:

Change the code based on the discussion at:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20201214/864235.html
2020-12-18 13:02:41 -05:00
Florian Hahn
b0f7329005 Revert "[BasicAA] Handle two unknown sizes for GEPs"
Temporarily revert commit 8b1c4e310c2f9686cad925ad81d8e2be10a1ef3c.

After 8b1c4e310c2f the compile-time for `MultiSource/Benchmarks/MiBench/consumer-lame`
dramatically increases with -O3 & LTO, causing issues for builders with
that configuration.

I filed PR48553 with a smallish reproducer that shows a 10-100x compile
time increase.
2020-12-18 17:59:12 +00:00
Craig Topper
fe0fb08c2f [RISCV] Add intrinsics for vmv.v.v, vmv.v.x, and vmv.x.i
We worked with @rogfer01 from BSC to come up with this patch.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Craig Topper <craig.topper@sifive.com>

Differential Revision: https://reviews.llvm.org/D93514
2020-12-18 09:49:07 -08:00
Kevin P. Neal
657c816bfd Revert "Revert "[FPEnv] Teach the IRBuilder about invoke's correct use of the strictfp attribute.""
Similar to D69312, and documented in D69839, the IRBuilder needs to add
the strictfp attribute to invoke instructions when constrained floating
point is enabled.
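A sketch of the extended behavior, reusing IRBuilder's existing constrained-FP helper (the wrapper function here is illustrative, not the patch itself):

```
#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Mirror what CreateCall already does: when the builder is in constrained
// FP mode, tag the newly created invoke with the strictfp attribute.
InvokeInst *createInvokeMaybeStrictFP(IRBuilder<> &B, FunctionCallee Callee,
                                      BasicBlock *NormalDest,
                                      BasicBlock *UnwindDest,
                                      ArrayRef<Value *> Args) {
  InvokeInst *I = B.CreateInvoke(Callee, NormalDest, UnwindDest, Args);
  if (B.getIsFPConstrained())
    B.setConstrainedFPCallAttr(I); // adds the strictfp attribute
  return I;
}
```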

This is try 2, with the test corrected.

Differential Revision: https://reviews.llvm.org/D93134
2020-12-18 12:42:06 -05:00
Whitney Tsang
0ac56aa46f Ensure SplitEdge returns the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue that occurred during SplitEdge(BB, Succ, ...), inside
splitBlockBefore. The issue occurs in SplitEdge when Succ has a single
predecessor and the edge between BB and Succ is not critical; this
produced the result ‘BB->Succ->New’. The new function
splitBasicBlockBefore is called from splitBlockBefore to handle this
case and now produces the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
  br bb1
bb1:
  %0 = mul i32 1, 2
  br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
  br bb1
bb1:
  br bb1.split
bb1.split:
  %0 = mul i32 1, 2
  br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
  br bb1.split
bb1.split:
  br bb1
bb1:
  %0 = mul i32 1, 2
  br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-18 17:37:17 +00:00
Kazu Hirata
7b36fe3c9b [MCA, ExecutionEngine, Object] Use llvm::is_contained (NFC) 2020-12-18 09:09:04 -08:00
Craig Blackmore
afe3fa52eb [RegisterScavenging] Fix assert in scavengeRegisterBackwards
According to the documentation, if a spill is required to make a
register available and AllowSpill is false, then NoRegister should be
returned; however, this scenario was actually triggering an assertion
failure.

This patch moves the assertion after the handling of AllowSpill.
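In shape, the fix reorders the bail-out and the assertion (a sketch with illustrative names, not the scavenger's exact code):

```
#include "llvm/CodeGen/Register.h"
#include <cassert>
using namespace llvm;

// Bail out with NoRegister before asserting when a spill would be needed
// but the caller disallowed spilling.
Register scavengeOrBail(Register Candidate, bool SpillRequired,
                        bool AllowSpill) {
  if (SpillRequired && !AllowSpill)
    return Register(); // NoRegister, as the documentation promises
  assert(Candidate.isValid() && "Invalid scavenged register!");
  return Candidate;
}
```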

Authored by: Lewis Revill

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92104
2020-12-18 16:57:05 +00:00
Arnamoy Bhattacharyya
8015b0c713 [SROA] Remove Dead Instructions while creating speculative instructions
The SROA pass tries to be lazy about removing dead instructions, collecting them in the DeadInsts list during its iterative run.  However, it does not remove instructions from that list when eraseFromParent() is called on them.

This causes (rare) null pointer dereferences.  For example, in the speculatePHINodeLoads() function, in the following code snippet:

```
   while (!PN.use_empty()) {
     LoadInst *LI = cast<LoadInst>(PN.user_back());
     LI->replaceAllUsesWith(NewPN);
     LI->eraseFromParent();
   }
```

If the Load instruction LI belongs to the DeadInsts list, it should be removed when eraseFromParent() is called.  However, the bug does not show up in most cases because, immediately afterwards in the same function, a new LoadInst is created on the following line:

```
LoadInst *Load = PredBuilder.CreateAlignedLoad(
         LoadTy, InVal, Alignment,
         (PN.getName() + ".sroa.speculate.load." + Pred->getName()));
```

This new LoadInst object often ends up at the same memory address as the LI just deleted via eraseFromParent(), in which case the bug does not materialize.  In very rare cases the addresses differ, leaving a dangling pointer in DeadInsts and causing a crash.
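Applied to the snippet above, the fix's idea looks roughly like this (removeFromDeadInsts is a hypothetical stand-in for SROA's actual DeadInsts bookkeeping):

```
while (!PN.use_empty()) {
  LoadInst *LI = cast<LoadInst>(PN.user_back());
  LI->replaceAllUsesWith(NewPN);
  removeFromDeadInsts(LI); // hypothetical: purge LI from DeadInsts first
  LI->eraseFromParent();   // no dangling pointer remains in the list
}
```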

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D92431
2020-12-18 11:47:02 -05:00
David Green
9ee7075ca5 [ARM] Match dual lane vmovs from insert_vector_elt
MVE has a dual lane vector move instruction, capable of moving two
general purpose registers into lanes of a vector register. They look
like one of:
  vmov q0[2], q0[0], r2, r0
  vmov q0[3], q0[1], r3, r1
They only accept these lane indices though (and only insert into an
i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector
insert elements. Because the insert_elements are known to be
canonicalized to ascending order, there are several patterns that we
need to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

With the top one being the most common. All other potential patterns of
lane indices will be matched by a combination of these and the
individual vmov pattern already present. This does mean that we are
selecting several machine instructions at once due to the need to
re-arrange the inserts, but in this case there is nothing else that will
attempt to match an insert_vector_elt node.

This is a recommit of 6cc3d80a84884a79967fffa4596c14001b8ba8a3 after
fixing the backward instruction definitions.
2020-12-18 16:13:08 +00:00
Xun Li
d42c14b69b Cleanup coro-inline.ll
Following up on the comments in D92706.
- Use -passes instead of -enable-new-pm
- CoroEarly should happen before AlwaysInliner, adjust it.
- Remove some unnecessary barriers (still kept one)
- Cleanup unnecessary debug info

Differential Revision: https://reviews.llvm.org/D93342
2020-12-18 08:05:04 -08:00
Matt Arsenault
59531454e7 PEI: Only call updateLiveness once per function
This only needs to be called once for the function, and it visits all
the necessary blocks in the function. It looks like
631f6b888c50276450fee8b9ef129f37f83fc5a1 accidentally moved this into
the loop over all save blocks.
2020-12-18 11:02:28 -05:00
Simon Pilgrim
992f58385f [X86] Avoid std::string creation in RecognizableInstr constructor. NFCI.
The value names in byteFromRec calls are compile-time constants - just create the StringRef directly instead of via std::string.
2020-12-18 16:00:41 +00:00
Lucas Prates
b0acef6ebb [AArch64] Updating .arch_extension negative tests
This updates the negative tests for the `.arch_extension` asm directive
to properly enable the extensions being tested on the llvm-mc command
line before validating that the directive correctly disables them.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D93538
2020-12-18 15:57:11 +00:00
Lucas Prates
8fe181f795 [AArch64] Add support for ls64 to the .arch_extension asm directive
This adds support for the 'ls64' AArch64 extension to the `.arch_extension`
asm directive.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92574
2020-12-18 15:55:55 +00:00
Simon Pilgrim
9361873ea6 [X86][AVX] Remove X86ISD::SUBV_BROADCAST (PR38969)
Followup to D92645 - remove the remaining places where we create X86ISD::SUBV_BROADCAST, and fold splatted vector loads to X86ISD::SUBV_BROADCAST_LOAD instead.

Remove all the X86SubVBroadcast isel patterns, including all the fallbacks for if memory folding failed.
2020-12-18 15:49:53 +00:00
Sanjay Patel
e5b3f1f802 [VectorCombine] allow peeking through GEPs when creating a vector load
This is an enhancement motivated by https://llvm.org/PR16739
(see D92858 for another).

We can look through a GEP to find a base pointer that may be
safe to use for a vector load. If so, then we shuffle (shift)
the necessary vector element over to index 0.

Alive2 proof based on 1 of the regression tests:
https://alive2.llvm.org/ce/z/yPJLkh

The vector translation is independent of endian (verify by
changing to leading 'E' in the datalayout string).

Differential Revision: https://reviews.llvm.org/D93229
2020-12-18 09:25:03 -05:00
Georgii Rymar
6ae506ec2d [libObject, llvm-readobj] - Reimplement ELFFile<ELFT>::getEntry.
Currently, `ELFFile<ELFT>::getEntry` does not validate the index of
an entry. Because of that, the code might silently read past the end of
the symbol table. I've added a test to `llvm-readobj\ELF\relocations.test`
to demonstrate the possible issue. Also, I've added a unit test for
this method.

After this change, `getEntry` stops reporting the section index and
reuses the `getSectionContentsAsArray` method, which already has
all the validation needed. As a result, related warnings now often
provide more and better context.
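A sketch of the validated lookup (simplified; the error message is illustrative):

```
template <class ELFT>
template <typename T>
Expected<const T *> ELFFile<ELFT>::getEntry(const Elf_Shdr &Section,
                                            uint32_t Entry) const {
  Expected<ArrayRef<T>> EntriesOrErr = getSectionContentsAsArray<T>(Section);
  if (!EntriesOrErr)
    return EntriesOrErr.takeError();
  ArrayRef<T> Entries = *EntriesOrErr;
  if (Entry >= Entries.size())
    return createError("index is greater than the number of entries");
  return &Entries[Entry];
}
```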

Differential Revision: https://reviews.llvm.org/D93209
2020-12-18 16:52:27 +03:00
David Green
8cf30d1ea8 Revert "[ARM] Match dual lane vmovs from insert_vector_elt"
This one needed more testing.
2020-12-18 13:33:40 +00:00
Tomas Matheson
34495b4840 [AArch64] Fix Copy Elemination for negative values
Redundant Copy Elimination was eliminating a MOVi32imm -1 when it
determined that the value of the destination register is already -1.
However, it didn't take into account that the MOVi32imm zeroes the upper
32 bits: a 64-bit register holding -1 has 0xFFFFFFFF in its upper half,
while MOVi32imm -1 would clear those bits, so the move cannot be
eliminated.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D93100
2020-12-18 13:30:46 +00:00
Paul Walker
5ba58d248d [NFC][SVE] Clean up bfloat isel patterns that emit non-bfloat instructions.
During isel there's no need to protect illegal types. Patch also
adds a missing unit test for tbl2 intrinsic using bfloat types.

Differential Revision: https://reviews.llvm.org/D93404
2020-12-18 13:20:41 +00:00
LLVM GN Syncbot
594960eb50 [gn build] Port e69e551e0e5 2020-12-18 13:00:09 +00:00
Kerry McLaughlin
a9d5f92f0d [SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter
This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate
addressing modes for scalable masked gathers & scatters.

selectGatherScatterAddrMode checks if the base pointer is null, in which case
we can swap the base pointer and the index, e.g.
     getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices)
  -> getelementptr %offset, <vscale x N x T> %indices

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93132
2020-12-18 11:56:36 +00:00
Simon Pilgrim
28cf5e1f81 [X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling.
Simplifies a few more cases, notably shuffle demanded elts cases.
2020-12-18 11:51:10 +00:00
Carl Ritson
b988d0d807 [AMDGPU][NFC] Remove unused Hi16Elt definition 2020-12-18 20:38:54 +09:00
Lucas Prates
bf2fbafd5f [AArch64] Add support for the SPE-EEF feature
This is an addition to the existing Statistical Profiling extension, which
introduces an extra system register that is enabled by the new 'spe-eef'
subtarget feature.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92391
2020-12-18 11:11:56 +00:00
Lucas Prates
60c2a88e72 [AArch64] Add support for the Branch Record Buffer extension
This introduces asm support for the Branch Record Buffer extension, through
the new 'brbe' subtarget feature. It consists of a new set of system registers
that enable the handling of branch records.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92389
2020-12-18 11:11:06 +00:00
Carl Ritson
d466c5273e [AMDGPU][NFC] Document high parameter of f16 interp intrinsics 2020-12-18 19:59:13 +09:00
Cullen Rhodes
b434082b2a [TTI] Add supportsScalableVectors target hook
This is split off from D91718 and adds a new target hook
supportsScalableVectors that can be queried to check if scalable vectors
are supported by the backend. For AArch64 this returns true if SVE is
enabled.
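A sketch of the hook's default and the AArch64 override described above:

```
// TargetTransformInfo default (sketch): no scalable-vector support.
bool supportsScalableVectors() const { return false; }

// AArch64 override: scalable vectors are available when SVE is enabled.
bool AArch64TTIImpl::supportsScalableVectors() const {
  return ST->hasSVE();
}
```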

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93060
2020-12-18 10:37:01 +00:00
Bjorn Pettersson
3774e2781f Add intrinsics for saturating float to int casts
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ

The intrinsics have overloaded source and result type and support
vector operands:

    i32 @llvm.fptoui.sat.i32.f32(float %f)
    i100 @llvm.fptoui.sat.i100.f64(double %f)
    <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
    // etc

On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:

    i19 @llvm.fptosi.sat.i19.f32(float %f)
    // builds
    i19 fp_to_sint_sat f, 19
    // type legalizes (through integer result promotion)
    i32 fp_to_sint_sat f, 19

I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:

    f = fmaxnum f, FP(MIN)
    f = fminnum f, FP(MAX)
    i = fptoxi f
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something
like this instead:

    i = fptoxi f
    i = select f ult FP(MIN), MIN, i
    i = select f ogt FP(MAX), MAX, i
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.

Original patch by @nikic and @ebevhan (based on D54696).

Differential Revision: https://reviews.llvm.org/D54749
2020-12-18 11:09:41 +01:00