llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Ulrich Weigand	b5b6e8e953	[FPEnv] Constrained FCmp intrinsics This adds support for constrained floating-point comparison intrinsics. Specifically, we add: declare <ty2> @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) declare <ty2> @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>, metadata <condition code>, metadata <exception behavior>) The first variant implements an IEEE "quiet" comparison (i.e. we only get an invalid FP exception if either argument is a SNaN), while the second variant implements an IEEE "signaling" comparison (i.e. we get an invalid FP exception if either argument is any NaN). The condition code is implemented as a metadata string. The same set of predicates as for the fcmp instruction is supported (except for the "true" and "false" predicates). These new intrinsics are mapped by SelectionDAG codegen onto two new ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again representing quiet vs. signaling comparison operations. Otherwise those nodes look like SETCC nodes, with an additional chain argument and result as usual for strict FP nodes. The patch includes support for the common legalization operations for those nodes. The patch also includes full SystemZ back-end support for the new ISD nodes, mapping them to all available SystemZ instructions to fully implement strict semantics (scalar and vector). Differential Revision: https://reviews.llvm.org/D69281	2019-12-07 11:28:39 +01:00
LLVM GN Syncbot	3f8cd19ea3	gn build: Merge e60b36cf92e	2019-12-07 08:57:51 +00:00
Florian Hahn	c0a866cb87	[VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC). The file is intended to gather various VPlan transformations, not only CFG related transforms. Actually, the only transformation there is not CFG related. Reviewers: Ayal, gilr, hsaito, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D70732	2019-12-07 08:56:35 +00:00
Kai Luo	485e7e7049	[PowerPC] Fix MI peephole optimization for splats Summary: This patch fixes an issue where the PPC MI peephole optimization pass incorrectly removes a vector swap. Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately fed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered. This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register. Here is an example in pseudo-MIR of what happens in the test case added in this patch: Before PPC MI peephole optimization:
```
%arg = XVADDDP %0, %1
$f1 = COPY %arg.sub_64
call double rint(double)
%res.first = COPY $f1
%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64
%arg.swapped = XXPERMDI %arg, %arg, 2
$f1 = COPY %arg.swapped.sub_64
call double rint(double)
%res.second = COPY $f1
%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
```
After optimization:
```
; ...
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
; so the pass replaces the swap with a copy:
%vec.res = COPY %vec.res.splat
; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
```
As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat. Committed for vddvss (Colin Samples). Thanks Colin for the patch! 
Differential Revision: https://reviews.llvm.org/D69497	2019-12-07 14:51:20 +08:00
Tom Stellard	8ecd409463	export.sh: Fetch sources from GitHub instead of SVN Reviewers: hansw, jdoerfert Subscribers: sylvestre.ledru, mgorny, hans, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70460	2019-12-06 18:57:08 -08:00
Amara Emerson	a902bd320c	[AArch64][GlobalISel] Add missing default statement to a switch in the selector.	2019-12-06 17:43:27 -08:00
Sterling Augustine	07ff52ab90	Move variable only used in an assert into the assert itself. This prevents unused variable warnings from breaking the build.	2019-12-06 17:09:19 -08:00
Amara Emerson	39754a683f	[AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates. Only implemented for the type combinations already supported for G_SHL. Differential Revision: https://reviews.llvm.org/D71153	2019-12-06 16:24:57 -08:00
Peter Collingbourne	563db3c456	gn build: Change scudo's list of supported platforms to a whitelist. Scudo only supports building for android/linux/fuchsia, so require target_os to be one of linux/fuchsia to do a stage2_unix scudo build. Android is already covered by the stage2_android* toolchains below. Differential Revision: https://reviews.llvm.org/D71131	2019-12-06 15:53:54 -08:00
Don Hinton	4da312a2f1	[CommandLine] Add callbacks to Options Summary: Add a new cl::callback attribute to Option. This attribute specifies a callback function that is called when an option is seen, and can be used to set other options, as in option A implies option B. If the option is a `cl::list`, and `cl::CommaSeparated` is also specified, the callback will fire once for each value. This could be used to validate combinations or selectively set other options. Reviewers: beanz, thomasfinch, MaskRay, thopre, serge-sans-paille Reviewed By: beanz Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70620	2019-12-06 15:16:45 -08:00
Sam Clegg	7d2c6fd3c7	[WebAssembly][MC] Support .import_name/.import_field asm directives Convert the MC test to use asm rather than bitcode. This is a precursor to https://reviews.llvm.org/D70520. Differential Revision: https://reviews.llvm.org/D70877	2019-12-06 15:09:56 -08:00
Reid Kleckner	e3b8614158	[MC] Rewrite tablegen for printInstrAlias to compile faster, NFC Before this change, the *InstPrinter.cpp files of each target were some of the slowest objects to compile in all of LLVM. See this snippet produced by ClangBuildAnalyzer: https://reviews.llvm.org/P8171$96 Search for "InstPrinter", and see that it shows up in a few places. Tablegen was emitting a large switch containing a sequence of operand checks, each of which created many conditions and many BBs. Register allocation and jump threading both did not scale well with such a large repetitive sequence of basic blocks. So, this change essentially turns those control flow structures into data. The previous structure looked like:
  switch (Opc) {
  case TGT::ADD:
    // check alias 1
    if (MI->getOperandCount() == N && // check num opnds
        MI->getOperand(0).isReg() && // check opnd 0
        ...
        MI->getOperand(1).isImm() && // check opnd 1
      AsmString = "foo";
      break;
    }
    // check alias 2
    if (...)
      ...
    return false;
The new structure looks like:
  OpToPatterns: Sorted table of opcodes mapping to pattern indices.
    \-> Patterns: List of patterns. Previous table points to subrange of patterns to match.
      \-> Conds: The if conditions above encoded as a kind and 32-bit value.
See MCInstPrinter.cpp for the details of how the new data structures are interpreted. Here are some before and after metrics.
  Time to compile AArch64InstPrinter.cpp: 0m29.062s vs. 0m2.203s
  size of the obj: 3.9M vs. 676K
  size of clang.exe: 97M vs. 96M
I have not benchmarked disassembly performance, but typically disassemblers are bottlenecked on IO and string processing, not alias matching, so I'm not sure it's interesting enough to be worth doing. Reviewers: RKSimon, andreadb, xbolva00, craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D70650	2019-12-06 15:00:18 -08:00
Amara Emerson	c4c92d3e17	[X86] Fix prolog/epilog mismatch for stack protectors on win32-macho. The xor'ing behaviour is only used for msvc/crt environments, when we're targeting macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing when targeting win32-macho to be consistent. Differential Revision: https://reviews.llvm.org/D71095	2019-12-06 14:44:56 -08:00
Nico Weber	b50dc04b3d	wrap an rst file to 80 cols, to cycle bots	2019-12-06 17:28:02 -05:00
Craig Topper	08dfbd5919	[TargetLowering] Fix another potential FPE in expandFP_TO_UINT D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
  // Sel = Src < 0x8000000000000000
  // Val = select Sel, Src, Src - 0x8000000000000000
  // Ofs = select Sel, 0, 0x8000000000000000
  // Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.) Instead, I'd suggest to use the following sequence:
  // Sel = Src < 0x8000000000000000
  // FltOfs = select Sel, 0, 0x8000000000000000
  // IntOfs = select Sel, 0, 0x8000000000000000
  // Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway). In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit. There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.) 
Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper Differential Revision: https://reviews.llvm.org/D67105	2019-12-06 14:11:04 -08:00
Sanjay Patel	5bd1aece96	[InstSimplify] add tests for copysign with fneg operand; NFC	2019-12-06 16:23:44 -05:00
Teresa Johnson	717b67f012	[WPD] Remove unused parameter (NFC) Remove unused parameter.	2019-12-06 13:14:21 -08:00
Reid Kleckner	8418cd3680	[X86] Don't setup and teardown memory for a musttail call Summary: musttail calls should not require allocating extra stack for arguments. Updates to arguments passed in memory should happen in place before the epilogue. This bug was mostly a missed optimization, unless inalloca was used and store to push conversion fired. If a reserved call frame was used for an inalloca musttail call, the call setup and teardown instructions would be deleted, and SP adjustments would be inserted in the prologue and epilogue. You can see these are removed from several test cases in this change. In the case where the stack frame was not reserved, i.e. call frame optimization fires and turns argument stores into pushes, then the imbalanced call frame setup instructions created for inalloca calls become a problem. They remain in the instruction stream, resulting in a call setup that allocates zero bytes (expected for inalloca), and a call teardown that deallocates the inalloca pack. This deallocation was unbalanced, leading to subsequent crashes. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71097	2019-12-06 12:58:54 -08:00
Hiroshi Yamauchi	7e81b42fc1	Revert "[PGO][PGSO] Instrument the code gen / target passes." This reverts commit 9a0b5e14075a1f42a72eedb66fd4fde7985d37ac. This seems to break buildbots.	2019-12-06 12:17:32 -08:00
Wenlei He	a91e4b34f2	[AutoFDO] Inline replay for cold/small callees from sample profile loader Summary: Sample profile loader of AutoFDO tries to replay previous inlining using context sensitive profile. The replay only repeats inlining if the call site block is hot. As a result it punts inlining of small functions, some of which can be beneficial for size, and will still be inlined by the CGSCC inliner later. The oscillation between sample profile loader's inlining and regular CGSCC inlining causes unnecessary loss of context-sensitive profile. It doesn't have much impact on the inline decision itself, but it negatively affects post-inline profile quality, as the CGSCC inliner has to scale counts, which is not as accurate as the original context sensitive profile, and a bad post-inline profile can misguide code layout. This change added regular Inline Cost calculation for sample profile loader, so we can inline small functions upfront under switch -sample-profile-inline-size. In addition -sample-profile-cold-inline-threshold is added so we can tune the separate size threshold - currently the default is chosen to be the same as regular inliner's cold call-site threshold. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70750	2019-12-06 11:44:45 -08:00
Reid Kleckner	633e639e60	Avoid naming variable after type to fix GCC 5.3 build GCC says: .../llvm/lib/DebugInfo/GSYM/FunctionInfo.cpp:195:12: error: ‘InfoType’ is not a class, namespace, or enumeration case InfoType::EndOfList: ^ Presumably, GCC thinks InfoType is a variable here. Work around it by using the name IT as is done above.	2019-12-06 11:25:28 -08:00
Sanjay Patel	36598c8642	Revert "[InstCombine] reduce code duplication; NFC" This reverts commit db5739658467e20a52f20e769d3580412e13ff87. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:24:14 -05:00
Sanjay Patel	e53673fb64	Revert "[InstCombine] improve readability; NFC" This reverts commit 7250ef3613cc6b81145b9543bafb86d7f9466cde. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:20:44 -05:00
Sanjay Patel	80f01f20d0	Revert "[InstCombine] reduce indentation; NFC" This reverts commit 8bf8ef7116bd0daec570b35480ca969b74e66c6e. At least 1 of these supposedly NFC commits wasn't - sanitizer bot is angry.	2019-12-06 14:19:02 -05:00
Alina Sbirlea	1b1de7caca	Revert "ARM-Darwin: keep the frame register reserved even if not updated." This reverts commit a7d90af1be48234ce583e00fb16e33633d44ae38. This revision came back as the root-cause for crashes in internal ARM-IOS apps. Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.	2019-12-06 10:59:26 -08:00
Sanjay Patel	acc6b9d068	[x86] add cost model special-case for insert/extract from element 0 This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023	2019-12-06 13:50:25 -05:00
Hiroshi Yamauchi	800ab3625d	[PGO][PGSO] Instrument the code gen / target passes. Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71072	2019-12-06 10:43:39 -08:00
Sanjay Patel	d37a542658	[InstCombine] reduce indentation; NFC	2019-12-06 13:26:45 -05:00
Sanjay Patel	d9ffcf2505	[InstCombine] improve readability; NFC CreateIntCast returns the input if its type matches, so there is no need to duplicate that check.	2019-12-06 13:26:45 -05:00
Sanjay Patel	7d9af1f0f7	[InstCombine] reduce code duplication; NFC	2019-12-06 13:26:45 -05:00
Sanjay Patel	2855f303f4	[InstCombine] improve readability; NFC	2019-12-06 13:26:44 -05:00
Guozhi Wei	2be00d9e54	[MBP] Avoid tail duplication if it can't bring benefit Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases where duplication doesn't increase fallthrough, or it brings too much size overhead. To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks: (1) make sure there is at least one duplication in the current work set; (2) the number of duplications should not exceed the number of successors. The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain. Differential Revision: https://reviews.llvm.org/D64376	2019-12-06 09:53:53 -08:00
diggerlin	ccd0b82a9d	[NFC][AIX][XCOFF] If the size of a Csect is zero, the Csect does not need to write any data into sections SUMMARY: If the size of a Csect is zero, the Csect does not need to write any data into sections. For example, the TOC Csect has zero size, so it does not need to invoke Asm.writeSectionData(W.OS, Csect.MCCsect, Layout); Reviewers: daltenty Subscribers: rupprecht, seiyai, hiraditya Differential Revision: https://reviews.llvm.org/D71120	2019-12-06 12:41:38 -05:00
diggerlin	8f5f4ed3e5	[NFC][AIX][XCOFF] Fixed compile warning on the strncpy. SUMMARY: There is a warning when compiling the file XCOFFObjectWriter.cpp: /srv/llvm-buildbot-srcatch/llvm-build-dir/openmp-gcc-x86_64-linux-debian/llvm.src/llvm/lib/MC/XCOFFObjectWriter.cpp:414:17: warning: 'char* strncpy(char*, const char*, size_t)' specified bound 8 equals destination size [-Wstringop-truncation] The patch fixes the warning. Reviewer: daltenty Differential Revision: https://reviews.llvm.org/D71119	2019-12-06 12:22:28 -05:00
John Brawn	ac55c0eb6d	[LegalizeTypes] Add missing case for STRICT_FP_ROUND softening This fixes a test failure in test/CodeGen/ARM/fp-intrinsics.ll.	2019-12-06 15:54:27 +00:00
Simon Tatham	449029437c	[ARM][MVE] Fix copy-paste error in VQSHL instruction ids. Summary: The immediate forms of the MVE VQSHL instruction have MC names like `MVE_VSLIimms8` and `MVE_VSLIimmu32`. Those names are confusing, because VSLI is a completely different shift instruction with no semantic relation to VQSHL. But it just happens to be defined immediately before VQSHL in `ARMInstrMVE.td`, so this looks like a copy-paste error. Renamed the ids to match the instruction name. Reviewers: ostannard, dmgreen, MarkMurrayARM, miyuki Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71114	2019-12-06 15:23:23 +00:00
diggerlin	6a285a1aba	[AIX][XCOFF] Created a test case to verify the raw text section of the xcoffobject file SUMMARY: For the patch https://reviews.llvm.org/D66969 we need a test case to verify that the text section emitted to the xcoffobject file is correct, but we did not have LLVM disassembly tools to dump the xcoffobject file. Since the patch https://reviews.llvm.org/D70255 was committed, we have tools for it, so we create this test case. Reviewers: daltenty, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D70719	2019-12-06 10:12:09 -05:00
Cullen Rhodes	86ca6336d4	[AArch64] Fix a bug with jump table generation Summary: When trying to calculate the offsets for the jump table entries we fail to take into account the block alignment, which could be greater than 4 bytes. This led to cases where the jump table offset was too big to fit in a byte. Reviewers: t.p.northover, sdesmalen, ostannard Reviewed By: ostannard Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits Committed on behalf of David Sherwood (david-arm) Tags: #llvm Differential Revision: https://reviews.llvm.org/D70533	2019-12-06 14:31:53 +00:00
Nico Weber	1362f5ee73	gn build: Unbreak mac build after 4066591	2019-12-06 09:09:25 -05:00
Alexey Lapshin	ca8269eaf0	Fix building shared libraries broken by 8e48e8e3e32.	2019-12-06 16:48:41 +03:00
Jeremy Morse	648a9901c3	Attempt to fix a debuginfo test that wasn't as generic as I thought An ARM buildbot croaks when this test doesn't have a triple specified: http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/12021/ Move the test to the X86 directory and put an x86_64 triple on the llc command line.	2019-12-06 12:51:58 +00:00
Georgii Rymar	ac568692e9	[llvm-readobj][llvm-readelf] - Refactor parsing of the SHT_GNU_versym section. This introduces a new helper which is used to parse the SHT_GNU_versym section. LLVM/GNU style implementations now use it to share the logic. Differential revision: https://reviews.llvm.org/D71054	2019-12-06 15:35:05 +03:00
Gil Rapaport	66a3737c3f	[LV] Record GEP widening decisions in recipe (NFCI) InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations' ability to modify def-use relations and still have ILV generate the vector code. This commit moves GEP operand queries controlling how GEPs are widened to a dedicated recipe and extracts GEP widening code to its own ILV method taking those recorded decisions as arguments. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Differential revision: https://reviews.llvm.org/D69067	2019-12-06 13:41:19 +02:00
Cullen Rhodes	f7d32c78c1	[AArch64][SVE2] Implement while comparison intrinsics Summary: Adds the following intrinsics: * whilege, whilegt, whilehi, whilehs Reviewers: sdesmalen, rovka, dancgr, efriedma, rengolin, huntergr Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70909	2019-12-06 11:29:34 +00:00
Georgii Rymar	41cf18166c	[llvm-readobj] - Implement --dependent-libraries flag. There is no way to dump SHT_LLVM_DEPENDENT_LIBRARIES sections currently. This patch implements this. The section is described here: https://llvm.org/docs/Extensions.html#sht-llvm-dependent-libraries-section-dependent-libraries Differential revision: https://reviews.llvm.org/D70665	2019-12-06 14:28:29 +03:00
Jeremy Morse	765a5a757e	[DebugInfo][CGP] Update dbg.values when sinking address computations One of CodeGenPrepare's optimizations is to duplicate address calculations into basic blocks, so that as much information as possible can be folded into memory addressing operands. This is great -- but the dbg.value variable location intrinsics are not updated in the same way. This can lead to dbg.values referring to address computations in other blocks that will never be encoded into the DAG, while duplicate address computations are performed locally that could be used by the dbg.value. Some of these (such as non-constant-offset GEPs) can't be salvaged past. Fix this by, whenever we duplicate an address computation into a block, looking for dbg.value users of the original memory address in the same block, and redirecting those to the local computation. Differential Revision: https://reviews.llvm.org/D58403	2019-12-06 11:27:19 +00:00
Ulrich Weigand	6fcc90719e	[X86] Regenerate test to fix build bot failures After my recent commit daee549 the following test case is failing: CodeGen/X86/vector-constrained-fp-intrinsics.ll Not sure why I didn't catch this earlier, seems to be affected by other changes that came in recently. Fixed by regenerating the test again. Sorry for the disruption!	2019-12-06 12:11:56 +01:00
Cullen Rhodes	48fc78f483	[AArch64][SVE] Implement integer compare intrinsics Summary: Adds intrinsics for the following: * cmphs, cmphi * cmpge, cmpgt * cmpeq, cmpne * cmplt, cmple * cmplo, cmpls Includes a minor change to `TLI.getMemValueType` that fixes a crash due to the scalable flag being dropped. Reviewers: sdesmalen, efriedma, rengolin, rovka, dancgr, huntergr Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70889	2019-12-06 10:39:06 +00:00
Ulrich Weigand	1a3ef94576	[FPEnv][SelectionDAG] Relax chain requirements This patch implements the following changes:
1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats each constrained intrinsic like a global barrier (e.g. a function call) and fully serializes all pending chains. This is actually not required; it is allowed for constrained intrinsics to be reordered w.r.t. one another or (nonvolatile) memory accesses. The MI-level scheduler already allows for that flexibility, so it makes sense to allow it at the DAG level as well. This patch therefore changes the way chains for constrained intrinsics are created, and handles them basically like load operations are handled. This has the effect that constrained intrinsics are no longer serialized against one another or (nonvolatile) loads. They are still serialized against stores, but that seems hard to change with the current DAG chain setup, and it also doesn't seem to be a big problem preventing DAG
2) The OPC_CheckFoldableChainNode check requires that each of the intermediate nodes in a multi-node pattern match only has a single use. This check tends to fail if those intermediate nodes are strict operations as those have a chain output that typically indeed has another use. However, we don't really need to consider chains here at all, since they will all be rewritten anyway by UpdateChains later. Other parts of the matcher therefore already ignore chains, but this hasOneUse check doesn't. This patch replaces hasOneUse by a custom test that verifies there is no more than one use of any non-chain output value. In theory, this change could affect code unrelated to strict FP nodes, but at least on SystemZ I could not find any single instance of that happening
3) The SystemZ back-end currently does not allow matching multiply-and- extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for strict FP operations. 
This was not possible in the past due to the problems described under 1) and 2) above. With those issues fixed, it is now possible to fully support those instructions in strict mode as well, and this patch does so. Differential Revision: https://reviews.llvm.org/D70913	2019-12-06 11:02:11 +01:00
LLVM GN Syncbot	2cf462d7ef	gn build: Merge 99768b243cd	2019-12-06 08:55:53 +00:00
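
Note on 08dfbd5919: the revised FP_TO_UINT sequence in that commit message can be tried out in plain C++. The following is only a sketch for a 64-bit result from a `double`; the function name `fpToUintViaSint` is hypothetical, the precondition assert is a simplification added here, and none of this is the actual `expandFP_TO_UINT` code, which builds SelectionDAG nodes rather than executing arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the revised sequence from the commit message:
//   Sel    = Src < 2^63
//   FltOfs = select Sel, 0, 2^63
//   IntOfs = select Sel, 0, 0x8000000000000000
//   Result = fp_to_sint(Src - FltOfs) ^ IntOfs
uint64_t fpToUintViaSint(double Src) {
  // Simplifying assumption for this sketch: Src is non-negative and below 2^64.
  assert(Src >= 0.0 && Src < 18446744073709551616.0);

  const double TwoPow63 = 9223372036854775808.0; // 2^63, exactly representable
  const bool Sel = Src < TwoPow63;

  // When Sel is true we subtract 0.0, which is always exact. Otherwise the
  // subtraction only clears the 2^63 bit, which is also exact.
  const double FltOfs = Sel ? 0.0 : TwoPow63;
  const uint64_t IntOfs = Sel ? 0 : 0x8000000000000000ULL;

  const int64_t Signed = static_cast<int64_t>(Src - FltOfs);
  return static_cast<uint64_t>(Signed) ^ IntOfs;
}
```

Calling `fpToUintViaSint(9223372036854775808.0)` takes the second branch: the subtraction yields 0.0 and the XOR restores the top bit. A very small positive input takes the first branch and never reaches the subtraction of 2^63 that the commit identifies as the source of the Inexact exception.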


188622 Commits