llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00

Author	SHA1	Message	Date
Joel Jones	74a51d699d	[AArch64] Enable libm vectorized functions via SLEEF This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/. * A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF. * A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF. * A comprehensive test case is included in this changeset. * In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF. Trigonometry functions that are vectorized by sleef: acos asin atan atanh cos cosh exp exp2 exp10 lgamma log10 log2 log sin sinh sqrt tan tanh tgamma Patch by Stefan Teleman Differential Revision: https://reviews.llvm.org/D53927 llvm-svn: 347510	2018-11-24 06:41:39 +00:00
Fangrui Song	a4304aa848	[ARM] Add dependency from ARMAsmParser to ARMAsmPrinter after r347494 This fixes -DBUILD_SHARED_LIBS=on llvm-svn: 347506	2018-11-23 23:43:46 +00:00
Nikita Popov	ed78d1169c	[InstCombine] Simplify funnel shift with zero/undef operand to shift The following simplifications are implemented: * `fshl(X, 0, C) -> shl X, C%BW` * `fshl(X, undef, C) -> shl X, C%BW` (assuming undef = 0) * `fshl(0, X, C) -> lshr X, BW-C%BW` * `fshl(undef, X, C) -> lshr X, BW-C%BW` (assuming undef = 0) * `fshr(X, 0, C) -> shl X, (BW-C%BW)` * `fshr(X, undef, C) -> shl X, BW-C%BW` (assuming undef = 0) * `fshr(0, X, C) -> lshr X, C%BW` * `fshr(undef, X, C) -> lshr X, C%BW` (assuming undef = 0) The simplification is only performed if the shift amount C is constant, because we can explicitly compute C%BW and BW-C%BW in this case. Differential Revision: https://reviews.llvm.org/D54778 llvm-svn: 347505	2018-11-23 22:45:08 +00:00
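The funnel-shift identities listed in the commit above can be checked by brute force. The following is a minimal sketch, not the InstCombine implementation: it models `fshl`/`fshr` on 8-bit values in Python, following the usual definition (concatenate the two operands, shift by `C % BW`, keep the high or low half). The helper names (`fshl`, `fshr`, `shl`, `lshr`, `check_funnel_identities`) and the choice `BW = 8` are assumptions made for illustration. The check skips shift amounts where `C % BW == 0`, because the rewritten form would then shift by the full bit width, which is not a defined shift in LLVM IR; how the actual patch treats that case is not stated in the message.

```python
BW = 8                    # illustrative bit width (assumption)
MASK = (1 << BW) - 1


def fshl(a, b, c):
    """Funnel shift left: high BW bits of (a:b) shifted left by c % BW."""
    s = c % BW
    wide = (a << BW) | b
    return ((wide << s) >> BW) & MASK


def fshr(a, b, c):
    """Funnel shift right: low BW bits of (a:b) shifted right by c % BW."""
    s = c % BW
    wide = (a << BW) | b
    return (wide >> s) & MASK


def shl(x, n):
    """Left shift truncated to BW bits."""
    return (x << n) & MASK


def lshr(x, n):
    """Logical right shift of an unsigned BW-bit value."""
    return x >> n


def check_funnel_identities():
    """Exhaustively verify the four zero-operand identities for BW bits."""
    for x in range(1 << BW):
        for c in range(1, 3 * BW):
            s = c % BW
            if s == 0:
                continue  # rewritten shift amount would be BW; skipped here
            assert fshl(x, 0, c) == shl(x, s)
            assert fshl(0, x, c) == lshr(x, BW - s)
            assert fshr(x, 0, c) == shl(x, BW - s)
            assert fshr(0, x, c) == lshr(x, s)
    return True
```

Calling `check_funnel_identities()` runs through every 8-bit `x` and a range of constant shift amounts; the `undef` variants in the message follow from the same identities once `undef` is taken to be 0.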
Evandro Menezes	c46bcdb565	[TableGen] Emit more variant transitions `llvm-mca` relies on the predicates to be based on `MCSchedPredicate` in order to resolve the scheduling for variant instructions. Otherwise, it aborts the building of the instruction model early. However, the scheduling model emitter in `TableGen` gives up too soon, unless all processors use only such predicates. In order to allow more processors to be used with `llvm-mca`, this patch emits scheduling transitions if any processor uses these predicates. The transition emitted for the processors using legacy predicates is the one specified with `NoSchedPred`, which is based on `MCSchedPredicate`. Preferably, `llvm-mca` should instead assume a reasonable default when a variant transition is not based on `MCSchedPredicate` for a given processor. This issue should be revisited in the future. Differential revision: https://reviews.llvm.org/D54648 llvm-svn: 347504	2018-11-23 21:17:33 +00:00
Andrea Di Biagio	92711a810d	[llvm-mca] Refactor some of the logic in InstrBuilder, and add a verifyOperands method. With this change, InstrBuilder emits an error if the MCInst sequence contains an instruction with a variadic opcode, and a non-zero number of variadic operands. Currently we don't know how to correctly analyze variadic opcodes. The problem with variadic operands is that there is no information for them in the opcode descriptor (i.e. MCInstrDesc). That means, we don't know which variadic operands are defs, and which are uses. In future, we could try to conservatively assume that any extra register operands is both a register use and a register definition. This patch fixes a subtle bug in the evaluation of read/write operands for ARM VLD1 with implicit index update. Added test vld1-index-update.s llvm-svn: 347503	2018-11-23 20:26:57 +00:00
Sanjay Patel	99ff4d9983	[DAG] consolidate shift simplifications ...and use them to avoid creating obviously undef values as discussed in the post-commit thread for r347478. The diffs in vector div/rem show that we were missing real optimizations by creating bogus shift nodes. llvm-svn: 347502	2018-11-23 20:05:12 +00:00
Sanjay Patel	ccc81c64b7	[x86] make test immune to oversized shift simplification I'm not sure if this actually preserves the original intent of this test, but if we leave it as-is, the -1 (oversized) shift should be folded to undef and allow deleting half of the output. llvm-svn: 347501	2018-11-23 19:45:29 +00:00
Luke Cheeseman	610670577d	Revert r347490 as it breaks address sanitizer builds llvm-svn: 347499	2018-11-23 17:13:06 +00:00
Oliver Stannard	878c27c4ea	[ARM][AsmParser] Improve debug printing of parsed asm operands In ARMOperand::print: - Print human-readable register names, instead of numbers. - Print the correct names for IT condition masks (these were in the wrong order before). - Print all parts of memory operands, not just the base register. This makes the output of llvm-mc -show-inst-operands more readable. Differential revision: https://reviews.llvm.org/D54850 llvm-svn: 347494	2018-11-23 14:27:21 +00:00
Andrea Di Biagio	57fcc40fb8	[llvm-mca][View] Improved Retire Control Unit Statistics. RetireControlUnitStatistics now reports extra information about the ROB and the avg/maximum number of entries consumed over the entire simulation. Example: Retire Control Unit - number of cycles where we saw N instructions retired: [# retired], [# cycles] 0, 109 (17.9%) 1, 102 (16.7%) 2, 399 (65.4%) Total ROB Entries: 64 Max Used ROB Entries: 35 ( 54.7% ) Average Used ROB Entries per cy: 32 ( 50.0% ) Documentation in llvm/docs/CommandGuide/llvm-mca.rst has been updated to reflect this change. llvm-svn: 347493	2018-11-23 12:12:57 +00:00
Eugene Leviant	18236c9bd1	Attempt to fix buildbot after r347489 llvm-svn: 347492	2018-11-23 11:28:58 +00:00
Luke Cheeseman	30437c2af7	Revert r343341 - Cannot reproduce the build failure locally and the build logs have been deleted. llvm-svn: 347490	2018-11-23 11:01:47 +00:00
Eugene Leviant	e92db220ba	[ThinLTO] Assembly representation of ReadOnly attribute Differential revision: https://reviews.llvm.org/D54754 llvm-svn: 347489	2018-11-23 10:54:51 +00:00
Max Kazantsev	99f7d361cb	[NFC] Add test that demonstrates buggy behavior on term folding of LoopSimplifyCFG llvm-svn: 347488	2018-11-23 10:34:22 +00:00
Sjoerd Meijer	59d415684b	[ARM][NFC] codegen tests cleanup: remove dangling check prefixes I am working on making FileCheck stricter (in D54769 and D53710) so that it issues diagnostics when there's something wrong with tests. This is a cleanup for dangling prefixes in the ARM codegen tests, e.g.: --check-prefixes=A,B where A occurs in the check file, but B doesn't. This can be innocent if A does all the required checking, but can also be a bug in that test if it results in the test actually not checking anything (if A for example only checks a common label). Test CodeGen/ARM/smml.ll is such an example. Differential Revision: https://reviews.llvm.org/D54842 llvm-svn: 347487	2018-11-23 10:08:39 +00:00
Max Kazantsev	de2c2b71c2	Disable LoopSimplifyCFG terminator folding by default llvm-svn: 347486	2018-11-23 09:14:53 +00:00
Max Kazantsev	4090cb9a08	[LoopSimplifyCFG] Don't delete LCSSA Phis When removing edges, we also update Phi inputs and may end up removing a Phi if it has only one input. We should not do it for edges that leave the current loop because these Phis are LCSSA Phis and need to be preserved. Thanks @dmgreen for finding this! Differential Revision: https://reviews.llvm.org/D54841 llvm-svn: 347484	2018-11-23 07:56:47 +00:00
Max Kazantsev	fa5d441135	[NFC] Add verification flags to tests llvm-svn: 347483	2018-11-23 05:21:53 +00:00
Craig Topper	0d537b1b78	[LegalizeVectorTypes] Don't use SplitVecOp_TruncateHelper if we're heading towards scalarizing the type. This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together. But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code. The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but it's not realistic anyway. llvm-svn: 347482	2018-11-23 02:32:13 +00:00
Fangrui Song	5985e4d0db	[Object] Also treat STB_GNU_UNIQUE symbols as exported to other DSO All of STB_GLOBAL/STB_WEAK/STB_GNU_UNIQUE are treated as export symbols, see: glibc/elf/dl-lookup.c:do_lookup_x musl/ldso/dynlink.c OK_BINDS Though ld.so does not read binding, the currently used STV_DEFAULT or STV_PROTECTED is a good emulation of linker behavior. llvm-svn: 347481	2018-11-23 01:33:19 +00:00
Craig Topper	632b998af3	[LegalizeVectorTypes] Have SplitVecOp_TruncateHelper fall back to SplitVecOp_UnaryOp if splitting the output type would be a legal type. SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that. The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked. llvm-svn: 347479	2018-11-22 22:56:52 +00:00
Sanjay Patel	f60a29f8fa	[DAGCombiner] form 'not' ops ahead of shifts (PR39657) We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'), but that would not matter without this patch because DAGCombiner was reversing that transform. I think we need this transform in the backend regardless of what happens in IR to catch cases where the shift-xor is formed late from GEP or other ops. https://rise4fun.com/Alive/NC1 Name: shl Pre: (-1 << C2) == C1 %shl = shl i8 %x, C2 %r = xor i8 %shl, C1 => %not = xor i8 %x, -1 %r = shl i8 %not, C2 Name: shr Pre: (-1 u>> C2) == C1 %sh = lshr i8 %x, C2 %r = xor i8 %sh, C1 => %not = xor i8 %x, -1 %r = lshr i8 %not, C2 https://bugs.llvm.org/show_bug.cgi?id=39657 llvm-svn: 347478	2018-11-22 19:24:10 +00:00
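The two Alive preconditions quoted in the commit above can be sanity-checked exhaustively for `i8`. This is a small sketch under the assumption that 8-bit unsigned Python integers model the `i8` values; the function name `check_not_before_shift` is hypothetical, and the code only tests the arithmetic identity, not the DAGCombiner change itself.

```python
BW = 8                    # models the i8 case from the Alive snippets
MASK = (1 << BW) - 1      # the i8 value -1 as an unsigned bit pattern


def check_not_before_shift():
    """Verify xor(shift(x, C2), C1) == shift(not x, C2) under both preconditions."""
    for c2 in range(BW):
        c1_shl = (MASK << c2) & MASK   # Pre: (-1 << C2) == C1
        c1_shr = MASK >> c2            # Pre: (-1 u>> C2) == C1
        for x in range(1 << BW):
            not_x = x ^ MASK           # %not = xor i8 %x, -1
            # shl case: xor of the shifted value vs. shift of the inverted value
            assert (((x << c2) & MASK) ^ c1_shl) == ((not_x << c2) & MASK)
            # lshr case
            assert ((x >> c2) ^ c1_shr) == (not_x >> c2)
    return True
```

The identity holds because xor distributes over a shift by the same amount, so xoring with a shifted all-ones mask is the same as shifting the inverted input.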
Vladimir Stefanovic	0ab2a92a40	Reland test/MC/Mips/reloc-directive-label-offset.s The test was reverted because it failed on llvm-clang-x86_64-expensive-checks-win builder, and that was because -DEXPENSIVE_CHECKS adds randomness to llvm::sort(), affecting the order of relocation table entries. Modified the test to not have two relocations at the same offset. llvm-svn: 347476	2018-11-22 18:18:58 +00:00
Andrea Di Biagio	ab66dcd995	[llvm-mca] LSUnit: use a SmallSet to model load/store queues. NFCI Also, try to minimize the number of queries to the memory queues to speedup the analysis. On average, this change gives a small 2% speedup. For memcpy-like kernels, the speedup is up to 5.5%. llvm-svn: 347469	2018-11-22 15:47:44 +00:00
Andrea Di Biagio	68ff6bfb10	[llvm-mca] Use a SmallVector instead of std::vector to track register reads/writes. NFCI This avoids a heap allocation most of the time. This patch gives a small but consistent 3% speedup on a release build (up to ~5% on a debug build). llvm-svn: 347464	2018-11-22 14:48:53 +00:00
Andrea Di Biagio	934e895b61	[llvm-mca] Fix an invalid memory read introduced by r346487. This patch fixes an invalid memory read introduced by r346487. Before this patch, partial register write had to query the latency of the dependent full register write by calling a method on the full write descriptor. However, if the full write is from an already retired instruction, chances are that the EntryStage already reclaimed its memory. In some partial register write tests, valgrind was reporting an invalid memory read. This change fixes the invalid memory access problem. Writes are now responsible for tracking dependent partial register writes, and notify them in the event of instruction issued. That means, partial register writes no longer need to query their associated full write to check when they are ready to execute. Added test X86/BtVer2/partial-reg-update-7.s llvm-svn: 347459	2018-11-22 12:48:57 +00:00
Max Kazantsev	2646e999a1	[NFC] Assert that all blocks staying in loop are live llvm-svn: 347458	2018-11-22 12:43:27 +00:00
Max Kazantsev	351f1250a0	[NFC] Ensure deterministic order of dead exit blocks llvm-svn: 347457	2018-11-22 12:33:41 +00:00
John Brawn	0cab72eb19	[AArch64] Fix SelectionDAG infinite loop for v1i64 SCALAR_TO_VECTOR A consequence of r347274 is that SCALAR_TO_VECTOR can be converted into BUILD_VECTOR by SimplifyDemandedBits, but LowerBUILD_VECTOR can turn BUILD_VECTOR into SCALAR_TO_VECTOR so we get an infinite loop. Fix this by making LowerBUILD_VECTOR not do this transformation for those vectors that would get transformed back, i.e. BUILD_VECTOR of a single-element constant vector. Doing that means we get a DUP, which we then need to recognise in ISel as a copy. llvm-svn: 347456	2018-11-22 11:45:23 +00:00
Max Kazantsev	f196353c66	[NFC] Simplify code by using standard exit blocks collection llvm-svn: 347454	2018-11-22 10:48:30 +00:00
Chandler Carruth	4933a85499	[TI removal] Leverage the fact that TerminatorInst is gone to create a normal base class that provides all common "call" functionality. This merges two complex CRTP mixins for the common "call" logic and common operand bundle logic into a single, normal base class of `CallInst` and `InvokeInst`. Going forward, users can typically `dyn_cast<CallBase>` and use the resulting API. No more need for the `CallSite` wrapper. I'm planning to migrate current usage of the wrapper to directly use the base class and then it can be removed, but those are simpler and much more incremental steps. The big change is to introduce this abstraction into the type system. I've tried to do some basic simplifications of the APIs that I couldn't really help but touch as part of this: - I've tried to organize the attribute API and bundle API into groups to make understanding the API of `CallBase` easier. Without this, I wasn't able to navigate the API sanely for all of the ways I needed to modify it. - I've added what seem like more clear and consistent APIs for getting at the called operand. These ended up being especially useful to consolidate the numerous duplicated code paths trying to do this. - I've largely reworked the organization and implementation of the APIs for computing the argument operands as they needed to change to work with the new subclass approach. To minimize any cost associated with this abstraction, I've moved the operand layout in memory to store the called operand last. This makes its position relative to the end of the operand array the same, regardless of the subclass. It should make it much cheaper to reference from the `CallBase` abstraction, and this is likely one of the most frequent things to query. We do still pay one abstraction penalty here: we have to branch to determine whether there are 0 or 2 extra operands when computing the end of the argument operand sequence. However, that seems both rare and should optimize well. 
I've implemented this in a way specifically designed to allow it to optimize fairly well. If this shows up in profiles, we can add overrides of the relevant methods to the subclasses that bypass this penalty. It seems very unlikely that this will be an issue as the code was already dealing with an ever present abstraction of whether or not there are operand bundles, so this isn't the first branch to go into the computation. I've tried to remove as much of the obvious vestigial API surface of the old CRTP implementation as I could, but I suspect there is further cleanup that should now be possible, especially around the operand bundle APIs. I'm leaving all of that for future work in this patch as enough things are changing here as-is. One thing that made this harder for me to reason about and debug was the pervasive use of unsigned values in subtraction and other arithmetic computations. I had to debug more than one unintentional wrap. I've switched a few of these to use `int` which seems substantially simpler, but I've held back from doing this more broadly to avoid creating confusing divergence within a single class's API. I also worked to remove all of the magic numbers used to index into operands, putting them behind named constants or putting them into a single method with a comment and strictly using the method elsewhere. This was necessary to be able to re-layout the operands as discussed above. Thanks to Ben for reviewing this (somewhat large and awkward) patch! Differential Revision: https://reviews.llvm.org/D54788 llvm-svn: 347452	2018-11-22 10:31:35 +00:00
Haojian Wu	0ea34b9a2d	Revert r343473 "Move llvm util dependencies from clang-tools-extra to add_lit_target." Summary: It will cause test tools `FileCheck`, `count`, `not` being built blindly, these dependencies should move back to clang-tools-extra. Reviewers: mgorny Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54797 llvm-svn: 347448	2018-11-22 10:14:24 +00:00
Diana Picus	41fbe9382b	[ARM GlobalISel] Add test for BFC. NFCI r334871 has made it possible for TableGen'erated code to select BFC, but it has not added a test for it on the ARM side. Add it now to make sure we don't introduce regressions if we ever change anything about that rule. llvm-svn: 347447	2018-11-22 09:54:14 +00:00
Jonas Paulsson	011332f2a7	[SystemZTTIImpl] Give correct cost values for vector bswap intrinsics. Implement getIntrinsicInstrCost() and return costs reflecting that bswap can be done with a vperm per vector register. Review: Ulrich Weigand https://reviews.llvm.org/D54789 llvm-svn: 347445	2018-11-22 07:17:29 +00:00
Fangrui Song	2f7f5bdd19	[llvm-size] Use empty() and range-based for loop. NFC llvm-svn: 347441	2018-11-22 00:44:17 +00:00
Evandro Menezes	a8f6ed71b3	[llvm-mca] Add test case (NFC) Add test case that will serve as the base for D54820. llvm-svn: 347440	2018-11-22 00:38:36 +00:00
Sanjay Patel	1b0a399719	[x86] use FileCheck to verify output; NFC llvm-svn: 347438	2018-11-21 23:39:19 +00:00
Evandro Menezes	c9b15f0877	[llvm-mca] Add test case (NFC) Fix previous commit r347434. llvm-svn: 347437	2018-11-21 23:36:40 +00:00
Peter Collingbourne	5e94e2fc8e	Add a ubsan blacklist entry for libstdc++ 8.0.1. llvm-svn: 347436	2018-11-21 23:04:39 +00:00
Evandro Menezes	3553d34c93	[llvm-mca] Add test case (NFC) Add test case that will serve as the base for D54777. llvm-svn: 347434	2018-11-21 22:57:46 +00:00
Vladimir Stefanovic	ba4b117082	Removing test/MC/Mips/reloc-directive-label-offset.s temporarily This test is failing on llvm-clang-x86_64-expensive-checks-win builder. Removing it until I get it fixed. llvm-svn: 347433	2018-11-21 22:08:34 +00:00
Fedor Sergeev	2c255721ee	[PM] correcting return value for new-pass-manager version of Scalarizer Obvious mistake missed during D54695 review. llvm-svn: 347432	2018-11-21 22:01:19 +00:00
Reid Kleckner	757cba7bab	[mingw] Use unmangled name after the $ in the section name GCC does it this way, and we have to be consistent. This includes stdcall and fastcall functions with suffixes. I confirmed that a fastcall function named "foo" ends up in ".text$foo", not ".text$@foo@8". Based on a patch by Andrew Yohn! Fixes PR39218. Differential Revision: https://reviews.llvm.org/D54762 llvm-svn: 347431	2018-11-21 22:01:10 +00:00
Stefan Pintilie	f6b46a8a9b	[PowerPC][NFC] Split PPCMCCodeEmitter into header and cpp file. This is further cleanup for PPCMCCodeEmitter. The class had been contained within the cpp file alone. Now it has been split up between a header file and a cpp file which allows other classes to make use of the functions in this class if required. llvm-svn: 347428	2018-11-21 21:23:50 +00:00
Sanjay Patel	79c86e20ec	[DAGCombiner] refactor select-of-FP-constants transform This transform needs to be limited. We are converting to a constant pool load very early, and we are turning loads that are independent of the select condition (and therefore speculatable) into a dependent non-speculatable load. We may also be transferring a condition code from an FP register to integer to create that dependent load. llvm-svn: 347424	2018-11-21 20:54:47 +00:00
Stefan Pintilie	210b15e45d	[PowerPC][NFC] Minor Code Cleanup for PPCMCCodeEmitter. llvm-svn: 347422	2018-11-21 20:47:59 +00:00
Eric Fiselier	155ac280ff	[LLVM] Allow modulemap installation Summary: Currently we can't install the modulemaps provided by LLVM, since they are not structured to support headers generated as part of the build (ex. `llvm/IR/Attributes.gen`). This patch restructures the module maps in order to support installation. Modules containing generated headers are defined in the new `module.extern.modulemap` file, and are referenced from the main `module.modulemap` using `extern module`. There are two versions of the `module.extern.modulemap` file; one used when building and another, `module.install.modulemap`, which is re-named during installation. Users can opt-into module map installation using `-DLLVM_INSTALL_MODULEMAPS=ON`. The default value is `OFF` due to llvm.org/PR31905. Reviewers: rsmith, mehdi_amini, bruno, EricWF Reviewed By: EricWF Subscribers: tschuett, chapuni, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D53510 llvm-svn: 347420	2018-11-21 20:46:50 +00:00
Nikita Popov	d3de0c70b1	[InstCombine] Add tests for funnel shift with zero operand; NFC These are additional baseline tests for D54778. llvm-svn: 347414	2018-11-21 20:34:11 +00:00
Sanjay Patel	cd55deba15	[DAGCombiner] reduce code duplication; NFC llvm-svn: 347410	2018-11-21 20:00:32 +00:00
Nikita Popov	bce3f84f3f	[MergeFuncs] Generate alias instead of thunk if possible The MergeFunctions pass was originally intended to emit aliases instead of thunks where possible (unnamed_addr). However, for a long time this functionality was behind a flag hardcoded to false, bitrotted and was eventually removed in r309313. Originally the functionality was first disabled in r108417 due to lack of support for aliases in Mach-O. I believe that this is no longer the case nowadays, but not really familiar with this area. In the interest of being conservative, this patch reintroduces the aliasing functionality behind a default disabled -mergefunc-use-aliases flag. Differential Revision: https://reviews.llvm.org/D53285 llvm-svn: 347407	2018-11-21 19:37:19 +00:00


172145 Commits