llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 19:23:23 +01:00

Author	SHA1	Message	Date
Jessica Paquette	7d1c45a131	[MachineOutliner] Drop candidates that require fixups if it's beneficial If it's a bigger code size win to drop candidates that require stack fixups than to demote every candidate to that variant, the outliner should do that. This happens if the number of bytes taken by calls to functions that don't require fixups, plus the number of bytes that'd be left is less than the number of bytes that it'd take to emit a save + restore for all candidates. Also add tests for each possible new behaviour. - machine-outliner-compatible-candidates shows that when we have candidates that don't use the stack, we can use the default call variant along with the no save/regsave variant. - machine-outliner-all-stack shows that when it's better to fix up the stack, we still will demote all candidates to that case - machine-outliner-drop-stack shows that we can discard candidates that require stack fixups when it would be beneficial to do so. llvm-svn: 348168	2018-12-03 19:11:27 +00:00
Krzysztof Parzyszek	6fc5489b62	[Hexagon] Add HasV5 predicate for compatibility with auto-generated files llvm-svn: 348167	2018-12-03 19:05:42 +00:00
Zachary Turner	252a37a8c5	Fix issue with Tpi Stream hash map. Part of the patch to not build the hash map eagerly was omitted due to a merge conflict. Add it back, which should fix the failing tests. llvm-svn: 348166	2018-12-03 19:05:12 +00:00
Craig Topper	c94660f8af	[X86] Fix bad formatting. NFC llvm-svn: 348164	2018-12-03 18:58:57 +00:00
Krzysztof Parzyszek	d0234835b8	[Hexagon] Remove unused operand definitions, NFC llvm-svn: 348163	2018-12-03 18:54:24 +00:00
Krzysztof Parzyszek	e363a3bab7	[Hexagon] Some formatting changes, NFC llvm-svn: 348162	2018-12-03 18:40:15 +00:00
Zachary Turner	bbde9d8ccd	Don't build the Tpi Hash map by default. This is very slow and should be done for specific cases where lookups will need to happen. llvm-svn: 348160	2018-12-03 18:32:05 +00:00
Craig Topper	1c65ac8f4d	[X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS. Summary: We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly. After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations. This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55165 llvm-svn: 348159	2018-12-03 18:26:27 +00:00
Craig Topper	067d9f2d4f	[X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8. Summary: Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction. The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54836 llvm-svn: 348158	2018-12-03 18:26:24 +00:00
Adrian Prantl	21a44946f8	Fix non-modular build. llvm-svn: 348157	2018-12-03 18:07:03 +00:00
Adrian Prantl	09f3fa4855	Update Diagnostic handling for changes in CFE. The clang frontend no longer emits the current working directory for DIFiles containing an absolute path in the filename: and will move the common prefix between current working directory and the file into the directory: component. https://reviews.llvm.org/D55085 llvm-svn: 348155	2018-12-03 17:55:29 +00:00
Sanjay Patel	67ca1a946a	[SimplifyCFG] add tests for cross block compare folding; NFC These are the baseline tests for D54827. Patch based on code originally written by: @yinyuefengyi (luo xionghu) Differential Revision: https://reviews.llvm.org/D54994 llvm-svn: 348151	2018-12-03 16:55:29 +00:00
Sanjay Patel	7f522094fd	[CmpInstAnalysis] fix formatting; NFC There are potential improvements to the structure of this API raised by D54994, but remove some cosmetic blemishes before making any functional changes. llvm-svn: 348149	2018-12-03 15:48:30 +00:00
Simon Pilgrim	17abba4cec	Fix line endings. NFCI. llvm-svn: 348146	2018-12-03 14:55:09 +00:00
Fedor Sergeev	e9e52b4a6a	Fixing -print-module-scope for legacy SCC passes It appears that print-module-scope was not implemented for legacy SCC passes. Fixed to print a whole module instead of just current SCC. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D54793 llvm-svn: 348144	2018-12-03 14:48:15 +00:00
Jonas Paulsson	7777cbe23a	[SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test. A loaded value with multiple users compared with 0 will become a load and test single instruction. The load is not folded in this case (multiple users), but the compare instruction is eliminated. This patch returns 0 cost for the icmp in these cases. Review: Ulrich Weigand https://reviews.llvm.org/D55111 llvm-svn: 348141	2018-12-03 14:30:18 +00:00
Pablo Barrio	72d3164a16	[AArch64] Add command-line option for SSBS Summary: SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5 onwards but is optional from Armv8.0-A. This patch adds a command line option to enable SSBS, as it was previously only possible to enable by selecting -march=armv8.5-a. Similar patch upstream in GNU binutils: https://sourceware.org/ml/binutils/2018-09/msg00274.html Reviewers: olista01, samparker, aemerson Reviewed By: samparker Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits Differential Revision: https://reviews.llvm.org/D54629 llvm-svn: 348137	2018-12-03 14:00:47 +00:00
Ron Lieberman	c710fa7f34	[AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed into VOP3 instruction pairs for S_ADD_U64_PSEUDO: V_ADD_I32_e64 V_ADDC_U32_e64 and for S_SUB_U64_PSEUDO V_SUB_I32_e64 V_SUBB_U32_e64 preclude the use of SDWA to encode a constant. SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions, but not on VOP3 instructions. We desire to fold the bit-and operand into the instruction encoding for the V_ADD_I32 instruction. This requires that we transform the VOP3 into a VOP2 form of the instruction (_e32). %19:vgpr_32 = V_AND_B32_e32 255, killed %16:vgpr_32, implicit $exec %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64 %26.sub0:vreg_64, %19:vgpr_32, implicit $exec %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64 %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec which then allows the SDWA encoding and becomes %47:vgpr_32 = V_ADD_I32_sdwa 0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0, implicit-def $vcc, implicit $exec %48:vgpr_32 = V_ADDC_U32_e32 0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec Differential Revision: https://reviews.llvm.org/D54882 llvm-svn: 348132	2018-12-03 13:04:54 +00:00
Tim Northover	5daefbd8b2	ARM: use target-specific SUBS node when combining cmp with cmov. This has two positive effects. First, using a custom node prevents recombination leading to an infinite loop since the output DAG is notionally a little more complex than the input one. Using a flag-setting instruction also allows the subtraction to be folded with the related comparison more easily. https://reviews.llvm.org/D53190 llvm-svn: 348122	2018-12-03 11:16:21 +00:00
Diogo N. Sampaio	61a6678d57	[NFC][AArch64] Split out backend features This patch splits backend features currently hidden behind architecture versions. For example, currently the only way to activate complex numbers extension is targeting a v8.3 architecture, whereas after the patch this extension can be added separately. This refactoring is required by the new command lines proposal: http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html Reviewers: DavidSpickett, olista01, t.p.northover Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio Differential revision: https://reviews.llvm.org/D54633 llvm-svn: 348121	2018-12-03 11:08:13 +00:00
Stefan Granitz	e03d279ac1	[CMake] Add LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR for custom dSYM target directory on Darwin Summary: When using `LLVM_EXTERNALIZE_DEBUGINFO` in LLDB, the default dSYM location for the shared library in LLDB.framework is inside the framework bundle. With `LLVM_EXTERNALIZE_DEBUGINFO_OUTPUT_DIR` we can easily fix that. I consider it a useful feature to be able to set a global output directory for external debug info (rather than having a target-specific one). Only implemented for Darwin so far. Reviewers: beanz, aprantl Reviewed By: aprantl Subscribers: mgorny, aprantl, #lldb, lldb-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D55114 llvm-svn: 348118	2018-12-03 10:42:32 +00:00
Alex Bradbury	02c51ed077	[RISCV] Fix test/MC/Disassembler/RISCV/invalid-instruction.txt after rL347988 The test for [0x00 0x00] failed due to the introduction of c.unimp. This particular test is unnecessary now that c.unimp was defined (and is tested in test/MC/RISCV/rv32c-valid.s). llvm-svn: 348117	2018-12-03 10:35:46 +00:00
George Rimar	20faa793e5	[llvm-dwarfdump] - Stop printing the bogus empty section name on invalid dwarf. When there is no .debug_addr section for some reason, llvm-dwarfdump would print the bogus empty section name when dumping ranges in .debug_info: DW_AT_ranges [DW_FORM_rnglistx] (indexed (0x0) rangelist = 0x00000004 [0x0000000000000000, 0x0000000000000001) "" [0x0000000000000000, 0x0000000000000002) "") That happens because of the code which uses 0 (zero) as a section index as a default value. The code should use -1ULL instead because technically 0 is a valid section index in ELF and -1ULL is a special constant that means "no section available". This is mostly a fix for the overall correctness/safety of the code, but a test case is provided too. Differential revision: https://reviews.llvm.org/D55113 llvm-svn: 348115	2018-12-03 10:33:40 +00:00
Oliver Stannard	eb66331216	[ARM][MC] Move information about variadic register defs into tablegen Currently, variadic operands on an MCInst are assumed to be uses, because they come after the defs. However, this is not always the case, for example the Arm/Thumb LDM instructions write to a variable number of registers. This adds a property of instruction definitions which can be used to mark variadic operands as defs. This only affects MCInst, because MachineInstruction already tracks use/def per operand in each instance of the instruction, so can already represent this. This property can then be checked in MCInstrDesc, allowing us to remove some special cases in ARMAsmParser::isITBlockTerminator. Differential revision: https://reviews.llvm.org/D54853 llvm-svn: 348114	2018-12-03 10:32:42 +00:00
Oliver Stannard	a7553313be	[ARM][Asm] Debug trace for the processInstruction loop In the Arm assembly parser, we first match an instruction, then call processInstruction to possibly change it to a different encoding, to match rules in the architecture manual which can't be expressed by the table-generated matcher. This adds debug printing so that this process is visible when using the -debug option. To support this, I've added a new overload of MCInst::dump_pretty which takes the opcode name as a StringRef, since we don't have an InstPrinter instance in the assembly parser. Instead, we can get the same information directly from the MCInstrInfo. Differential revision: https://reviews.llvm.org/D54852 llvm-svn: 348113	2018-12-03 10:21:28 +00:00
Alexander Potapenko	8e6b80a242	[KMSAN] Enable -msan-handle-asm-conservative by default This change enables conservative assembly instrumentation in KMSAN builds by default. It's still possible to disable it with -msan-handle-asm-conservative=0 if something breaks. It's now impossible to enable conservative instrumentation for userspace builds, but it's not used anyway. llvm-svn: 348112	2018-12-03 10:15:43 +00:00
Petr Pavlu	3e533dcfa4	[GlobalISel] Fix test irtranslator-stackprotect-check.ll Fix for commit r347862. Use correct AArch64 triple in test CodeGen/AArch64/GlobalISel/irtranslator-stackprotect-check.ll. llvm-svn: 348111	2018-12-03 09:28:28 +00:00
Sjoerd Meijer	129cbf3900	[ARM] FP16: support vld1.16 for vector loads with post-increment Differential Revision: https://reviews.llvm.org/D55112 llvm-svn: 348110	2018-12-03 08:26:34 +00:00
Kang Zhang	b1f5b9262b	[PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction Summary: There are 4 instructions which have inconsistent ImmMustBeMultipleOf in the function PPCInstrInfo::instrHasImmForm: LFS, LFD, STFS, STFD. These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54738 llvm-svn: 348109	2018-12-03 03:32:57 +00:00
QingShan Zhang	46ee47cec1	[NFC] [PowerPC] add a routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not. In theory, we should let the PPC target determine how to lower the TOC Entry for globals. And the PPCTargetLowering requires this query to do some optimization for TOC_Entry. Differential Revision: https://reviews.llvm.org/D54925 llvm-svn: 348108	2018-12-03 03:32:16 +00:00
Nico Weber	184177b539	[gn build] Fix cosmetic bug in write_cmake_config.py Before, #cmakedefine FOO resulted in #define FOO with a trailing space if FOO was set to something truthy. Make it so that it's just #define FOO without a trailing space. No functional difference. Differential Revision: https://reviews.llvm.org/D55172 llvm-svn: 348107	2018-12-02 22:26:18 +00:00
Nico Weber	1835f9f561	[gn build] Slightly simplify write_cmake_config. Before, the script had a bunch of special cases for #cmakedefine and #cmakedefine01 and then did general variable substitution. Now, the script always does general variable substitution for all lines and handles the special cases afterwards. This has no observable effect for the inputs we use, but is easier to explain and slightly easier to implement. Also add a link to CMake's configure_file() in the docstring. (The new behavior doesn't quite match CMake on lines like #cmakedefine ${FOO}, but nobody does that.) Differential Revision: https://reviews.llvm.org/D55171 llvm-svn: 348106	2018-12-02 22:25:25 +00:00
Nico Weber	39f9b06cce	[gn build] Add build files for llvm/lib/Analysis and llvm/lib/ProfileData Differential Revision: https://reviews.llvm.org/D55166 llvm-svn: 348105	2018-12-02 21:43:15 +00:00
Craig Topper	b14bada7bd	[X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar. llvm-svn: 348104	2018-12-02 19:47:14 +00:00
Craig Topper	5c466e750d	[X86] Fix bad comment. NFC llvm-svn: 348103	2018-12-02 19:47:13 +00:00
Michal Gorny	76c7542bde	[test] Fix use of 'sort -b' in SimpleLoopUnswitch on NetBSD Add '-k 1' to 'sort -b' calls in SimpleLoopUnswitch tests, as required by the sort implementation on NetBSD. The '-b' modifier is ineffective if specified without any key. Per the manpage: Note that the -b option has no effect unless key fields are specified. Differential Revision: https://reviews.llvm.org/D55168 llvm-svn: 348097	2018-12-02 16:49:33 +00:00
Michal Gorny	2d8d7b8468	[test] Fix ScalarEvolution test to allow __func__ with prototype Fix ScalarEvolution/solve-quadratic.ll test to account for __func__ output listing the complete function prototype rather than just its name, as it does on NetBSD. Example Linux output: GetQuadraticEquation: addrec coeff bw: 4 GetQuadraticEquation: equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Example NetBSD output: llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr): addrec coeff bw: 4 llvm::Optional<std::tuple<llvm::APInt, llvm::APInt, llvm::APInt, llvm::APInt, unsigned int> > GetQuadraticEquation(const llvm::SCEVAddRecExpr): equation -2x^2 + -2x + -4, coeff bw: 5, multiplied by 2 Differential Revision: https://reviews.llvm.org/D55162 llvm-svn: 348096	2018-12-02 16:49:28 +00:00
Michal Gorny	fe6b91d018	[test] Fix BugPoint/compile-custom.ll to use detected python exec Spawn the custom compile command in BugPoint/compile-custom.ll via %python rather than relying on implicit 'env python' shebang, in order to fix it on systems that don't have 'python' executable such as NetBSD. Differential Revision: https://reviews.llvm.org/D55161 llvm-svn: 348095	2018-12-02 16:49:23 +00:00
Nikita Popov	bb7c898cb7	[ValueTracking] Support funnel shifts in computeKnownBits() If the shift amount is known, we can determine the known bits of the output based on the known bits of two inputs. This is essentially the same functionality as implemented in D54869, but for ValueTracking rather than InstCombine SimplifyDemandedBits. Differential Revision: https://reviews.llvm.org/D55140 llvm-svn: 348091	2018-12-02 14:14:11 +00:00
Sanjay Patel	2986f40e8a	[SelectionDAG] fold constant with undef vector per element This makes the SDAG behavior consistent with the way we do this in IR. It's possible that we were getting the wrong answer before. For example, 'xor undef, undef --> 0' but 'xor undef, C' --> undef. But the most practical improvement is likely as shown in the tests here - for FP, we were overconstraining undef lanes to NaN, and that can prevent vector simplifications/narrowing (see D51553). llvm-svn: 348090	2018-12-02 13:48:42 +00:00
Sanjay Patel	7fad8e54d1	[DAGCombiner] guard against an oversized shift crash This change prevents the crash noted in the post-commit comments for rL347478 : http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181119/605166.html We can't guarantee that an oversized shift amount is folded away, so we have to check for it. Note that I committed an incomplete fix for that crash with: rL347502 But as discussed here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181126/605679.html ...we have to try harder. So I'm not sure how to expose the bug now (and apparently no fuzzers have found a way yet either). On the plus side, we have discovered that we're missing real optimizations by not simplifying nodes sooner, so the earlier fix still has value, and there's likely more value in extending that so we can simplify more opcodes and simplify when doing RAUW and/or putting nodes on the combiner worklist. Differential Revision: https://reviews.llvm.org/D54954 llvm-svn: 348089	2018-12-02 13:33:56 +00:00
Sanjay Patel	9c0a094ebb	[ValueTracking] add helper function for testing implied condition; NFCI We were duplicating code around the existing isImpliedCondition() that checks for a predecessor block/dominating condition, so make that a wrapper call. llvm-svn: 348088	2018-12-02 13:26:03 +00:00
Craig Topper	9ae71594e9	[X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast. Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations. llvm-svn: 348087	2018-12-02 07:52:39 +00:00
Craig Topper	1c3f74bc5b	[X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack. Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register. llvm-svn: 348086	2018-12-02 05:46:50 +00:00
Craig Topper	b16fe4df30	[X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64. The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away. By custom legalizing it we can avoid this churn and maybe produce better code. llvm-svn: 348085	2018-12-02 05:46:48 +00:00
Craig Topper	d89a17060e	[X86] Add vXi8 division/remainder by non-splat constant test cases to prepare for an upcoming patch. llvm-svn: 348082	2018-12-01 21:53:08 +00:00
Jessica Paquette	c877e03376	[MachineOutliner][AArch64] Improve checks for stack instructions If we know that we'll definitely save LR to a register, there's no reason to pre-check whether or not a stack instruction is unsafe to fix up. This makes it so that we check for that condition before mapping instructions. This allows us to outline more, since we don't pessimise as many instructions. Also update some tests, since we outline more. llvm-svn: 348081	2018-12-01 21:24:06 +00:00
Jessica Paquette	2fa8070014	Replace w16/w17 in machine-outliner.mir with w11/w12 These registers should not be used here, since they are interprocedural scratch registers in AArch64. llvm-svn: 348080	2018-12-01 21:23:58 +00:00
Craig Topper	108b8ed5bf	[X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1 Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55138 llvm-svn: 348079	2018-12-01 19:26:31 +00:00
Simon Pilgrim	a5a555e69a	[TTI] Reduction costs only need to include a single extract element cost (REAPPLIED) We were adding the entire scalarization extraction cost for reductions, which returns the total cost of extracting every element of a vector type. For reductions we don't need to do this - we just need to extract the 0'th element after the reduction pattern has completed. Fixes PR37731 Rebased and reapplied after being reverted in rL347541 due to PR39774 - which was fixed by D54955/rL347759 and D55017/rL347997 Differential Revision: https://reviews.llvm.org/D54585 llvm-svn: 348076	2018-12-01 14:18:31 +00:00
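To illustrate the reasoning in the ValueTracking funnel-shift commit above (bb7c898cb7), here is a minimal Python sketch, not LLVM's C++ implementation. It assumes known bits are modeled as a pair of masks (known-zero, known-one) of a fixed width, and that the shift amount is a known constant; the names `KnownBits`, `known_fshl` and `known_fshr` are hypothetical. Because every output bit of a funnel shift is copied from exactly one bit of one of the two inputs, applying the same funnel shift to the masks yields the known bits of the result.

```python
from typing import NamedTuple


class KnownBits(NamedTuple):
    """Hypothetical model: known-zero and known-one masks of a `width`-bit value."""
    zero: int
    one: int
    width: int


def fshl(hi: int, lo: int, amt: int, width: int) -> int:
    """Funnel shift left: the high `width` bits of (hi:lo) << (amt % width)."""
    mask = (1 << width) - 1
    s = amt % width
    if s == 0:
        return hi & mask
    return ((hi << s) | (lo >> (width - s))) & mask


def fshr(hi: int, lo: int, amt: int, width: int) -> int:
    """Funnel shift right: the low `width` bits of (hi:lo) >> (amt % width)."""
    mask = (1 << width) - 1
    s = amt % width
    if s == 0:
        return lo & mask
    return ((lo >> s) | (hi << (width - s))) & mask


def known_fshl(x: KnownBits, y: KnownBits, amt: int) -> KnownBits:
    # Each output bit is a copy of one input bit, so shift the masks the same way.
    w = x.width
    return KnownBits(fshl(x.zero, y.zero, amt, w), fshl(x.one, y.one, amt, w), w)


def known_fshr(x: KnownBits, y: KnownBits, amt: int) -> KnownBits:
    w = x.width
    return KnownBits(fshr(x.zero, y.zero, amt, w), fshr(x.one, y.one, amt, w), w)


if __name__ == "__main__":
    # x is fully known (0xF0) and y is completely unknown; funnel shift left by 4.
    x = KnownBits(zero=0x0F, one=0xF0, width=8)
    y = KnownBits(zero=0x00, one=0x00, width=8)
    k = known_fshl(x, y, 4)
    print(hex(k.zero), hex(k.one))  # 0xf0 0x0: high nibble known zero, low nibble unknown
```

The sketch only covers a constant shift amount, matching the commit's "if the shift amount is known" condition; with an unknown amount this simple mask-shifting argument no longer applies.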
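The two write_cmake_config.py commits above (184177b539 and 1835f9f561) describe substituting `${VAR}` references on every line first, handling `#cmakedefine` and `#cmakedefine01` afterwards, and emitting `#define FOO` without a trailing space. The following Python sketch illustrates that order of operations; it is not the actual script. The function names `process_line` and `is_truthy` are hypothetical, the truthiness test is a simplification of CMake's rules, and the `/* #undef FOO */` form mirrors what CMake's configure_file() emits for a false variable.

```python
import re

# Simplified (assumed) approximation of CMake's false constants.
_FALSE_VALUES = {"", "0", "OFF", "NO", "FALSE", "N", "IGNORE", "NOTFOUND"}


def is_truthy(value):
    """Hypothetical helper: approximate CMake truthiness of a variable's value."""
    if value is None:
        return False
    upper = value.upper()
    return upper not in _FALSE_VALUES and not upper.endswith("-NOTFOUND")


def process_line(line, values):
    """Hypothetical sketch of configuring one line of a header template."""
    # Step 1: general ${VAR} substitution on every line.
    line = re.sub(r"\$\{(\w+)\}", lambda m: values.get(m.group(1), ""), line)

    # Step 2: special cases afterwards.
    m = re.match(r"#cmakedefine01 (\w+)$", line)
    if m:
        key = m.group(1)
        return "#define %s %d" % (key, 1 if is_truthy(values.get(key)) else 0)

    m = re.match(r"#cmakedefine (\w+)(.*)$", line)
    if m:
        key, rest = m.group(1), m.group(2)
        if not is_truthy(values.get(key)):
            return "/* #undef %s */" % key
        # `rest` carries its own leading space when a value follows the name,
        # so a bare "#cmakedefine FOO" becomes "#define FOO" with no trailing space.
        return "#define " + key + rest

    return line


if __name__ == "__main__":
    values = {"HAVE_FOO": "1", "VERSION": "7.0"}
    for src in ["#cmakedefine HAVE_FOO", "#cmakedefine HAVE_BAR",
                "#cmakedefine01 HAVE_FOO", '#define VER "${VERSION}"']:
        print(process_line(src, values))
```

Doing the substitution first keeps the special-case handling to a couple of pattern matches on already-expanded text, which is the simplification the second commit describes.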


172209 Commits