llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00

Author	SHA1	Message	Date
Kit Barton	12e4595b58	Properly handle the mftb instruction. The mftb instruction was incorrectly marked as deprecated in the PPC Backend. Instead, it should not be treated as deprecated, but rather be implemented using the mfspr instruction. A similar patch was put into GCC last year. Details can be found at: https://sourceware.org/ml/binutils/2014-11/msg00383.html. This change will replace instances of the mftb instruction with the mfspr instruction for all CPUs except 601 and pwr3. This will also be the default behaviour. Additional details can be found in: https://llvm.org/bugs/show_bug.cgi?id=23680 Phabricator review: http://reviews.llvm.org/D10419 llvm-svn: 239827	2015-06-16 16:01:15 +00:00
Colin LeMahieu	765eef8121	[Hexagon] Alphabetical ordering of functions, NFC. llvm-svn: 239826	2015-06-16 15:59:53 +00:00
Daniel Sanders	134c99480b	Clean up redundant copies of Triple objects. NFC Summary: Reviewers: rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10382 llvm-svn: 239823	2015-06-16 15:44:21 +00:00
Daniel Sanders	3469cb7a94	[mips][ias] Expand on r238751 to cover as many relocs as possible. Summary: Relocs that can be converted from absolute to PC-relative now do so if IsPCRel is true. Relocs that require PC-relative now call llvm_unreachable() if IsPCRel is false and similarly those that require absolute assert that IsPCRel is false. Note that while it looks like some relocs (e.g. R_MIPS_26) can be converted into the MIPS32r6/MIPS64r6 relocs (R_MIPS_PC*_S2), it isn't actually valid to do so. Placeholders have been left in the testcase for unsupported relocs and relocs that cannot be generated at the moment. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits, rafael Differential Revision: http://reviews.llvm.org/D10184 llvm-svn: 239817	2015-06-16 13:46:26 +00:00
Daniel Sanders	df4e26d91a	Replace string GNU Triples with llvm::Triple in TargetMachine::getTargetTriple(). NFC. Summary: This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10381 llvm-svn: 239815	2015-06-16 13:15:50 +00:00
Daniel Sanders	4f36d138c9	Recommit r239721: Replace string GNU Triples with llvm::Triple in InitMCObjectFileInfo. NFC. Summary: This affects other tools so the previous C++ API has been retained as a deprecated function for the moment. Clang has been updated with a trivial patch (not covered by the pre-commit review) to avoid breaking -Werror builds. Other in-tree tools will be fixed with similar patches. This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. The first time this was committed it accidentally fixed an inconsistency in triples in llvm-mc and this caused a failure. This inconsistency was fixed in r239808. Reviewers: rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10366 llvm-svn: 239812	2015-06-16 12:18:07 +00:00
Toma Tabacu	3bbc325a64	[mips] [IAS] Refactor symbol-address loading code into a helper function. NFC. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9523 llvm-svn: 239811	2015-06-16 12:16:24 +00:00
Asaf Badouh	9879449284	[AVX512] add integer min/max intrinsics support. review: http://reviews.llvm.org/D10439 llvm-svn: 239806	2015-06-16 08:39:27 +00:00
Elena Demikhovsky	7cf9dd07b7	X86: optimized i64 vector multiply with constant. When we multiply two 64-bit vectors, we extract the lower and upper parts and use the PMULUDQ instruction. When one of the operands is a constant, the upper part may be zero, and we know this at compile time. Example: %a = mul <4 x i64> %b, <4 x i64> < i64 5, i64 5, i64 5, i64 5>. I'm checking the value of the upper part and preventing redundant "multiply", "shift" and "add" operations. llvm-svn: 239802	2015-06-16 06:07:24 +00:00
Ahmed Bougacha	e339d025d2	[AArch64] Generalize extract-high DUP extension to MOVI/MVNI. These are really immediate DUPs, and suffer from the same problem with long instructions with a high/2 variant (e.g. smull). By extending a MOVI (or DUP, before this patch), we can avoid an ext on the other operand of the long instruction, e.g. turning: ext.16b v0, v0, v0, #8 movi.4h v1, #0x53 smull.4s v0, v0, v1 into: movi.8h v1, #0x53 smull2.4s v0, v0, v1 While there, add a now-necessary combine to fold (VT NVCAST (VT x)). llvm-svn: 239799	2015-06-16 01:18:14 +00:00
Reid Kleckner	b0f9490716	[X86] Try to shorten dwarf CFI emission llvm-svn: 239786	2015-06-15 23:45:08 +00:00
Colin LeMahieu	454daa07b2	[Hexagon] PC-relative offsets are relative to packet start rather than the offset of the relocation. Set relocation addend and check it's correct in the ELF. llvm-svn: 239769	2015-06-15 21:52:13 +00:00
Peter Collingbourne	ea9bf98c05	Protection against stack-based memory corruption errors using SafeStack This patch adds the safe stack instrumentation pass to LLVM, which separates the program stack into a safe stack, which stores return addresses, register spills, and local variables that are statically verified to be accessed in a safe way, and the unsafe stack, which stores everything else. Such separation makes it much harder for an attacker to corrupt objects on the safe stack, including function pointers stored in spilled registers and return addresses. You can find more information about the safe stack, as well as other parts of our control-flow hijack protection technique in our OSDI paper on code-pointer integrity (http://dslab.epfl.ch/pubs/cpi.pdf) and our project website (http://levee.epfl.ch). The overhead of our implementation of the safe stack is very close to zero (0.01% on the Phoronix benchmarks). This is lower than the overhead of stack cookies, which are supported by LLVM and are commonly used today, yet the security guarantees of the safe stack are strictly stronger than stack cookies. In some cases, the safe stack improves performance due to better cache locality. Our current implementation of the safe stack is stable and robust, we used it to recompile multiple projects on Linux including Chromium, and we also recompiled the entire FreeBSD user-space system and more than 100 packages. We ran unit tests on the FreeBSD system and many of the packages and observed no errors caused by the safe stack. The safe stack is also fully binary compatible with non-instrumented code and can be applied to parts of a program selectively. This patch is our implementation of the safe stack on top of LLVM. The patches make the following changes: - Add the safestack function attribute, similar to the ssp, sspstrong and sspreq attributes. - Add the SafeStack instrumentation pass that applies the safe stack to all functions that have the safestack attribute. This pass moves all unsafe local variables to the unsafe stack with a separate stack pointer, whereas all safe variables remain on the regular stack that is managed by LLVM as usual. - Invoke the pass as the last stage before code generation (at the same time the existing cookie-based stack protector pass is invoked). - Add unit tests for the safe stack. Original patch by Volodymyr Kuznetsov and others at the Dependable Systems Lab at EPFL; updates and upstreaming by myself. Differential Revision: http://reviews.llvm.org/D6094 llvm-svn: 239761	2015-06-15 21:07:11 +00:00
Alex Lorenz	2e08d769c3	MIR Serialization: Connect the machine function analysis pass to the MIR parser. This commit connects the machine function analysis pass (which creates machine functions) to the MIR parser, which will initialize the machine functions with the state from the MIR file and reconstruct the machine IR. This commit introduces a new interface called 'MachineFunctionInitializer', which can be used to provide custom initialization for the machine functions. This commit also introduces a new diagnostic class called 'DiagnosticInfoMIRParser' which is used for MIR parsing errors. This commit modifies the default diagnostic handling in LLVMContext - now the diagnostics are printed directly into llvm::errs() so that the MIR parsing errors can be printed with colours. Reviewers: Justin Bogner Differential Revision: http://reviews.llvm.org/D9928 llvm-svn: 239753	2015-06-15 20:30:22 +00:00
Eric Christopher	b7f0e9d828	Remove duplicate conditional in if-stmt. Fixes PR23839. llvm-svn: 239751	2015-06-15 20:16:53 +00:00
Colin LeMahieu	d0b4dab4b6	[Hexagon] Moving pass declarations out of header and in to implementation files. Removing unused function getSubtargetInfo from HexagonMCCodeEmitter.cpp Removing deletion of copy construction and assignment operator since parent already deletes it. llvm-svn: 239744	2015-06-15 19:05:35 +00:00
Sanjoy Das	f1dab90647	[TargetInstrInfo] Add new hook: AnalyzeBranchPredicate. Summary: NFC: no one uses AnalyzeBranchPredicate yet. Add TargetInstrInfo::AnalyzeBranchPredicate and implement for x86. A later change adding support for page-fault based implicit null checks depends on this. Reviewers: reames, ab, atrick Reviewed By: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10200 llvm-svn: 239742	2015-06-15 18:44:21 +00:00
Sanjoy Das	ce0590cf7a	[TargetInstrInfo] Rename getLdStBaseRegImmOfs and implement for x86. Summary: Rename TargetInstrInfo::getLdStBaseRegImmOfs to TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86. The implementation only handles a few easy cases now and will be made more sophisticated in the future. This is NFCI: the only user of `getLdStBaseRegImmOfs` (now `getMemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion` is disabled for x86. Reviewers: reames, ab, MatzeB, atrick Reviewed By: MatzeB, atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10199 llvm-svn: 239741	2015-06-15 18:44:14 +00:00
Sanjoy Das	b396b9e375	[CodeGen] Introduce a FAULTING_LOAD_OP pseudo-op. Summary: This instruction encodes a loading operation that may fault, and a label to branch to if the load page-faults. The locations of potentially faulting loads and their "handler" destinations are recorded in a FaultMap section, meant to be consumed by LLVM's clients. Nothing generates FAULTING_LOAD_OP instructions yet, but they will be used in a future change. The documentation (FaultMaps.rst) needs improvement and I will update this diff with a more expanded version shortly. Depends on D10196 Reviewers: rnk, reames, AndyAyers, ab, atrick, pgavlin Reviewed By: atrick, pgavlin Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10197 llvm-svn: 239740	2015-06-15 18:44:08 +00:00
Sanjoy Das	dc981cb399	[NFC] Extract X86MCInstLower::LowerMachineOperand. Summary: Refactoring-only change that will be used later. Reviewers: reames, atrick Reviewed By: atrick Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10196 llvm-svn: 239739	2015-06-15 18:44:01 +00:00
Evgeny Astigeevich	0dabab1ee4	On behalf of Alexandros Lamprineas: LLVM targeting aarch64 doesn't correctly produce aligned accesses for non-aligned data at -O0/fast-isel (-mno-unaligned-access). The root cause seems to be in fast-isel not producing unaligned access correctly for -mno-unaligned-access. The patch just aborts fast-isel for loads and stores when -mno-unaligned-access is present. The regression test is updated to check this new test case (-mno-unaligned-access together with fast-isel). Differential Revision: http://reviews.llvm.org/D10360 llvm-svn: 239732	2015-06-15 15:48:44 +00:00
Daniel Sanders	52648be0df	Revert r239721 - Replace string GNU Triples with llvm::Triple in InitMCObjectFileInfo. NFC. It appears to cause sparc-little-endian.s to assert on Windows and Darwin. llvm-svn: 239724	2015-06-15 10:34:38 +00:00
Daniel Sanders	15d01ae3f0	Replace string GNU Triples with llvm::Triple in InitMCObjectFileInfo. NFC. Summary: This affects other tools so the previous C++ API has been retained as a deprecated function for the moment. Clang has been updated with a trivial patch (not covered by the pre-commit review) to avoid breaking -Werror builds. Other in-tree tools will be fixed with similar trivial patches. This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10366 llvm-svn: 239721	2015-06-15 09:19:41 +00:00
Hao Liu	27ebfb93f8	[AArch64] Delete two empty files, which should be removed by r239713. llvm-svn: 239715	2015-06-15 02:56:40 +00:00
Hao Liu	98f2c1622d	[AArch64] Revert r239711 again. We need to discuss how to share code between AArch64 and ARM backend. llvm-svn: 239713	2015-06-15 01:56:40 +00:00
Hao Liu	3a81303ff9	[AArch64] Match interleaved memory accesses into ldN/stN instructions. Re-commit after adding "-aarch64-neon-syntax=generic" to fix the failure on OS X. This patch was first committed in r239514, then reverted in r239544 because of a syntax incompatibility failure on OS X. llvm-svn: 239711	2015-06-15 01:35:49 +00:00
Igor Breger	bf950bad09	AVX-512: Implemented DAG lowering for shuff64x2/shufi64x2 instructions (Shuffle Packed Values at 128-bit Granularity). Tests added, vector-shuffle-512-v8.ll test re-generated. Differential Revision: http://reviews.llvm.org/D10300 llvm-svn: 239697	2015-06-14 13:07:47 +00:00
Michael Kuperstein	915c69271d	Add support for parsing the XOR operator in Intel syntax inline assembly. Differential Revision: http://reviews.llvm.org/D10385 Patch by marina.yatsina@intel.com llvm-svn: 239695	2015-06-14 12:59:45 +00:00
Igor Breger	f163333815	AVX-512: Implemented cvtsi2ss/d cvtusi2ss/d instructions with round control for KNL. Added intrinsics for cvtsi2ss/d instructions. Added tests for intrinsics and encoding. Differential Revision: http://reviews.llvm.org/D10430 llvm-svn: 239694	2015-06-14 12:44:55 +00:00
Simon Pilgrim	0f06a86b5d	Stripped trailing whitespace. NFC. llvm-svn: 239672	2015-06-13 12:51:39 +00:00
Tom Stellard	665c24e443	AMDGPU: s/R600/AMDGPU/ in the Makefiles Now the library names in the Makefiles match the library names in LLVMBuild.txt. This should hopefully fix the remaining bot failures. llvm-svn: 239661	2015-06-13 05:11:14 +00:00
Matthias Braun	729d1d707d	Rename TargetSubtargetInfo::enablePostMachineScheduler() to enablePostRAScheduler() r213101 changed the behaviour of this method to not only affect the PostMachineScheduler scheduler but also the PostRAScheduler scheduler, renaming should make this fact clear. Also document that the preferred way is to specify this in the scheduling model instead of overriding this method. Differential Revision: http://reviews.llvm.org/D10427 llvm-svn: 239659	2015-06-13 03:42:16 +00:00
Matthias Braun	e311841a60	MachineLICM: Use TargetSchedModel instead of just itineraries This will use Itineraries if available, but will also work if just a MCSchedModel is available. Differential Revision: http://reviews.llvm.org/D10428 llvm-svn: 239658	2015-06-13 03:42:11 +00:00
Tom Stellard	3f1708598e	R600 -> AMDGPU rename llvm-svn: 239657	2015-06-13 03:28:10 +00:00
Tim Northover	ddc656df8e	AArch64: map bare-metal arm64-macho triple to MachO MC layer. Far better than an assertion about expecting ELF. llvm-svn: 239647	2015-06-12 23:37:11 +00:00
Tom Stellard	2b78afe7cf	R600/SI: Add assembler support for FLAT instructions - Add glc, slc, and tfe operands to flat instructions - Add missing flat instructions - Fix the encoding of flat_load_dwordx3 and flat_store_dwordx3. llvm-svn: 239637	2015-06-12 20:47:06 +00:00
Colin LeMahieu	68ef17e4c2	[Hexagon] Making intrinsic tests agnostic to register allocation. Narrowing intrinsic parameters to appropriate width. llvm-svn: 239634	2015-06-12 19:57:32 +00:00
Douglas Katzman	5eb858225c	Wrap some long lines in LLVMBuild files. NFC As suggested by jroelofs in a prior review (D9752), it makes sense to generally prefer multi-line format. llvm-svn: 239632	2015-06-12 18:44:57 +00:00
Rafael Espindola	12677ab0cc	Remove a hack that tries to align '*'. The alignment is not required, so we can just remove it for now. The old code is a hack as it depends on the buffer management to find the current column. If the alignment is really desirable, the proper way to do it is to pass in a formatted_raw_stream that knows the current column. llvm-svn: 239603	2015-06-12 12:42:13 +00:00
Reid Kleckner	2ccc557010	[WinEH] Put finally pointers in the handler scope table field We were putting them in the filter field, which is correct for 64-bit but wrong for 32-bit. Also switch the order of scope table entry emission so outermost entries are emitted first, and fix an obvious state assignment bug. llvm-svn: 239574	2015-06-11 23:37:18 +00:00
Juergen Ributzka	372830c96f	[Stackmaps][X86] Remove EFLAGS and IP registers from the live-out mask. Remove the EFLAGS from the stackmap live-out mask. The EFLAGS register is not supposed to be part of that set, because the X86 calling conventions mark the register as NOT preserved. Also remove the IP registers, since spilling and restoring those doesn't really make any sense. Related to rdar://problem/21019635. llvm-svn: 239568	2015-06-11 22:40:04 +00:00
Reid Kleckner	493c014968	[WinEH] Create an llvm.x86.seh.exceptioninfo intrinsic This intrinsic is like framerecover plus a load. It recovers the EH registration stack allocation from the parent frame and loads the exception information field out of it, giving back a pointer to an EXCEPTION_POINTERS struct. It's designed for clang to use in SEH filter expressions instead of accessing the EXCEPTION_POINTERS parameter that is available on x64. This required a minor change to MC to allow defining a label variable to another absolute framerecover label variable. llvm-svn: 239567	2015-06-11 22:32:23 +00:00
Daniel Sanders	ac381fec59	Replace string GNU Triples with llvm::Triple in TargetMachine. NFC. Summary: For the moment, TargetMachine::getTargetTriple() still returns a StringRef. This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rengolin Reviewed By: rengolin Subscribers: ted, llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10362 llvm-svn: 239554	2015-06-11 19:41:26 +00:00
Ahmed Bougacha	ee490f0abc	[CodeGen] ArrayRef'ize cond/pred in various TII APIs. NFC. llvm-svn: 239553	2015-06-11 19:30:37 +00:00
Rafael Espindola	4973f5abd5	This reverts commit r239529 and r239514. Revert "[AArch64] Match interleaved memory accesses into ldN/stN instructions." Revert "Fixing MSVC 2013 build error." The test/CodeGen/AArch64/aarch64-interleaved-accesses.ll test was failing on OS X. llvm-svn: 239544	2015-06-11 17:30:33 +00:00
Daniel Sanders	b5daf5341a	Replace string GNU Triples with llvm::Triple in computeDataLayout(). NFC. Summary: This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rengolin Reviewed By: rengolin Subscribers: llvm-commits, jfb, rengolin Differential Revision: http://reviews.llvm.org/D10361 llvm-svn: 239538	2015-06-11 15:34:59 +00:00
Tom Stellard	3bff6f7e5e	R600/SI: Define latency for flat instructions llvm-svn: 239535	2015-06-11 14:51:50 +00:00
Tom Stellard	3b8fa5b676	R600/SI: Move flat instruction defs to CIInstructions.td llvm-svn: 239534	2015-06-11 14:51:49 +00:00
Aaron Ballman	34bd3c1b27	Fixing MSVC 2013 build error. llvm-svn: 239529	2015-06-11 13:06:02 +00:00
Toma Tabacu	69851ca2cd	Recommit "[mips] [IAS] Add support for BNE and BEQ with an immediate operand." (r239396). Apparently, Arcanist didn't include some of my local changes in my previous commit attempt. llvm-svn: 239523	2015-06-11 10:36:10 +00:00
Zoran Jovanovic	cee7a90cff	[mips][microMIPS] Implement ERET and ERETNC instructions http://reviews.llvm.org/D10091 llvm-svn: 239522	2015-06-11 10:22:46 +00:00
Zoran Jovanovic	16ce288b93	[mips] Change existing uimm10 operand to restrict the accepted immediates http://reviews.llvm.org/D10312 llvm-svn: 239520	2015-06-11 09:51:58 +00:00
Hao Liu	3ad5dd3f0c	[AArch64] Match interleaved memory accesses into ldN/stN instructions. Add a pass AArch64InterleavedAccess to identify and match interleaved memory accesses. This pass transforms an interleaved load/store into ldN/stN intrinsic. As Loop Vectorizer disables optimization on interleaved accesses by default, this optimization is also disabled by default. Enable it with "-aarch64-interleaved-access-opt=true". E.g. Transform an interleaved load (Factor = 2): %wide.vec = load <8 x i32>, <8 x i32>* %ptr %v0 = shuffle %wide.vec, undef, <0, 2, 4, 6> ; Extract even elements %v1 = shuffle %wide.vec, undef, <1, 3, 5, 7> ; Extract odd elements Into: %ld2 = { <4 x i32>, <4 x i32> } call aarch64.neon.ld2(%ptr) %v0 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 0 %v1 = extractelement { <4 x i32>, <4 x i32> } %ld2, i32 1 E.g. Transform an interleaved store (Factor = 2): %i.vec = shuffle %v0, %v1, <0, 4, 1, 5, 2, 6, 3, 7> ; Interleaved vec store <8 x i32> %i.vec, <8 x i32>* %ptr Into: %v0 = shuffle %i.vec, undef, <0, 1, 2, 3> %v1 = shuffle %i.vec, undef, <4, 5, 6, 7> call void aarch64.neon.st2(%v0, %v1, %ptr) llvm-svn: 239514	2015-06-11 09:05:02 +00:00
Simon Pilgrim	c3425b72b9	[X86][SSE] Vectorized i8 and i16 shift operators This patch ensures that SHL/SRL/SRA shifts for i8 and i16 vectors avoid scalarization. It builds on the existing i8 SHL vectorized implementation of moving the shift bits up to the sign bit position and separating the 4, 2 & 1 bit shifts with several improvements: 1 - SSE41 targets can use (v)pblendvb directly with the sign bit instead of performing a comparison to feed into a VSELECT node. 2 - pre-SSE41 targets were masking + comparing with an 0x80 constant - we avoid this by using the fact that a set sign bit means a negative integer which can be compared against zero to then feed into VSELECT, avoiding the need for a constant mask (zero generation is much cheaper). 3 - SRA i8 needs to be unpacked to the upper byte of a i16 so that the i16 psraw instruction can be correctly used for sign extension - we have to do more work than for SHL/SRL but perf tests indicate that this is still beneficial. The i16 implementation is similar but simpler than for i8 - we have to do 8, 4, 2 & 1 bit shifts but less shift masking is involved. SSE41 use of (v)pblendvb requires that the i16 shift amount is splatted to both bytes however. Tested on SSE2, SSE41 and AVX machines. Differential Revision: http://reviews.llvm.org/D9474 llvm-svn: 239509	2015-06-11 07:46:37 +00:00
Nemanja Ivanovic	d737ff7af7	LLVM support for vector quad bit permute and gather instructions through builtins This patch corresponds to review: http://reviews.llvm.org/D10096 This is the back end portion of the patch related to D10095. The patch adds the instructions and back end intrinsics for: vbpermq vgbbd llvm-svn: 239505	2015-06-11 06:21:25 +00:00
Reid Kleckner	8d217e6b48	Revert "Move dllimport name mangling to IR mangler." This reverts commit r239437. This broke clang-cl self-hosts. We'd end up calling the __imp_ symbol directly instead of using it to do an indirect function call. llvm-svn: 239502	2015-06-11 01:31:48 +00:00
Pete Cooper	8328fce315	Remove MachineModuleInfo::UsedFunctions as it has no users. It hasn't been used since r130964. This also removes MachineModuleInfo::isUsedFunction and MachineModuleInfo::AnalyzeModule, both of which were only there to support UsedFunctions. llvm-svn: 239501	2015-06-11 01:04:56 +00:00
Sanjay Patel	8266e384a1	change assert that will never fire to llvm_unreachable llvm-svn: 239497	2015-06-10 23:27:33 +00:00
Sanjay Patel	6b15a1a605	[x86] Add a reassociation optimization to increase ILP via the MachineCombiner pass This is a reimplementation of D9780 at the machine instruction level rather than the DAG. Use the MachineCombiner pass to reassociate scalar single-precision AVX additions (just a starting point; see the TODO comments) to increase ILP when it's safe to do so. The code is closely based on the existing MachineCombiner optimization that is implemented for AArch64. This patch should not cause the kind of spilling tragedy that led to the reversion of r236031. Differential Revision: http://reviews.llvm.org/D10321 llvm-svn: 239486	2015-06-10 20:32:21 +00:00
Colin LeMahieu	705d770c5e	[Hexagon] Adding decoders for signed operands and ensuring all signed operand types disassemble correctly. llvm-svn: 239477	2015-06-10 16:52:32 +00:00
Benjamin Kramer	dc5eda7447	[Hexagon] Make global arrays 'static const'. NFC. llvm-svn: 239475	2015-06-10 14:43:59 +00:00
Daniel Sanders	e37ebd59c5	Replace string GNU Triples with llvm::Triple in MCSubtargetInfo and create*MCSubtargetInfo(). NFC. Summary: This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rafael Reviewed By: rafael Subscribers: rafael, ted, jfb, llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10311 llvm-svn: 239467	2015-06-10 12:11:26 +00:00
Daniel Sanders	326a8d5bed	Replace string GNU Triples with llvm::Triple in create*MCRelocationInfo(). NFC. Summary: This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: rafael Reviewed By: rafael Subscribers: rafael, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10307 llvm-svn: 239465	2015-06-10 10:54:40 +00:00
Daniel Sanders	2da676c315	Replace string GNU Triples with llvm::Triple in MCAsmBackend subclasses and create*AsmBackend(). NFC. Summary: This continues the patch series to eliminate StringRef forms of GNU triples from the internals of LLVM that began in r239036. Reviewers: echristo, rafael Reviewed By: rafael Subscribers: rafael, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D10243 llvm-svn: 239464	2015-06-10 10:35:34 +00:00
Elena Demikhovsky	0e31a916e3	AVX-512: Fixed a bug in comparison of i1 vectors. cmp eq should give kxnor instruction cmp neq should give kxor https://llvm.org/bugs/show_bug.cgi?id=23631 llvm-svn: 239460	2015-06-10 06:49:28 +00:00
Craig Topper	90cbda743d	Remove unnecessary conversion from StringRef to std::string and back to StringRef. NFC. llvm-svn: 239455	2015-06-10 02:07:37 +00:00
Reid Kleckner	34c2802c0d	[WinEH] Call llvm.stackrestore in __except blocks We have to do this manually, the runtime only sets up ebp. Fixes a crash when returning after catching an exception. llvm-svn: 239451	2015-06-10 01:34:54 +00:00
Reid Kleckner	a2dfb4b154	[WinEH] Emit .safeseh directives for all 32-bit exception handlers Use a "safeseh" string attribute to do this. You would think we could just accumulate the set of personalities like we do on dwarf, but this fails to account for the LSDA-loading thunks we use for __CxxFrameHandler3. Each of those needs to make it into .sxdata as well. The string attribute seemed like the most straightforward approach. llvm-svn: 239448	2015-06-10 01:02:30 +00:00
Peter Collingbourne	6f8524df44	Move dllimport name mangling to IR mangler. This ensures that LTO clients see the correct external symbol name. Differential Revision: http://reviews.llvm.org/D10318 llvm-svn: 239437	2015-06-09 22:09:53 +00:00
Jingyue Wu	2406860399	[NVPTX] fix a crash bug in NVPTXFavorNonGenericAddrSpaces Summary: We used to assume V->RAUW only modifies the operand list of V's user. However, if V and V's user are Constants, RAUW may replace and invalidate V's user entirely. This patch fixes the above issue by letting the caller replace the operand instead of calling RAUW on Constants. Test Plan: @nested_const_expr and @rauw in access-non-generic.ll Reviewers: broune, jholewinski Reviewed By: broune, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10345 llvm-svn: 239435	2015-06-09 21:50:32 +00:00
Reid Kleckner	4083ec51c4	[WinEH] Add 32-bit SEH state table emission prototype This gets all the handler info through to the asm printer and we can look at the .xdata tables now. I've convinced one small catch-all test case to work, but other than that, it would be a stretch to say this is functional. The state numbering algorithm avoids doing any scope reconstruction as we do for C++ to simplify the implementation. llvm-svn: 239433	2015-06-09 21:42:19 +00:00
Chad Rosier	0562bfe49d	[AArch64] Remove an overly conservative check when generating store pairs. Store instructions do not modify register values and therefore it's safe to form a store pair even if the source register has been read in between the two store instructions. Previously, the read of w1 (see below) prevented the formation of a stp. str w0, [x2] ldr w8, [x2, #8] add w0, w8, w1 str w1, [x2, #4] ret We now generate the following code. stp w0, w1, [x2] ldr w8, [x2, #8] add w0, w8, w1 ret All correctness tests with -Ofast on A57 with Spec200x and EEMBC pass. Performance results for SPEC2K were within noise. llvm-svn: 239432	2015-06-09 20:59:41 +00:00
Akira Hatanaka	c42437a4f8	Remove DisableTailCalls from TargetOptions and the code in resetTargetOptions that was resetting it. Remove the uses of DisableTailCalls in subclasses of TargetLowering and use the value of function attribute "disable-tail-calls" instead. Also, unconditionally add pass TailCallElim to the pipeline and check the function attribute at the start of runOnFunction to disable the pass on a per-function basis. This is part of the work to remove TargetMachine::resetTargetOptions, and since DisableTailCalls was the last non-fast-math option that was being reset in that function, we should be able to remove the function entirely after the work to propagate IR-level fast-math flags to DAG nodes is completed. Out-of-tree users should remove the uses of DisableTailCalls and make changes to attach attribute "disable-tail-calls"="true" or "false" to the functions in the IR. rdar://problem/13752163 Differential Revision: http://reviews.llvm.org/D10099 llvm-svn: 239427	2015-06-09 19:07:19 +00:00
Samuel Antao	1e229d55ce	The constant initialization for globals in NVPTX is generated as an array of bytes. The generation of these byte arrays was expecting the host to be little endian, which prevents big endian hosts from being used in the generation of the PTX code. This patch fixes the problem by changing the way the bytes are extracted so that it works for both little and big endian hosts. llvm-svn: 239412	2015-06-09 16:29:34 +00:00
Toma Tabacu	0113773d53	Recommit "[mips] [IAS] Restore STI.FeatureBits in .set pop." (r239144). Specified the llvm namespace for the 2 calls to make_unique() which caused compilation errors in Visual Studio 2013. llvm-svn: 239405	2015-06-09 13:33:26 +00:00
Elena Demikhovsky	b46f844518	X86-MPX: Implemented encoding for MPX instructions. Added encoding tests. llvm-svn: 239403	2015-06-09 13:02:10 +00:00
Aaron Ballman	0856283b6f	Removing spurious semicolons; NFC. llvm-svn: 239399	2015-06-09 12:03:46 +00:00
Toma Tabacu	6c50fe0db3	Revert "[mips] [IAS] Add support for BNE and BEQ with an immediate operand." (r239396). It was breaking buildbots. llvm-svn: 239397	2015-06-09 10:43:49 +00:00
Toma Tabacu	141f41f8f0	[mips] [IAS] Add support for BNE and BEQ with an immediate operand. Summary: For some branches, GAS accepts an immediate instead of the 2nd register operand. We only implement this for BNE and BEQ for now. Other branch instructions can be added later, if needed. Reviewers: dsanders Reviewed By: dsanders Subscribers: seanbruno, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D9666 llvm-svn: 239396	2015-06-09 10:34:31 +00:00
Daniel Sanders	1835a8f945	[nvptx] Only support the 'm' inline assembly memory constraint. NFC. Summary: NVPTX doesn't seem to support any additional constraints. Therefore remove the target hook. No functional change intended. Reviewers: jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8209 llvm-svn: 239395	2015-06-09 10:34:05 +00:00
Matt Arsenault	8d14eecae1	R600: Switch to using generic min / max nodes. llvm-svn: 239377	2015-06-09 00:52:37 +00:00
Matt Arsenault	8c9e05929c	MC: Add target hook to control symbol quoting llvm-svn: 239370	2015-06-09 00:31:39 +00:00
Jingyue Wu	c74965ca17	[NVPTX] run SROA after NVPTXFavorNonGenericAddrSpaces Summary: This cleans up most allocas NVPTXLowerKernelArgs emits for byval parameters. Test Plan: makes bug21465.ll stronger to verify no redundant local load/store. Reviewers: eliben, jholewinski Reviewed By: eliben, jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10322 llvm-svn: 239368	2015-06-09 00:05:56 +00:00
Reid Kleckner	a0a1b056e6	[WinEH] Cache declarations of frame intrinsics llvm-svn: 239361	2015-06-08 22:43:32 +00:00
Reid Kleckner	64ffb91c79	Fix clang-cl self-host -Wc++11-narrowing bug Use unsigned as the underlying storage type of the AMDGPU address space enum. llvm-svn: 239355	2015-06-08 21:57:57 +00:00
Ranjeet Singh	89463edae6	[AArch64] AsmParser should be case insensitive about accepting vector register names. Differential Revision: http://reviews.llvm.org/D10320 llvm-svn: 239353	2015-06-08 21:32:16 +00:00
Keno Fischer	154ce9f3df	[InstrInfo] Refactor foldOperandImpl to thread through InsertPt. NFC Summary: This was a longstanding FIXME and is a necessary precursor to cases where foldOperandImpl may have to create more than one instruction (e.g. to constrain a register class). This is the split out NFC changes from D6262. Reviewers: pete, ributzka, uweigand, mcrosier Reviewed By: mcrosier Subscribers: mcrosier, ted, llvm-commits Differential Revision: http://reviews.llvm.org/D10174 llvm-svn: 239336	2015-06-08 20:09:58 +00:00
Akira Hatanaka	76bd57e472	[ARM] Pass a callback to FunctionPass constructors to enable skipping execution on a per-function basis. Previously some of the passes were conditionally added to ARM's pass pipeline based on the target machine's subtarget. This patch makes changes to add those passes unconditionally and execute them conditionally based on the predicate functor passed to the pass constructors. This enables running different sets of passes for different functions in the module. rdar://problem/20542263 Differential Revision: http://reviews.llvm.org/D8717 llvm-svn: 239325	2015-06-08 18:50:43 +00:00
Pete Cooper	226ceaabfc	Remove includes of MCMachOSymbolFlags.h after it was deleted llvm-svn: 239318	2015-06-08 17:25:57 +00:00
Matthias Braun	2d0e1092b5	X86: Reject register operands with obvious type mismatches. While we have some code to transform specification like {ax} into {eax}/{rax} if the operand type isn't 16bit, we should reject cases where there is no sane way to do this, like the i128 type in the example. Related to rdar://21042280 Differential Revision: http://reviews.llvm.org/D10260 llvm-svn: 239309	2015-06-08 16:56:23 +00:00
Colin LeMahieu	e930f8c3f6	[Hexagon] Adding functionality for searching for compound instruction pairs. Compound instructions reduce slot resource requirements freeing those packet slots up for more instructions. llvm-svn: 239307	2015-06-08 16:34:47 +00:00
Javed Absar	125fd5df95	[ARM]: Add support for MMFR4_EL1 in assembler This patch adds support for system register MMFR4_EL1 (memory model feature register) in the assembler. This register provides information about the implemented memory model and memory management support. llvm-svn: 239302	2015-06-08 15:01:11 +00:00
Igor Breger	545f43a067	AVX-512: Implemented 256/128bit VALIGND/Q instructions for SKX and KNL Implemented DAG lowering for all these forms. Added tests for DAG lowering and encoding. Differential Revision: http://reviews.llvm.org/D10310 llvm-svn: 239300	2015-06-08 14:03:17 +00:00
Simon Pilgrim	9f0e1e3606	[X86] Added BitScanForward/BitScanReverse memory folding + tests llvm-svn: 239257	2015-06-07 18:34:25 +00:00
Rafael Espindola	dc62dbfafc	Handle 16 bit PC relative relocations. Fixes pr23771. llvm-svn: 239214	2015-06-06 02:29:56 +00:00
Peter Collingbourne	3003aae9ae	Revert r238473, "Thumb2: Modify codegen for memcpy intrinsic to prefer LDM/STM." as it caused miscompilations and assertion failures (PR23768, http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150601/280380.html). llvm-svn: 239169	2015-06-05 18:01:28 +00:00
Alexei Starovoitov	457dd6a66f	[bpf] rename triple names bpf_be -> bpfeb llvm-svn: 239162	2015-06-05 16:11:14 +00:00
Colin LeMahieu	b9726af751	[Hexagon] Reapply r239097 with tests corrected for shuffling and duplexing. llvm-svn: 239161	2015-06-05 16:00:11 +00:00
Benjamin Kramer	eaeddf4864	[ARM] Make helper function static. This one had a declaration but it differed from the definition so the declaration was actually dead. llvm-svn: 239157	2015-06-05 14:32:54 +00:00
John Brawn	4ef1f45b4f	[ARM] Add support for -sp- FPUs and FPU none to TargetParser These are added mainly for the benefit of clang, but this also means that they are now allowed in .fpu directives and we emit the correct .fpu directive when single-precision-only is used. Differential Revision: http://reviews.llvm.org/D10238 llvm-svn: 239151	2015-06-05 13:31:19 +00:00
John Brawn	38460915a2	[ARM] Add knowledge of FPU subtarget features to TargetParser Add getFPUFeatures to TargetParser, which gets the list of subtarget features that are enabled/disabled for each FPU, and use it when handling the .fpu directive. No functional change in this commit, though clang will start behaving differently once it starts using this. Differential Revision: http://reviews.llvm.org/D10237 llvm-svn: 239150	2015-06-05 13:29:24 +00:00
Toma Tabacu	097d24151b	Revert "[mips] [IAS] Restore STI.FeatureBits in .set pop." (r239144). This is breaking the Windows buildbots. llvm-svn: 239145	2015-06-05 12:19:27 +00:00
Toma Tabacu	8e641889eb	[mips] [IAS] Restore STI.FeatureBits in .set pop. Summary: Only restoring AvailableFeatures is not enough and will lead to buggy behaviour. For example, if we have a feature enabled and we ".set pop", the next time we try to ".set" that feature nothing will happen because the "!(STI.getFeatureBits()[Feature])" check will be false, because we didn't restore STI.FeatureBits. In order to fix this, we need to make MipsAssemblerOptions remember the STI.FeatureBits instead of the AvailableFeatures and then regenerate AvailableFeatures each time we ".set pop". This is because, AFAIK, there is no way to convert from AvailableFeatures back to STI.FeatureBits, but the reverse is possible by using ComputeAvailableFeatures(STI.FeatureBits). I also moved the updating of AssemblerOptions inside the "if" statement in setFeatureBits() and clearFeatureBits(), as there is no reason to update if nothing changes. Reviewers: dsanders, mkuper Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9156 llvm-svn: 239144	2015-06-05 11:48:54 +00:00
Jim Grosbach	39b6b1defc	MC: Clean up the naming for MCMachObjectWriter. NFC. s/ExecutePostLayoutBinding/executePostLayoutBinding/ s/ComputeSymbolTable/computeSymbolTable/ s/BindIndirectSymbols/bindIndirectSymbols/ s/RecordTLVPRelocation/recordTLVPRelocation/ s/RecordScatteredRelocation/recordScatteredRelocation/ s/WriteLinkerOptionsLoadCommand/writeLinkerOptionsLoadCommand/ s/WriteLinkeditLoadCommand/writeLinkeditLoadCommand/ s/WriteNlist/writeNlist/ s/WriteDysymtabLoadCommand/writeDysymtabLoadCommand/ s/WriteSymtabLoadCommand/writeSymtabLoadCommand/ s/WriteSection/writeSection/ s/WriteSegmentLoadCommand/writeSegmentLoadCommand/ s/WriteHeader/writeHeader/ llvm-svn: 239119	2015-06-04 23:25:54 +00:00
Charles Davis	bd41682a42	[Target/X86] Don't use callee-saved registers in a Win64 tail call on non-Windows. Summary: A small bit that I missed when I updated the X86 backend to account for the Win64 calling convention on non-Windows. Now we don't use dead non-volatile registers when emitting a Win64 indirect tail call on non-Windows. Should fix PR23710. Test Plan: Added test for the correct behavior based on the case I posted to PR23710. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10258 llvm-svn: 239111	2015-06-04 22:50:05 +00:00
Jim Grosbach	e76e79548b	MC: Clean up naming in MCObjectWriter. NFC. s/WriteObject/writeObject/ s/RecordRelocation/recordRelocation/ s/IsSymbolRefDifferenceFullyResolved/isSymbolRefDifferenceFullyResolved/ s/Write8/write8/ s/WriteLE16/writeLE16/ s/WriteLE32/writeLE32/ s/WriteLE64/writeLE64/ s/WriteBE16/writeBE16/ s/WriteBE32/writeBE32/ s/WriteBE64/writeBE64/ s/Write16/write16/ s/Write32/write32/ s/Write64/write64/ s/WriteZeroes/writeZeroes/ s/WriteBytes/writeBytes/ llvm-svn: 239108	2015-06-04 22:24:41 +00:00
Colin LeMahieu	b1cfdef068	Revert r239095 incorrect test tree. llvm-svn: 239102	2015-06-04 21:32:42 +00:00
Jingyue Wu	efaa872a9c	[NVPTX] roll forward r239082 NVPTXISelDAGToDAG translates "addrspacecast to param" to NVPTX::nvvm_ptr_gen_to_param Added an llc test in bug21465. llvm-svn: 239100	2015-06-04 21:28:26 +00:00
Colin LeMahieu	a8ad0ad05b	[Hexagon] Removing unused variable. llvm-svn: 239097	2015-06-04 21:22:12 +00:00
Colin LeMahieu	cdf9553102	[Hexagon] Adding functionality for duplexing. Duplexing is a way to compress commonly used pairs of instructions in order to reduce code size. The test case duplex.ll normally would be 8 bytes, assign register to 0 and jump to link register. After duplexing this is only 4 bytes. This also tests the HexagonMCShuffler code path which is used to make sure duplexed instructions still follow slot requirements. llvm-svn: 239095	2015-06-04 21:16:16 +00:00
Jingyue Wu	aaaab32848	Revert r239082 llc crashed for NVPTX backend llvm-svn: 239094	2015-06-04 21:07:08 +00:00
Ahmed Bougacha	59fec45c7d	[GlobalMerge] Take into account minsize on Global users' parents. Now that we can look at users, we can trivially do this: when we would have otherwise disabled GlobalMerge (currently -O<3), we can just run it for minsize functions, as it's usually a codesize win. Differential Revision: http://reviews.llvm.org/D10054 llvm-svn: 239087	2015-06-04 20:39:23 +00:00
Jim Grosbach	3a8310cc67	MC: Remove obsolete MachO UseAggressiveSymbolFolding. Fix the FIXME and remove this old as(1) compat option. It was useful for bringup of the integrated assembler to diff object files, but now it's just causing more relocations than strictly necessary to be generated. rdar://21201804 llvm-svn: 239084	2015-06-04 20:27:42 +00:00
Jingyue Wu	98b56fca22	[NVPTX] kernel pointer arguments point to the global address space Summary: With this patch, NVPTXLowerKernelArgs converts a kernel pointer argument to a pointer in the global address space. This change, along with NVPTXFavorNonGenericAddrSpaces, allows the NVPTX backend to emit ld.global.* and st.global.* for accessing kernel pointer arguments. Minor changes: 1. refactor: extract function convertToPointerInAddrSpace 2. fix a bug in the test case in bug21465.ll Test Plan: lower-kernel-ptr-arg.ll Reviewers: eliben, meheff, jholewinski Reviewed By: jholewinski Subscribers: wengxt, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D10154 llvm-svn: 239082	2015-06-04 20:19:38 +00:00
Alexei Starovoitov	8cc8aba19c	[bpf] add big- and host- endian support Summary: -march=bpf -> host endian -march=bpf_le -> little endian -march=bpf_be -> big endian Test Plan: v1 was tested by IBM s390 guys and appears to be working there. It bit rots too fast here. Reviewers: chandlerc, tstellarAMD Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D10177 llvm-svn: 239071	2015-06-04 19:15:05 +00:00
Matt Arsenault	50655128a5	R600/SI: Reimplement isLegalAddressingMode Now that we sometimes know the address space, this can theoretically do a better job. This needs better test coverage, but this mostly depends on first updating the loop optimizations to provide the address space. llvm-svn: 239053	2015-06-04 16:17:42 +00:00
Matt Arsenault	b80f84293e	R600/SI: Fix some cases for load / store of half Mostly argument loads were producing broken zextloads from an FP type. llvm-svn: 239049	2015-06-04 16:00:27 +00:00
Benjamin Kramer	55692fb42d	Replace custom fixed endian to raw_ostream emission with EndianStream. Less code, clearer and more efficient. No functionality change intended. llvm-svn: 239040	2015-06-04 15:03:02 +00:00
Daniel Sanders	06c811431c	Replace string GNU Triples with llvm::Triple in MCAsmInfo subclasses and create*AsmInfo(). NFC. Summary: This is the first of several patches to eliminate StringRef forms of GNU triples from the internals of LLVM. After this is complete, GNU triples will be replaced by a more authoritative representation in the form of an LLVM TargetTuple. Reviewers: rengolin Reviewed By: rengolin Subscribers: ted, llvm-commits, rengolin, jholewinski Differential Revision: http://reviews.llvm.org/D10236 llvm-svn: 239036	2015-06-04 13:12:25 +00:00
Elena Demikhovsky	682c692fe7	AVX-512: I brought back vector-shuffle-512-v8.ll test. I re-generated it after all AVX-512 shuffle optimizations. llvm-svn: 239026	2015-06-04 07:49:56 +00:00
Elena Demikhovsky	2b7fd2c6ef	AVX-512: added all SKX forms of VPERMW/D/Q instructions. Added all forms of VPERMPS/PD instructions. Added encoding tests. llvm-svn: 239016	2015-06-04 07:07:13 +00:00
Elena Demikhovsky	51b2050982	Removed {}, NFC. llvm-svn: 239014	2015-06-04 07:01:29 +00:00
Rafael Espindola	58fdd31cd1	Bring back r239006 with a fix. The fix is just that getOther had not been updated for packing the st_other values in fewer bits and could return spurious values: - unsigned Other = (getFlags() & (0x3f << ELF_STO_Shift)) >> ELF_STO_Shift; + unsigned Other = (getFlags() & (0x7 << ELF_STO_Shift)) >> ELF_STO_Shift; Original message: Pack the MCSymbolELF bit fields into MCSymbol's Flags. This reduces MCSymbolELF from 64 bytes to 56 bytes on x86_64. While at it, also make getOther/setOther easier to use by accepting unshifted STO_* values. llvm-svn: 239012	2015-06-04 05:59:23 +00:00
Rafael Espindola	ab39f50035	Revert "Pack the MCSymbolELF bit fields into MCSymbol's Flags." This reverts commit r239006. I am debugging the powerpc failures. llvm-svn: 239010	2015-06-04 05:00:12 +00:00
Rafael Espindola	e2a1190807	Pack the MCSymbolELF bit fields into MCSymbol's Flags. This reduces MCSymbolELF from 64 bytes to 56 bytes on x86_64. While at it, also make getOther/setOther easier to use by accepting unshifted STO_* values. llvm-svn: 239006	2015-06-04 02:32:20 +00:00
Sanjay Patel	d41c35dc2f	make reciprocal estimate code generation more flexible by adding command-line options (3rd try) The first try (r238051) to land this was reverted due to ExecutionEngine build failure; that was hopefully addressed by r238788. The second try (r238842) to land this was reverted due to BUILD_SHARED_LIBS failure; that was hopefully addressed by r238953. This patch adds a TargetRecip class for processing many recip codegen possibilities. The class is intended to handle both command-line options to llc as well as options passed in from a front-end such as clang with the -mrecip option. The x86 backend is updated to use the new functionality. Only -mcpu=btver2 with -ffast-math should see a functional change from this patch. All other x86 CPUs continue to not use reciprocal estimates by default with -ffast-math. Differential Revision: http://reviews.llvm.org/D8982 llvm-svn: 239001	2015-06-04 01:32:35 +00:00
Tom Stellard	ff4a31fb85	R600: Re-enable sub-reg liveness The bug in the R600 backend that this uncovered has been fixed. llvm-svn: 238999	2015-06-04 01:20:04 +00:00
Rafael Espindola	f7db6d4a8a	Remove MCELFSymbolFlags.h. It is now internal to MCSymbolELF. llvm-svn: 238996	2015-06-04 00:47:43 +00:00
Rafael Espindola	5d3bece91c	Remove getOrCreateSymbolData. There is no MCSymbolData anymore. llvm-svn: 238952	2015-06-03 19:03:11 +00:00
Colin LeMahieu	9a3629f84a	[Hexagon] Test doesn't work on all platforms. At any rate the uninitialized variable issue was fixed. Removing re-registering ASM backend. llvm-svn: 238949	2015-06-03 18:00:45 +00:00
Colin LeMahieu	b480f20398	[Hexagon] Reapply 238772 OSABI was not correctly set, added empty_elf test to make sure it is. llvm-svn: 238947	2015-06-03 17:34:16 +00:00
Matthias Braun	80e08ad23e	ARM: Thumb2 LDRD/STRD supports independent input/output regs The existing code would unnecessarily break LDRD/STRD apart with non-adjacent registers, on thumb2 this is not necessary. Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore as there is no reason to set register hints anymore, changing that is something for a future patch however. Differential Revision: http://reviews.llvm.org/D9694 Recommitting after the revert in r238821, the buildbot still failed with the patch removed so there seems to be another reason for the breakage. llvm-svn: 238935	2015-06-03 16:30:24 +00:00
Daniel Sanders	88d2b6b8ee	[arm] Fix r238921. We must handle Constraint_i too. llvm-svn: 238925	2015-06-03 14:17:18 +00:00
Asaf Badouh	08f13fa0ba	re-apply 238809 AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. CR: http://reviews.llvm.org/D9991 llvm-svn: 238923	2015-06-03 13:41:48 +00:00
Daniel Sanders	d73d802d93	[arm] Distinguish the /U[qytnms]/, 'Uv', 'Q', and 'm' inline assembly memory constraints. Summary: But still handle them the same way since I don't know how they differ on this target. Of these, /U[qytnms]/ do not have backend tests but are accepted by clang. No functional change intended. Reviewers: t.p.northover Reviewed By: t.p.northover Subscribers: t.p.northover, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D8203 llvm-svn: 238921	2015-06-03 12:33:56 +00:00
Elena Demikhovsky	e6e69ab1a5	AVX-512: More code improvements in shuffles, NFC llvm-svn: 238919	2015-06-03 12:05:03 +00:00
Elena Demikhovsky	179cf4948d	AVX-512: VSHUFPD instruction selection - code improvements llvm-svn: 238918	2015-06-03 11:21:01 +00:00
Elena Demikhovsky	13b85a4aa6	AVX-512: Implemented SHUFF32x4/SHUFF64x2/SHUFI32x4/SHUFI64x2 instructions for SKX and KNL. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238917	2015-06-03 10:56:40 +00:00
Elena Demikhovsky	a1ed8ca184	X86: Added MPX feature and bound registers. Intel® Memory Protection Extensions (Intel® MPX) is a new feature in Skylake. It is a part of KNL and SKX sets. It is also a part of Skylake client. I added definition of %bnd0 - %bnd3 registers, each register is a pair of 64-bit integers. llvm-svn: 238916	2015-06-03 10:30:57 +00:00
Simon Pilgrim	5a028d9698	[X86] Removed (unused) FSRL x86 operation This patch removes the old X86ISD::FSRL op - which allowed float vectors to use the byte right shift operations (causing a domain switch....). Since the refactoring of the shuffle lowering code this no longer has any use. Differential Revision: http://reviews.llvm.org/D10169 llvm-svn: 238906	2015-06-03 08:32:36 +00:00
Rafael Espindola	81fe124a03	Revert "make reciprocal estimate code generation more flexible by adding command-line options (2nd try)" This reverts commit r238842. It broke -DBUILD_SHARED_LIBS=ON build. llvm-svn: 238900	2015-06-03 05:32:44 +00:00
Rafael Espindola	672d8ae68d	Avoid a call to getOrCreateSymbol when we already have the symbol. llvm-svn: 238890	2015-06-03 00:02:40 +00:00
Rafael Espindola	2e71400006	Pass a MCSymbolELF to a few ELF only functions. NFC. llvm-svn: 238868	2015-06-02 21:30:13 +00:00
Rafael Espindola	1fc8198d33	Merge MCELF.h into MCSymbolELF.h. Now that we have a dedicated type for ELF symbol, these helper functions can become member function of MCSymbolELF. llvm-svn: 238864	2015-06-02 20:38:46 +00:00
Tim Northover	0f51120873	AArch64: fix typo in SMIN far atomics and add tests llvm-svn: 238858	2015-06-02 18:37:20 +00:00
Benjamin Kramer	9139d0a501	Push constness through LoopInfo::isLoopHeader and clean it up a bit. NFC. llvm-svn: 238843	2015-06-02 15:28:27 +00:00
Sanjay Patel	006dd407ed	make reciprocal estimate code generation more flexible by adding command-line options (2nd try) The first try (r238051) to land this was reverted due to bot failures that were hopefully addressed by r238788. This patch adds a TargetRecip class for processing many recip codegen possibilities. The class is intended to handle both command-line options to llc as well as options passed in from a front-end such as clang with the -mrecip option. The x86 backend is updated to use the new functionality. Only -mcpu=btver2 with -ffast-math should see a functional change from this patch. All other x86 CPUs continue to not use reciprocal estimates by default with -ffast-math. Differential Revision: http://reviews.llvm.org/D8982 llvm-svn: 238842	2015-06-02 15:28:15 +00:00
Elena Demikhovsky	c14282d277	AVX-512: Implemented VRANGESD and VRANGESS instructions for SKX Implemented DAG lowering for all these forms. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238834	2015-06-02 14:12:54 +00:00
Elena Demikhovsky	d91fbd97b2	AVX-512: Shorten implementation of lowerV16X32VectorShuffle() using lowerVectorShuffleWithSHUFPS() and other shuffle-helpers routines. Added matching of VALIGN instruction. llvm-svn: 238830	2015-06-02 13:43:18 +00:00
Vasileios Kalintiris	ab07da4ca9	[mips] Add support for dynamic stack realignment. Summary: With this change we are able to realign the stack dynamically, whenever it contains objects with alignment requirements that are larger than the alignment specified from the given ABI. We have to use the $fp register as the frame pointer when we perform dynamic stack realignment. In complex stack frames, with variably-sized objects, we reserve additionally the callee-saved register $s7 as the base pointer in order to reference locals. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8633 llvm-svn: 238829	2015-06-02 13:14:46 +00:00
Renato Golin	ff58af5431	Revert "ARM: Thumb2 LDRD/STRD supports independent input/output regs" This reverts commit r238795, as it broke the Thumb2 self-hosting buildbot. Since self-hosting issues with Clang are hard to investigate, I'm taking the liberty to revert now, so we can investigate it offline. llvm-svn: 238821	2015-06-02 11:47:30 +00:00
Vladimir Sukharev	42d76a2a46	[AArch64] Add v8.1a atomic instructions Patch by: Tom Coxon Reviewers: t.p.northover Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8501 llvm-svn: 238818	2015-06-02 10:58:41 +00:00
Toma Tabacu	7a6df5b9ac	[mips] [IAS] Add support for the .set softfloat/hardfloat directives. Summary: These directives are used to set the current value of the SoftFloat feature. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits, mpf Differential Revision: http://reviews.llvm.org/D9074 llvm-svn: 238813	2015-06-02 09:48:04 +00:00
Elena Demikhovsky	9402ebb636	AVX-512: Implemented VFIXUPIMMSD and VFIXUPIMMSS instructions for KNL Implemented DAG lowering for all these forms. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238811	2015-06-02 08:28:57 +00:00
Asaf Badouh	f8387bd5f5	revert 238809 llvm-svn: 238810	2015-06-02 07:45:19 +00:00
Asaf Badouh	9a55f1d0aa	AVX-512: Implemented GETEXP instruction for KNL and SKX Added rounding mode modifier for SQRTPS/PD Added tests for encoding and intrinsics. llvm-svn: 238809	2015-06-02 07:18:14 +00:00
Rafael Espindola	f9aa800569	Create a MCSymbolELF. This creates a MCSymbolELF class and moves SymbolSize since only ELF needs a size expression. This reduces the size of MCSymbol from 56 to 48 bytes. llvm-svn: 238801	2015-06-02 00:25:12 +00:00
Matthias Braun	7f4928846c	ARM: Thumb2 LDRD/STRD supports independent input/output regs The existing code would unnecessarily break LDRD/STRD apart with non-adjacent registers, on thumb2 this is not necessary. Ideally on thumb2 we shouldn't match for ldrd/strd pre-regalloc anymore as there is no reason to set register hints anymore, changing that is something for a future patch however. Differential Revision: http://reviews.llvm.org/D9694 llvm-svn: 238795	2015-06-01 23:27:08 +00:00
Matthias Braun	0c511ee9db	AArch64: Use CMP;CCMP sequences for and/or/setcc trees. Previously CCMP/FCCMP instructions were only used by the AArch64ConditionalCompares pass for control flow. This patch uses them for SELECT like instructions as well by matching patterns in ISelLowering. PR20927, rdar://18326194 Differential Revision: http://reviews.llvm.org/D8232 llvm-svn: 238793	2015-06-01 22:31:17 +00:00
Alexei Starovoitov	fc64519bf8	[bpf] fix build fix breakage due to r238634 Patch by Vijay Subramanian. llvm-svn: 238792	2015-06-01 22:24:36 +00:00
Matt Arsenault	1306de8969	R600/SI: Don't hardcode pointer type llvm-svn: 238789	2015-06-01 21:58:24 +00:00
Matthias Braun	4db11611d9	ARMLoadStoreOptimizer: Fix doxygen comments; NFC llvm-svn: 238784	2015-06-01 21:26:23 +00:00
Rafael Espindola	db58275a9f	Revert "[Hexagon] Adding basic ELF relocation generation and testing advanced relaxation codepath." This reverts commit r238748. It broke the msan bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/4372/steps/check-llvm%20msan/logs/stdio llvm-svn: 238772	2015-06-01 19:20:47 +00:00
Vasileios Kalintiris	342dea971f	[mips][FastISel] Implement bswap. Summary: Implement bswap intrinsic for MIPS FastISel. It's very different for mips32 r1/r2 . Based on a patch by Reed Kotler. Test Plan: bswap1.ll test-suite Reviewers: dsanders, rkotler Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D7219 llvm-svn: 238760	2015-06-01 16:40:45 +00:00
Vasileios Kalintiris	62afb8d1e4	[mips][FastISel] Implement intrinsics memset, memcpy & memmove. Summary: Implement the intrinsics memset, memcpy and memmove in MIPS FastISel. Make some needed infrastructure fixes so that this can work. Based on a patch by Reed Kotler. Test Plan: memtest1.ll The patch passes test-suite for mips32 r1/r2 and at O0/O2 Reviewers: rkotler, dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D7158 llvm-svn: 238759	2015-06-01 16:36:01 +00:00
Vasileios Kalintiris	a40244af91	[mips][FastISel] Implement srem/urem and sdiv/udiv instructions. Summary: Implement the LLVM assembly urem/srem and sdiv/udiv instructions in MIPS FastISel. Based on a patch by Reed Kotler. Test Plan: srem1.ll div1.ll test-suite at O0/O2 for mips32 r1/r2 Reviewers: dsanders, rkotler Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D7028 llvm-svn: 238757	2015-06-01 16:17:37 +00:00
Vasileios Kalintiris	22a5251930	[mips][FastISel] Implement the select statement for MIPS FastISel. Summary: Implement the LLVM IR select statement for MIPS FastISel. Based on a patch by Reed Kotler. Test Plan: "Make check" test included now. Passes test-suite at O2/O0 mips32 r1/r2. Reviewers: dsanders, rkotler Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D6774 llvm-svn: 238756	2015-06-01 15:56:40 +00:00
Vasileios Kalintiris	b637264ad2	[mips][FastISel] Clobber HI0/LO0 registers in MUL instructions. Summary: The contents of the HI/LO registers are unpredictable after the execution of the MUL instruction. In addition to implicitly defining these registers in the MUL instruction definition, we have to mark those registers as dead too. Without this the fast register allocator is running out of registers when the MUL instruction is followed by another one that tries to allocate the AC0 register. Based on a patch by Reed Kotler. Reviewers: dsanders, rkotler Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D9825 llvm-svn: 238755	2015-06-01 15:48:09 +00:00
Rafael Espindola	2945c6afc6	Fix relocation selection for foo-. on mips. This handles only the 32 bit case. llvm-svn: 238751	2015-06-01 15:10:51 +00:00
Rafael Espindola	90c2b96b93	Simplify code, NFC. llvm-svn: 238750	2015-06-01 14:58:29 +00:00
Colin LeMahieu	027bd2baa3	[Hexagon] Adding basic ELF relocation generation and testing advanced relaxation codepath. llvm-svn: 238748	2015-06-01 14:51:26 +00:00
Elena Demikhovsky	ba8112a901	AVX-512: Optimized vector shuffle for v16f32 and v16i32 types. llvm-svn: 238743	2015-06-01 13:26:18 +00:00
Luke Cheeseman	4fff08f837	Re-commit of r238201 with fix for building with shared libraries. llvm-svn: 238739	2015-06-01 12:02:47 +00:00
Elena Demikhovsky	9e9a44e5bd	AVX-512: Implemented VRANGEPD and VRANGEPS instructions for SKX. Implemented DAG lowering for all these forms. Added tests for encoding. By Igor Breger (igor.breger@intel.com) llvm-svn: 238738	2015-06-01 11:05:34 +00:00
Elena Demikhovsky	0afba4941b	AVX-512: Implemented vector shuffle lowering for v8i64 and v8f64 types. I removed the vector-shuffle-512-v8.ll, it is an auto-generated test, not valid any more. llvm-svn: 238735	2015-06-01 09:49:53 +00:00
Elena Demikhovsky	12406985ca	AVX-512: added all forms of VPSHUFD and VPSHUFHW, VPSHUFLW including encodings. llvm-svn: 238729	2015-06-01 07:17:23 +00:00
Elena Demikhovsky	9db95755e6	AVX-512: Implemented VFIXUPIMMPD and VFIXUPIMMPS instructions for KNL and SKX Implemented DAG lowering for all these forms. Added tests for encoding. by Igor Breger (igor.breger@intel.com) llvm-svn: 238728	2015-06-01 06:50:49 +00:00
Elena Demikhovsky	bca790accb	AVX-512: Fixed a bug in compress and expand intrinsics. By Igor Breger (igor.breger@intel.com) llvm-svn: 238724	2015-06-01 06:30:13 +00:00
Matt Arsenault	b0334192af	Add address space argument to isLegalAddressingMode This is important because of different addressing modes depending on the address space for GPU targets. This only adds the argument, and does not update any of the uses to provide the correct address space. llvm-svn: 238723	2015-06-01 05:31:59 +00:00
Rafael Espindola	5d79b3bd90	Simplify another function that doesn't fail. llvm-svn: 238703	2015-06-01 00:27:26 +00:00
NAKAMURA Takumi	1533760873	ARMConstantIslandPass.cpp: Prune an empty \brief. [-Wdocumentation] llvm-svn: 238697	2015-05-31 23:05:35 +00:00
Colin LeMahieu	1e71c87eeb	[Hexagon] Including raw_ostream for debug builds. llvm-svn: 238695	2015-05-31 22:29:33 +00:00
Colin LeMahieu	f339dd00b7	[Hexagon] classes are actually structs. llvm-svn: 238694	2015-05-31 22:18:42 +00:00
Colin LeMahieu	fd4f2786fc	[Hexagon] Adding MC packet shuffler. llvm-svn: 238692	2015-05-31 21:57:09 +00:00
Tim Northover	0eb976c493	ARM: recommit r237590: allow jump tables to be placed as constant islands. The original version didn't properly account for the base register being modified before the final jump, so caused miscompilations in Chromium and LLVM. I've fixed this and tested with an LLVM self-host (I don't have the means to build & test Chromium). The general idea remains the same: in pathological cases jump tables can be too far away from the instructions referencing them (like other constants) so they need to be movable. Should fix PR23627. llvm-svn: 238680	2015-05-31 19:22:07 +00:00
Colin LeMahieu	60beb2e4b6	[Hexagon] Adding override specifier and removing erroneous assertion llvm-svn: 238664	2015-05-30 20:03:07 +00:00
Colin LeMahieu	c59d7c9784	[Hexagon] Adding basic relaxation functionality. llvm-svn: 238660	2015-05-30 18:55:47 +00:00
Simon Pilgrim	280d31052e	Stripped trailing whitespace. NFC. llvm-svn: 238654	2015-05-30 13:01:42 +00:00
Renato Golin	068c4bcd5c	Comment change. NFC That comment misleads the current discussions in mentioned bug. Leave the discussions to the bug. Also, adding a future change FIXME. llvm-svn: 238653	2015-05-30 10:44:07 +00:00
Chandler Carruth	e918d09d5b	[x86] Unify the horizontal adding used for popcount lowering taking the best approach of each. For vNi16, we use SHL + ADD + SRL pattern that seem easily the best. For vNi32, we use the PUNPCK + PSADBW + PACKUSWB pattern. In some cases there is a huge improvement with this in IACA's estimated throughput -- over 2x higher throughput!!!! -- but the measurements are too good to be true. In one narrow case, the SHL + ADD + SHL + ADD + SRL pattern looks slightly faster, but I'm not sure I believe any of the measurements at this point. Both are the exact same uops though. Hard to be confident of anything past that. If anyone wants to collect very detailed (Agner-level) timings with the result of this patch, or with the i32 case replaced with SHL + ADD + SHL + ADD + SRL, I'd be very interested. Note that you'll need to test it on both Ivybridge and Haswell, with both SSE3, SSSE3, and AVX selected as I saw unique behavior in each of these buckets with IACA all of which should be checked against measured performance. But this patch is still a useful improvement by dropping duplicate work and getting the much nicer PSADBW lowering for v2i64. I'd still like to rephrase this in terms of generic horizontal sum. It's a bit lame to have a special case of that just for popcount. llvm-svn: 238652	2015-05-30 10:35:03 +00:00
Renato Golin	c0a1933df0	[ARMTargetParser] Move IAS arch ext parser. NFC The plan was to move the whole table into the already existing ArchExtNames but some fields depend on a table-generated file, and we don't yet have this feature in the generic lib/Support side. Until the minimum target-specific table-generated files are available in a generic fashion to these libraries, we'll have to keep it in the ASM parser. llvm-svn: 238651	2015-05-30 10:30:02 +00:00
Chandler Carruth	5708da7c46	[x86] Split out the horizontal byte sum lowering component of the LUT lowering into a helper function. NFC. llvm-svn: 238650	2015-05-30 09:46:16 +00:00
Chandler Carruth	230340df2b	[x86] Replace the long spelling of getting a bitcast with the much shorter one. NFC. In addition to being much shorter to type and requiring fewer arguments, this change saves over 30 lines from this one file, all wasted on total boilerplate... llvm-svn: 238640	2015-05-30 04:23:13 +00:00
Chandler Carruth	2cc1323bd7	[x86] Replace the long spelling of getting a bitcast with the new short spelling. NFC. llvm-svn: 238639	2015-05-30 04:19:57 +00:00
Chandler Carruth	6f372f0515	[sdag] Add the helper I most want to the DAG -- building a bitcast around a value using its existing SDLoc. Start using this in just one function to save omg lines of code. llvm-svn: 238638	2015-05-30 04:14:10 +00:00
Chandler Carruth	0c88847b8a	[x86] Restore the bitcasts I removed when refactoring this to avoid shifting vectors of bytes as x86 doesn't have direct support for that. This removes a bunch of redundant masking in the generated code for SSE2 and SSE3. In order to avoid the really significant code size growth this would have triggered, I also factored the completely repetitive logic for shifting and masking into two lambdas which in turn makes all of this much easier to read IMO. llvm-svn: 238637	2015-05-30 04:05:11 +00:00
Chandler Carruth	11c24e4998	[x86] Implement a faster vector population count based on the PSHUFB in-register LUT technique. Summary: A description of this technique can be found here: http://wm.ite.pl/articles/sse-popcount.html The core of the idea is to use an in-register lookup table and the PSHUFB instruction to compute the population count for the low and high nibbles of each byte, and then to use horizontal sums to aggregate these into vector population counts with wider element types. On x86 there is an instruction that will directly compute the horizontal sum for the low 8 and high 8 bytes, giving vNi64 popcount very easily. Various tricks are used to get vNi32 and vNi16 from the vNi8 that the LUT computes. The base implementation of this, and most of the work, was done by Bruno in a follow up to D6531. See Bruno's detailed post there for lots of timing information about these changes. I have extended Bruno's patch in the following ways: 0) I committed the new tests with baseline sequences so this shows a diff, and regenerated the tests using the update scripts. 1) Bruno had noticed and mentioned in IRC a redundant mask that I removed. 2) I introduced a particular optimization for the i32 vector cases where we use PSHL + PSADBW to compute the low i32 popcounts, and PSHUFD + PSADBW to compute doubled high i32 popcounts. This takes advantage of the fact that to line up the high i32 popcounts we have to shift them anyways, and we can shift them by one fewer bit to effectively divide the count by two. While the PSHUFD based horizontal add is no faster, it doesn't require registers or load traffic the way a mask would, and provides more ILP as it happens on different ports with high throughput. 3) I did some code cleanups throughout to simplify the implementation logic.
4) I refactored it to continue to use the parallel bitmath lowering when SSSE3 is not available to preserve the performance of that version on SSE2 targets where it is still much better than scalarizing as we'll still do a bitmath implementation of popcount even in scalar code there. With #1 and #2 above, I analyzed the result in IACA for sandybridge, ivybridge, and haswell. In every case I measured, the throughput is the same or better using the LUT lowering, even v2i64 and v4i64, and even compared with using the native popcnt instruction! The latency of the LUT lowering is often higher than the latency of the scalarized popcnt instruction sequence, but I think those latency measurements are deeply misleading. Keeping the operation fully in the vector unit and having many chances for increased throughput seems much more likely to win. With this, we can lower every integer vector popcount implementation using the LUT strategy if we have SSSE3 or better (and thus have PSHUFB). I've updated the operation lowering to reflect this. This also fixes an issue where we were scalarizing horribly some AVX lowerings. Finally, there are some remaining cleanups. There is duplication between the two techniques in how they perform the horizontal sum once the byte population count is computed. I'm going to factor and merge those two in a separate follow-up commit. Differential Revision: http://reviews.llvm.org/D10084 llvm-svn: 238636	2015-05-30 03:20:59 +00:00
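The nibble-LUT idea described in the commit above can be illustrated with a portable scalar model. This is only a sketch, not the code LLVM emits: the `Bytes16` type and the function names are hypothetical, the loop over bytes stands in for a single PSHUFB-based lookup, and the two partial sums stand in for the horizontal sum over the low 8 and high 8 bytes that the message mentions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a 128-bit vector of 16 bytes.
using Bytes16 = std::array<std::uint8_t, 16>;

// Popcount of every 4-bit value. In the vector version this table lives in a
// register and is indexed by PSHUFB.
constexpr Bytes16 kNibblePopcount = {0, 1, 1, 2, 1, 2, 2, 3,
                                     1, 2, 2, 3, 2, 3, 3, 4};

// Per-byte popcount: look up the low nibble and the high nibble, then add.
Bytes16 popcountBytes(const Bytes16 &v) {
  Bytes16 out{};
  for (std::size_t i = 0; i < v.size(); ++i) {
    out[i] = static_cast<std::uint8_t>(kNibblePopcount[v[i] & 0x0F] +
                                       kNibblePopcount[v[i] >> 4]);
  }
  return out;
}

// Horizontal sum of the low 8 and the high 8 byte counts, giving one popcount
// per 64-bit element (the v2i64 case).
std::array<std::uint64_t, 2> popcountV2I64(const Bytes16 &v) {
  const Bytes16 counts = popcountBytes(v);
  std::array<std::uint64_t, 2> sums{0, 0};
  for (std::size_t i = 0; i < counts.size(); ++i)
    sums[i / 8] += counts[i];
  return sums;
}
```

Calling `popcountV2I64` on a 16-byte value returns the bit count of its low and high 64-bit halves; the narrower element types discussed in the message would need a different final aggregation step.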
Chandler Carruth	7c7805b691	[x86] Restructure the parallel bitmath lowering of popcount into a separate routine, generalize it to work for all the integer vector sizes, and do general code cleanups. This dramatically improves lowerings of byte and short element vector popcount, but more importantly it will make the introduction of the LUT-approach much cleaner. The biggest cleanup I've done is to just force the legalizer to do the bitcasting we need. We run these iteratively now and it makes the code much simpler IMO. Other changes were minor, and mostly naming and splitting things up in a way that makes it more clear what is going on. The other significant change is to use a different final horizontal sum approach. This is the same number of instructions as the old method, but shifts left instead of right so that we can clear everything but the final sum with a single shift right. This seems likely better than a mask which will usually have to read the mask from memory. It is certainly fewer u-ops. Also, this will be temporary. This and the LUT approach share the need of horizontal adds to finish the computation, and we have more clever approaches than this one that I'll switch over to. llvm-svn: 238635	2015-05-30 03:20:55 +00:00
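The parallel bitmath approach and the shift-left horizontal sum described in the commit above can be sketched on a single 32-bit lane. This is an illustrative scalar version under assumed names (`popcount32` is hypothetical), using the well-known mask-and-add bit-counting steps; it is not the vector lowering itself.

```cpp
#include <cstdint>

// Scalar sketch of parallel bitmath popcount on one 32-bit lane.
std::uint32_t popcount32(std::uint32_t x) {
  // Count bits within each 2-bit, then 4-bit, then 8-bit field.
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;  // each byte now holds its own bit count

  // Horizontal sum by shifting left: the running total accumulates in the top
  // byte, so one final shift right clears everything else and no mask is
  // needed. The total is at most 32, so no byte ever overflows.
  x += x << 8;
  x += x << 16;
  return x >> 24;
}
```

For a 16-bit lane the same idea needs only one shift left, one add, and one shift right.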
Jim Grosbach	30efd68a58	MC: Clean up MCExpr naming. NFC. llvm-svn: 238634	2015-05-30 01:25:56 +00:00
Reid Kleckner	2f05d5a280	[WinEH] Adjust the 32-bit SEH prologue to better match reality It turns out that _except_handler3 and _except_handler4 really use the same stack allocation layout, at least today. They just make different choices about encoding the LSDA. This is in preparation for lowering the llvm.eh.exceptioninfo(). llvm-svn: 238627	2015-05-29 22:57:46 +00:00


33484 Commits