llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 11:42:57 +01:00

Author	SHA1	Message	Date
Teresa Johnson	8827aa8f71	[ThinLTO] Fix ThinLTO crash Summary: Follow up to fix in r311023, which fixed the case where the combined index is written to disk. The same samplePGO logic exists for the in-memory index when computing imports, so we need to filter out GlobalVariable summaries there too. Reviewers: davidxl Subscribers: inglorion, llvm-commits Differential Revision: https://reviews.llvm.org/D36919 llvm-svn: 311254	2017-08-19 18:04:25 +00:00
Craig Topper	5b3f3dc678	[X86] Remove an unnecessary alignment restriction from MOVDDUP pattern. The SSE MOVDDUP instruction only loads 64-bits with no alignment restriction. llvm-svn: 311253	2017-08-19 18:02:28 +00:00
Jatin Bhateja	7fb455e0e4	Revert rL311247 : To rectify commit message. Summary: This reverts commit rL311247. Differential Revision: https://reviews.llvm.org/D36927 llvm-svn: 311252	2017-08-19 17:59:58 +00:00
Jatin Bhateja	b676533174	Merge branch 'arcpatch-D35788' llvm-svn: 311247	2017-08-19 17:00:04 +00:00
Jatin Bhateja	192c957069	Revert rL311242 "Extension of shuffle vector pattern detection, updating post rebase." Summary: This reverts commit rL311242. Differential Revision: https://reviews.llvm.org/D36924 llvm-svn: 311246	2017-08-19 16:40:06 +00:00
Jatin Bhateja	d11726f0d1	Extension of shuffle vector pattern detection, updating post rebase. llvm-svn: 311242	2017-08-19 15:58:36 +00:00
Victor Leschuk	307153d235	revert failing test llvm-svn: 311238	2017-08-19 12:24:41 +00:00
Victor Leschuk	c7a88ed8b2	Add temporary test to verify that win10 builder hangs on error llvm-svn: 311236	2017-08-19 12:02:39 +00:00
Victor Leschuk	2689f0c407	Temporarily mark lit :: shtest-format as unsupported on Windows. When run manually it fails, but when run under buildbot it causes a hang. llvm-svn: 311230	2017-08-19 07:58:07 +00:00
Chandler Carruth	9d308b85ac	[Inliner] Fix a nasty bug when inlining a non-recursive trace of a function into itself. We tried to fix this before in r306495 but that got reverted as the assert was actually hit. This fixes the original bug (which we seem to have lost track of with the revert) by blocking a second remapping when the function being inlined is also the caller and the remapping could succeed but erroneously. The included test case would actually load from an inlined copy of the alloca before this change, failing to load the stored value and miscompiling. Many thanks to Richard Smith for diagnosing a user miscompile to this bug, and to Kyle for the first attempt and initial analysis and David Li for remembering the issue and how to fix it and suggesting the patch. I'm just stitching it together and landing it. =] llvm-svn: 311229	2017-08-19 06:56:11 +00:00
Chandler Carruth	15b31f053e	[Inliner] Clean up a test case a bit to make it more clear what is being tested and why. llvm-svn: 311228	2017-08-19 06:06:44 +00:00
Chandler Carruth	8d95cf5170	[SLP] Fix an unused variable warning in non-asserts builds. llvm-svn: 311227	2017-08-19 05:06:23 +00:00
Chandler Carruth	45b4e980ae	[x86] Teach the cmov converter to aggressively convert cmovs with memory operands into control flow. We have seen periodically performance problems with cmov where one operand comes from memory. On modern x86 processors with strong branch predictors and speculative execution, this tends to be much better done with a branch than cmov. We routinely see cmov stalling while the load is completed rather than continuing, and if there are subsequent branches, they cannot be speculated in turn. Also, in many (even simple) cases, macro fusion causes the control flow version to be fewer uops. Consider the IACA output for the initial sequence of code in a very hot function in one of our internal benchmarks that motivates this, and notice the micro-op reduction provided. Before, SNB: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.20 Cycles Throughput Bottleneck: Port1 \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1 \| \| 1.0 \| \| \| \| \| CP \| mov rcx, rdi \| 0* \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| 0.1 \| 0.6 \| 0.5 0.5 \| 0.5 0.5 \| \| 0.4 \| CP \| cmp byte ptr [rsi+0xf], 0xf \| 1 \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| \| \| mov rax, qword ptr [rsi] \| 3 \| 1.8 \| 0.6 \| \| \| \| 0.6 \| CP \| cmovbe rax, rdi \| 2^ \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| 1.0 \| \| cmp byte ptr [rcx+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| jb 0xf Total Num Of Uops: 9 ``` After, SNB: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.00 Cycles Throughput Bottleneck: Port5 \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| \| --------------------------------------------------------------------- \| 1 \| 0.5 \| 0.5 \| \| \| \| \| \| mov rax, rdi \| 0* \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| 0.5 \| 0.5 \| 1.0 1.0 \| \| \| \| \| cmp byte 
ptr [rsi+0xf], 0xf \| 1 \| 0.5 \| 0.5 \| \| \| \| \| \| mov ecx, 0x0 \| 1 \| \| \| \| \| \| 1.0 \| CP \| jnbe 0x39 \| 2^ \| \| \| \| 1.0 1.0 \| \| 1.0 \| CP \| cmp byte ptr [rax+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| jnb 0x3c Total Num Of Uops: 7 ``` The difference even manifests in a throughput cycle rate difference on Haswell. Before, HSW: ``` Throughput Analysis Report -------------------------- Block Throughput: 2.00 Cycles Throughput Bottleneck: FrontEnd \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| 6 \| 7 \| \| --------------------------------------------------------------------------------- \| 0* \| \| \| \| \| \| \| \| \| \| mov rcx, rdi \| 0* \| \| \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| 1.0 \| \| \| \| cmp byte ptr [rsi+0xf], 0xf \| 1 \| \| \| 0.5 0.5 \| 0.5 0.5 \| \| \| \| \| \| mov rax, qword ptr [rsi] \| 3 \| 1.0 \| 1.0 \| \| \| \| \| 1.0 \| \| \| cmovbe rax, rdi \| 2^ \| 0.5 \| \| 0.5 0.5 \| 0.5 0.5 \| \| \| 0.5 \| \| \| cmp byte ptr [rcx+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| \| \| jb 0xf Total Num Of Uops: 8 ``` After, HSW: ``` Throughput Analysis Report -------------------------- Block Throughput: 1.50 Cycles Throughput Bottleneck: FrontEnd \| Num Of \| Ports pressure in cycles \| \| \| Uops \| 0 - DV \| 1 \| 2 - D \| 3 - D \| 4 \| 5 \| 6 \| 7 \| \| --------------------------------------------------------------------------------- \| 0* \| \| \| \| \| \| \| \| \| \| mov rax, rdi \| 0* \| \| \| \| \| \| \| \| \| \| xor edi, edi \| 2^ \| \| \| 1.0 1.0 \| \| \| 1.0 \| \| \| \| cmp byte ptr [rsi+0xf], 0xf \| 1 \| \| 1.0 \| \| \| \| \| \| \| \| mov ecx, 0x0 \| 1 \| \| \| \| \| \| \| 1.0 \| \| \| jnbe 0x39 \| 2^ \| 1.0 \| \| \| 1.0 1.0 \| \| \| \| \| \| cmp byte ptr [rax+0xf], 0x10 \| 0F \| \| \| \| \| \| \| \| \| \| jnb 0x3c Total Num Of Uops: 6 ``` Note that this cannot be usefully restricted to inner loops. 
Much of the hot code we see hitting this is not in an inner loop or not in a loop at all. The optimization still remains effective and indeed critical for some of our code. I have run a suite of internal benchmarks with this change. I saw a few very significant improvements and a very few minor regressions, but overall this change rarely has a significant effect. However, the improvements were very significant, and in quite important routines responsible for a great deal of our C++ CPU cycles. The gains pretty clearly outweigh the regressions for us. I also ran the test-suite and SPEC2006. Only 11 binaries changed at all and none of them showed any regressions. Amjad Aboud at Intel also ran this over their benchmarks and saw no regressions. Differential Revision: https://reviews.llvm.org/D36858 llvm-svn: 311226	2017-08-19 05:01:19 +00:00
Chandler Carruth	4d5a2979de	[x86] Refactor the CMOV conversion pass to be more flexible. The primary thing that this accomplishes is to allow future re-use of these routines in more contexts and clarify the behavior w.r.t. loops. For example, if handling outer loops is desirable, doing so in an inside-out order becomes straightforward because it walks the loop nest itself (rather than walking the function's basic blocks) and de-couples the CMOV rewriting from the loop structure as there isn't actually anything loop-specific about this transformation. This patch should be essentially a no-op. It potentially changes the order in which we visit the inner loops, but otherwise should merely set the stage for subsequent changes. Differential Revision: https://reviews.llvm.org/D36783 llvm-svn: 311225	2017-08-19 04:28:20 +00:00
Dinar Temirbulatov	8641c659f9	[SLPVectorizer] Tighten up VLeft, VRight declaration, remove unnecessary testcase test/Transforms/SLPVectorizer/X86/reorder.ll, NFCI. llvm-svn: 311223	2017-08-19 03:15:07 +00:00
Dinar Temirbulatov	58a3537457	[SLPVectorizer] Add opcode parameter to reorderAltShuffleOperands, reorderInputsAccordingToOpcode functions. Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36766 llvm-svn: 311221	2017-08-19 02:54:20 +00:00
Matthias Braun	47e07f9e45	ARMRegisterInfo: Define more ssub indexes; NFC This doesn't really change anything as Tablegen would have inferred those indices anyway; defining them gives us shorter names that are easier to read while debugging (i.e. "ssub_4" rather than "dsub2_then_ssub_0") llvm-svn: 311218	2017-08-19 01:21:11 +00:00
Adrian Prantl	65172dd948	Filter out non-constant DIGlobalVariableExpressions reachable via the CU They won't affect the DWARF output, but they will mess with the sorting of the fragments. This fixes the crash reported in PR34159. https://bugs.llvm.org/show_bug.cgi?id=34159 llvm-svn: 311217	2017-08-19 01:15:06 +00:00
Eric Beckmann	ddb2903ebb	llvm-mt: Merge manifest namespaces. mt.exe performs a tree merge where certain element nodes are combined into one. This introduces the possibility of xml namespaces conflicting with each other. The original mt.exe has a hierarchy whereby certain namespace names can override others, and nodes that would then end up in ambiguous namespaces have their namespaces explicitly defined. This namespace handles this merging process. llvm-svn: 311215	2017-08-19 00:37:41 +00:00
Eugene Zelenko	626e76b0a7	[Analysis] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 311212	2017-08-18 23:51:26 +00:00
Xinliang David Li	cfb9a3b007	Fix comment /NFC llvm-svn: 311209	2017-08-18 23:08:50 +00:00
Xinliang David Li	39614caf8e	[Profile] backward propagate profile info in JumpThreading Differential Revision: http://reviews.llvm.org/D36864 llvm-svn: 311208	2017-08-18 23:00:05 +00:00
Amjad Aboud	fcfd748fb7	[InstCombine] Teach ComputeNumSignBitsImpl to handle integer multiply instruction. Differential Revision: https://reviews.llvm.org/D36679 llvm-svn: 311206	2017-08-18 22:56:55 +00:00
Max Kazantsev	1040ba2c7d	[IRCE] Fix buggy behavior in Clamp Clamp function was too optimistic when choosing signed or unsigned min/max function for calculations. In fact, `!IsSignedPredicate` guarantees us that `Smallest` and `Greatest` can be compared safely using unsigned predicates, but we did not check this for `S` which can in theory be negative. This patch makes Clamp use signed min/max for cases when it fails to prove `S` being non-negative, and it adds a test where such situation may lead to incorrect conditions calculation. Differential Revision: https://reviews.llvm.org/D36873 llvm-svn: 311205	2017-08-18 22:50:29 +00:00
Justin Bogner	c5397dbcdd	IR: Make stripDebugInfo robust against (invalid) empty basic blocks Since stripDebugInfo runs before the verifier when reading IR, we can end up in a situation where we read some invalid IR but don't know it's invalid yet. Before this patch we would crash in stripDebugInfo when given IR with a completely empty basic block, and after we get a nice error from the verifier instead. llvm-svn: 311202	2017-08-18 21:38:03 +00:00
Jonas Devlieghere	8e141da7e4	[llvm-dwarfdump] Hide .debug_str and DIE reference offsets in brief mode This patch hides the .debug_str offset and DIE reference offsets into the CU when llvm-dwarfdump is invoked with -brief. Differential Revision: https://reviews.llvm.org/D36835 llvm-svn: 311201	2017-08-18 21:35:44 +00:00
Simon Pilgrim	a58d3c4934	[X86][ADX] Regenerate ADX intrinsics tests llvm-svn: 311198	2017-08-18 21:21:14 +00:00
Sanjay Patel	3148ff929a	fix typos in comments; NFC llvm-svn: 311193	2017-08-18 20:27:47 +00:00
Ana Pazos	c748485cc5	[PGO] Fixed assertion due to mismatched memcpy size type. Summary: Memcpy intrinsics have size argument of any integer type, like i32 or i64. Fixed size type along with its value when cloning the intrinsic. Reviewers: davidxl, xur Reviewed By: davidxl Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D36844 llvm-svn: 311188	2017-08-18 19:17:08 +00:00
Tim Northover	5bb9dd7c78	ARM: use an external relocation for calls from MachO ARM mode. The internal (__text-relative) relocation risks the offset not being encodable if the destination is Thumb. llvm-svn: 311187	2017-08-18 19:13:56 +00:00
Matt Morehouse	38756d86aa	[SanitizerCoverage] Add stack depth tracing instrumentation. Summary: Augment SanitizerCoverage to insert maximum stack depth tracing for use by libFuzzer. The new instrumentation is enabled by the flag -fsanitize-coverage=stack-depth and is compatible with the existing trace-pc-guard coverage. The user must also declare the following global variable in their code: thread_local uintptr_t __sancov_lowest_stack https://bugs.llvm.org/show_bug.cgi?id=33857 Reviewers: vitalybuka, kcc Reviewed By: vitalybuka Subscribers: kubamracek, hiraditya, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D36839 llvm-svn: 311186	2017-08-18 18:43:30 +00:00
Marek Sokolowski	0cbccbe294	Reapply: [llvm-rc] Add basic RC scripts parsing ability. For now, the parser supports a limited set of statements and resources. This will be extended in the following patches. Thanks to Nico Weber (thakis) for his original work in this area. This patch was originally submitted as r311175 and got reverted in r311177 because of the problems with compilation under gcc. Differential Revision: https://reviews.llvm.org/D36340 llvm-svn: 311184	2017-08-18 18:24:17 +00:00
Jonas Devlieghere	766f310ccf	[Debug info] Transfer DI to fragment expressions for split integer values. This patch teaches the SDag type legalizer how to split up debug info for integer values that are split into a hi and lo part. (re-commit) Differential Revision: https://reviews.llvm.org/D36805 llvm-svn: 311181	2017-08-18 18:07:00 +00:00
Ben Dunbobbin	d80ce7788d	[lit] support unsetting env variables (again!) This is an updated version of https://reviews.llvm.org/D22144 by @jlpeyton. The patch was accepted but not landed. This is useful functionality and I would like to use this to enable lit tests for environment variable behaviour. Differential Revision: https://reviews.llvm.org/D36403 llvm-svn: 311180	2017-08-18 17:32:57 +00:00
Konstantin Zhuravlyov	eec800fc3a	AMDGPU/NFC: Rename a few things in SIMemoryLegalizer: - AtomicInfo -> MemOpInfo - getAtomicLoadInfo -> getLoadInfo - getAtomicStoreInfo -> getStoreInfo - expandAtomicLoad -> expandLoad - expandAtomicStore -> expandStore Differential Revision: https://reviews.llvm.org/D36861 llvm-svn: 311179	2017-08-18 17:30:02 +00:00
Marek Sokolowski	aea1c80dd2	Revert "[llvm-rc] Add basic RC scripts parsing ability." This reverts commit r311175. This failed compilation on some buildbots. llvm-svn: 311177	2017-08-18 17:25:55 +00:00
Jakub Kuderski	004e4fe2ff	[Dominators] Don't print the whole tree when running with -debug As the incremental API is now used in several transforms, printing the whole dominator tree creates a lot of noise when running with the `-debug` flag. This patch fixes that. llvm-svn: 311176	2017-08-18 17:06:37 +00:00
Marek Sokolowski	0823f275d4	[llvm-rc] Add basic RC scripts parsing ability. For now, the parser supports a limited set of statements and resources. This will be extended in the following patches. Thanks to Nico Weber (thakis) for his original work in this area. Differential Revision: https://reviews.llvm.org/D36340 llvm-svn: 311175	2017-08-18 17:05:47 +00:00
Ben Dunbobbin	6940aac2e2	[Support] env vars with empty values on windows An environment variable can be in one of three states: 1. undefined. 2. defined with a non-empty value. 3. defined but with an empty value. The windows implementation did not support case 3 (it was not handling errors). The Linux implementation is already correct. Differential Revision: https://reviews.llvm.org/D36394 llvm-svn: 311174	2017-08-18 16:55:44 +00:00
Simon Pilgrim	121feabbdd	[X86][BMI2] Added scheduling test for RORX/SARX/SHLX/SHRX instructions llvm-svn: 311171	2017-08-18 16:26:39 +00:00
Brian Gesiak	171bd76e17	[Lexicon] Add "GEP" Summary: `getelementptr` is frequently abbreviated as "GEP", often in source files that do not ever reference the full name of the instruction. Add it to the Lexicon, in case readers go to look for what it means there. Test plan: 1. `ninja sphinx` 2. Confirm that the rendered docs HTML contains the new "GEP" entry llvm-svn: 311168	2017-08-18 15:35:53 +00:00
Simon Pilgrim	406e5b9a78	[X86][AES] Add scheduling latency/throughput tests for AES instructions llvm-svn: 311167	2017-08-18 15:26:51 +00:00
Simon Pilgrim	5b790be865	[X86][PCLMUL] Add scheduling latency/throughput test for PCLMULQDQ instruction Added it to the SSE42 tests as targets seem to always have both llvm-svn: 311166	2017-08-18 15:08:30 +00:00
Simon Pilgrim	cebf34766a	[X86][SHA] Add scheduling latency/throughput tests for SHA instructions llvm-svn: 311164	2017-08-18 14:55:50 +00:00
Simon Pilgrim	10afe8d271	[X86][MOVBE] Add scheduling latency/throughput tests for MOVBE instructions llvm-svn: 311163	2017-08-18 14:44:31 +00:00
Sam Parker	e6c91b456b	[ARM] Add PostRAScheduler option This patch adds the option to allow also using the PostRA scheduler, which brings the ARM backend in line with AArch64 targets. The SchedModel can also set 'PostRAScheduler', as the R52 does, so also query this property in the overridden function. Differential Revision: https://reviews.llvm.org/D36866 llvm-svn: 311162	2017-08-18 14:27:51 +00:00
Simon Dardis	74b1d4ec43	[mips] Follow up comments on r310460 Use dblaikie's suggestion of cast<> instead of a separate assert. llvm-svn: 311160	2017-08-18 13:27:02 +00:00
Simon Pilgrim	044849accc	[X86][BMI2] Added scheduling test for MULX instructions llvm-svn: 311159	2017-08-18 13:22:18 +00:00
Sjoerd Meijer	75c6d8cc25	[AArch64] Do not promote f16 when subtarget HasFullFP16 Armv8.2-A adds FP16 support, i.e. f16 is not only a storage-only type, but it also supports performing data processing on 16-bit floating-point quantities. All the necessary (tablegen) groundwork of adding the ARMv8.2-A FP16 (scalar) instructions was done in D15014. To take advantage of this, this patch avoids promotion of f16 to f32 types when the subtarget supports FullFP16, which enables instruction selection of these FP16 instructions. Differential Revision: https://reviews.llvm.org/D36396 llvm-svn: 311154	2017-08-18 10:51:14 +00:00
Renato Golin	3797af285e	[Triple] Define OS Check for Haiku This adds the OS check for the Haiku operating system, as it was missing in the Triple class. Tests for x86_64-unknown-haiku and i586-pc-haiku were also added. These patches only affect Haiku and are completely harmless for other platforms. Patch by Calvin Hill <calvin@hakobaito.co.uk> llvm-svn: 311153	2017-08-18 10:35:42 +00:00
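
The [IRCE] Clamp fix in 1040ba2c7d (r311205) turns on the difference between signed and unsigned min/max when `S` may be negative. The following is a minimal Python sketch of that reasoning, not the LLVM code: it assumes Clamp has the shape max(Smallest, min(Greatest, S)), which the commit message does not spell out, and the 8-bit width and the names `clamp_signed`, `clamp_unsigned` and `as_unsigned` are hypothetical, chosen only for illustration.

```python
BITS = 8  # illustrative bit width (assumption, not taken from the commit)


def as_unsigned(x, bits=BITS):
    """Reinterpret a two's-complement integer as an unsigned value."""
    return x & ((1 << bits) - 1)


def clamp_signed(s, smallest, greatest):
    """Clamp using signed comparisons (assumed shape: max(Smallest, min(Greatest, S)))."""
    return max(smallest, min(greatest, s))


def clamp_unsigned(s, smallest, greatest, bits=BITS):
    """Same clamp, but every comparison treats the operands as unsigned."""
    def key(v):
        return as_unsigned(v, bits)
    return max(smallest, min(greatest, s, key=key), key=key)


if __name__ == "__main__":
    # Non-negative S: both flavours agree.
    print(clamp_signed(5, 0, 10), clamp_unsigned(5, 0, 10))
    # Negative S: the unsigned flavour sees -1 as a huge value and clamps
    # to Greatest instead of Smallest.
    print(clamp_signed(-1, 0, 10), clamp_unsigned(-1, 0, 10))
```

With `Smallest` and `Greatest` both non-negative, the two versions only diverge once `S` is negative, which matches the commit's point that unsigned min/max is safe only when `S` is proven non-negative.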

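The [Support] commit 6940aac2e2 (r311174) distinguishes three states for an environment variable: undefined, defined with a non-empty value, and defined with an empty value. As a rough illustration only, and not the LLVM implementation, the distinction can be sketched in Python as below. The function name `env_state` and the returned labels are hypothetical, and the sketch classifies a plain mapping so that it does not depend on how any particular platform stores empty values.

```python
import os


def env_state(name, environ=None):
    """Classify a variable as 'undefined', 'defined-empty' or 'defined-nonempty'."""
    if environ is None:
        environ = os.environ
    value = environ.get(name)
    if value is None:
        # The name is absent from the environment altogether.
        return "undefined"
    if value == "":
        # The name is present but carries an empty string.
        return "defined-empty"
    return "defined-nonempty"


if __name__ == "__main__":
    sample = {"FOO": "bar", "EMPTY": ""}
    for name in ("FOO", "EMPTY", "MISSING"):
        print(name, env_state(name, sample))
```

The key point is that a lookup must report "absent" separately from "empty"; collapsing the two is the kind of behaviour the commit describes fixing on Windows.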

153304 Commits