llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00

Author	SHA1	Message	Date
Sjoerd Meijer	ac0a363b76	[LangRef] Revise semantics of intrinsic get.active.lane.mask A first version of get.active.lane.mask was committed in rG7fb8a40e5220. One of the main purposes and uses of this intrinsic is to communicate information from the middle-end to the back-end, but its current definition and semantics make this actually very difficult. The intrinsic was defined as: @llvm.get.active.lane.mask(%IV, %BTC) where %BTC is the Backedge-Taken Count (variable names are different in the LangRef spec). This allows to implicitly communicate the loop tripcount, which can be reconstructed by calculating BTC + 1. But it has been very difficult to prove that calculating BTC + 1 is safe and doesn't overflow. We need complicated range and SCEV analysis, and thus the problem is that this intrinsic isn't really doing what it was supposed to solve. Examples of the overflow checks that are required in the (ARM) back-end are D79175 and D86074, which aren't even complete/correct yet. To solve this problem, we are revising the definitions/semantics for get.active.lane.mask to avoid all the complicated overflow analysis. This means that instead of communicating the BTC, we are now using the loop tripcount. Now using LangRef's variable names, its semantics is changed from: icmp ule (%base + i), %n to: icmp ult (%base + i), %n with %n > 0 and corresponding to the loop tripcount. The intrinsic signature remains the same. Differential Revision: https://reviews.llvm.org/D86147	2020-08-25 16:23:51 +01:00
Sanjay Patel	66453001f1	[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try) The 1st attempt (rG557b890) was reverted because it caused miscompiles. That bug is avoided here by changing the order of folds and as verified in the new tests. Original commit message: InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-25 11:19:36 -04:00
Sanjay Patel	c2dd03cd0f	[InstCombine] add vector demanded elements tests with shuffles; NFC The 1st draft of D86460 (reverted) would show miscompiles with these tests because the undef element tracking went wrong and became visible in the shuffle masks.	2020-08-25 11:19:35 -04:00
Jay Foad	bb09b6c69f	AMDGPU/GlobalISel: re-auto-generate some test checks	2020-08-25 15:54:22 +01:00
Sjoerd Meijer	47761b113f	[Verifier] Additional check for intrinsic get.active.lane.mask This adapts the verifier checks for intrinsic get.active.lane.mask to the new semantics of it as described in D86147. I.e., the second argument %n, which corresponds to the loop tripcount, must be greater than 0 if it is a constant, so check that. Differential Revision: https://reviews.llvm.org/D86301	2020-08-25 15:44:33 +01:00
Xing GUO	6857a04f03	[DWARFYAML] Make the 'Attributes' field optional. This patch makes the 'Attributes' field optional. We don't need to explicitly specify the 'Attributes' field in the future. Reviewed By: jhenderson, grimar Differential Revision: https://reviews.llvm.org/D86537	2020-08-25 22:37:43 +08:00
Sjoerd Meijer	02f39d5a7e	[SelectionDAG] Legalize intrinsic get.active.lane.mask This adapts legalization of intrinsic get.active.lane.mask to the new semantics as described in D86147. Because the second argument is now the loop tripcount, we legalize this intrinsic to an 'icmp ULT' instead of an ULE when it was the backedge-taken count. Differential Revision: https://reviews.llvm.org/D86302	2020-08-25 15:00:10 +01:00
Jeremy Morse	f5080847e6	[LiveDebugValues] Add switches for using instr-ref variable locations This patch adds the -Xclang option "-fexperimental-debug-variable-locations" and same LLVM CodeGen option, to pick which variable location tracking solution to use. Right now all the switch does is pick which LiveDebugValues implementation to use, the normal VarLoc one or the instruction referencing one in rGae6f78824031. Over time, the aim is to add fragments of support in aid of the value-tracking RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html also controlled by this command line switch. That will slowly move variable locations to be defined by an instruction calculating a value, and a DBG_INSTR_REF instruction referring to that value. Thus, this is going to grow into a "use the new kind of variable locations" switch, rather than just "use the new LiveDebugValues implementation". Differential Revision: https://reviews.llvm.org/D83048	2020-08-25 14:58:48 +01:00
Matt Arsenault	1a92d5b134	AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge Most notably, we were incorrectly reporting <3 x s16> as a legal type for these. Make sure these aren't legal to help make progress on fixing the artifact combiner and vector legalizer rules. Unfortunately, this means spreading the -global-isel-abort=0 hack, although this doesn't change the legalizer result in any situation.	2020-08-25 09:40:20 -04:00
Matt Arsenault	f16e802454	AMDGPU/GlobalISel: Fix using unlegalizable values in tests Implicit uses of non-register value types places impossible to satisfy constraints on the legalizer / artifact combiner. These prevent writing sensible legalize rules for the artifacts without triggering infinite loops in the legalizer. The verifier really needs to enforce this, but I'm not sure what the exact conditions would look like yet.	2020-08-25 09:39:32 -04:00
Sjoerd Meijer	bd571cfbf0	[ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303	2020-08-25 14:38:03 +01:00
Matt Arsenault	2a33728d72	AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors The selection patterns will currently fail on these.	2020-08-25 09:37:41 -04:00
Anatoly Trosinenko	1adb0ef206	[Utils] Add highlighting definition for byref IR attribute This patch assumes `byref` can be handled identically to `byval`. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D85768	2020-08-25 16:19:24 +03:00
Sjoerd Meijer	1cd139275c	[LV] get.active.lane.mask consuming tripcount instead of backedge-taken count This adapts LV to the new semantics of get.active.lane.mask as discussed in D86147, which means that the LV now emits intrinsic get.active.lane.mask with the loop tripcount instead of the backedge-taken count as its second argument. The motivation for this is described in D86147. Differential Revision: https://reviews.llvm.org/D86304	2020-08-25 13:49:19 +01:00
Alex Richardson	65d5b3b1e2	Fix update_llc_test_checks function regex for RV64 Some functions also include a `.Lfunc$local:` label due to -fno-semantic-interposition Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D85888	2020-08-25 12:20:33 +01:00
Sam Parker	50697e16b0	[NFC][SimplifyCFG] More tests for Arm	2020-08-25 12:13:48 +01:00
David Green	906e8b5e0e	[ARM][CGP] Fix scalar condition selects for MVE The arm backend does not handle select/select_cc on vectors with scalar conditions, preferring to expand them in codegenprepare instead. This usually works except when optimizing for size, where the optsize check would end up overruling the backend isSelectSupported check. We could handle the selects in ISel too, but this seems like smaller code than trying to splat the condition to all lanes. Differential Revision: https://reviews.llvm.org/D86433	2020-08-25 12:09:06 +01:00
Mikael Holmen	5ae70c97fb	[PowerPC] Fix gcc warning [NFC] Without the fix gcc 7.4 warns with ../lib/Target/PowerPC/PPCAsmPrinter.cpp: In member function 'void {anonymous}::PPCAsmPrinter::EmitTlsCall(const llvm::MachineInstr*, llvm::MCSymbolRefExpr::VariantKind)': ../lib/Target/PowerPC/PPCAsmPrinter.cpp:525:53: warning: enumeral and non-enumeral type in conditional expression [-Wextra] MCInstBuilder(Subtarget->isPPC64() ? Opcode : PPC::BL_TLS) ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~	2020-08-25 12:58:38 +02:00
Sam Parker	2ad592033d	[NFC][SimplifyCFG] Add some more tests for Arm.	2020-08-25 11:44:17 +01:00
Shinji Okumura	3667bf202b	[Attributor][NFC] Clang format	2020-08-25 19:32:58 +09:00
Paul Walker	7cd72f042a	[SVE] Lower scalable vector ISD::FNEG operations. Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A) transformation when -1 is expressed as an ISD::SPLAT_VECTOR. Differential Revision: https://reviews.llvm.org/D86415	2020-08-25 11:22:28 +01:00
Sam Parker	5f52cd12df	[NFC][ARM] arith code size cost tests Add a run to measure the code size cost of arithmetic instructions and add a function for i1 types.	2020-08-25 11:16:01 +01:00
Sam Parker	c6eebcc686	[UpdatesTestChecks] Fix typo in common.py global_vars_see_dict -> global_vars_seen_dict	2020-08-25 11:13:33 +01:00
Georgii Rymar	f23c466d76	[llvm-readobj] - Print "Unknown" when a program header is unknown. Currently, when a program header type is unknown, we don't print anything: ``` ProgramHeader { Type: (0x60000000) ``` With this patch the output will be: ``` ProgramHeader { Type: Unknown (0x60000000) ``` It was discussed in D85526 and is consistent with what we print for '--sections' already, e.g.: ``` Section { Name: .sec Type: Unknown (0x7FFFFFFF) } ``` Differential revision: https://reviews.llvm.org/D86213	2020-08-25 13:05:17 +03:00
Roman Lebedev	8908aecdf3	[NFC][InstCombine] Tests for PHI-of-extractvalues Much like with its sibling fold PHI-of-insertvalues, it appears to be much more worthwhile than it would seem.	2020-08-25 13:01:07 +03:00
Benjamin Kramer	cc40144ebc	Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract" This reverts commit 557b890ff4f4dd5fa979c232df5b31cf3fef04c1. Causing miscompiles, test case is on llvm-commits.	2020-08-25 11:31:31 +02:00
Hans Wennborg	8821af3ec2	Revert "[CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU" It broke Chromium's llvm build: CMake Error at lib/Support/CMakeLists.txt:13 (string): string sub-command REGEX, mode REPLACE: regex "^()" matched an empty string. Call Stack (most recent call first): lib/Support/CMakeLists.txt:223 (get_system_libname) This reverts commit 2b3807d822c50d361ae67184b6de5a41bd7b1bba / https://reviews.llvm.org/D86434	2020-08-25 11:22:50 +02:00
Georgii Rymar	ce5d5ddb40	[llvm-readelf/obj] - Change the return type of the `createDRI(...)` to `Expected<>` This allows us to get rid of the "Invalid data was encountered while parsing the file" error reported when the sh_size/sh_offset of sections are broken. Differential revision: https://reviews.llvm.org/D86451	2020-08-25 12:11:26 +03:00
Yang Zhihui	cd03b849a7	[FileCheck][docs] Fix word errors ouput -> output Reviewed By: thopre Differential Revision: https://reviews.llvm.org/D86504	2020-08-25 09:53:52 +01:00
OCHyams	6420954176	[llvm-dwarfdump] Fix misleading scope byte coverage statistics Fixes PR46575. Bump statistics version to 6. Without this patch, for a variable described with a location list the stat 'sum_all_variables(#bytes in parent scope covered by DW_AT_location)' is calculated by summing all bytes covered by the location ranges in the list and capping the result to the number of bytes in the parent scope. With the patch, only bytes which overlap with the parent DIE scope address ranges contribute to the stat. A new stat 'sum_all_variables(#bytes in any scope covered by DW_AT_location)' has been added which displays the total bytes covered when ignoring scopes.	2020-08-25 06:40:11 +01:00
David Sherwood	82c9874179	[SVE] Fix TypeSize related warnings with IR truncates of scalable vectors In getCastInstrCost when the instruction is a truncate we were relying upon the implicit TypeSize -> uint64_t cast when asking if a given type has the same size as a legal integer. I've changed the code to only ask the question if the type is fixed length. I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail out for now if the type is a scalable vector. I've added the following new tests: Analysis/CostModel/AArch64/sve-trunc.ll Transforms/InstCombine/AArch64/sve-trunc.ll for both of these fixes. Differential revision: https://reviews.llvm.org/D86432	2020-08-25 09:17:56 +01:00
Florian Hahn	2c80ab9174	[DSE,MemorySSA] Cache accesses with/without reachable read-clobbers. Currently we repeatedly check the same uses for read clobbers in some cases. We can avoid unnecessary checks by keeping track of the memory accesses we already found read clobbers for. To do so, we just add memory accesses causing read-clobbers to a set. Note that marking all visited accesses as read-clobbers would be too pessimistic, as that might include accesses not on any path to the actual read clobber. If we do not find any read-clobbers, we can add all visited instructions to another set and use that to skip the same accesses in the next call. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D75025	2020-08-25 08:48:46 +01:00
Roman Lebedev	ed8ecc651f	[InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's As per the statistics, this happens exceedingly rarely, but I have seen it in exactly the situations the Phi-aware aggregate reconstruction would have handled, eventually, and allowed invoke -> call fold later on. So while this might be something that other folds will have to learn about, I believe we should be doing this transform in general. Here, we are okay with adding two PHI's to get both the base aggregate, and the inserted value. I'm not sure it makes much sense to restrict it to a single phi (to just the inserted value?), because originally we'd be receiving the final aggregate already. llvm test-suite + RawSpeed:
```
| statistic name                             | baseline  | proposed  |    Δ |      % | \|%\| |
|--------------------------------------------|-----------|-----------|-----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues          | 0         | 12        |   12 |  0.00% | 0.00% |
| asm-printer.EmittedInsts                   | 8926643   | 8926595   |  -48 |  0.00% | 0.00% |
| instcombine.NumCombined                    | 3846614   | 3846640   |   26 |  0.00% | 0.00% |
| instcombine.NumConstProp                   | 24302     | 24293     |   -9 | -0.04% | 0.04% |
| instcombine.NumDeadInst                    | 1620140   | 1620112   |  -28 |  0.00% | 0.00% |
| instcount.NumBrInst                        | 898466    | 898464    |   -2 |  0.00% | 0.00% |
| instcount.NumCallInst                      | 1760819   | 1760875   |   56 |  0.00% | 0.00% |
| instcount.NumExtractValueInst              | 45659     | 45649     |  -10 | -0.02% | 0.02% |
| instcount.NumInsertValueInst               | 4991      | 4981      |  -10 | -0.20% | 0.20% |
| instcount.NumIntToPtrInst                  | 27084     | 27087     |    3 |  0.01% | 0.01% |
| instcount.NumPHIInst                       | 371435    | 371429    |   -6 |  0.00% | 0.00% |
| instcount.NumStoreInst                     | 906011    | 906019    |    8 |  0.00% | 0.00% |
| instcount.TotalBlocks                      | 1105520   | 1105518   |   -2 |  0.00% | 0.00% |
| instcount.TotalInsts                       | 9795737   | 9795776   |   39 |  0.00% | 0.00% |
| simplifycfg.NumInvokes                     | 2784      | 2786      |    2 |  0.07% | 0.07% |
| simplifycfg.NumSimpl                       | 1001840   | 1001850   |   10 |  0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs            | 15174     | 15170     |   -4 | -0.03% | 0.03% |
```
Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86306	2020-08-25 10:38:11 +03:00
Sam Parker	077f615e10	[NFC][RDA] Add explicit def check Explicitly check that there is a local def prior to the given instruction in getReachingLocalMIDef instead of just relying on a nullptr return from getInstFromId.	2020-08-25 08:37:45 +01:00
Freddy Ye	3ad559cce3	[X86] Support -march=sapphirerapids Support -march=sapphirerapids for x86. Compared with Icelake Server, it includes 14 new features: amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote, enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D86503	2020-08-25 14:21:21 +08:00
Petr Hosek	2d6ef3a647	[CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU For the Windows GNU platform, CMAKE_FIND_LIBRARY_PREFIXES is a list containing an empty string, which ended up in a regex capturing group, which is invalid in CMake's regex engine. With this change, we get the following: set(CMAKE_FIND_LIBRARY_PREFIXES "lib" "") set(CMAKE_FIND_LIBRARY_SUFFIXES ".dll.a" ".a" ".lib") get_system_libname(path/to/libz.dll.a zlib) message("${zlib}") outputs z, as expected. Patch By: haampie Differential Revision: https://reviews.llvm.org/D86434	2020-08-24 23:00:54 -07:00
Alexandre Ganea	70a9ce1df9	Disable 'not' test on Windows because 'env' from GnuWin32 cannot be used without arguments.	2020-08-24 21:55:34 -04:00
Mircea Trofin	9c71d4e1d1	[MLInliner] Support training that doesn't require partial rewards If we use training algorithms that don't need partial rewards, we don't need to worry about an ir2native model. In that case, training logs won't contain a 'delta_size' feature either (since that's the partial reward). Differential Revision: https://reviews.llvm.org/D86481	2020-08-24 17:36:29 -07:00
Fangrui Song	cf617b95ee	[not][test] Fix disable-symbolization.test when 'printenv' is not available On Windows, 'env' or 'printenv' may not exist. Also switch back to 'env' which is specified by POSIX.1-2017. 'printenv' is not standard (I picked it because 'printenv' exists on GnuWin32 but 'env' does not). Reviewed By: zequanwu Differential Revision: https://reviews.llvm.org/D86496	2020-08-24 17:27:34 -07:00
Venkataramanan Kumar	255be4506d	[DAGCombine]: Fold X/Sqrt(X) to Sqrt(X) With FMF ( "nsz" and " reassoc") fold X/Sqrt(X) to Sqrt(X). This is done after targets have the chance to produce a reciprocal sqrt estimate sequence because that expansion is probably more efficient than an expansion of a non-reciprocal sqrt. That is also why we deferred doing this transform in IR (D85709). Differential Revision: https://reviews.llvm.org/D86403	2020-08-24 18:16:13 -04:00
Sanjay Patel	7a4b11cdee	[x86][AArch64] adjust fast-math-flags in tests; NFC This goes with the proposal in D86403.	2020-08-24 18:16:13 -04:00
Matt Arsenault	e5708466b3	AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands. We would still need to waterfall if the value were somehow an AGPR, and also need to explicitly copy to a VGPR.	2020-08-24 17:54:34 -04:00
Nemanja Ivanovic	887edb78a5	[PowerPC] Do not use FISel for calls and TOC-based accesses with PC-Rel PC-Relative addressing introduces a fair bit of complexity for correctly eliminating TOC accesses. FastISel does not include any of that handling so we miscompile code with -mcpu=pwr10 -O0 if it includes an external call that FastISel does not handle followed by any of the following: floating point constant materialization; materialization of a GlobalValue; a call that FastISel does handle. This patch switches to SDISel for any of the above. Differential revision: https://reviews.llvm.org/D86343	2020-08-24 16:51:44 -05:00
Craig Topper	82f73ac58e	[X86] Copy the tuning features and scheduler model from pentium4/x86-64 to generic This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64". To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586. I updated one llvm-mca test to check a different CPU since generic has a scheduler model now Differential Revision: https://reviews.llvm.org/D86312	2020-08-24 14:47:10 -07:00
Matt Arsenault	3b7d6a6aaa	AMDGPU: Have a few selection failure tests check both paths SelectionDAG and GlobalISel take different failure paths for these and end up producing different failure errors. Check both so the test passes when the default is switched.	2020-08-24 17:46:31 -04:00
Nemanja Ivanovic	b06fbd740a	[PowerPC] Handle SUBFIC in reg+reg -> reg+imm transformation We initially missed the subtract-immediate in this transformation. This patch just adds that. Differential revision: https://reviews.llvm.org/D84659	2020-08-24 16:22:59 -05:00
Sanjay Patel	8f9cb71b9c	[InstCombine] improve demanded element analysis for vector insert-of-extract InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-24 17:00:16 -04:00
Sanjay Patel	663055a339	[SLP] avoid 'tmp' names in regression tests; NFC That can cause problems for update_test_checks.py (it warns when updating this file).	2020-08-24 17:00:16 -04:00
Sanjay Patel	07008dabee	[InstCombine] add tests for insert+extract demanded elements; NFC	2020-08-24 17:00:16 -04:00
Shoaib Meenai	a3c544e1b7	[runtimes] Use llvm-libtool-darwin for runtimes build It's full featured now and we can use it for the runtimes build instead of relying on an external libtool, which means the CMAKE_HOST_APPLE restriction serves no purpose either now. Restrict llvm-lipo to Darwin targets while I'm here, since it's only needed there. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D86367	2020-08-24 13:48:30 -07:00
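
The change in get.active.lane.mask semantics described in ac0a363b76 can be illustrated with a small model. The following is a plain-Python sketch, not LLVM code: the function names, the unbounded Python integers used for `base + i`, and the 8-bit width in the wrap-around example are all assumptions chosen for illustration. It shows that the old `icmp ule` form with the backedge-taken count and the new `icmp ult` form with the trip count give the same mask when `n = BTC + 1` does not overflow, and that reconstructing `BTC + 1` in a fixed-width integer can wrap to 0.

```python
def lane_mask_btc(base, btc, lanes):
    """Old semantics: lane i is active when (base + i) <= BTC (icmp ule)."""
    return [(base + i) <= btc for i in range(lanes)]


def lane_mask_tripcount(base, n, lanes):
    """New semantics: lane i is active when (base + i) < n (icmp ult), with n > 0."""
    if n <= 0:
        raise ValueError("trip count must be greater than 0")
    return [(base + i) < n for i in range(lanes)]


def tripcount_from_btc(btc, bits):
    """Reconstruct the trip count as BTC + 1 in a `bits`-wide integer (wraps on overflow)."""
    return (btc + 1) % (1 << bits)


# A loop with 10 iterations and 4 lanes; the last vector iteration starts at 8.
old_mask = lane_mask_btc(base=8, btc=9, lanes=4)
new_mask = lane_mask_tripcount(base=8, n=10, lanes=4)
print(old_mask, new_mask)

# With an 8-bit counter (illustrative width), BTC = 255 makes BTC + 1 wrap to 0,
# which is the overflow the old definition forced the back-end to rule out.
print(tripcount_from_btc(255, bits=8))
```

Passing the trip count directly means the consumer never has to compute `BTC + 1`, so the wrap-around case in the last line does not arise on that side.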


202547 Commits
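
The scope byte coverage fix in 6420954176 is easier to follow with numbers. The sketch below is a simplified Python model and not the llvm-dwarfdump implementation: the half-open `(low, high)` address ranges, the helper names, and the assumption that ranges within one list do not overlap each other are all hypothetical. It contrasts summing the location-list bytes and capping at the scope size with counting only the bytes that overlap the parent scope.

```python
def total_bytes(ranges):
    """Sum the sizes of half-open (low, high) address ranges."""
    return sum(high - low for low, high in ranges)


def covered_capped(loc_ranges, scope_ranges):
    """Old behaviour as described: sum all location bytes, cap at the scope size."""
    return min(total_bytes(loc_ranges), total_bytes(scope_ranges))


def covered_overlap(loc_ranges, scope_ranges):
    """New behaviour as described: count only bytes overlapping the parent scope."""
    return sum(
        max(0, min(l_high, s_high) - max(l_low, s_low))
        for l_low, l_high in loc_ranges
        for s_low, s_high in scope_ranges
    )


scope = [(0x10, 0x20)]      # 16 bytes of parent scope
location = [(0x00, 0x18)]   # 24 bytes, only 8 of which lie inside the scope

print(covered_capped(location, scope))   # 16: reports full coverage
print(covered_overlap(location, scope))  # 8: only the overlapping bytes
print(total_bytes(location))             # 24: the "any scope" total
```

In this example the capped sum reports the scope as fully covered even though half of it has no location, which is the misleading result the commit message describes.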