No verifier changes are needed; the verifier currently doesn't check that
the pointer operand's pointee type matches the GEP type. There is a
similar check in GetElementPtrInst::Create() though.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102744
Summary:
Currently, only `OptimizationRemark` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarkAnalysis` and
`OptimizationRemarkMissed` as well.
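For illustration, emitting a function-level remark with one of the new constructors could look like this sketch (the pass name, remark name, and helper are made up; the constructor shape is assumed to mirror the existing Function-based `OptimizationRemark` constructor):

```cpp
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Hypothetical helper: report that a whole function was skipped, without
// needing an Instruction or DebugLoc to anchor the remark on.
static void reportSkippedFunction(OptimizationRemarkEmitter &ORE, Function &F) {
  ORE.emit([&]() {
    // Assumed new constructor: (PassName, RemarkName, Function).
    return OptimizationRemarkMissed("my-pass", "FunctionSkipped", &F)
           << "skipped function " << ore::NV("Function", F.getName());
  });
}
```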
Reviewed By: jdoerfert, thegameg
Differential Revision: https://reviews.llvm.org/D102784
This is another FMF gap exposed by D90901, but I don't see a way
to show the difference in a regression test as with:
f66ba4c
6025663
We will see an asm difference if we add a test as part of D90901.
Similar to 8854b27 -
All of the CHECK lines should be identical to before,
but without any of the x86-specific calls that were
replaced with generic FMA long ago.
The file still has value because it shows a miscompile
as demonstrated in D90901, but we probably need to
add tests with FMF to make that explicit without
losing coverage.
For source-based coverage, the frontend sets the counter IDs and the
constraints on counter IDs are not defined. For example, the Rust frontend
until recently had a reserved counter #0
(https://github.com/rust-lang/rust/pull/83774). Rust coverage
instrumentation also creates counters on edges in addition to basic
blocks. Some functions may have more counters than regions.
This breaks an assumption in CoverageMapping.cpp where the number of
counters in a function is assumed to be bounded by the number of
regions:
Counts.assign(Record.MappingRegions.size(), 0);
This assumption causes CounterMappingContext::evaluate() to fail since
there are not enough counter values created in the above call to
`Counts.assign`. Consequently, some uncovered functions are not
reported in coverage reports.
This change walks a Function's CoverageMappingRecord to find the maximum
counter ID, and uses it to initialize the counter array when instrprof
records are missing for a function in sparse profiles.
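Conceptually, the fix looks like this (a simplified sketch; the real change also has to account for counter expressions, which reference counters indirectly):

```cpp
#include "llvm/ProfileData/Coverage/CoverageMapping.h"
#include <algorithm>

using namespace llvm;
using namespace llvm::coverage;

// Simplified sketch: size the counter array by the largest referenced
// counter ID rather than by the number of regions.
static unsigned getMaxCounterID(const CoverageMappingRecord &Record) {
  unsigned MaxID = 0;
  for (const CounterMappingRegion &Region : Record.MappingRegions)
    if (!Region.Count.isExpression())
      MaxID = std::max(MaxID, Region.Count.getCounterID());
  return MaxID;
}

// Used when instrprof records are missing for a function (sparse profile):
//   Counts.assign(getMaxCounterID(Record) + 1, 0);
```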
Differential Revision: https://reviews.llvm.org/D101780
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.
These parser classes were created for llvm-mc, so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be set up and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to do nothing. The exception is the
emitInstruction() method, which calls Regions.addInstruction(Inst).
This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions; however, there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t set up an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.
On x86, each of these 7 directives will cause this segfault if it appears in
the input assembly given to llvm-mca:
.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.
My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
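Roughly, the added initialization looks like this (a sketch; local names are illustrative, and InstPrinter is the MCInstPrinter now passed in from main()):

```cpp
// Inside AsmCodeRegionGenerator::parseCodeRegions(), after the
// MCStreamerWrapper has been constructed (sketch, not the literal patch):
MCStreamerWrapper Streamer(Ctx, Regions);

// Create a target streamer so that target-specific directives such as the
// x86 .cv_fpo_* family have a valid object to call into. Its output goes to
// nulls() because llvm-mca never wants to emit any code. The created
// MCTargetStreamer registers itself with the MCStreamer it is given.
formatted_raw_ostream NullStream(nulls());
TheTarget.createAsmTargetStreamer(Streamer, NullStream, InstPrinter,
                                  /*IsVerboseAsm=*/true);
```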
My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).
As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.
Differential Revision: https://reviews.llvm.org/D102709
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in its DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed its documentation. I'm simply dropping that part of the change.
Commit message from try 3:
Recommitting after fixing a bug found post-commit. Amusingly, try 1 had been correct, and by reverting to incorporate last-minute review feedback, I introduced the bug. Oops. :)
Original commit message:
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
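The cleanup now looks conceptually like this (a sketch; the real code lives in LoopUnroll and, per the note above, also has to guard against iterator invalidation):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Transforms/Utils/Local.h"

using namespace llvm;

// Sketch: collect trivially dead instructions first, then delete each one
// recursively. The WeakTrackingVHs tolerate an instruction being deleted as
// a side effect of an earlier recursive deletion (e.g. through a dead PHI),
// which is what broke plain iteration before.
static void simpleDCE(BasicBlock &BB, const TargetLibraryInfo *TLI) {
  SmallVector<WeakTrackingVH, 16> DeadCandidates;
  for (Instruction &I : BB)
    if (isInstructionTriviallyDead(&I, TLI))
      DeadCandidates.push_back(&I);

  for (WeakTrackingVH &VH : DeadCandidates) {
    Value *V = VH;
    if (auto *I = dyn_cast_or_null<Instruction>(V))
      // Deletes I and, transitively, any operands that become trivially dead.
      RecursivelyDeleteTriviallyDeadInstructions(I, TLI);
  }
}
```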
Differential Revision: https://reviews.llvm.org/D102511
Sample profile loader can be run in both LTO prelink and postlink. Currently the count annotation in postlink doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes:
1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication.
2. Clear indirect call metadata if somehow all the remaining targets sum to a zero count.
3. Overwrite branch weight for basic blocks.
With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and the CGSCC inliner from working on obsolete metadata, which comes from partial global inlining in prelink. It's not expected to work as well for the non-CS case, where post-inline count quality is less accurate.
It's worth calling out that some prelink optimizations can damage count quality in an irreversible way. One example is the loop rotate optimization. Due to the lack of an exact loop entry count (profiling can only give the loop iteration count and loop exit count), moving one iteration out of the loop body leaves the remaining iteration count unknown. We had to turn off prelink loop rotate to achieve better postlink count quality. An even better postlink count quality can be achieved by turning off prelink CGSCC inlining, which is not context-sensitive.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D102537
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.
This patch replaces the llvm/Config/config.h include with llvm/Config/llvm-config.h,
just like it is in lld/ELF/Driver.cpp, replaces HAVE_LIBXAR with LLVM_HAVE_LIBXAR, and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h.
It also adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.
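In code, the include change amounts to roughly this (a sketch, not the literal diff):

```cpp
// Before: config.h is private to LLVM's build tree, so a standalone lld
// build can't find it.
//   #include "llvm/Config/config.h"   // defined HAVE_LIBXAR

// After: llvm-config.h is installed with LLVM, and the macro is exported
// from there (and via LLVMConfig.cmake) under the LLVM_ prefix.
#include "llvm/Config/llvm-config.h"

#ifdef LLVM_HAVE_LIBXAR
// ... XAR-dependent code in Driver.cpp / SyntheticSections.cpp ...
#endif
```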
Differential Revision: https://reviews.llvm.org/D102084
The operations of some VP intrinsics do not (or will not) map to regular
instruction opcodes. Returning 'None' seems more intuitive here than
'Instruction::Call'.
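For illustration, a caller then looks roughly like this (the helper and handler names here are assumptions, not the exact API):

```cpp
// Sketch: a return value of None now means "this VP intrinsic has no
// corresponding regular instruction opcode", instead of the old
// placeholder answer of Instruction::Call.
if (Optional<unsigned> Opc =
        VPIntrinsic::getFunctionalOpcodeForVP(VPI.getIntrinsicID())) {
  // e.g. llvm.vp.add.* maps to Instruction::Add; treat it like that opcode.
  handleAsOpcode(*Opc, VPI);
} else {
  // No functional opcode exists for this VP intrinsic.
  handleAsGenericOperation(VPI);
}
```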
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D102778
- This patch is one in a series of patches which introduce HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and mark various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.
The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):
```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```
1. TokA is referred to as the Name Entry. This token is optional.
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory.
4. TokD is referred to as the Remarks Entry. This token is optional.
- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), TokB as the Operation Entry, and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we parse the first token (i.e. TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction, and TokD as a possible Remarks field.
- For TokC (the Operand Entry), no spaces are allowed between operands; if a space occurs, it is classified as an error.
- TokD, if provided, is taken as is and emitted as a comment.
The following additional approach was examined, but not taken:
- Adding custom private-only functions to the base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of no use to any other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would have the same disadvantage.
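For reference, a rough sketch of the class shape the chosen subclassing approach gives (abridged and illustrative, not the literal patch):

```cpp
// Relevant AsmParser members are assumed to have been made "protected"
// so that the subclass can reuse them.
class HLASMAsmParser final : public AsmParser {
  // Parse TokA as either a possible comment or a label (Name Entry).
  bool parseAsHLASMLabel(ParseStatementInfo &Info);
  // Parse the Operation Entry as a Z machine instruction, its Operand
  // Entry, and an optional Remarks Entry (emitted as a comment).
  bool parseAsMachineInstruction(ParseStatementInfo &Info);

public:
  using AsmParser::AsmParser;
  bool parseStatement(ParseStatementInfo &Info,
                      MCAsmParserSemaCallback *SI) override;
};
```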
Testing:
- This patch doesn't have tests added with it, for the sole reason that MCStreamer support and object file support haven't been added for the z/OS target (yet). Hence, it's not possible to generate code outright for the z/OS target. Those are in the process of being committed / worked on.
- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.
Reviewed By: Kai, uweigand
Differential Revision: https://reviews.llvm.org/D98276
The current implementation assumes the destination type of the shuffle is the same as that of the decomposed ones. Add a check to avoid a crash when the condition is not satisfied.
This fixes PR37616.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D102751
Generalize the fix from rGd0902a8665b1 by ensuring we widen/narrow the indices subvector first and then perform the ZERO_EXTEND_VECTOR_INREG (if necessary), which should allow us to perform the variable permutes with source/destination/indices vectors of any widths.
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
Intrinsics reading or writing the FFR register need to model the fact
that there is additional state being read/written.
Model this state as inaccessible memory.
* setffr => write inaccessiblememonly
* rdffr => read inaccessiblememonly
* ldff* => read arg memory, write inaccessiblemem
* ldnf => read arg memory, write inaccessiblemem
In InnerLoopVectorizer::setDebugLocFromInst we were previously
asserting that the VF is not scalable. This is because we want to
use the number of elements to create a duplication factor for the
debug profiling data. However, for scalable vectors we only know the
minimum number of elements. I've simply removed the assert for now
and added a FIXME saying that we assume vscale is always 1. When
vscale is not 1 it just means that the profiling data isn't as
accurate, but shouldn't cause any functional problems.
This is a step towards relying more on node-level FMF rather than function-wide
or target settings.
I think it was just an oversight that we didn't get this path in D87361
or follow-on patches.
The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF.
We can't do much more than this currently because we also fail to propagate flags
from the x86-specific node to the generic FMA node. That would be another patch, so the
test just verifies that we can transfer from IR to initial SDAG node.
Differential Revision: https://reviews.llvm.org/D102725
This required some changes: instead of eagerly making the PHIs
in the UnwindDest valid as if the BB were already not a predecessor,
they are now kept valid while BB is still a predecessor.
There doesn't seem to be a need to support recursive locking,
and a recursive mutex is unnecessarily inefficient.
Differential Revision: https://reviews.llvm.org/D102486
Do the single hash calculation before acquiring the lock, to reduce
lock contention. If Copy is true, and the string was not yet contained
in the StringStorage, use the new address from StringStorage, but
reuse the hash we already calculated.
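In generic form, the pattern is (an illustrative sketch, not the actual data structure):

```cpp
#include <cstdint>
#include <functional>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>

// Illustrative interner demonstrating the pattern: hash once before taking
// the lock, then reuse that hash for both the lookup and the insertion.
class StringInterner {
  std::mutex Mutex; // plain mutex: recursive locking is not needed
  // Buckets keyed by the precomputed hash; std::list keeps addresses stable.
  std::unordered_map<uint64_t, std::list<std::string>> Buckets;

public:
  const std::string &intern(const std::string &S) {
    // Single hash calculation, done before acquiring the lock.
    const uint64_t H = std::hash<std::string>{}(S);

    std::lock_guard<std::mutex> Guard(Mutex);
    std::list<std::string> &Bucket = Buckets[H]; // reuses the hash
    for (const std::string &Existing : Bucket)
      if (Existing == S)
        return Existing; // already stored: return the existing address
    Bucket.push_back(S); // copy into storage (the "Copy == true" case)
    return Bucket.back(); // the new address, paired with the same hash
  }
};
```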
Differential Revision: https://reviews.llvm.org/D102484
This patch adds a new option to the LoopVectorizer to control how
scalable vectors can be used.
Initially, this proposes three levels to control scalable
vectorization, although other, more aggressive options can be added in
the future.
The possible options are:
- Disabled: Disables vectorization with scalable vectors.
- Enabled: Vectorizes loops using scalable vectors or fixed-width
vectors, but favors fixed-width vectors when the cost
is a tie.
- Preferred: Like 'Enabled', but favoring scalable vectors when the
cost-model is inconclusive.
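A sketch of how such a three-level knob is typically exposed (the enum and flag names below are illustrative assumptions, not necessarily what this patch uses):

```cpp
#include "llvm/Support/CommandLine.h"

// Illustrative three-level switch (names are hypothetical).
enum ScalableVectorizationKind { SVK_Disabled, SVK_Enabled, SVK_Preferred };

static llvm::cl::opt<ScalableVectorizationKind> ScalableVectorization(
    "scalable-vectorization", llvm::cl::init(SVK_Disabled),
    llvm::cl::desc("Control the use of scalable vectors in the loop vectorizer"),
    llvm::cl::values(
        clEnumValN(SVK_Disabled, "off",
                   "Disable vectorization with scalable vectors"),
        clEnumValN(SVK_Enabled, "on",
                   "Allow scalable vectors, but favor fixed-width on a cost tie"),
        clEnumValN(SVK_Preferred, "preferred",
                   "Favor scalable vectors when the cost model is inconclusive")));
```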
Reviewed By: paulwalker-arm, vkmr
Differential Revision: https://reviews.llvm.org/D101945
This allows llvm-objcopy to be used with file names that begin with dashes.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102665
As with element extraction for these vectors, we choose to promote up to
an i8 vector type and perform the insertion there.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D102697