mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
216059 Commits

Author SHA1 Message Date
Andrew Savonichev
607f7c19c2 [AArch64] Combine vector shift instructions in SelectionDAG
bswap.v2i16 + sitofp in LLVM IR generate a sequence of:

  - REV32 + USHR for bswap.v2i16
  - SHL + SSHR + SCVTF for sext to v2i32 and scvt

The shift instructions are excessive as noted in PR24820, and they can
be optimized to just SSHR.
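
The redundancy can be checked arithmetically on a single lane. This is a minimal sketch, assuming 32-bit lanes and a shift amount of 16; the helper names are hypothetical and are not LLVM APIs.

```python
MASK32 = 0xFFFFFFFF

def ushr(x, n):
    """Logical shift right on a 32-bit lane."""
    return (x & MASK32) >> n

def shl(x, n):
    """Shift left on a 32-bit lane, discarding overflow."""
    return (x << n) & MASK32

def sshr(x, n):
    """Arithmetic shift right on a 32-bit lane (result kept as unsigned bits)."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as a negative two's-complement value
    return (x >> n) & MASK32

def long_sequence(x):
    # USHR (from the bswap) followed by SHL + SSHR (from the sign extension)
    return sshr(shl(ushr(x, 16), 16), 16)

def combined(x):
    # The single SSHR that the combine is meant to leave behind
    return sshr(x, 16)
```

USHR followed by SHL by the same amount only clears the low 16 bits, and the final SSHR discards those bits anyway, so both functions agree for every 32-bit input.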

Differential Revision: https://reviews.llvm.org/D102333
2021-05-20 10:50:13 +03:00
Amara Emerson
e9e3784d95 [GlobalISel] Fix div+rem -> divrem combine causing use-def violation. 2021-05-19 23:13:41 -07:00
Simon Giesecke
aa649d36a0 Add option to llvm-gsymutil to read addresses from stdin.
Differential Revision: https://reviews.llvm.org/D102224
2021-05-20 06:10:35 +00:00
Xiang1 Zhang
652257ec88 Revert "[HWASAN] Update the tag info for X86_64."
This reverts commit 81c18ce03cd8199cc4f2c817e31b42a191a0fe7d.
2021-05-20 13:12:59 +08:00
Xiang1 Zhang
de91b3f2fe [HWASAN] Update the tag info for X86_64.
In the LAM model, X86_64 will use bits 57-62 (of 0-63) as the HWASAN tag.
So here we make sure the tag shift position and tag mask are correct for x86-64.
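
As an illustration of what "tag shift position and tag mask" mean for bits 57-62, here is a minimal sketch. The constant and function names are assumptions made for this example and are not taken from the HWASAN sources.

```python
TAG_SHIFT = 57                   # lowest tag bit (from the message: bits 57-62)
TAG_BITS = 6                     # bits 57, 58, 59, 60, 61, 62
TAG_MASK = (1 << TAG_BITS) - 1   # 0x3F, applied after shifting

def get_tag(ptr):
    """Extract the 6-bit tag from a 64-bit pointer value."""
    return (ptr >> TAG_SHIFT) & TAG_MASK

def set_tag(ptr, tag):
    """Return ptr with its tag bits replaced; bit 63 and bits 0-56 are untouched."""
    cleared = ptr & ~(TAG_MASK << TAG_SHIFT)
    return cleared | ((tag & TAG_MASK) << TAG_SHIFT)

def untag(ptr):
    """Return ptr with the tag bits cleared."""
    return set_tag(ptr, 0)
```

With a 6-bit field, a tag value larger than 0x3F is truncated by `set_tag` rather than spilling into bit 63.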

Differential Revision: https://reviews.llvm.org/D102472
2021-05-20 11:22:12 +08:00
Sergey Dmitriev
5281788ba1 [llvm-objcopy] Update LIT test to resolve bot failure [NFC]
Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D102823
2021-05-19 19:56:35 -07:00
Zhiwei Chen
8ecb4a2780 [sanitizer] Reduce redzone size for small size global objects
Currently a 1-byte global object has a ridiculous 63-byte redzone.
This patch reduces the redzone size to less than 32 bytes if the size of the global object is less than or equal to half of 32 (the minimal redzone size).
A 12-byte object has a 20-byte redzone; a 20-byte object has a 44-byte redzone.
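
The figures above can be reproduced with a small sketch. The small-object branch follows the description in this message. The branch for larger objects (roughly a quarter of the object size, rounded up to a multiple of the minimum) and the `MAX_RZ` cap are assumptions chosen so that the 20-byte example comes out at 44 bytes; they are not quoted from the patch.

```python
MIN_RZ = 32        # minimal redzone size named in the message
MAX_RZ = 1 << 18   # assumed upper bound, not stated in the message

def redzone_size(size_in_bytes):
    """Redzone size for a global of the given size (illustrative sketch)."""
    if size_in_bytes <= MIN_RZ // 2:
        # Small objects: object plus redzone fill exactly one MIN_RZ unit.
        return MIN_RZ - size_in_bytes
    # Larger objects (assumed rule): about 1/4 of the size, clamped to
    # [MIN_RZ, MAX_RZ].
    rz = max(MIN_RZ, min(MAX_RZ, (size_in_bytes // MIN_RZ // 4) * MIN_RZ))
    # Pad so that object plus redzone is a multiple of MIN_RZ.
    if size_in_bytes % MIN_RZ:
        rz += MIN_RZ - (size_in_bytes % MIN_RZ)
    return rz
```

Under these assumptions a 1-byte object gets a 31-byte redzone instead of 63, and objects larger than 16 bytes keep the previous behaviour.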

Reviewed By: MaskRay, #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D102469
2021-05-19 19:18:50 -07:00
Jon Roelofs
94fbaca3d2 Fix warnings in windows bots. NFC 2021-05-19 17:42:34 -07:00
LLVM GN Syncbot
077e051b22 [gn build] Port 4bf69fb52b3c 2021-05-19 22:27:27 +00:00
Ahmed Bougacha
50227f56b6 [docs] Describe reporting security issues on the chromium tracker.
To track security issues, we're starting with the chromium bug tracker
(using the llvm project there).

We considered using Github Security Advisories.  However, they are
currently intended as a way for project owners to publicize their
security advisories, and aren't well-suited to reporting issues.

This also moves the issue-reporting paragraph to the beginning of the
document, in part to make it more discoverable, in part to allow the
anchor-linking to actually display the paragraph at the top of the page.

Note that this doesn't update the concrete list of security-sensitive
areas, which is still an open item.  When we do, we may want to move the
list of security-sensitive areas next to the issue-reporting paragraph
as well, as it seems like relevant information needed in the reporting
process.

Finally, when describing the discussion medium, this splits the topics
discussed into two: the concrete security issues, discussed in the
issue tracker, and the logistics of the group, in our mailing list,
as patches on public lists, and in the monthly sync-up call.

While there, add a SECURITY.md page linking to the relevant paragraph.

Differential Revision: https://reviews.llvm.org/D100873
2021-05-19 15:21:50 -07:00
Jon Roelofs
ad30e385ed [Remarks] Add analysis remarks for memset/memcpy/memmove lengths
Differential revision: https://reviews.llvm.org/D102452
2021-05-19 15:09:18 -07:00
Ryan Prichard
dc2b1a16fe [MC][ARM] Reject Thumb "ror rX, #0"
The ROR instruction can only handle immediates between 1 and 31. The
would-be encoding for ROR #0 is actually the RRX instruction.
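
A minimal sketch of why the immediate has to be rejected, assuming a 5-bit shift-amount field (called `imm5` here for illustration) in which the value 0 with the ROR shift type denotes RRX. The function names are hypothetical.

```python
def decode_ror_field(imm5):
    """Decode a 5-bit shift amount for the ROR shift type."""
    if not 0 <= imm5 <= 31:
        raise ValueError("imm5 must fit in 5 bits")
    # A zero amount does not mean "rotate by 0"; it selects RRX instead.
    return ("rrx", 1) if imm5 == 0 else ("ror", imm5)

def check_ror_immediate(imm):
    """Assembler-side check: only rotations by 1..31 are expressible as ROR."""
    if not 1 <= imm <= 31:
        raise ValueError("ror immediate must be in the range [1, 31]")
    return imm
```

Accepting `#0` would silently assemble a different instruction, which is why an error is the safer behaviour.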

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D102455
2021-05-19 15:05:39 -07:00
Petr Hosek
2054565e98 [CMake] Don't LTO optimize targets that aren't part of any distribution
When using distributions, targets that aren't included in any
distribution don't need to be as optimized as targets that are
included since those targets are typically only used for tests.

We might consider avoiding LTO for these targets altogether, see
https://lists.llvm.org/pipermail/llvm-dev/2021-April/149843.html

Differential Revision: https://reviews.llvm.org/D102732
2021-05-19 15:02:11 -07:00
wlei
aa88690f46 [CSSPGO] Avoid deleting probe instruction in FoldValueComparisonIntoPredecessors
This change tries to fix a place missing `moveAndDanglePseudoProbes`. FoldValueComparisonIntoPredecessors folds the BB into its predecessors and then marks the BB unreachable. However, the original logic from the BB is still alive, so deleting the probe would mislead the SampleLoader into marking it as a zero-count sample.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D102721
2021-05-19 13:39:05 -07:00
Lang Hames
0e09eb0cf2 [ORC] Add a CPU getter to JITTargetMachineBuilder. 2021-05-19 13:31:25 -07:00
Arthur Eubanks
63f5e603f7 [OpaquePtr] Make atomicrmw work with opaque pointers
FullTy is only necessary when we need to figure out what type an
instruction works with given a pointer's pointee type. However, we just
end up using the value operand's type, so FullTy isn't necessary.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102788
2021-05-19 12:49:28 -07:00
Arthur Eubanks
208107dd2c [OpaquePtr] Make cmpxchg work with opaque pointers
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102745
2021-05-19 12:44:10 -07:00
Arthur Eubanks
764e5745a3 [OpaquePtr] Make GEPs work with opaque pointers
No verifier changes needed, the verifier currently doesn't check that
the pointer operand's pointee type matches the GEP type. There is a
similar check in GetElementPtrInst::Create() though.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102744
2021-05-19 12:39:37 -07:00
Joseph Huber
a0d824fa55 [Diagnostics] Allow emitting analysis and missed remarks on functions
Summary:
Currently, only `OptimizationRemarks` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarksAnalysis` and
`OptimizationRemarkMissed` as well.

Reviewed By: jdoerfert thegameg

Differential Revision: https://reviews.llvm.org/D102784
2021-05-19 15:10:20 -04:00
Sanjay Patel
f66389763f [x86] add tests for fma folds with fast-math-flags; NFC
Part of prep work for D90901
2021-05-19 14:28:57 -04:00
Sanjay Patel
f47bf23962 [x86] propagate FMF from x86-specific intrinsic nodes to others during combining
This is another FMF gap exposed by D90901, but I don't see a way
to show the difference in a regression test as with:
f66ba4c
6025663

We will see an asm difference if we add a test as part of D90901.
2021-05-19 14:25:09 -04:00
Andrea Di Biagio
af6ea1e0bd [MCA] Unbreak the buildbots by passing flag -mcpu=generic to the new test added by commit e5d59db469.
This should unbreak buildbot clang-ppc64le-linux-lnt.
2021-05-19 19:12:33 +01:00
Sanjay Patel
ce4fb4a2a8 [x86] update fma test with deprecated intrinsics; NFC
Similar to 8854b27 -

All of the CHECK lines should be identical to before,
but without any of the x86-specific calls that were
replaced with generic FMA long ago.

The file still has value because it shows a miscompile
as demonstrated in D90901, but we probably need to
add tests with FMF to make that explicit without
losing coverage.
2021-05-19 13:52:08 -04:00
Pirama Arumuga Nainar
7b5dc69f32 [CoverageMapping] Handle gaps in counter IDs for source-based coverage
For source-based coverage, the frontend sets the counter IDs and the
constraints on counter IDs are not defined.  For example, the Rust frontend
until recently had a reserved counter #0
(https://github.com/rust-lang/rust/pull/83774).  Rust coverage
instrumentation also creates counters on edges in addition to basic
blocks.  Some functions may have more counters than regions.

This breaks an assumption in CoverageMapping.cpp where the number of
counters in a function is assumed to be bounded by the number of
regions:
  Counts.assign(Record.MappingRegions.size(), 0);

This assumption causes CounterMappingContext::evaluate() to fail since
there are not enough counter values created in the above call to
`Counts.assign`.  Consequently, some uncovered functions are not
reported in coverage reports.

This change walks a Function's CoverageMappingRecord to find the maximum
counter ID, and uses it to initialize the counter array when instrprof
records are missing for a function in sparse profiles.
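
The sizing problem can be shown with a simplified sketch in which each mapping region refers directly to one counter ID. The real records can also contain counter expressions, which this example ignores, and the function name is hypothetical.

```python
def zeroed_counts(region_counter_ids):
    """Return a zero-filled counter array large enough for every referenced ID."""
    max_id = max(region_counter_ids, default=-1)
    return [0] * (max_id + 1)

# Three regions, but counter #0 is reserved and edge counters push IDs higher.
region_counter_ids = [1, 2, 5]

# Old sizing: one slot per region, so counter ID 5 has no slot.
old_counts = [0] * len(region_counter_ids)

# New sizing: slots 0..5, so every referenced ID can be evaluated.
new_counts = zeroed_counts(region_counter_ids)
```

With the old sizing a lookup of counter 5 falls outside the array, which corresponds to the evaluation failure described above.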

Differential Revision: https://reviews.llvm.org/D101780
2021-05-19 10:46:38 -07:00
Roman Lebedev
22691d2522 [NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): use DeleteDeadBlocks() 2021-05-19 20:38:30 +03:00
Roman Lebedev
9892711a85 [NFCI][Local] MergeBlockIntoPredecessor(): use DeleteDeadBlocks() 2021-05-19 20:38:30 +03:00
Roman Lebedev
8b0f054cf7 [NFCI][Local] removeUnreachableBlocks(): use DeleteDeadBlocks() 2021-05-19 20:38:30 +03:00
Patrick Holland
81da9f4819 [MCA] llvm-mca MCTargetStreamer segfault fix
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.

These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be set up and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).

This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t set up an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.

In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:

.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc

I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.

My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
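
The failure mode and the fix can be mimicked in a few lines. This is only an analogy: the class and function names are hypothetical, Python raises `AttributeError` where the C++ code would dereference a null pointer, and an in-memory buffer stands in for the stream returned by nulls().

```python
import io

class NullTargetStreamer:
    """Accepts target directives and writes them to a sink nobody reads."""
    def __init__(self, sink):
        self.sink = sink

    def emit_directive(self, text):
        self.sink.write(text + "\n")

class StreamerWrapper:
    """Records instructions for analysis and emits no code."""
    def __init__(self):
        self.target_streamer = None  # the missing piece before the fix
        self.instructions = []

    def emit_instruction(self, inst):
        self.instructions.append(inst)

def parse_code_regions(lines, streamer):
    """Feed assembly lines to the streamer, as the parser would."""
    for line in lines:
        text = line.strip()
        if text.startswith(".cv_fpo_"):
            # The parser forwards these directives to the target streamer
            # without checking that one exists.
            streamer.target_streamer.emit_directive(text)
        elif text:
            streamer.emit_instruction(text)
    return streamer.instructions

def make_fixed_streamer():
    """Attach a target streamer whose output is discarded."""
    wrapper = StreamerWrapper()
    wrapper.target_streamer = NullTargetStreamer(io.StringIO())
    return wrapper
```

Parsing input that contains `.cv_fpo_endprologue` fails with a plain `StreamerWrapper()` and succeeds with `make_fixed_streamer()`, while the recorded instructions stay the same.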

My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).

As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.

Differential Revision: https://reviews.llvm.org/D102709
2021-05-19 18:36:10 +01:00
Philip Reames
a4f9bca98e Do actual DCE in LoopUnroll (try 4)
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in its DeadInsts out param.  I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed its documentation.  I'm simply dropping that part of the change.

Commit message from try 3:

Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduced the bug. Oops. :)

Original commit message:

The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.

LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
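
Recursive deletion of a dead chain can be sketched with a worklist. This is a minimal model, assuming instructions are identified by name and described by their operand lists; the function and parameter names are hypothetical and this is not the LoopUnroll code.

```python
def delete_trivially_dead(operands, has_side_effects, candidates):
    """Delete dead instructions and everything that becomes dead as a result.

    operands: dict mapping instruction name -> list of operand names
              (operands that are not instructions, such as arguments, are
              simply absent from the dict).
    has_side_effects: set of instruction names that must never be deleted.
    candidates: instructions to examine first.
    Returns the deleted instructions in deletion order.
    """
    live = dict(operands)
    use_count = {name: 0 for name in live}
    for ops in live.values():
        for op in ops:
            if op in use_count:
                use_count[op] += 1

    worklist = list(candidates)
    deleted = []
    while worklist:
        inst = worklist.pop()
        if inst not in live or use_count[inst] != 0 or inst in has_side_effects:
            continue
        # Removing inst drops one use from each of its operands; any operand
        # left without uses becomes a new candidate.
        for op in live.pop(inst):
            if op in use_count:
                use_count[op] -= 1
                if use_count[op] == 0:
                    worklist.append(op)
        deleted.append(inst)
    return deleted

# a = add x, y; b = mul a, 2; c = sub b, 1; st = store x
chain = {"a": ["x", "y"], "b": ["a"], "c": ["b"], "st": ["x"]}
removed = delete_trivially_dead(chain, {"st"}, ["c"])
```

Because the worklist is separate from any iteration over the block, deleting further instructions cannot invalidate a live iterator, which is the hazard the original commit message describes.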

Differential Revision: https://reviews.llvm.org/D102511
2021-05-19 10:25:31 -07:00
Sanjay Patel
faf74c5c9d [x86] propagate FMF from x86-specific intrinsic nodes to others during lowering
This is another fast-math-flags failure exposed by D90901.
2021-05-19 13:11:15 -04:00
Sanjay Patel
a3f02702bf [x86] add test check lines to demonstrate FMF propagation failure; NFC 2021-05-19 13:11:15 -04:00
Nikita Popov
ae95b48b2e [ScalarEvolution] Remove unused ExitLimit::hasOperand() method (NFC)
We only use BackedgeTakenInfo::hasOperand().
2021-05-19 18:42:14 +02:00
Jessica Paquette
c0b812fa18 Recommit "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
Add missing REQUIRES line to
prelegalizer-combiner-icmp-to-true-false-known-bits.
2021-05-19 09:29:19 -07:00
Hongtao Yu
ba7bb5fc7d [CSSPGO] Overwrite branch weight annotated in previous pass.
Sample profile loader can be run in both LTO prelink and postlink. Currently the counts annotation in postlink doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes:

1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication.

2. Clear indirect call metadata if somehow all the remaining targets have a sum of zero count.

3. Overwrite branch weight for basic blocks.

With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and the CGSCC inliner from working on obsolete metadata, which comes from a partial global inlining in prelink.  It's not expected to work well for the non-CS case with a less-accurate post-inline count quality.

It's worth calling out that some prelink optimizations can damage counts quality in an irreversible way. One example is the loop rotate optimization. Due to the lack of an exact loop entry count (profiling can only give loop iteration count and loop exit count), moving one iteration out of the loop body leaves the remaining iteration count unknown. We had to turn off prelink loop rotate to achieve a better postlink counts quality. An even better postlink counts quality can be achieved by turning off prelink CGSCC inlining, which is not context-sensitive.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D102537
2021-05-19 09:12:24 -07:00
Amy Huang
3f3a533213 Revert "Do actual DCE in LoopUnroll (try 3)"
This reverts commit b6320eeb8622f05e4a5d4c7f5420523357490fca
as it causes clang to assert; see
https://reviews.llvm.org/rGb6320eeb8622f05e4a5d4c7f5420523357490fca.
2021-05-19 08:53:38 -07:00
Mariusz Ceier
4faf75c7ac Fix lld macho standalone build by including llvm/Config/llvm-config.h instead of llvm/Config/config.h
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.

This patch replaces llvm/Config/config.h include with llvm/Config/llvm-config.h
just like it is in lld/ELF/Driver.cpp and HAVE_LIBXAR with LLVM_HAVE_LIBXAR and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h

Also it adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.

Differential Revision: https://reviews.llvm.org/D102084
2021-05-19 11:15:07 -04:00
Simon Moll
0dc8431dd3 [VP] make getFunctionalOpcode return an Optional
The operations of some VP intrinsics do/will not map to regular
instruction opcodes.  Returning 'None' seems more intuitive here than
'Instruction::Call'.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102778
2021-05-19 17:08:34 +02:00
Anirudh Prasad
5ae5cfdb6b [AsmParser][SystemZ][z/OS] Introducing HLASM Parser support to AsmParser - Part 1
- This patch is one in a series of patches that introduce HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and make various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.

The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):

```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```

1. TokA is referred to as the Name Entry. This token is optional
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory
4. TokD is referred to as the Remarks Entry. This token is optional

- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), TokB as the Operation Entry and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e. TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field.
- For TokC (Operand Entry), no spaces are allowed between operand entries. If a space occurs it is classified as an error.
- TokD if provided is taken as is, and emitted as a comment.
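
The statement layout above can be sketched as a small splitter. This is a simplified illustration with a hypothetical function name: it treats any statement that starts in the first column as having a Name Entry, and it does not handle the comment case or validate the operation against real Z machine instructions.

```python
def split_hlasm_statement(line):
    """Split a statement into (name, operation, operands, remarks).

    name and remarks are None when absent. Raises ValueError when the
    mandatory operation or operand entry is missing.
    """
    line = line.rstrip()
    # A Name Entry is present only if the statement starts in the first column.
    has_name = bool(line) and not line[0].isspace()
    # Split on runs of whitespace; everything after the operand entry stays
    # together as the remarks.
    parts = line.split(None, 3 if has_name else 2)
    name = parts[0] if has_name else None
    rest = parts[1:] if has_name else parts
    if len(rest) < 2:
        raise ValueError("operation and operand entries are mandatory")
    operation, operands = rest[0], rest[1]
    remarks = rest[2] if len(rest) > 2 else None
    return name, operation, operands, remarks
```

Because the operand entry ends at the first space, a statement written as `LR 1, 2` would yield the operand `1,` and push `2` into the remarks; a real parser would report that as the error described above.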

The following additional approach was examined, but not taken:

- Adding custom private-only functions to the base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of no use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.

Testing:

- This patch doesn't have tests added with it, for the sole reason that MCStreamer support and Object File support haven't been added for the z/OS target (yet). Hence, it's not possible to generate code outright for the z/OS target. Both are in the process of being committed or worked on.

- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.

Reviewed By: Kai, uweigand

Differential Revision: https://reviews.llvm.org/D98276
2021-05-19 11:05:30 -04:00
Wang, Pengfei
ca21d7bdab Reapply "[X86] Limit X86InterleavedAccessGroup to handle the same type case only"
The current implementation assumes the destination type of the shuffle is the same as the decomposed ones. Add the check to avoid a crash when the condition is not satisfied.

This fixes PR37616.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D102751
2021-05-19 22:27:16 +08:00
Simon Pilgrim
3aef61c246 Revert rG528bc10e95d5f9d6a338f9bab5e91d7265d1cf05 : "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
Reports on D101970 indicate this is causing failures on multi-stage compiles.
2021-05-19 15:01:20 +01:00
Simon Pilgrim
bab1553d0f [X86][AVX] createVariablePermute - generalize the PR50356 fix for smaller indices vector as well
Generalize the fix from rGd0902a8665b1 by ensuring we widen/narrow the indices subvector first and then perform the ZERO_EXTEND_VECTOR_INREG (if necessary), which should allow us to perform the variable permutes with source/destination/indices vectors of any widths.
2021-05-19 14:39:41 +01:00
Simon Pilgrim
8ad28e2352 [X86][Atom] Fix vector integer shift by immediate resource/throughputs
Match what's documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-19 14:39:40 +01:00
Nico Weber
a80d7c0abf Revert "[GlobalISel] Simplify G_ICMP to true/false when the result is known"
This reverts commit 892497c806306a4b7185ead16d60b0ebcca0a304.
Breaks tests, see comments on https://reviews.llvm.org/D102542
2021-05-19 09:02:27 -04:00
Peter Waller
5bc661d652 [llvm][AArch64][SVE] Model FFR-using intrinsics with inaccessiblemem
Intrinsics reading or writing the FFR register need to model the fact
there is additional state being read/written.

Model this state as inaccessible memory.

* setffr => write inaccessiblememonly
* rdffr => read inaccessiblememonly
* ldff* => read arg memory, write inaccessiblemem
* ldnf => read arg memory, write inaccessiblemem
2021-05-19 13:50:13 +01:00
Wang, Pengfei
7db4595c62 Revert "[X86] Limit X86InterleavedAccessGroup to handle the same type case only"
This reverts commit ca23a38e373142a18ab56700ba4f3b947bfe9db0.

Revert due to EXPENSIVE_CHECKS fail.
2021-05-19 20:35:45 +08:00
David Sherwood
505c304121 Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst
In InnerLoopVectorizer::setDebugLocFromInst we were previously
asserting that the VF is not scalable. This is because we want to
use the number of elements to create a duplication factor for the
debug profiling data. However, for scalable vectors we only know the
minimum number of elements. I've simply removed the assert for now
and added a FIXME saying that we assume vscale is always 1. When
vscale is not 1 it just means that the profiling data isn't as
accurate, but shouldn't cause any functional problems.
2021-05-19 13:33:10 +01:00
Kristina Bessonova
248e44368a [ARM][NEON] Combine base address updates for vst1x intrinsics
Differential Revision: https://reviews.llvm.org/D102256
2021-05-19 14:05:55 +02:00
Sanjay Patel
151c1a6abb [SDAG] propagate FMF from target-specific IR intrinsics
This is a step towards relying more on node-level FMF rather than function-wide
or target settings.
I think it was just an oversight that we didn't get this path in D87361
or follow-on patches.

The lack of FMF propagation is blocking D90901 from converting tests to IR-level FMF.

We can't do much more than this currently because we also fail to propagate flags
from x86-specific node to generic FMA node. That would be another patch, so the
test just verifies that we can transfer from IR to initial SDAG node.

Differential Revision: https://reviews.llvm.org/D102725
2021-05-19 07:50:50 -04:00
Simon Pilgrim
8154ce8a38 [X86] Atom (pre-SLM) doesn't support PTEST instructions 2021-05-19 12:25:29 +01:00
Simon Pilgrim
e681f622ed [X86] Remove copy + paste typos in AtomWriteResPair comment.
Remnants from when the Atom model was copied from the Btver2 model.....
2021-05-19 12:25:28 +01:00