1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 19:23:23 +01:00
Commit Graph

198342 Commits

Author SHA1 Message Date
Qiu Chaofan
d9107e9132 [PowerPC] Exploit vnmsubfp instruction
On PowerPC, we have vnmsubfp Altivec instruction for fnmsub operation on
v4f32 type. Default pattern for this instruction never works since we
don't have legal fneg for v4f32 when VSX disabled.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D80617
2020-06-14 23:19:17 +08:00
Qiu Chaofan
e59e06d663 [DAGCombiner] Require ninf for division estimation
Current implementation of division estimation isn't correct for some
cases like 1.0/0.0 (result is nan, not expected inf).

And this change exposes a potential infinite loop: we use
isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the
divisor is some constant. But it doesn't work after legalized on some
platforms. This patch restricts the method to act before LegalDAG.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D80542
2020-06-14 22:58:22 +08:00
Sanjay Patel
0a4fe71f16 [PassManager] restore early-cse to vector cleanup
As noted in D80236 - the early-cse pass was included here before:
D75145 / rG71a316883d50
But it got moved outside of the "extra" option there, then it
got dropped while adjusting -vector-combine:
rG6438ea45e053
rG57bb4787d72f

So this is restoring the behavior and adding a test to prevent
accidental changes again. I don't see an equivalent option for
the new pass manager.
2020-06-14 10:04:53 -04:00
Nikita Popov
90047b072b [LVI] Fix class indentation (NFC)
This class uses a mix of different indentation levels, normalize it.
2020-06-14 15:42:27 +02:00
Nikita Popov
cdb551ea49 [LVI] Cache lookup of experimental.guard intrinsic (NFC)
When LVI is performing assume intersections, it also checks for
llvm.experimental.guard intrinsics. To avoid unnecessary block
scans, it first checks whether this intrinsic is declared in the
module at all. I've noticed that we end up spending quite a lot
of time looking up that function again and again...

Avoid this by only looking it up once when LazyValueInfo is
constructed. This of course assumes that we don't introduce new
guard intrinsics (which is the case for all existing uses of LVI --
and even if it weren't, it would not introduce miscompiles, just
potentially lose optimization power.)

Differential Revision: https://reviews.llvm.org/D81796
2020-06-14 15:32:30 +02:00
David Green
e27592190d [ARM] Additional cast cost tests.
This adds additional cast cpst tests useful for MVE, notably around half
types.
2020-06-14 14:30:07 +01:00
Sanjay Patel
648f89af4d [InstCombine] reassociate FP diff of sums into sum of diffs
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])

This should be the last step in solving PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953

We started emitting reduction intrinsics with:
D80867/ rGe50059f6b6b3
So it's a relatively easy pattern match now to re-order those ops.
Also, I have not seen any complaints for the switch to intrinsics
yet, so I'll propose to remove the "experimental" tag from the
intrinsics soon.

Differential Revision: https://reviews.llvm.org/D81491
2020-06-14 09:09:03 -04:00
Sanjay Patel
eb4a7d1736 [InstCombine] allow undef elements when comparing vector constants for min/max bailout
This is a hacky, but low-risk fix to avoid the infinite loop in PR46271:
https://bugs.llvm.org/show_bug.cgi?id=46271

As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict
with a transform that wants to pull a 'not' op through min/max via
SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include
undefined elements in vector constants to avoid that. Alternatively, we could
improve or cripple the demanded elements analysis, but that could create even
more problems.

The likely better, safer alternative will be to create min/max intrinsics, so
we can remove all of the hacks related to min/max matching in instcombine.

Differential Revision: https://reviews.llvm.org/D81698
2020-06-14 09:02:47 -04:00
Simon Pilgrim
226765afee [X86][SSE] LowerVectorAllZeroTest - add support for pre-SSE41 targets
Even without PTEST, we can still efficiently perform an OR reduction as PMOVMSKB(PCMPEQB(X,0)) == 0, avoiding xmm->gpr extractions.
2020-06-14 13:41:56 +01:00
Simon Pilgrim
7d91c0c67b [X86][SSE] Add non-SSE41 target PTEST tests
Ensure codegen is still reasonable - ideally we'd make use of MOVMSK for this.
2020-06-14 12:23:10 +01:00
Xing GUO
e16c61e2d3 [NFC] mv llvm/test/tools/obj2yaml/macho-DWARF-debug-ranges.yaml llvm/test/ObjectYAML/MachO/DWARF-debug_ranges.yaml 2020-06-14 16:39:15 +08:00
Xing GUO
6a0fa3e985 [ObjectYAML][DWARF] Let the target address size be inferred from FileHeader.
This patch adds a new field `bool Is64bit` in `DWARFYAML::Data` to indicate the address size of target. It's helpful for inferring the `AddrSize` in some DWARF sections.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D81709
2020-06-14 12:42:20 +08:00
Fangrui Song
7106336a4c [IteratedDominanceFrontier] Decrease number of SmallPtrSet::insert and delete unneeded SmallVector::clear
Also, fix the argument name to be consistent with the declaration.
2020-06-13 19:48:50 -07:00
Craig Topper
12f0d14d7c [X86] Add mayLoad flag to FARCALL*m/FARJMP memory instrutions. Add 'm' to the end of FARJMP64/FARCALL64 instruction names.
We never codegen them so this doesn't matter in practice. But
sometimes someone comes along and tries to use these flags
for something else. LIke the Load Value Inject inline assembly
handling.
2020-06-13 15:40:51 -07:00
Craig Topper
0f96f788ae [X86] Automatically harden inline assembly RET instructions against Load Value Injection (LVI)
Previously, the X86AsmParser would issue a warning whenever a ret instruction is encountered. This patch changes the behavior to automatically transform each ret instruction in an inline assembly stream into:

shlq $0, (%rsp)
lfence
ret

which is secure, according to https://software.intel.com/security-software-guidance/insights/deep-dive-load-value-injection#specialinstructions.

Patch by Scott Constable with some minor changes by Craig Topper.
2020-06-13 15:16:05 -07:00
Craig Topper
1b3f3aebe1 [X86] Teach combineBitcastvxi1 to prefer movmsk on avx512 in more cases
If the input to the bitcast is a sign bit test, it makes sense to
directly use vpmovmskb or vmovmskps/pd. This removes the need to
copy the sign bits to a k-register and then to a GPR.

Fixes PR46200.

Differential Revision: https://reviews.llvm.org/D81327
2020-06-13 14:50:13 -07:00
Craig Topper
e6c4a0d6e9 [X86] Move -x86-use-vzeroupper command line flag into runOnMachineFunction for the pass itself rather than the pass pipeline construction
This pass has no dependencies on other passes so conditionally
including it in the pipeline doens't do much. Just move it the
pass itself to keep it isolated.
2020-06-13 14:42:41 -07:00
Roman Lebedev
6de1cdd580 [NFCI][AggressiveInstCombiner] Add STATISTIC()s for transforms 2020-06-13 23:53:16 +03:00
Florian Hahn
c23c96f75f [DSE,MSSA] Fix location order in isOverwrite call.
isOverwrite expects the later location as first argument and the earlier
result later. The adjusted call is intended to check whether CC
overwrites DefLoc.
2020-06-13 20:39:00 +01:00
Craig Topper
12bd6c2a43 [X86] Enable the EVEX->VEX compression pass at -O0.
A lot of what EVEX->VEX does is equivalent to what the
prioritization in the assembly parser does. When an AVX mnemonic
is used without any EVEX features or XMM16-31, the parser will
pick the VEX encoding.

Since codegen doesn't go through the parser, we should also
use VEX instructions when we can so that the code coming out of
integrated assembler matches what you'd get from outputing an
assembly listing and parsing it.

The pass early outs if AVX isn't enabled and uses TSFlags to
check for EVEX instructions before doing the more costly table
lookups. Hopefully that's enough to keep this from impacting
-O0 compile times.
2020-06-13 12:29:04 -07:00
Craig Topper
f7e2b5eebe [X86] Separate imm from relocImm handling.
relocImm was a complexPattern that handled both ConstantSDNode
and X86Wrapper. But it was only applied selectively because using
it would cause patterns to be not importable into FastISel or
GlobalISel. So it only got applied to flag setting instructions,
stores, RMW arithmetic instructions, and rotates.

Most of the test changes are a result of making patterns available
to GlobalISel or FastISel. The absolute-cmp.ll change is due to
this fixing a pattern ordering issue to make an absolute symbol
match to an 8-bit immediate before trying a 32-bit immediate.

I tried to use PatFrags to reduce the repetition, but I was getting
errors from TableGen.
2020-06-13 11:29:28 -07:00
Amanieu d'Antras
867f35bc0b Fix FastISel dropping srcloc metadata from InlineAsm
Summary:
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060

I've also added the Extra_IsConvergent flag which was missing from FastISel.

Reviewers: echristo

Reviewed By: echristo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80759
2020-06-13 16:52:37 +01:00
Xing GUO
deb304b59f Recommit "[DWARFYAML][debug_line] Replace InitialLength with Format and Length."
This recommits fcc0c186e9cea0af644581069058f0e00469d20e
2020-06-13 23:39:11 +08:00
Xing GUO
370f4bd1aa Revert "[DWARFYAML][debug_line] Replace InitialLength with Format and Length."
This reverts commit fcc0c186e9cea0af644581069058f0e00469d20e.
2020-06-13 17:57:02 +08:00
Xing GUO
8aff593f62 [DWARFYAML][debug_line] Replace InitialLength with Format and Length. 2020-06-13 17:47:06 +08:00
Nikita Popov
f4f19b701b Reapply [LVI] Restructure caching to fix non-determinism
This was reverted due to a reported memory usage increase. However,
a test case was never provided, and I wasn't able to reproduce it
myself.

Relative to the original patch, I have moved the block cache
structure behind a unique_ptr, to avoid storing a huge structure
inside a DenseMap.

---

Variant on D70103 to fix https://bugs.llvm.org/show_bug.cgi?id=43909.
The caching is switched to always use a BB to cache entry map, which
then contains per-value caches. A separate set contains value handles
with a deletion callback. This allows us to properly invalidate
overdefined values.

A possible alternative would be to always cache by value first and
have per-BB maps/sets in the each cache entry. In that case we could
use a ValueMap and would avoid the separate value handle set. I went
with the BB indexing at the top level to make it easier to integrate
D69914, but possibly that's not the right choice.

Differential Revision: https://reviews.llvm.org/D70376
2020-06-13 11:31:40 +02:00
Craig Topper
232abcb694 [X86] Remove brand_id check from getHostCPUName.
Brand index was a feature some Pentium III and Pentium 4 CPUs.
It provided an index into a software lookup table to provide a
brand name for the CPU. This is separate from the family/model.

It's unclear to me why this index being non-zero was used to
block checking family/model. I think the effect of this is that
-march=native was not working correctly on the CPUs that have a
non-zero brand index. They are all about 20 years old so this
probably hasn't affected many users.
2020-06-12 20:38:30 -07:00
Mehdi Amini
5f9fe3c1d0 Fix GCC5 build by renaming variable used in 'auto' deduction (NFC)
GCC5 errors out with:

llvm/lib/Analysis/StackSafetyAnalysis.cpp:935:21: error: use of 'KV' before deduction of 'auto'
     for (auto &KV : KV.second.Params) {
                     ^
2020-06-13 03:08:56 +00:00
Dan Gohman
a6e436091a [WebAssembly] WebAssembly doesn't support "protected" visibility
Implement the `hasProtectedVisibility()` hook to indicate that, like
Darwin, WebAssembly doesn't support "protected" visibility.

On ELF, "protected" visibility is intended to be an optimization, however
in practice it often [isn't], and ELF documentation generally ranges from
[not mentioning it at all] to [strongly discouraging its use].

[isn't]: https://www.airs.com/blog/archives/307
[not mentioning it at all]: https://gcc.gnu.org/wiki/Visibility
[strongly discouraging its use]: https://www.akkadia.org/drepper/dsohowto.pdf

While here, also mention the new Reactor support in the release notes.
2020-06-12 19:52:35 -07:00
Craig Topper
fb25648f7b [X86] Combine the three feature variables in getHostCPUName into an array and pass it around as an array reference.
This makes the setting and clearing of bits simpler.
2020-06-12 18:30:41 -07:00
Vitaly Buka
5eed995b68 [StackSafety] Run ThinLTO
Summary:
ThinLTO linking runs dataflow processing on collected
function parameters. Then StackSafetyGlobalInfoWrapperPass
in ThinLTO backend will run as usual looking up to external
symbol in the summary if needed.

Depends on D80985.

Reviewers: eugenis, pcc

Reviewed By: eugenis

Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81242
2020-06-12 18:11:29 -07:00
Vitaly Buka
ee47bd7b36 [StackSafety,NFC] Extract addOverflowNever 2020-06-12 17:42:32 -07:00
Eric Christopher
c7d09d1011 Temporarily revert "[MemCpyOptimizer] Simplify API of processStore and processMem* functions"
as it seems to be causing some internal crashes in AA after
email with the author.

This reverts commit f79e6a8847aa330cac6837168d02f6b319024858.
2020-06-12 14:01:27 -07:00
Roman Lebedev
132f6b3168 [NFCI][MachineCopyPropagation] invalidateRegister(): use SmallSet<8> instead of DenseSet.
This decreases the time consumed by the pass [during RawSpeed unity build]
by 25% (0.0586 s -> 0.04388 s).

While that isn't really impressive overall, that wasn't the goal here.
The memory results here are noticeable.
The baseline results are:
```
total runtime: 55.65s.
calls to allocation functions: 19754254 (354960/s)
temporary memory allocations: 4951609 (88974/s)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.79MB
total memory leaked: 198.01MB
```
While with this patch the results are:
```
total runtime: 55.37s.
calls to allocation functions: 19068237 (344403/s)   # -3.47 %
temporary memory allocations: 4261772 (76974/s)      # -13.93 % (!!!)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.73MB
total memory leaked: 198.01MB
```

So we get rid of *a lot* of temporary allocations.

Using `SmallSet<8>` makes sense to me because at least here
for x86 BdVer2, the size of that set is *never* more than 3,
over all of llvm test-suite + RawSpeed.

The story might be different on other targets,
not sure if it will ever justify whole DenseSet,
but if it does SmallDenseSet might be a compromise.
2020-06-12 23:10:54 +03:00
Roman Lebedev
d7bb9dc47c [NFCI] VectorCombine: add statistic for bitcast(shuf()) -> shuf(bitcast()) xform 2020-06-12 23:10:53 +03:00
Roman Lebedev
1a4a95dfd5 [NFC] OpenMPOpt: add a statistic for num of parallel regions deleted 2020-06-12 23:10:53 +03:00
Ronak Chauhan
94e53ef1f1 [MC] Changes to help improve target specific symbol disassembly
Summary:
This commit slightly modifies the MCDisassembler, and llvm-objdump to
allow targets to also decode entire symbols.

WebAssembly uses the onSymbolStart hook it to decode preludes.
WebAssembly partially disassembles the symbol in its target specific
way; and then falls back to the normal flow of llvm-objdump.

AMDGPU needs it to decode kernel descriptors entirely, and move to the
next symbol.

This commit is to split the above task into 2.
- Changes to llvm-objdump and MC-layer without breaking WebAssembly code
  [ this commit ]
- AMDGPU's implementation of onSymbolStart that decodes kernel
  descriptors. [ https://reviews.llvm.org/D80713 ]

Reviewers: scott.linder, t-tye, sunfish, arsenm, jhenderson, MaskRay, aardappel

Reviewed By: scott.linder, jhenderson, aardappel

Subscribers: bcain, dschuff, wdng, tpr, sbc100, jgravelle-google, hiraditya, aheejin, MaskRay, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80512
2020-06-12 15:51:37 -04:00
Christopher Tetreault
3be3719b08 [SVE] Break dependency of Type.h on DerivedTypes.h
Summary:
Inline functions in Type.h depended upon inline functions isVectorTy and
getScalarType defined in DerivedTypes.h. Reimplement these functions in
Type.h in terms of Type

Reviewers: rengolin, efriedma, echristo, c-rhodes, david-arm

Reviewed By: echristo

Subscribers: tschuett, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81684
2020-06-12 12:43:33 -07:00
David Blaikie
60f7d5f8b7 llvm-dwarfdump: Include unit count in DWP index header dumping
And add comma separators (to be consistent with recent
changes/improvements to the dumping of other section headers) while I'm
here.
2020-06-12 12:40:02 -07:00
Michael Liao
7a293f5b3c [amdgpu] Skip OR combining on 64-bit integer before legalizing ops.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81710
2020-06-12 15:22:38 -04:00
Erich Keane
3960ffe231 Update Kaleidoscope tutorial inline code
Reported on IRC, the tutorial code at the bottom of the page correctly
namespaces the FunctionPassManager, but the as-you-go code does not.
This patch adds the namespace to those.
2020-06-12 12:02:35 -07:00
Amara Emerson
d1a0fd3633 [AArch64][GlobalISel] Legalize vector G_PTR_ADD and enable selection.
Differential Revision: https://reviews.llvm.org/D81419
2020-06-12 11:25:17 -07:00
David Green
a70a741e88 [ARM] Always use reductions intrinsics under MVE
Similar to a recent change to the X86 backend, this changes things so
that we always produce a reduction intrinsics for all reduction types,
not just the legal ones. This gives a better chance in the backend to
custom lower them to something more suitable for MVE. Especially for
something like fadd the in-order reduction produced during DAG lowering
is already better than the shuffles produced in the midend, and we can
do even better with a bit of custom lowering.

Differential Revision: https://reviews.llvm.org/D81398
2020-06-12 19:21:17 +01:00
Daniel Grumberg
4e65793a85 [TableGen] Make behavior of getValueAsListOfStrings consistent with getValueAsString 2020-06-12 19:16:48 +01:00
Louis Dionne
db01bed9bb [Lit] Pass through SSH_AUTH_SOCK from the surrounding environment
This allows running Lit tests that run ssh without having to manually
enter a password (which is inconvenient), by just having ssh-agent
setup properly when running the test suite.
2020-06-12 13:59:29 -04:00
Michael Liao
2c215416cc [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2))
Reviewers: arsenm

Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81708
2020-06-12 13:53:08 -04:00
Jessica Paquette
3f385bc0de [AArch64][GlobalISel] Allow G_DUP for elements smaller than 32 B.
We select all of these via patterns now, so there's no reason to disallow this.

Update select-dup.mir to show that we correctly select the smaller types.

Differential Revision: https://reviews.llvm.org/D81322
2020-06-12 09:40:34 -07:00
Jessica Paquette
8a20f4e977 [AArch64][GlobalISel] Set hasSideEffects = 0 on custom shuffle opcodes
This was making it so that the instructions weren't eliminated in
select-rev.mir and select-trn.mir despite not being used.

Update the tests accordingly.

Differential Revision: https://reviews.llvm.org/D81492
2020-06-12 09:39:46 -07:00
Huihui Zhang
5e72cbc6bb [NFC] Silence compiler warning [-Wmissing-braces].
llvm/lib/Target/AArch64/AArch64SLSHardening.cpp:146:5: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    "__llvm_slsblr_thunk_x0",  "__llvm_slsblr_thunk_x1",
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    {
llvm/lib/Target/AArch64/AArch64SLSHardening.cpp:168:5: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    AArch64::X0,  AArch64::X1,  AArch64::X2,  AArch64::X3,  AArch64::X4,
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    {
2020-06-12 08:55:03 -07:00
Matt Arsenault
a4ac51f83e GlobalISel: Fix not erasing old instruction in sitofp/uitofp lowering 2020-06-12 10:33:23 -04:00