1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00
Commit Graph

204560 Commits

Author SHA1 Message Date
Martin Storsjö
8fc47f8652 [AArch64] Omit SEH directives for the epilogue if none are needed
For these cases, we already omit the prologue directives, if
(!AFI->hasStackFrame() && !windowsRequiresStackProbe && !NumBytes).

When writing the epilogue (after the prolog has been written), if
the function doesn't have the WinCFI flag set (i.e. if no prologue
was generated), assume that no epilogue will be needed either,
and don't emit any epilog start pseudo instruction. After completing
the epilogue, make sure that it actually matched the prologue.

Previously, when epilogue start/end was generated, but no prologue,
the unwind info for such functions actually was huge; 12 bytes xdata
(4 bytes header, 4 bytes for one non-folded epilogue header, 4 bytes
for padded opcodes) and 8 bytes pdata. Because the epilog consisted of
one opcode (end) but the prolog was empty (no .seh_endprologue), the
epilogue couldn't be folded into the prologue, and thus couldn't be
considered for packed form either.

On a 6.5 MB DLL with 110 KB pdata and 166 KB xdata, this gets rid of
38 KB pdata and 62 KB xdata.

Differential Revision: https://reviews.llvm.org/D88641
2020-10-02 09:12:56 +03:00
Max Kazantsev
a15cdde064 [SCEV] Limited support for unsigned preds in isImpliedViaOperations
The logic there only considers `SLT/SGT` predicates. We can use the same logic
for proving `ULT/UGT` predicates if all involved values are non-negative.

Adding full-scale support for unsigned might be challenging because of code amount,
so we can consider this in the future.

Differential Revision: https://reviews.llvm.org/D88087
Reviewed By: reames
2020-10-02 10:20:57 +07:00
Philip Reames
4bb7568ead [gvn] Handle a corner case w/vectors of non-integral pointers
If we try to coerce a vector of non-integral pointers to a narrower type (either narrower vector or single pointer), we use inttoptr and violate the semantics of non-integral pointers.  In theory, we can handle many of these cases, we just need to use a different code idiom to convert without going through inttoptr and back.

This shows up as wrong code bugs, and in some cases, crashes due to failed asserts.  Modeled after a change which has lived downstream for a couple years, though completely rewritten to be more idiomatic.
2020-10-01 19:20:21 -07:00
Carl Ritson
75e5d1576a [AMDGPU] SIInsertSkips: Tidy block splitting to use splitAt
Convert to use new MachineBasicBlock splitAt function.
Place code in splitBlock function for reuse in future changes.
Should yield no functional change.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88537
2020-10-02 11:10:55 +09:00
Carl Ritson
01c4226b07 CodeGen: Fix livein calculation in MachineBasicBlock splitAt
Fix and simplify computation of liveins for new block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88535
2020-10-02 10:45:04 +09:00
Esme-Yi
a538a3723b [PowerPC] Put the CR field in low bits of GRC during copying CRRC to GRC.
Summary: How we copying the CRRC to GRC is using a single MFOCRF to copy the contents of CR field n (CR bits 4×n+32:4×n+35) into bits 4×n+32:4×n+35 of register GRC. That’s not correct because we expect the value of destination register equals to source so we have to put the the contents of CR field in the lowest 4 bits. This patch adds a RLWINM after MFOCRF to achieve that.
The problem came up when adding builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp, as posted in D88278. We need to move the outputs (in CR register) to GRC. However outputs of these instructions may not in a fixed CR# register, so we can’t directly add a rotation instruction in the .td patterns, but need to wait until the CR register is determined. Then we confirmed this should be a bug in POST-RA PSEUDO PASS.

Reviewed By: nemanjai, shchenz

Differential Revision: https://reviews.llvm.org/D88274
2020-10-02 01:26:18 +00:00
Joseph Huber
dbad0ba351 [OpenMP] Add Missing Runtime Call for Globalization Remarks
Summary:
Add a missing runtime call to perform data globalization checks.

Reviewers: jdoerfert

Subscribers: guansong hiraditya llvm-commits sstefan1 yaxunl

Tags: #LLVM #OpenMP

Differential Revision: https://reviews.llvm.org/D88621
2020-10-01 21:19:53 -04:00
jasonliu
89f3a464f7 [XCOFF] Enable -fdata-sections on AIX
Summary:
Some design decision worth noting about:

I've noticed a recent mailing discussing about why string literal is
not affected by -fdata-sections for ELF target:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145121.html

But on AIX, our linker could not split the mergeable string like other target.
So I think it would make more sense for us to emit separate csect for
every mergeable string in -fdata-sections mode,
as there might not be other ways for linker to do garbage collection
on unused mergeable string.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88339
2020-10-02 00:16:24 +00:00
Philip Reames
24c50af17e [memcpyopt] Conservatively handle non-integral pointers
If we allow the non-integral pointers to become memset and memcpy, we loose the ability to reason about pointer propagation.  This patch is modeled on changes we've carried downstream for a long time, figured it was worth being equally conservative for other users.  There is room to refine the semantics and handling here if anyone is motivated.
2020-10-01 16:46:56 -07:00
Muhammad Asif Manzoor
94e48de239 [AArch64][SVE] Add lowering for llvm fabs
Add the functionality to lower fabs for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88679
2020-10-01 19:41:25 -04:00
Philip Reames
874fb0cf7a Fix a bug in memset formation with vectors of non-integral pointers
We were converting the non-integral store into a integer store which is not legal.
2020-10-01 16:11:11 -07:00
Stanislav Mekhanoshin
b53c88017f [AMDGPU] Allow SOP asm mnemonic to differ
Allows the creation of real SOP1 instructions with
assembler mnemonics that differ from their
pseudo-instruction mnemonics. The default behavior
keeps the mnemonics matching.

Corrects a subtarget label typo in a comment.

Authored By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D88708
2020-10-01 16:00:04 -07:00
Jessica Paquette
c5bda355e6 [GlobalISel][AArch64] Don't emit cset for G_FCMPs feeding into G_BRCONDs
Similar to the FP case in `AArch64TargetLowering::LowerBR_CC`.

Instead of emitting the csets + a tbnz, just emit a compare + bcc
(or two bccs, depending on the condition code)

This improves cases like this: https://godbolt.org/z/v8hebx

This is a 0.1% geomean code size improvement for CTMark at -O3.

Differential Revision: https://reviews.llvm.org/D88624
2020-10-01 15:34:16 -07:00
Jessica Paquette
7197bc439e [AArch64][GlobalISel] Use emitTestBit in selection for G_BRCOND
Partially refactoring, partially fixing a bug.

- We shouldn't use TB(N)ZX unless the bit number is >= 32
- We can fold more than xor using emitTestBit

Also remove a check which isn't relevant anymore + update tests.

Rename select-brcond-of-not.mir to select-brcond-of-binop.mir, since it now
tests more than just G_XOR.

Differential Revision: https://reviews.llvm.org/D88702
2020-10-01 15:33:34 -07:00
Amara Emerson
b4d0184534 [AArch64][GlobalISel] Alias rules for G_FCMP to G_ICMP.
No need to be different here for the vast majority of rules.
2020-10-01 15:20:09 -07:00
Amara Emerson
b838382ceb [AArch64][GlobalISel] Make <8 x s8> integer arithmetic ops legal. 2020-10-01 14:35:21 -07:00
Amara Emerson
2e35d9215f [AArch64][GlobalISel] Make <8 x s8> shifts legal and add selection support. 2020-10-01 14:21:18 -07:00
Amara Emerson
22d12ebe1b Revert "[AArch64][GlobalISel] Make <8 x s8> shifts legal."
Accidentally pushed this.
2020-10-01 14:15:57 -07:00
Amara Emerson
54e3d67d23 [AArch64][GlobalISel] Make <8 x s8> shifts legal. 2020-10-01 14:10:10 -07:00
Amara Emerson
d997cfda8a [AArch64][GlobalISel] Merge G_SHL, G_ASHR and G_LSHR legalizer rules together.
There's no need for any difference between these.
2020-10-01 14:02:45 -07:00
Arthur Eubanks
f70b28627d [gn build] Support building with ThinLTO
Differential Revision: https://reviews.llvm.org/D88584
2020-10-01 13:48:31 -07:00
Amara Emerson
c0c56e8bad [AArch64][GlobalISel] Use custom legalization for G_TRUNC for v8i8 vectors.
Truncating to v8i8 is a case where we want to split the source but also generate
intermediate truncates to reduce the size of the source vector before truncating
down to v8i8. This implements the same strategy as what SelectionDAG does, but
I'm not certain where if anywhere in generic code it should live.

Use it for legalization of v8s8 = G_ICMP v8s32.

Differential Revision: https://reviews.llvm.org/D88191
2020-10-01 13:22:00 -07:00
Amara Emerson
4d12624339 [AArch64][GlobalISel] Camp oversize v4s64 G_FPEXT operations. 2020-10-01 13:08:31 -07:00
Reid Kleckner
f0bf98943a [lit] Fix Python 2/3 compat in new winreg search code
This should fix the test failures on the clang win64 bot:
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/18830
It has been red since Sept 23-ish.

This was subtle to debug. Windows has 'find' and 'sort' utilities in
C:\Windows\system32, but they don't support all the same flags as the
coreutils programs. I configured the buildbot above with Python 2.7
64-bit (hey, it was set up in 2016). When I installed git for Windows, I
opted to add all the Unix utilities that come with git to the system
PATH. This is *almost* enough to make the LLVM tests pass, but not
quite, because if you use the system PATH, the Windows version of find
and sort come first, but the tests that use diff, cmp, etc, will all
pass. So only a handful of tests will fail, and with cryptic error
messages.

The code changed in this CL doesn't work with Python 2. Before
Python 3.2, the winreg.OpenKey function did not accept the `access=`
keyword argument, the caller was required to pass an unused `reserved`
positional argument of 0. The try/except/pass around the OpenKey
operation masked this usage error in Python 2.

Further, the result of the registry operation has to be converted from
unicode to add it to the environment, but that was incidental.
2020-10-01 12:22:28 -07:00
Nikita Popov
6d2cf6fc6a [InstCombine] Fix select operand simplification with undef (PR47696)
When replacing X == Y ? f(X) : Z with X == Y ? f(Y) : Z, make sure
that Y cannot be undef. If it may be undef, we might end up picking
a different value for undef in the comparison and the select
operand.
2020-10-01 21:15:48 +02:00
Petr Hosek
98bed1a704 [CMake] Use -isystem flag to access libc++ headers
This is a partial revert of D62155. Rather than copying libc++ headers
into the build directory to be later overwritten by the final headers,
use -isystem flag to access libc++ headers during CMake checks. This
should address the occasional flake we've seen, especially on Windows
builders where CMake fails to overwrite __config with the final version.

Differential Revision: https://reviews.llvm.org/D88454
2020-10-01 12:09:27 -07:00
Sanjay Patel
4a25740376 [APFloat] convert SNaN to QNaN in convert() and raise Invalid signal
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.

See D88664 for a clang fix needed to allow this change.

Differential Revision: https://reviews.llvm.org/D88238
2020-10-01 14:37:38 -04:00
Arthur Eubanks
920d09ef67 Revert "[CFGuard] Add address-taken IAT tables and delay-load support"
This reverts commit ef4e971e5e18ae796466623df8f26265ba6bdfb5.
2020-10-01 11:29:54 -07:00
Sanjay Patel
65ea3c84f4 [InstCombine] auto-generate complete test checks; NFC 2020-10-01 13:44:31 -04:00
zoecarver
b38d4083e5 [DSE] Look through memory PHI arguments when removing noop stores in MSSA.
Summary:
Adds support for "following" memory through MSSA PHI arguments. This will help catch more noop stores that exist between blocks.

Originally part of D79391.

Reviewers: fhahn, jfb, asbirlea

Differential Revision: https://reviews.llvm.org/D82588
2020-10-01 10:42:02 -07:00
Jamie Schmeiser
37e405d0af Reland No.3: Add new hidden option -print-changed which only reports changes to IR
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
function.

The code introduces an abstract template base class that generalizes the
comparison of IRs that takes an IR representation as template parameter.
Derived classes provide overrides that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.

The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.

Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen), MaskRay (Fangrui Song)

Differential Revision: https://reviews.llvm.org/D86360
2020-10-01 17:39:13 +00:00
Mircea Trofin
14a8d83207 [NFC] Let (MC)Register APIs check isStackSlot
The user is expected to make the isStackSlot check before calling isPhysicalRegister
or isVirtualRegister. The APIs assert otherwise. We can improve the usability
of these APIs by carrying out the check in the 2 APIs: they become a
complete "source of truth" and remove an extra responsibility from the
user.

Differential Revision: https://reviews.llvm.org/D88598
2020-10-01 09:55:20 -07:00
Shoaib Meenai
8896c8f66c [runtimes] Remove TOOLCHAIN_TOOLS specialization
https://reviews.llvm.org/D88310 fixed the AIX issue in LLVMExternalProjectUtils,
so we shouldn't need the workaround in the runtimes build anymore. I'm
reverting it because it prevents the target-specific tool selection in
LLVMExternalProjectUtils from taking effect, which we rely on for our
runtimes builds.

Reviewed By: daltenty

Differential Revision: https://reviews.llvm.org/D88627
2020-10-01 09:53:10 -07:00
Vy Nguyen
a8dbe39f1b Reland rG4fcd1a8e6528:[llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking.
This is mostly for the benefit of the LBR latency mode.
Right now, it performs no checking. If this is run on non-supported hardware, it will produce all zeroes for latency.

      Differential Revision: https://reviews.llvm.org/D85254

New change: Updated lit.local.cfg to use pass the right argument to llvm-exegesis to actually request the LBR mode.

Differential Revision: https://reviews.llvm.org/D88670
2020-10-01 12:21:16 -04:00
Martin Storsjö
0bc7a3ead4 [AArch64] Don't merge sp decrement into later stores when using WinCFI
This matches the corresponding existing case in
AArch64LoadStoreOpt::findMatchingUpdateInsnForward.

Both cases could also be modified to check
MBBI->getFlag(FrameSetup/FrameDestroy) instead of forbidding any
optimization involving SP, but the effect is probably pretty much
the same.

Differential Revision: https://reviews.llvm.org/D88541
2020-10-01 19:03:27 +03:00
Martin Storsjö
bbdb6c4cf7 [AArch64] Remove a duplicate call to setHasWinCFI. NFCI.
The function already has a cleanup scope that calls the same whenever
the function is exited. When reading the code, seeing that this return
codepath has an explicit call while other return paths lack it is
confusing.

In the hypothetical case of a function having a prologue that
set the HasWinCFI flag in the MF, but the epilogue containing no
WinCFI instructions, the HasWinCFI flag in the MF would end up reset back
to false.

Differential Revision: https://reviews.llvm.org/D88636
2020-10-01 19:03:27 +03:00
Simon Pilgrim
744dd3076c [InstCombine] collectBitParts - convert to use PatterMatch matchers and avoid IntegerType casts.
Make sure we're using getScalarSizeInBits instead of cast<IntegerType> to get Type bit widths.

This is preliminary cleanup before we can start adding vector support to the bswap/bitreverse (element level) matching.
2020-10-01 16:44:14 +01:00
Meera Nakrani
1ebfeef2a5 [ARM] Removed hasSideEffects from signed/unsigned saturates
Removed hasSideEffects from SSAT and USAT so that they are no longer
marked as unpredictable.

Differential Revision: https://reviews.llvm.org/D88545
2020-10-01 14:55:01 +00:00
Jay Foad
b964542465 [AMDGPU] Simplify getNumFlatOffsetBits. NFC.
Remove some checks that have already been done in the only caller.
2020-10-01 15:24:09 +01:00
LLVM GN Syncbot
fb69abb068 [gn build] Port f6b1323bc68 2020-10-01 14:18:52 +00:00
Simon Pilgrim
a4a9ce83e5 [InstCombine] Use m_FAbs matcher helper. NFCI. 2020-10-01 14:42:34 +01:00
Simon Pilgrim
038cfb6d37 [IR] PatternMatch - add m_FShl/m_FShr funnel shift intrinsic matchers. NFCI. 2020-10-01 14:42:34 +01:00
Jay Foad
0359573ae0 [AMDGPU] Tiny cleanup in isLegalFLATOffset. NFC. 2020-10-01 14:26:03 +01:00
David Sherwood
42f980da4f [SVE][CodeGen] Replace use of TypeSize operator< in GlobalMerge::doMerge
We don't support global variables with scalable vector types so I've
changed the code to compare the fixed sizes instead.

Differential Revision: https://reviews.llvm.org/D88564
2020-10-01 14:06:59 +01:00
James Henderson
6428f0596b [Archive] Don't throw away errors for malformed archive members
When adding an archive member with a problem, e.g. a new bitcode with an
old archiver, containing an unsupported attribute, or an ELF file with a
malformed symbol table, the archiver would throw away the error and
simply add the member to the archive without any symbol entries. This
meant that the resultant archive could be silently unusable when not
using --whole-archive, and result in unexpected undefined symbols.

This change fixes this issue by addressing two FIXMEs and only throwing
away not-an-object errors. However, this meant that some LLD tests which
didn't need symbol tables and were using invalid members deliberately to
test the linker's malformed input handling no longer worked, so this
patch also stops the archiver from looking for symbols in an object if
it doesn't require a symbol table, and updates the tests accordingly.

Differential Revision: https://reviews.llvm.org/D88288

Reviewed by: grimar, rupprecht, MaskRay
2020-10-01 14:03:34 +01:00
LLVM GN Syncbot
77de5cb36a [gn build] Port d53b4bee0cc 2020-10-01 12:55:59 +00:00
Sjoerd Meijer
c208ece583 [LoopFlatten] Add a loop-flattening pass
This is a simple pass that flattens nested loops.  The intention is to optimise
loop nests like this, which together access an array linearly:

  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      f(A[i*M+j]);

into one loop:

  for (int i = 0; i < (N*M); ++i)
    f(A[i]);

It can also flatten loops where the induction variables are not used in the
loop. This can help with codesize and runtime, especially on simple cpus
without advanced branch prediction.

This is only worth flattening if the induction variables are only used in an
expression like i*M+j. If they had any other uses, we would have to insert a
div/mod to reconstruct the original values, so this wouldn't be profitable.

This partially fixes PR40581 as this pass triggers on one of the two cases. I
will follow up on this to learn LoopFlatten a few more (small) tricks. Please
note that LoopFlatten is not yet enabled by default.

Patch by Oliver Stannard, with minor tweaks from Dave Green and myself.

Differential Revision: https://reviews.llvm.org/D42365
2020-10-01 13:54:45 +01:00
Sam Parker
55afff27c3 [NFC][ARM] LowOverheadLoop DEBUG statements 2020-10-01 13:38:16 +01:00
Simon Pilgrim
7c21647448 [InstCombine] collectBitParts - use APInt directly to check for out of range bit shifts. NFCI. 2020-10-01 12:50:36 +01:00
Andrew Paverd
ee355f6f7f [CFGuard] Add address-taken IAT tables and delay-load support
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D87544
2020-10-01 12:45:07 +01:00