1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00
Commit Graph

213861 Commits

Author SHA1 Message Date
Yuanfang Chen
2ff261f721 abtest.py: support bisection based on a response file
Also makes LINK_TEST customizable from commandline with `--test` option.
2021-04-08 09:46:01 -07:00
Andrew Savonichev
9c918327f6 [MCA] Add tests for IPC on Cortex-A55
The tests compare IPC statistics that MCA provides with IPC values
measured on Cortex-A55 hardware. For hardware tests, each snippet is
run in a loop unrolled by 1000, and IPC is measured by linux-perf.

Several tests do not match the hardware: the skewed ALU is not
supported, LDR seem to be missing a forwarding path.

Differential Revision: https://reviews.llvm.org/D98174
2021-04-08 19:37:07 +03:00
Stephen Tozer
24e848bf91 [DebugInfo] Correctly track SDNode dependencies for list debug values
During SelectionDAG, we must track the SDNodes that each SDDbgValue depends on
to compute its value. These are ultimately derived from the location operands to
the SDDbgValue, but were stored in a separate vector prior to this patch. This
resulted in cases where one of the lists was updated incorrectly, resulting in
crashes during compilation. This patch fixes the issue by directly recomputing
the dependency list from the SDDbgOperands in getDependencies().

Differential Revision: https://reviews.llvm.org/D99423
2021-04-08 17:01:45 +01:00
Jay Foad
1bd3ba1ff8 [AMDGPU] Add some implicit uses to tests. NFC.
This is just to stop a future patch from optimizing away the things that
we actually want to check for.
2021-04-08 16:37:48 +01:00
Jay Foad
7ed2ce6b42 [AMDGPU] SIFoldOperands: remove an unneeded isReg check. NFC. 2021-04-08 16:37:43 +01:00
Jay Foad
f43cbd4ee9 [AMDGPU] SIFoldOperands: make use of emplace_back. NFC. 2021-04-08 14:34:10 +01:00
Jay Foad
50867895fb [AMDGPU] SIFoldOperands: remove an unneeded make_early_inc_range. NFC. 2021-04-08 14:32:36 +01:00
Jay Foad
337240fd51 [AMDGPU] SIFoldOperands: try harder to fold cndmask instructions
Look through copies to find more cases where the two values being
selected are identical. The motivation for this is just to be able to
remove the weird special case where tryFoldCndMask was called from
foldInstOperand, part way through folding a move-immediate into its
users, without regressing any lit tests.
2021-04-08 14:26:12 +01:00
Florian Hahn
3be13814be [LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC).
Instead of passing the start value and the defined value to
widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be
used to get both (and more in future patches).
2021-04-08 14:25:10 +01:00
Joseph Tremoulet
c47e6e9e06 Support: mapped_file_region: Pass MAP_NORESERVE to mmap
This allows mapping larger files, delaying OOM failures until too many
pages of them are accessed.  This is makes the behavior of the
mapped_file_region in this regard consistent between its "Unix" and
"Windows" implementations.

Guard the code witih #if defined(MAP_NORESERVE), consistent with other
uses of MAP_NORESERVE in llvm-project, because some FreeBSD versions do
not provide this flag.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D96626
2021-04-08 09:07:25 -04:00
Jay Foad
0533f9d3ae [AMDGPU] SIFoldOperands: make tryFoldCndMask a member function. NFC. 2021-04-08 14:05:29 +01:00
Sebastian Neubauer
e0e28835d1 [AMDGPU] Fix computing live registers in prolog
ScratchExecCopy needs to be marked as live, we cannot use that register
while EXEC is stored in there.

Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as
available is unnecessary, they should not be live at that point anway.

Differential Revision: https://reviews.llvm.org/D100098
2021-04-08 14:52:50 +02:00
Sanjay Patel
4786c07012 [InstCombine] add icmp with no-wrap add tests; NFC
Goes with D100095
2021-04-08 08:40:04 -04:00
Paul C. Anagnostopoulos
d8201f454e [TableGen] Make behavior of list slice suffix consistent across all values
Differential Revision: https://reviews.llvm.org/D99883
2021-04-08 08:38:44 -04:00
Paul C. Anagnostopoulos
282eb5170a [TableGen] Add support for the 'assert' statement in multiclasses 2021-04-08 08:36:03 -04:00
David Sherwood
5def792b7d [CodeGen][AArch64] Fix isel crash for truncating FP stores
When attempting to truncate a FP vector and store the result out
to memory we crashed because we had no pattern for truncating FP
stores. In fact, we don't support these types of stores and the
correct fix is to stop marking these truncating stores as legal.

Tests have been added here:

  CodeGen/AArch64/sve-fptrunc-store.ll

Differential Revision: https://reviews.llvm.org/D100025
2021-04-08 13:21:29 +01:00
Jay Foad
01d1da52ff [AMDGPU] SIFoldOperands: refactor tryFoldCndMask with early-outs. NFC. 2021-04-08 13:16:07 +01:00
Stephen Tozer
0261173904 [DebugInfo] Prevent invalid debug info being produced during LoopStrengthReduce
During LoopStrengthReduce, some of the SSA values that are used by debug values
may be lost and/or salvaged. After LSR we attempt to recover any undef debug
values, including any that were salvaged but then lost their values afterwards,
by replacing the lost values with any live equal values (plus a possible
constant offset) that have been gathered prior to running LSR. When we do this
we restore the debug value's original DIExpression, to undo any salvaging (as we
have gone back to using the original debug value).

This process can currently produce invalid debug info if the number of operands
has changed by salvaging during LSR. Replacing old values during the
applyEqualValues step does not change the number of location operands, which
means that when we restore the old DIExpression we may have a mismatch between
the number of operands used by the debug value and the number of operands
referenced by the DIExpression. This patch fixes this by restoring the full
original location metadata at the start of the applyEqualValues step, so that
there is no mismatch in operand count between the debug value and its
DIExpression.

Differential Revision: https://reviews.llvm.org/D98644
2021-04-08 13:04:48 +01:00
Roman Lebedev
b825875812 [NFC][X86][CostModel] Add some load/store tests w/ non-power-of-two elt cnt vectors
For example the cost to load <48 x i16> should likely be 3,
because that's just 3x load i256.
2021-04-08 15:00:28 +03:00
Mikael Holmen
dda9735107 [NVPTX] Fix compiler warning in NDEBUG build [NFC]
Without the fix we get

../lib/Target/NVPTX/NVPTXLowerArgs.cpp:236:24: error: lambda capture 'Arg' is not used [-Werror,-Wunused-lambda-capture]
  auto IsALoadChain = [Arg](Value *Start) {
                       ^~~
1 error generated.
2021-04-08 13:21:21 +02:00
Serguei Katkov
efbc7a658c [GreedyRA ORE] Add function level spill/reloads stats
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100014
2021-04-08 16:55:52 +07:00
David Green
154e9ac3d0 [LV] Logical and/or select costs
D99674 stopped the folding of certain select operations into and/or, due
to incorrect folding in the presence of poison. D97360 added some costs
to attempt to account for the change, but only worked at the getUserCost
level, not the getCmpSelInstrCost that the vectorizer will use directly.
This adds similar logic into the vectorizer to handle these logical
and/or selects, treating them like and/or directly.

This fixes 60% performance regressions from code like the attached test
case.

Differential Revision: https://reviews.llvm.org/D99884
2021-04-08 10:39:47 +01:00
David Green
1c65cc8e30 [LV] Add a logical and/or select cost test. NFC 2021-04-08 10:27:06 +01:00
Fraser Cormack
cefb1ed047 [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
Thomas Preud'homme
a168a0cb31 [AMDGPU, test] Fix use of undef FileCheck var
Test CodeGen/AMDGPU/amdgpu.private-memory.ll and
CodeGen/AMDGPU/private-memory-r600.ll have a block of CHECK directives
whose prefix is inconsistent: R600-CHECK Vs R600. This leads to a
R600-NOT directive using an undefined CHAN variable due to R600-CHECK
directives never being considered by FileCheck. Fixing the prefix leads
to the testcase failing. As per https://reviews.llvm.org/D99865#2675235
this commit removes the directives instead since it is not possible to
write a reliable check.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D99865
2021-04-08 09:42:59 +01:00
LemonBoy
29ba9d08ef [AsmParser] Recognize more escaped characters between single quotes
The GNU AS manual states the following about single-character constants enclosed within single quotes:

>  Some backslash escapes apply to characters, \b, \f, \n, \r, \t, and \" with the same meaning as for strings, plus \' for a single quote.

Add two more characters to the switch handling this case to match GAS behaviour, plus a test to make sure nothing regresses.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99609
2021-04-08 09:59:37 +02:00
Serguei Katkov
c36162f4d9 [GreedyRA ORE] Extract computeNumberOfSplillsReloads to use in different places. NFC.
Extract one basic block handling to introduce stat computation for method scope.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100013
2021-04-08 14:40:45 +07:00
Serguei Katkov
075fe72e46 [GreedyRA ORE] Extract stats in RAGreedyStats struct. NFC.
Combine all collected stats into separate struct RAGreedyStats
with add and report methods.

The motivation is to extend the number of statistics to capture and instead of
adding new parameters, just combine all of them into one structure.
Additionally I plan to use report from different places in future to report data
for function as well.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100012
2021-04-08 14:27:37 +07:00
Serguei Katkov
e21829da6d [GreedyRA ORE] Compute ORE stats if extra analysis is enabled
To save compile time, avoid computation of stats if ORE will not emit it.
The motivation is to add more stats and compute it only if it will dumped.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100010
2021-04-08 14:24:18 +07:00
Esme-Yi
353dabc855 [Debug-Info] Use inlined strings in .dwinfo section by default for DBX.
Summary: Set the default DwarfInlinedStrings as inlined strings for DBX, due to DBX does not support .dwstr section for now.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D99933
2021-04-08 07:20:22 +00:00
Hsiangkai Wang
f26eeade90 [RISCV] Add scalable offset under very large stack size.
If the stack size is larger than 12 bits, we have to use a scratch
register to store the stack size. Before we introduce the scalable stack
offset, we could simplify

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%0 = ADD %fp, %scratch

However, if the offset contains scalable part, we need to consider it.

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%scratch = ADD %fp, %scratch
%scalable_offset = ... # sequence of instructions for vscaled-offset.
%0 = ADD/SUB %scratch, %scalable_offset

Differential Revision: https://reviews.llvm.org/D100035
2021-04-08 14:46:05 +08:00
Hsiangkai Wang
a750e1eab2 [NFC][RISCV] Add test for scalable offset under large stack size.
This test case shows that we access wrong stack slots when the frame
object has scalable offset under large stack size.

Differential Revision: https://reviews.llvm.org/D100084
2021-04-08 14:46:05 +08:00
Juneyoung Lee
dda330f547 [Constant] Remove unused variable 2021-04-08 15:44:42 +09:00
Juneyoung Lee
01aeaffdfd [Constant] ConstantStruct/Array should not lower poison to undef
This is a (late) follow-up patch of 8871a4b4cab8a56fd6ff12fd024002c3c79128b3 and
c95f39891a282ebf36199c73b705d4a2c78a46ce to make ConstantStruct::get/ConstantArray::getImpl
correctly return PoisonValue if all elements are poison.
This was found while discussing about the elements of a vector-typed UndefValue (D99853)
2021-04-08 15:23:12 +09:00
Hongtao Yu
12f5193bf4 [CSSPGO] Move pseudo probes to the beginning of a block to unblock SelectionDAG combine.
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.

Test Plan:

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D100002
2021-04-07 22:45:35 -07:00
Philip Reames
b81ddb9786 [docs] Document our norms around reverts
This has come up a few times recently, and I was surprised to notice that we don't have anything in the docs.

This patch deliberately sticks to stuff that is uncontroversial in the community. Everything herein is thought to be widely agreed to by a large majority of the community.  A few things were noted and removed in review which failed this standard, if you spot anything else, please point it out.

Differential Revision: https://reviews.llvm.org/D99305
2021-04-07 21:02:19 -07:00
Serge Pavlov
9a92b12ed5 [RISCV] DAG nodes and pseudo instructions for CSR access
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.

Differential Revision: https://reviews.llvm.org/D98936
2021-04-08 10:36:36 +07:00
hsmahesha
8e42e3ba43 [AMDGPU] Only use ds_read/write_b128 for alignment >= 16
PS: Submitting on behalf of Jay.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100008
2021-04-08 08:12:05 +05:30
hsmahesha
4d33c3ce58 [AMDGPU] Add some exhaustive ds read/write alignment tests
PS: Submitting on behalf of Jay.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100007
2021-04-08 08:08:49 +05:30
Chen Zheng
80021b3956 [PowerPC] fixup killed flags for ri + addi to ri transformation
Fixup killed flags if DefMI and MI are not in the same basic blocks.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D100023
2021-04-07 22:04:08 -04:00
Congzhe Cao
7172aedb02 Revert "[LoopInterchange] Fix transformation bugs in loop interchange"
This reverts commit 6ec68bd815d00c1eec2a6b9766452554f0e6cb61.
2021-04-07 21:17:30 -04:00
Tony Tye
2ebd0d6fb1 [NFC][AMDGPU] Correct indentation in AMDGPUUsage.rst
Correct indentation that results in rST syntax error.
2021-04-08 01:00:13 +00:00
CongzheUalberta
bf513df9ea [LoopInterchange] Fix transformation bugs in loop interchange
After loop interchange, the (old) outer loop header should not jump to
`LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of (new) outer loop.

This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D98475
2021-04-07 20:55:44 -04:00
Stanislav Mekhanoshin
bb3840a56e Disable use of SCC bit from asm
Differential Revision: https://reviews.llvm.org/D100069
2021-04-07 15:32:17 -07:00
Tony Tye
36036d99fe [AMDGPU] Update gfx90a memory model support
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100070
2021-04-07 22:17:58 +00:00
Stanislav Mekhanoshin
416857a2b5 [AMDGPU] Split GCNRegBankReassign
Allow pass to work separately with SGPR, VGPR registers or both.
This is NFC now but will be needed to split RA for separate
SGPR and VGPR passes.

Differential Revision: https://reviews.llvm.org/D100063
2021-04-07 14:45:13 -07:00
Florian Hahn
4b85eda637 [BasicAA] Add another GEP modulo test with shl with odd op. 2021-04-07 22:31:51 +01:00
Sanjay Patel
025938a3d9 [InstCombine] fold not ops around min/max intrinsics
This is another step towards parity with the existing
cmp+select folds (see D98152).
2021-04-07 17:31:36 -04:00
Sanjay Patel
10a7f7026e [InstCombine] add test for min/max intrinsic with not ops; NFC 2021-04-07 17:31:36 -04:00
Craig Topper
76aa82a7df [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00