1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00
Commit Graph

208470 Commits

Author SHA1 Message Date
LLVM GN Syncbot
594960eb50 [gn build] Port e69e551e0e5 2020-12-18 13:00:09 +00:00
Kerry McLaughlin
a9d5f92f0d [SVE][CodeGen] Vector + immediate addressing mode for masked gather/scatter
This patch extends LowerMGATHER/MSCATTER to make use of the vector + reg/immediate
addressing modes for scalable masked gathers & scatters.

selectGatherScatterAddrMode checks if the base pointer is null, in which case
we can swap the base pointer and the index, e.g.
     getelementptr nullptr, <vscale x N x T> (splat(%offset)) + %indices)
  -> getelementptr %offset, <vscale x N x T> %indices

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93132
2020-12-18 11:56:36 +00:00
Simon Pilgrim
28cf5e1f81 [X86][AVX] Replace extract_subvector(broadcast(), 0) folds with generic SimplifyDemandedVectorEltsForTargetNode handling.
Simplifies a few more cases, notably shuffle demanded elts cases.
2020-12-18 11:51:10 +00:00
Carl Ritson
b988d0d807 [AMDGPU][NFC] Remove unused Hi16Elt definition 2020-12-18 20:38:54 +09:00
Lucas Prates
bf2fbafd5f [AArch64] Add support for the SPE-EEF feature
This is an addition to the existing Statistical Profiling extension, which
introduces an extra system register that is enabled by the new 'spe-eef'
subtarget feature.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92391
2020-12-18 11:11:56 +00:00
Lucas Prates
60c2a88e72 [AArch64] Add support for the Branch Record Buffer extension
This introduces asm support for the Branch Record Buffer extension, through
the new 'brbe' subtarget feature. It consists of a new set of system registers
that enable the handling of branch records.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92389
2020-12-18 11:11:06 +00:00
Carl Ritson
d466c5273e [AMDGPU][NFC] Document high parameter of f16 interp intrinsics 2020-12-18 19:59:13 +09:00
Cullen Rhodes
b434082b2a [TTI] Add supportsScalableVectors target hook
This is split off from D91718 and adds a new target hook
supportsScalableVectors that can be queried to check if scalable vectors
are supported by the backend. For AArch64 this returns true if SVE is
enabled.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93060
2020-12-18 10:37:01 +00:00
Bjorn Pettersson
3774e2781f Add intrinsics for saturating float to int casts
This patch adds support for the fptoui.sat and fptosi.sat intrinsics,
which provide basically the same functionality as the existing fptoui
and fptosi instructions, but will saturate (or return 0 for NaN) on
values unrepresentable in the target type, instead of returning
poison. Related mailing list discussion can be found at:
https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ

The intrinsics have overloaded source and result type and support
vector operands:

    i32 @llvm.fptoui.sat.i32.f32(float %f)
    i100 @llvm.fptoui.sat.i100.f64(double %f)
    <4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
    // etc

On the SelectionDAG layer two new ISD opcodes are added,
FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands
and one result. The second operand is an integer constant specifying
the scalar saturation width. The idea here is that initially the
second operand and the scalar width of the result type are the same,
but they may change during type legalization. For example:

    i19 @llvm.fptsi.sat.i19.f32(float %f)
    // builds
    i19 fp_to_sint_sat f, 19
    // type legalizes (through integer result promotion)
    i32 fp_to_sint_sat f, 19

I went for this approach, because saturated conversion does not
compose well. There is no good way of "adjusting" a saturating
conversion to i32 into one to i19 short of saturating twice.
Specifying the saturation width separately allows directly saturating
to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If
the integer bounds can be exactly represented in the float type and
fminnum/fmaxnum are legal, we can expand to something like:

    f = fmaxnum f, FP(MIN)
    f = fminnum f, FP(MAX)
    i = fptoxi f
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something
like this instead:

    i = fptoxi f
    i = select f ult FP(MIN), MIN, i
    i = select f ogt FP(MAX), MAX, i
    i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

Initial tests are for AArch64, x86_64 and ARM. This exercises all of
the scalar and vector legalization. ARM is included to test float
softening.

Original patch by @nikic and @ebevhan (based on D54696).

Differential Revision: https://reviews.llvm.org/D54749
2020-12-18 11:09:41 +01:00
Yevgeny Rouban
80a4f7c734 [IndVars] A test for adding trunc instructions to unwind blocks
Differential Revision: https://reviews.llvm.org/D93521
Reviewed By: skatkov
2020-12-18 17:08:26 +07:00
Kazu Hirata
e0c0bfdedc [InlineCost] Implement cost-benefit-based inliner
This patch adds an alternative cost metric for the inliner to take
into account both the cost (i.e. size) and cycle count savings into
account.

Without this patch, we decide to inline a given call site if the size
of inlining the call site is below the threshold that is computed
according to the hotness of the call site.

This patch adds a new cost metric, turned off by default, to take over
the handling of hot call sites.  Specifically, with the new cost
metric, we decide to inline a given call site if the ratio of cycle
savings to size exceeds a threshold.  The cycle savings are computed
from call site costs, parameter propagation, folded conditional
branches, etc, all weighted by their respective profile counts.  The
size is primarily the callee size, but we subtract call site costs and
the size of basic blocks that are never executed.

The new cost metric implicitly takes advantage of the machine function
splitter recently introduced by Snehasish Kumar, which dramatically
reduces the cost of duplicating (e.g. inlining) cold basic blocks by
placing cold basic blocks of hot functions in the .text.split
section.

We evaluated the new cost metric on clang bootstrap and SPECInt 2017.

For clang bootstrap, we observe 0.69% runtime improvement.

For SPECInt we report the change in IntRate the C/C++ benchmarks.  All
benchmarks apart from perlbench and omnetpp improve, on average by
0.21% with the max for mcf at 1.96%.

Benchmark               % Change
500.perlbench_r         -0.45
502.gcc_r                0.13
505.mcf_r                1.96
520.omnetpp_r           -0.28
523.xalancbmk_r          0.49
525.x264_r               0.00
531.deepsjeng_r          0.00
541.leela_r              0.35
557.xz_r                 0.21

Differential Revision: https://reviews.llvm.org/D92780
2020-12-18 00:37:24 -08:00
Jan Svoboda
da37def107 [clang][cli] Convert Analyzer option string based options to new option parsing system
Depends on D84185

Reviewed By: dexonsmith

Original patch by Daniel Grumberg.

Differential Revision: https://reviews.llvm.org/D84186
2020-12-18 08:56:06 +01:00
QingShan Zhang
4c07a9f4d6 [PowerPC] Select the D-Form load if we know its offset meets the requirement
The LD/STD likewise instruction are selected only when the alignment in
the load/store >= 4 to deal with the case that the offset might not be
known(i.e. relocations). That means we have to select the X-Form load
for %0 = load i64, i64* %arrayidx, align 2 In fact, we can still select
the D-Form load if the offset is known. So, we only query the load/store
alignment when we don't know if the offset is a multiple of 4.

Reviewed By: jji, Nemanjai

Differential Revision: https://reviews.llvm.org/D93099
2020-12-18 07:27:26 +00:00
Mircea Trofin
a76e040629 [NFC][utils] Factor remaining APIs under FunctionTestBuilder
Finishing the refactoring started in D93413.

Differential Revision: https://reviews.llvm.org/D93506
2020-12-17 22:13:14 -08:00
Yevgeny Rouban
011d4cc599 [IndVars] Fix adding trunc instructions to unwind blocks
Truncate instruction must not be inserted before landing pads.
The insertion point is fixed.
2020-12-18 12:52:23 +07:00
Kazu Hirata
e527e2e6d7 [IVDescriptors] Remove getConsecutiveDirection (NFC)
The last use of the function was removed on Sep 18, 2016 in commit
5f8cc0c3469ba3a7aa440b43aaababa3a6274213.

The function was later moved to llvm/lib/Analysis/IVDescriptors.cpp on
Sep 12, 2018 in commit 7e98d69847aefb1028aaa7131b508f4b4e9896ae.
2020-12-17 20:19:15 -08:00
Kazu Hirata
4088b88d51 [Transforms] Use llvm::erase_if (NFC) 2020-12-17 19:53:10 -08:00
Hsiangkai Wang
d53504a97c [RISCV] Remove NoVReg to avoid compile warning messages. 2020-12-18 11:37:47 +08:00
Rong Xu
be7baa3c43 Fix clang-ppc64le-rhel buildbot build error
ix buildbot build error due to
commit 3733463d: [IR][PGO] Add hot func attribute and use hot/cold
attribute in func section
2020-12-17 19:14:43 -08:00
Rong Xu
92b9137fcb [IR][PGO] Add hot func attribute and use hot/cold attribute in func section
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.

This patch adds support of hot function attribute to LLVM IR.  This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).

This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
    is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
    isFunctionColdInCallGraph is true, this function will be marked as
    cold.

The changes are:
(1) user annotated function attribute will used in setting function
    section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.

The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.

Differential Revision: https://reviews.llvm.org/D92493
2020-12-17 18:41:12 -08:00
Monk Chiang
f958745fcd [RISCV] Define vsadd/vsaddu/vssub/vssubu intrinsics.
We work with @rogfer01 from BSC to come out this patch.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: Monk Chiang <monk.chiang@sifive.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93366
2020-12-18 10:24:24 +08:00
Layton Kifer
787313e887 [DAGCombiner] Improve shift by select of constant
Clean up a TODO, to support folding a shift of a constant by a
select of constants, on targets with different shift operand sizes.

Reviewed By: RKSimon, lebedev.ri

Differential Revision: https://reviews.llvm.org/D90349
2020-12-18 02:21:42 +00:00
Andrew Litteken
603148b130 [IRSim][IROutliner] Adding InstVisitor to disallow certain operations.
This adds a custom InstVisitor to return false on instructions that
should not be allowed to be outlined.  These match the illegal
instructions in the IRInstructionMapper with exception of the addition
of the llvm.assume intrinsic.

Tests all the tests marked: illegal-*-.ll with a test for each kind of
instruction that has been marked as illegal.

Reviewers: jroelofs, paquette

Differential Revisions: https://reviews.llvm.org/D86976
2020-12-17 19:33:57 -06:00
Zakk Chen
a97267c772 [RISCV] Define vlse/vsse intrinsics.
Define vlse/vsse intrinsics and lower to V instructions.

We work with @rogfer01 from BSC to come out this patch.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93445
2020-12-17 17:00:01 -08:00
Nikita Popov
b8f0d648f9 [DSE] Add test for potential caching bug (NFC)
This one would miscompile if read-clobber checks switched to using
the EarlierAccess location, but the read cache was retained.
2020-12-17 23:35:01 +01:00
Sanjay Patel
55eacfb858 [VectorCombine] add tests for gep load with cast; NFC 2020-12-17 16:40:55 -05:00
Roman Lebedev
a354d00a71 [SimplifyCFG] Teach simplifyUnreachable() to preserve DomTree
Pretty boring, removeUnwindEdge() already known how to update DomTree,
so if we are to call it, we must first flush our own pending updates;
otherwise, we just stop predecessors from branching to us,
and for certain predecessors, stop their predecessors from
branching to them also.
2020-12-18 00:37:22 +03:00
Roman Lebedev
db36cde4ed [SimplifyCFG] ConstantFoldTerminator() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:22 +03:00
Roman Lebedev
2f6c6c6ca4 [SimplifyCFG] DeleteDeadBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-18 00:37:21 +03:00
Bangtian Liu
33b4e1043e Revert "Ensure SplitEdge to return the new block between the two given blocks"
This reverts commit d20e0c3444ad9ada550d9d6d1d56fd72948ae444.
2020-12-17 21:00:37 +00:00
Nico Weber
3aa1f027dc [gn build] Link with -Wl,--gdb-index when linking with LLD
For full-debug-info (is_debug=true / symbol_level=2 builds), this makes
linking 15% slower, but gdb startup 1500% faster (for lld: link time
3.9s->4.4s, gdb load time >30s->2s).

For link time, I ran

    bench.py -o {noindex,index}.txt \
        sh -c 'rm out/gn/bin/lld && ninja -C out/gn lld'

and then `ministat noindex.txt index.txt`:

```
x noindex.txt
+ index.txt
    N           Min           Max        Median           Avg        Stddev
x   5      3.784461     4.0200169     3.8452811     3.8754988   0.089902595
+   5       4.32496     4.6058481     4.3361208     4.4141198    0.12288267
Difference at 95.0% confidence
	0.538621 +/- 0.15702
	13.8981% +/- 4.05161%
	(Student's t, pooled s = 0.107663)
```

For gdb load time I loaded the crash in PR48392 with

    gdb -ex r --args ../out/gn/bin/ld64.lld.darwinnew @response.txt

and just stopped the time until the crash got displayed with a stopwatch
a few times. So the speedup there is less precise, but it's so
pronounced that that's ok (loads ~instantly with the patch, takes a very
long time without it).

Only doing this for LLD because I haven't tried it with other linkers.

Differential Revision: https://reviews.llvm.org/D92844
2020-12-17 15:39:00 -05:00
Johannes Doerfert
7bcfe2157e [OpenMP][NFC] Provide a new remark and documentation
If a GPU function is externally reachable we give up trying to find the
(unique) kernel it is called from. This can hinder optimizations. Emit a
remark and explain mitigation strategies.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D93439
2020-12-17 14:38:26 -06:00
Nico Weber
5a3cb1f76f [gn build] (manually) merge f4c8b8031800 2020-12-17 15:09:51 -05:00
Nikita Popov
2810fecbf2 [DSE] Add more tests for read clobber location (NFC) 2020-12-17 21:03:00 +01:00
Arthur Eubanks
517fe7c42b [test] Factor out creation of copy of SCC Nodes into function
Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93434
2020-12-17 11:39:34 -08:00
Tony
062d45d53c [NFC][AMDGPU] Reorganize description of scratch handling
Differential Revision: https://reviews.llvm.org/D93440
2020-12-17 19:33:14 +00:00
Valentin Clement
328670a855 [openmp] Remove clause from OMPKinds.def and use OMP.td info
Remove the OpenMP clause information from the OMPKinds.def file and use the
information from the new OMP.td file. There is now a single source of truth for the
directives and clauses.

To avoid generate lots of specific small code from tablegen, the macros previously
used in OMPKinds.def are generated almost as identical. This can be polished and
possibly removed in a further patch.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D92955
2020-12-17 14:08:12 -05:00
Baptiste Saleil
5cbcf5677b [PowerPC] Rename the vector pair intrinsics and builtins to replace the _mma_ prefix by _vsx_
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.

Differential Revision: https://reviews.llvm.org/D91974
2020-12-17 13:19:27 -05:00
LLVM GN Syncbot
13f5c4b392 [gn build] Port dae34463e3e 2020-12-17 17:28:45 +00:00
Andrew Litteken
cdd15bf157 [IRSim][IROutliner] Adding the extraction basics for the IROutliner.
Extracting the similar regions is the first step in the IROutliner.

Using the IRSimilarityIdentifier, we collect the SimilarityGroups and
sort them by how many instructions will be removed.  Each
IRSimilarityCandidate is used to define an OutlinableRegion.  Each
region is ordered by their occurrence in the Module and the regions that
are not compatible with previously outlined regions are discarded.

Each region is then extracted with the CodeExtractor into its own
function.

We test that correctly extract in:
test/Transforms/IROutliner/extraction.ll
test/Transforms/IROutliner/address-taken.ll
test/Transforms/IROutliner/outlining-same-globals.ll
test/Transforms/IROutliner/outlining-same-constants.ll
test/Transforms/IROutliner/outlining-different-structure.ll

Recommit of bf899e891387d07dfd12de195ce2a16f62afd5e0 fixing memory
leaks.

Reviewers: paquette, jroelofs, yroux

Differential Revision: https://reviews.llvm.org/D86975
2020-12-17 11:27:26 -06:00
Arthur Eubanks
3dc6698f98 [gn build] Add symbol_level to adjust debug info level
is_debug by default makes symbol_level = 2 and !is_debug means by
default symbol_level = 0.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D92958
2020-12-17 09:20:53 -08:00
Fangrui Song
803a9c9668 [LangRef] Update new ssp/sspstrong/sspreq semantics after D91816
Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D93422
2020-12-17 09:16:37 -08:00
Valentin Clement
d6763a4688 [flang][openacc] Enforce restriction on routine directive and clauses
This patch add some checks for the restriction on the routine directive
and fix several issue at the same time.

Validity tests have been added in a separate file than acc-clause-validity.f90 since this one
became quite large. I plan to split the larger file once on-going review are done.

Reviewed By: sameeranjoshi

Differential Revision: https://reviews.llvm.org/D92672
2020-12-17 11:33:34 -05:00
Nabeel Omer
4ce8799006 [DebugInfo] Avoid re-ordering assignments in LCSSA
The LCSSA pass makes use of a function insertDebugValuesForPHIs() to
propogate dbg.value() intrinsics to newly inserted PHI instructions. Faulty
behaviour occurs when the parent PHI of a newly inserted PHI is not the
most recent assignment to a source variable. insertDebugValuesForPHIs ends
up propagating a value that isn't the most recent assignemnt.

This change removes the call to insertDebugValuesForPHIs() from LCSSA,
preventing incorrect dbg.value intrinsics from being propagated.
Propagating variable locations between blocks will occur later, during
LiveDebugValues.

Differential Revision: https://reviews.llvm.org/D92576
2020-12-17 16:17:32 +00:00
Jinsong Ji
2af4f68200 [PowerPC][NFC] Cleanup PPCCTRLoopsVerify pass
The PPCCTRLoop pass has been moved to HardwareLoops,
so the comments and some useless code are deprecated now.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D93336
2020-12-17 11:16:33 -05:00
Jon Chesterfield
0309056e6e [amdgpu] Default to code object v3
[amdgpu] Default to code object v3
v4 is not yet readily available, and doesn't appear
to be implemented in the back end

Reviewed By: t-tye, yaxunl

Differential Revision: https://reviews.llvm.org/D93258
2020-12-17 16:09:33 +00:00
Bangtian Liu
a2ec1d8ec2 Ensure SplitEdge to return the new block between the two given blocks
This PR implements the function splitBasicBlockBefore to address an
issue
that occurred during SplitEdge(BB, Succ, ...), inside splitBlockBefore.
The issue occurs in SplitEdge when the Succ has a single predecessor
and the edge between the BB and Succ is not critical. This produces
the result ‘BB->Succ->New’. The new function splitBasicBlockBefore
was added to splitBlockBefore to handle the issue and now produces
the correct result ‘BB->New->Succ’.

Below is an example of splitting the block bb1 at its first instruction.

/// Original IR
bb0:
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlock
bb0:
	br bb1
bb1:
	br bb1.split
bb1.split:
        %0 = mul i32 1, 2
	br bb2
bb2:
/// IR after splitEdge(bb0, bb1) using splitBasicBlockBefore
bb0:
	br bb1.split
bb1.split
	br bb1
bb1:
        %0 = mul i32 1, 2
	br bb2
bb2:

Differential Revision: https://reviews.llvm.org/D92200
2020-12-17 16:00:15 +00:00
Amy Huang
ec802f1356 [llvm-symbolizer][Windows] Add start line when searching in line table sections.
Fixes issue where if a line section doesn't start with a line number
then the addresses at the beginning of the section don't have line numbers.

For example, for a line section like this
```
  0001:00000010-00000014, line/column/addr entries = 1
     7 00000013 !
```
a line number wouldn't be found for addresses from 10 to 12.

This matches behavior when using the DIA SDK.

Differential Revision: https://reviews.llvm.org/D93306
2020-12-17 07:57:36 -08:00
Simon Pilgrim
1c482de06b [SampleFDO] Fix uninitialized field warnings. NFCI.
Seems to have been caused by D93254 which added the SecHdrTableEntry::LayoutIndex field.
2020-12-17 15:51:26 +00:00
Valentin Clement
03ce68ffad [flang][openacc] Update serial construct clauses for OpenACC 3.1
Update the allowed clauses for the SERIAL construct for the new OpenACC 3.1
specification.

Reviewed By: sameeranjoshi

Differential Revision: https://reviews.llvm.org/D92123
2020-12-17 10:50:47 -05:00