mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 19:23:23 +01:00

203641 Commits

Author SHA1 Message Date
Craig Topper
dcfdc54cf8 [DAGCombiner] Teach visitMLOAD to replace an all ones mask with an unmasked load
If we have an all ones mask, we can just use a regular unmasked load. InstCombine already handles this in IR, but the all ones mask can appear after type legalization.

Only avx512 test cases are affected because the X86 backend already looks for masks whose first and last elements are 1. It replaces these with an unmasked load and blend. The all ones mask is a special case of that where the blend will be removed. That transform is only enabled on avx2 targets. I believe that's because a non-zero passthru on avx2 already requires a separate blend, so it's more profitable to handle mixed constant masks.

This patch adds dedicated all ones handling to the target independent DAG combiner. I've skipped extending, expanding, and indexed loads for now. X86 doesn't use indexed loads, so I don't know much about them. Extending loads made me nervous because I wasn't sure I could trust that the memory VT had the right element count, due to some weirdness in vector splitting. For expanding loads I wasn't sure if we needed different undef handling.
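As a rough illustration of why the fold is safe, the Python model below treats memory as a list and vectors as lists of lanes. The names `masked_load`, `unmasked_load`, and `combine_masked_load` are hypothetical and this is not DAGCombiner code; it only shows that with an all ones mask the passthru operand is never selected, so the result equals a plain load.

```python
from typing import List, Sequence


def masked_load(memory: Sequence[int], base: int, mask: Sequence[bool],
                passthru: Sequence[int]) -> List[int]:
    """Lane i reads memory[base + i] if mask[i] is set, else passthru[i].

    Masked-off lanes never touch memory, as with llvm.masked.load.
    """
    return [memory[base + i] if m else p
            for i, (m, p) in enumerate(zip(mask, passthru))]


def unmasked_load(memory: Sequence[int], base: int, width: int) -> List[int]:
    """Ordinary vector load: every lane reads memory."""
    return list(memory[base:base + width])


def combine_masked_load(memory: Sequence[int], base: int,
                        mask: Sequence[bool],
                        passthru: Sequence[int]) -> List[int]:
    # With an all ones mask the passthru is never selected, so the
    # masked load can be replaced by a plain load of the same width.
    if all(mask):
        return unmasked_load(memory, base, len(mask))
    return masked_load(memory, base, mask, passthru)
```

Any mask with a zero lane falls back to the masked form, since the passthru value is then observable.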

Differential Revision: https://reviews.llvm.org/D87788
2020-09-16 13:21:16 -07:00
Craig Topper
87cd53cd27 [X86] Add test case for a masked load mask becoming all ones after type legalization.
We should be able to turn this into an unmasked load. X86 has an
optimization to detect that the first and last elements aren't masked
and then turn the whole thing into an unmasked load and a blend.
That transform is disabled on avx512, though.

But if we know the blend isn't needed, then the unmasked load by
itself should always be profitable.
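A similar toy model can sketch the X86 rewrite described above: when the first and last lanes of a constant mask are enabled, load the whole vector and blend in the passthru lanes. The function name `load_and_blend` is hypothetical, and the model assumes (as the transform appears to) that reading the whole contiguous range is safe once both end lanes are read. With an all ones mask the blend selects every loaded lane and can be dropped.

```python
from typing import List, Sequence


def load_and_blend(memory: Sequence[int], base: int, mask: Sequence[bool],
                   passthru: Sequence[int]) -> List[int]:
    """Model of rewriting a constant-mask masked load as load + blend.

    Only valid when mask[0] and mask[-1] are set; we assume the whole
    contiguous range is then safe to read.
    """
    if not (mask[0] and mask[-1]):
        raise ValueError("first and last lanes must be enabled")
    loaded = list(memory[base:base + len(mask)])  # the unmasked load
    if all(mask):
        return loaded  # blend would pick every loaded lane: drop it
    # Blend: loaded lane where the mask is set, passthru lane otherwise.
    return [l if m else p for l, m, p in zip(loaded, mask, passthru)]
```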
2020-09-16 13:10:04 -07:00
Philip Reames
1d6082b5e3 [aarch64][tests] Add tests which show current lack of implicit null support
I will be posting a patch which adds appropriate target support shortly; landing the tests so that the diffs are clear.
2020-09-16 12:55:29 -07:00
David Greene
eb1409d08e [UpdateTestChecks] Allow $ in function names
Some compilers generate functions with '$' in their names, so recognize those
functions.

This also requires recognizing function names inside quotes in some contexts in
order to escape certain characters.
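The following is a hypothetical Python sketch of the kind of matching involved. UpdateTestChecks is a set of Python scripts, but this is not its actual regex. It follows the LLVM LangRef identifier syntax, `[%@][-a-zA-Z$._][-a-zA-Z$._0-9]*`, which already permits `$`, and it also accepts quoted names such as `@"foo bar"`. `FUNC_NAME_RE` and `function_name` are illustrative names.

```python
import re
from typing import Optional

# Hypothetical pattern: unquoted IR identifiers (which may contain '$')
# or a double-quoted name, immediately followed by the argument list.
FUNC_NAME_RE = re.compile(
    r'define\b[^@]*@(?P<name>[-a-zA-Z$._][-a-zA-Z$._0-9]*|"[^"]*")\(')


def function_name(line: str) -> Optional[str]:
    """Return the function name from a `define` line, or None."""
    m = FUNC_NAME_RE.search(line)
    if m is None:
        return None
    name = m.group('name')
    # Strip the quotes from quoted names so callers see the raw name.
    return name[1:-1] if name.startswith('"') else name
```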

Differential Revision: https://reviews.llvm.org/D82995
2020-09-16 14:34:18 -05:00
LLVM GN Syncbot
05cd8277f1 [gn build] Port 56069b5c71c 2020-09-16 19:03:25 +00:00
Nikita Popov
4a43cab9e0 Reapply [InstCombine] Simplify select operand based on equality condition
Reapply after fixing SimplifyWithOpReplaced() to never return
the original value, which would lead to an infinite loop in this
transform.

-----

For selects of the form X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but there we only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, with just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.
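A toy Python model can make the fold concrete. Expressions are nested tuples, and `replace_and_fold`, `simplify_with_op_replaced`, and `fold_select_equality` are hypothetical stand-ins, not the LLVM functions. Like InstSimplify, the sketch only hands back an existing value or a constant, and it returns `None` rather than the original expression when nothing simplifies, which is the property the reapplied patch relies on to avoid an infinite combine loop. It only tries replacing X with Y, not the reverse.

```python
from typing import Optional, Tuple

# Hypothetical toy IR: ('var', name) | ('const', int)
#                    | ('add', e, e) | ('mul', e, e)
Expr = Tuple


def replace_and_fold(e: Expr, old: Expr, new: Expr) -> Expr:
    """Substitute `old` with `new` in `e`, folding constant add/mul."""
    if e == old:
        return new
    if e[0] in ('add', 'mul'):
        lhs = replace_and_fold(e[1], old, new)
        rhs = replace_and_fold(e[2], old, new)
        if lhs[0] == 'const' and rhs[0] == 'const':
            val = lhs[1] + rhs[1] if e[0] == 'add' else lhs[1] * rhs[1]
            return ('const', val)
        return (e[0], lhs, rhs)
    return e


def simplify_with_op_replaced(e: Expr, old: Expr, new: Expr) -> Optional[Expr]:
    """Return a simpler existing value or constant, or None.

    Never returns `e` itself: a caller that treats any non-None result
    as progress would otherwise keep rewriting forever.
    """
    result = replace_and_fold(e, old, new)
    if result == e or result[0] not in ('var', 'const'):
        return None
    return result


def fold_select_equality(sel: Expr) -> Optional[Expr]:
    """select (X == Y), A, B -> select (X == Y), A', B, with X := Y in A."""
    _, (_, x, y), a, b = sel
    new_a = simplify_with_op_replaced(a, x, y)
    if new_a is None:
        return None  # report "no change" so the combiner moves on
    return ('select', ('eq', x, y), new_a, b)
```

For example, `select (x == 0), x * 5, b` becomes `select (x == 0), 0, b`, while an arm that does not mention `x` yields `None` instead of the unchanged select.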

Differential Revision: https://reviews.llvm.org/D87480
2020-09-16 20:53:58 +02:00
Nikita Popov
3a8ed708c6 [InstSimplify] Clarify SimplifyWithOpReplaced() return value
If SimplifyWithOpReplaced() cannot simplify the value, null should
be returned. Make sure this really does happen in all cases,
including those where SimplifyBinOp() returns the original value.

This does not matter for existing users, but does matter for
D87480, which would go into an infinite loop otherwise.
2020-09-16 20:53:26 +02:00
Nikita Popov
f88393ae10 [InstCombine] Add test for infinite combine loop (NFC)
Test courtesy of bkramer for the infinite combine loop introduced
by D87480.
2020-09-16 20:53:25 +02:00
Michael Liao
18f35c5544 Fix build. 2020-09-16 14:52:00 -04:00
Nico Weber
aa432b5dba [gn build] unconfuse sync script about "sources = []" in clang/lib/Headers/BUILD.gn 2020-09-16 14:50:29 -04:00
Rahman Lavaee
c08157a6ed Revert "[obj2yaml] - Match ".stack_size" with the original section name, and not the uniquified name."
This reverts commit 14e55f82980cf1342d4d3eea4885a5375e829496.
2020-09-16 11:42:37 -07:00
Stanislav Mekhanoshin
26c2b984ef [AMDGPU] gfx1030 RT support
Differential Revision: https://reviews.llvm.org/D87782
2020-09-16 11:40:58 -07:00
Johannes Doerfert
facd70cf60 [OpenMP] Context selector extensions for template functions
With this extension the effects of `omp begin declare variant` will be
applied to template function declarations. The behavior is opt-in and
controlled by the `extension(allow_templates)` trait. While generally
useful, this will enable us to implement complex math function calls by
overloading the templates of the standard library with the ones in
libc++.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D85735
2020-09-16 13:37:10 -05:00
Johannes Doerfert
bb8acd5a57 [OpenMP] Context selector extensions for return value overloading
This extension allows declaring variants in between `omp begin/end
declare variant` that do not match the type of the existing function
with that name. Without this extension we would not find a base function
(with a compatible type) and would therefore create a new one, which would
cause conflicting declarations. With this extension we will not create
"missing" base functions, which basically renders these specializations
harmless. They will be generated but never called.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D85878
2020-09-16 13:37:09 -05:00
Johannes Doerfert
c187f80294 [UpdateTestChecks][NFC] Fix spelling 2020-09-16 13:37:08 -05:00
Rahman Lavaee
848731407e [obj2yaml] - Match ".stack_size" with the original section name, and not the uniquified name.
Without this patch, obj2yaml decodes the content of only one ".stack_size" section. Other sections are dumped with their full contents.

Reviewed By: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D87727
2020-09-16 11:33:20 -07:00
Matt Arsenault
07bfd23edb GlobalISel: Lift store value widening restriction
This doesn't change the memory size and doesn't need to worry about
non-power-of-2 sizes.
2020-09-16 14:25:07 -04:00
Nico Weber
1db80af6a3 [gn build] make "all" target build
If you want to build everything, building the default target
via just `ninja` is better, but `ninja all` shouldn't give you
compile errors -- this fixes that.
2020-09-16 14:21:48 -04:00
Amara Emerson
684834aea1 [AArch64][GlobalISel] Make G_BUILD_VECTOR of <16 x s8> legal. 2020-09-16 11:19:47 -07:00
Michael Kitzan
fec094fca1 [GISel] Add new combines for unary FP instrs with constant operand
https://reviews.llvm.org/D86393

This patch adds five new `GICombinerRules`, one for each of the following unary
FP instrs: `G_FNEG`, `G_FABS`, `G_FPTRUNC`, `G_FSQRT`, and `G_FLOG2`. The
combine rules perform the FP operation on the constant operand and replace
the original instr with the result. The patch also adds new combiner
tests for the AArch64 target to test these new combiner rules.
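The folding itself amounts to evaluating the operation at compile time. The Python sketch below uses the host `float` (a 64-bit double) and the `math` module. The opcode table and `fold_unary_fp` are illustrative only, `G_FPTRUNC` is modeled solely as double-to-float32 rounding via `struct`, and the sketch declines to fold when Python raises (for example the square root of a negative number), which may differ from what the real combiner does.

```python
import math
import struct
from typing import Callable, Dict, Optional


def fptrunc_to_f32(x: float) -> float:
    """Round a double to the nearest float32 value (models s64 -> s32)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


# Illustrative fold table: opcode name -> host implementation.
FOLDERS: Dict[str, Callable[[float], float]] = {
    'G_FNEG': lambda v: -v,
    'G_FABS': abs,
    'G_FPTRUNC': fptrunc_to_f32,
    'G_FSQRT': math.sqrt,
    'G_FLOG2': math.log2,
}


def fold_unary_fp(opcode: str, operand: float) -> Optional[float]:
    """Return the folded constant, or None if the sketch cannot fold it."""
    fold = FOLDERS.get(opcode)
    if fold is None:
        return None  # not one of the five unary opcodes
    try:
        return fold(operand)
    except (ValueError, OverflowError):
        return None  # domain errors, or a value out of float32 range
```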
2020-09-16 10:34:15 -07:00
Simon Pilgrim
b6c38b504d DwarfUnit.h - remove unnecessary includes. NFCI. 2020-09-16 18:32:29 +01:00
Simon Pilgrim
c0364fac98 raw_ostream.cpp - remove duplicate includes. NFCI.
Remove headers already included in raw_ostream.h
2020-09-16 18:32:28 +01:00
Simon Pilgrim
0f41e280da InterferenceCache.cpp - remove duplicate includes. NFCI.
Remove headers already included in InterferenceCache.h
2020-09-16 18:32:28 +01:00
Simon Pilgrim
2041d8607d ValueEnumerator.cpp - remove duplicate includes. NFCI.
Remove headers already included in ValueEnumerator.h
2020-09-16 18:32:28 +01:00
Sanjay Patel
173d6a2882 [SLP] add tests for reduction ordering; NFC 2020-09-16 13:28:19 -04:00
Fangrui Song
2e45ee1e4c [llvm-nm] Use aggregate initialization instead of memset zero 2020-09-16 10:27:12 -07:00
Jamie Schmeiser
8d6d1d8a73 Re-land: Add new hidden option -print-changed which only reports changes to IR
A new hidden option -print-changed is added along with code to support
printing the IR as it passes through the opt pipeline in the new pass
manager. Only those passes that change the IR are reported, with others
only having the banner reported, indicating that they did not change the
IR, were filtered out or ignored. Filtering of output via the
-filter-print-funcs is supported and a new supporting hidden option
-filter-passes is added. The latter takes a comma separated list of pass
names and filters the output to only show those passes in the list that
change the IR. The output can also be modified via the -print-module-scope
option.

The code introduces a template base class that generalizes the comparison
of IRs and takes the IR representation as a template parameter. The
constructor takes a series of lambdas that provide an event based API
for generalized reporting of IRs as they are changed in the opt pipeline
through the new pass manager.

The first of several instantiations is provided that prints the IR
in a form similar to that produced by -print-after-all with the above
mentioned filtering capabilities. This version, and the others to
follow will be introduced at the upcoming developer's conference.

Reviewed By: aeubanks (Arthur Eubanks), yrouban (Yevgeny Rouban), ychen (Yuanfang Chen)

Differential Revision: https://reviews.llvm.org/D86360
2020-09-16 17:25:18 +00:00
Matt Arsenault
bd0c7f4ec9 RegAllocFast: Make self loop live-out heuristic more aggressive
This currently has no impact on code, but prevents sizeable code size
regressions after D52010. This prevents spilling and reloading all
values inside blocks that loop back. Add a baseline test which would
regress without this patch.
2020-09-16 13:12:38 -04:00
Reid Kleckner
eccb0fb0b3 Include (Type|Symbol)Record.h less
Most clients only need CVType and CVSymbol, not structs for every type
and symbol. Move CVSymbol and CVType to CVRecord.h to accomplish this.
Update some of the common headers that need CVSymbol and CVType to use
the new location.
2020-09-16 09:59:03 -07:00
Matt Arsenault
7a37a84909 AMDGPU: Clear offset register when using local stack area
eliminateFrameIndex won't fix up the offset register when the direct
frame index reference is moved to a separate move instruction. Switch
the offset to a base 0 (which it probably should be to begin with).
2020-09-16 12:56:40 -04:00
Matt Arsenault
c25699b258 AMDGPU: Add baseline test for incorrect SP access 2020-09-16 12:56:40 -04:00
Matt Arsenault
cc3396a954 LocalStackSlotAllocation: Swap order of check 2020-09-16 12:56:40 -04:00
Arthur Eubanks
7a72eeae37 [Coro][NewPM] Handle llvm.coro.prepare.retcon in NPM coro-split pass
Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D87731
2020-09-16 09:09:10 -07:00
Sjoerd Meijer
047df8d3e0 [ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled
Additional sanity checks were added to get.active.lane.mask's second argument,
the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow)
checks, skip this if tail-predication is forced.

Differential Revision: https://reviews.llvm.org/D87769
2020-09-16 17:05:14 +01:00
Jay Foad
21197a6926 [AMDGPU] Remove obsolete comment
Obsoleted by e4464bf3d45848461630e3771d66546d389f1ed5 "AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR"
2020-09-16 17:03:55 +01:00
Francesco Petrogalli
f1d24f0e57 [llvm][CodeGen] Do not scalarize llvm.masked.[gather|scatter] operating on scalable vectors.
This patch prevents the `llvm.masked.gather` and `llvm.masked.scatter` intrinsics from being scalarized when invoked on scalable vectors.

The change in `Function.cpp` is needed to prevent the warning that is raised when `getNumElements` is used in place of `getElementCount` on `VectorType` instances. The tests guard against regressions in this change.

The tests make sure that calls to `llvm.masked.[gather|scatter]` are still scalarized when:

  # the intrinsics are operating on fixed size vectors, and
  # the compiler is not targeting fixed length SVE code generation.
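The decision can be sketched as a small Python predicate. `ElementCount` here loosely mirrors LLVM's notion of a minimum lane count that is multiplied by a runtime `vscale` for scalable vectors; `should_scalarize_gather_scatter`, `scalarized_gather`, and the `target_legal` flag (standing in for the target's legality query, e.g. fixed-length SVE support) are hypothetical. Scalarization unrolls one conditional scalar access per lane, which needs the lane count at compile time, so a scalable vector cannot be scalarized this way.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ElementCount:
    """Lanes: exactly min_elements if fixed, vscale * min_elements if scalable."""
    min_elements: int
    scalable: bool


def should_scalarize_gather_scatter(ec: ElementCount, target_legal: bool) -> bool:
    # Scalable vectors: the lane count is only known at run time, so an
    # unrolled per-lane sequence cannot be emitted. Never scalarize.
    if ec.scalable:
        return False
    # Fixed-size vectors: scalarize unless the target can lower the
    # intrinsic directly (target_legal stands in for that query).
    return not target_legal


def scalarized_gather(memory: Sequence[int], addrs: Sequence[int],
                      mask: Sequence[bool], passthru: Sequence[int]) -> List[int]:
    """One conditional scalar load per lane, as scalarization would emit."""
    return [memory[a] if m else p for a, m, p in zip(addrs, mask, passthru)]
```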

Reviewed By: efriedma, sdesmalen

Differential Revision: https://reviews.llvm.org/D86249
2020-09-16 16:00:28 +00:00
Arthur Eubanks
825221f2e5 [NPM] Translate alias analysis into require<> as well
'require<globals-aa>' is needed to make globals-aa work in NPM, since
globals-aa is a module analysis but function passes cannot run module
analyses on demand.
So don't skip translating alias analyses to 'require<>'.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87743
2020-09-16 08:54:09 -07:00
Dmitry Preobrazhensky
2ead9fa20c [AMDGPU] Corrected directive to use for ELF weak refs
WeakRefDirective should specify a directive to declare "a global as being a weak undefined symbol".
The directive used by AMDGPU was incorrect - ".weakref" was intended for other purposes.
The correct directive is ".weak" and it is already defined as default for ELF.
So the redefinition was removed.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87762
2020-09-16 18:51:26 +03:00
Simon Pilgrim
c2569d39e2 [X86] EmitInstrWithCustomInserter - remove redundant getDebugLoc() calls. NFCI.
Use the same DebugLoc that is called at the top of the method.

Fixes some Wshadow static analyzer warnings.
2020-09-16 16:29:56 +01:00
Mircea Trofin
2e97c41718 [NFC][Regalloc] accessors for 'reg' and 'weight'
Also renamed the fields to follow style guidelines.

Accessors help with readability - weight mutation, in particular,
is easier to follow this way.

Differential Revision: https://reviews.llvm.org/D87725
2020-09-16 08:28:57 -07:00
Matt Arsenault
f2aa3ef913 AMDGPU: Improve <2 x i24> arguments and return value handling
This was asserting for GlobalISel. For SelectionDAG, it was being
passed on the stack. Instead, scalarize this as if it were a
32-bit vector.
2020-09-16 11:21:56 -04:00
Sebastian Neubauer
6aba538d9d [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Simon Pilgrim
eae470c498 [X86] Assert that we've found a terminator instruction. NFCI.
Fixes clang static analyzer null dereference warning.
2020-09-16 16:17:49 +01:00
Jay Foad
d11aa00c67 [AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10, all MODE-setting instructions were S_SETREG_B32, which is
marked as having unmodeled side effects, so the machine
scheduler treats it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.

Differential Revision: https://reviews.llvm.org/D87446
2020-09-16 16:10:47 +01:00
Jay Foad
eea65ac487 [AMDGPU] Add -show-mc-encoding to setreg tests
This is a pre-commit for D87446 "[AMDGPU] Enable scheduling around FP MODE-setting instructions"
2020-09-16 16:09:47 +01:00
Simon Pilgrim
c3ae82fe83 [X86][SSE] Move VZEXT_MOVL(INSERT_SUBVECTOR(UNDEF,X,0)) handling into combineTargetShuffle.
Now that we're getting better at combining shuffles of different vector widths, this can now be performed as part of the standard target shuffle combines and isn't required for cleanup.

Exposed a minor issue in combineX86ShufflesRecursively where we failed to check if a shuffle's src ops were simple types.
2020-09-16 16:08:31 +01:00
Dangeti Tharun kumar
b31191fb60 [Partial Inliner] Compute intrinsic cost through TTI
https://bugs.llvm.org/show_bug.cgi?id=45932

assert(OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined function cost should be no less than the outlined region") getting triggered in computeBBInlineCost.

Intrinsics like "assume" were treated as regular function calls when computing costs.
This patch makes computeBBInlineCost query TTI for the intrinsic call cost.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D87132
2020-09-16 15:12:31 +01:00
Florian Hahn
bd672b8800 [DSE] Add more test cases with loop carried dependence. 2020-09-16 14:50:35 +01:00
Paul C. Anagnostopoulos
3fb53046bc Add section with details about DAGs. 2020-09-16 09:27:28 -04:00
Sanjay Patel
4abaadfc37 [SLP] fix formatting; NFC
Also move variable declarations closer to usage and add code comments.
2020-09-16 08:50:27 -04:00