1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00
Commit Graph

2656 Commits

Author SHA1 Message Date
Krzysztof Parzyszek
a31ff2abea [Hexagon] Emit enough stores when aligning vector addresses 2020-12-15 18:59:53 -06:00
Krzysztof Parzyszek
2c841246e5 [Hexagon] Fix bitcasting v1i8 -> i8 2020-12-15 16:01:24 -06:00
Reid Kleckner
0ab72ed3c5 [Hexagon] Tweak _MSC_VER workaround version
My bot runs VS 2019, but it could not compile this code.

Message:
[55/2465] Building CXX object lib\Target\Hexagon\CMakeFiles\LLVMHexagonCodeGen.dir\HexagonVectorCombine.cpp.obj
FAILED: lib/Target/Hexagon/CMakeFiles/LLVMHexagonCodeGen.dir/HexagonVectorCombine.cpp.obj
...
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.23.28105\include\map(71): error C2976: 'std::map': too few template arguments
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.23.28105\include\map(71): note: see declaration of 'std::map'

The version in the path, 14.23, corresponds to _MSC_VER 1923, so raise
the version floor to 1924.

I have not tested with versions between 1924 and 1928 (latest), but the
latest works with the variadic version.
2020-12-14 11:26:36 -08:00
Kazu Hirata
c204059cba [Target] Use llvm::is_contained (NFC) 2020-12-13 19:35:10 -08:00
Krzysztof Parzyszek
577e1b0232 [Hexagon] Reconsider getMask fix, return original mask, convert later
The getPayload/getMask/getPassThrough functions should return values
that could be composed into a masked load/store without any additional
type casts. The previous fix violated that.
Instead, convert scalar mask to a vector right before rescaling.
2020-12-12 13:27:22 -06:00
Krzysztof Parzyszek
988ff0aa45 [Hexagon] Create vector masks for scalar loads/stores
AlignVectors treats all loaded/stored values as vectors of bytes,
and masks as corresponding vectors of booleans, so make getMask
produce a 1-element vector for scalars from the start.
2020-12-12 11:12:17 -06:00
Krzysztof Parzyszek
bc08eac614 [Hexagon] Workaround for compilation error with VS2017 2020-12-11 15:11:44 -06:00
Krzysztof Parzyszek
eb2eae25ad [Hexagon] Fix gcc6 compilation issue 2020-12-10 08:17:07 -06:00
Benjamin Kramer
94cf8f1d41 [Hexagon] Fold single-use variables into assert. NFCI.
Silences unused variable warnings in Release builds.
2020-12-10 10:53:56 +01:00
Krzysztof Parzyszek
071e5ba62e [Hexagon] Silence warnings about unused objects 2020-12-09 17:54:10 -06:00
Krzysztof Parzyszek
1a325ba866 [Hexagon] Fix build: move template specialization into namespace scope 2020-12-09 17:40:15 -06:00
Krzysztof Parzyszek
deb082d99d [Hexagon] Realign HVX vectors wherever possible
Introduce HexagonVectorCombine as a helper class for vector-related
optimizations.
2020-12-09 17:11:25 -06:00
Mircea Trofin
611de20466 [NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister.
Typing the API appropriately.

Differential Revision: https://reviews.llvm.org/D92341
2020-12-02 15:46:38 -08:00
Krzysztof Parzyszek
a38cbfc628 [Hexagon] Improve check for HVX types
Allow non-simple types, like <17 x i32> to be treated as HVX vector
types.
2020-11-27 13:33:10 -06:00
Simon Pilgrim
5a411c8bf8 [Hexagon] Add HVX support for ISD::SMAX/SMIN/UMAX/UMIN instead of custom dag patterns
Followup to D92112 now that I've learnt about HVX type splitting.

This is some necessary cleanup work for min/max ops to eventually help us move the add/sub sat patterns into DAGCombine - D91876.

Differential Revision: https://reviews.llvm.org/D92169
2020-11-27 15:46:11 +00:00
Nikita Popov
0e6a699715 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Simon Pilgrim
b907f80fbe [Hexagon] Add support for ISD::SMAX/SMIN/UMAX/UMIN instead of custom dag patterns
This should handle the basic integer min/max handling - the HVX ops are still TODO.

This is some necessary cleanup work for min/max ops to eventually help us move the add/sub sat patterns into DAGCombine - D91876.

Differential Revision: https://reviews.llvm.org/D92112
2020-11-25 19:02:17 +00:00
Craig Topper
f7ac298f12 [SelectionDAG][ARM][AArch64][Hexagon][RISCV][X86] Add SDNPCommutative to fma and fmad nodes in tablegen. Remove explicit commuted patterns from targets.
X86 was already specially marking fma as commutable which allowed
tablegen to autogenerate commuted patterns. This moves it to the target
independent definition and fix up the targets to remove now
unneeded patterns.

Unfortunately, the tests change because the commuted version of
the patterns are generating operands in a different than the
explicit patterns.

Differential Revision: https://reviews.llvm.org/D91842
2020-11-23 10:09:20 -08:00
Arthur Eubanks
04d409f2b1 [Hexagon][NewPM] Port -hexagon-loop-idiom and add to pipeline
Fixes pmpy-mod.ll under NPM

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D91829
2020-11-20 09:34:37 -08:00
Gaurav Jain
025b28cffd [NFC] Use [MC]Register for Hexagon target
Differential Revision: https://reviews.llvm.org/D91160
2020-11-18 08:17:07 -08:00
Florian Hahn
db0b0dabe6 [AsmWriter] Factor out mnemonic generation to accessible getMnemonic.
This patch factors out the part of printInstruction that gets the
mnemonic string for a given MCInst. This is intended to be used
subsequently for the instruction-mix remarks to display the final
mnemonic (D90040).

Unfortunately making `getMnemonic` available to the AsmPrinter
seems to require making it virtual. Not sure if there's a way around
that with the current layering of the AsmPrinters.

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D90039
2020-11-17 09:47:38 +00:00
serge-sans-paille
82b6e6053d llvmbuildectomy - replace llvm-build by plain cmake
No longer rely on an external tool to build the llvm component layout.

Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.

These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.

Differential Revision: https://reviews.llvm.org/D90848
2020-11-13 10:35:24 +01:00
Sander de Smalen
e0eda3654e [SVE] Return StackOffset for TargetFrameLowering::getFrameIndexReference.
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both the fixed- and scalable sized offsets.

The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.

This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90018
2020-11-05 11:02:18 +00:00
Krzysztof Parzyszek
13b75ea80b [Hexagon] Move isTypeForHVX from Hexagon TTI to HexagonSubtarget, NFC
It's useful outside of Hexagon TTI, and with how TTI is implemented,
it is not accessible outside of TTI.
2020-11-02 14:00:45 -06:00
Florian Hahn
1db8566f5e Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408facc3a79ee4ff7e9983cc972f797e176.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Florian Hahn
f44af1a603 Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df58dca9d1596440b866b52929e3878de.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Florian Hahn
d8012eedb7 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
Krzysztof Parzyszek
00f9d2560c [Hexagon] Handle additional shuffles that can be made perfect 2020-10-29 19:09:00 -05:00
Krzysztof Parzyszek
28d3c031c0 [Hexagon] Handle selection between HVX vector predicates
Make sure that (select i1 q0 q1) is handled properly.
2020-10-23 18:22:03 -05:00
Nicholas Guy
9e9cf3e008 Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.

Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose

Differential Revision: https://reviews.llvm.org/D88494
2020-10-21 11:52:47 +01:00
Krzysztof Parzyszek
45720e3d8e [Hexagon] Fix license headers in some .td files, NFC 2020-10-16 10:03:05 -05:00
Krzysztof Parzyszek
b743f9e32b [Hexagon] Generate better splat code on v62+ 2020-10-14 12:55:20 -05:00
Krzysztof Parzyszek
796a593c0d [Hexagon] Replace HexagonISD::VSPLAT with ISD::SPLAT_VECTOR
This removes VSPLAT and VZERO. VZERO is now SPLAT_VECTOR of (i32 0).

Included is also a testcase for the previous (target-independent)
commit.
2020-10-10 19:49:47 -05:00
Krzysztof Parzyszek
3600e70de4 [Hexagon] Remove ISD node VSPLATW, use VSPLAT instead
This is a step towards improving HVX codegen for splat.
2020-10-09 15:38:02 -05:00
Krzysztof Parzyszek
0065c2c37b [Hexagon] Generalize handling of SDNodes created during ISel
The selection of HVX shuffles can produce more nodes in the DAG,
which need special handling, or otherwise they would be left
unselected by the main selection code. Make the handling of such
nodes more general.
2020-10-09 15:38:02 -05:00
Krzysztof Parzyszek
9ee6426bce [Hexagon] Return 1 instead of 0 from getMaxInterleaveFactor 2020-10-09 09:46:18 -05:00
Krzysztof Parzyszek
3482300416 [Hexagon] Move selection of HVX multiply from lowering to patterns
Also, change i32*i32 to V6_vmpyieoh + V6_vmpyiewuh_acc, which works
on V60 as well.
2020-10-02 16:04:34 -05:00
David Sherwood
9542ca0d95 [SVE][CodeGen] Add new EVT/MVT getFixedSizeInBits() functions
When we know that a particular type is always going to be fixed
width we have so far been writing code like this:

  getSizeInBits().getFixedSize()

Since we are doing this in quite a few places now it seems to make
sense to add a new helper function that allows us to replace
these calls with a single getFixedSizeInBits() call.

Differential Revision: https://reviews.llvm.org/D88649
2020-10-02 07:47:31 +01:00
Arthur Eubanks
6bf4e16602 [NPM] Add target specific hook to add passes for New Pass Manager
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D88138
2020-09-30 13:29:43 -07:00
Krzysztof Parzyszek
df96f62f63 [Hexagon] Avoid crash on CONCAT_VECTORS with illegal element types
Legal vector element types may not be legal as scalar types. When
CONCAT_VECTORS is converted to BUILD_VECTOR, the individual vector
elements become standalone operands to the build operation. If they
have illegal (scalar) types, they need to be made legal. In doing
so, the case of TRUNCATE was not handled, causing an assertion to
fail.
2020-09-24 20:05:23 -05:00
Stefanos Baziotis
22876f12a9 Small fixes for "[LoopInfo] empty() -> isInnermost(), add isOutermost()" 2020-09-22 23:59:34 +03:00
Stefanos Baziotis
b245c32b48 [LoopInfo] empty() -> isInnermost(), add isOutermost()
Differential Revision: https://reviews.llvm.org/D82895
2020-09-22 23:28:51 +03:00
Pengxuan Zheng
acbb48020d [Hexagon] Make HexagonVLCR compatibile with New PM
The patch modifies HexagonVectorLoopCarriedReuse pass to make it compatible with both Legacy Pass Manager through HexagonVectorLoopCarriedReuseLegacyPass and with New Pass Manager through HexagonVectorLoopCarriedReusePass.

Reviewed By: pzheng

Differential Revision: https://reviews.llvm.org/D86955
2020-09-21 13:45:12 -07:00
Krzysztof Parzyszek
718a375cea [Hexagon] Replace incorrect pattern for vpackl HWI32 -> HVi8
V6_vdealb4w is not correct for pairs, use V6_vpackeh/V6_vpackeb instead.
2020-09-15 20:34:50 -05:00
Krzysztof Parzyszek
62ea8de0fa [Hexagon] Widen loads and handle any-/sign-/zero-extensions 2020-09-14 18:10:23 -05:00
Krzysztof Parzyszek
25fb93fc6a [Hexagon] Some HVX DAG combines
1. VINSERTW0 x, undef -> x
2. VROR (VROR x, a), b) -> VROR x, a+b
2020-09-14 18:10:23 -05:00
Craig Topper
ad2bea0363 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Krzysztof Parzyszek
6f7d358ab3 [Hexagon] Avoid widening vectors with non-HVX element types 2020-09-12 20:26:54 -05:00
Krzysztof Parzyszek
9308743423 [Hexagon] Split pair-based masked memops 2020-09-10 14:24:42 -05:00
Simon Pilgrim
f620ab1b9f Hexagon.h - remove unnecessary includes. NFCI.
Replace with forward declarations and move includes to implicit dependent files.
2020-09-10 16:59:43 +01:00
Krzysztof Parzyszek
90ea855f41 [Hexagon] Account for truncating pairs to non-pairs when widening truncates
Added missing selection patterns for vpackl.
2020-09-09 14:31:52 -05:00
Krzysztof Parzyszek
7bb3107990 [Hexagon] Fix order of operands in V6_vdealb4w 2020-09-08 22:09:28 -05:00
Krzysztof Parzyszek
77adca5920 [Hexagon] Handle widening of truncation's operand with legal result
Failing example: v8i8 = truncate v8i32. v8i8 is legal, but v8i32 was
widened to HVX. Make sure that v8i8 does not get altered (even if it's
changed to another legal type).
2020-09-08 16:07:39 -05:00
Roman Lebedev
8406429eae Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec2183d466dad64b763bab4e15fd8804239
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec2183d466dad64b763bab4e15fd8804239.
2020-09-08 00:24:03 +03:00
Krzysztof Parzyszek
cb324edefe [Hexagon] Add assertions about V6_pred_scalar2 2020-09-05 18:20:23 -05:00
Krzysztof Parzyszek
34a99d5a26 [Hexagon] When widening truncate result, also widen operand if necessary 2020-09-05 18:19:32 -05:00
Krzysztof Parzyszek
c8b94c00f4 [Hexagon] Resize the mem operand when widening loads and stores 2020-09-05 18:17:48 -05:00
Krzysztof Parzyszek
00ccf90843 [Hexagon] Handle widening of vector truncate 2020-09-05 15:07:38 -05:00
Krzysztof Parzyszek
af4c0a6b56 [Hexagon] Unindent everything in HexagonISelLowering.h, NFC
Just a shift, no other formatting changes.
2020-09-04 17:25:29 -05:00
Krzysztof Parzyszek
1eb8234c85 [Hexagon] Fix perfect shuffle generation for single vectors
Perfect shuffle instruction (vdealvdd/vshuffvdd) work on vector
pairs. When given a single input vector, half of it first needs
to be transposed into the other vector before the generated
shuffles can take effect. Also the first transpose needs to be
undone at the end (this last step was missing).
2020-08-30 06:43:16 -05:00
Craig Topper
09050e4cf7 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Krzysztof Parzyszek
9ae5fb90d3 [Hexagon] Emit better 32-bit multiplication sequence for HVXv62+ 2020-08-27 15:24:32 -05:00
Benjamin Kramer
1af5e41f38 [Hexagon] Fold another layer of single-use variable into assert. NFCI. 2020-08-27 16:52:34 +02:00
Benjamin Kramer
f7ce567fa8 [Hexagon] Fold single-use variable into assert. NFCI. 2020-08-27 16:44:22 +02:00
Krzysztof Parzyszek
fd8ba7e648 [Hexagon] Widen short vector stores to HVX vectors using masked stores
Also invent a flag -hexagon-hvx-widen=N to set the minimum threshold
for widening short vectors to HVX vectors.
2020-08-27 09:25:08 -05:00
Krzysztof Parzyszek
0829adce54 [Hexagon] Implement llvm.masked.load and llvm.masked.store for HVX 2020-08-26 13:10:22 -05:00
Ankit Aggarwal
b4954780f6 [Hexagon] Check if EVT is simple type in HVX lowering 2020-08-25 15:02:44 -05:00
Krzysztof Parzyszek
293642c8c1 [Hexagon] Remove (redundant) HexagonISelLowering::isHvxOperation(SDValue)
Use isHvxOperation(SDNode*) instead.
2020-08-25 11:45:08 -05:00
Roman Lebedev
29a87631f2 Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.
2020-08-22 00:33:22 +03:00
Craig Topper
10839866a1 [X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.

This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165
2020-08-14 15:31:50 -07:00
Krzysztof Parzyszek
b46450a31b [Hexagon] Return scalar size in getMinVectorRegisterBitWidth() when no HVX
This fixes https://llvm.org/PR47128.
2020-08-12 10:13:58 -05:00
Kerry McLaughlin
76e22108d4 [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
Arthur Eubanks
ff7fade869 [Hexagon] Use InstSimplify instead of ConstantProp
This is the last remaining use of ConstantProp, migrate it to InstSimplify in the goal of removing ConstantProp.

Add -hexagon-instsimplify option to enable skipping of instsimplify in
tests that can't handle the extra optimization.

Differential Revision: https://reviews.llvm.org/D85047
2020-08-04 15:42:39 -07:00
Krzysztof Parzyszek
3bf7627b97 [RDF] Remove uses of RDFRegisters::normalize (deprecate)
This function has been reduced to an identity function for some time.
2020-08-04 17:02:12 -05:00
hgreving
1fccd57ea4 [MC] Fix memory leak when allocating MCInst with bump allocator
Adds the function createMCInst() to MCContext that creates a MCInst using
a typed bump alloctor.

MCInst contains a SmallVector<MCOperand, 8>. The SmallVector is POD only
for <= 8 operands. The default untyped bump pointer allocator of MCContext
does not delete the MCInst, so if the SmallVector grows, it's a leak.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46900.
2020-08-03 16:08:26 -07:00
Sidharth Baveja
14aeea5cd6 [Loop Peeling] Separate the Loop Peeling Utilities from the Loop Unrolling Utilities
Summary: This patch separates the Loop Peeling Utilities from Loop Unrolling.
The reason for this change is that Loop Peeling is no longer only being used by
loop unrolling; Patch D82927 introduces loop peeling with fusion, such that
loops can be modified to have to same trip count, making them legal to be
peeled.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D83056
2020-07-31 18:31:58 +00:00
Roman Lebedev
4a9109b967 [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108
2020-07-29 20:05:30 +03:00
David Green
49873f2449 [Analysis] TTI: Add CastContextHint for getCastInstrCost
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types.  Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization.  Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.

For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.

To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.

Original patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79162
2020-07-29 13:32:53 +01:00
Ikhlas Ajbar
bba33c567e [Hexagon] Correct the order of operands when lowering funnel shift-left
This patch corrects the order of operands in the pattern that lowers fshl
in Hexagon.
2020-07-28 21:22:41 -05:00
Simon Pilgrim
a3033adc1a MCFixup.h - remove unnecessary MCExpr.h include. NFCI.
Move the include down to files that actually depend on MCExpr definitions.

Also exposes an implicit dependency on MCContext in AVRAsmBackend.h
2020-07-20 15:17:19 +01:00
Roman Lebedev
e7e1fdf7ba Reland "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions"
This reverts commit 1067d3e176ea7b0b1942c163bf8c6c90107768c1,
which reverted commit b2018198c32a0535bb1f5bb5b40fbcf50d8d47b7,
because it introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils.

So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding
the cycle.
2020-07-16 13:40:01 +03:00
Adrian Kuegel
eab3a83f04 Revert "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions"
This reverts commit b2018198c32a0535bb1f5bb5b40fbcf50d8d47b7.
This commit introduced a Dependency Cycle between Transforms/Scalar and
Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils,
so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still
depends on it, we have a cycle.
2020-07-16 10:54:10 +02:00
Roman Lebedev
fc354c8afc [NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions
Taking so many parameters is simply unmaintainable.

We don't want to include the entire llvm/Transforms/Utils/Local.h into
llvm/Transforms/Scalar.h so i've split SimplifyCFGOptions into
it's own header.
2020-07-16 01:27:54 +03:00
serge-sans-paille
568fbc4c30 Fix HexagonGenExtract return status
Differential Revision: https://reviews.llvm.org/D83460
2020-07-13 20:41:59 +02:00
Sidharth Baveja
28813fa1e6 [NFC] Separate Peeling Properties into its own struct (re-land after minor fix)
Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-10 18:39:30 +00:00
Nikita Popov
b0782fa636 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit 0369dc98f958a1ca2ec05f1897f091129bb16e8a.

Many failing tests.
2020-07-08 21:43:32 +02:00
Sidharth Baveja
5c273e628c [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:59:59 +00:00
Anh Tuyen Tran
0cd52758f2 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit fead250b439bbd4ec0f21e6a52d0c174e5fcdf5a.
2020-07-08 18:58:05 +00:00
Anh Tuyen Tran
660c2f85e7 [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:56:03 +00:00
Guillaume Chatelet
5c1ab6ec74 [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D83082
2020-07-03 08:06:43 +00:00
Guillaume Chatelet
132b11f5e0 [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82977
2020-07-02 11:28:02 +00:00
James Y Knight
af0734bc33 Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Guillaume Chatelet
feb84bcc0e [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82956
2020-07-01 14:31:56 +00:00
Adam Balogh
360c750d4b [Hexagon][NFC] Remove redundant condition
Condition `secondReg` is checked both in an outer and in an inner `if`
statement in static function `canCompareBeNewValueJump()` in file
`HexagonNewValueJump.cpp`. This patch removes the redundant inner check.

The issue was found using `clang-tidy` check under review
`misc-redundant-condition`. See https://reviews.llvm.org/D81272.

Differential Revision: https://reviews.llvm.org/D82556
2020-07-01 09:04:26 +02:00
Guillaume Chatelet
ced6ab5db1 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82849
2020-06-30 13:12:31 +00:00
Guillaume Chatelet
eac048cb4b [Alignment][NFC] TargetLowering::allowsMemoryAccess
Second patch of a series to adapt TargetLowering::allowsXXX functions

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82785
2020-06-30 08:17:00 +00:00
Guillaume Chatelet
177d202096 [Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82749
2020-06-30 07:56:17 +00:00
Guillaume Chatelet
0a9a5ce8bf [Alignment][NFC] Migrate TTI::getGatherScatterOpCost to Align
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82577
2020-06-26 11:08:27 +00:00
Guillaume Chatelet
9470341223 [Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82573
2020-06-26 11:00:53 +00:00
Guillaume Chatelet
0a29dd919f [Alignment][NFC] Migrate TTI::getMaskedMemoryOpCost to Align
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82569
2020-06-26 10:14:16 +00:00
dfukalov
e968fde7cb [NFCI][CostModel] Add const to Value*.
Summary:
Get back `const` partially lost in one of recent changes.
Additionally specify explicit qualifiers in few places.

Reviewers: samparker

Reviewed By: samparker

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82383
2020-06-24 23:16:08 +03:00
Ikhlas Ajbar
123a690853 [Hexagon] Reducing minimum alignment requirement
This patch reduces minimum alignment requirement to 1 byte for arguments
passed by value on stack.
2020-06-24 10:28:37 -05:00
Sam Parker
6ffd0e23b3 [CostModel] Unify getArithmeticInstrCost
Add the remaining arithmetic opcodes into the generic implementation
of getUserCost and then call this from getInstructionThroughput. Most
of the backends have been modified to return the base implementation
for cost kinds other RecipThroughput. The outlier here is AMDGPU
which already uses getArithmeticInstrCost for all the cost kinds.
This change means that most of the opcodes can be removed from that
backends implementation of getUserCost.

Differential Revision: https://reviews.llvm.org/D80992
2020-06-10 09:08:45 +01:00
Guillaume Chatelet
9bd39147ee Revert "[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess"
This reverts commit f21c52667ed147903015a94643b0057319189d4e.
2020-06-09 10:43:59 +00:00
Guillaume Chatelet
574b28f02f [Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81379
2020-06-09 10:11:07 +00:00
Sam Parker
c7b113e4d1 [NFCI][CostModel] Unify getCmpSelInstrCost
Add cases for icmp, fcmp and select into the switch statement of the
generic getUserCost implementation with getInstructionThroughput then
calling into it. The BasicTTI and backend implementations have be set
to return a default value (1) when a cost other than throughput is
being queried.

Differential Revision: https://reviews.llvm.org/D80550
2020-06-09 07:41:22 +01:00
James Y Knight
f92fad214d MachineBasicBlock::updateTerminator now requires an explicit layout successor.
Previously, it tried to infer the correct destination block from the
successor list, but this is a rather tricky propspect, given the
existence of successors that occur mid-block, such as invoke, and
potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR, in
particular would be problematic, because its successor blocks are not
distinct from "normal" successors, as EHPads are.)

Instead, require the caller to pass in the expected fallthrough
successor explicitly. In most callers, the correct block is
immediately clear. But, in MachineBlockPlacement, we do need to record
the original ordering, before starting to reorder blocks.

Unfortunately, the goal of decoupling the behavior of end-of-block
jumps from the successor list has not been fully accomplished in this
patch, as there is currently no other way to determine whether a block
is intended to fall-through, or end as unreachable. Further work is
needed there.

Differential Revision: https://reviews.llvm.org/D79605
2020-06-06 22:30:51 -04:00
Sam Parker
0362138d32 [CostModel] Unify getMemoryOpCost
Use getMemoryOpCost from the generic implementation of getUserCost
and have getInstructionThroughput return the result of that for loads
and stores.

This also means that the X86 implementation of getUserCost can be
removed with the functionality folded into its getMemoryOpCost.

Differential Revision: https://reviews.llvm.org/D80984
2020-06-05 10:13:38 +01:00
hsmahesha
ed5decd2c8 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, AMDGPU target needs to consider number of clustered bytes
to decide on max number of mem ops that can be clustered. This patch adds support to pass
number of clustered bytes to target mem ops clustering logic.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
Sam Parker
8420f6860c [CostModel] Unify getCastInstrCost
Add the remaining cast instruction opcodes to the base implementation
of getUserCost and directly return the result. This allows
getInstructionThroughput to return getUserCost for the casts. This
has required changes to PPC and SystemZ because they implement
getUserCost and/or getCastInstrCost with adjustments for vector
operations. Adjusts have also been made in the remaining backends
that implement the method so that they still produce a cost of zero
or one for cost kinds other than throughput.

Differential Revision: https://reviews.llvm.org/D79848
2020-05-26 11:29:57 +01:00
Fangrui Song
985b9e2acb [MC] Change MCCFIInstruction::createDefCfa to cfiDefCfa which does not negate Offset
The negative Offset has caused a bunch of problems and confused quite a
few call sites. Delete the unneeded negation and fix all call sites.
2020-05-22 15:47:26 -07:00
Marek Kurdej
d0d6be08d8 [Target] Fix typos. NFC 2020-05-22 14:40:43 +02:00
Sam Parker
d49bd95ac8 [NFCI][CostModel] Refactor getIntrinsicInstrCost
Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boiler
plate code for arguments and flags, into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.

Differential Revision: https://reviews.llvm.org/D79941
2020-05-20 11:59:08 +01:00
Florian Hahn
69f89dfbd9 [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

This patch was originally committed as b8a3c34eee06, but broke the
modules build, as LoopAccessAnalysis was using the Expander.

The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-05-20 10:53:40 +01:00
Brian Cain
ededcfd109 [Hexagon] pX.new cannot be used with p3:0 as producer
Writes to p3:0 do not produce new values, we should bar any .new
consumer trying to use it as a producer.
2020-05-19 17:06:34 -05:00
Simon Pilgrim
dd58bbb170 TargetLoweringObjectFile.h - remove unnecessary includes. NFCI.
Replace with forward declarations and move includes down to source files where required.

I also needed to move the TargetLoweringObjectFile::SectionForGlobal wrapper implementation down into TargetLoweringObjectFile.cpp
2020-05-19 09:28:13 +01:00
Craig Topper
0fe1404812 Fix several places that were calling verifyFunction or verifyModule without checking the return value.
verifyFunction/verifyModule don't assert or error internally. They
also don't print anything if you don't pass a raw_ostream to them.
So the caller needs to check the result and ideally pass a stream
to get the messages. Otherwise they're just really expensive no-ops.

I've filed PR45965 for another instance in SLPVectorizer
that causes a lit test failure.

Differential Revision: https://reviews.llvm.org/D80106
2020-05-18 13:28:46 -07:00
Ties Stuij
745a9668d4 [IR][BFloat] Add BFloat IR type
Summary:
The BFloat IR type is introduced to provide support for, initially, the BFloat16
datatype introduced with the Armv8.6 architecture (optional from Armv8.2
onwards). It has an 8-bit exponent and a 7-bit mantissa and behaves like an IEEE
754 floating point IR type.

This is part of a patch series upstreaming Armv8.6 features. Subsequent patches
will upstream intrinsics support and C-lang support for BFloat.

Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, sdesmalen, deadalnix, ctetreau

Subscribers: hiraditya, llvm-commits, danielkiss, arphaman, kristof.beyls, dexonsmith

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78190
2020-05-15 14:43:43 +01:00
Xinglong Liao
dcf9829900 [Hexagon] Check isInstr() before getInstr() with SUnit
SUnit represent a MachineInstr in post-regalloc scheduling but SDNode
in pre-regalloc scheduling. when pass -enable-hexagon-sdnode-sched to
Hexagon backend with -O1 and above, this may cause an assertion failed.

Fixes PR45194.

Differential Revision: https://reviews.llvm.org/D76134
2020-05-14 08:47:54 -05:00
Christopher Tetreault
1f37696718 [SVE] Remove usages of VectorType::getNumElements() from Hexagon
Reviewers: efriedma, kmclaughlin, sdesmalen, kparzysz

Reviewed By: kparzysz

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79819
2020-05-13 17:13:12 -07:00
Craig Topper
6cffa34898 [SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode.
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.

Removing getAlignment() will be done as a follow up.

Differential Revision: https://reviews.llvm.org/D79436
2020-05-08 16:04:11 -07:00
Simon Pilgrim
7b6765b11e [TTI] getScalarizationOverhead - use explicit VectorType operand
getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions).

Followup to D78357

Reviewed By: @samparker

Differential Revision: https://reviews.llvm.org/D79341
2020-05-05 16:59:23 +01:00
Sam Parker
c8018d2237 [NFC][CostModel] Add TargetCostKind to relevant APIs
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.

RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html

Differential Revision: https://reviews.llvm.org/D79002
2020-05-05 10:35:54 +01:00
Sam McCall
9e1817bf4e std::isspace -> llvm::isSpace (where locale should be ignored)
I've left out some cases where I wasn't totally sure this was right or
whether the include was ok (compiler-rt) or idiomatic (flang).
2020-05-02 15:36:04 +02:00
Suyog Sarda
80b227666f Handle cases for subregisters.
While restoring latency, check if any of the registers of
source instruction is a subregister of the successor instructions
apart from being same register.
2020-04-30 20:32:33 -05:00
Simon Pilgrim
27264ddabe [TTI] Add DemandedElts to getScalarizationOverhead
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.

This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.

This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.

A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.

Reviewed By: @craig.topper

Differential Revision: https://reviews.llvm.org/D78216
2020-04-29 12:00:38 +01:00
Krzysztof Parzyszek
e4c53b0cb6 Handle part-word LL/SC in atomic expansion pass
Differential Revision: https://reviews.llvm.org/D77213
2020-04-28 10:07:39 -05:00
Sam Parker
d44080f77a [TTI] Add TargetCostKind argument to getUserCost
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.

The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.

getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.

This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.

Differential Revision: https://reviews.llvm.org/D78635
2020-04-28 08:57:45 +01:00
Simon Pilgrim
e9e77991e4 [Pass] Ensure we don't include PassSupport.h or PassAnalysisSupport.h directly
Both PassSupport.h and PassAnalysisSupport.h are only supposed to be included via Pass.h.

Differential Revision: https://reviews.llvm.org/D78815
2020-04-26 12:58:20 +01:00
Fangrui Song
d192bcd830 [TableGen] Drop deprecated leading # operation (NOP) and replace ## with # 2020-04-25 16:26:45 -07:00
Simon Pilgrim
177549095d HexagonShuffler.h - remove duplicate STLExtras.h include. NFC. 2020-04-24 13:27:56 +01:00
Krzysztof Parzyszek
68392a914f [Hexagon] Fix result word order when bitcasting vector pred to int64/128 2020-04-23 19:15:11 -05:00
Christopher Tetreault
9c9d3ddb66 [SVE] Remove calls to isScalable from Hexagon
Reviewers: efriedma, sdesmalen, kparzysz, colinl

Reviewed By: kparzysz

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77757
2020-04-23 14:02:14 -07:00
Kazuaki Ishizaki
7ce19394dc [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Simon Pilgrim
50ff948f8f [Hexagon] Remove unused forward declarations. NFC. 2020-04-22 18:26:50 +01:00
Benjamin Kramer
7a0b21ccae [Hexagon] Silence warning
llvm/lib/Target/Hexagon/HexagonTargetObjectFile.cpp:296:11: warning: enumeration value 'ScalableVectorTyID' not handled in switch [-Wswitch]
  switch (Ty->getTypeID()) {
          ^
2020-04-22 18:57:08 +02:00
Christopher Tetreault
7f0438624e [SVE] Add new VectorType subclasses
Summary:
Introduce new types for fixed width and scalable vectors.

Does not remove getNumElements yet so as to not break code during transition
period.

Reviewers: deadalnix, efriedma, sdesmalen, craig.topper, huntergr

Reviewed By: sdesmalen

Subscribers: jholewinski, arsenm, jvesely, nhaehnle, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, kerbowa, Joonsoo, grosul1, frgossen, lldb-commits, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm, #lldb

Differential Revision: https://reviews.llvm.org/D77587
2020-04-22 08:59:01 -07:00
Shengchen Kan
88597bc560 [MC][Bugfix] Remove redundant parameter for relaxInstruction
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
  1) The first argument's type is `const MCInst &`, the third
  argument's type is `MCInst &`, but they may be aliased to the same
  variable
  2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
  argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
  may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
  loop.

In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.

Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain

Reviewed By: Razer6, MaskRay, bcain

Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78364
2020-04-21 11:06:55 +08:00
Fraser Cormack
2e7316dbb2 Provide operand indices to adjustSchedDependency
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.

Differential Revision: https://reviews.llvm.org/D77135
2020-04-17 11:08:44 +01:00
Simon Pilgrim
4ac40d287e MCObjectWriter.h - remove Endian.h/EndianStream.h/raw_ostream.h includes. NFC
Push these includes down to the the writers that actually need them, a number of which were implicitly relying on the MCObjectWriter.h.
2020-04-17 10:44:08 +01:00
Christopher Tetreault
60270641b5 [SVE] Remove calls to getBitWidth from Hexagon
Reviewers: efriedma, sdesmalen, kparzysz

Reviewed By: kparzysz

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77899
2020-04-14 11:09:49 -07:00
Craig Topper
4e5ba59647 [CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase
Differential Revision: https://reviews.llvm.org/D77995
2020-04-13 13:50:15 -07:00
Fangrui Song
89981ade6c [MC] Add UseIntegratedAssembler = false. NFC 2020-04-11 10:13:49 -07:00
Matt Arsenault
34f4899e34 CodeGen: Use Register in TargetLowering 2020-04-08 12:10:58 -04:00
Matt Arsenault
1588abc3c6 CodeGen: Use Register in TargetFrameLowering 2020-04-07 17:07:44 -04:00
Eli Friedman
cfeebf9848 Remove SequentialType from the type heirarchy.
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.

In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.

In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)

Differential Revision: https://reviews.llvm.org/D75661
2020-04-06 17:03:49 -07:00
Matt Arsenault
d5df445655 CodeGen: Convert some TII hooks to use Register 2020-04-03 14:52:54 -04:00
Christopher Tetreault
6fdad00e46 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: kparzysz, sdesmalen, efriedma

Reviewed By: kparzysz

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77267
2020-04-03 11:26:51 -07:00
Guillaume Chatelet
bf6dd4469b [Alignment][NFC] Use more Align versions of various functions
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: MatzeB, qcolombet, arsenm, sdardis, jvesely, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77291
2020-04-02 09:00:53 +00:00
Guillaume Chatelet
78ddb62141 [Alignment][NFC] Transition to MachineFrameInfo::getObjectAlign()
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77215
2020-04-01 14:08:28 +00:00