Widen the scope of memory operations that are allowed to be tail predicated
to include gathers and scatters, such that loops that are auto-vectorized
with the option -enable-arm-maskedgatscat (and actually end up containing
an MVE gather or scatter) can be tail predicated.
Differential Revision: https://reviews.llvm.org/D85138
This patch makes the 'AddrSize' field optional. If the address size is
missing, yaml2obj will infer it from the object file.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D85805
On z/OS, the information is stored in the Common System Data Area
(CSD). It is the number of CPs allocated to the current LPAR.
Reviewers: aganea, hubert.reinterpertcast, MaskRay
Reviewed By: hubert.reinterpertcast
Differential Revision: https://reviews.llvm.org/D85531
When TTI was updated to use an explicit cost, TCK_CodeSize was used
although the default implicit cost would have been the hand-wavey
cost of size and latency. So, revert back to this behaviour. This is
not expected to have (much) impact on targets since most (all?) of
them return the same value for SizeAndLatency and CodeSize.
When optimising for size, the logic has been changed to query
CodeSize costs instead of SizeAndLatency.
This patch also adds a testing option in the unroller so that
OptSize thresholds can be specified.
Differential Revision: https://reviews.llvm.org/D85723
This removes the last `unwrapOrError` call from the `printRelocationsHelper`.
There is a little additional complexity because of `SHT_RELR/SHT_ANDROID_RELR` sections.
Such sections contains only relative relocations and they do not have a
symbol table associated with them, hence we should not try to treat
their `sh_link` field as a reference to a symbol table.
Differential revision: https://reviews.llvm.org/D85430
When visiting load and store instructions in SROA skip scalable vectors.
This is relevant in the implementation of the 'arm_sve_vector_bits'
attribute that is used to define VLS types, where an alloca of a
fixed-length vector could be bitcasted to scalable. See D85128 for more
information.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85725
Now that SCEVExpander can preserve LCSSA form,
we do not have to worry about LCSSA form when
trying to look through PHIs. SCEVExpander will take
care of inserting LCSSA PHI nodes as required.
This increases precision of the analysis in some cases.
Reviewed By: mkazantsev, bmahjour
Differential Revision: https://reviews.llvm.org/D71539
Note that DWARFUnit::getAbbreviations() returns nullptr if the
abbreviations could not be read, but callers used the returned
pointer without checking.
Differential Revision: https://reviews.llvm.org/D85738
This pick ups the work on the overflow checks for get.active.lane.mask,
which ensure that it is safe to insert the VCTP intrinisc that enables
tail-predication. For a 2d auto-correlation kernel and its inner loop j:
M = Size - i;
for (j = 0; j < M; j++)
Sum += Input[j] * Input[j+i];
For this inner loop, the SCEV backedge taken count (BTC) expression is:
(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body>
and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC
cannot be max" could not be determined. So overflow behaviour had to be assumed
in the loop tripcount expression that uses the BTC. As a result
tail-predication had to be forced (with an option) for this case.
This change solves that by using ScalarEvolution's helper
getConstantMaxBackedgeTakenCount which is able to determine the range of BTC,
thus can determine it is safe, so that we no longer need to force tail-predication
as reflected in the changed test cases.
Differential Revision: https://reviews.llvm.org/D85737
In this patch I have fixed two issues:
1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.
The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:
test/CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D85516
VE has only 64 bits AND/OR/XOR instructions. We pretended that VE has 32 bits
instructions also, but doing it increase the number of generated instructions.
Therefore, we decide to promote 32 bits operations and use only 64 bits
instructions in back end. We also avoid pretending that VE has 32 bits LEA
instruction. Update regression tests also.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D85726
This patch adds the translation of the proc_bind clause in a
parallel operation.
The values that can be specified for the proc_bind clause are
specified in the OMP.td tablegen file in the llvm/Frontend/OpenMP
directory. From this single source of truth enumeration for
proc_bind is generated in llvm and mlir (used in specification of
the parallel Operation in the OpenMP dialect). A function to return
the enum value from the string representation is also generated.
A new header file (DirectiveEmitter.h) containing definitions of
classes directive, clause, clauseval etc is created so that it can
be used in mlir as well.
Reviewers: clementval, jdoerfert, DavidTruby
Differential Revision: https://reviews.llvm.org/D84347
SUBREG_TO_REG is supposed to be used when we know the producing
instruction already zeroed the bits we're extending. But that's
not the case here. So INSERT_SUBREG with an IMPLICIT_DEF is the
correct thing to use.
With this patch we will match most *uses* of "temporary" named things in
the IR via regular expressions, not their name at creation time. The new
"values" we match are:
- "unnamed" globals: `@[0-9]+`
- debug metadata: `!dbg ![0-9]+`
- loop metadata: `!loop ![0-9]+`
- tbaa metadata: `!tbaa ![0-9]+`
- range metadata: `!range ![0-9]+`
- generic metadata: `metadata ![0-9]+`
- attributes groups: `#[0-9]`
We still don't match the declarations but that can be done later. This
patch can introduce churn when existing check lines contain the old
hardcoded versions of the above "values". We can add a flag to opt-out,
or opt-in, if necessary.
Reviewed By: arichardson, MaskRay
Differential Revision: https://reviews.llvm.org/D85099
Rather than handling zlib handling manually, use find_package from CMake
to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB,
HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is
set to YES, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.
This is a reland of abb0075 with all followup changes and fixes that
should address issues that were reported in PR44780.
Differential Revision: https://reviews.llvm.org/D79219
Based on post-commit discussion in D81766, Hexagon sets this to "0".
I'll see if I can come up with a test, but making the obvious
code fix first to unblock that target.
Rather than just saying that some feature is missing, report the exact
features to make the error message more useful and actionable.
Differential Revision: https://reviews.llvm.org/D85795
When turning on -debug-info-kind=constructor we ran into a "fragment covers
entire variable" error during thinlto. The fragment is currently always
emitted if there is no type size, but sometimes the variable has a
forward declared struct type which doesn't have a size.
This changes the code to get the type size from the GlobalVariable instead.
Differential Revision: https://reviews.llvm.org/D85572
Without this patch, we attempt to distribute And over Xor even in
unsafe circumstances like so:
undef & (true ^ true) ==> (undef & true) ^ (undef & true)
and evaluate it to undef instead of false. Note that "true ^ true"
may show up implicitly with one true being part of a PHI node.
This patch fixes the problem by teaching SimplifyUsingDistributiveLaws
to not use undef as part of simplifications.
Reviewers: spatel, aqjune, nikic, lebedev.ri, fhahn, jdoerfert
Differential Revision: https://reviews.llvm.org/D85687
Introduce a helper on Instruction which can be used to update the debug
location after hoisting.
Use this in GVN and LICM, where we were mistakenly introducing new line
0 locations after hoisting (the docs recommend dropping the location in
this case).
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
Similar to what we do in IIQ, add an isUndefValue() helper that
checks for undef values while respective CanUseUndef. This makes
it much easier to search for places that don't respect the flag
yet.
We should be able to see that the new aggregate we have produced
is identical to the source aggregate from which we've extracted
the elements that we used to form a new aggregate.
This happens (a lot) in clang C++ exception code on unwind branch.
The officially specified abbreviation for WebAssembly is Wasm and the
spec explicitly calls out WASM as being an incorrect spelling. This
patch fixes a few comments and error messages to use the
spec-compliant abbreviation.
Differential Revision: https://reviews.llvm.org/D85764
SUMMARY:
1. in the patch , remove setting storageclass in function .getXCOFFSection and construct function of class MCSectionXCOFF
there are
XCOFF::StorageMappingClass MappingClass;
XCOFF::SymbolType Type;
XCOFF::StorageClass StorageClass;
in the MCSectionXCOFF class,
these attribute only used in the XCOFFObjectWriter, (asm path do not need the StorageClass)
we need get the value of StorageClass, Type,MappingClass before we invoke the getXCOFFSection every time.
actually , we can get the StorageClass of the MCSectionXCOFF from it's delegated symbol.
2. we also change the oprand of branch instruction from symbol name to qualify symbol name.
for example change
bl .foo
extern .foo
to
bl .foo[PR]
extern .foo[PR]
3. and if there is reference indirect call a function bar.
we also add
extern .bar[PR]
Reviewers: Jason liu, Xiangling Liao
Differential Revision: https://reviews.llvm.org/D84765
This reverts commit 52b71aa8b1a019403b0ecc184744b2f8ca2f7cba.
The problem was a missing lit.local.cfg file, which was causing the
test to be incorrectly run on bots that had not built the WebAssembly
target.
This reverts commit 94791970de109eb9a6b296825ddb0fc2a196b366.
The test is failing on multiple bots, event though it passes for me
locally. Reverting while I investigate further.
8cc911fa5b06 refactored the `getIntrinsicInstrCost` function and was
meant to be a nonfunctional change, but it accidentally changed how
costs were calculated in the SLP vectorizer, which regressed
WebAssembly codegen and resulted in a downstream bug report at
https://github.com/emscripten-core/emscripten/issues/11449.
The fix for this regression is in D85759, and this patch just
pre-commits the test from that patch to demonstrate the regressed
behavior first.
This implements
```
(logic_op (op x...), (op y...)) -> (op (logic_op x, y))
```
when `op` is an extend, a shift, or an and.
This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands`
(with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.)
This is implemented so it works both pre and post-legalization.
This also adds a general way to add a series of instructions in a combine.
(`applyBuildInstructionSteps`).
Differential Revision: https://reviews.llvm.org/D85050
Summary:
COFF and XCOFF in llvm are very different and serves different platform.
Since we have different Dumper.cpp file in llvm-readobj's
implementation, we should have separate testing directory for them too.
Reviewed By: jhenderson, DiggerLin
Differential Revision: https://reviews.llvm.org/D85675