1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

43840 Commits

Author SHA1 Message Date
Amara Emerson
20267c085f [GlobalISel][Localizer] Don't localize phi operands which are used more than once in the phi.
The current algorithm just tries to localize defs as far as they can go, and in
the case of G_PHI operands, it clones the def into the predecessor block for
each incoming edge. When multiple edges have the same register value, this can
cause unnecessary code bloat, and inhibit later optimizations.

This change checks if a given phi operand is unique in the phi, if not the
def of that register is not localized to the predecessor.

Differential Revision: https://reviews.llvm.org/D95406
2021-01-25 17:48:04 -08:00
Mitch Phillips
587eafbc21 Revert "Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method""
This reverts commit 554b3211fefd09b56b64357b9edd66c78ae200b5.

Differential Revision: https://reviews.llvm.org/D95035
2021-01-25 16:22:22 -08:00
modimo
46d7a90a64 [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks
This change leverages the work done in D83743 to replay in the SampleProfile inliner to also be used in the CGSCC inliner. NOTE: currently restricted to non-ML advisors only.

The added switch `-cgscc-inline-replay=<remarks file>` will replay the inlining decisions in that file where the remarks file is generated via `-Rpass=inline`. The aim here is to make it easier to analyze changes that would modify inlining heuristics to be separated from this behavior. Doing so allows easier examination of assembly and runtime behavior compared to the baseline rather than trying to dig through the large churn caused by inlining.

In LTO compilation, since inlining is done twice you can separately specify replay by passing the flag to the FE (`-cgscc-inline-replay=`) and to the linker (`-Wl,cgscc-inline-replay=`) with the remarks generated from their respective places.

Testing on mysqld by comparing the inline decisions between base (generates remarks.txt) and diff (replay using identical input/tools with remarks.txt) and examining the inlining sites with `diff` shows 14,000 mismatches out of 247,341 for a ~94% replay accuracy. I believe this gap can be narrowed further though for the general case we may never achieve full accuracy. For my personal use, this is close enough to be representative: I set the baseline as the one generated by the replay on identical input/toolset and compare that to my modified input/toolset using the same replay.

Testing:
ninja check-llvm
newly added test correctly replays CGSCC inlining decisions

Reviewed By: mtrofin, wenlei

Differential Revision: https://reviews.llvm.org/D94334
2021-01-25 15:38:57 -08:00
Duncan P. N. Exon Smith
d8f7c22241 Support: Remove duplicated code in {File,clang::ModulesDependency}Collector, NFC
Refactor the duplicated canonicalize-path logic in `FileCollector` and
`ModulesDependencyCollector` into a new utility called
`PathCanonicalizer` that's shared. This popped up when tracking down a
bug common to both in https://reviews.llvm.org/D95202.

As drive-bys, update a few names and comments to better reflect the
effect of the code, delay removal of `..`s to avoid an unnecessary extra
string copy, and leave behind a couple of FIXMEs for future
consideration.

Differential Revision: https://reviews.llvm.org/D95279
2021-01-25 15:09:00 -08:00
Richard Smith
b9cfb936a1 Revert "[ObjC][ARC] Annotate calls with attributes instead of emitting retainRV"
This reverts commit 53176c168061d6f26dcf3ce4fa59288b7d67255e, which
introduceed a layering violation. LLVM's IR library can't include
headers from Analysis.
2021-01-25 13:53:38 -08:00
Jonas Devlieghere
ba9adaa9dd [YAML I/O] Fix bug in emission of empty sequence
Don't emit an output dash for an empty sequence. Take emitting a vector
of strings for example:

  std::vector<std::string> Strings = {"foo", "bar"};
  LLVM_YAML_IS_SEQUENCE_VECTOR(std::string)
  yout << Strings;

This emits the following YAML document.

  ---
  - foo
  - bar
  ...

When the vector is empty, this generates the following result:

  ---
  - []
  ...

Although this is valid YAML, it does not match what we meant to emit.
The result is a one-element sequence consisting of an empty list.
Indeed, if we were to try to read this again we get an error:

  YAML:2:4: error: not a mapping
  - []

The problem is the output dash before the empty list. The correct output
would be:

  ---
  []
  ...

This patch fixes that by not emitting the output dash for an empty
sequence.

Differential revision: https://reviews.llvm.org/D95280
2021-01-25 13:35:36 -08:00
Akira Hatanaka
96195f7442 [ObjC][ARC] Annotate calls with attributes instead of emitting retainRV
or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end annotates calls with attribute "clang.arc.rv"="retain"
  or "clang.arc.rv"="claim", which indicates the call is implicitly
  followed by a marker instruction and a retainRV/claimRV call that
  consumes the call result. This is currently done only when the target
  is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the
  annotated calls in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the annotated
  calls. It doesn't remove the attribute on the call since the backend
  needs it to emit the marker instruction. The retainRV/claimRV calls
  are emitted late in the pipeline to prevent optimization passes from
  transforming the IR in a way that makes it harder for the ARC
  middle-end passes to figure out the def-use relationship between the
  call and the retainRV/claimRV calls (which is the cause of PR31925).

- The function inliner removes the autoreleaseRV call in the callee that
  returns the result if nothing in the callee prevents it from being
  paired up with the calls annotated with "clang.arc.rv"="retain/claim"
  in the caller. If the call is annotated with "claim", a release call
  is inserted since autoreleaseRV+claimRV is equivalent to a release. If
  it cannot find an autoreleaseRV call, it tries to transfer the
  attributes to a function call in the callee. This is important since
  ARC optimizer can remove the autoreleaseRV call returning the callee
  result, which makes it impossible to pair it up with the retainRV or
  claimRV call in the caller. If that fails, it simply emits a retain
  call in the IR if the call is annotated with "retain" and does nothing
  if it's annotated with "claim".

- This patch teaches dead argument elimination pass not to change the
  return type of a function if any of the calls to the function are
  annotated with attribute "clang.arc.rv". This is necessary since the
  pass can incorrectly determine nothing in the IR uses the function
  return, which can happen since the front-end no longer explicitly
  emits retainRV/claimRV calls in the IR, and change its return type to
  'void'.

Future work:

- Use the attribute on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the attributes.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-01-25 11:57:08 -08:00
Sander de Smalen
110f0110d8 [InstructionCost] Prevent InstructionCost being created with CostState.
For a function that returns InstructionCost, it is very tempting to write:

  return InstructionCost::Invalid;

But that actually returns InstructionCost(1 /* int value of Invalid */))
which has a totally different meaning. By marking this constructor as
`delete`, this can no longer happen.
2021-01-25 11:26:56 +00:00
Georgii Rymar
fc47fdd498 [yaml2obj, obj2yaml] - Implement section header table as a special Chunk.
This was discussed in D93678 thread.
Currently we have one special chunk - Fill.

This patch re implements the "SectionHeaderTable" key to become a special chunk too.
With that we are able to place the section header table at any location,
just like we place sections.

Differential revision: https://reviews.llvm.org/D95140
2021-01-25 13:08:08 +03:00
Fangrui Song
7cc3572b85 [XRay] Support DW_TAG_call_site and delete unneeded PATCHABLE_EVENT_CALL/PATCHABLE_TYPED_EVENT_CALL lowering 2021-01-25 00:49:18 -08:00
QingShan Zhang
d5b70bbb38 [NFC] [DAGCombine] Correct the result for sqrt even the iteration is zero
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.

Reviewed By: dmgreen, spatel, RKSimon

Differential Revision: https://reviews.llvm.org/D94480
2021-01-25 04:02:44 +00:00
Chen Zheng
db58448497 [PowerPC] support register pressure reduction in machine combiner.
Reassociating some patterns to generate more fma instructions to
reduce register pressure.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D92071
2021-01-24 21:28:21 -05:00
Carl Ritson
8b5995e559 [AMDGPU] Fix llvm.amdgcn.init.exec and frame materialization
Frame-base materialization may insert vector instructions before EXEC is initialised.
Fix this by moving lowering of llvm.amdgcn.init.exec later in backend.
Also remove SI_INIT_EXEC_LO pseudo as this is not necessary.

Reviewed By: ruiling

Differential Revision: https://reviews.llvm.org/D94645
2021-01-25 08:31:17 +09:00
Kazu Hirata
0ec2908ce9 [llvm] Use pop_back_val (NFC) 2021-01-24 12:18:57 -08:00
Kazu Hirata
2fb5578408 [CodeGen] Forward-declare TargetMachine (NFC)
InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h.  This patch adds a forward
declaration right in InstrEmitter.h.

While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
2021-01-24 12:18:54 -08:00
Nikita Popov
710501602d [Utils] Use NoAliasScopeDeclInst in a few more places (NFC)
In the cloning infrastructure, only track an MDNode mapping,
without explicitly storing the Metadata mapping, same as is done
during inlining. This makes things slightly simpler.
2021-01-24 16:24:11 +01:00
Florian Hahn
91e3095774 [LTO] Move DisableVerify setting to LTOCodeGenerator class (NFC).
To simplify the transition to using LTOBackend, move DisableVerify to
the LTOCodeGenerator class, like most/all other options.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D95223
2021-01-24 14:14:40 +00:00
Jeroen Dobbelaere
2c513aff8c [LoopRotate] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed
Similar to D92887, LoopRotation also needs duplicate the noalias scopes when rotating a `@llvm.experimental.noalias.scope.decl` across a block boundary.
This is based on the version from the Full Restrict paches (D68511).

The problem it fixes also showed up in Transforms/Coroutines/ex5.ll after D93040 (when enabling strict checking with -verify-noalias-scope-decl-dom).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94306
2021-01-24 13:53:13 +01:00
Jeroen Dobbelaere
76b232f00a [LoopUnroll] Use llvm.experimental.noalias.scope.decl for duplicating noalias metadata as needed
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=39282. Compared to D90104, this version is based on part of the full restrict patched (D68484) and uses the `@llvm.experimental.noalias.scope.decl` intrinsic to track the location where !noalias and !alias.scope scopes have been introduced. This allows us to only duplicate the scopes that are really needed.

Notes:
- it also includes changes and tests from D90104

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D92887
2021-01-24 13:48:20 +01:00
Michael Kruse
d945273b52 [OpenMPIRBuilder] Implement tileLoops.
The  tileLoops method implements the code generation part of the tile directive introduced in OpenMP 5.1. It takes a list of loops forming a loop nest, tiles it, and returns the CanonicalLoopInfo representing the generated loops.

The implementation takes n CanonicalLoopInfos, n tile size Values and returns 2*n new CanonicalLoopInfos. The input CanonicalLoopInfos are invalidated and BBs not reused in the new loop nest removed from the function.

In a modified version of D76342, I was able to correctly compile and execute a tiled loop nest.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D92974
2021-01-23 19:39:29 -06:00
Nikita Popov
15041a14bb [IR] Add NoAliasScopeDeclInst (NFC)
Add an intrinsic type class to represent the
llvm.experimental.noalias.scope.decl intrinsic, to make code
working with it a bit nicer by hiding the metadata extraction
from view.
2021-01-23 22:40:32 +01:00
Florian Hahn
5b8c530938 [FuzzMutate] Add mutator to modify instruction flags.
This patch adds a new InstModificationIRStrategy to mutate flags/options
for instructions. For example, it may add or remove nuw/nsw flags from
add, mul, sub, shl instructions or change the predicate for icmp
instructions.

Subtle changes such as those mentioned above should lead to a more
interesting range of inputs. The presence or absence of overflow flags
can expose subtle bugs, for example.

Reviewed By: bogner

Differential Revision: https://reviews.llvm.org/D94905
2021-01-23 19:05:20 +00:00
Kazu Hirata
f4934140cd [llvm] Forward-declare ICFLoopSafetyInfo (NFC)
LoopUtils.h needs ICFLoopSafetyInfo but relies on a forward
declaration of ICFLoopSafetyInfo in IVDescriptors.h.  This patch adds
a forward declaration right in LoopUtils.h.

While we are at it, this patch removes the one in IVDescriptors.h,
where it is unnecessary.
2021-01-23 10:56:30 -08:00
Florian Hahn
283961f4c8 [Local] Treat calls that may not return as being alive.
With the addition of the `willreturn` attribute, functions that may
not return (e.g. due to an infinite loop) are well defined, if they are
not marked as `willreturn`.

This patch updates `wouldInstructionBeTriviallyDead` to not consider
calls that may not return as dead.

This patch still provides an escape hatch for intrinsics, which are
still assumed as willreturn unconditionally. It will be removed once
all intrinsics definitions have been reviewed and updated.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94106
2021-01-23 16:05:14 +00:00
Roman Lebedev
cbc12c9267 [SimplifyCFG] Change 'LoopHeaders' to be ArrayRef<WeakVH>, not a naked set, thus avoiding dangling pointers
If i change it to AssertingVH instead, a number of existing tests fail,
which means we don't consistently remove from the set when deleting blocks,
which means newly-created blocks may happen to appear in that set
if they happen to occupy the same memory chunk as did some block
that was in the set originally.

There are many places where we delete blocks,
and while we could probably consistently delete from LoopHeaders
when deleting a block in transforms located in SimplifyCFG.cpp itself,
transforms located elsewhere (Local.cpp/BasicBlockUtils.cpp) also may
delete blocks, and it doesn't seem good to teach them to deal with it.

Since we at most only ever delete from LoopHeaders,
let's just delegate to WeakVH to do that automatically.

But to be honest, personally, i'm not sure that the idea
behind LoopHeaders is sound.
2021-01-23 16:48:35 +03:00
Florian Hahn
438026988c [LTO] Store target attributes as vector of strings (NFC).
The target features are obtained as a list of features/attributes.
Instead of storing them in a single string, store the vector. This
matches lto::Config's behavior and simplifies the transition to
lto::backend().

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D95224
2021-01-23 12:11:58 +00:00
Simon Pilgrim
982a3c3ce9 [Support] TrigramIndex::insert - pass std::String argument by const reference. NFCI.
Avoid string copies and fix clang-tidy warning.
2021-01-23 11:04:00 +00:00
Kazu Hirata
67c2d74113 [llvm] Use static_assert instead of assert (NFC)
Identified with misc-static-assert.
2021-01-22 23:25:05 -08:00
Kazu Hirata
0069e576c3 [llvm] Use isAlpha/isAlnum (NFC) 2021-01-22 23:25:03 -08:00
Hsiangkai Wang
02776ce1c4 [RISCV] Implement vsoxseg/vsuxseg intrinsics.
Define vsoxseg/vsuxseg intrinsics and pseudo instructions.
Lower vsoxseg/vsuxseg intrinsics to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94940
2021-01-23 08:54:56 +08:00
Hsiangkai Wang
233ef2a3eb [RISCV] Implement vloxseg/vluxseg intrinsics.
Define vloxseg/vluxseg intrinsics and pseudo instructions.
Lower vloxseg/vluxseg intrinsics to pseudo instructions in RISCVDAGToDAGISel.

Differential Revision: https://reviews.llvm.org/D94903
2021-01-23 08:54:56 +08:00
Duncan P. N. Exon Smith
e675c8ca11 ADT: Use 'using' to inherit assign and append in SmallString
Rather than reimplement, use a `using` declaration to bring in
`SmallVectorImpl<char>`'s assign and append implementations in
`SmallString`.

The `SmallString` versions were missing reference invalidation
assertions from `SmallVector`. This patch also fixes a bug in
`llvm::FileCollector::addFileImpl`, which was a copy/paste from
`clang::ModuleDependencyCollector::copyToRoot`, both caught by the
no-longer-skipped assertions.

As a drive-by, this also sinks the `const SmallVectorImpl&` versions of
these methods down into `SmallVectorImpl`, since I imagine they'd be
useful elsewhere.

Differential Revision: https://reviews.llvm.org/D95202
2021-01-22 16:17:58 -08:00
Stanislav Mekhanoshin
dd3332e2e5 Change materializeFrameBaseRegister() to return register
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.

Differential Revision: https://reviews.llvm.org/D95268
2021-01-22 15:51:06 -08:00
Mitch Phillips
7a51025e46 Revert "[GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method"
This reverts commit 2bb92bf451d7eb2c817f3e5403353e7c0c14d350.

Dependent patch broke UBSan on Android:
3dedad475da45c05bc4f66cd14e9f44581edf0bc
2021-01-22 14:32:11 -08:00
Jonas Devlieghere
68cbad8a6e [VFS] Fix inconsistencies between relative paths and fallthrough.
This patch addresses inconsistencies in the way fallthrough is handled
in the RedirectingFileSystem. Rather than trying to change the working
directory of the external filesystem, the RedirectingFileSystem will
canonicalize every path before handing it down. This guarantees that
relative paths are resolved relative to the RedirectingFileSystem's
working directory.

This allows us to have a strictly virtual working directory, and still
fallthrough for absolute paths, but not for relative paths that would
get resolved incorrectly at the lower layer (for example, in case of the
RealFileSystem, because the strictly virtual path does not exist).

Differential revision: https://reviews.llvm.org/D95188
2021-01-22 14:15:48 -08:00
Cassie Jones
166f6f7864 [GlobalISel] LegalizerHelper - Extract widenScalarAddoSubo method
The widenScalar implementation for signed and unsigned overflowing
operations were very similar: both are checked by truncating the result
and then re-sign/zero-extending it and checking that it matches the
computed operation.

Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.

Differential Revision: https://reviews.llvm.org/D95035
2021-01-22 14:08:46 -08:00
Shimin Cui
50ae94abae [Analysis] Support AIX vec_malloc routines
This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX.

Differential Revision: https://reviews.llvm.org/D94710
2021-01-22 16:03:01 -05:00
Yaxun (Sam) Liu
2e16ddcdc8 [HIP] Support __managed__ attribute
This patch implements codegen for __managed__ variable attribute for HIP.

Diagnostics will be added later.

Differential Revision: https://reviews.llvm.org/D94814
2021-01-22 11:43:58 -05:00
Roman Lebedev
1cff9b2a83 [NFC][InstCombine] Extract freelyInvertAllUsersOf() out of canonicalizeICmpPredicate()
I'd like to use it in an upcoming fold.
2021-01-22 17:23:53 +03:00
Nikita Popov
8c54b2d6f0 [IR] Optimize adding attribute to AttributeList (NFC)
When adding an enum attribute to an AttributeList, avoid going
through an AttrBuilder and instead directly add the attribute to
the correct set. Going through AttrBuilder is expensive, because
it requires all string attributes to be reconstructed.

This can be further improved by inserting the attribute at the
right position and using the AttributeSetNode::getSorted() API.

This recovers the small compile-time regression from D94633.
2021-01-22 11:30:21 +01:00
Jay Foad
eee5207f2a [LegacyPM] Update InversedLastUser on the fly. NFC.
This speeds up setLastUser enough to give a 5% to 10% speed up on
trivial invocations of opt and llc, as measured by:

perf stat -r 100 opt -S -o /dev/null -O3 /dev/null
perf stat -r 100 llc -march=amdgcn /dev/null -filetype null

Don't dump last use information unless -debug-pass=Details to avoid
printing lots of spam that will break some existing lit tests. Before
this patch, dumping last use information was broken anyway, because it
used InversedLastUser before it had been populated.

Differential Revision: https://reviews.llvm.org/D92309
2021-01-22 09:48:54 +00:00
Sven van Haastregt
22391a6c81 [APSInt][NFC] Clean up doxygen comments
Add a Doxygen class comment and clean up other Doxygen comments in
this file while we're at it.
2021-01-22 09:23:41 +00:00
Nathan Lanza
c9d2b08298 NFC: Remove simple_ilist comment mentioning ilist/iplist allocating
Allocation was removed from ilist in 2016 in the git commit
b5da00533510.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D93953
2021-01-22 03:24:54 -05:00
Arthur Eubanks
d815d3c37a [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Jacques Pienaar
d6b74b6a39 [mlir] Enable passing crash reproducer stream factory method
Add factory to create streams for logging the reproducer. Allows for more general logging (beyond file) and logging the configuration/module separately (logged in order, configuration before module).

Also enable querying filename of ToolOutputFile.

Differential Revision: https://reviews.llvm.org/D94868
2021-01-21 20:03:15 -08:00
Kazu Hirata
69d44af40a [llvm] Use isDigit (NFC) 2021-01-21 19:59:50 -08:00
ShihPo Hung
a54a750343 [RISCV] Add intrinsics for RVV1.0 VFRSQRTE7 & VFRECE7
Reviewed By: craig.topper, frasercrmck

Differential Revision: https://reviews.llvm.org/D95113
2021-01-21 18:38:49 -08:00
ShihPo Hung
1d80f86c9c [RISCV] Add intrinsics for vector unordered indexed load in RVV 1.0
Add unordered indexed load: vluxei

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95028
2021-01-21 18:38:49 -08:00
ShihPo Hung
8009ef0a5f [RISCV] Add intrinsics for RVV 1.0 vrgatherei16
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95014
2021-01-21 18:38:49 -08:00
Craig Topper
2134974a6f [RISCV] Add a VL output to vleff intrinsics.
The fault-only-first-load instructions can reduce VL if an element
other than element 0 triggers a memory fault. This can be used to
vectorize loops with data dependent exit conditions like strcmp or
strlen.

This patch adds a VL output to these intrinsics so that the new
VL value can be captured by software. This will be expanded to
'csrr gpr, vl' after the vleff instruction during SelectionDAG.

By doing this with one intrinsic we are able to guarantee that the
csrr reads the VL value produced by the vleff instruction. Having
it as a separate intrinsic would make it impossible to guarantee
ordering without making every other vector intrinsic have side
effects.

The intrinsics are expanded during lowering into two ISD nodes
that are glued together. These ISD nodes will go
through isel separately, but should maintain the glue so that they
get emitted adjacently by InstrEmitter.

I've only ran the chain through the vleff instruction, allowing
the READ_VL to be deleted if it is unused.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94286
2021-01-21 17:19:58 -08:00