Its unlikely an undef element in a zero vector will be any use, and SimplifyDemandedVectorElts now calls combineX86ShufflesRecursively so its unlikely we actually have a dependency on these specific elements.
This is yet another attempt at providing support for epilogue
vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none
and reviews D30247 and D88819.
Similar to D88819, this patch achieve epilogue vectorization by
executing a single vplan twice: once on the main loop and a second
time on the epilogue loop (using a different VF). However it's able
to handle more loops, and generates more optimal control flow for
cases where the trip count is too small to execute any code in vector
form.
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D89566
This might be a small improvement in readability, but the
real motivation is to make it easier to adapt the code to
deal with intrinsics like 'maxnum' and/or integer min/max.
There is potentially help in doing that with D92086, but
we might also just add specialized wrappers here to deal
with the expected patterns.
OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.
Reviewed By: jdoerfert, llitchev
Differential Revision: https://reviews.llvm.org/D92189
This also removes the empty extra "module asm" that would be created,
and updates the test to reflect that while making it more explicit.
Broken out from https://reviews.llvm.org/D92335
This patch consists of the addition of some common additional
extended mnemonics to the SystemZ target.
- These are jnop, jct, jctg, jas, jasl, jxh, jxhg, jxle,
jxleg, bru, brul, br*, br*l.
- These mnemonics and the instructions they map to are
defined here, Chapter 4 - Branching with extended
mnemonic codes.
- Except for jnop (which is a variant of brc 0, label), every
other mnemonic is marked as a MnemonicAlias since there is
already a "defined" instruction with the same encoding
and/or condition mask values.
- brc 0, label doesn't have a defined extended mnemonic, thus
jnop is defined using as an InstAlias. Furthermore, the
applyMnemonicAliases function is called in the overridden
parseInstruction function in SystemZAsmParser.cpp to ensure
any mnemonic aliases are applied before any further
processing on the instruction is done.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D92185
In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:
br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
...
!2 = !{!2, !3, !4}
!3 = !{!"llvm.loop.vectorize.width", i32 8}
!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.
Differential Revision: https://reviews.llvm.org/D88962
This implementation of `ELFDumper<ELFT>::printAttributes()` in llvm-readobj has issues:
1) It crashes when the content of the attribute section is empty.
2) It uses `unwrapOrError` and `reportWarning` calls, though
ideally we want to use `reportUniqueWarning`.
3) It contains a TODO about redundant format version check.
`lib/Support/ELFAttributeParser.cpp` uses a hardcoded constant instead of the named constant.
This patch fixes all these issues.
Differential revision: https://reviews.llvm.org/D92318
These were re-added by fbfb1c790982277eaa5134c2b6aa001e97fe828d but
should not have been. This removes the old experimental versions of the
reduction intrinsics again, leaving the new non experimental ones.
Differential Revision: https://reviews.llvm.org/D92411
In lowering of FLT_ROUNDS_, FPSCR content will be moved into FP register
and then GPR, and then truncated into word.
For subtargets without direct move support, it will store and then load.
The load address needs adjustment (+4) only on big-endian targets. This
patch fixes it on using generic opcodes on little-endian and subtargets
with direct-move.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D91845
This:
1) Changes `reportWarning` to `reportUniqueWarning` (no-op here).
2) Adds more context to the message.
3) Merges `broken-dynsym-link.test` into `dyn-symbols.test`, adds more testing.
Differential revision: https://reviews.llvm.org/D92380
i1 is the native type for PowerPC if crbits is enabled. However, we need
to promote the i1 to i64 as we didn't have the pattern for i1.
Reviewed By: Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D92067
This adds missing `select` instruction support and block return type
support for reference types. Also refactors WebAssemblyInstrRef.td and
rearranges tests in reference-types.s. Tests don't include `exnref`
types, because we currently don't support `exnref` for `ref.null` and
the type will be removed soon anyway.
Reviewed By: tlively, sbc100, wingo
Differential Revision: https://reviews.llvm.org/D92359
The static_assert in "libcxx/include/memory" was the main offender here,
but then I figured I might as well `git grep -i instantat` and fix all
the instances I found. One was in user-facing HTML documentation;
the rest were in comments or tests.
We are avoiding writing to WZR just about everywhere else.
Also update the code to use MachineIRBuilder for the sake of consistency.
We also didn't have a GlobalISel testcase for this path, so add a simple one
now.
Differential Revision: https://reviews.llvm.org/D90626
So that instructions like `lla a5, (0xFF + end) - 4` (supported by GNU as) can
be parsed.
Add a missing test that an operand like `foo + foo` is not allowed.
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D92293
When handling a DSOLocalEquivalent operand change:
- Remove assertion checking that the `To` type and current type are the
same type. This is not always a requirement.
- Add a missing bitcast from an old DSOLocalEquivalent to the type of
the new one.
Instead of falling back to selecting TB(N)Z when we fail to select an
optimized compare against 0, select Bcc instead.
Also simplify selectCompareBranch a little while we're here, because the logic
was kind of hard to follow.
At -O0, this is a 0.1% geomean code size improvement for CTMark.
A simple example of where this can kick in is here:
https://godbolt.org/z/4rra6P
In the example above, GlobalISel currently produces a subs, cset, and tbnz.
SelectionDAG, on the other hand, just emits a compare and b.le.
Differential Revision: https://reviews.llvm.org/D92358
This reverts commit cf1c774d6ace59c5adc9ab71b31e762c1be695b1.
This change caused several regressions in the gdb test suite - at least
a sample of which was due to line zero instructions making breakpoints
un-lined. I think they're worth investigating/understanding more (&
possibly addressing) before moving forward with this change.
Revert "[FastISel] NFC: Clean up unnecessary bookkeeping"
This reverts commit 3fd39d3694d32efa44242c099e923a7f4d982095.
Revert "[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option"
This reverts commit a474657e30edccd9e175d92bddeefcfa544751b2.
Revert "Remove static function unused after cf1c774."
This reverts commit dc35368ccf17a7dca0874ace7490cc3836fb063f.
Revert "[lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction""
This reverts commit 53a14a47ee89dadb8798ca8ed19848f33f4551d5.
This allows us to use its value everywhere, rather than just clang. Some
other places, like opt and lld, will use its value soon.
Rename it internally to LLVM_ENABLE_NEW_PASS_MANAGER.
The #define for it is now in llvm-config.h.
The initial land accidentally set the value of
LLVM_ENABLE_NEW_PASS_MANAGER to the string
ENABLE_EXPERIMENTAL_NEW_PASS_MANAGER instead of its value.
Reviewed By: rnk, hans
Differential Revision: https://reviews.llvm.org/D92072
This allows us to use its value everywhere, rather than just clang. Some
other places, like opt and lld, will use its value soon.
The #define for it is now in llvm-config.h.
Reviewed By: rnk, hans
Differential Revision: https://reviews.llvm.org/D92072