This is a hacky, but low-risk fix to avoid the infinite loop in PR46271:
https://bugs.llvm.org/show_bug.cgi?id=46271
As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict
with a transform that wants to pull a 'not' op through min/max via
SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include
undefined elements in vector constants to avoid that. Alternatively, we could
improve or cripple the demanded elements analysis, but that could create even
more problems.
The likely better, safer alternative will be to create min/max intrinsics, so
we can remove all of the hacks related to min/max matching in instcombine.
Differential Revision: https://reviews.llvm.org/D81698
This patch adds a new field `bool Is64bit` in `DWARFYAML::Data` to indicate the address size of target. It's helpful for inferring the `AddrSize` in some DWARF sections.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D81709
We never codegen them so this doesn't matter in practice. But
sometimes someone comes along and tries to use these flags
for something else. LIke the Load Value Inject inline assembly
handling.
If the input to the bitcast is a sign bit test, it makes sense to
directly use vpmovmskb or vmovmskps/pd. This removes the need to
copy the sign bits to a k-register and then to a GPR.
Fixes PR46200.
Differential Revision: https://reviews.llvm.org/D81327
This pass has no dependencies on other passes so conditionally
including it in the pipeline doens't do much. Just move it the
pass itself to keep it isolated.
isOverwrite expects the later location as first argument and the earlier
result later. The adjusted call is intended to check whether CC
overwrites DefLoc.
A lot of what EVEX->VEX does is equivalent to what the
prioritization in the assembly parser does. When an AVX mnemonic
is used without any EVEX features or XMM16-31, the parser will
pick the VEX encoding.
Since codegen doesn't go through the parser, we should also
use VEX instructions when we can so that the code coming out of
integrated assembler matches what you'd get from outputing an
assembly listing and parsing it.
The pass early outs if AVX isn't enabled and uses TSFlags to
check for EVEX instructions before doing the more costly table
lookups. Hopefully that's enough to keep this from impacting
-O0 compile times.
relocImm was a complexPattern that handled both ConstantSDNode
and X86Wrapper. But it was only applied selectively because using
it would cause patterns to be not importable into FastISel or
GlobalISel. So it only got applied to flag setting instructions,
stores, RMW arithmetic instructions, and rotates.
Most of the test changes are a result of making patterns available
to GlobalISel or FastISel. The absolute-cmp.ll change is due to
this fixing a pattern ordering issue to make an absolute symbol
match to an 8-bit immediate before trying a 32-bit immediate.
I tried to use PatFrags to reduce the repetition, but I was getting
errors from TableGen.
This was reverted due to a reported memory usage increase. However,
a test case was never provided, and I wasn't able to reproduce it
myself.
Relative to the original patch, I have moved the block cache
structure behind a unique_ptr, to avoid storing a huge structure
inside a DenseMap.
---
Variant on D70103 to fix https://bugs.llvm.org/show_bug.cgi?id=43909.
The caching is switched to always use a BB to cache entry map, which
then contains per-value caches. A separate set contains value handles
with a deletion callback. This allows us to properly invalidate
overdefined values.
A possible alternative would be to always cache by value first and
have per-BB maps/sets in the each cache entry. In that case we could
use a ValueMap and would avoid the separate value handle set. I went
with the BB indexing at the top level to make it easier to integrate
D69914, but possibly that's not the right choice.
Differential Revision: https://reviews.llvm.org/D70376
Brand index was a feature some Pentium III and Pentium 4 CPUs.
It provided an index into a software lookup table to provide a
brand name for the CPU. This is separate from the family/model.
It's unclear to me why this index being non-zero was used to
block checking family/model. I think the effect of this is that
-march=native was not working correctly on the CPUs that have a
non-zero brand index. They are all about 20 years old so this
probably hasn't affected many users.
GCC5 errors out with:
llvm/lib/Analysis/StackSafetyAnalysis.cpp:935:21: error: use of 'KV' before deduction of 'auto'
for (auto &KV : KV.second.Params) {
^
Implement the `hasProtectedVisibility()` hook to indicate that, like
Darwin, WebAssembly doesn't support "protected" visibility.
On ELF, "protected" visibility is intended to be an optimization, however
in practice it often [isn't], and ELF documentation generally ranges from
[not mentioning it at all] to [strongly discouraging its use].
[isn't]: https://www.airs.com/blog/archives/307
[not mentioning it at all]: https://gcc.gnu.org/wiki/Visibility
[strongly discouraging its use]: https://www.akkadia.org/drepper/dsohowto.pdf
While here, also mention the new Reactor support in the release notes.
Summary:
ThinLTO linking runs dataflow processing on collected
function parameters. Then StackSafetyGlobalInfoWrapperPass
in ThinLTO backend will run as usual looking up to external
symbol in the summary if needed.
Depends on D80985.
Reviewers: eugenis, pcc
Reviewed By: eugenis
Subscribers: inglorion, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81242
This decreases the time consumed by the pass [during RawSpeed unity build]
by 25% (0.0586 s -> 0.04388 s).
While that isn't really impressive overall, that wasn't the goal here.
The memory results here are noticeable.
The baseline results are:
```
total runtime: 55.65s.
calls to allocation functions: 19754254 (354960/s)
temporary memory allocations: 4951609 (88974/s)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.79MB
total memory leaked: 198.01MB
```
While with this patch the results are:
```
total runtime: 55.37s.
calls to allocation functions: 19068237 (344403/s) # -3.47 %
temporary memory allocations: 4261772 (76974/s) # -13.93 % (!!!)
peak heap memory consumption: 239.13MB
peak RSS (including heaptrack overhead): 463.73MB
total memory leaked: 198.01MB
```
So we get rid of *a lot* of temporary allocations.
Using `SmallSet<8>` makes sense to me because at least here
for x86 BdVer2, the size of that set is *never* more than 3,
over all of llvm test-suite + RawSpeed.
The story might be different on other targets,
not sure if it will ever justify whole DenseSet,
but if it does SmallDenseSet might be a compromise.
Summary:
This commit slightly modifies the MCDisassembler, and llvm-objdump to
allow targets to also decode entire symbols.
WebAssembly uses the onSymbolStart hook it to decode preludes.
WebAssembly partially disassembles the symbol in its target specific
way; and then falls back to the normal flow of llvm-objdump.
AMDGPU needs it to decode kernel descriptors entirely, and move to the
next symbol.
This commit is to split the above task into 2.
- Changes to llvm-objdump and MC-layer without breaking WebAssembly code
[ this commit ]
- AMDGPU's implementation of onSymbolStart that decodes kernel
descriptors. [ https://reviews.llvm.org/D80713 ]
Reviewers: scott.linder, t-tye, sunfish, arsenm, jhenderson, MaskRay, aardappel
Reviewed By: scott.linder, jhenderson, aardappel
Subscribers: bcain, dschuff, wdng, tpr, sbc100, jgravelle-google, hiraditya, aheejin, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80512
Summary:
Inline functions in Type.h depended upon inline functions isVectorTy and
getScalarType defined in DerivedTypes.h. Reimplement these functions in
Type.h in terms of Type
Reviewers: rengolin, efriedma, echristo, c-rhodes, david-arm
Reviewed By: echristo
Subscribers: tschuett, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81684
Reported on IRC, the tutorial code at the bottom of the page correctly
namespaces the FunctionPassManager, but the as-you-go code does not.
This patch adds the namespace to those.
Similar to a recent change to the X86 backend, this changes things so
that we always produce a reduction intrinsics for all reduction types,
not just the legal ones. This gives a better chance in the backend to
custom lower them to something more suitable for MVE. Especially for
something like fadd the in-order reduction produced during DAG lowering
is already better than the shuffles produced in the midend, and we can
do even better with a bit of custom lowering.
Differential Revision: https://reviews.llvm.org/D81398
This allows running Lit tests that run ssh without having to manually
enter a password (which is inconvenient), by just having ssh-agent
setup properly when running the test suite.
We select all of these via patterns now, so there's no reason to disallow this.
Update select-dup.mir to show that we correctly select the smaller types.
Differential Revision: https://reviews.llvm.org/D81322
This was making it so that the instructions weren't eliminated in
select-rev.mir and select-trn.mir despite not being used.
Update the tests accordingly.
Differential Revision: https://reviews.llvm.org/D81492
Here, I am proposing to add an special case for massv powf4/powd2 function (SIMD counterpart of powf/pow function in MASSV library) in MASSV pass to get later optimizations like conversion from pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's in the DAGCombiner in vector float case. My reason for doing this is: the optimized pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's is faster than powf4/powd2 on P8 and P9.
In case MASSV functions is called, and if the exponent of pow is 0.75 or 0.25, we will get the sequence of sqrt's and if exponent is not 0.75 or 0.25 we will get the appropriate MASSV function.
Reviewed By: steven.zhang
Tags: #LLVM #PowerPC
Differential Revision: https://reviews.llvm.org/D80744
In TestRunner.py, D78589 extracts a `_parseKeywords` function from
`parseIntegratedTestScript`, which then expects `_parseKeywords` to
always return a list of keyword/value pairs. However, the extracted
code sometimes returns an unresolved `lit.Test.Result` on a keyword
parsing error, which then produces a stack dump instead of the
expected diagnostic.
This patch fixes that, makes the style of those diagnostics more
consistent, and extends the lit test suite to cover them.
Reviewed By: ldionne
Differential Revision: https://reviews.llvm.org/D81665
Refactor redzone size calculation. This will simplify changing the
redzone size calculation in future.
Note that AddressSanitizer.cpp violates the latest LLVM style guide in
various ways due to capitalized function names. Only code related to the
change here was changed to adhere to the style guide.
No functional change intended.
Reviewed By: andreyknvl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81367
Noticed while trying to cleanup D66004 - if a shuffle operand came from a scalar, we're better off using INSERTPS vs UNPCKLPS as this is more likely to load fold later on. It also matches our existing BUILD_VECTOR lowering.
We can extend this to other PINSRB/D/Q/W cases in the future as the need arises.