This isn't the ideal fix (use FMF on the select), but it's still an
improvement until we have better FMF propagation to selects and other
FP math operators.
I don't think there's much risk of regression from this change by
not including the FMF on the fcmp any more. The nsz/nnan FMF
should be the same on the fcmp and the fneg (fsub) because they
have the same operand.
This works around the most glaring FMF logical inconsistency cited
in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
llvm-svn: 362909
This is another step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.
By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.
This is a continuation of D62979 / rL362879.
llvm-svn: 362903
This is the second part of the commit fixing PR38917 (hoisting
partitially redundant machine instruction). Most of PRE (partitial
redundancy elimination) and CSE work is done on LLVM IR, but some of
redundancy arises during DAG legalization. Machine CSE is not enough
to deal with it. This simple PRE implementation works a little bit
intricately: it passes before CSE, looking for partitial redundancy
and transforming it to fully redundancy, anticipating that the next
CSE step will eliminate this created redundancy. If CSE doesn't
eliminate this, than created instruction will remain dead and eliminated
later by Remove Dead Machine Instructions pass.
The third part of the commit is supposed to refactor MachineCSE,
to make it more clear and to merge MachinePRE with MachineCSE,
so one need no rely on further Remove Dead pass to clear instrs
not eliminated by CSE.
First step: https://reviews.llvm.org/D54839
Fixes llvm.org/PR38917
This is fixed recommit of r361356 after PowerPC64 multistage build failure.
llvm-svn: 362901
Pointers that are in-bounds (either through dereferenceable_or_null or
thorough a getelementptr inbounds) cannot be captured with a comparison
against null. There is no way to construct a pointer that is still in
bounds but also NULL.
This helps safe languages that insert null checks before load/store
instructions. Without this patch, almost all pointers would be
considered captured even for simple loads. With this patch, an icmp with
null will not be seen as escaping as long as certain conditions are met.
There was a lot of discussion about this patch. See the Phabricator
thread for detals.
Differential Revision: https://reviews.llvm.org/D60047
llvm-svn: 362900
This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp.
In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of
the fcmp. But I'm leaving that clause in until we're more confident that we can stop
relying on fcmp's FMF.
By using the more general "isKnownNeverNaN()", we gain a simplification shown on the
tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN).
On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction
in addition to the FMF on the fcmp.
I'll update the 'ult' case below here as a follow-up assuming no problems here.
Differential Revision: https://reviews.llvm.org/D62979
llvm-svn: 362879
Types such as float and i64's do not have legal loads in Thumb1, but will still
be loaded with a LDR (or potentially multiple LDR's). As such we can treat the
cost of addressing mode calculations the same as an i32 and get some optimisation
benefits.
Differential Revision: https://reviews.llvm.org/D62968
llvm-svn: 362874
Now with MVE being added, we can add the vector addressing mode costs for it.
These are generally imm7 multiplied by the size of the type being loaded /
stored.
Differential Revision: https://reviews.llvm.org/D62967
llvm-svn: 362873
The fp16 version of VLDR takes a imm8 multiplied by 2. This updates the costs
to account for those, and adds extra testing. It is dependant upon hasFPRegs16
as this is what the load/store instructions require.
Differential Revision: https://reviews.llvm.org/D62966
llvm-svn: 362872
We are starting to add an entirely separate vector architecture to the ARM
backend. To do that we need at least some separation between the existing NEON
and the new MVE code. This patch just goes through the Neon patterns and
ensures that they are predicated on HasNEON, giving MVE a stable place to start
from.
No tests yet as this is largely an NFC, and we don't have the other target that
will treat any of these intructions as legal.
Differential Revision: https://reviews.llvm.org/D62945
llvm-svn: 362870
This patch aims to reduce spilling and register moves by using the 3-address
versions of instructions per default instead of the 2-address equivalent
ones. It seems that both spilling and register moves are improved noticeably
generally.
Regalloc hints are passed to increase conversions to 2-address instructions
which are done in SystemZShortenInst.cpp (after regalloc).
Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are
the same), foldMemoryOperandImpl() can no longer trivially fold a spilled
source register since the reg/reg instruction is now 3-address. In order to
remedy this, new 3-address pseudo memory instructions are used to perform the
folding only when the dst and lhs virtual registers are known to be allocated
to the same physreg. In order to not let MachineCopyPropagation run and
change registers on these transformed instructions (making it 3-address), a
new target pass called SystemZPostRewrite.cpp is run just after
VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
If it would have been possibe to insert a COPY instruction and change a
register operand (convert to 2-address) in foldMemoryOperandImpl() while
trusting that the caller (e.g. InlineSpiller) would update/repair the
involved LiveIntervals, the solution involving pseudo instructions would not
have been needed. This is perhaps a potential improvement (see Phabricator
post).
Common code changes:
* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a
target pass immediately before MachineCopyPropagation.
* VirtRegMap is passed as an argument to foldMemoryOperand().
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D60888
llvm-svn: 362868
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.
No test changes as it's NFC.
Differential Revision: https://reviews.llvm.org/D62745
llvm-svn: 362857
Summary: Move some code around, in preparation for later fixes
to the non-integral addrspace handling (D59661)
Patch By Jameson Nash <jameson@juliacomputing.com>
Reviewed By: reames, loladiro
Differential Revision: https://reviews.llvm.org/D59729
llvm-svn: 362853
Summary:
The cleanup in D62751 introduced a compile-time regression due to the way DT updates are performed.
Add all insert edges then all delete edges in DTU to match the previous compile time.
Compile time on the test provided by @mstorsjo before and after this patch on my machine:
113.046s vs 35.649s
Repro: clang -target x86_64-w64-mingw32 -c -O3 glew-preproc.c; on https://martin.st/temp/glew-preproc.c.
Reviewers: kuhar, NutshellySima, mstorsjo
Subscribers: jlebar, mstorsjo, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62981
llvm-svn: 362839
rather than two callbacks.
The asynchronous lookup API (which the synchronous lookup API wraps for
convenience) used to take two callbacks: OnResolved (called once all requested
symbols had an address assigned) and OnReady to be called once all requested
symbols were safe to access). This patch updates the asynchronous lookup API to
take a single 'OnComplete' callback and a required state (SymbolState) to
determine when the callback should be made. This simplifies the common use case
(where the client is interested in a specific state) and will generalize neatly
as new states are introduced to track runtime initialization of symbols.
Clients who were making use of both callbacks in a single query will now need to
issue two queries (one for SymbolState::Resolved and another for
SymbolState::Ready). Synchronous lookup API clients who were explicitly passing
the WaitOnReady argument will now need neeed to pass a SymbolState instead (for
'WaitOnReady == true' use SymbolState::Ready, for 'WaitOnReady == false' use
SymbolState::Resolved). Synchronous lookup API clients who were using default
arugment values should see no change.
llvm-svn: 362832
When we call checkResourceLimit in bumpCycle or bumpNode, and we
know the resource count has just reached the limit (the equations
are equal). We should return true to mark that we are resource
limited for next schedule, or else we might continue to schedule
in favor of latency for 1 more schedule and create a schedule that
actually overbook the resource.
When we call checkResourceLimit to estimate the resource limite before
scheduling, we don't need to return true even if the equations are
equal, as it shouldn't limit the schedule for it .
Differential Revision: https://reviews.llvm.org/D62345
llvm-svn: 362805
This could fail, which looked concerning. However nothing was actually
using the results of this. I assume this was intended to use the
anti-feature of analyzeBranch of removing instructions, but wasn't
actually calling it with AllowModify = true.
Fixes bug 42162.
llvm-svn: 362800
lib.exe doesn't allow creating .lib files with object files that have
differing machine types. Update llvm-lib to match.
The motivation is to make it possible to infer the machine type of a
.lib file in lld, so that it can warn when e.g. a 32-bit .lib file is
passed to a 64-bit link (PR38965).
Fixes PR38782.
Differential Revision: https://reviews.llvm.org/D62913
llvm-svn: 362798
This is a potentially large perf win for AVX1 targets because of the way we
auto-vectorize to 256-bit but then expect the backend to legalize/optimize
for the half-implemented AVX1 ISA.
On the motivating example from PR37428 (even though this patch doesn't solve
the vector shift issue):
https://bugs.llvm.org/show_bug.cgi?id=37428
...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell)
because we eliminate the remaining 256-bit vblendv ops.
I added comments on a couple of tests that require further work. If we have
256-bit logic ops separating the vselect and extract, we should probably narrow
everything to 128-bit, but that requires a larger pattern match.
Differential Revision: https://reviews.llvm.org/D62969
llvm-svn: 362797
Change D60691 caused some knock-on failures that weren't caught by the
existing tests. Firstly, selecting a CPU that should have had a
restricted FPU (e.g. `-mcpu=cortex-m4`, which should have 16 d-regs
and no double precision) could give the unrestricted version, because
`ARM::getFPUFeatures` returned a list of features including subtracted
ones (here `-fp64`,`-d32`), but `ARMTargetInfo::initFeatureMap` threw
away all the ones that didn't start with `+`. Secondly, the
preprocessor macros didn't reliably match the actual compilation
settings: for example, `-mfpu=softvfp` could still set `__ARM_FP` as
if hardware FP was available, because the list of features on the cc1
command line would include things like `+vfp4`,`-vfp4d16` and clang
didn't realise that one of those cancelled out the other.
I've fixed both of these issues by rewriting `ARM::getFPUFeatures` so
that it returns a list that enables every FP-related feature
compatible with the selected FPU and disables every feature not
compatible, which is more verbose but means clang doesn't have to
understand the dependency relationships between the backend features.
Meanwhile, `ARMTargetInfo::handleTargetFeatures` is testing for all
the various forms of the FP feature names, so that it won't miss cases
where it should have set `HW_FP` to feed into feature test macros.
That in turn caused an ordering problem when handling `-mcpu=foo+bar`
together with `-mfpu=something_that_turns_off_bar`. To fix that, I've
arranged that the `+bar` suffixes on the end of `-mcpu` and `-march`
cause feature names to be put into a separate vector which is
concatenated after the output of `getFPUFeatures`.
Another side effect of all this is to fix a bug where `clang -target
armv8-eabi` by itself would fail to set `__ARM_FEATURE_FMA`, even
though `armv8` (aka Arm v8-A) implies FP-Armv8 which has FMA. That was
because `HW_FP` was being set to a value including only the `FPARMV8`
bit, but that feature test macro was testing only the `VFP4FPU` bit.
Now `HW_FP` ends up with all the bits set, so it gives the right
answer.
Changes to tests included in this patch:
* `arm-target-features.c`: I had to change basically all the expected
results. (The Cortex-M4 test in there should function as a
regression test for the accidental double-precision bug.)
* `arm-mfpu.c`, `armv8.1m.main.c`: switched to using `CHECK-DAG`
everywhere so that those tests are no longer sensitive to the order
of cc1 feature options on the command line.
* `arm-acle-6.5.c`: been updated to expect the right answer to that
FMA test.
* `Preprocessor/arm-target-features.c`: added a regression test for
the `mfpu=softvfp` issue.
Reviewers: SjoerdMeijer, dmgreen, ostannard, samparker, JamesNagurne
Reviewed By: ostannard
Subscribers: srhines, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D62998
llvm-svn: 362791
Summary:
This allows some integer bitwise operations to instead be performed by
hardware fp instructions. This is correct because the RISC-V spec
requires the F and D extensions to use the IEEE-754 standard
representation, and fp register loads and stores to be bit-preserving.
This is tested against the soft-float ABI, but with hardware float
extensions enabled, so that the tests also ensure the optimisation also
fires in this case.
Reviewers: asb, luismarques
Reviewed By: asb
Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62900
llvm-svn: 362790
Summary:
This patch fixes a bug in the assembler that permitted a type suffix on
predicate registers when not expected. For instance, the following was
previously valid:
faddv h0, p0.q, z1.h
This bug was present in all SVE instructions containing predicates with
no type suffix and no predication form qualifier, i.e. /z or /m. The
latter instructions are already caught with an appropiate error message
by the assembler, e.g.:
.text
<stdin>:1:13: error: not expecting size suffix
cmpne p1.s, p0.b/z, z2.s, 0
^
A similar issue for SVE vector registers was fixed in:
https://reviews.llvm.org/D59636
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D62942
llvm-svn: 362780
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
llvm-svn: 362774
In r356860, the legalization logic for BSWAP was modified to ISD::ROTL,
rather than the old ISD::{SHL, SRL, OR} nodes.
This works fine on AVR for 8-bit rotations, but 16-bit rotations are
currently unimplemented - they always trigger an assertion error in the
AVRExpandPseudoInsts pass ("RORW unimplemented").
This patch instructions the legalizer to expand 16-bit rotations into
the previous SHL, SRL, OR pattern it did previously.
This fixes the 'issue-cannot-select-bswap.ll' test. Interestingly, this
test failure seems flaky - it passes successfully on the avr-build-01
buildbot, but fails locally on my Arch Linux install.
llvm-svn: 362773
We should keep the symbol type (STT_GNU_IFUNC) for a local ifunc because
it may result in an IRELATIVE reloc that the dynamic loader will use to
resolve the address at startup time.
There is another problem that is not fixed by this patch: a PC relative
relocation should also create a relocation with the ifunc symbol.
llvm-svn: 362767