Narrowing stores when the target doesn't support the narrow version
forces the target to expand into a load-modify-store sequence, which
is highly suboptimal. The information narrowing throws away (legality
of the inverse transform) is hard to re-analyze. If the target doesn't
support a store of the narrow type, don't narrow even in pre-legalize
mode.
No test as this is DAGCombiner and depends on target bits.
llvm-svn: 370576
Use a { iN undef, i1 false } struct as the base, and only insert
the first operand, instead of using { iN undef, i1 undef } as the
base and inserting both. This is the same as what we do in InstCombine.
Differential Revision: https://reviews.llvm.org/D67034
llvm-svn: 370573
Restructured the code a little bit in preparation for adding
UMULFIXSAT. I think it will be easier to understand the code
if not interleaving the codegen for signed/unsigned/saturated
cases that much.
llvm-svn: 370569
Some saturation examples for llvm.smul.fix.sat were not showing
the correct result. I've adjusted the operands to make sure that
we actually trigger overflow in those examples.
llvm-svn: 370566
1. zlib::compress accept &size_t but the param is an uint64_t.
2. Some systems don't have zlib installed. Don't use compression by default.
llvm-svn: 370564
cold versus function being newly added.
This is the second half of https://reviews.llvm.org/D66374.
Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.
During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.
Differential Revision: https://reviews.llvm.org/D66766
llvm-svn: 370563
Summary:
Adds clang builtins and LLVM intrinsics for these experimental
instructions. They are not implemented in engines yet, but that is ok
because the user must opt into using them by calling the builtins.
Reviewers: aheejin, dschuff
Reviewed By: aheejin
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D67020
llvm-svn: 370556
In r370135 I committed a temporary workaround for the sanitized bot to
not set (DY)LD_LIBRARY_PATH when (DY)LD_INSERT_LIBRARIES was set.
Setting (DY)LD_LIBRARY_PATH is only necessary for (standalone)
shared-library builds, so a better solution is to only set the
environment variable when necessary.
Differential revision: https://reviews.llvm.org/D67012
llvm-svn: 370549
This is an updated version of https://reviews.llvm.org/D66909 to fix PR42605.
Basically, current phi translatation translates an old value number to an new
value number for a call instruction based on the literal equality of call
expression, without verifying there is no clobber in between. This is incorrect.
To get a finegrain check, use MachineDependence analysis to do the job. However,
this is still not ideal. Although given a call instruction,
`MemoryDependenceResults::getCallDependencyFrom` returns identical call
instructions without clobber in between using MemDepResult with its DepType to
be `Def`. However, identical is too strict here and we want it to be relaxed a
little to consider phi-translation -- callee is the same, param operands can be
different. That means changing the semantic of `MemDepResult::Def` and I don't
know the potential impact.
So currently the patch is still conservative to only handle
MemDepResult::NonFuncLocal, which means the current call has no function local
clobber. If there is clobber, even if the clobber doesn't stand in between the
current call and the call with the new value, we won't do phi-translate.
Differential Revision: https://reviews.llvm.org/D67013
llvm-svn: 370547
Also improve assembler parser register validation for .seh_ directives.
This requires moving X86-specific seh directive handling into the x86
backend, which addresses some assembler FIXMEs.
Differential Revision: https://reviews.llvm.org/D66625
llvm-svn: 370533
Users have complained llvm.trap produce two ud2 instructions on Win64,
one for the trap, and one for unreachable. This change fixes that.
TrapUnreachable was added and enabled for Win64 in r206684 (April 2014)
to avoid poorly understood issues with the Windows unwinder.
There seem to be two major things in play:
- the unwinder
- C++ EH, _CxxFrameHandler3 & co
The unwinder disassembles forward from the return address to scan for
epilogues. Inserting a ud2 had the effect of stopping the unwinder, and
ensuring that it ran the EH personality function for the current frame.
However, it's not clear what the unwinder does when the return address
happens to be the last address of one function and the first address of
the next function.
The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out
what the current EH state number is. It does this by consulting the
ip2state table, which maps from PC to state number. This seems to go
wrong when the return address is the last PC of the function or catch
funclet.
I'm not sure precisely which system is involved here, but in order to
address these real or hypothetical problems, I believe it is enough to
insert int3 after a call site if it would otherwise be the last
instruction in a function or funclet. I was able to reproduce some
similar problems locally by arranging for a noreturn call to appear at
the end of a catch block immediately before an unrelated function, and I
confirmed that the problems go away when an extra trailing int3
instruction is added.
MSVC inserts int3 after every noreturn function call, but I believe it's
only necessary to do it if the call would be the last instruction. This
change inserts a pseudo instruction that expands to int3 if it is in the
last basic block of a function or funclet. I did what I could to run the
Microsoft compiler EH tests, and the ones I was able to run showed no
behavior difference before or after this change.
Differential Revision: https://reviews.llvm.org/D66980
llvm-svn: 370525
The sequence between the function call and the asm start
may change without affecting what this test is looking for,
but we should have a better idea about what that sequence
looks like.
llvm-svn: 370518
gcc produces the error:
error: specialization of
‘template<class T, class Enable> struct llvm::yaml::ScalarTraits’ in
different namespace
For all specializations outside of llvm::yaml. So I added llvm::yaml to these
specializations to fix the errors on the bots building with gcc (/usr/bin/c++).
llvm-svn: 370510
The Hexagon itineraries are cunningly crafted such that functional units between
itineraries do not clash. Because all itineraries are bundled into the same DFA,
a functional unit index clash would cause an incorrect DFA to be generated.
A workaround for this is to ensure all itineraries declare the universe of all
possible functional units, but this isn't ideal for three reasons:
1) We only have a limited number of FUs we can encode in the packetizer, and
using the universe causes us to hit the limit without care.
2) Silent codegen faults are bad, and careful triage of the FU list shouldn't
be required.
3) Smooshing all itineraries into the same automaton allows combinations of
instruction classes that cannot exist, which bloats the table.
A simple solution is to allow "namespacing" packetizers.
Differential Revision: https://reviews.llvm.org/D66940
llvm-svn: 370508
Something weird happened with the v2i64/v2f64 test cases which
don't use broadcast. So they should already be hoisted, but
weren't in the version I submitted in r370506. This fixes that.
Not sure if something changed or I screwed up.
llvm-svn: 370507
MachineLICM is able to unfold loads to move an invariant load out
a loop, but X86 infrastructure currently lacks the ability to do
this when avx512 embedded broadcasting is used.
This test adds examples for the basic float point operations,
add, mul, and, or, and xor.
llvm-svn: 370506
Summary:
This is brought up in
https://reviews.llvm.org/D64662?id=209923#inline-599490
CFI information are non-relevant to quite some testcases,
we should get rid of checking them when its unecessary.
This patch avoid generating cfi info in testcases that are not
testing prolog/epilog or exception handling.
Reviewers: kbarton, hfinkel, nemanjai, #powerpc
Reviewed By: hfinkel
Subscribers: MaskRay, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67016
llvm-svn: 370505
This is the first stage in refactoring the pipeliner and making it more
accessible for backends to override and control. This separates the logic and
state required to *emit* a scheudule from the logic that *computes* and
validates a schedule.
This will enable (a) new schedule emitters and (b) new modulo scheduling
implementations to coexist.
NFC.
Differential Revision: https://reviews.llvm.org/D67006
llvm-svn: 370500
This tool merges interface stub files to produce a merged interface stub file
or a stub library. Currently it for stub library generation it can produce an
ELF .so stub file, or a TBD file (experimental). It will be used by the clang
-emit-interface-stubs compilation pipeline to merge and assemble the per-CU
stub files into a stub library.
The new IFS format is as follows:
--- !experimental-ifs-v1
IfsVersion: 1.0
Triple: <llvm triple>
ObjectFileFormat: <ELF | TBD>
Symbols:
_ZSymbolName: { Type: <type>, etc... }
...
Differential Revision: https://reviews.llvm.org/D66405
llvm-svn: 370499
Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case.
Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests
llvm-svn: 370497
gcc and icc pass these types in zmm registers in zmm registers.
This patch implements a quick hack to override the register
type before calling convention handling to one that is legal.
Longer term we might want to do something similar to 256-bit
integer registers on AVX1 where we just split all the operations.
Fixes PR42957
Differential Revision: https://reviews.llvm.org/D66708
llvm-svn: 370495
Summary:
MTE allows memory access to bypass tag check iff the address argument
is [SP, #imm]. This change takes advantage of this to demote uses of
tagged addresses to regular FrameIndex operands, reducing register
pressure in large functions.
MO_TAGGED target flag is used to signal that the FrameIndex operand
refers to memory that might be tagged, and needs to be handled with
care. Such operand must be lowered to [SP, #imm] directly, without a
scratch register.
The transformation pass attempts to predict when the offset will be
out of range and disable the optimization.
AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case
this prediction has been wrong, but it is quite inefficient and should
be avoided.
Reviewers: pcc, vitalybuka, ostannard
Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66457
llvm-svn: 370490
I'm looking at unfolding broadcast loads on AVX512 which will
require refactoring this code to select broadcast opcodes instead
of regular load/stores in some cases. Merging them to avoid
further complicating their interfaces.
llvm-svn: 370484
Summary:
Instead of recomputing information for call sites we now use the
function information directly. This is always valid and once we have
call site specific information we can improve here.
This patch also bootstraps attributes that are created on-demand through
an initial update call. Information that is known will then directly be
available in the new attribute without causing an iteration delay.
The tests show how this improves the iteration count.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66781
llvm-svn: 370480