This adds a new intrinsic to support the rdpid instruction. The implementation is a bit awkward because the intrinsic is defined as always returning 32 bits, but the assembler support thinks the instruction produces a 64-bit register in 64-bit mode, even though it really just zeros the upper 32 bits. So I had to add separate patterns where 64-bit mode uses an extract_subreg.
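For reference, a minimal C usage sketch (assuming the usual Intel wrapper
_rdpid_u32 from immintrin.h; the function name here is made up):

  #include <immintrin.h>

  // Read the processor ID; compile with -mrdpid. The result is 32 bits even
  // though the underlying instruction writes a 64-bit register in 64-bit mode.
  unsigned read_processor_id(void) {
    return _rdpid_u32();
  }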
Differential Revision: https://reviews.llvm.org/D42205
llvm-svn: 322910
We did this for inline call site line tables, but we hadn't done it for
regular function line tables yet. This patch copies that logic from
encodeInlineLineTable.
llvm-svn: 322905
Summary:
This patch implements d16 support for image load, image store and image sample intrinsics.
Reviewers: Matt, Brian.
Differential Revision: https://reviews.llvm.org/D3991
llvm-svn: 322903
Previously, these parts weren't ever checked. The label patterns
need to be extended to match successfully on Mach-O.
Differential Revision: https://reviews.llvm.org/D42126
llvm-svn: 322900
The compilation directory has always been #0, but as of DWARF v5 it is
explicitly listed in the line-table section instead of implicitly
being a reference to the compile_unit DIE's DW_AT_comp_dir attribute.
This means the dumper should number the dumped array starting with 0
or 1 depending on the DWARF version of the line table.
References in the generated DWARF are correct; it's just the dumper
that was wrong. Also, some assembler-coded tests were similarly
confused about directory numbers.
llvm-svn: 322884
r322086 removed the trailing information describing reg classes for each
register.
This patch prints the register class next to every register when
individual operands/instructions/basic blocks are printed. When
dumping MIR or printing a full function, it is not printed by default.
Differential Revision: https://reviews.llvm.org/D42239
llvm-svn: 322867
LTO sets dso_local as an optimization, so don't clear it.
This avoids clearing it from undefined hidden symbols, which would then
fail the verifier.
llvm-svn: 322814
For example, a build_vector of i64 elements bitcast from v2i32 can be turned into a concat_vectors of the v2i32 vectors followed by a bitcast to a vXi64 type.
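As a rough C illustration of the kind of input that hits this combine (GNU
vector extensions; the types and function name are made up):

  typedef int v2i32 __attribute__((vector_size(8)));
  typedef long long v2i64 __attribute__((vector_size(16)));

  // Each 64-bit lane of the result is just a bitcast of a v2i32 value, so the
  // build_vector of the two i64s can become concat_vectors(a, b) + bitcast.
  v2i64 combine(v2i32 a, v2i32 b) {
    long long lo, hi;
    __builtin_memcpy(&lo, &a, sizeof lo);
    __builtin_memcpy(&hi, &b, sizeof hi);
    return (v2i64){lo, hi};
  }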
Differential Revision: https://reviews.llvm.org/D42090
llvm-svn: 322811
This is similar to r322317, but for visibility. It is not as neat
because we have to special-case extern_weak.
The idea is the same as the previous change: make the transition to
explicit dso_local easier for the frontends. With this, they only have
to add dso_local to symbols where we need some external information to
decide if they are dso_local (like being part of an ELF executable).
llvm-svn: 322806
Right now, it is not possible to run MachineCSE in the middle of the
GlobalISel pipeline. Being able to run generic optimizations between the
core passes of GlobalISel was one of the goals of the new ISel framework.
This is the first attempt to do it.
The problem is that the MachineCSE pass assumes all register operands have a
register class, which, in the GlobalISel context, won't be true until after the
InstructionSelect pass. The reason for this behaviour is that before
replacing one virtual register with another, the MachineCSE pass (and most of
the other machine optimization passes) must check whether the virtual registers'
constraints have a (sufficiently large) intersection, and constrain the
resulting register appropriately if such an intersection exists.
GlobalISel extends the representation of such constraints from just a
register class to a triple (low-level type, register bank, register
class).
This commit adds MachineRegisterInfo::constrainRegAttrs method that extends
MachineRegisterInfo::constrainRegClass to such a triple.
The idea is that going forward we should use:
- RegisterBankInfo::constrainGenericRegister within GlobalISel's
InstructionSelect pass
- MachineRegisterInfo::constrainRegClass within SelectionDAG ISel
- MachineRegisterInfo::constrainRegAttrs everywhere else, regardless of
the target and instruction selector it uses.
Patch by Roman Tereshin. Thanks!
llvm-svn: 322805
Before, it wasn't possible to get backtraces inside outlined functions. This
commit adds DISubprograms to the IR functions created by the outliner which
makes this possible. Also attached a test that ensures that the produced
debug information is correct. This is useful to users that want to debug
outlined code.
llvm-svn: 322789
Every known PE COFF target emits /EXPORT: linker flags into a .drectve
section. The AsmPrinter should handle this.
While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.
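For illustration, a small C example of where such a flag comes from (the
function name is made up):

  // A dllexport definition makes the compiler record an /EXPORT: flag for the
  // function in the .drectve section; the AsmPrinter now emits each such flag
  // with its own .ascii directive.
  __declspec(dllexport) int exported_fn(void) { return 42; }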
llvm-svn: 322788
Summary:
-hwasan-mapping-offset defines the non-zero shadow base address.
-hwasan-kernel disables calls to __hwasan_init in module constructors.
Unlike ASan, -hwasan-kernel does not force callback instrumentation.
This is controlled separately with -hwasan-instrument-with-calls.
Reviewers: kcc
Subscribers: srhines, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42141
llvm-svn: 322785
The code wasn't zero-extending correctly, so the comparison could
spuriously fail.
Adds some AArch64 tests to cover this case.
Inspired by D41791.
Differential Revision: https://reviews.llvm.org/D41798
llvm-svn: 322767
It appears that we haven't been prioritizing rules that contain nested
instructions properly. InstructionOperandMatcher didn't override
isHigherPriorityThan so it never compared the instructions/operands/predicates
inside nested instructions.
Fixes PR35926. Thanks to Diana Picus for the bug report.
llvm-svn: 322754
Get rid of DEBUG_FUNCTION_NAME symbols. When we actually have debug
data, maybe we'll want somewhere to put it... but having a symbol
that just stores the name of another symbol seems odd.
It means you have multiple Symbols with the same name, one
containing the actual function and another containing the name!
Store the names in a vector on the WasmObjectFile when reading
them in. Also stash them on the WasmFunctions themselves.
The names are //not// "symbol names" or aliases or anything,
they're just the name that a debugger should show against the
function body itself. NB. The WasmObjectFile stores them so that
they can be exported in the YAML losslessly, and hence the tests
can be precise.
Enforce that the CODE section has been read in before reading
the "names" section. Requires minor adjustment to some tests.
Patch by Nicholas Wilson!
Differential Revision: https://reviews.llvm.org/D42075
llvm-svn: 322741
Trying to link
__attribute__((weak, visibility("hidden"))) extern int foo;
int *main(void) {
  return &foo;
}
on OS X fails with
ld: 32-bit RIP relative reference out of range (-4294971318 max is +/-2GB): from _main (0x100000FAB) to _foo@0x00001000 (0x00000000) in '_main' from test.o for architecture x86_64
The problem is that 0 cannot be computed as a fixed difference from
%rip. Exactly the same issue exists on ELF, and we can use the same
solution.
llvm-svn: 322739
This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG.
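A minimal C sketch of the kind of overflow-checked multiply involved (the
function name is made up):

  #include <stdbool.h>
  #include <stdint.h>

  // Returns true on overflow, otherwise stores the product. This lowers to a
  // multiply-with-overflow operation, which is what the SelectionDAG-level
  // optimization looks at.
  bool checked_mul(uint64_t a, uint64_t b, uint64_t *out) {
    return __builtin_mul_overflow(a, b, out);
  }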
Differential revision: https://reviews.llvm.org/D40922
llvm-svn: 322738
The ARM backend contains code that tries to optimize compares by replacing them with an existing instruction that sets the flags the same way. This allows it to replace a "cmp" with an "adds", generalizing the code that replaces "cmp" with "sub". It also heuristically disables sinking of instructions that could potentially be used to replace compares (currently only if they're next to each other).
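A small C example of the shape being targeted (the function name is made up):

  // The addition already sets the flags the comparison against zero needs, so
  // the backend can emit "adds" and drop the separate "cmp".
  int add_and_test(int a, int b) {
    int c = a + b;
    if (c == 0)
      return 1;
    return c;
  }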
Differential revision: https://reviews.llvm.org/D38378
llvm-svn: 322737
If we are splatting pairs of 32-bit elements, we can use a 64-bit broadcast to get the job done.
We could probably do this with other sizes too, for example four 16-bit elements. Or we could broadcast pairs of 16-bit elements using a 32-bit element broadcast. But I've left that as a future improvement.
I've also restricted this to AVX2 only because we can only broadcast loads under AVX.
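As a rough C illustration (GNU vector extensions; the type and function name
are made up, compile for AVX2):

  typedef int v8i32 __attribute__((vector_size(32)));

  // The repeating {a, b} pair can be materialized with a single 64-bit element
  // broadcast instead of building each 32-bit lane separately.
  v8i32 splat_pair(int a, int b) {
    return (v8i32){a, b, a, b, a, b, a, b};
  }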
Differential Revision: https://reviews.llvm.org/D42086
llvm-svn: 322730
We legalize selects of masks with scalar conditions using a bitcast to an integer type. But if we are in 32-bit mode we can't convert v64i1 to i64. So instead split the v64i1 to v32i1 and concat it back together. Each half will then be legalized by bitcasting to i32 which is fine.
The test case is a little indirect. If we have the v64i1 select in IR it will get legalized by legalize vector ops which has a run of type legalization after it. That type legalization run is able to fix this i64 bitcast. So in order to avoid that we need a build_vector of a splat which legalize vector ops will ignore. Legalize DAG will then turn that into a select via LowerBUILD_VECTORvXi1. And the select will get legalized. In this case there is no type legalizer run to cleanup the bitcast.
This fixes PR35972.
llvm-svn: 322724
Add handling for the coldcc calling convention and a pass to mark
candidates with the coldcc attribute.
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress-test
the implementation by marking all static, directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non-cold functions.
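For illustration, the kind of C code the -ppc-enable-coldcc heuristic is aimed
at (the names are made up):

  // A static, directly called helper that is cold at its only call site; the
  // pass can give it the coldcc convention, shrinking the set of registers the
  // caller has to treat as clobbered.
  static void __attribute__((noinline)) report_error(int code) {
    (void)code;
    __builtin_trap();
  }

  void process(int x) {
    if (__builtin_expect(x < 0, 0))
      report_error(x);
  }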
Differential Revision: https://reviews.llvm.org/D38413
llvm-svn: 322721
BRCTH is capable of a long branch which needs to be recognized during branch
relaxation. This is done by checking for ExtraRelaxSize == 0.
Review: Ulrich Weigand
llvm-svn: 322688
Summary:
Loading a vector of 4 half-precision FP values sometimes results in an LD1
of 2 single-precision FP values plus a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.
In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP values, thus avoiding the
subsequent reversal.
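A small C example of such a load (assuming arm_neon.h's vld1_f16; the
function name is made up):

  #include <arm_neon.h>

  // On big endian this should be an ld1 of four .h lanes so the element order
  // is preserved, rather than a 2 x 32-bit load followed by a reversal.
  float16x4_t load4(const float16_t *p) {
    return vld1_f16(p);
  }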
Reviewers: craig.topper, jmolloy, olista01
Reviewed By: olista01
Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D41863
llvm-svn: 322663
I was comparing the demanded-bits implementations between InstCombine
and TargetLowering as part of investigating questions in D42088 and
noticed that this was wrong in IR. We were losing all of the prior
known bits when we got back to the 'zext'.
llvm-svn: 322662
When the compressed instruction set is enabled, the 16-bit c.nop can be
generated if necessary.
Differential Revision: https://reviews.llvm.org/D41221
Patch by Shiva Chen.
llvm-svn: 322658