1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00
Commit Graph

8312 Commits

Author SHA1 Message Date
Cullen Rhodes
1ae2c0fb16 [AArch64][SME] Add zero instruction
This patch adds the zero instruction for zeroing a list of 64-bit
element ZA tiles. The instruction takes a list of up to eight tiles
ZA0.D-ZA7.D, which must be in order, e.g.

  zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d}
  zero {za1.d,za3.d,za5.d,za7.d}

The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which
are mapped to corresponding 64-bit element tiles in accordance with the
architecturally defined mapping between different element size tiles,
e.g.

  * Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing
    all eight 64-bit element tiles ZA0.D to ZA7.D.
  * Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D.

The preferred disassembly of this instruction uses the shortest list of
tile names that represent the encoded immediate mask, e.g.

  * An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and
    ZA5.D is disassembled as {ZA0.S, ZA1.S}.
  * An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and
    ZA6.D is disassembled as {ZA0.H}.
  * An all-ones immediate is disassembled as {ZA}.
  * An all-zeros immediate is disassembled as an empty list {}.

This patch adds the MatrixTileList asm operand and related parsing to support
this.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105575
2021-07-27 08:35:45 +00:00
Lei Huang
bbc51b9f17 [PowerPC]Add addex instruction definition and MC tests
Add td definitions and asm/disasm tests for the addex instruction introduced in
ISA 3.0.

Reviewed By: nemanjai, amyk, NeHuang

Differential Revision: https://reviews.llvm.org/D106666
2021-07-26 14:55:38 -05:00
Michael Liao
b2d24acf01 [amdgpu] Add 64-bit PC support when expanding unconditional branches.
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D106445
2021-07-26 14:50:30 -04:00
Ulrich Weigand
81afdbc83c [SystemZ] Add support for new cpu architecture - arch14
This patch adds support for the next-generation arch14
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Detection of arch14 as host processor.
- Assembler/disassembler support for new instructions.
- New LLVM intrinsics for certain new instructions.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining  __VEC__ == 10304.

Note: No currently available Z system supports the arch14
architecture.  Once new systems become available, the
official system name will be added as supported -march name.
2021-07-26 16:57:28 +02:00
Cullen Rhodes
f81ad3ab04 [AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc()
Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106635
2021-07-26 09:36:34 +00:00
Thomas Johnson
4866aceb76 [ARC] Add tablegen definition for the Find Leading Set (FLS) instruction
Differential Revision: https://reviews.llvm.org/D106602
2021-07-22 17:42:25 -07:00
Thomas Johnson
60773756e3 [ARC] Add disassembly for the conditioned RSUB immediate instruction
Differential Revision: https://reviews.llvm.org/D106497
2021-07-22 11:34:39 -07:00
Cullen Rhodes
5202ca9718 [AArch64][SME] Improve diagnostic for vector select register
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D106540
2021-07-22 13:46:40 +00:00
Simon Tatham
bc23ee33a0 [clang] Use i64 for the !srcloc metadata on asm IR nodes.
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.

!srcloc is generated in clang codegen, and pulled back out by llvm
functions like AsmPrinter::emitInlineAsm that need to report errors in
the inline asm. From there it goes to LLVMContext::emitError, is
stored in DiagnosticInfoInlineAsm, and ends up back in clang, at
BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true
clang::SourceLocation from the integer cookie.

Throughout this code path, it's now 64-bit rather than 32, which means
that if SourceLocation is expanded to a 64-bit type, this error report
won't lose half of the data.

The compiler will tolerate both of i32 and i64 !srcloc metadata in
input IR without faulting. Test added in llvm/MC. (The semantic
accuracy of the metadata is another matter, but I don't know of any
situation where that matters: if you're reading an IR file written by
a previous run of clang, you don't have the SourceManager that can
relate those source locations back to the original source files.)

Original version of the patch by Mikhail Maltsev.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D105491
2021-07-22 10:24:52 +01:00
Carl Ritson
8b2d2affc5 [AMDGPU] Add VReg_192/VReg_224 support for MIMG instructions
Allow MIMG instructions to be selected with 6/7 VGPRs for vaddr.
Previously these were rounded up to VReg_256 this saves VGPRs.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103800
2021-07-22 10:42:15 +09:00
Cullen Rhodes
23e61e0bd4 [AArch64][SME] Support .arch and .arch_extension assembler directives
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105566
2021-07-21 08:40:27 +00:00
Cullen Rhodes
f8b2068905 [AArch64][SME] Add mova instructions
This patch adds the mova instruction to insert/extract an SVE vector
register to/from a ZA tile vector.

The preferred MOV aliases are also implemented.

Depends on D105572.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm, CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105574
2021-07-21 08:20:01 +00:00
Cullen Rhodes
db780333ba [AArch64][SME] Add ldr and str instructions
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D105573
2021-07-21 08:17:13 +00:00
Craig Topper
a2254d3fcc [RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants.
If we need to shift left anyway we might be able to take advantage
of LUI implicitly shifting its immediate left by 12 to cover part
of the shift. This allows us to use more bits of the LUI immediate
to avoid an ADDI.

isDesirableToCommuteWithShift now considers compressed instruction
opportunities when deciding if commuting should be allowed.

I believe this is the same or similar to one of the optimizations
from D79492.

Reviewed By: luismarques, arcbbb

Differential Revision: https://reviews.llvm.org/D105417
2021-07-20 09:22:06 -07:00
Cullen Rhodes
6765229853 [AArch64][SME] Add system registers and related instructions
This patch adds the new system registers introduced in SME:

  - ID_AA64SMFR0_EL1 (ro) SME feature identifier.
  - SMCR_ELx (r/w) streaming mode control register for configuring
    effective SVE Streaming SVE Vector length when the PE is in
    Streaming SVE mode.
  - SVCR (r/w) streaming vector control register, visible at all
    exception levels. Provides access to PSTATE.SM and PSTATE.ZA
    using MSR and MRS instructions.
  - SMPRI_EL1 (r/w) streaming mode execution priority register.
  - SMPRIMAP_EL2 (r/w) streaming mode priority mapping register.
  - SMIDR_EL1 (ro) streaming mode identification register.
  - TPIDR2_EL0 (r/w) for use by SME software to manage per-thread
    SME context.
  - MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for
    labelling memory accesses performed in streaming mode.

Also added in this patch are the SME mode change instructions.
Three MSR immediate instructions are implemented to set or clear
PSTATE.SM, PSTATE.ZA, or both respectively:

  - MSR SVCRSM, #<imm1>
  - MSR SVCRZA, #<imm1>
  - MSR SVCRSMZA, #<imm1>

The following smstart/smstop aliases are also implemented for
convenience:

  smstart    -> MSR SVCRSMZA, #1
  smstart sm -> MSR SVCRSM,   #1
  smstart za -> MSR SVCRZA,   #1

  smstop     -> MSR SVCRSMZA, #0
  smstop sm  -> MSR SVCRSM,   #0
  smstop za  -> MSR SVCRZA,   #0

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105576
2021-07-20 08:06:26 +00:00
Derek Schuff
0c7127a873 [WebAssembly] Generate R_WASM_FUNCTION_OFFSET relocs in debuginfo sections
Debug info sections need R_WASM_FUNCTION_OFFSET_I32 relocs (with FK_Data_4 fixup
kinds) to refer to functions (instead of R_WASM_TABLE_INDEX as is used in data
sections). Usually this is done in a convoluted way, with unnamed temp data
symbols which target the start of the function, in which case
WasmObjectWriter::recordRelocation converts it to use the section symbol
instead. However in some cases the function can actually be undefined; in this
case the dwarf generator uses the function symbol (a named undefined function
symbol) instead. In that case the section-symbol transform doesn't work and we
need to generate the correct reloc type a different way. In this change
WebAssemblyWasmObjectWriter::getRelocType takes the fixup section type into
account to choose the correct reloc type.

Fixes PR50408
Differential Revision: https://reviews.llvm.org/D103557
2021-07-19 14:02:33 -07:00
Wouter van Oortmerssen
8a42745952 [WebAssembly] Support R_WASM_MEMORY_ADDR_TLS_SLEB64 for wasm64
Also fixed TLS tests swapping addr & value in store op
Differential Revision: https://reviews.llvm.org/D106096
2021-07-19 10:22:43 -07:00
Cullen Rhodes
fa9e7701df [AArch64][SME] Add SVE2 instructions added in SME
This patch adds support for the following instructions:

    SCLAMP, UCLAMP, REV, DUP (predicate)

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D105577
2021-07-19 08:03:05 +00:00
Craig Topper
84de2a93a3 [RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions.
If the upper 32 bits are zero and bit 31 is set, we might be able to
use zext.w to fill in the zeros after using an lui and/or addi.

Most of this patch is plumbing the subtarget features into the constant
materialization.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D105509
2021-07-16 09:35:56 -07:00
Cullen Rhodes
3f31269929 [AArch64][SME] Add load and store instructions
This patch adds support for following contiguous load and store
instructions:

  * LD1B, LD1H, LD1W, LD1D, LD1Q
  * ST1B, ST1H, ST1W, ST1D, ST1Q

A new register class and operand is added for the 32-bit vector select
register W12-W15. The differences in the following tests which have been
re-generated are caused by the introduction of this register class:

  * llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  * llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir

D88663 attempts to resolve the issue with the store pair test
differences in the AArch64 load/store optimizer.

The GlobalISel differences are caused by changes in the enum values of
register classes, tests have been updated with the new values.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105572
2021-07-16 10:11:10 +00:00
Harald van Dijk
f675df37ba [X86] Fix handling of maskmovdqu in X32
The maskmovdqu instruction is an odd one: it has a 32-bit and a 64-bit
variant, the former using EDI, the latter RDI, but the use of the
register is implicit. In 64-bit mode, a 0x67 prefix can be used to get
the version using EDI, but there is no way to express this in
assembly in a single instruction, the only way is with an explicit
addr32.

This change adds support for the instruction. When generating assembly
text, that explicit addr32 will be added. When not generating assembly
text, it will be kept as a single instruction and will be emitted with
that 0x67 prefix. When parsing assembly text, it will be re-parsed as
ADDR32 followed by MASKMOVDQU64, which still results in the correct
bytes when converted to machine code.

The same applies to vmaskmovdqu as well.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103427
2021-07-15 22:56:08 +01:00
Fangrui Song
35af2802d5 [test] Avoid llvm-readelf/llvm-readobj one-dash long options and deprecated aliases (e.g. --file-headers) 2021-07-15 10:26:21 -07:00
Cullen Rhodes
c55c74c634 [AArch64][SME] Add outer product instructions
This patch adds support for the following outer product instructions:

  * BFMOPA, BFMOPS, FMOPA, FMOPS, SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA,
    UMOPS, USMOPA, USMOPS.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105571
2021-07-15 09:51:06 +00:00
Thomas Lively
81bb5f99ad [WebAssembly] Remove datalayout strings from llc tests
The data layout strings do not have any effect on llc tests and will become
misleadingly out of date as we continue to update the canonical data layout, so
remove them from the tests.

Differential Revision: https://reviews.llvm.org/D105842
2021-07-14 11:17:08 -07:00
Jinsong Ji
8853fbe08a [AIX] Enable dollar sign as PC in inlineasm
$ is used as PC for PowerPC inlineasm, ELF use it,
enable it for AIX XCOFF as well.

Reviewed By: #powerpc, amyk, nemanjai

Differential Revision: https://reviews.llvm.org/D105956
2021-07-14 13:37:52 +00:00
Cullen Rhodes
fcd9253fa0 [AArch64][SME] Add matrix register definitions and parsing support
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.

SME instructions consist of three types of matrix operands:

  * Tiles: a ZA tile is a square, two-dimensional sub-array of elements
  within the ZA array. These tiles make up the larger accumulator array
  and the granularity varies based on the element size, i.e.
    - ZAQ0..ZAQ15 (smallest tile granule)
    - ZAD0..ZAD7
    - ZAS0..ZAS3
    - ZAH0..ZAH1
    or ZAB0       (largest tile granule, single tile)
  * Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
  to tell how the vector at [reg+offset] is layed out in the tile,
  horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds
  to vectors in registers ZAH1 and ZAQ15, respectively.
  * Accumulator matrix: this is the entire accumulator array ZA.

This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.

The ADDHA and ADDVA instructions which operate on tiles are also added
in this patch to make some use of the code added, later patches will
make use of the other operands introduced here.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Co-authored by: Sander de Smalen (@sdesmalen)

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105570
2021-07-14 08:25:49 +00:00
Jinsong Ji
74491befb9 [AIX] Update testcase to use aix triple
We have implemented the basic MCAsmParser now, we can use the triple
directly now.
2021-07-14 03:32:37 +00:00
Hafiz Abid Qadeer
5ad78e6343 [AMDGPU] Handle s_branch to another section.
Currently, if target of s_branch instruction is in another section, it will fail with the error of undefined label.  Although in this case, the label is not undefined but present in another section. This patch tries to handle this issue. So while handling fixup_si_sopp_br fixup in getRelocType, if the target label is undefined we issue an error as before. If it is defined, a new relocation type R_AMDGPU_REL16 is returned.

This issue has been reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100181 and https://bugs.llvm.org/show_bug.cgi?id=45887. Before https://reviews.llvm.org/D79943, we used to get an crash for this scenario. The crash is fixed now but the we still get an undefined label error.  Jumps to other section can arise with hold/cold splitting.

A patch to handle the relocation in lld will follow shortly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D105760
2021-07-13 12:17:47 +01:00
Cullen Rhodes
e25a1a8f41 [AArch64] Add target features for Armv9-A Scalable Matrix Extension (SME)
First patch in a series adding MC layer support for the Arm Scalable
Matrix Extension.

This patch adds the following features:

    sme, sme-i64, sme-f64

The sme-i64 and sme-f64 flags are for the optional I16I64 and F64F64
features.

If a target supports I16I64 then the following instructions are
implemented:

  * 64-bit integer ADDHA and ADDVA variants (D105570).
  * SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA, UMOPS, USMOPA, and USMOPS
    instructions that accumulate 16-bit integer outer products into 64-bit
    integer tiles.

If a target supports F64F64 then the FMOPA and FMOPS instructions that
accumulate double-precision floating-point outer products into
double-precision tiles are implemented.

Outer products are implemented in D105571.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105569
2021-07-12 13:28:10 +00:00
Wouter van Oortmerssen
538b137e0b [WebAssembly] Added initial type checker to MC Assembler
This to protect against non-sensical instruction sequences being assembled,
which would either cause asserts/crashes further down, or a Wasm module being output that doesn't validate.

Unlike a validator, this type checker is able to give type-errors as part of the parsing process, which makes the assembler much friendlier to be used by humans writing manual input.

Because the MC system is single pass (instructions aren't even stored in MC format, they are directly output) the type checker has to be single pass as well, which means that from now on .globaltype and .functype decls must come before their use. An extra pass is added to Codegen to collect information for this purpose, since AsmPrinter is normally single pass / streaming as well, and would otherwise generate this information on the fly.

A `-no-type-check` flag was added to llvm-mc (and any other tools that take asm input) that surpresses type errors, as a quick escape hatch for tests that were not intended to be type correct.

This is a first version of the type checker that ignores control flow, i.e. it checks that types are correct along the linear path, but not the branch path. This will still catch most errors. Branch checking could be added in the future.

Differential Revision: https://reviews.llvm.org/D104945
2021-07-09 14:07:25 -07:00
Anirudh Prasad
5a7de1d054 [MCParser][z/OS] Mark a few tests as unsupported for the z/OS Target
- Background here is that that these sets of tests are "invalid" to be run on z/OS
- The reason is because these test constructs that HLASM never supports (HLASM doesn't support GNU style directives)
- Usually tests are geared towards a particular target via the use of a triple that targets just that platform, but these tests require the use of a "default triple"
- Thus, we mark these tests as "UNSUPPORTED" for z/OS since we don't want to run these for z/OS

Reviewed By: yusra.syeda, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D105204
2021-07-05 11:06:52 -04:00
Jinsong Ji
4d99eb410d [AIX] Add dummy XCOFF MCAsmParserExtension
Implement XCOFFMCAsmParser so that we can use MC to parse inline asm.

The directives and storage mapping classes will be added later
iteratively.

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D105259
2021-07-02 16:12:21 +00:00
Igor Kudrin
ddd200c2a0 [ARMInstPrinter] Print the target address of a branch instruction
This follows other patches that changed printing immediate values of
branch instructions to target addresses, see D76580 (x86), D76591 (PPC),
D77853 (AArch64).

As observing immediate values might sometimes be useful, they are
printed as comments for branch instructions.

// llvm-objdump -d output (before)
000200b4 <_start>:
   200b4: ff ff ff fa   blx     #-4 <thumb>
000200b8 <thumb>:
   200b8: ff f7 fc ef   blx     #-8 <_start>

// llvm-objdump -d output (after)
000200b4 <_start>:
   200b4: ff ff ff fa   blx     0x200b8 <thumb>         @ imm = #-4
000200b8 <thumb>:
   200b8: ff f7 fc ef   blx     0x200b4 <_start>        @ imm = #-8

// GNU objdump -d.
000200b4 <_start>:
   200b4:       faffffff        blx     200b8 <thumb>
000200b8 <thumb>:
   200b8:       f7ff effc       blx     200b4 <_start>

Differential Revision: https://reviews.llvm.org/D104701
2021-06-30 16:35:28 +07:00
Fangrui Song
b74e84fd4f [test] Change -t to --syms and -s to -S for llvm-readobj RUN lines
-s and -t will be changed to improve consistency with llvm-readelf.
The inconsistency issue regularly contributes to confusion using the two tools.
2021-06-29 11:50:31 -07:00
Soham Dixit
2186304c09 [DebugInfo] Bug 41152 - Improve dumping of empty location expressions
Fixes PR41152 (https://bugs.llvm.org/show_bug.cgi?id=41152).

Reviewed by: jhenderson, dblaikie, SouraVX

Differential Revision: https://reviews.llvm.org/D103502
2021-06-29 09:21:00 +01:00
David Spickett
276c38fd65 [llvm][ARM] Treat xscale arch as an alias of armv5te
Previously xscale was known to everything apart
from the ELF streamer so we would crash as soon
as you tried to output an object file.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D104776
2021-06-28 15:20:24 +00:00
Lucas Prates
e6525ee662 [Aarch64] Adding support for Armv9-A Realm Management Extension
This adds support for Armv9-A's Realm Management Extension, including
three new system registers - MFAR_EL3, GPCCR_EL3 and GPTBR_EL3 - and
four new TLBI instructions.

The reference for the Realm Management Extension can be found at: https://developer.arm.com/documentation/ddi0615/aa.

Based on patches by Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D104773
2021-06-28 13:45:22 +01:00
Igor Kudrin
7fb2900382 [llvm-objdump] Prefix memory operand addresses with '0x'
This helps to avoid ambiguity when the address contains only digits 0..9.

Differential Revision: https://reviews.llvm.org/D104909
2021-06-28 14:25:21 +07:00
Ulrich Weigand
fc8f374f5d [SystemZ] Add support for .reloc assembler directive
Add support for the .reloc directive along the lines of
other back-ends.

This fixes a regression after https://reviews.llvm.org/D104080
was merged, since that patch presupposed support for .reloc.
2021-06-25 21:51:10 +02:00
Fangrui Song
ba4b6a66c6 [MC][ELF] Change SHT_LLVM_CALL_GRAPH_PROFILE relocations from SHT_RELA to SHT_REL
... even on targets preferring RELA. The section is only consumed by ld.lld
which can handle REL.

Follow-up to D104080 as I explained in the review. There are two advantages:

* The D104080 code only handles RELA, so arm/i386/mips32 etc may warn for -fprofile-use=/-fprofile-sample-use= usage.
* Decrease object file size for RELA targets

While here, change the relocation to relocate weights, instead of 0,1,2,3,..
I failed to catch the issue during review.
2021-06-24 21:35:48 -07:00
Aakanksha Patil
d4359ff02a [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Alexander Yermolovich
d12ae1eaf8 [LLD][LLVM] CG Graph profile using relocations
Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not. In non-visible behavior indices point to wrong symbols. The visible behavior indices point outside of Symbol table: "invalid symbol index".

This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols. The Frequency (Weight) will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.

With this approach post processing tools that handle relocations correctly work for this section also. Tools can add/remove symbols and as long as they handle relocation sections with this approach information stays correct.

Doing a quick experiment with clang-13.
The size went up from 107KB to 322KB, aggregate of all the input sections. Size of clang-13 binary is ~118MB. For users of -fprofile-use/-fprofile-sample-use the size of object files will go up slightly, it will not impact final binary size.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D104080
2021-06-24 09:09:33 -07:00
Fangrui Song
bb7326c8ea [AArch64][X86] Allow 64-bit label differences lower to IMAGE_REL_*_REL32
`IMAGE_REL_ARM64_REL64/IMAGE_REL_AMD64_REL64` do not exist and `.quad a - .` is
currently not representable.

For instrumentation, `.quad a - .` is useful representing a cross-section
reference in a metadata section, to allow ELF medium/large code models. The COFF
limitation makes such generic instrumentations inconvenient. I plan to make a
PGO/coverage metadata section field relative in D104556.

Differential Revision: https://reviews.llvm.org/D104564
2021-06-21 14:32:25 -07:00
Saleem Abdulrasool
daedfc6daf RISCV: simplify a test case for RISCV (NFCI)
The output of the object file is unimportant and entirely discarded.
Simply redirect the output to `/dev/null` or `NUL` as the case may be.
Additionally, the space between the labels is unimportant.  There is no
need to add space between the labels.  Two labels at the same address
are sufficient to generate the difference expression and should still
test the same behaviour.
2021-06-18 08:19:16 -07:00
Heejin Ahn
f7b0205560 [WebAssembly] Rename event to tag
We recently decided to change 'event' to 'tag', and 'event section' to
'tag section', out of the rationale that the section contains a
generalized tag that references a type, which may be used for something
other than exceptions, and the name 'event' can be confusing in the web
context.

See
- https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130
- https://github.com/WebAssembly/exception-handling/pull/161

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104423
2021-06-17 20:34:19 -07:00
Saleem Abdulrasool
0c385efcae RISCV: clean up target expression handling
The target specific expression handling was slightly regressed by
bbea64250f65480d787e1c5ff45c4de3ec2dcda8.  This restores the proper
sub-expression evaluation to allow for constant folding within the
expression.  We explicitly discard the layout and assembler when
evaluating the expression to avoid any symbolic computation and instead
using the `evaluateAsRelocatable` to canonicalise and constant fold
only.

We can also simplify the expression handling - none of the target
variants support symbolic difference.  This simplifies the logic for
that and adds additional tests to ensure that we do not accidentally
regress here in the future.

Reviewed By: maskray

Differential Revision: https://reviews.llvm.org/D104473
2021-06-17 13:35:32 -07:00
Saleem Abdulrasool
f56e4f6d3d RISCV: adjust handling of relocation emission for RISCV
This re-architects the RISCV relocation handling to bring the
implementation closer in line with the implementation in binutils.  We
would previously aggressively resolve the relocation.  With this
restructuring, we always will emit a paired relocation for any symbolic
difference of the type of S±T[±C] where S and T are labels and C is a
constant.

GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE`
which indicates that a fixup may be expanded into multiple relocations.
This is used by the RISCV backend to always emit a paired relocation -
either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] +
SUB[WIDTH] for a debug info relocation.  Irrespective of whether linker
relaxation support is enabled, symbolic difference is always emitted as
a paired relocation.

This change also sinks the target specific behaviour down into the
target specific area rather than exposing it to the shared relocation
handling.  In the process, we also sink the "special" handling for debug
information down into the RISCV target.  Although this improves the path
for the other targets, this is not necessarily entirely ideal either.
The changes in the debug info emission could be done through another
type of hook as this functionality would be required by any other target
which wishes to do linker relaxation.  However, as there are no other
targets in LLVM which currently do this, this is a reasonable thing to
do until such time as the code needs to be shared.

Improve the handling of the relocation (and add a reduced test case from
the Linux kernel) to ensure that we handle complex expressions for
symbolic difference.  This ensures that we correct relocate symbols with
the adddends normalized and associated with the addition portion of the
paired relocation.

This change also addresses some review comments from Alex Bradbury about
the relocations meant for use in the DWARF CFA being named incorrectly
(using ADD6 instead of SET6) in the original change which introduced the
relocation type.

This resolves the issues with the symbolic difference emission
sufficiently to enable building the Linux kernel with clang+IAS+lld
(without linker relaxation).

Resolves PR50153, PR50156!
Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143

Reviewed By: nickdesaulniers, maskray

Differential Revision: https://reviews.llvm.org/D103539
2021-06-17 08:20:02 -07:00
Florian Hahn
19b14a93b0 [X86] Check using default in test added in 0bd5bbb31e0345ae.
Make sure llvm-mc is invariant with respect to debug locations in the
test (checks update to use the -x86-pad-for-align default value)
2021-06-17 13:19:43 +01:00
Florian Hahn
3e8d4a4389 [X86] Add test showing binary differences with -x86-pad-for-align.
This patch adds a test case showing how a single extra .loc can cause
binary differences when using -x86-pad-for-align=true.

The issue has been discussed in D94542, PR42138, PR48742.
2021-06-17 12:27:17 +01:00
Kai Luo
adf785206e [PowerPC] Export 16 byte load-store instructions
Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock free atomic operations on AIX.
Add a new register class `g8prc` for these instructions, since these instructions require even-odd register pair.

Reviewed By: nemanjai, jsji, #powerpc

Differential Revision: https://reviews.llvm.org/D103010
2021-06-15 01:56:10 +00:00