1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
Commit Graph

144198 Commits

Author SHA1 Message Date
Craig Topper
5ba4ec05cc [RISCV] Split zvlsseg searchable table into 4 separate tables. Index by properties rather than intrinsic ID.
Intrinsic ID is a 32-bit value which made each row of the table 4
byte aligned. The remaining fields used 5 bytes. This meant 3 bytes
of padding per row.

This patch breaks the table into 4 separate tables and indexes them
by properties we know about the intrinsic. NF, masked,
strided, ordered, etc. The indexed load/store tables have no
padding in their rows now.

All together this reduces the size of llc binary by ~28K.

I'm considering adding similar tables for isel of non-segment
load/store as well to cut down the size of the isel table and
probably improve our isel performance. Those tables would need to
indexed from intrinsics, IR loads/stores, gathers/scatters, and
RISCVISD opcodes. So having a table that can be indexed without using
intrinsic ID is more flexible.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D96894
2021-02-18 19:00:49 -08:00
Craig Topper
be46ec7d63 [RISCV] Enable PrimaryKeyEarlyOut on RISCVVPseudosTable.
This table is queried in RISCVMCInstLower without knowing
whether the instruction is a vector pseudo. Due to the way the
binary search works, we have to do log2(tablesize) checks just
to determine a non-vector instruction isn't in the table.

Conveniently, all the vector pseudos are pretty tightly
packed within the internal instruction enum. By enabling the
PrimaryKeyEarlyOut, tablegen will emit a check against the
beginning and end of the table before doing the binary search.
This gives a quick early out on the search for the majority
of non-vector instructions.

Differential Revision: https://reviews.llvm.org/D97016
2021-02-18 18:59:32 -08:00
Adrian Prantl
5446af1778 Reset the EntryValue location flag in finalizeEntryValue.
This fixes an assertion error when entry values are combined with
DW_OP_LLVM_fragment.
2021-02-18 18:36:36 -08:00
Wei Mi
1344e2b49c [SampleFDO] Stop repeated indirect call promotion for the same target.
Found a problem in indirect call promotion in sample loader pass. Currently
if an indirect call is promoted for a target, and if the parent function is
inlined into some other function, the indirect call can be promoted for the
same target again. That is redundent which can harm performance and can cause
excessive compile time in some extreme case.

The patch fixes the issue. If a target is promoted for an indirect call, the
patch will write ICP metadata with the target call count being set to 0.
In the later ICP in sample profile loader, if it sees a target has 0 count
for an indirect call, it knows the target has been promoted and won't do
indirect call promotion for the indirect call.

The fix brings 0.1~0.2% performance on our search benchmark.

Differential Revision: https://reviews.llvm.org/D96806
2021-02-18 17:01:32 -08:00
Matt Arsenault
b9ed6d64dc MIR: Fix parser crash on syntax error on first character
This was calling the diagnostic printer before the context member was
initialized.
2021-02-18 18:59:08 -05:00
Jianzhou Zhao
fc057bebb5 [dfsan] Instrument origin variable and function definitions
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse, gbalats

Differential Revision: https://reviews.llvm.org/D96977
2021-02-18 23:50:05 +00:00
Leonard Chan
b854304d0e [llvm][IR] Do not place constants with static relocations in a mergeable section
This patch provides two major changes:

1. Add getRelocationInfo to check if a constant will have static, dynamic, or
   no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.)
2. Only allow a constant with no relocations (static or dynamic) to be placed
   in a mergeable section.

This will allow unused symbols that contain static relocations and happen to
fit in mergeable constant sections (.rodata.cstN) to instead be placed in
unique-named sections if -fdata-sections is used and subsequently garbage collected
by --gc-sections.

See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html.

Differential Revision: https://reviews.llvm.org/D95960
2021-02-18 15:39:00 -08:00
Hongtao Yu
9372bfe8ed [CSSPGO] Use callsite sample counts to annotate indirect call sites.
With CSSPGO all indirect call targets are counted torwards the original indirect call site in the profile, including both inlined and non-inlined targets. Therefore no need to look for callee entry counts. This also fixes the issue where callee entry count doesn't match callsite count due to the nature of CS sampling.

I'm also cleaning up the orginal code that called `findIndirectCallFunctionSamples` just to compute the sum, the return value of which was disgarded.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D96990
2021-02-18 14:52:34 -08:00
Petr Hosek
d548975423 [Coverage] Store compilation dir separately in coverage mapping
We currently always store absolute filenames in coverage mapping.  This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines.  We are also
duplicating the path prefix potentially wasting space.

This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.

Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.

Differential Revision: https://reviews.llvm.org/D95753
2021-02-18 14:34:39 -08:00
Matt Arsenault
2dc47494f2 GlobalISel: Merge some AMDGPU ABI lowering code to generic code
AMDGPU currently has a lot of pre-processing code to pre-split
argument types into 32-bit pieces before passing it to the generic
code in handleAssignments. This is a bit sloppy and also requires some
overly fancy iterator work when building the calls. It's better if all
argument marshalling code is handled directly in
handleAssignments. This handles more situations like decomposing large
element vectors into sub-element sized pieces.

This should mostly be NFC, but does change the generated code by
shifting where the initial argument packing instructions are placed. I
think this is nicer looking, since it now emits the packing code
directly after the relevant copies, rather than after the copies for
the remaining arguments.

This doubles down on gfx6/gfx7 using the gfx8+ ABI for 16-bit
types. This is ultimately the better option, but incompatible with the
DAG. Fixing this requires more work, especially for f16.
2021-02-18 17:26:55 -05:00
Nikita Popov
9fa93a3e70 [BasicAA] Always strip single-argument phi nodes
We can always look through single-argument (LCSSA) phi nodes when
performing alias analysis. getUnderlyingObject() already does this,
but stripPointerCastsAndInvariantGroups() does not. We still look
through these phi nodes with the usual aliasPhi() logic, but
sometimes get sub-optimal results due to the restrictions on value
equivalence when looking through arbitrary phi nodes. I think it's
generally beneficial to keep the underlying object logic and the
pointer cast stripping logic in sync, insofar as it is possible.

With this patch we get marginally better results:

  aa.NumMayAlias | 5010069 | 5009861
  aa.NumMustAlias | 347518 | 347674
  aa.NumNoAlias | 27201336 | 27201528
  ...
  licm.NumPromoted | 1293 | 1296

I've renamed the relevant strip method to stripPointerCastsForAliasAnalysis(),
as we're past the point where we can explicitly spell out everything
that's getting stripped.

Differential Revision: https://reviews.llvm.org/D96668
2021-02-18 23:07:50 +01:00
Simon Pilgrim
488d2af727 [DAG] getTruncatedUSUBSAT - always truncate operands. NFCI.
As noticed on D96703, we're always truncating the operands so should use getNode(ISD::TRUNCATE) instead of getZExtOrTrunc.
2021-02-18 21:28:55 +00:00
Guozhi Wei
4131403905 [DAGCombiner] Transform (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
If extload is legal, following transform
    (zext (select c, load1, load2)) -> (select c, zextload1, zextload2)
can save one ext instruction.

Differential Revision: https://reviews.llvm.org/D95086
2021-02-18 13:15:20 -08:00
Petr Hosek
8ce86e4200 Revert "[Coverage] Store compilation dir separately in coverage mapping"
This reverts commit 97ec8fa5bb07e3f5bf25ddcb216b545cd3d03b65 since
the test is failing on some bots.
2021-02-18 12:50:24 -08:00
Craig Topper
604f57507d [RISCV] Simplify VPseudoAMOEI multiclass. NFC
lmul was already iterated in one of the loops. We don't need to recreate
it from a string.
2021-02-18 12:40:51 -08:00
Stanislav Mekhanoshin
ce925aa3d2 [AMDGPU] Correct gfx90c feature list
Looks like we have forced FeatureXNACK and forgot
FeatureMadMacF32Insts.

Differential Revision: https://reviews.llvm.org/D96989
2021-02-18 12:40:27 -08:00
Petr Hosek
fd85f23c36 [Coverage] Store compilation dir separately in coverage mapping
We currently always store absolute filenames in coverage mapping.  This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines.  We are also
duplicating the path prefix potentially wasting space.

This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.

Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.

Differential Revision: https://reviews.llvm.org/D95753
2021-02-18 12:27:42 -08:00
Sam Powell
3bda5a764e [llvm][TextAPI] add equality operator for InterfaceFile
This patch adds functionality to compare for the equality between `InterfaceFile`s based on attributes specific to linking.

Reviewed By: cishida, steven_wu

Differential Revision: https://reviews.llvm.org/D96629
2021-02-18 11:53:08 -08:00
Wouter van Oortmerssen
c0b486e8f3 [WebAssembly] Fix assert in lookup of section symbols
Fixes assert in: https://bugs.llvm.org/show_bug.cgi?id=49036

getWasmSection creates sections if they don't exist, but doesn't add them to the Symbols table. This may cause problems in subsequent calls to getOrCreateSymbol which checks this table, the calls createSymbol assuming it doesn't exist, which then checks UsedNames and finds out it does exist, causing an assert on trying to rename a non-temp symbol.

I tried also fixing the somewhat unintuitive forced suffixing (adding `0`), but it turns out that WasmObjectWriter currently assumes these section symbols are unique, so that may have to be a separate fix: https://bugs.llvm.org/show_bug.cgi?id=49252

Also worth noting is that getWasmSection calling createSymbol may not be correct to start with, given that createSymbol seems to assume it is creating non-section symbols. But again, for a future fix.

Related: where some of this was introduced: 8d396acac3

Differential Revision: https://reviews.llvm.org/D96473
2021-02-18 11:50:14 -08:00
Jessica Clarke
55ebf0201e [RISCV] Use XLenRI alias for RegInfoByHwMode instances
This avoids tedious repetition and matches what we do for the
ValueTypeByHwMode uses.

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D96649
2021-02-18 19:38:36 +00:00
Ta-Wei Tu
67639b9007 [NPM] Properly reset parent loop after loop passes
This fixes https://bugs.llvm.org/show_bug.cgi?id=49185

When `NDEBUG` is not set, `LPMUpdater` checks if the added loops have the same parent loop as the current one in `addSiblingLoops`.
If multiple loop passes are executed through `LoopPassManager`, `U.ParentL` will be the same across all passes.
However, the parent loop might change after running a loop pass, resulting in assertion failures in subsequent passes.

This patch resets `U.ParentL` after running individual loop passes in `LoopPassManager`.

Reviewed By: asbirlea, ychen

Differential Revision: https://reviews.llvm.org/D96727
2021-02-19 02:50:53 +08:00
Sean Fertile
f52566451e [PowerPC][AIX] Add support for vector arg passing on the stack.
Enable passing more vector arguments then available vector
argument passing registers.

Differential Revision: https://reviews.llvm.org/D96415
2021-02-18 13:32:40 -05:00
Heejin Ahn
2e5cc35255 [WebAssembly] Handle multiple EH_LABELs in EH pad
Usually `EH_LABEL`s are placed in
- Before an `invoke` (which becomes calls in the backend)
- After an `invoke`
- At the start of an EH pad

I don't know exactly why, but I noticed there are cases of multiple, not
a single, `EH_LABEL` instructions in the beginning of an EH pad. In that
case `global.set` instruction placed to restore `__stack_pointer` ended
up between two `EH_LABEL` instructions before `CATCH`. It should follow
after the `EH_LABEL`s and `CATCH`. This CL fixes that case.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D96970
2021-02-18 10:18:00 -08:00
Jianzhou Zhao
5a41d9e657 [dfsan] Refactor defining TLS variables
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96941
2021-02-18 18:04:21 +00:00
Jianzhou Zhao
217067aceb [dfsan] Refactor runtime functions checking
This is a part of https://reviews.llvm.org/D95835.

Reviewed-by: morehouse

Differential Revision: https://reviews.llvm.org/D96940
2021-02-18 18:01:46 +00:00
Philip Reames
255056fb04 Fix a buildbot warning triggered by 1dfb06d 2021-02-18 09:37:49 -08:00
Craig Topper
5d21dba459 [RISCV] Add support for fixed vector MULHU/MULHS.
This uses to division by constant optimization to use MULHU/MULHS.

Reviewed By: frasercrmck, arcbbb

Differential Revision: https://reviews.llvm.org/D96934
2021-02-18 09:15:08 -08:00
Philip Reames
1a58ffcc4f [verify-regalloc] Verify after allocation and before postOptimization
I've now hit several cases where a mistake in the regalloc main loop caused corrupt live intervals that didn't get caught until either the next verify or during post-optimization.  The later case is rather confusing and tends to lead one down false trails, so let's catch corruption before that.
2021-02-18 09:10:50 -08:00
Craig Topper
c64c48d8d3 [RISCV] Add support for fixed vector sign/zero extend from mask types.
Due to vXi64 on RV32, I've directly emitted this using _VL ISD
opcodes. If it wasn't for that we could just use fixed vector
BUILD_VECTOR and VSELECT and let those each be legalized.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96910
2021-02-18 09:08:10 -08:00
Craig Topper
afd2389cd5 [RISCV] Support isel of scalable vector bitcasts
These should be NOPs so we can just replace with the input. This
matches what SVE does with isel patterns for all permutations.
Custom isel saves us from having to list all permurations for
all LMULs.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96921
2021-02-18 09:01:13 -08:00
Philip Reames
42c907ecd1 [splitkit] Add a minor wrapper function for readability [NFC] 2021-02-18 09:00:22 -08:00
Bradley Smith
e3dbaf1580 [AArch64][SVE] Add patterns to generate FMLA/FMLS/FNMLA/FNMLS/FMAD
Adjust generateFMAsInMachineCombiner to return false if SVE is present
in order to combine fmul+fadd into fma. Also add new pseudo instructions
so as to select the most appropriate of FMLA/FMAD depending on register
allocation.

Depends on D96599

Differential Revision: https://reviews.llvm.org/D96424
2021-02-18 16:55:16 +00:00
Craig Topper
5c5e25358e [TableGen][SelectionDAG] Improve efficiency of encoding negative immediates for isel's CheckInteger opcode.
CheckInteger uses an int64_t encoded using a variable width encoding
that is optimized for encoding a number with a lot of leading zeros.
Negative numbers have no leading zeros so use the largest encoding
requiring 9 bytes.

I believe its most like we want to check for positive and negative
numbers near 0. -1 is quite common due to its use in the 'not'
idiom.

To optimize for this, we can borrow an idea from the bitcode format
and move the sign bit to bit 0 with the magnitude stored in the
upper bits. This will drastically increase the number of leading
zeros for small magnitudes. Then we can run this value through
VBR encoding.

This gives a small reduction in the table size on all in tree
targets except VE where size increased by about 300 bytes due
to intrinsic ids now requiring 3 bytes instead of 2. Since the
intrinsic enum space is shared by all targets this an unfortunate
consquence of where VE is currently located in the range.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D96317
2021-02-18 08:53:17 -08:00
Bradley Smith
b97c64b15c [AArch64] Allow folding FMUL/FADD into FMA for FP16 types
isFMAFasterThanFMulAndFAdd should return true for FP16 types when
HasFullFP16 is present, since we have the instructions to handle it for
both SVE and NEON. (SVE patterns and tests will follow).

Differential Revision: https://reviews.llvm.org/D96599
2021-02-18 16:51:22 +00:00
Philip Reames
8258d5a8dd [regalloc] Add a couple of dump routines for ease of debugging [NFC] 2021-02-18 08:50:00 -08:00
Philip Reames
f8c9bdda53 [instcombine] Exploit UB implied by nofree attributes
This patch simply implements the documented UB of the current nofree attributes as specified. It doesn't try to be fancy about inference (yet), it just implements the cases already specified and inferred.

Note: When this lands, it may expose miscompiles. If so, please revert and provide a test case. It's likely the bug is in the existing inference code and without a relatively complete test case, it will be hard to debug.

Differential Revision: https://reviews.llvm.org/D96349
2021-02-18 08:34:22 -08:00
Hsiangkai Wang
200f7b3df8 [RISCV] Fix typo. Use ValueType instead of LLVMType. 2021-02-18 23:21:27 +08:00
David Green
df614e3ecd [ARM] Expand the range of allowed post-incs in load/store optimizer
Currently the load/store optimizer will only fold in increments of the
same size as the load/store. This patch expands that to any legal
immediate for the post-inc instruction.

Differential Revision: https://reviews.llvm.org/D95885
2021-02-18 14:59:02 +00:00
Paul C. Anagnostopoulos
5e8a4a10a2 Revert "[TableGen] Improve algorithms for processing template arguments"
This reverts commit e589207d5aaee6cbf1d7c7de8867a17727d14aca.
2021-02-18 09:26:26 -05:00
Baptiste Saleil
c98164419e [PowerPC] Exploit the vinsw, vinsd, and vins[wd][lr]x instructions on P10
This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx,
vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10.

Differential Revision: https://reviews.llvm.org/D94454
2021-02-18 14:17:47 +00:00
Hsiangkai Wang
9c7d8b68c4 [RISCV] Fix bugs in pseudo instructions for masked segment load.
For masked segment load, the destination register should not overlap
with mask register. It could not be V0.

In the original implementation, there is no segment load/store register
class without V0. In this patch, I added these register classes and
modify `GetVRegNoV0` to get the correct one.

Differential Revision: https://reviews.llvm.org/D96937
2021-02-18 22:17:00 +08:00
Hsiangkai Wang
03906f4996 [NFC][RISCV] Use concise way to describe load/store instructions.
Differential Revision: https://reviews.llvm.org/D96923
2021-02-18 22:17:00 +08:00
Paul C. Anagnostopoulos
31c49eba49 [TableGen] Improve algorithms for processing template arguments
Rework template argument checking so that all arguments are type-checked
and cast if necessary.

Add a test.

Differential Revision: https://reviews.llvm.org/D96416
2021-02-18 09:15:26 -05:00
David Green
8f3770b197 [ARM] Ensure types provided to getIntrinsicCost are valid
It appears that pointer types were causing issues for the min/max cost
code in getIntrinsicInstrCost. This makes sure that when matching
icmp/select to a min/max, we only do that for normal int or float types.
2021-02-18 14:00:23 +00:00
Stefan Pintilie
0f683ca9ed [PowerPC] Add option for ROP Protection
Added -mrop-protection for Power PC to turn on codegen that provides some
protection from ROP attacks.

The option is off by default and can be turned on for Power 8, Power 9 and
Power 10.

This patch is for the option only. The feature will be implemented by a later
patch.

Reviewed By: amyk

Differential Revision: https://reviews.llvm.org/D96512
2021-02-18 12:15:50 +00:00
David Green
20c6684726 [ARM] Add larger than legal ICmp costs
A v8i32 compare will produce a v8i1 predicate, but during codegen the
v8i32 will be split into two v4i32, potentially requiring two v4i1
predicates to be merged into a single v8i1. Because this merging of two
v4i1's into a v8i1 is very expensive, we need to make the cost of the
compare equally high.

This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost.
Because we don't know whether the user of the predicate can be split,
and the cost model is mostly pre-instruction, we may be pessimistic but
that should only be for larger and legal types. This also adds min/max
detection to the costmodel where it can be detected, to keep those in
line with the cost of simple min/max instructions. Otherwise for the
most part, costs that were already expensive have become more expensive.

Differential Revision: https://reviews.llvm.org/D96692
2021-02-18 11:42:17 +00:00
Benjamin Kramer
d12587eb6c [RISCV] Rewrite assert to not give unused variable warnings in Release builds
NFCI
2021-02-18 11:42:36 +01:00
Fraser Cormack
b2ba90aa15 [RISCV] Begin to support more subvector inserts/extracts
This patch adds support for INSERT_SUBVECTOR and EXTRACT_SUBVECTOR
(nominally where both operands are scalable vector types) where the
vector, subvector, and index align sufficiently to allow decomposition
to subregister manipulation:

* For extracts, the extracted subvector must correctly align with the
lower elements of a vector register.
* For inserts, the inserted subvector must be at least one full vector
register, and correctly align as above.

This approach should work for fixed-length vector insertion/extraction
too, but that will come later.

Reviewed By: craig.topper, khchen, arcbbb

Differential Revision: https://reviews.llvm.org/D96873
2021-02-18 10:18:27 +00:00
Fraser Cormack
0099d58a88 [SVE][CodeGen] Expand SVE MULH[SU] and [SU]MUL_LOHI nodes
This patch fixes a codegen crash introduced in fde24661718c, where the
DAGCombiner started generating optimized MULH[SU] or [SU]MUL_LOHI nodes
unless the target opted out. The AArch64 backend cannot currently select
any of these nodes, so ensure that they are not generated in the first
place.

This issue was raised by @huihuiz in D94501.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D96849
2021-02-18 10:06:24 +00:00
Djordje Todorovic
78cda129f4 Revert "[Debugify] Make the debugify aware of the original (-g) Debug Info"
This reverts rG8ee7c7e02953.
One test is failing, I'll reland this as soon as possible.
2021-02-18 02:04:27 -08:00