Summary:
The commit rL348922 introduced a means to set Metadata
section kind for a global variable, if its explicit section
name was prefixed with ".AMDGPU.metadata.".
This patch changes that prefix to ".AMDGPU.comment.",
as "metadata" in the section name might lead to
ambiguity with metadata used by AMD PAL runtime.
Change-Id: Idd4748800d6fe801441d91595fc21e5a4171e668
Reviewers: kzhuravl
Reviewed By: kzhuravl
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D56197
llvm-svn: 350292
A DBG_VALUE between a two-address instruction and a following COPY
would prevent rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.
Differential Revision: https://reviews.llvm.org/D55987
llvm-svn: 350289
There can be multiple local symbols with the same name (for e.g.
comdat sections), and thus the symbol name itself isn't enough
to disambiguate symbols.
Differential Revision: https://reviews.llvm.org/D56140
llvm-svn: 350288
The test cases are constructed to avoid folding the AND into a masked compare operation.
Currently we emit a KAND and a KORTEST for these cases.
llvm-svn: 350287
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.
This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.
With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:
x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%
And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved.
Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810
llvm-svn: 350285
OptimizeAutoreleaseRVCall skips optimizing llvm.objc.autoreleaseReturnValue if it
sees a user which is llvm.objc.retainAutoreleasedReturnValue, and if they have
equivalent arguments (either identical or equivalent PHIs). It then assumes that
ObjCARCOpt::OptimizeRetainRVCall will optimize the pair instead.
Trouble is, ObjCARCOpt::OptimizeRetainRVCall doesn't know about equivalent PHIs
so optimizes in a different way and we are left with an unoptimized llvm.objc.autoreleaseReturnValue.
This teaches ObjCARCOpt::OptimizeRetainRVCall to also understand PHI equivalence.
rdar://problem/47005143
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D56235
llvm-svn: 350284
Calculate which item is being held and then display it with the appropriate type. We also
optimize the display of PointerUnion3 to take advantage of our knowing that the IntMask is
always 1 in PointerUnion types
llvm-svn: 350280
Summary: Add read[only|write] PIC relocation models to the C API and teach the TargetMachine API about it.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D56187
llvm-svn: 350279
Summary:
Sometimes it's useful to emit assembly after LTO stage to modify it manually. Emitting precodegen bitcode file (via save-temps plugin option) and then feeding it to llc doesn't always give the same binary as original.
This patch is simpler alternative to https://reviews.llvm.org/D24020.
Patch by Denis Bakhvalov.
Reviewers: mehdi_amini, tejohnson
Reviewed By: tejohnson
Subscribers: MaskRay, inglorion, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56114
llvm-svn: 350276
Summary:
This was previously ignored and an incorrect value generated.
Also fixed Disassembler's handling of block_type.
Reviewers: dschuff, aheejin
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D56092
llvm-svn: 350270
Summary:
Alias can make one (but not all) live, we still need to scan all others if this symbol is reachable
from somewhere else.
Reviewers: tejohnson, grimar
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D56117
llvm-svn: 350269
The caller to EraseInstruction had this conditional:
// ARC calls with null are no-ops. Delete them.
if (IsNullOrUndef(Arg))
but the assert inside EraseInstruction only allowed ConstantPointerNull and not
undef or bitcasts.
This adds support for both of these cases.
rdar://problem/47003805
llvm-svn: 350261
If an instruction has no demanded bits, remove it directly during BDCE,
instead of leaving it for something else to clean up.
Differential Revision: https://reviews.llvm.org/D56185
llvm-svn: 350257
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.
This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.
I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.
This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.
Differential Revision: https://reviews.llvm.org/D55975
llvm-svn: 350245
Sometimes it's useful to be able to output demangled names without
tag specifiers like "struct", "class", etc. This patch adds a
flag enabling this.
llvm-svn: 350241
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.
Improves the test case from PR38217. There may be additional opportunities after this.
Differential Revision: https://reviews.llvm.org/D56145
llvm-svn: 350239
Also add a fuzzer() template for defining fuzzers that's similar to
add_llvm_fuzzer in the CMake build, and a build file for dependency
llvm/lib/FuzzMutate.
Also make `assert(defined(...` error strings a bit more self-consistent.
Differential Revision: https://reviews.llvm.org/D56194
llvm-svn: 350238
Adding MC regressions tests to cover the XOP isa set.
This patch is part of a larger task to cover MC encoding of all X86 isa sets started in revision: https://reviews.llvm.org/D39952
Differential Revision: https://reviews.llvm.org/D41392
llvm-svn: 350237
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.
By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.
This fixes the regression on X86 in D56156.
Differential Revision: https://reviews.llvm.org/D56176
llvm-svn: 350236
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.
The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.
Differential Revision: https://reviews.llvm.org/D56156
llvm-svn: 350235
These two plugins are loaded into a host process that contains all LLVM
symbols, so they don't link against anything. This required minor readjustments
to the tablegen() setup of IR.
Needed for check-llvm.
Differential Revision: https://reviews.llvm.org/D56204
llvm-svn: 350234
Also add build files for dependencies llvm/lib/ExecutionEngine/{Interpreter,Orc}
Needed for check-llvm.
Differential Revision: https://reviews.llvm.org/D56193
llvm-svn: 350226
PPCPreEmitPeephole pass.
PPCPreEmitPeephole will convert a BC to B when the conditional branch is
based on a constant CR by CRSET or CRUNSET. This is added in
https://reviews.llvm.org/rL343100.
When the conditional branch is known to be always taken, all branches will
be removed and a new unconditional branch will be inserted. However, when
SeenUse is false the original patch will not remove the branches, but still
insert the new unconditional branch, update the successors and create
inconsistent IR. Compiling the synthetic testcase included can show the
problem we run into.
The patch simply removes the SeenUse condition when adding branches into
InstrsToErase set.
Differential Revision: https://reviews.llvm.org/D56041
llvm-svn: 350223
Peek through shift modulo masks while matching double shift patterns.
I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.
Differential Revision: https://reviews.llvm.org/D56199
llvm-svn: 350222
Motivated by the discussion in D38499, this patch updates BasicAA to support
arbitrary pointer sizes by switching most remaining non-APInt calculations to
use APInt. The size of these APInts is set to the maximum pointer size (maximum
over all address spaces described by the data layout string).
Most of this translation is straightforward, but this patch contains a fix for
a bug that revealed itself during this translation process. In order for
test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit
pointers, the intermediate calculations must be performed using 64-bit
integers. This is because, as noted in the patch, when GetLinearExpression
decomposes an expression into C1*V+C2, and we then multiply this by Scale, and
distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even
through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can
overflow. If this happens, later logic will draw invalid conclusions from the
(base) offset value. Thus, when initially applying the APInt conversion,
because the maximum pointer size in this test is 32 bits, it started failing.
Suspicious, I created a 64-bit version of this test (included here), and that
failed (miscompiled) on trunk for a similar reason (the multiplication can
overflow).
After fixing this overflow bug, the first test case (at least) in
Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was
relying on having 64-bit intermediate values to have BasicAA return an accurate
result. In order to fix this problem, and because I believe that it is not
uncommon to use i64 indexing expressions in 32-bit code (especially portable
code using int64_t), it seems reasonable to always use at least 64-bit
integers. In this way, we won't regress our analysis capabilities (and there's
a command-line option added, so experimenting with this should be easy).
As pointed out by Eli during the review, there are other potential overflow
conditions that this patch does not address. Fixing those is left to follow-up
work.
Patch by me with contributions from Michael Ferguson (mferguson@cray.com).
Differential Revision: https://reviews.llvm.org/D38662
llvm-svn: 350220
GlobalVariable
Summary:
Extend Module::getOrInsertGlobal to accept a callback for creating a new
GlobalVariable if necessary instead of calling the GV constructor
directly using default arguments. Additionally overload
getOrInsertGlobal for the previous default behavior.
Reviewers: chandlerc
Subscribers: hiraditya, llvm-commits, bollu
Differential Revision: https://reviews.llvm.org/D56130
llvm-svn: 350219
Common code used by the default resource strategy to select pipeline resources
has been moved to an helper function.
The new selection logic has been slightly rewritten to get rid of a redundant
zero check on the `ReadyMask` value. Before this patch, method select internally
called function `PowerOf2Floor` to compute the next ready pipeline resource.
However, `PowerOf2Floor` forces an implicit (redundant) zero check on the input
value. By construction, `ReadyMask` can never be zero. This patch replaces the
call to `PowerOf2Floor` with an equivalent block of code which avoids the
redundant zero check. This gives a minor 3-3.5% speedup on a release build.
No functional change intended.
llvm-svn: 350218
Needed for check-llvm.
This is the last target reading llvm_install_binutils_symlinks.
Differential Revision: https://reviews.llvm.org/D56190
llvm-svn: 350215
Also add build file for dependency llvm/lib/XRay.
Needed for check-llvm.
(yaml-bench is an llvm/util, not an llvm/tool.)
Differential Revision: https://reviews.llvm.org/D56163
llvm-svn: 350211
All of these use custom isel so we can pretty easily detect the differences in the custom code in X86ISelDAGToDAG. The ISD opcodes just need to express the desired semantics not the details of how they would be selected by isel. So unifying them lets us remove the special casing from lowering.
llvm-svn: 350206
These require a different X86ISD node to be created than i16/i32/i64. I guess no one wanted to add the special code for that except in LowerXALUO. But now LowerXALUO, LowerSELECT, and LowerBRCOND all use a common helper function so they all share the special code.
Unfortunately, there are no test changes because we seem to correct the miss in a DAG combine later. I did verify it manually using test cases from xmulo.ll
llvm-svn: 350205