Very minor change aiming to make it easier to extend the script
downstream to support non-llc, but llc-like tools. The main objective is
to decrease the probability of merge conflicts.
llvm-svn: 372276
Summary:
Large slowdowns were observed in Rust due to many small, constant
sized copies in conjunction with poorly-optimized memory.copy
implementations. Since memory.copy cannot be expected to be inlined
efficiently by engines at this time, stop using it for the smallest
copies. We continue to lower all memcpy intrinsics to memory.copy,
though.
Reviewers: aheejin, alexcrichton
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, JDevlieghere, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67639
llvm-svn: 372275
Since we now lower most tail calls, it makes sense to support musttail.
Instead of always falling back to SelectionDAG, only fall back when a musttail
call was not able to be emitted as a tail call. Once we can handle most
incoming and outgoing arguments, we can change this to a `report_fatal_error`
like in ISelLowering.
Remove the assert that we don't have varargs and a musttail, and replace it
with a return false. Implementing this requires that we implement
`saveVarArgRegisters` from AArch64ISelLowering, which is an entirely different
patch.
Add GlobalISel lines to vararg-tallcall.ll to make sure that we produce correct
code. Right now we only fall back, but eventually this will be relevant.
Differential Revision: https://reviews.llvm.org/D67681
llvm-svn: 372273
DIFlagBlockByRefStruct is an unused DIFlag that originally was used by
clang to express (Objective-)C block captures in debug info. For the
last year Clang has been emitting complex DIExpressions to describe
block captures instead, which makes all the code supporting this flag
redundant.
This patch removes the flag and all supporting "dead" code, so we can
reuse the bit for something else in the future.
Since this only affects debug info generated by Clang with the block
extension this mostly affects Apple platforms and I don't have any
bitcode compatibility concerns for removing this. The Verifier will
reject debug info that uses the bit and thus degrade gracefully when
LTO'ing older bitcode with a newer compiler.
rdar://problem/44304813
Differential Revision: https://reviews.llvm.org/D67453
llvm-svn: 372272
Summary:
Add function to AutoUpgrade to change the datalayout of old X86 datalayout strings.
This adds "-p270:32:32-p271:32:32-p272:64:64" to X86 datalayouts that are otherwise valid
and don't already contain it.
This also removes the compatibility changes in https://reviews.llvm.org/D66843.
Datalayout change in https://reviews.llvm.org/D64931.
Reviewers: rnk, echristo
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67631
llvm-svn: 372267
MSAN bot complains that there is use-of-uninitialized-value
of this FreeStores later in IsWorthwhile().
Perhaps FreeStores needs to be stored in a vector?
llvm-svn: 372262
Summary:
`DAGCombiner::visitADDLikeCommutative()` already has a sibling fold:
`(add X, Carry) -> (addcarry X, 0, Carry)`
This fold, as suggested by @efriedma, helps recover from //some//
of the regressions of D62266
Reviewers: efriedma, deadalnix
Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62392
llvm-svn: 372259
Summary:
I don't have a direct motivational case for this,
but it would be good to have this for completeness/symmetry.
This pattern is basically the motivational pattern from
https://bugs.llvm.org/show_bug.cgi?id=43251
but with different predicate that requires that the offset is non-zero.
The completeness bit comes from the fact that a similar pattern (offset != zero)
will be needed for https://bugs.llvm.org/show_bug.cgi?id=43259,
so it'd seem to be good to not overlook very similar patterns..
Proofs: https://rise4fun.com/Alive/21b
Also, there is something odd with `isKnownNonZero()`, if the non-zero
knowledge was specified as an assumption, it didn't pick it up (PR43267)
With this, i see no other missing folds for
https://bugs.llvm.org/show_bug.cgi?id=43251
Reviewers: spatel, nikic, xbolva00
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67412
llvm-svn: 372257
Summary:
AArch64 GlobalISel doesn't support MachO's large code model, so this patch
adds a check for that combination before implicitly enabling it.
Reviewers: paquette
Subscribers: kristof.beyls, ributzka, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67724
llvm-svn: 372256
Summary:
As it can be see in the changed test, while `div` is really costly,
we were speculating it. This does not seem correct.
Also, the old code would run for every single insturuction in BB,
instead of eagerly bailing out as soon as there are too many instructions.
This function still has a problem that `PHINodeFoldingThreshold` is
per-basic-block, while it should be for all the basic blocks.
Reviewers: efriedma, craig.topper, dmgreen, jmolloy
Reviewed By: jmolloy
Subscribers: xbolva00, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67315
llvm-svn: 372255
Summary:
As discussed in https://reviews.llvm.org/D62341#1515637,
for MIPS `add %x, -1` isn't optimal. Unlike X86 there
are no fastpaths to matearialize such `-1`/`1` vector constants,
and `sub %x, 1` results in better codegen,
so undo canonicalization
Reviewers: atanasyan, Petar.Avramovic, RKSimon
Reviewed By: atanasyan
Subscribers: sdardis, arichardson, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66805
llvm-svn: 372254
In case of using 32-bit GOT access to the table requires two instructions
with attached %got_hi and %got_lo relocations. This patch implements
correct expansion of 'lw/sw' instructions in that case.
Differential Revision: https://reviews.llvm.org/D67705
llvm-svn: 372251
For patterns c/d/e we too can deal with the pattern even if we can't
just drop the mask, we can just apply it afterwars:
https://rise4fun.com/Alive/gslRa
llvm-svn: 372244
Summary:
Also fixup rL371928 for cases that occur on our out-of-tree backend
There were still quite a few intermediate APInts and this caused the
compile time of MCCodeEmitter for our target to jump from 16s up to
~5m40s. This patch, brings it back down to ~17s by eliminating pretty
much all of them using two new APInt functions (extractBitsAsZExtValue(),
insertBits() but with a uint64_t). The exact conditions for eliminating
them is that the field extracted/inserted must be <=64-bit which is
almost always true.
Note: The two new APInt API's assume that APInt::WordSize is at least
64-bit because that means they touch at most 2 APInt words. They
statically assert that's true. It seems very unlikely that someone
is patching it to be smaller so this should be fine.
Reviewers: jmolloy
Reviewed By: jmolloy
Subscribers: hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67686
llvm-svn: 372243
Summary:
This is the first patch in a series of patches that will implement data dependence graph in LLVM. Many of the ideas used in this implementation are based on the following paper:
D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe (1981). DEPENDENCE GRAPHS AND COMPILER OPTIMIZATIONS.
This patch contains support for a basic DDGs containing only atomic nodes (one node for each instruction). The edges are two fold: def-use edges and memory-dependence edges.
The implementation takes a list of basic-blocks and only considers dependencies among instructions in those basic blocks. Any dependencies coming into or going out of instructions that do not belong to those basic blocks are ignored.
The algorithm for building the graph involves the following steps in order:
1. For each instruction in the range of basic blocks to consider, create an atomic node in the resulting graph.
2. For each node in the graph establish def-use edges to/from other nodes in the graph.
3. For each pair of nodes containing memory instruction(s) create memory edges between them. This part of the algorithm goes through the instructions in lexicographical order and creates edges in reverse order if the sink of the dependence occurs before the source of it.
Authored By: bmahjour
Reviewer: Meinersbur, fhahn, myhsu, xtian, dmgreen, kbarton, jdoerfert
Reviewed By: Meinersbur, fhahn, myhsu
Subscribers: ychen, arphaman, simoll, a.elovikov, mgorny, hiraditya, jfb, wuzish, llvm-commits, jsji, Whitney, etiotto
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D65350
llvm-svn: 372238
is enabled.
We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.
To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.
At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.
Differential Revision: https://reviews.llvm.org/D67561
llvm-svn: 372232
Summary:
This fixes B42473 and B42706.
This patch makes the SDA propagate branch divergence until the end of the RPO traversal. Before, the SyncDependenceAnalysis propagated divergence only until the IPD in rpo order. RPO is incompatible with post dominance in the presence of loops. This made the SDA crash because blocks were missed in the propagation.
Reviewers: foad, nhaehnle
Reviewed By: foad
Subscribers: jvesely, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65274
llvm-svn: 372223
We need "xgot" flag in the MipsAsmParser to implement correct expansion
of some pseudo instructions in case of using 32-bit GOT (XGOT).
MipsAsmParser does not have reference to MipsSubtarget but has a
reference to "feature bit set".
llvm-svn: 372220
Fixes quoting of profile arguments to work on Windows
Suppresses adding profile arguments to linker flags when using lld-link
Avoids -fprofile-instr-use being added to rc.exe flags
Removes duplicated adding of -fprofile-instr-use to linker flags (since
r355541)
Move handling LLVM_PROFDATA_FILE to HandleLLVMOptions.cmake
Differential Revision: https://reviews.llvm.org/D62063
llvm-svn: 372209
This patch fixes a bug exposed by D65653 where a subsequent invocation
of `determineCalleeSaves` ends up with a different size for the callee
save area, leading to different frame-offsets in debug information.
In the invocation by PEI, `determineCalleeSaves` tries to determine
whether it needs to spill an extra callee-saved register to get an
emergency spill slot. To do this, it calls 'estimateStackSize' and
manually adds the size of the callee-saves to this. PEI then allocates
the spill objects for the callee saves and the remaining frame layout
is calculated accordingly.
A second invocation in LiveDebugValues causes estimateStackSize to return
the size of the stack frame including the callee-saves. Given that the
size of the callee-saves is added to this, these callee-saves are counted
twice, which leads `determineCalleeSaves` to believe the stack has
become big enough to require spilling an extra callee-save as emergency
spillslot. It then updates CalleeSavedStackSize with a larger value.
Since CalleeSavedStackSize is used in the calculation of the frame
offset in getFrameIndexReference, this leads to incorrect offsets for
variables/locals when this information is recalculated after PEI.
Reviewers: omjavaid, eli.friedman, thegameg, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D66935
llvm-svn: 372204
Summary:
The latter is slightly more efficient and communicates the intent of the
API: writeFileAtomically does not own or copy the callback, it merely
calls it at some point.
Reviewers: jkorous
Reviewed By: jkorous
Subscribers: hiraditya, dexonsmith, jfb, llvm-commits, cfe-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67584
llvm-svn: 372201
This generates worse code, but matches what is done for avx2 and
prevents crashes when more arguments are passed than we have
registers for.
llvm-svn: 372200