The RTDyldMemoryManager::getSymbolAddressInProcess method accepts a
linker-mangled symbol name, but it calls through to dlsym to do the lookup (via
DynamicLibrary::SearchForAddressOfSymbol), and dlsym expects an unmangled
symbol name.
Historically we've attempted to "demangle" by removing leading '_'s on all
platforms, and fallen back to an extra search if that failed. That's broken, as
it can cause symbols to resolve incorrectly on platforms that don't do mangling
if you query '_foo' and the process also happens to contain a 'foo'.
Fix this by demangling conditionally based on the host platform. That's safe
here because this function is specifically for symbols in the host process, so
the usual cross-process JIT looking concerns don't apply.
M unittests/ExecutionEngine/ExecutionEngineTest.cpp
M lib/ExecutionEngine/RuntimeDyld/RTDyldMemoryManager.cpp
llvm-svn: 262657
This experiment was originally about trying to use facts implied dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment.
Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree.
For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach.
llvm-svn: 262646
Given that we're not actually reducing the instruction count in the included
regression tests, I think we would call this a canonicalization step.
The motivation comes from the example in PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
If we hoist the bitwise logic ahead of the bitcast, the previously unoptimizable
example of:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
%not = xor <4 x i32> %lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%bc = bitcast <4 x i32> %not to <2 x i64>
%notnot = xor <2 x i64> %bc, <i64 -1, i64 -1>
%bc2 = bitcast <2 x i64> %notnot to <4 x i32>
ret <4 x i32> %bc2
}
Simplifies to the expected:
define <4 x i32> @is_negative(<4 x i32> %x) {
%lobit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31>
ret <4 x i32> %lobit
}
Differential Revision: http://reviews.llvm.org/D17583
llvm-svn: 262645
After r262438 we can have provably positive NSW SCEV expressions whose
zero extensions cannot be simplified (since r262438 makes SCEV better at
computing constant ranges). This means demoting sexts of positive add
recurrences eagerly can result in an unsimplified zero extension where
we could have had a simplified sign extension. This change fixes the
issue by teaching SCEV to demote sext of a positive SCEV expression to a
zext only if the sext could not be simplified.
llvm-svn: 262638
This patch provides the following infrastructure for PGO enhancements in inliner:
Enable the use of block level profile information in inliner
Incremental update of block frequency information during inlining
Update the function entry counts of callees when they get inlined into callers.
Differential Revision: http://reviews.llvm.org/D16381
llvm-svn: 262636
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.
This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.
Differential Revision: http://reviews.llvm.org/D17681
llvm-svn: 262635
Summary: With discriminator, LineLocation can uniquely identify a callsite without the need to specifying callee name. Remove Callee function name from the key, and put it in the value (FunctionSamples).
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17827
llvm-svn: 262634
Fixed the ordering to check first for X86 interrupt handler then for MCU target.
Differential Revision: http://reviews.llvm.org/D17801
llvm-svn: 262628
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).
Since:
r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.
Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.
One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).
Differential Revision: http://reviews.llvm.org/D17041
llvm-svn: 262627
The vectorization of first-order recurrences (r261346) caused PR26734. When
detecting these recurrences, we need to ensure that the previous value is
actually defined inside the loop. This patch includes the fix and test case.
llvm-svn: 262624
This test failed in some ARM bots after a divmod change because it was
running on a native llc, instead of targeted one. This makes sure the test
is target-specific (as intended), and also copies to ARM and AArch64
directories. If it is also supposed to work on other architectures, I'll
leave as an exercise to the respective maintainers.
llvm-svn: 262620
Summary: Adds the 'avr_intrcc' and 'avr_signalcc' IR calling convention tokens to the parser.
Reviewers: arsenm
Subscribers: dylanmckay, llvm-commits
Differential Revision: http://reviews.llvm.org/D16348
llvm-svn: 262600
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 262599
Summary:
Removing MMOs is not our prefer behavior any more.
Reviewers: mcrosier, reames
Differential Revision: http://reviews.llvm.org/D17668
llvm-svn: 262580
Patch by: Konstantin Zhuravlyov
Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.
Reviewers: tstellarAMD, arsenm
Subscribers: echristo, dblaikie, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17454
llvm-svn: 262579
Summary:
When there were no free SGPRs, we were trying to move this value into
some of the reserved registers which was causing a segmentation fault.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17590
llvm-svn: 262577
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.
Differential Revision: http://reviews.llvm.org/D17786
llvm-svn: 262575
If we have a loop with a rarely taken path, we will prune that from the blocks which get added as part of the loop chain. The problem is that we weren't then recognizing the loop chain as schedulable when considering the preheader when forming the function chain. We'd then fall to various non-predecessors before finally scheduling the loop chain (as if the CFG was unnatural.) The net result was that there could be lots of garbage between a loop preheader and the loop, even though we could have directly fallen into the loop. It also meant we separated hot code with regions of colder code.
The particular reason for the rejection of the loop chain was that we were scanning predecessor of the header, seeing the backedge, believing that was a globally more important predecessor (true), but forgetting to account for the fact the backedge precessor was already part of the existing loop chain (oops!.
Differential Revision: http://reviews.llvm.org/D17830
llvm-svn: 262547
Catch objects with a displacement of zero do not initialize a catch
object. The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.
If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.
Address this by creating our catch objects as fixed objects. We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.
Differential Revision: http://reviews.llvm.org/D17823
llvm-svn: 262546
Summary: This is the last step toward supporting aggregate memory access in instcombine. This explodes stores of arrays into a serie of stores for each element, allowing them to be optimized.
Reviewers: joker.eph, reames, hfinkel, majnemer, mgrang
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17828
llvm-svn: 262530
Summary: This is another step toward improving fca support. This unpack load of array in a series of load to array's elements.
Reviewers: chandlerc, joker.eph, majnemer, reames, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15890
llvm-svn: 262521