llvm::huge_valf is defined in a header file, so it is initialized
multiple times in every compiled unit upon program startup.
With non-VC compilers huge_valf is set to a HUGE_VALF which the
compiler can probably optimize out.
With VC numeric_limits<float>::infinity() does not return a number
but a runtime structure member which therotically may change
between calls so the compiler does not optimize out the
initialization and it happens many times. It can be easily seen by
placing a breakpoint on the initialization line.
This patch moves llvm::huge_valf initialization to a source file
instead of the header.
llvm-svn: 218567
I spotted this by inspection when debugging something else, so I have no
test case what-so-ever, and am not even sure it is possible to
realistically trigger the bug. But this is what was intended here.
llvm-svn: 218565
and in the target shuffle combining when trying to widen vector
elements.
Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.
There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.
I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.
llvm-svn: 218562
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.
This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.
llvm-svn: 218561
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
Differential Revision: http://reviews.llvm.org/D5484
llvm-svn: 218553
Users of getSectionContents shouldn't try to pass in BSS or virtual
sections. In all instances, this is a bug in the code calling this
routine.
N.B. Some COFF implementations (like CL) will mark their BSS sections as
taking space on disk. This would confuse COFFObjectFile into thinking
the section is larger than the file.
llvm-svn: 218549
So in fully linked images when a call is made through a stub it now gets a
comment like the following in the disassembly:
callq 0x100000f6c ## symbol stub for: _printf
indicating the call is to a symbol stub and which symbol it is for. This is
done for branch reference types and seeing if the branch target is in a stub
section and if so using the indirect symbol table entry for that stub and
using that symbol table entries symbol name.
llvm-svn: 218546
files in this directory. If it should be defined anywhere, it should be defined
when building lib/LTO/LTOCodeGenerator.cpp, but we've not had it defined there
for quite some time, so that doesn't really seem to be very important. (It also
would slow down the modules build by creating extra module variants.)
llvm-svn: 218544
lldb sets the variable SHARED_LIBRARY to 1, which breaks this conditional,
because older versions of CMake interpret
if ("${t}" STREQUAL "SHARED_LIBRARY")
as meaning
if ("${t}" STREQUAL "1")
in this case. Change the conditional so it does the right thing with both old
and new CMakes.
llvm-svn: 218542
that managed to elude all of my fuzz testing historically. =/
Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.
llvm-svn: 218540
This has weird operand requirements so it's worthwhile
to have very strict checks for its operands.
Add different combinations of SGPR operands.
llvm-svn: 218535
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.
This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.
llvm-svn: 218532
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands
llvm-svn: 218531
The base implementation of commuteInstruction is used
in some cases, but it turns out this has been broken for a
long time since modifiers were inserted between the real operands.
The base implementation of commuteInstruction also fails on immediates,
which also needs to be fixed.
llvm-svn: 218530
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.
llvm-svn: 218529
This needs a test, but I'm not sure if it is currently possible and
I originally hit it due to a bug. Right now the only global address
operands have no reason to be VALU instructions, although it
theoretically could be a problem.
llvm-svn: 218528
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.
When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.
llvm-svn: 218527
No tests hit this, and I don't see any way a GlobalAddress
node would survive beyond lowering on SI. It it would, the
move should probably be inserted by selection.
llvm-svn: 218526
The annotation instructions are dropped during codegen and have no
impact on size. In some cases, the annotations were preventing the
unroller from unrolling a loop because the annotation calls were
pushing the cost over the unrolling threshold.
Differential Revision: http://reviews.llvm.org/D5335
llvm-svn: 218525
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.
llvm-svn: 218523
AVX support.
New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[
These were all detected by fuzz-testing. (I <3 fuzz testing.)
llvm-svn: 218522
This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.
Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.
Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.
The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.
Patch by Ranjeet Singh.
llvm-svn: 218521
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
newton-raphson iterations) this poor scheduling was affecting performances.
This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQER* classes with latency values based on
Agner's table.
Differential Revision: http://reviews.llvm.org/D5370
Patch by Simon Pilgrim.
llvm-svn: 218517
This reverts commit r218513.
Buildbots using libstdc++ issue an error when trying to copy
SmallVector<std::unique_ptr<>>. Revert the commit until we have a fix.
llvm-svn: 218514
Summary:
There will be multiple TypeUnits in an unlinked object that will be extracted
from different sections. Now that we have DWARFUnitSection that is supposed
to represent an input section, we need a DWARFUnitSection<TypeUnit> per
input .debug_types section.
Once this is done, the interface is homogenous and we can move the Section
parsing code into DWARFUnitSection.
Reviewers: samsonov, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5482
llvm-svn: 218513
Summary:
This will allow us to handle f128 arguments without duplicating code from
CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands().
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5292
llvm-svn: 218509