PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.
llvm-svn: 272477
These are byte shift instructions and it will make shuffle combining a lot more straightforward if we can assume a vXi8 vector of bytes so decoded shuffle masks match the return type's number of elements
llvm-svn: 272468
This shouldn't have any functional difference, but it appears to be the
pattern used for other methods on DynamicLibrary, and it should avoid
the -Wpedantic warning on one of the build bots about the direct
reinterpret_cast.
llvm-svn: 272461
MCJIT will now set the DataLayout on a module when it is added to the JIT,
rather than waiting until it is codegen'd, and the runFunction method will
finalize the module containing the function to be run before running it.
The fibonacci example has been updated to include and link against MCJIT.
llvm-svn: 272455
undef uses are no real uses of a register and must be ignored by
findLastUseBefore() so that handleMove() does not produce invalid live
intervals in some cases.
This fixed http://llvm.org/PR28083
llvm-svn: 272446
Summary:
Iterates all (except the first and the last) operands within each GEP
instruction for instrumentation.
Adds test struct_field_gep.ll.
Reviewers: aizatsky
Subscribers: vitalybuka, zhaoqin, kcc, eugenis, bruening, llvm-commits
Differential Revision: http://reviews.llvm.org/D21242
llvm-svn: 272442
Summary:
I can't find a case where we can rotate a loop more than once, and it looks
like we never do this. To rotate a loop following conditions should be met:
1) its header should be exiting
2) its latch shouldn't be exiting
But after the first rotation the header becomes the new latch, so this
condition can never be true any longer.
Tested on with an assert on LNT testsuite and make check.
Reviewers: hfinkel, sanjoy
Subscribers: sebpop, sanjoy, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20181
llvm-svn: 272439
This fixes an alignment issue by forcing all cached allocations
to be 8 byte aligned, and also fixes an issue arising on big
endian systems by writing ulittle32_t's instead of uint32_t's
in the test.
llvm-svn: 272437
A memory access defined on function entry cannot be locally dominated by another memory access.
The patch was split from http://reviews.llvm.org/D19338 which exposes the problem.
Differential Revision: http://reviews.llvm.org/D21039
llvm-svn: 272436
Summary:
When stack-protection is activated and WinEH exceptions is used,
the EHRegNode (exception handling registration) is allocated twice on the stack.
This was not breaking anything except loosing space on the stack.
```
D:\src\llvm\examples>llc exc2.ll -debug-only=pei
alloc FI(0) at SP[-24]
alloc FI(1) at SP[-48] <<-- Allocated
alloc FI(1) at SP[-72] <<-- Allocated twice!?
alloc FI(2) at SP[-76]
alloc FI(4) at SP[-80]
alloc FI(3) at SP[-84]
```
Reviewers: rnk, majnemer
Subscribers: chrisha, llvm-commits
Differential Revision: http://reviews.llvm.org/D21188
llvm-svn: 272426
Loop unswitching may cause MSan false positive when the unswitch
condition is not guaranteed to execute.
This is very similar to ASan and TSan special case in
llvm::isSafeToSpeculativelyExecute (they don't like speculative loads
and stores), but for branch instructions.
This is a workaround for PR28054.
llvm-svn: 272421
Support and generate Compare and Traps like CRT, CIT, etc.
Support Trap as legal DAG opcodes and generate "j .+2" for them by default.
Add support for Conditional Traps and use the If Converter to convert them into
the corresponding compare and trap opcodes.
Differential Revision: http://reviews.llvm.org/D21155
llvm-svn: 272419
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21153
llvm-svn: 272417
Adds a MachineFunctionPass that scans the body to find calls, and
update the register mask with the one saved by the
RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21180
llvm-svn: 272414
The costs are somewhat hand-wavy, but should be much closer to the truth
than what we get from BasicTTI.
Differential Revision: http://reviews.llvm.org/D21156
llvm-svn: 272406
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D20769
llvm-svn: 272403
Somehow, the codegen logic for these sequences has gone completely untested
until now (note the 2 compare instructions generated per test).
There's also an *Intel* AVX optimization opportunity exposed in these cases
and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP
predicates were added, so a single comparison should always be sufficient,
and operand commutation should never be necessary.
llvm-svn: 272397