1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

16180 Commits

Author SHA1 Message Date
Nuno Lopes
ea7b37e3ae teach DSE and isInstructionTriviallyDead() about calloc
llvm-svn: 156553
2012-05-10 17:14:00 +00:00
Joel Jones
cc8aa34ea8 formatting change: strip debug info from test
llvm-svn: 156551
2012-05-10 16:55:31 +00:00
Manman Ren
727b7d5e4c ARM: peephole optimization to remove cmp instruction
This patch will optimize the following cases:
  sub r1, r3 | sub r1, imm
  cmp r3, r1 or cmp r1, r3 | cmp r1, imm
  bge L1

TO
  subs r1, r3
  bge  L1 or ble L1

If the branch instruction can use flag from "sub", then we can replace
"sub" with "subs" and eliminate the "cmp" instruction.

rdar: 10734411
llvm-svn: 156550
2012-05-10 16:48:21 +00:00
Joel Jones
305ecb3495 Fix a problem with incomplete equality testing of PHINodes in
Instruction::IsIdenticalToWhenDefined.

This manifested itself when inlining two calls to the same function.  The 
inlined function had a switch statement that returned one of a set of 
global variables.  Without this modification, the two phi instructions that 
chose values from the branches of the switch instruction inlined from the 
callee were considered equivalent and jump-threading replaced a load for the 
first switch value with a phi selecting from the second switch, thereby 
producing incorrect code.

This patch has been tested with "make check-all", "lnt runteste nt", and 
llvm self-hosted, and on the original program that had this problem, 
wireshark.

<rdar://problem/11025519>

llvm-svn: 156548
2012-05-10 15:59:41 +00:00
Nadav Rotem
157be301c5 AVX2: Add an additional broadcast idiom.
llvm-svn: 156540
2012-05-10 12:39:13 +00:00
Nadav Rotem
64319ce27c Generate AVX/AVX2 shuffles even when there is a memory op somewhere else in the program.
Starting r155461 we are able to select patterns for vbroadcast even when the load op is used by other users.

Fix PR11900.

llvm-svn: 156539
2012-05-10 12:22:05 +00:00
Dan Gohman
9e72870dd1 Fix the objc_storeStrong recognizer to stop before walking off the
end of a basic block if there's no store.

llvm-svn: 156520
2012-05-09 23:08:33 +00:00
Nuno Lopes
3d7a8137ee objectsize:
refactor code a bit to enable future changes to support run-time information
add support to compute allocation sizes at run-time if penalty > 1 (e.g., malloc(x), calloc(x, y), and VLAs)

llvm-svn: 156515
2012-05-09 21:30:57 +00:00
Danil Malyshev
0e378f7bcb Added a regress test for the bug #9964 before close it.
This bug was fixed by Jim Grosbach in #138879, thanks Jim!

llvm-svn: 156505
2012-05-09 19:07:04 +00:00
Nuno Lopes
e8880a9916 change the objectsize intrinsic signature: add a 3rd parameter to denote the maximum runtime performance penalty that the user is willing to accept.
This commit only adds the parameter. Code taking advantage of it will follow.

llvm-svn: 156473
2012-05-09 15:52:43 +00:00
Filipe Cabecinhas
fe00fb1f06 Fixed a typo
llvm-svn: 156471
2012-05-09 14:43:50 +00:00
Akira Hatanaka
574e68feec Add another peephole pattern for conditional moves.
llvm-svn: 156460
2012-05-09 02:29:29 +00:00
Akira Hatanaka
a53bdc878f Make register FP allocatable if the compiled function does not have dynamic
allocas.

llvm-svn: 156458
2012-05-09 01:38:13 +00:00
Akira Hatanaka
bd2f3d1c46 Expand 64-bit shifts if target ABI is O32.
llvm-svn: 156457
2012-05-09 00:55:21 +00:00
Dan Gohman
b47d02f929 Fix objc_storeStrong pattern matching to catch a potential use of the
old value after the store but before it is released.
This fixes rdar:/11116986.

llvm-svn: 156442
2012-05-08 23:34:08 +00:00
Eric Christopher
63b73ede75 Handle OpDeref in case it comes in as a register operand.
Part of rdar://11352000

llvm-svn: 156405
2012-05-08 18:56:00 +00:00
Daniel Dunbar
8e13944f35 Revert r156393, "[tests] Remove some remaining DejaGNU related cruft.", this
patch wasn't ready yet.

llvm-svn: 156395
2012-05-08 18:26:07 +00:00
Daniel Dunbar
882b236879 [tests] Remove some remaining DejaGNU related cruft.
llvm-svn: 156393
2012-05-08 18:11:49 +00:00
Duncan Sands
c9f011a85b Calling ReassociateExpression recursively is extremely dangerous since it will
replace the operands of expressions with only one use with undef and generate
a new expression for the original without using RAUW to update the original.
Thus any copies of the original expression held in a vector may end up
referring to some bogus value - and using a ValueHandle won't help since there
is no RAUW.  There is already a mechanism for getting the effect of recursion
non-recursively: adding the value to be recursed on to RedoInsts.  But it wasn't
being used systematically.  Have various places where recursion had snuck in at
some point use the RedoInsts mechanism instead.  Fixes PR12169.

llvm-svn: 156379
2012-05-08 12:16:05 +00:00
Stepan Dyatkovskiy
b150cd5ced Rejected r156374: Ordinary PR1255 patch. Due to clang-x86_64-debian-fnt buildbot failure.
llvm-svn: 156377
2012-05-08 08:33:21 +00:00
Craig Topper
77b1a4cee5 Remove 256-bit AVX non-temporal store intrinsics. Similar was previously done for 128-bit.
llvm-svn: 156375
2012-05-08 06:58:15 +00:00
Stepan Dyatkovskiy
33fd2a5bf4 Ordinary patch for PR1255.
Added new case-ranges orientated methods for adding/removing cases in SwitchInst. After this patch cases will internally representated as ConstantArray-s instead of ConstantInt, externally cases wrapped within the ConstantRangesSet object.
Old methods of SwitchInst are also works well, but marked as deprecated. So on this stage we have no side effects except that I added support for case ranges in BitcodeReader/Writer, of course test for Bitcode is also added. Old "switch" format is also supported.

llvm-svn: 156374
2012-05-08 06:36:08 +00:00
Owen Anderson
8adb0322ce Teach DAG combine to fold x-x to 0.0 when unsafe FP math is enabled.
llvm-svn: 156324
2012-05-07 20:51:25 +00:00
Owen Anderson
1e7a4f0f91 Teach reassociate to commute FMul's and FAdd's in order to canonicalize the order of their operands across instructions. This allows for greater CSE opportunities.
llvm-svn: 156323
2012-05-07 20:47:23 +00:00
Chad Rosier
3e284d8bd6 Fix a regression from r147481. This combine should only happen if there is a
single use.
rdar://11360370

llvm-svn: 156316
2012-05-07 18:47:44 +00:00
Manman Ren
6fde9f74b4 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;

rdar: 10961709
llvm-svn: 156312
2012-05-07 18:06:23 +00:00
Eric Christopher
87e8163c57 Add support for the 'l' constraint.
Patch by Jack Carter.

llvm-svn: 156294
2012-05-07 06:25:15 +00:00
Eric Christopher
af8eabbbd8 Add support for the 'c' constraint.
Patch by Jack Carter.

llvm-svn: 156293
2012-05-07 06:25:10 +00:00
Eric Christopher
0f1a0afa75 Add support for the 'P' constraint.
Patch by Jack Carter.

llvm-svn: 156292
2012-05-07 06:25:02 +00:00
Eric Christopher
a6552ba637 Add support for the 'O' constraint.
Patch by Jack Carter.

llvm-svn: 156285
2012-05-07 05:46:48 +00:00
Eric Christopher
5e1efebf09 Add support for the 'N' inline asm constraint.
Patch by Jack Carter.

llvm-svn: 156284
2012-05-07 05:46:43 +00:00
Eric Christopher
e5a46b70b3 Add support for the 'L' inline asm constraint.
Patch by Jack Carter.

llvm-svn: 156283
2012-05-07 05:46:37 +00:00
Eric Christopher
267aa256cb Add support for the inline asm constraint 'K'.
llvm-svn: 156282
2012-05-07 05:46:29 +00:00
Craig Topper
c6d0bc2afc Add SSE4A MOVNTSS/MOVNTSD instructions.
llvm-svn: 156281
2012-05-07 05:36:19 +00:00
Eric Christopher
bf784be9ae Support the 'J' constraint.
Patch by Jack Carter.

llvm-svn: 156280
2012-05-07 03:13:42 +00:00
Eric Christopher
929ba63dcf Add support for the 'I' inline asm constraint. Also add tests
from the previous 2 patches.

Patch by Jack Carter.

llvm-svn: 156279
2012-05-07 03:13:32 +00:00
Benjamin Kramer
786f7671ab Switch the select to branch transformation on by default.
The primitive conservative heuristic seems to give a slight overall
improvement while not regressing stuff. Make it available to wider
testing. If you notice any speed regressions (or significant code
size regressions) let me know!

llvm-svn: 156258
2012-05-06 14:25:16 +00:00
Benjamin Kramer
0463564612 CodeGenPrepare: Add a transform to turn selects into branches in some cases.
This came up when a change in block placement formed a cmov and slowed down a
hot loop by 50%:

	ucomisd	(%rdi), %xmm0
	cmovbel	%edx, %esi

cmov is a really bad choice in this context because it doesn't get branch
prediction. If we emit it as a branch, an out-of-order CPU can do a better job
(if the branch is predicted right) and avoid waiting for the slow load+compare
instruction to finish. Of course it won't help if the branch is unpredictable,
but those are really rare in practice.

This patch uses a dumb conservative heuristic, it turns all cmovs that have one
use and a direct memory operand into branches. cmovs usually save some code
size, so we disable the transform in -Os mode. In-Order architectures are
unlikely to benefit as well, those are included in the
"predictableSelectIsExpensive" flag.

It would be better to reuse branch probability info here, but BPI doesn't
support select instructions currently. It would make sense to use the same
heuristics as the if-converter pass, which does the opposite direction of this
transform.


Test suite shows a small improvement here and there on corei7-level machines,
but the actual results depend a lot on the used microarchitecture. The
transformation is currently disabled by default and available by passing the
-enable-cgp-select2branch flag to the code generator.

Thanks to Chandler for the initial test case to him and Evan Cheng for providing
me with comments and test-suite numbers that were more stable than mine :)

llvm-svn: 156234
2012-05-05 12:49:22 +00:00
Stepan Dyatkovskiy
469935e0ae Small fix in InstCombineCasts.cpp. Restored "alloca + bitcast" reducing for case when alloca's size is calculated within the "add/sub/... nsw".
Also added fix to 2011-06-13-nsw-alloca.ll test.

llvm-svn: 156231
2012-05-05 07:09:40 +00:00
Justin Holewinski
4ca961430f This patch adds a new NVPTX back-end to LLVM which supports code generation for NVIDIA PTX 3.0. This back-end will (eventually) replace the current PTX back-end, while maintaining compatibility with it.
The new target machines are:

nvptx (old ptx32) => 32-bit PTX
nvptx64 (old ptx64) => 64-bit PTX

The sources are based on the internal NVIDIA NVPTX back-end, and
contain more functionality than the current PTX back-end currently
provides.

NV_CONTRIB

llvm-svn: 156196
2012-05-04 20:18:50 +00:00
Sebastian Pop
2b868d474e Added missing CMN case in Thumb2SizeReduction pass so that LLVM emits 16-bits encoding of CMN instructions.
llvm-svn: 156195
2012-05-04 19:53:56 +00:00
Craig Topper
6881f1067c Allow v16i16 and v32i8 shuffles to be rewritten as narrower shuffles.
llvm-svn: 156156
2012-05-04 04:44:49 +00:00
Kevin Enderby
1ab00df6a7 Fix issues with the ARM bl and blx thumb instructions and the J1 and J2 bits
for the assembler and disassembler.  Which were not being set/read correctly
for offsets greater than 22 bits in some cases.

Changes to lib/Target/ARM/ARMAsmBackend.cpp from Gideon Myles!

llvm-svn: 156118
2012-05-03 22:41:56 +00:00
Nuno Lopes
2762496a1a remove calls to calloc if the allocated memory is not used (it was already being done for malloc)
fix a few typos found by Chad in my previous commit

llvm-svn: 156110
2012-05-03 22:08:19 +00:00
Sirish Pande
253d16b1b0 Support for target dependent Hexagon VLIW packetizer.
This patch creates and optimizes packets as per Hexagon ISA rules.

llvm-svn: 156109
2012-05-03 21:52:53 +00:00
Nuno Lopes
26239aeb99 add support for calloc to objectsize lowering
llvm-svn: 156102
2012-05-03 21:19:58 +00:00
Silviu Baranga
b5e46c12d6 Fixed disassembler for vstm/vldm ARM VFP instructions.
llvm-svn: 156077
2012-05-03 16:38:40 +00:00
Craig Topper
52869bf5bf Fix 256-bit vpshuflw and vpshufhw immediate encoding to handle undefs in the lower half correctly. Missed in r155982.
llvm-svn: 156059
2012-05-03 07:12:59 +00:00
Evan Cheng
59c8d1af93 Fix two-address pass's aggressive instruction commuting heuristics. It's meant
to catch cases like:
 %reg1024<def> = MOV r1
 %reg1025<def> = MOV r0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

By commuting ADD, it let coalescer eliminate all of the copies. However, there
was a bug in the heuristics where it ended up commuting the ADD in:

 %reg1024<def> = MOV r0
 %reg1025<def> = MOV 0
 %reg1026<def> = ADD %reg1024, %reg1025
 r0            = MOV %reg1026

That did no benefit but rather ensure the last MOV would not be coalesced.

rdar://11355268

llvm-svn: 156048
2012-05-03 01:45:13 +00:00
Owen Anderson
e3d41b44cc Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs.
llvm-svn: 156029
2012-05-02 22:17:40 +00:00