llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00

Author	SHA1	Message	Date
Jakob Stoklund Olesen	a63aa4e54b	Add Target hook to duplicate machine instructions. Some instructions refer to unique labels, and so cannot be trivially cloned with CloneMachineInstr. llvm-svn: 92873	2010-01-06 23:47:07 +00:00
Jim Grosbach	5a1c16e5bb	Rough first pass at compare_and_swap atomic builtins for ARM mode. Work in progress. llvm-svn: 91090	2009-12-11 01:42:04 +00:00
Dan Gohman	f9654e9258	Remove the target hook TargetInstrInfo::BlockHasNoFallThrough in favor of MachineBasicBlock::canFallThrough(), which is target-independent and more thorough. llvm-svn: 90634	2009-12-05 00:44:40 +00:00
Bob Wilson	b293fe32cb	Remove isProfitableToDuplicateIndirectBranch target hook. It is profitable for all the processors where I have tried it, and even when it might not help performance, the cost is quite low. The opportunities for duplicating indirect branches are limited by other factors so code size does not change much due to tail duplicating indirect branches aggressively. llvm-svn: 90144	2009-11-30 18:35:03 +00:00
Bob Wilson	c5fa56c805	Refactor target hook for tail duplication as requested by Chris. Make tail duplication of indirect branches much more aggressive (for targets that indicate that it is profitable), based on further experience with this transformation. I compiled 3 large applications with and without this more aggressive tail duplication and measured minimal changes in code size. ("size" on Darwin seems to round the text size up to the nearest page boundary, so I can only say that any code size increase was less than one 4k page.) Radar 7421267. llvm-svn: 89814	2009-11-24 23:35:49 +00:00
Anton Korobeynikov	0f885eb7fd	Materialize global addresses via movt/movw pair, this is always better than doing the same via constpool: 1. Load from constpool costs 3 cycles on A9, movt/movw pair - just 2. 2. Load from constpool might stall up to 300 cycles due to cache miss. 3. Movt/movw does not use load/store unit. 4. Less constpool entries => better compiler performance. This is only enabled on ELF systems, since darwin does not have needed relocations (yet). llvm-svn: 89720	2009-11-24 00:44:37 +00:00
Evan Cheng	a7496ef9a6	Add predicate operand to NEON instructions. Fix lots (but not all) 80 col violations in ARMInstrNEON.td. llvm-svn: 89542	2009-11-21 06:21:52 +00:00
Bob Wilson	6b68bd153a	Add a target hook to allow changing the tail duplication limit based on the contents of the block to be duplicated. Use this for ARM Cortex A8/9 to be more aggressive tail duplicating indirect branches, since it makes it much more likely that they will be predicted in the branch target buffer. Testcase coming soon. llvm-svn: 89187	2009-11-18 03:34:27 +00:00
Evan Cheng	9b46e74f42	- Change TargetInstrInfo::reMaterialize to pass in TargetRegisterInfo. - If destination is a physical register and it has a subreg index, use the sub-register instead. This fixes PR5423. llvm-svn: 88745	2009-11-14 02:55:43 +00:00
Evan Cheng	069209cf6b	Refactor code. llvm-svn: 86423	2009-11-08 00:15:23 +00:00
Jim Grosbach	7152d0423b	80-column cleanup of file header comments llvm-svn: 86408	2009-11-07 22:00:39 +00:00
Evan Cheng	899d8cb6a0	Refactor code. Fix a potential missing check. Teach isIdentical() about tLDRpci_pic. llvm-svn: 86330	2009-11-07 04:04:34 +00:00
Evan Cheng	6e3e66375a	- Add pseudo instructions tLDRpci_pic and t2LDRpci_pic which do a pc-relative load of a GV from constantpool and then add pc. It allows the code sequence to be rematerializable so it would be hoisted by machine licm. - Add a late pass to break these pseudo instructions into a number of real instructions. Also move the code in Thumb2 IT pass that breaks up t2MOVi32imm to this pass. This is done before post regalloc scheduling to allow the scheduler to properly schedule these instructions. It also allows them to be if-converted and shrunk by later passes. llvm-svn: 86304	2009-11-06 23:52:48 +00:00
Anton Korobeynikov	3ba3789153	Use NEON reg-reg moves, where profitable. This reduces "domain-cross" stalls, when we used to mix vfp and neon code (the former were used for reg-reg moves) llvm-svn: 85764	2009-11-02 00:10:38 +00:00
Bob Wilson	af37728221	Add a Thumb BRIND pattern. Change the ARM BRIND assembly to separate the opcode and operand with a tab. Check for these instructions in the usual places. llvm-svn: 85411	2009-10-28 18:26:41 +00:00
Evan Cheng	54a1a87862	Make ARM and Thumb2 32-bit immediate materialization into a single 32-bit pseudo instruction. This makes it re-materializable. Thumb2 will split it back out into two instructions so IT pass will generate the right mask. Also, this exposes opportunities to optimize the movw to a 16-bit move. llvm-svn: 82982	2009-09-28 09:14:39 +00:00
Evan Cheng	984f8efcaa	Fix PR4789. Teach eliminateFrameIndex how to handle VLDRQ and VSTRQ which cannot fold any immediate offset. llvm-svn: 80191	2009-08-27 01:23:50 +00:00
Evan Cheng	9d351a7246	Turn on if-conversion for thumb2. llvm-svn: 79084	2009-08-15 07:59:10 +00:00
Jim Grosbach	3c898a99bd	Whitespace cleanup. Remove trailing whitespace. llvm-svn: 78666	2009-08-11 15:33:49 +00:00
Evan Cheng	ad380aa97a	Add support to reduce most of 32-bit Thumb2 arithmetic instructions. llvm-svn: 78550	2009-08-10 02:37:24 +00:00
Evan Cheng	4046c75e96	Code refactoring. No functionality change. llvm-svn: 78455	2009-08-08 03:20:32 +00:00
Evan Cheng	897663328b	A big oops. Thumb1 default CC is a def of CPSR, not a use of CPSR. llvm-svn: 78418	2009-08-07 22:36:37 +00:00
Chris Lattner	c388490738	Move the getInlineAsmLength virtual method from TAI to TII, where the only real caller (GetFunctionSizeInBytes) uses it. The custom ARM implementation of this is basically reimplementing an assembler poorly for negligible gain. It should be removed IMNSHO, but I'll leave that to ARMish folks to decide. llvm-svn: 77877	2009-08-02 05:20:37 +00:00
Evan Cheng	b740190d2e	- More refactoring. This gets rid of all of the getOpcode calls. - This change also makes it possible to switch between ARM / Thumb on a per-function basis. - Fixed thumb2 routine which expands reg + arbitrary immediate. It was using ARM so_imm logic. - Use movw and movt to do reg + imm when profitable. - Other code clean ups and minor optimizations. llvm-svn: 77300	2009-07-28 05:48:47 +00:00
Evan Cheng	718ab76a04	More DCE. llvm-svn: 77231	2009-07-27 18:48:45 +00:00
Evan Cheng	fa630cca3f	Get rid of more dead code. llvm-svn: 77227	2009-07-27 18:38:54 +00:00
Evan Cheng	03307125f5	Clean up. llvm-svn: 77221	2009-07-27 18:25:24 +00:00
Evan Cheng	a773ae7a39	Get rid of some more getOpcode calls. This also fixes potential problems in ARMBaseInstrInfo routines not recognizing thumb1 instructions when 32-bit and 16-bit instructions mix. llvm-svn: 77218	2009-07-27 18:20:05 +00:00
Evan Cheng	674c4d47b9	Use t2LDRi12 and t2STRi12 to load / store to / from stack frames. Eliminate more getOpcode calls. llvm-svn: 77181	2009-07-27 03:14:20 +00:00
Evan Cheng	d615e606c4	Change Thumb2 jumptable codegen to one that uses two level jumps: Before: adr r12, #LJTI3_0_0 ldr pc, [r12, +r0, lsl #2] LJTI3_0_0: .long LBB3_24 .long LBB3_30 .long LBB3_31 .long LBB3_32 After: adr r12, #LJTI3_0_0 add pc, r12, +r0, lsl #2 LJTI3_0_0: b.w LBB3_24 b.w LBB3_30 b.w LBB3_31 b.w LBB3_32 This has several advantages. 1. This will make it easier to optimize this to a TBB / TBH instruction + (smaller) table. 2. This eliminates the need for ugly asm printer hack to force the address into thumb addresses (bit 0 is one). 3. Same codegen for pic and non-pic. 4. This eliminates the need to align the table so constantpool island pass won't have to over-estimate the size. Based on my calculation, the latter is probably slightly faster as well since ldr pc with shifter address is very slow. That is, it should be a win as long as the HW implementation can do a reasonable job of branch predicting the second branch. llvm-svn: 77024	2009-07-25 00:33:29 +00:00
Eli Friedman	1be7dfb1aa	Remove unused member functions. llvm-svn: 76960	2009-07-24 07:43:59 +00:00
Evan Cheng	d13b8f8353	FLDD, FLDS, FCPYD, FCPYS, FSTD, FSTS, VMOVD, VMOVQ maps to the same instructions on all sub-targets. llvm-svn: 76925	2009-07-24 00:53:56 +00:00
David Goodwin	85bcdffa4f	Correctly handle the Thumb-2 imm8 addrmode. Specialize frame index elimination more exactly for Thumb-2 to get better code gen. llvm-svn: 76919	2009-07-24 00:16:18 +00:00
David Goodwin	f3c839e0b9	Fix frame index elimination to correctly handle thumb-2 addressing modes that don't allow negative offsets. During frame elimination convert i12 opcode to a i8 when necessary due to a negative offset. llvm-svn: 76883	2009-07-23 17:06:46 +00:00
Evan Cheng	fd384e9493	Fix a regression from 76124. Thumb1 instructions default to S bit being true. llvm-svn: 76374	2009-07-19 19:16:46 +00:00
Anton Korobeynikov	dc39f4fff8	Emit cross regclass register moves for thumb2. Minor code duplication cleanup. llvm-svn: 76124	2009-07-16 23:26:06 +00:00
Evan Cheng	4d2678d1ff	Move isPredicated from .cpp to .h llvm-svn: 75217	2009-07-10 01:38:27 +00:00
David Goodwin	94209a4c31	Generalize opcode selection in ARMBaseRegisterInfo. llvm-svn: 75036	2009-07-08 20:28:28 +00:00
David Goodwin	5bdef4b3f7	Checkpoint Thumb2 Instr info work. Generalized base code so that it can be shared between ARM and Thumb2. Not yet activated because register information must be generalized first. llvm-svn: 75010	2009-07-08 16:09:28 +00:00

39 Commits