1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

9238 Commits

Author SHA1 Message Date
Matt Arsenault
f68a94e609 R600/SI: Expand selects on vectors.
llvm-svn: 203134
2014-03-06 17:34:03 +00:00
Richard Osborne
7d4ecf1273 [XCore] Add support for the "m" inline asm constraint.
Summary:
This provides support for CP and DP relative global accesses in inline
asm.

Reviewers: robertlytton

Reviewed By: robertlytton

Differential Revision: http://llvm-reviews.chandlerc.com/D2943

llvm-svn: 203129
2014-03-06 16:37:48 +00:00
Chad Rosier
6c9595d931 [AArch64] This is a work in progress to provide a machine description
for the Cortex-A53 subtarget in the AArch64 backend.

This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.

Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 203125
2014-03-06 16:04:00 +00:00
Hal Finkel
c04da1f4c7 Fixup PPC Darwin i1 argument handling
Like on other targets, we need to zero_extend/truncate i1 args before copying
them to GPRs.

llvm-svn: 203045
2014-03-06 00:45:19 +00:00
Hal Finkel
0373847793 When using CR bit registers on PPC32, handle the i1 vaarg case
When copying an i1 value into a GPR for a vaarg call, we need to explicitly
zero-extend the i1 value (otherwise an invalid CRBIT -> GPR copy will be
generated).

llvm-svn: 203041
2014-03-06 00:23:33 +00:00
Jack Carter
79eae149e1 [Mips] Testcase typo fix. No functionality change.
llvm-svn: 203020
2014-03-05 22:54:56 +00:00
Hal Finkel
18344a3ff6 With PPC CR bit registers, handle int_to_fp on older cores
On cores without fpcvt support, we cannot promote int_to_fp i1 operations,
because there is nothing to promote them to. The most straightforward
implementation of this uses a select to choose between the two possible
resulting floating-point values (and that's what is done here).

llvm-svn: 203015
2014-03-05 22:14:00 +00:00
Rafael Espindola
d987164eed Always print the implicit .text at the start of an asm file.
Before llvm-mc would print it, but llc was assuming that it would produce
another section changing directive before one was needed. That assumption is
false with inline asm.

Fixes PR19049.

Another option would be to always create the section, but in the asm printer
avoid printing sections changes during initialization. That would work, but
* We do use the fact that llvm-mc prints it in testing. The tests can be changed
  if needed.
* A quick poll on IRC suggest that most developers prefer the implicit .text to
  be printed.

llvm-svn: 203001
2014-03-05 20:09:15 +00:00
Cameron McInally
80fa2d42e5 Lower AVX v4i64->v4i32 truncate to one shuffle.
llvm-svn: 202996
2014-03-05 19:41:16 +00:00
Oliver Stannard
6aa486598a ARM: Correctly align arguments after a byval struct is passed on the stack
llvm-svn: 202985
2014-03-05 15:25:27 +00:00
Andrew Trick
bcd20c0c29 Make stackmap machineinstrs clobber the scratch regs too.
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs to scratch register to
patch or perform a runtime call thunk.

Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language and target independent calling
convention specific to stackmaps so makes sense.  Although the calling
convention is not currently used to select the scratch registers.

llvm-svn: 202943
2014-03-05 07:08:16 +00:00
Hans Wennborg
c1cb270dba Check for dynamic allocas and inline asm that clobbers sp before building
selection dag (PR19012)

In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).

The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.

This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.

Differential Revision: http://llvm-reviews.chandlerc.com/D2954

llvm-svn: 202930
2014-03-05 02:43:26 +00:00
Richard Osborne
b9f5c6e728 [XCore] Fix call of absolute address.
Previously for:

tail call void inttoptr (i64 65536 to void ()*)() nounwind

We would emit:

bl 65536

The immediate operand of the bl instruction is a relative offset so it is
wrong to use the absolute address here.

llvm-svn: 202860
2014-03-04 16:50:30 +00:00
Daniel Sanders
2e526d806c [mips][msa] Correct the behaviour of the COPY_FW pseudo on lanes 2 and 3.
Summary:
Previously, attempting to extract lanes 2 and 3 would actually extract lane 1.
The MSA CodeGen tests only covered lanes 0 and 1.

Differential Revision: http://llvm-reviews.chandlerc.com/D2935

llvm-svn: 202848
2014-03-04 13:54:30 +00:00
Chad Rosier
e60c767814 Revert "[AArch64] This is a work in progress to provide a machine description"
This reverts commit ff717c8fc786a0cfa1602982b91895fa09e514fc.

llvm-svn: 202773
2014-03-04 00:32:07 +00:00
Chad Rosier
ad64e09862 [AArch64] This is a work in progress to provide a machine description
for the Cortex-A53 subtarget in the AArch64 backend.

This patch lays the ground work to annotate each AArch64 instruction
(no NEON yet) with a list of SchedReadWrite types. The patch also
provides the Cortex-A53 processor resources, maps those the the default
SchedReadWrites, and provides basic latency. NEON support will be added
in a subsequent patch with proper forwarding logic.

Verification was done by setting the pre-RA scheduler to linearize to
better gauge the effect of the MIScheduler. Even without modeling the
forward logic, the results show a modest improvement for Cortex-A53.

Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!

llvm-svn: 202767
2014-03-03 23:32:47 +00:00
Daniel Sanders
ea44f13708 [mips] Prevent %lo relocation being used on MSA loads and stores.
Summary:
Parts of the compiler still believed MSA load/stores have a 16-bit offset when
it is actually 10-bit. Corrected this, and fixed a closely related issue this
uncovered where load/stores with 10-bit and 12-bit offsets (MSA and microMIPS
respectively) could not load/store using offsets from the stack/frame pointer.
They accepted frameindex+offset, but not frameindex by itself.

Reviewers: jacksprat, matheusalmeida

Reviewed By: jacksprat

Differential Revision: http://llvm-reviews.chandlerc.com/D2888

llvm-svn: 202717
2014-03-03 14:31:21 +00:00
Hal Finkel
64680c3ba1 Add a PPC inline asm constraint type for single CR bits
Now that the PowerPC backend can track individual CR bits as first-class
registers, we should also have a way of allocating them for inline asm
statements. Because these registers are only one bit, if an output variable is
implicitly cast to a larger integer size, we'll get an any_extend to that
larger type (this is part of the existing target-independent logic). As a
result, regardless of the size of the output type, only the first bit is
meaningful.

The constraint identifier "wc" has been chosen for this purpose. Although gcc
does not currently support allocating individual CR bits, this identifier
choice has been coordinated with the gcc PowerPC team, and will be marked as
reserved for this purpose in the gcc constraints.md file.

llvm-svn: 202657
2014-03-02 18:23:39 +00:00
Elena Demikhovsky
838b163a58 AVX-512: Fixed extract_vector_elt for v8i1 vector
llvm-svn: 202624
2014-03-02 09:19:44 +00:00
Matt Arsenault
394a9d104d R600: Add failing control flow tests.
Simple cases hit a variety of problems at -O0.

llvm-svn: 202601
2014-03-01 21:45:41 +00:00
Hal Finkel
4937443651 Remove extra truncs/exts around i32 bit operations on PPC64
This generalizes the code to eliminate extra truncs/exts around i1 bit
operations to also do the same on PPC64 for i32 bit operations. This eliminates
a fairly prevalent code wart:

int foo(int a) {
  return a == 5 ? 7 : 8;
}

On PPC64, because of the extension implied by the ABI, this would generate:

	cmplwi 0, 3, 5
	li 12, 8
	li 4, 7
	isel 3, 4, 12, 2
	rldicl 3, 3, 0, 32
	blr

where the 'rldicl 3, 3, 0, 32', the extension, is completely unnecessary. At
least for the single-BB case (which is all that the DAG combine mechanism can
handle), this unnecessary extension is no longer generated.

llvm-svn: 202600
2014-03-01 21:36:57 +00:00
Venkatraman Govindaraju
789e2fd1b7 [Sparc] Add support for parsing directives in SparcAsmParser.
llvm-svn: 202564
2014-03-01 02:18:04 +00:00
Venkatraman Govindaraju
439a7d90a6 [Sparc] Emit 'restore' instead of 'restore %g0, %g0, %g0'. This improves the readability of the generated code.
llvm-svn: 202563
2014-03-01 01:04:26 +00:00
Manman Ren
fe531c446c SpillPlacement: fix a bug in iterate.
Inside iterate, we scan backwards then scan forwards in a loop. When iteration
is not zero, the last node was just updated so we can skip it. But when
iteration is zero, we can't skip the last node.

For the testing case, fixing this will save a spill and move register copies
from hot path to cold path.

llvm-svn: 202557
2014-02-28 23:05:31 +00:00
Tom Stellard
6280afdecd R600/SI: Expand all v16[if]32 operations
llvm-svn: 202543
2014-02-28 21:36:37 +00:00
Justin Bogner
56a8b49ffd CommandLine: Exit successfully for -version and -help
Tools that use the CommandLine library currently exit with an error
when invoked with -version or -help. This is unusual and non-standard,
so we'll fix them to exit successfully instead.

I don't expect that anyone relies on the current behaviour, so this
should be a fairly safe change.

llvm-svn: 202530
2014-02-28 19:08:01 +00:00
Adam Nemet
0fe89b88ce Test commit
llvm-svn: 202528
2014-02-28 18:44:39 +00:00
Zoran Jovanovic
9c1887bef4 Fixed operand of SC microMIPS instruction.
llvm-svn: 202526
2014-02-28 18:22:56 +00:00
Hal Finkel
1970087008 Swap PPC isel operands to allow for 0-folding
The PPC isel instruction can fold 0 into the first operand (thus eliminating
the need to materialize a zero-containing register when the 'true' result of
the isel is 0). When the isel is fed by a bit register operation that we can
invert, do so as part of the bit-register-operation peephole routine.

llvm-svn: 202469
2014-02-28 06:11:16 +00:00
Hal Finkel
883c64377d Add CR-bit tracking to the PowerPC backend for i1 values
This change enables tracking i1 values in the PowerPC backend using the
condition register bits. These bits can be treated on PowerPC as separate
registers; individual bit operations (and, or, xor, etc.) are supported.
Tracking booleans in CR bits has several advantages:

 - Reduction in register pressure (because we no longer need GPRs to store
   boolean values).

 - Logical operations on booleans can be handled more efficiently; we used to
   have to move all results from comparisons into GPRs, perform promoted
   logical operations in GPRs, and then move the result back into condition
   register bits to be used by conditional branches. This can be very
   inefficient, because the throughput of these CR <-> GPR moves have high
   latency and low throughput (especially when other associated instructions
   are accounted for).

 - On the POWER7 and similar cores, we can increase total throughput by using
   the CR bits. CR bit operations have a dedicated functional unit.

Most of this is more-or-less mechanical: Adjustments were needed in the
calling-convention code, support was added for spilling/restoring individual
condition-register bits, and conditional branch instruction definitions taking
specific CR bits were added (plus patterns and code for generating bit-level
operations).

This is enabled by default when running at -O2 and higher. For -O0 and -O1,
where the ability to debug is more important, this feature is disabled by
default. Individual CR bits do not have assigned DWARF register numbers,
and storing values in CR bits makes them invisible to the debugger.

It is critical, however, that we don't move i1 values that have been promoted
to larger values (such as those passed as function arguments) into bit
registers only to quickly turn around and move the values back into GPRs (such
as happens when values are returned by functions). A pair of target-specific
DAG combines are added to remove the trunc/extends in:
  trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)
and:
  zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)
In short, we only want to use CR bits where some of the i1 values come from
comparisons or are used by conditional branches or selects. To put it another
way, if we can do the entire i1 computation in GPRs, then we probably should
(on the POWER7, the GPR-operation throughput is higher, and for all cores, the
CR <-> GPR moves are expensive).

POWER7 test-suite performance results (from 10 runs in each configuration):

SingleSource/Benchmarks/Misc/mandel-2: 35% speedup
MultiSource/Benchmarks/Prolangs-C++/city/city: 21% speedup
MultiSource/Benchmarks/MiBench/automotive-susan: 23% speedup
SingleSource/Benchmarks/CoyoteBench/huffbench: 13% speedup
SingleSource/Benchmarks/Misc-C++/Large/sphereflake: 13% speedup
SingleSource/Benchmarks/Misc-C++/mandel-text: 10% speedup

SingleSource/Benchmarks/Misc-C++-EH/spirit: 10% slowdown
MultiSource/Applications/lemon/lemon: 8% slowdown

llvm-svn: 202451
2014-02-28 00:27:01 +00:00
Roman Divacky
f36febf578 Lower FNEG just like FABS to fneg[ds] and fmov[ds], thus avoiding
expensive libcall. Also, Qp_neg is not implemented on at least
FreeBSD. This is also what gcc is doing.

llvm-svn: 202422
2014-02-27 19:26:29 +00:00
Adrian Prantl
d7f77dd966 Debug info: Remove ARMAsmPrinter::EmitDwarfRegOp(). AsmPrinter can now
scan the register file for sub- and super-registers.
No functionality change intended.

(Tests are updated because the comments in the assembler output are
different.)

llvm-svn: 202416
2014-02-27 17:56:08 +00:00
Richard Osborne
947c19eaa0 [XCore] Support functions returning more than 4 words.
If a function returns a large struct by value return the first 4 words
in registers and the rest on the stack in a location reserved by the
caller. This is needed to support the xC language which supports
functions returning an arbitrary number of return values. This is
r202397 reapplied with a fix to avoid an uninitialized read of a member.

llvm-svn: 202414
2014-02-27 17:47:54 +00:00
Richard Osborne
f8fb4e8a7f Revert r202396, r202397.
These are causing test failures, revert for now.

llvm-svn: 202398
2014-02-27 14:24:13 +00:00
Richard Osborne
cb6866dfec [XCore] Support functions returning more than 4 words.
Summary:
If a function returns a large struct by value return the first 4 words
in registers and the rest on the stack in a location reserved by the
caller. This is needed to support the xC language which supports
functions returning an arbitrary number of return values.

Reviewers: robertlytton

Reviewed By: robertlytton

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2889

llvm-svn: 202397
2014-02-27 14:00:40 +00:00
Richard Osborne
5ac74685fd [XCore] Target optimized library function __memcpy_4()
Summary:
If the src, dst and size of a memcpy are known to be 4 byte aligned we
can call __memcpy_4() instead of memcpy().

Reviewers: robertlytton

Reviewed By: robertlytton

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2871

llvm-svn: 202395
2014-02-27 13:39:07 +00:00
Richard Osborne
75c16f2bf4 [XCore] Add dag combines for instructions that ignore some input bits.
These instructions ignore the high bits of one of their input operands -
try and use this to simplify the code.

llvm-svn: 202394
2014-02-27 13:20:11 +00:00
Richard Osborne
f815df9c6e [XCore] Provide information about known zero bits of resource instructions.
llvm-svn: 202393
2014-02-27 13:20:06 +00:00
Daniel Sanders
98ea718b1d Stop test/CodeGen/X86/v4i32load-crash.ll targeting non-X86-64 targets.
Summary:
Fixes an issue where a test attempts to use -mcpu=x86-64 on non-X86-64 targets.
This triggers an assertion in the MIPS backend since it doesn't know what ABI to
use by default for unrecognized processors.

CC: llvm-commits, rafael

Differential Revision: http://llvm-reviews.chandlerc.com/D2877

llvm-svn: 202369
2014-02-27 09:24:31 +00:00
Michel Danzer
8edacce1de R600/SI: Optimize SI_KILL for constant operands
If the SI_KILL operand is constant, we can either clear the exec mask if
the operand is negative, or do nothing otherwise.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202337
2014-02-27 01:47:09 +00:00
Michel Danzer
0ddce64f7c R600/SI: Allow SI_KILL for geometry shaders
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 202336
2014-02-27 01:47:02 +00:00
Andrew Trick
323d31a625 Use regnum regex in an XCore test case.
llvm-svn: 202315
2014-02-26 23:22:49 +00:00
Andrew Trick
4823d7c2b4 Very temporarily XFAILing a test. Will be fixed shortly.
llvm-svn: 202310
2014-02-26 22:39:59 +00:00
Andrew Trick
ba61c4e6cf Add a limit to the heuristic that register allocates instructions in local order.
This handles pathological cases in which we see 2x increase in spill
code for large blocks (~50k instructions). I don't have a unit test
for this behavior.

Fixes rdar://16072279.

llvm-svn: 202304
2014-02-26 22:07:26 +00:00
Quentin Colombet
e639a79f72 Lower unsigned vsetcc to psubus in certain cases
The current approach to lower a vsetult is to flip the sign bit of the
operands, swap the operands and then use a (signed) pcmpgt.  psubus (unsigned
saturating subtract) can be used to emulate a vsetult more efficiently:

+    case ISD::SETULT: {
+      // If the comparison is against a constant we can turn this into a
+      // setule.  With psubus, setule does not require a swap.  This is
+      // beneficial because the constant in the register is no longer
+      // destructed as the destination so it can be hoisted out of a loop.

I also enable lowering via psubus in a few other cases where it's clearly
beneficial: setule and setuge if minu/maxu cannot be used.
    
rdar://problem/14338765

Patch by Adam Nemet <anemet@apple.com>.

llvm-svn: 202301
2014-02-26 21:39:12 +00:00
Tim Northover
852c0d63ee AArch64: simplify tbl/tbx polymorphism
The table argument is always 128-bit (and interpreted as <16 x i8>) so the
extra specifier for it is just clutter.

No user-visible behaviour change, so no tests.

llvm-svn: 202258
2014-02-26 11:55:09 +00:00
Artyom Skrobov
94122a0879 ARMv8 IfConversion must skip narrow instructions that a) define CPSR and b) wouldn't affect CPSR in an IT block
llvm-svn: 202257
2014-02-26 11:27:28 +00:00
Daniel Sanders
664020809d Stop test/CodeGen/ARM/a15.ll targetting non-ARM targets.
Summary:
Fixes an issue where a test attempts to use -mcpu=cortex-a15 on non-ARM targets.
This triggers an assertion on MIPS since it doesn't know what ABI to use by default for
unrecognized processors.

Reviewers: rengolin

Reviewed By: rengolin

CC: llvm-commits, aemerson, rengolin

Differential Revision: http://llvm-reviews.chandlerc.com/D2876

llvm-svn: 202256
2014-02-26 11:26:18 +00:00
Tom Stellard
3dafad8efc R600/SI: Custom select 64-bit ADD
llvm-svn: 202194
2014-02-25 21:36:18 +00:00
Hal Finkel
08c64addef Account for 128-bit integer operations in PPCCTRLoops
We need to abort the formation of counter-register-based loops where there are
128-bit integer operations that might become function calls.

llvm-svn: 202192
2014-02-25 20:51:50 +00:00