mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
15719 Commits

Author SHA1 Message Date
Quentin Colombet
cf0d20f78c [MachineBlockPlacement] Let the target optimize the branches at the end.
After the layout of the basic blocks is set, the target may be able to get rid
of unconditional branches to fallthrough blocks that the generic code does not
catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to
analyze all the branches involved in the terminator sequence, while still
understanding a few of them.

In such a situation, AnalyzeBranch can directly modify the branches if it has been
instructed to do so.

This patch takes advantage of that.

llvm-svn: 268328
2016-05-02 22:58:59 +00:00
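As a rough illustration of the cleanup described in cf0d20f78c above, the sketch below removes unconditional branches whose target is the next block in the final layout. It is plain C++, not LLVM code: the `Block` struct and `removeFallthroughBranches` are hypothetical names, and blocks are identified only by their position in the layout.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a machine basic block: at most one trailing
// unconditional branch, identified by the layout index of its target.
struct Block {
  std::optional<std::size_t> UncondBranchTarget;
};

// Walk the final layout and drop every unconditional branch whose target is
// the block laid out immediately after it. Returns how many were removed.
std::size_t removeFallthroughBranches(std::vector<Block> &Layout) {
  std::size_t Removed = 0;
  for (std::size_t I = 0; I + 1 < Layout.size(); ++I) {
    if (Layout[I].UncondBranchTarget) {
      assert(*Layout[I].UncondBranchTarget < Layout.size() && "bad target");
      if (*Layout[I].UncondBranchTarget == I + 1) {
        Layout[I].UncondBranchTarget.reset(); // falls through instead
        ++Removed;
      }
    }
  }
  return Removed;
}
```

In the real pass the target hook decides which branches it can remove; the sketch only shows why the decision has to wait until the layout is fixed.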
Quentin Colombet
6b53c89899 [X86] Model FAULTING_LOAD_OP as a terminator and branch.
This operation may branch to the handler block and we do not want it
to happen anywhere within the basic block.
Moreover, by marking it "terminator and branch" the machine verifier
does not wrongly assume (because of AnalyzeBranch not knowing better)
the branch is analyzable. Indeed, the target was seeing only the
unconditional branch and not the faulting load op and thought it was
a simple unconditional block.
The machine verifier was complaining because of that and, moreover,
other optimizations could have performed wrong transformations!

In the process, simplify the representation of the handler block in
the faulting load op. Now, we directly reference the handler block
instead of using a label. This has the benefits of:
1. MC knows how to issue a label for a BB, so leave that to it.
2. Accessing the target BB from its label is painful, whereas it is
   direct from a MBB operand.

Note: The 2-byte offset in implicit-null-check.ll comes from the
fact the unconditional jumps are not removed anymore, as the whole
terminator sequence is not analyzable anymore.

Will fix it in a subsequent commit.

llvm-svn: 268327
2016-05-02 22:58:54 +00:00
Matt Arsenault
dfb613a88d AMDGPU: Custom lower v2i32 loads and stores
This will allow us to split up 64-bit private accesses when
necessary.

llvm-svn: 268296
2016-05-02 20:13:51 +00:00
Tom Stellard
d541008932 AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch
We were using v_readlane_b32 with the lane set to zero, but this won't
work if thread 0 is not active.

Differential Revision: http://reviews.llvm.org/D19745

llvm-svn: 268295
2016-05-02 20:11:44 +00:00
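The problem fixed in d541008932 above can be shown with a toy model of a wavefront. This is a sketch under stated assumptions, not the backend's code: `Scratch`, `spill`, `readLane` and `readFirstLane` are hypothetical helpers, the wave is shrunk to 8 lanes, and inactive lanes are modelled as never writing their slot.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned WaveSize = 8; // toy width chosen for the example

struct Scratch {
  std::array<std::uint32_t, WaveSize> PerLane{}; // one slot per lane
};

// Spill: only lanes whose bit is set in Exec actually write their slot.
void spill(Scratch &S, std::uint32_t Value, std::uint32_t Exec) {
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (Exec & (1u << Lane))
      S.PerLane[Lane] = Value;
}

// Models reading a fixed lane: returns whatever that lane's slot holds.
std::uint32_t readLane(const Scratch &S, unsigned Lane) {
  assert(Lane < WaveSize);
  return S.PerLane[Lane];
}

// Models reading the first active lane: the lowest set bit in Exec.
std::uint32_t readFirstLane(const Scratch &S, std::uint32_t Exec) {
  assert((Exec & ((1u << WaveSize) - 1)) != 0 && "need an active lane");
  unsigned Lane = 0;
  while (!(Exec & (1u << Lane)))
    ++Lane;
  return S.PerLane[Lane];
}
```

With lanes 1 and 2 active, `readLane(S, 0)` returns the stale slot of lane 0, while `readFirstLane` returns the spilled value.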
Matt Arsenault
7932e530a0 AMDGPU: Make i64 loads/stores promote to v2i32
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.

This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
to be folded with the base pointer arithmetic.

llvm-svn: 268293
2016-05-02 20:07:26 +00:00
Simon Pilgrim
6c3bbc1c10 [X86][AVX2] Added 128-bit wide shuffle test
Demonstrate missing 128-bit wide shuffle combine support

llvm-svn: 268290
2016-05-02 19:46:58 +00:00
Tim Northover
d33c3d2654 ARM: fix handling of SUB immediates in peephole opt.
We were negating an immediate that was going to be used in a SUBri form
unnecessarily. Since ADD/SUB are very similar we *can* do that, but we have to
change the SUB to an ADD at the same time. This also applies to ADD, and allows
us to handle a slightly larger range of immediates for those two operations.

rdar://25992245

llvm-svn: 268276
2016-05-02 18:30:08 +00:00
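The arithmetic behind d33c3d2654 above is that negating the immediate is only valid if ADD and SUB are swapped at the same time. The following sketch checks that identity in plain C++; `Opcode`, `AddSubImm`, `negateImmediate` and `evaluate` are hypothetical names and do not model ARM immediate encodings.

```cpp
#include <cassert>
#include <cstdint>

enum class Opcode { Add, Sub };

struct AddSubImm {
  Opcode Op;
  std::int32_t Imm;
};

// Negate the immediate and flip ADD <-> SUB so the computed value is unchanged.
AddSubImm negateImmediate(AddSubImm I) {
  assert(I.Imm != INT32_MIN && "negation would overflow");
  return {I.Op == Opcode::Add ? Opcode::Sub : Opcode::Add, -I.Imm};
}

// Reference semantics used to compare the two forms.
std::int32_t evaluate(std::int32_t Base, AddSubImm I) {
  return I.Op == Opcode::Add ? Base + I.Imm : Base - I.Imm;
}
```

Negating the immediate while keeping the opcode, which is the mistake the commit describes, would change the result of `evaluate`.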
Justin Holewinski
9cc32d1b1b [NVPTX] Fix sign/zero-extending ldg/ldu instruction selection
Summary:
We don't have sign-/zero-extending ldg/ldu instructions defined,
so we need to emulate them with explicit CVTs. We were originally
handling the i8 case, but not any other cases.

Fixes PR26185

Reviewers: jingyue, jlebar

Subscribers: jholewinski

Differential Revision: http://reviews.llvm.org/D19615

llvm-svn: 268272
2016-05-02 18:12:02 +00:00
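The idea in 9cc32d1b1b above, a plain narrow load followed by an explicit conversion, can be sketched in ordinary C++. The helper names are hypothetical and the casts merely stand in for the CVT instructions; nothing here is NVPTX-specific.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Plain narrow load: fetch 16 bits, no extension implied.
std::uint16_t load16(const unsigned char *P) {
  assert(P != nullptr);
  std::uint16_t V;
  std::memcpy(&V, P, sizeof V);
  return V;
}

// Explicit conversion steps applied after the load.
std::int32_t signExtend16To32(std::uint16_t V) {
  return static_cast<std::int32_t>(static_cast<std::int16_t>(V));
}

std::uint32_t zeroExtend16To32(std::uint16_t V) {
  return static_cast<std::uint32_t>(V);
}
```

The same loaded bits give different 32-bit results depending on which conversion follows, which is why each extending case needs its own handling.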
Tom Stellard
179b86b996 AMDGPU/SI: Use the hazard recognizer to break SMEM soft clauses
Summary:
Add support for detecting hazards in SMEM soft clauses, so that we only
break the clauses when necessary, either by adding s_nop or re-ordering
other alu instructions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18870

llvm-svn: 268260
2016-05-02 17:39:06 +00:00
Derek Schuff
f0cc027b2e [WebAssembly] Rename memory_size intrinsic to current_memory
This follows the recent renaming in the wasm spec.

llvm-svn: 268255
2016-05-02 17:25:22 +00:00
Tom Stellard
7f58d124e5 AMDGPU/SI: Use hazard recognizer to detect DPP hazards
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18603

llvm-svn: 268247
2016-05-02 16:23:09 +00:00
David L Kreitzer
5e9178eeb3 Enable the X86 call frame optimization for the 64-bit targets that allow it.
Fixes PR27241.

Differential Revision: http://reviews.llvm.org/D19688

llvm-svn: 268227
2016-05-02 13:45:25 +00:00
Jonas Paulsson
ee848f9766 [SystemZ] Temporarily disable codegen test int-add-12.ll.
This checks for AGSI transformation, which is temporarily disabled.

llvm-svn: 268219
2016-05-02 10:42:47 +00:00
Craig Topper
721f6428df [AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While there fix the execution domain for VPACKSSDW/VPACKUSDW.
llvm-svn: 268200
2016-05-01 17:38:32 +00:00
Igor Breger
fa752e801d getelementptr instruction, support index vector of EVT.
Differential Revision: http://reviews.llvm.org/D19775

llvm-svn: 268195
2016-05-01 13:29:12 +00:00
Igor Breger
a0208b4462 Change the interaction of AVX512 broadcastsd/ss patterns with spilling. The new implementation takes a scalar register and generates a vector without COPY_TO_REGCLASS (which turns it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth.
Differential Revision: http://reviews.llvm.org/D19579

llvm-svn: 268190
2016-05-01 08:40:00 +00:00
Craig Topper
5387c11293 [AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when VLX and BWI are supported.
llvm-svn: 268189
2016-05-01 06:52:19 +00:00
Tom Stellard
6245c9db08 AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaits
This was supposed to be part of r268143.

llvm-svn: 268154
2016-04-30 04:04:48 +00:00
Tom Stellard
51b37329c1 AMDGPU/SI: Enable the post-ra scheduler
Summary:
This includes a hazard recognizer implementation to replace some of
the hazard handling we had during frame index elimination.

Reviewers: arsenm

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18602

llvm-svn: 268143
2016-04-30 00:23:06 +00:00
Haicheng Wu
611abb7dc9 [MBP] Use Function::optForSize() instead of checking OptimizeForSize directly.
Fix a FIXME.  Disable loop alignment if compiled with -Oz now.

llvm-svn: 268121
2016-04-29 22:01:10 +00:00
Matt Arsenault
42ea6294ae AMDGPU: Fix crash with unreachable terminators.
If a block has no successors because it ends in unreachable,
this was accessing an invalid iterator.

Also stop counting instructions that don't emit any
real instructions.

llvm-svn: 268119
2016-04-29 21:52:13 +00:00
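The crash described in 42ea6294ae above is an instance of dereferencing a begin iterator on an empty range. A minimal sketch of the guard, using a hypothetical `ToyBlock` with a `std::vector` of successor ids rather than LLVM's own classes:

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct ToyBlock {
  std::vector<int> Successors; // ids of successor blocks
};

// Returns the first successor, or nothing when the block has none
// (for example because it ends in unreachable).
std::optional<int> firstSuccessor(const ToyBlock &B) {
  if (B.Successors.empty())
    return std::nullopt; // *B.Successors.begin() would be undefined here
  return *B.Successors.begin();
}
```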
Sriraman Tallam
899df7646a Differential Revision: http://reviews.llvm.org/D19733
llvm-svn: 268106
2016-04-29 21:19:16 +00:00
Matt Arsenault
87a15c33eb AMDGPU: Add kernarg.segment.ptr intrinsic
llvm-svn: 268105
2016-04-29 21:16:52 +00:00
Matt Arsenault
1e65ead116 DAGCombiner: Reduce truncated shl width
llvm-svn: 268094
2016-04-29 19:53:16 +00:00
Guozhi Wei
194cccd63b [PPC] Enable shuffling of VSX vectors
This patch fixes PR27078 by enabling shuffling of vectors if VSX is available.

llvm-svn: 268064
2016-04-29 17:00:54 +00:00
Simon Dardis
ccaff6aa52 [mips][FastISel] A store is not a load.
Correct trivial error. One of the failing tests from PR/27458.

Reviewers: dsanders, vkalintiris, mcrosier

Differential Review: http://reviews.llvm.org/D19726

llvm-svn: 268053
2016-04-29 16:07:47 +00:00
Krzysztof Parzyszek
d4659dc8ea [Hexagon] Optimize addressing modes for load/store
Patch by Jyotsna Verma.

llvm-svn: 268051
2016-04-29 15:49:13 +00:00
Tom Stellard
33134ca52e AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions
Summary:
These instructions can add an immediate offset to the address, like other
ds instructions.

Reviewers: arsenm

Subscribers: arsenm, scchan

Differential Revision: http://reviews.llvm.org/D19233

llvm-svn: 268043
2016-04-29 14:34:26 +00:00
Nikolay Haustov
048a920e0e AMDGPU/SI: Assembler: Unify parsing/printing of operands.
Summary:
The goal is for each operand type to have its own parse function and
at the same time share common code for tracking state as different
instruction types share operand types (e.g. glc/glc_flat, etc).

Introduce parseAMDGPUOperand which can parse any optional operand.
DPP and Clamp/OMod have custom handling for now. Sam also suggested
having a class hierarchy for operand types instead of a table. This
can be done in a separate change.

Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps,
parseMubufOptionalOps, parseDPPOptionalOps.
Reduce the number of definitions of AsmOperands and MatchClasses by using a common base class.
Rename AsmMatcher/InstPrinter methods accordingly.
Print immediate type when printing parsed immediate operand.
Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3).
Update tests.

Reviewers: tstellarAMD, SamWot, artem.tamazov

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19584

llvm-svn: 268015
2016-04-29 09:02:30 +00:00
Matthias Braun
5d4a43cf37 RegisterPressure: Fix default lanemask for missing regunit intervals
In case of missing live intervals for a physical register
getLanesWithProperty() would report 0 which was not a safe default in
all situations. Add a parameter to pass in a safe default.
No testcase because in-tree targets do not skip computing register unit
live intervals.

Also clean up the getXXX() functions to not perform the
RequireLiveIntervals checks anymore so we do not even need to return
safe defaults.

llvm-svn: 267977
2016-04-29 02:44:54 +00:00
Marcin Koscielnicki
832d560a7e [PowerPC] Fix the EH_SjLj_Setup pseudo.
This instruction is just a control flow marker - it should not
actually exist in the object file.  Unfortunately, nothing catches
it before it gets to AsmPrinter.  If integrated assembler is used,
it's considered to be a normal 4-byte instruction, and emitted as
an all-0 word, crashing the program.  With external assembler,
a comment is emitted.

Fixed by setting Size to 0 and handling it in MCCodeEmitter - this
means the comment will still be emitted if integrated assembler
is not used.

This broke an ASan test, which has been disabled for a long time
as a result (see the discussion on D19657).  We can reenable it
once this lands.

llvm-svn: 267943
2016-04-28 21:24:37 +00:00
Krzysztof Parzyszek
89f6a784c5 [RDF] Improve handling of inline-asm
- Keep implicit defs from inline-asm instructions.
- Treat register references from inline-asm as fixed.

llvm-svn: 267936
2016-04-28 20:33:33 +00:00
Matt Arsenault
28f0a3fe58 AMDGPU: Emit error if too much LDS is used
llvm-svn: 267922
2016-04-28 19:37:35 +00:00
Krzysztof Parzyszek
f555db9453 Reset the TopRPTracker's position in ScheduleDAGMILive::initQueues
ScheduleDAGMI::initQueues changes the RegionBegin to the first non-debug
instruction. Since it does not track register pressure, it does not affect
any RP trackers. ScheduleDAGMILive inherits initQueues from ScheduleDAGMI,
and it does reset the TopRPTracker in its schedule method. Any derived,
target-specific scheduler will need to do it as well, but the TopRPTracker
is only exposed as a "const" object to derived classes. Without the ability
to modify the tracker directly, this leaves a derived scheduler with a
potential of having the TopRPTracker out-of-sync with the CurrentTop.

The symptom of the problem:
  void llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit *, bool):
  Assertion `TopRPTracker.getPos() == CurrentTop && "out of sync"' failed.

Differential Revision: http://reviews.llvm.org/D19438

llvm-svn: 267918
2016-04-28 19:17:44 +00:00
Matt Arsenault
f94836045a AMDGPU: Fix mishandling array allocations when promoting alloca
The canonical form for allocas is a single allocation of the array type.
In case we see a non-canonical array alloca, make sure we aren't
replacing this with an array N times smaller.

llvm-svn: 267916
2016-04-28 18:38:48 +00:00
Simon Dardis
156870a1a4 [mips][atomics] Fix partword atomic binary operation implementation
Currently Mips::emitAtomicBinaryPartword() does not properly respect the
width of pointers. For MIPS64 this causes the memory address that the ll/sc
sequence uses to be truncated. At runtime this causes a segmentation fault.

This can be fixed by applying similar changes as r266204, so that a full 64bit
pointer is loaded.

Reviewers: dsanders

Differential Review: http://reviews.llvm.org/D19651

llvm-svn: 267900
2016-04-28 16:26:43 +00:00
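To see why pointer width matters in 156870a1a4 above, consider the address arithmetic a partword atomic needs: the address of the containing 32-bit word, plus a shift and mask for the byte or halfword inside it. The sketch below is an assumption-laden illustration, not the backend's expansion: `PartwordAccess`, `computePartword` and `alignTruncated` are hypothetical, and a little-endian layout is assumed.

```cpp
#include <cassert>
#include <cstdint>

struct PartwordAccess {
  std::uint64_t AlignedAddr; // address of the containing 32-bit word
  unsigned Shift;            // bit offset of the sub-word inside that word
  std::uint32_t Mask;        // selects the sub-word bits
};

// Little-endian layout assumed; WidthBytes is 1 or 2.
PartwordAccess computePartword(std::uint64_t Addr, unsigned WidthBytes) {
  assert(WidthBytes == 1 || WidthBytes == 2);
  PartwordAccess A;
  A.AlignedAddr = Addr & ~std::uint64_t{3}; // keeps all 64 address bits
  A.Shift = static_cast<unsigned>(Addr & 3) * 8;
  A.Mask = ((1u << (WidthBytes * 8)) - 1) << A.Shift;
  return A;
}

// The failure mode: doing the alignment in 32 bits drops the upper half.
std::uint64_t alignTruncated(std::uint64_t Addr) {
  return static_cast<std::uint32_t>(Addr) & ~3u;
}
```

For an address above 4 GiB the two functions disagree, and an ll/sc loop using the truncated address would touch the wrong location.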
Krzysztof Parzyszek
49d1f997e6 [RDF] Handle undefined registers in RDF copy propagation
When updating the graph, make sure that new uses without reaching defs
are handled correctly.

llvm-svn: 267891
2016-04-28 15:09:19 +00:00
Matthias Braun
18562ab366 CodeGen: Add DetectDeadLanes pass.
The DetectDeadLanes pass performs a dataflow analysis of used/defined
subregister lanes across COPY instructions and instructions that will
get lowered to copies. It detects dead definitions and uses reading
undefined values which are obscured by COPY and subregister usage.

These dead definitions cause trouble in the register coalescer which
cannot deal with definitions suddenly becoming dead after coalescing
COPY instructions.

For now the pass only adds dead and undef flags to machine operands. It
should be possible to extend it in the future to remove the dead
instructions and redo the analysis for the affected virtual
registers.

Differential Revision: http://reviews.llvm.org/D18427

llvm-svn: 267851
2016-04-28 03:07:16 +00:00
Bryan Chan
2567ab558c [SystemZ] Support Swift Calling Convention
Summary:
Port rL265480, rL264754, rL265997 and rL266252 to SystemZ, in order to enable the Swift port on the architecture. SwiftSelf and SwiftError are assigned to R10 and R9, respectively, which are normally callee-saved registers. For more information, see:

RFC: Implementing the Swift calling convention in LLVM and Clang
https://groups.google.com/forum/#!topic/llvm-dev/epDd2w93kZ0

Reviewers: kbarton, manmanren, rjmccall, uweigand

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19414

llvm-svn: 267823
2016-04-28 00:17:23 +00:00
Mitch Bodart
fde6615c97 [X86] Enable the post-RA-scheduler for clang's default 32-bit cpu.
For compilations with no explicit cpu specified, this exhibits
nice gains on Silvermont, with neutral performance on big cores.

Differential Revision: http://reviews.llvm.org/D19138

llvm-svn: 267809
2016-04-27 22:52:35 +00:00
Quentin Colombet
d6bb035737 [X86][FastISel] Make sure we use the right register class when we select stores.
llvm-svn: 267806
2016-04-27 22:33:42 +00:00
Quentin Colombet
96d6f82ab0 [X86] Fix the lowering of TLS calls.
The callseq_end node must be glued with the TLS calls, otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, the TLSCall 64-bit pseudo must not set an implicit use on RDI:
at this point the pseudo uses the symbol address, not RDI, and the
lowering will do the right thing.

llvm-svn: 267797
2016-04-27 21:37:37 +00:00
Matt Arsenault
982e737c85 AMDGPU: Account for globals in AMDGPUPromoteAlloca pass
Patch by Bas Nieuwenhuizen

llvm-svn: 267791
2016-04-27 21:05:08 +00:00
Ahmed Bougacha
e8bff14c32 [AArch64] Set correct successors in CMPXCHG pseudo expansion.
transferSuccessors() would make LoadCmpBB a successor of DoneBB,
whereas it should be a successor of the original MBB.

Follow-up to r266339.

Unfortunately, it's tricky to catch this in the verifier.

llvm-svn: 267779
2016-04-27 20:33:02 +00:00
Ahmed Bougacha
492c1a346a [ARM] Set correct successors in CMPXCHG pseudo expansion.
transferSuccessors() would make LoadCmpBB a successor of DoneBB, whereas
it should be a successor of the original MBB.

The testcase changes are caused by Thumb2SizeReduction, which
was previously confused by the broken CFG.

Follow-up to r266679.

Unfortunately, it's tricky to catch this in the verifier.

llvm-svn: 267778
2016-04-27 20:32:54 +00:00
Kevin B. Smith
1783031f2d [X86]: Quit promoting 16 bit loads to 32 bit.
Differential Revision: http://reviews.llvm.org/D19592

llvm-svn: 267773
2016-04-27 19:58:03 +00:00
Marcin Koscielnicki
1e17bfd3e5 [Mips] Add support for llvm.thread.pointer intrinsic.
This will be used to implement __builtin_thread_pointer in clang.

Differential Revision: http://reviews.llvm.org/D19569

llvm-svn: 267743
2016-04-27 17:21:49 +00:00
Nicolai Haehnle
494b4aee1e AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.

(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)

Reviewers: arsenm, mareko, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19203

llvm-svn: 267729
2016-04-27 15:46:01 +00:00
Artem Tamazov
0b6855273a [AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers.
The possibility to specify a hardware register by its code is kept.
Disassemble to symbolic name, if name is known.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19335

llvm-svn: 267724
2016-04-27 15:17:03 +00:00
Nico Weber
b519b357d0 Revert r267649, it caused PR27539.
llvm-svn: 267723
2016-04-27 15:16:54 +00:00