1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
Commit Graph

2282 Commits

Author SHA1 Message Date
Florian Hahn
28cfee69f6 [AArch64] Fix order of checks in shouldScheduleAdjacent.
We need to check the opcode of FirstMI before accessing the operands. This
caused a buildbot failure during bootstrapping on AArch64.

llvm-svn: 305694
2017-06-19 13:45:41 +00:00
Florian Hahn
d1d5802289 Recommit rL305677: [CodeGen] Add generic MacroFusion pass
Use llvm::make_unique to avoid ambiguity with MSVC.

This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305690
2017-06-19 12:53:31 +00:00
Florian Hahn
8e7ee1f5d5 Revert r305677 [CodeGen] Add generic MacroFusion pass.
This causes Windows buildbot failures do an ambiguous call.

llvm-svn: 305681
2017-06-19 11:26:15 +00:00
Florian Hahn
9de5c3ed15 [CodeGen] Add generic MacroFusion pass.
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB

Reviewed By: MatzeB

Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D34144

llvm-svn: 305677
2017-06-19 10:51:38 +00:00
Nirav Dave
2818e7786f [AArch64] Add indexed check to splitStores. NFC.
Add explicit check for unhandled cases in preparation for delaying
splitStores to post-legalization.

llvm-svn: 305471
2017-06-15 14:47:44 +00:00
Florian Hahn
155cc0ae26 [AArch64] Enable FeatureFuseAES for the generic processor model.
Summary:
Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back
gives a double digit speedup on benchmarks using those instructions on
Cortex-A processors. In GCC, this optimization is part of the generic
processor model as well.

This change should not have a major performance impact on processors
that do not optimize AES instruction pairs, although I only had access
to Cortex-A processors for benchmarking.


Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover

Reviewed By: evandro

Subscribers: sbaranga, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D33836

llvm-svn: 305457
2017-06-15 09:31:23 +00:00
Geoff Berry
9326eb04c1 [AArch64][Falkor] Fix sched details for FDIV, FSQRT, SDIV, UDIV
llvm-svn: 305310
2017-06-13 17:43:39 +00:00
Tim Northover
45477d3671 AArch64: don't try to emit an add (shifted reg) for SP.
The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr
rather than the SP, so we need to use different variants.

Situations where this actually comes up are rare enough (see test-case) that I
think falling back to DAG is fine.

llvm-svn: 305230
2017-06-12 20:49:53 +00:00
Haicheng Wu
20d2f0ba58 [Falkor] Enable SW Prefetch.
SW prefetch is good for Falkor.

Differential Revision: http://reviews.llvm.org/D34084

llvm-svn: 305199
2017-06-12 16:34:19 +00:00
Daniel Neilson
1fd6840870 Const correctness for TTI::getRegisterBitWidth
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.

Reviewers: chandlerc, rnk, reames

Reviewed By: reames

Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits

Differential Revision: https://reviews.llvm.org/D33903

llvm-svn: 305189
2017-06-12 14:22:21 +00:00
I-Jui (Ray) Sung
46c22f7924 [AArch64] Add fallback in FastISel fp16 conversions
Summary:
- Fix assertion failures on F16 to/from int types in FastISel by falling
  back to regular ISel
- Add a testcase of various conversion cases with FastISel (-O0)

Reviewers: kristof.beyls, jmolloy, SjoerdMeijer

Reviewed By: SjoerdMeijer

Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls

Differential Revision: https://reviews.llvm.org/D33734

llvm-svn: 305127
2017-06-09 22:40:50 +00:00
Zachary Turner
c5632126fc Move Object format code to lib/BinaryFormat.
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.

Differential Revision: https://reviews.llvm.org/D33843

llvm-svn: 304864
2017-06-07 03:48:56 +00:00
Chandler Carruth
eb66b33867 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Mandeep Singh Grang
efd068d7d5 [llvm] Remove double semicolons
Reviewers: craig.topper, arsenm, mehdi_amini

Reviewed By: mehdi_amini

Subscribers: mehdi_amini, wdng, nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33924

llvm-svn: 304767
2017-06-06 05:08:36 +00:00
Geoff Berry
bef0ea0487 [AArch64][Falkor] Model immediate forwarding.
llvm-svn: 304552
2017-06-02 14:27:41 +00:00
Eugene Zelenko
8839504c59 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 304495
2017-06-01 23:25:02 +00:00
Florian Hahn
87c40bfe25 [AArch64] Enable FeatureFuseAES on Cortex-A53.
It improves performance on Cortex-A53.

llvm-svn: 304307
2017-05-31 15:50:03 +00:00
Florian Hahn
c510e36e85 [AArch64] Enable FeatureFuseAES on Cortex-A73.
It improves performance on Cortex-A73.

llvm-svn: 304304
2017-05-31 15:25:25 +00:00
Abderrazek Zaafrani
eb1c576449 Add latency info for Exynos interleaved Load/Store instructions.
llvm-svn: 304259
2017-05-31 00:20:55 +00:00
Matthias Braun
7b8d690af1 TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.

While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.

llvm-svn: 304247
2017-05-30 21:36:41 +00:00
Craig Topper
b82b26e256 [SelectionDAG] Set ISD::FPOWI to Expand by default
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".

This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.

Reviewers: spatel, RKSimon, efriedma

Reviewed By: RKSimon

Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits

Differential Revision: https://reviews.llvm.org/D33530

llvm-svn: 304215
2017-05-30 15:27:55 +00:00
Kristof Beyls
1166792bbb Fix PR33031: correct the estimate of maximum offset for instructions spilling/filling the stack.
llvm-svn: 304196
2017-05-30 06:58:41 +00:00
Geoff Berry
a7dc8119e5 [AArch64][Falkor] Combine sched details files into one. NFC.
llvm-svn: 304109
2017-05-28 22:20:44 +00:00
Geoff Berry
f08bdf2271 [AArch64][Falkor] Fix some sched details.
- Remove all uses of base sched model entries and set them all to
  Unsupported so all the opcodes are described in
  AArch64SchedFalkorDetails.td.
- Remove entries for unsupported half-float opcodes.
- Remove entries for unsupported LSE extension opcodes.
- Add entry for MOVbaseTLS (and set Sched in base td file entry to
  WriteSys) and a few other pseudo ops.
- Fix a few FP load/store with reg offset entries to use the LSLfast
  predicates.
- Add Q size BIF/BIT/BSL entries.
- Fix swapped Q/D sized CLS/CLZ/CNT/RBIT entires.
- Fix pre/post increment address register latency (this operand is
  always dest 0).
- Fix swapped FCVTHD/FCVTHS/FCVTDH/FCVTDS entries.
- Fix XYZ resource over usage on LD[1-4] opcodes.

llvm-svn: 304108
2017-05-28 21:48:31 +00:00
Matthias Braun
40877e1e1e AArch64/PEI: Do not add reserved regs to liveins
We do not track liveness for reserved registers. It is unnecessary to
add them to block livein lists.

llvm-svn: 304059
2017-05-27 03:38:02 +00:00
Quentin Colombet
ea039226e9 [AArch64][GlobalISel] Add the Localizer pass for the O0 pipeline
This should fix most of the issue we have right now with constants being
spilled all over the place.

llvm-svn: 304052
2017-05-27 01:34:07 +00:00
Matthias Braun
c52e06e4bd AArch64: Fix cmpxchg O0 expansion
- Rewrite livein calculation to use the computeLiveIns() helper
  function. This is slightly less efficient but easier to reason about
  and doesn't unnecessarily add pristine and reserved registers[1]
- Zero the status register at the beginning of the loop to make sure it
  has a defined value.
- Remove kill flags of values that need to stay alive throughout the loop.

[1] An upcoming commit of mine will tighten the MachineVerifier to catch
    these.

llvm-svn: 304048
2017-05-26 23:48:59 +00:00
Matthias Braun
4d8b3e20c7 LivePhysRegs: Rework constructor + documentation; NFC
- Take reference instead of pointer to a TRI that cannot be nullptr.
- Improve documentation comments.

llvm-svn: 304038
2017-05-26 21:51:00 +00:00
Nirav Dave
f6a68a8085 Fix signedness of constant. NFC.
llvm-svn: 303980
2017-05-26 12:53:10 +00:00
Manoj Gupta
8e696dc227 [AArch64]: add 'a' inline asm operand modifier.
Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.

Reviewed by: Renato Golin

Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls

Subscribers: aemerson, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D33558

llvm-svn: 303901
2017-05-25 19:07:57 +00:00
Nirav Dave
7dc90034fc [AArch64] Prevent nested ADDs from address calc in splitStoreSplat. NFC
In preparation for late-stage store merging.

llvm-svn: 303800
2017-05-24 19:55:49 +00:00
Matthew Simpson
983065c8f6 Revert r291254: [AArch64] Reduce vector insert/extract cost for Falkor
The default vector insert/extract cost is more profitable on Falkor than the
reduced cost.

llvm-svn: 303771
2017-05-24 16:48:39 +00:00
Geoff Berry
30786023f2 [AArch64][Falkor] Refine sched details for LSLfast/ASRfast.
llvm-svn: 303682
2017-05-23 19:57:45 +00:00
Geoff Berry
267ecfb00c [AArch64][Falkor] Fix sched details for FMOV of WZR/XZR.
llvm-svn: 303680
2017-05-23 19:54:28 +00:00
Florian Hahn
36a235023f [AArch64] Make instruction fusion more aggressive.
Summary:
This patch makes instruction fusion more aggressive by
* adding artificial edges between the successors of FirstSU and
  SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
* updating PostGenericScheduler::tryCandidate to keep clusters together,
   similar to GenericScheduler::tryCandidate.

This change increases the number of AES instruction pairs generated on
 Cortex-A57 and Cortex-A72. This doesn't change code at all in
 most benchmarks or general code, but we've seen improvement on kernels
 using AESE/AESMC and AESD/AESIMC. 

Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB

Reviewed By: evandro

Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D33230

llvm-svn: 303618
2017-05-23 09:33:34 +00:00
Akira Hatanaka
bb4f32cc9f [AArch64] Fix PRR33100.
This commit fixes a bug introduced in r301019 where optimizeLogicalImm
would replace a logical node's immediate operand that was CSE'd and
was also an operand of another node.

This commit fixes the bug by replacing the logical node instead of its
immediate operand.

rdar://problem/32295276

llvm-svn: 303607
2017-05-23 06:08:37 +00:00
Daniel Sanders
ba45842e7a [globalisel][tablegen] Demote OptForSize/OptForMinSize/ForCodeSize to per-function predicates.
Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.

Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls

Reviewed By: qcolombet

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D32861

llvm-svn: 303418
2017-05-19 11:08:33 +00:00
Francis Visoiu Mistrih
5f6c901f02 [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360
2017-05-18 17:21:13 +00:00
Francis Visoiu Mistrih
d02a2cc569 BitVector: add iterators for set bits
Differential revision: https://reviews.llvm.org/D32060

llvm-svn: 303227
2017-05-17 01:07:53 +00:00
Amara Emerson
b4afa9c73c Re-commit r302678, fixing PR33053.
The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions
which didn't have a lowering.

llvm-svn: 303211
2017-05-16 21:29:22 +00:00
Chad Rosier
66c9889ab8 Fix an improperly placed curly bracket. NFC.
llvm-svn: 303165
2017-05-16 12:43:23 +00:00
Tim Northover
cfd3f40830 AArch64: use linker-private symbols for globals in MachO.
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).

llvm-svn: 303118
2017-05-15 21:51:38 +00:00
Adam Nemet
f9607f0660 [SLP] Enable 64-bit wide vectorization on AArch64
ARM Neon has native support for half-sized vector registers (64 bits).  This
is beneficial for example for 2D and 3D graphics.  This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.

*** Performance Analysis

This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.

The results are with -O3 and PGO.  A negative percentage is an improvement.
The testsuite was run with a sample size of 4.

** SPEC

* CFP2006/482.sphinx3  -3.34%

A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.

* CFP2000/177.mesa     -3.34%
* CINT2000/256.bzip2   +6.97%

My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%.  There are also other problems with the codegen in
this loop so there is further room for improvement.

** LLVM testsuite

* SingleSource/Benchmarks/Misc/ReedSolomon               -10.75%

There are multiple small SLP vectorizations outside the hot code.  It's a bit
surprising that it adds up to 10%.  Some of this may be code-layout noise.

* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%

The opt-viewer screenshot can be seen at F3218284.  We start at a colder store
but the tree leads us into the hottest loop.

* MultiSource/Applications/lambda-0.1.3/lambda            -2.68%
* MultiSource/Benchmarks/Bullet/bullet                    -2.18%

This is using 3D vectors.

* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%

Noise, binary is unchanged.

* MultiSource/Benchmarks/Ptrdist/anagram/anagram          +4.90%

There is an additional SLP in the cold code.  The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.

* MultiSource/Applications/aha/aha                        +1.63%
* MultiSource/Applications/JM/lencod/lencod               +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark         +1.15%

Differential Revision: https://reviews.llvm.org/D31965

llvm-svn: 303116
2017-05-15 21:15:01 +00:00
Hans Wennborg
247e13c637 Revert r302678 "[AArch64] Enable use of reduction intrinsics."
This caused PR33053.

Original commit message:

> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247

llvm-svn: 303115
2017-05-15 20:59:32 +00:00
Tim Northover
77c86e2d11 AArch64: diagnose unrecognized features in .cpu directive.
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".

llvm-svn: 303108
2017-05-15 19:42:15 +00:00
Geoff Berry
db69675c48 [AArch64][Falkor] Fix sched details for FMOV
llvm-svn: 303099
2017-05-15 18:50:22 +00:00
Florian Hahn
c8227d3439 [AArch64] Enable FeatureFuseAES on Cortex-A72.
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.

llvm-svn: 303073
2017-05-15 15:15:22 +00:00
Geoff Berry
b8c6e63903 [AArch64][Falkor] Refine modeling of multiply accumulate forwarding.
llvm-svn: 302933
2017-05-12 18:57:10 +00:00
Chad Rosier
4962115cec [AArch64][MachineCombine] Fold FNMUL+FSUB -> FNMADD.
Differential Revision: http://reviews.llvm.org/D33101.

llvm-svn: 302822
2017-05-11 20:07:24 +00:00
Quentin Colombet
625bc4b0e8 [AArch64][RegisterBankInfo] Change the default mapping of fp stores.
For stores, check if the stored value is defined by a floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.

llvm-svn: 302679
2017-05-10 15:19:41 +00:00