1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 19:52:54 +01:00
Commit Graph

619 Commits

Author SHA1 Message Date
Ahmed Bougacha
9f7e91c37f [AArch64] Use intermediate step for concat_vectors of illegal truncs.
Optimize concat_vectors of truncated vectors, where the intermediate
type is illegal, to avoid said illegality,  e.g.,
  (v4i16 (concat_vectors (v2i16 (truncate (v2i64))),
                         (v2i16 (truncate (v2i64)))))
->
  (v4i16 (truncate (v4i32 (concat_vectors (v2i32 (truncate (v2i64))),
                                          (v2i32 (truncate (v2i64)))))))

This isn't really target-specific, and, as such, would best go in the
DAGCombiner.  However, ISD::TRUNCATE legality isn't keyed on both input
and result type, so we might generate worse code when we don't know
better.  On AArch64 we know it's fine for v2i64->v4i16 and v4i32->v8i8.
rdar://20022387

llvm-svn: 232459
2015-03-17 03:23:09 +00:00
Duncan P. N. Exon Smith
9b4abdcdd5 DebugInfo: Fix testcases that fail -verify-debug-info=true
As part of PR22777, fix testcases that fail the debug info verifier.
The changes fall into the following categories:

  - Empty `filename:` fields in `MDFile`s.  Compile units and some types
    require non-empty filenames.  A number of testcases have empty
    filenames, probably due to hand-reduction of testcases.
  - Not-quite empty arrays: `!{i32 0}`.  This used to be equivalent in
    the debug info schema to `!{}`.  They cause problems for
    `!MDSubroutineType`'s `types:` array, since it requires all operands
    to be valid types.  (Note that `!{null}` is the correct type array
    for functions that take no arguments and return `void`.)
  - Significantly bitrotted testcases.  Nodes got left behind a few
    upgrades ago because of missing or invalid tags.

llvm-svn: 232415
2015-03-16 21:10:12 +00:00
Ahmed Bougacha
b8ea46fe37 Add a bunch of CHECK missing colons in tests. NFC.
Some wouldn't pass;  fixed most, the rest will be fixed separately.

llvm-svn: 232239
2015-03-14 01:43:57 +00:00
David Blaikie
3ea2df7c7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
Hao Liu
ac40111e96 [MachineCopyPropagation] Fix a bug causing incorrect removal for the instruction sequences as follows
%Q5_Q6<def> = COPY %Q2_Q3
   %D5<def> =
   %D3<def> =
   %D3<def> = COPY %D6     // Incorrectly removed in MachineCopyPropagation
   Using of %D3 results in incorrect result ...

   Reviewed in http://reviews.llvm.org/D8242 

llvm-svn: 232142
2015-03-13 05:15:23 +00:00
Ahmed Bougacha
faad462651 [AArch64] Avoid going through GPRs for across-vector instructions.
This adds new node types for each intrinsic.
For instance, for addv, we have AArch64ISD::UADDV, such that:
  (v4i32 (uaddv ...))
is the same as
  (v4i32 (scalar_to_vector (i32 (int_aarch64_neon_uaddv ...))))
that is,
  (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
           (i32 (int_aarch64_neon_uaddv ...)), ssub)

In a combine, we transform all such across-vector-lanes intrinsics to:

  (i32 (extract_vector_elt (uaddv ...), 0))

This has one big advantage: by making the extract_element explicit, we
enable the existing patterns for lane-aware instructions to fire.
This lets us avoid needlessly going through the GPRs.  Consider:

    uint32x4_t test_mul(uint32x4_t a, uint32x4_t b) {
        return vmulq_n_u32(a, vaddvq_u32(b));
    }

We now generate:
    addv.4s  s1, v1
    mul.4s   v0, v0, v1[0]
instead of the previous:
    addv.4s  s1, v1
    fmov     w8, s1
    dup.4s   v1, w8
    mul.4s   v0, v1, v0

rdar://20044838

llvm-svn: 231840
2015-03-10 20:45:38 +00:00
Quentin Colombet
96e4689440 [AArch64][LoadStoreOptimizer] Generate LDP + SXTW instead of LD[U]R + LD[U]RSW.
Teach the load store optimizer how to sign extend a result of a load pair when
it helps creating more pairs.
The rational is that loads are more expensive than sign extensions, so if we
gather some in one instruction this is better!

<rdar://problem/20072968>

llvm-svn: 231527
2015-03-06 22:42:10 +00:00
Ahmed Bougacha
c9b6f0f6b4 [AArch64] Teach AsmPrinter about GlobalAddress operands.
Fixes PR22761, rdar://20024866.
Differential Revision: http://reviews.llvm.org/D8042

llvm-svn: 231400
2015-03-05 20:04:21 +00:00
Kristof Beyls
4e1076f140 Fix PR22408 - LLVM producing AArch64 TLS relocations that GNU linkers cannot handle yet.
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relaxe/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.

This patch:

* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
  would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
  linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
  is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
  accesses is produced, by making use of a new pseudo instruction. The patch also removes
  two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
  SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).

llvm-svn: 231227
2015-03-04 09:12:08 +00:00
Chad Rosier
c6fa32cc2a [AArch64] When combining constant mul of -3, prefer (sub x, (shl x, N)).
This change only effects codegen when the constant is -3.

llvm-svn: 231085
2015-03-03 17:31:01 +00:00
Duncan P. N. Exon Smith
8d1b74869c DebugInfo: Move new hierarchy into place
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464.  I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned.  Let me know if I'm wrong :).

The code changes are fairly mechanical:

  - Bumped the "Debug Info Version".
  - `DIBuilder` now creates the appropriate subclass of `MDNode`.
  - Subclasses of DIDescriptor now expect to hold their "MD"
    counterparts (e.g., `DIBasicType` expects `MDBasicType`).
  - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
    for printing comments.
  - Big update to LangRef to describe the nodes in the new hierarchy.
    Feel free to make it better.

Testcase changes are enormous.  There's an accompanying clang commit on
its way.

If you have out-of-tree debug info testcases, I just broke your build.

  - `upgrade-specialized-nodes.sh` is attached to PR22564.  I used it to
    update all the IR testcases.
  - Unfortunately I failed to find way to script the updates to CHECK
    lines, so I updated all of these by hand.  This was fairly painful,
    since the old CHECKs are difficult to reason about.  That's one of
    the benefits of the new hierarchy.

This work isn't quite finished, BTW.  The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro).  Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers.  I
also expect to make a few schema changes now that it's easier to reason
about everything.

llvm-svn: 231082
2015-03-03 17:24:31 +00:00
David Blaikie
ab043ff680 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
0d99339102 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Mehdi Amini
32875af6e3 Change the fast-isel-abort option from bool to int to enable "levels"
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.

This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.

Reviewers: resistor, echristo

Subscribers: jfb, llvm-commits

Differential Revision: http://reviews.llvm.org/D7941

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
2015-02-27 18:32:11 +00:00
Sanjoy Das
60e1014097 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
(The change was landed in r230280 and caused the regression PR22674.
This version contains a fix and a test-case for PR22674).
    
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.
    
This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.
    
Apart from the attached test case, another (more realistic)
manifestation of the bug can be seen in
Transforms/IndVarSimplify/pr20680.ll.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230533
2015-02-25 20:02:59 +00:00
Matthias Braun
45ead6c770 AArch64: Relax assert about large shift sizes.
The reason why these large shift sizes happen is because OpaqueConstants
currently inhibit alot of DAG combining, but that has to be addressed in
another commit (like the proposal in D6946).

Differential Revision: http://reviews.llvm.org/D6940

llvm-svn: 230355
2015-02-24 18:52:04 +00:00
Hans Wennborg
fb5af71543 Revert r230280: "Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap"
This caused PR22674, failing this assert:

Instructions.h:2281: llvm::Value* llvm::PHINode::getOperand(unsigned int) const: Assertion `i_nocapture < OperandTraits<PHINode>::operands(this) && "getOperand() out of range!"' failed.

llvm-svn: 230341
2015-02-24 16:19:29 +00:00
Sanjoy Das
9dddcf7f33 Bugfix: SCEVExpander incorrectly marks increment operations as no-wrap
When emitting the increment operation, SCEVExpander marks the
operation as nuw or nsw based on the flags on the preincrement SCEV.
This is incorrect because, for instance, it is possible that {-6,+,1}
is <nuw> while {-6,+,1}+1 = {-5,+,1} is not.

This change teaches SCEV to mark the increment as nuw/nsw only if it
can explicitly prove that the increment operation won't overflow.

Apart from the attached test case, another (more realistic) manifestation
of the bug can be seen in Transforms/IndVarSimplify/pr20680.ll.

NOTE: this change was landed with an incorrect commit message in
rL230275 and was reverted for that reason in rL230279.  This commit
message is the correct one.

Differential Revision: http://reviews.llvm.org/D7778

llvm-svn: 230280
2015-02-23 23:22:58 +00:00
Sanjoy Das
e5ca05754e Revert 230275.
230275 got committed with an incorrect commit message due to a mixup
on my side.  Will re-land in a few moments with the correct commit
message.

llvm-svn: 230279
2015-02-23 23:13:22 +00:00
Sanjoy Das
421baaa45f Fix bug 22641
The bug was a result of getPreStartForExtend interpreting nsw/nuw
flags on an add recurrence more strongly than is legal.  {S,+,X}<nsw>
implies S+X is nsw only if the backedge of the loop is taken at least
once.

Differential Revision: http://reviews.llvm.org/D7808

llvm-svn: 230275
2015-02-23 22:55:13 +00:00
Andrew Trick
e7964c82c7 AArch64: Safely handle the incoming sret call argument.
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.

Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.

Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering

llvm-svn: 229413
2015-02-16 18:10:47 +00:00
Chandler Carruth
0b684c6980 [SDAG] Teach the SelectionDAG to canonicalize vector shuffles of splats
directly into blends of the splats.

These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.

Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.

llvm-svn: 229308
2015-02-15 12:18:12 +00:00
James Molloy
d75793f030 [SimplifyCFG] Be more aggressive
Up the phi node folding threshold from a cheap "1" to a meagre "2".

Update tests for extra added selects and slight code churn.

llvm-svn: 229099
2015-02-13 10:48:30 +00:00
Ahmed Bougacha
88db0c3c30 [CodeGen] Don't blindly combine (fp_round (fp_round x)) to (fp_round x).
We used to do this DAG combine, but it's not always correct:
If the first fp_round isn't a value preserving truncation, it might
introduce a tie in the second fp_round, that wouldn't occur in the
single-step fp_round we want to fold to.
In other words, double rounding isn't the same as rounding.

Differential Revision: http://reviews.llvm.org/D7571

llvm-svn: 228911
2015-02-12 06:15:29 +00:00
James Molloy
ba8cd33738 [SimplifyCFG] Swap to using TargetTransformInfo for cost
analysis.

We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.

Test changes:
  * combine-comparisons-by-cse.ll: Removed unneeded branch check.
  * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
  * coalesce-subregs.ll: Superfluous block check.
  * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
  * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
  * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.

llvm-svn: 228826
2015-02-11 12:15:41 +00:00
Tim Northover
78c64c2e1c ARM & AArch64: teach LowerVSETCC that output type size may differ from input.
While various DAG combines try to guarantee that a vector SETCC
operation will have the same output size as input, there's nothing
intrinsic to either creation or LegalizeTypes that actually guarantees
it, so the function needs to be ready to handle a mismatch.

Fortunately this is easy enough, just extend or truncate the naturally
compared result.

I couldn't reproduce the failure in other backends that I know have
SIMD, so it's probably only an issue for these two due to shared
heritage.

Should fix PR21645.

llvm-svn: 228518
2015-02-08 00:50:47 +00:00
Matthias Braun
3baac7bffc AArch64: Make test more robust.
Avoid the creation of select instructions which can result in different
scheduling of the selects.

I also added a bunch of additional store volatiles. Those avoid A
CodeGen problem (bug?) where normalizes and denomarlizing the control
moves all shift instructions into the first block where ISel can't match
them together with the cmps.

llvm-svn: 228362
2015-02-05 23:52:14 +00:00
Matthias Braun
84aaa1dd81 MachineCSE: Clear dead-def flag on CSE.
In case CSE reuses a previoulsy unused register the dead-def flag has to
be cleared on the def operand, as exposed by the arm64-cse.ll test.

This fixes PR22439 and the corresponding rdar://19694987

Differential Revision: http://reviews.llvm.org/D7395

llvm-svn: 228178
2015-02-04 19:35:16 +00:00
Renato Golin
0bf68985f6 Adding support to LLVM for targeting Cortex-A72
Currently, Cortex-A72 is modelled as an Cortex-A57 except the fp
load balancing pass isn't enabled for Cortex-A72 as it's not
profitable to have it enabled for this core.

Patch by Ranjeet Singh.

llvm-svn: 228140
2015-02-04 13:31:29 +00:00
Ahmed Bougacha
53ed9373bc [AArch64] Prefer DUP/MOV ("CPY") to INS for vector_extract.
This avoids a partial false dependency on the previous content of
the upper lanes of the destination vector register.

Differential Revision: http://reviews.llvm.org/D7307

llvm-svn: 227820
2015-02-02 17:55:57 +00:00
Ahmed Bougacha
77743b0a8b [AArch64] Add a few more DUP testcases. NFC.
Also, don't lie about testing index 0.

llvm-svn: 227642
2015-01-30 23:41:15 +00:00
Ahmed Bougacha
bc7d2a0446 [AArch64] Robustize neon-scalar-copy.ll tests. NFC.
Some of those didn't even have run lines: they were removed
inadvertently during the Great Merge of 2014.

They used to check for DUPs, but now we go through W-regs?
Filed PR22418 for that potential regression.

For now, just make the tests explicit, so we now where we stand.

llvm-svn: 227635
2015-01-30 23:13:57 +00:00
Hao Liu
89352e7534 [AArch64]Fix PR21675, a bug about lowering llvm.ctpop.i32. We should noot use "DAG.getUNDEF(MVT::v8i8)" to get all zero vector.
Patch by Wei-cheng Wang.

llvm-svn: 227550
2015-01-30 02:13:53 +00:00
Quentin Colombet
9be6cf402a [AArch64][LoadStoreOptimizer] Form LDPSW when possible.
This patch adds the missing LD[U]RSW variants to the load store optimizer, so
that we generate LDPSW when possible.

<rdar://problem/19583480>

llvm-svn: 226978
2015-01-24 01:25:54 +00:00
Tim Northover
5b0e908c64 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
It can help with argument juggling on some targets, and is generally a good
idea.

llvm-svn: 226740
2015-01-21 23:17:19 +00:00
Tim Northover
c2963c8019 Revert "DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))"
It hadn't gone through review yet, but was still on my local copy.

This reverts commit r226663

llvm-svn: 226665
2015-01-21 15:48:52 +00:00
Tim Northover
4b020bd23b AArch64: add backend option to reserve x18 (platform register)
AAPCS64 says that it's up to the platform to specify whether x18 is
reserved, and a first step on that way is to add a flag controlling
it.

From: Andrew Turner <andrew@fubar.geek.nz>
llvm-svn: 226664
2015-01-21 15:43:31 +00:00
Tim Northover
c9cc73b336 DAGCombine: fold (or (and X, M), (and X, N)) -> (and X, (or M, N))
llvm-svn: 226663
2015-01-21 15:43:28 +00:00
Greg Fitzgerald
91e9dd41fa [AArch64] Implement GHC calling convention
Original patch by Luke Iannini.  Minor improvements and test added by
Erik de Castro Lopo.

Differential Revision: http://reviews.llvm.org/D6877

From: Erik de Castro Lopo <erikd@mega-nerd.com>
llvm-svn: 226473
2015-01-19 17:40:05 +00:00
Duncan P. N. Exon Smith
4a5feedcaa IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

llvm-svn: 226048
2015-01-14 22:27:36 +00:00
Sanjoy Das
f93f60b30a Fix PR22179.
We were incorrectly inferring nsw for certain SCEVs. We can be more
aggressive here (see Richard Smith's comment on
http://llvm.org/bugs/show_bug.cgi?id=22179) but this change just
focuses on correctness.

Differential Revision: http://reviews.llvm.org/D6914

llvm-svn: 225591
2015-01-10 23:41:24 +00:00
Karthik Bhat
d463305416 Revert r225165 and r225169
Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case
where the original subtraction could have caused an overflow.
Reverting the same.

llvm-svn: 225341
2015-01-07 06:34:34 +00:00
Ahmed Bougacha
297bd45000 [AArch64] Improve codegen of store lane instructions by avoiding GPR usage.
We used to generate code similar to:

  umov.b        w8, v0[2]
  strb  w8, [x0, x1]

because the STR*ro* patterns were preferred to ST1*.
Instead, we can avoid going through GPRs, and generate:

  add   x8, x0, x1
  st1.b { v0 }[2], [x8]

This patch increases the ST1* AddedComplexity to achieve that.

rdar://16372710
Differential Revision: http://reviews.llvm.org/D6202

llvm-svn: 225183
2015-01-05 17:10:26 +00:00
Ahmed Bougacha
3f0fc5a029 [AArch64] Improve codegen of store lane 0 instructions by directly storing the subregister.
For 0-lane stores, we used to generate code similar to:

  fmov w8, s0
  str w8, [x0, x1, lsl #2]

instead of:

  str s0, [x0, x1, lsl #2]

To correct that: for store lane 0 patterns, directly match to STR <subreg>0.

Byte-sized instructions don't have the special case for a 0 index,
because FPR8s are defined to have untyped content.

rdar://16372710
Differential Revision: http://reviews.llvm.org/D6772

llvm-svn: 225181
2015-01-05 17:02:28 +00:00
Karthik Bhat
29e0d6032b Select lower fsub,fabs pattern to fabd on AArch64
This patch lowers patterns such as-
  fsub   v0.4s, v0.4s, v1.4s
  fabs   v0.4s, v0.4s
to
  fabd  v0.4s, v0.4s, v1.4s
on AArch64.

Review: http://reviews.llvm.org/D6791
llvm-svn: 225169
2015-01-05 13:57:59 +00:00
Karthik Bhat
be56428c62 Select lower sub,abs pattern to sabd on AArch64
This patch lowers patterns such as-
  sub	v0.4s, v0.4s, v1.4s
  abs	v0.4s, v0.4s
to
  sabd	v0.4s, v0.4s, v1.4s
on AArch64.

Review: http://reviews.llvm.org/D6781
llvm-svn: 225165
2015-01-05 13:11:07 +00:00
Karthik Bhat
a80b575047 Lower multiply-negate operation to mneg on AArch64
This patch pattern matches code such as-
neg	 w8, w8
mul	 w8, w9, w8
to
mneg	 w8, w8, w9

Review: http://reviews.llvm.org/D6754
llvm-svn: 224706
2014-12-22 13:38:58 +00:00
Duncan P. N. Exon Smith
9c5542c040 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
Juergen Ributzka
8175f5b997 [AArch64] MachO large code-model: Materialize FP constants in code.
In the large code model we have to first get the address of the GOT entry, load
the address of the constant, and then load the constant itself.

To avoid these loads and the GOT entry alltogether this commit changes the way
how FP constants are materialized in the large code model. The constats are now
materialized in a GPR and then bitconverted/moved into the FPR.

Reviewed by Tim Northover

Fixes rdar://problem/16572564.

llvm-svn: 223941
2014-12-10 19:43:32 +00:00
Juergen Ributzka
a7f0b27412 [FastISel][AArch64] Fix a missing nullptr check in 'computeAddress'.
The load/store value type is currently not available when lowering the memcpy
intrinsic. Add the missing nullptr check to support this in 'computeAddress'.

Fixes rdar://problem/19178947.

llvm-svn: 223818
2014-12-09 19:44:38 +00:00