1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
Commit Graph

135319 Commits

Author SHA1 Message Date
Sanjay Patel
8f69575ea0 add missing test for simplifySelectBitTest()
llvm-svn: 275990
2016-07-19 16:49:55 +00:00
Tobias Grosser
1b2b3c1ea1 [InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))
Summary:
Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:

> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.

Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check:

`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`

This check seems to sort out more cases than necessary because:
- the reverse transformation is obviously done for `or` instructions only
- and also not every `zext icmp` pair is necessarily the result of this reverse transformation

Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).

As an example, consider the following IR snippet

```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```

which would now be transformed to

```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```

This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code.

Reviewers: grosser, vtjnash, majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D22511

Contributed-by: Matthias Reisinger
llvm-svn: 275989
2016-07-19 16:39:17 +00:00
Matt Arsenault
86ee9f5013 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Simon Pilgrim
e2f3b489b8 [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

A companion clang patch is at D22105

Differential Revision: https://reviews.llvm.org/D22106

llvm-svn: 275981
2016-07-19 15:07:43 +00:00
Sam Parker
de7eeee861 [ARM] Refactor Thumb2 Mul and Mla instr descs
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.

Differential Revision: https://reviews.llvm.org/D21929

llvm-svn: 275979
2016-07-19 14:44:05 +00:00
Pankaj Gode
b0f149dd81 [AArch64] PredictableSelectIsExpensive for Vulcan.
Adding PredictableSelectIsExpensive for Vulcan

Differential Revision: https://reviews.llvm.org/D22448

llvm-svn: 275978
2016-07-19 14:30:21 +00:00
Peter Smith
eee35f2834 Add support for tlsldm assembler operator to ARM target
The standard local dynamic model for TLS on ARM systems needs two 
relocations:
- R_ARM_TLS_LDM32 (module idx)
- R_ARM_TLS_LDO32 (offset of object from origin of module TLS block)
    
In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to
produce these relocations.
    
llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm).
This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32.
    
TLS for ARM is defined in Addenda to, and Errata in, the ABI for the
ARM Architecture
    
Differential Revision: https://reviews.llvm.org/D22461

llvm-svn: 275977
2016-07-19 14:15:33 +00:00
Simon Pilgrim
c94489c306 [AARCH64] Fix linu triple typo
As promised in D22191

llvm-svn: 275976
2016-07-19 14:12:45 +00:00
Simon Pilgrim
91bb437eb8 [AARCH64] Enable AARCH64 lit tests on windows dev machines
As discussed on PR27654, this patch fixes the triples of a lot of aarch64 tests and enables lit tests on windows

This will hopefully help stop cases where windows developers break the aarch64 target

Differential Revision: https://reviews.llvm.org/D22191

llvm-svn: 275973
2016-07-19 13:35:11 +00:00
Simon Pilgrim
66bfb56b57 Get rid of VS2015 operator precedence warning. NFCI.
llvm-svn: 275971
2016-07-19 12:26:51 +00:00
Daniel Sanders
754a2807ef [mips][ias] R_MIPS_GOT_(PAGE|OFST) do not need symbols
Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D22458

llvm-svn: 275968
2016-07-19 10:58:06 +00:00
Daniel Sanders
57cec27c07 [mips] Correct label prefixes for N32 and N64.
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).

This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D22412

llvm-svn: 275967
2016-07-19 10:49:03 +00:00
Daniel Sanders
e79a374823 [mips] Recognise the triple used by Debian stretch for mips64el.
Summary:
The triple used for this distribution is mips64el-linux-gnuabi64.

Reviewers: sdardis

Subscribers: sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D22406

llvm-svn: 275966
2016-07-19 10:22:19 +00:00
Tobias Grosser
f3bf40f4ab [InstCombine] Minor cleanup of cast simplification code [NFC]
Summary:
This patch cleans up parts of InstCombine to raise its compliance with the LLVM coding standards and to increase its readability. The changes and according rationale are summarized in the following:

- Rename `ShouldOptimizeCast()` to `shouldOptimizeCast()` since functions should start with a lower case letter.

- Move `shouldOptimizeCast()` from InstCombineCasts.cpp to InstCombineAndOrXor.cpp since it's only used there.

- Simplify interface of `shouldOptimizeCast()`.

- Minor code style adaptions in `shouldOptimizeCast()`.

- Remove the documentation on the function definition of `shouldOptimizeCast()` since it just repeats the documentation on its declaration. Also enhance the documentation on its declaration with more information describing its intended use and make it doxygen-compliant.

- Change a comment in `foldCastedBitwiseLogic()` from `fold (logic (cast A), (cast B)) -> (cast (logic A, B))` to `fold logic(cast(A), cast(B)) -> cast(logic(A, B))` since the surrounding comments use this format.

- Remove comment `Only do this if the casts both really cause code to be generated.` in `foldCastedBitwiseLogic()` since it just repeats parts of the documentation of `shouldOptimizeCast()` and does not help to improve readability.

- Simplify the interface of `isEliminableCastPair()`.

- Removed the documentation on the function definition of `isEliminableCastPair()` which only contained obvious statements about its implementation. Instead added more general doxygen-compliant documentation to its declaration.

- Renamed parameter `DoXform` of `transformZExtIcmp()` to `DoTransform` to make its intention clearer.

- Moved documentation of `transformZExtIcmp()` from its definition to its declaration and made it doxygen-compliant.

Reviewers: vtjnash, grosser

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D22449

Contributed-by: Matthias Reisinger
llvm-svn: 275964
2016-07-19 09:06:08 +00:00
Tobias Grosser
fc9b1f99e6 Style: drop some unnecessary ';' [NFC]
llvm-svn: 275963
2016-07-19 09:01:46 +00:00
Elena Demikhovsky
37e609e198 AVX-512: Fixed BT instruction selection.
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).

I added the new sequence (truncate (a >>n) to i1) to the BT pattern.

Differential Revision: https://reviews.llvm.org/D22354

llvm-svn: 275950
2016-07-19 07:14:21 +00:00
Craig Topper
b649360c0a [AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions.
llvm-svn: 275942
2016-07-19 02:00:38 +00:00
Craig Topper
1f08f9b1ce [X86] Remove superfluous parameter from a multiclass. All instantiations passed the same value.
llvm-svn: 275941
2016-07-19 02:00:35 +00:00
George Burgess IV
c568184023 [MemorySSA] Update to the new shiny walker.
This patch updates MemorySSA's use-optimizing walker to be more
accurate and, in some cases, faster.

Essentially, this changed our core walking algorithm from a
cache-as-you-go DFS to an iteratively expanded DFS, with all of the
caching happening at the end. Said expansion happens when we hit a Phi,
P; we'll try to do the smallest amount of work possible to see if
optimizing above that Phi is legal in the first place. If so, we'll
expand the search to see if we can optimize to the next phi, etc.

An iteratively expanded DFS lets us potentially quit earlier (because we
don't assume that we can optimize above all phis) than our old walker.
Additionally, because we don't cache as we go, we can now optimize above
loops.

As an added bonus, this patch adds a ton of verification (if
EXPENSIVE_CHECKS are enabled), so finding bugs is easier.

Differential Revision: https://reviews.llvm.org/D21777

llvm-svn: 275940
2016-07-19 01:29:15 +00:00
Craig Topper
19362f8769 [X86] Rename VINSERTzrr to use a capital Z to match other instructions. NFC
llvm-svn: 275939
2016-07-19 01:26:19 +00:00
Vedant Kumar
78dfceef4b Retry: [llvm-profdata] Speed up merging by using a thread pool
Add a "-j" option to llvm-profdata to control the number of threads used.
Auto-detect NumThreads when it isn't specified, and avoid spawning threads when
they wouldn't be beneficial.

I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:

  No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
  With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total

Changes since the initial commit:

  - When handling odd-length inputs, call ThreadPool::wait() before merging the
    last profile. Should fix a race/off-by-one (see r275937).

Differential Revision: https://reviews.llvm.org/D22438

llvm-svn: 275938
2016-07-19 01:17:20 +00:00
Vedant Kumar
338daec4d5 Revert "[llvm-profdata] Speed up merging by using a thread pool"
This reverts commit r275921. It broke the ppc64be bot:

  http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/3537

I'm not sure why it broke, but based on the output, it looks like an
off-by-one (one profile left un-merged).

llvm-svn: 275937
2016-07-19 00:57:09 +00:00
Wei Mi
a63472cac9 Recommit the patch "Use uniforms set to populate VecValuesToIgnore".
For instructions in uniform set, they will not have vector versions so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474

The recommit fixed the testcase global_alias.ll.

llvm-svn: 275936
2016-07-19 00:50:43 +00:00
Matt Arsenault
449d829b41 AMDGPU/SI: Fix SI scheduler refcount issue
Without this fix, releaseSuccessors when InOrOutBlock is
false could release SUs outside the schedule BasicBlock.

Patch by Axel Davy

llvm-svn: 275935
2016-07-19 00:35:22 +00:00
Matt Arsenault
93828aab9b AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Lang Hames
92bc82a15a [Kaleidoscope][BuildingAJIT] More work on the text for Chapter 3.
Add an overview of stubs and compile callbacks before the discussion of the
source changes.

-- This line, and those below, will be ignored--

M    docs/tutorial/BuildingAJIT3.rst

llvm-svn: 275933
2016-07-19 00:25:52 +00:00
Sanjoy Das
057da043be [LoopReroll] Reroll loops with unordered atomic memory accesses
Reviewers: hfinkel, jfb, reames

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D22385

llvm-svn: 275932
2016-07-19 00:23:54 +00:00
Matt Arsenault
c431d0b6a1 TableGen: Allow custom register operand decoder method
This is for a situation where the encoding for a register may be
different depending on the specific operand. For some instructions,
we want to apply additional restrictions beyond the encoding's
constraints.

In AMDGPU some operands are VSrc_32, using the VS_32 pseudo register
class which accept VGPRs, SGPRs, or immediates in the encoding.
Some specific instructions with the same encoding operand do not want
to allow immediates or SGPRs, but the encoding format is different
in this case than a regular VGPR_32 operand.

This allows specifying the encoding should be treated the same
without introducing yet another dummy register class.

llvm-svn: 275929
2016-07-18 23:20:46 +00:00
Matt Arsenault
bba828fc0b AMDGPU: Fix test name and broken CHECK-LABEL
llvm-svn: 275928
2016-07-18 23:09:51 +00:00
Vedant Kumar
271dba4dc2 [utils] Generate html reports with the code coverage utility script
Instead of extracting raw coverage mappings into an artifact directory,
actually generate useful html reports for a given list of binaries with
symbol demangling turned on.

No tests, but this is actively being used to drive the (still nascent)
coverage bot.

llvm-svn: 275927
2016-07-18 22:50:10 +00:00
Matt Arsenault
4df0821f1b Fix -Wreturn-type with gcc 4.8 and libc++
llvm-svn: 275922
2016-07-18 22:12:46 +00:00
Vedant Kumar
498432e86b [llvm-profdata] Speed up merging by using a thread pool
Add a "-j" option to llvm-profdata to control the number of threads
used. Auto-detect NumThreads when it isn't specified, and avoid spawning
threads when they wouldn't be beneficial.

I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:

  No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
  With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total

Differential Revision: https://reviews.llvm.org/D22438

llvm-svn: 275921
2016-07-18 22:02:39 +00:00
Artem Belevich
c6852a2c64 [NVPTX] Make sure we adjust alignment at all call sites
.. including calls from kernel functions that were
ignored by mistake before.

llvm-svn: 275920
2016-07-18 21:58:48 +00:00
Dehao Chen
8074cec195 [PM] Convert Loop Strength Reduce pass to new PM
Summary: Convert Loop String Reduce pass to new PM

Reviewers: davidxl, silvas

Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D22468

llvm-svn: 275919
2016-07-18 21:41:50 +00:00
Mehdi Amini
fe671e23ec Update doxygen description for WriteBitcodeToFile() API (NFC)
llvm-svn: 275917
2016-07-18 21:29:24 +00:00
Teresa Johnson
ef461cc7ce [PM] Port FunctionImport Pass to new PM
Summary: Port FunctionImport Pass to new PM.

Reviewers: mehdi_amini, davide

Subscribers: davidxl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22475

llvm-svn: 275916
2016-07-18 21:22:24 +00:00
Wei Mi
34c2e98656 Revert rL275912.
llvm-svn: 275915
2016-07-18 21:14:43 +00:00
Wei Mi
e3b5e56180 Use uniforms set to populate VecValuesToIgnore.
For instructions in uniform set, they will not have vector versions so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474

llvm-svn: 275912
2016-07-18 20:59:53 +00:00
Sanjay Patel
107970ab14 refactor SimplifySelectInst; NFCI
llvm-svn: 275911
2016-07-18 20:56:53 +00:00
Justin Lebar
22d839f8b9 Write isUInt using template specializations to work around an incorrect MSVC warning.
Summary:
Per D22441, MSVC warns on our old implementation of isUInt<64>.  It sees
uint64_t(1) << 64 and doesn't realize that it's not going to be
executed.  Writing as a template specialization is ugly, but prevents
the warning.

Reviewers: RKSimon

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D22472

llvm-svn: 275909
2016-07-18 20:40:35 +00:00
Sanjay Patel
326b50bbbb add tests for missed sext transform
llvm-svn: 275908
2016-07-18 20:37:51 +00:00
Hans Wennborg
1cc961d882 build_llvm_package.bat: update version to 4.0.0
llvm-svn: 275903
2016-07-18 20:26:46 +00:00
Sanjay Patel
14cc5677cc auto-generate checks
llvm-svn: 275899
2016-07-18 20:06:51 +00:00
Hans Wennborg
d1d3ea8114 Revert r273099 "If the revision number starts with r, drop it. It will get added back"
This doesn't seem to work with Bash:

$ /work/llvm/utils/release/merge.sh --proj llvm --rev r275870
/work/llvm/utils/release/merge.sh: line 34: ${$1#r}: bad substitution

I get the same error with and without a leading 'r'.

llvm-svn: 275898
2016-07-18 20:06:27 +00:00
Artem Belevich
b6f334a5b3 [NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.

The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.

Differential Revision: https://reviews.llvm.org/D22428

llvm-svn: 275893
2016-07-18 19:54:56 +00:00
Michael Zolotukhin
0c4cc27783 [LoopSimplify] Update LCSSA after separating nested loops.
Summary:
Usually LCSSA survives this transformation, but in some cases (see
attached test) it doesn't: values from the original loop after
separating might be used from the outer loop. Before the transformation
it was the same loop, so LCSSA phis were not required.

This fixes PR28272.

Reviewers: sanjoy, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21665

llvm-svn: 275891
2016-07-18 19:44:19 +00:00
Vitaly Buka
ed5fb3f10a Revert "[ARM] Skip inline asm memory operands in DAGToDAGISel"
Breaks asan, see https://reviews.llvm.org/D22103

This reverts commit r275776.

llvm-svn: 275890
2016-07-18 19:44:01 +00:00
Vitaly Buka
d15ba23617 Revert "[ARM] Update test to use CHECK-LABEL. NFCI."
Breaks asan, see https://reviews.llvm.org/D22103

This reverts commit r275777.

llvm-svn: 275889
2016-07-18 19:43:58 +00:00
Nirav Dave
991d54db70 [MC] Separate non-parsing operations from conditional chains. NFC.
llvm-svn: 275888
2016-07-18 19:35:21 +00:00
David Majnemer
4e1e8b0d12 [GVNHoist] Remove a home-grown version of replaceUsesOfWith
replaceUsesOfWith will, on average, consider fewer values when trying
to do the replacement.

No functional change is intended.

llvm-svn: 275884
2016-07-18 19:14:14 +00:00