mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00

202547 Commits

Author SHA1 Message Date
Sjoerd Meijer
ac0a363b76 [LangRef] Revise semantics of intrinsic get.active.lane.mask
A first version of get.active.lane.mask was committed in rG7fb8a40e5220. One of
the main purposes and uses of this intrinsic is to communicate information from
the middle-end to the back-end, but its current definition and semantics make
this actually very difficult. The intrinsic was defined as:

  @llvm.get.active.lane.mask(%IV, %BTC)

where %BTC is the Backedge-Taken Count (the variable names differ in the
LangRef spec). This implicitly communicates the loop tripcount, which can be
reconstructed by calculating BTC + 1. However, it has been very difficult to
prove that calculating BTC + 1 is safe and does not overflow. That proof needs
complicated range and SCEV analysis, so in practice the intrinsic does not
solve the problem it was meant to solve. Examples of the overflow checks that
are required in the (ARM) back-end are D79175 and D86074, which are not even
complete/correct yet.

To solve this problem, we are revising the definition/semantics of
get.active.lane.mask to avoid all the complicated overflow analysis. Instead of
communicating the BTC, the intrinsic now takes the loop tripcount. Using
LangRef's variable names, its semantics change from:

  icmp ule (%base + i), %n

to:

  icmp ult (%base + i), %n

with %n > 0 and corresponding to the loop tripcount. The intrinsic signature
remains the same.
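As a rough model of the old and new semantics (not LLVM code), the mask can be sketched in Python. The helper names are hypothetical, and Python's unbounded integers stand in for evaluating %base + i without wrapping, which is an assumption of this sketch. The last line shows why reconstructing the tripcount as BTC + 1 is fragile in fixed-width arithmetic.

```python
def active_lane_mask(base, n, vf):
    """New semantics: lane i is active iff (base + i) < n (unsigned).

    Assumes base and n are non-negative (unsigned) values; Python ints
    do not wrap, modelling a no-overflow evaluation of base + i.
    """
    return [(base + i) < n for i in range(vf)]


def old_active_lane_mask(base, btc, vf):
    """Old semantics: lane i is active iff (base + i) <= btc (unsigned)."""
    return [(base + i) <= btc for i in range(vf)]


def tripcount_from_btc(btc, width):
    """Reconstruct the tripcount as BTC + 1 in `width`-bit unsigned arithmetic."""
    return (btc + 1) % (1 << width)


# Tripcount 5, vectorized by 4: the second vector iteration (base = 4)
# has exactly one active lane.
print(active_lane_mask(4, 5, 4))   # [True, False, False, False]

# Both formulations agree when BTC + 1 does not overflow ...
assert active_lane_mask(4, 5, 4) == old_active_lane_mask(4, 4, 4)

# ... but with an 8-bit counter and BTC = 255 (256 iterations),
# BTC + 1 wraps to 0.
print(tripcount_from_btc(255, 8))  # 0
```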

Differential Revision: https://reviews.llvm.org/D86147
2020-08-25 16:23:51 +01:00
Sanjay Patel
66453001f1 [InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try)
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.

Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-25 11:19:36 -04:00
Sanjay Patel
c2dd03cd0f [InstCombine] add vector demanded elements tests with shuffles; NFC
The 1st draft of D86460 (reverted) would show miscompiles with these tests
because the undef element tracking went wrong and became visible in the
shuffle masks.
2020-08-25 11:19:35 -04:00
Jay Foad
bb09b6c69f AMDGPU/GlobalISel: re-auto-generate some test checks 2020-08-25 15:54:22 +01:00
Sjoerd Meijer
47761b113f [Verifier] Additional check for intrinsic get.active.lane.mask
This adapts the verifier checks for intrinsic get.active.lane.mask to its new
semantics as described in D86147. I.e., the second argument %n, which
corresponds to the loop tripcount, must be greater than 0 if it is a constant,
so check that.

Differential Revision: https://reviews.llvm.org/D86301
2020-08-25 15:44:33 +01:00
Xing GUO
6857a04f03 [DWARFYAML] Make the 'Attributes' field optional.
This patch makes the 'Attributes' field optional, so it no longer needs
to be specified explicitly.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D86537
2020-08-25 22:37:43 +08:00
Sjoerd Meijer
02f39d5a7e [SelectionDAG] Legalize intrinsic get.active.lane.mask
This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT' instead of the 'icmp ULE' used when
it was the backedge-taken count.

Differential Revision: https://reviews.llvm.org/D86302
2020-08-25 15:00:10 +01:00
Jeremy Morse
f5080847e6 [LiveDebugValues] Add switches for using instr-ref variable locations
This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and a corresponding LLVM CodeGen
option, to pick which variable location tracking solution to use.

Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:

  http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html

also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".

Differential Revision: https://reviews.llvm.org/D83048
2020-08-25 14:58:48 +01:00
Matt Arsenault
1a92d5b134 AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge
Most notably, we were incorrectly reporting <3 x s16> as a legal type
for these. Make sure these aren't legal to help make progress on
fixing the artifact combiner and vector legalizer
rules. Unfortunately, this means spreading the -global-isel-abort=0
hack, although this doesn't change the legalizer result in any
situation.
2020-08-25 09:40:20 -04:00
Matt Arsenault
f16e802454 AMDGPU/GlobalISel: Fix using unlegalizable values in tests
Implicit uses of non-register value types place impossible-to-satisfy
constraints on the legalizer / artifact combiner. These prevent
writing sensible legalize rules for the artifacts without triggering
infinite loops in the legalizer.

The verifier really needs to enforce this, but I'm not sure what the
exact conditions would look like yet.
2020-08-25 09:39:32 -04:00
Sjoerd Meijer
bd571cfbf0 [ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks
This adapts tail-predication to the new semantics of get.active.lane.mask as
defined in D86147. This means that:
- we can remove the BTC + 1 overflow checks because now the loop tripcount is
  passed in to the intrinsic,
- we can immediately use that value to set up a counter for the number of
  elements processed by the loop and don't need to materialize BTC + 1.
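The element-counter idea can be modelled at a high level in Python. This is only an illustration of the counting scheme under the assumption that each iteration processes min(remaining, VF) elements; the function name is hypothetical and this is not the actual MVE code generation.

```python
def simulate_tail_predicated_loop(tripcount, vf):
    """Model a tail-predicated vector loop driven by an element counter.

    The counter starts at the tripcount (the intrinsic's second argument
    under the new semantics) and is decremented by VF per iteration.
    Returns the number of active lanes in each vector iteration.
    """
    remaining = tripcount
    active_per_iteration = []
    while remaining > 0:
        # Only the first min(remaining, vf) lanes are enabled.
        active_per_iteration.append(min(remaining, vf))
        remaining -= vf
    return active_per_iteration


print(simulate_tail_predicated_loop(10, 4))  # [4, 4, 2]
```

Because the counter is initialised directly from the tripcount, no BTC + 1 value has to be computed or proven overflow-free.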

Differential Revision: https://reviews.llvm.org/D86303
2020-08-25 14:38:03 +01:00
Matt Arsenault
2a33728d72 AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors
The selection patterns will currently fail on these.
2020-08-25 09:37:41 -04:00
Anatoly Trosinenko
1adb0ef206 [Utils] Add highlighting definition for byref IR attribute
This patch assumes `byref` can be handled identically to `byval`.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D85768
2020-08-25 16:19:24 +03:00
Sjoerd Meijer
1cd139275c [LV] get.active.lane.mask consuming tripcount instead of backedge-taken count
This adapts LV to the new semantics of get.active.lane.mask as discussed in
D86147, which means that the LV now emits intrinsic get.active.lane.mask with
the loop tripcount instead of the backedge-taken count as its second argument.
The motivation for this is described in D86147.

Differential Revision: https://reviews.llvm.org/D86304
2020-08-25 13:49:19 +01:00
Alex Richardson
65d5b3b1e2 Fix update_llc_test_checks function regex for RV64
Some functions also include a `.Lfunc$local:` label due to
-fno-semantic-interposition

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D85888
2020-08-25 12:20:33 +01:00
Sam Parker
50697e16b0 [NFC][SimplifyCFG] More tests for Arm 2020-08-25 12:13:48 +01:00
David Green
906e8b5e0e [ARM][CGP] Fix scalar condition selects for MVE
The arm backend does not handle select/select_cc on vectors with scalar
conditions, preferring to expand them in codegenprepare instead. This
usually works except when optimizing for size, where the optsize check
would end up overruling the backend isSelectSupported check.

We could handle the selects in ISel too, but this seems like smaller
code than trying to splat the condition to all lanes.

Differential Revision: https://reviews.llvm.org/D86433
2020-08-25 12:09:06 +01:00
Mikael Holmen
5ae70c97fb [PowerPC] Fix gcc warning [NFC]
Without the fix gcc 7.4 warns with

../lib/Target/PowerPC/PPCAsmPrinter.cpp: In member function 'void {anonymous}::PPCAsmPrinter::EmitTlsCall(const llvm::MachineInstr*, llvm::MCSymbolRefExpr::VariantKind)':
../lib/Target/PowerPC/PPCAsmPrinter.cpp:525:53: warning: enumeral and non-enumeral type in conditional expression [-Wextra]
                  MCInstBuilder(Subtarget->isPPC64() ? Opcode : PPC::BL_TLS)
                                ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
2020-08-25 12:58:38 +02:00
Sam Parker
2ad592033d [NFC][SimplifyCFG] Add some more tests for Arm. 2020-08-25 11:44:17 +01:00
Shinji Okumura
3667bf202b [Attributor][NFC] Clang format 2020-08-25 19:32:58 +09:00
Paul Walker
7cd72f042a [SVE] Lower scalable vector ISD::FNEG operations.
Also updates isConstOrConstSplatFP to allow the mul(A,-1) -> neg(A)
transformation when -1 is expressed as an ISD::SPLAT_VECTOR.

Differential Revision: https://reviews.llvm.org/D86415
2020-08-25 11:22:28 +01:00
Sam Parker
5f52cd12df [NFC][ARM] arith code size cost tests
Add a run to measure the code size cost of arithmetic instructions
and add a function for i1 types.
2020-08-25 11:16:01 +01:00
Sam Parker
c6eebcc686 [UpdatesTestChecks] Fix typo in common.py
global_vars_see_dict -> global_vars_seen_dict
2020-08-25 11:13:33 +01:00
Georgii Rymar
f23c466d76 [llvm-readobj] - Print "Unknown" when a program header is unknown.
Currently, when a program header type is unknown, we don't print anything:

```
ProgramHeader {
  Type:  (0x60000000)
```

With this patch the output will be:

```
ProgramHeader {
  Type: Unknown (0x60000000)
```

It was discussed in D85526 and is consistent with what we already print for
'--sections', e.g.:

```
Section {
  Name: .sec
  Type: Unknown (0x7FFFFFFF)
}
```

Differential revision: https://reviews.llvm.org/D86213
2020-08-25 13:05:17 +03:00
Roman Lebedev
8908aecdf3 [NFC][InstCombine] Tests for PHI-of-extractvalues
Much like its sibling fold PHI-of-insertvalues,
it appears to be much more worthwhile than it would seem.
2020-08-25 13:01:07 +03:00
Benjamin Kramer
cc40144ebc Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract"
This reverts commit 557b890ff4f4dd5fa979c232df5b31cf3fef04c1. Causing
miscompiles, test case is on llvm-commits.
2020-08-25 11:31:31 +02:00
Hans Wennborg
8821af3ec2 Revert "[CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU"
It broke Chromium's llvm build:

 CMake Error at lib/Support/CMakeLists.txt:13 (string):
   string sub-command REGEX, mode REPLACE: regex "^()" matched an empty
   string.
 Call Stack (most recent call first):
   lib/Support/CMakeLists.txt:223 (get_system_libname)

This reverts commit 2b3807d822c50d361ae67184b6de5a41bd7b1bba /  https://reviews.llvm.org/D86434
2020-08-25 11:22:50 +02:00
Georgii Rymar
ce5d5ddb40 [llvm-readelf/obj] - Change the return type of the createDRI(...) to Expected<>
This allows us to get rid of the "Invalid data was encountered while parsing the file"
error reported when the sh_size/sh_offset of sections are broken.

Differential revision: https://reviews.llvm.org/D86451
2020-08-25 12:11:26 +03:00
Yang Zhihui
cd03b849a7 [FileCheck][docs] Fix word errors
ouput -> output

Reviewed By: thopre

Differential Revision: https://reviews.llvm.org/D86504
2020-08-25 09:53:52 +01:00
OCHyams
6420954176 [llvm-dwarfdump] Fix misleading scope byte coverage statistics
Fixes PR46575.

Bump statistics version to 6.

Without this patch, for a variable described with a location list the stat
'sum_all_variables(#bytes in parent scope covered by DW_AT_location)' is
calculated by summing all bytes covered by the location ranges in the list and
capping the result to the number of bytes in the parent scope. With the patch,
only bytes which overlap with the parent DIE scope address ranges contribute to
the stat. A new stat 'sum_all_variables(#bytes in any scope covered by
DW_AT_location)' has been added which displays the total bytes covered when
ignoring scopes.
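To make the difference concrete, here is a simplified Python model of the old and new calculations for a single variable. The function names are hypothetical, and it assumes half-open [lo, hi) address ranges with no overlaps within either list; it is not the llvm-dwarfdump implementation.

```python
def covered_bytes_old(loc_ranges, scope_ranges):
    """Old behavior: sum all location-list bytes, then cap at the scope size."""
    loc_total = sum(hi - lo for lo, hi in loc_ranges)
    scope_size = sum(hi - lo for lo, hi in scope_ranges)
    return min(loc_total, scope_size)


def covered_bytes_new(loc_ranges, scope_ranges):
    """New behavior: count only bytes that overlap the parent scope's ranges."""
    total = 0
    for llo, lhi in loc_ranges:
        for slo, shi in scope_ranges:
            total += max(0, min(lhi, shi) - max(llo, slo))
    return total


scope = [(0x10, 0x20)]                 # 16-byte parent scope
loc = [(0x00, 0x18), (0x30, 0x40)]     # 24 + 16 = 40 bytes of location ranges
print(covered_bytes_old(loc, scope))   # 16 (looks like full coverage)
print(covered_bytes_new(loc, scope))   # 8  (only [0x10, 0x18) is in scope)
```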
2020-08-25 06:40:11 +01:00
David Sherwood
82c9874179 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Florian Hahn
2c80ab9174 [DSE,MemorySSA] Cache accesses with/without reachable read-clobbers.
Currently we repeatedly check the same uses for read clobbers in some
cases. We can avoid unnecessary checks by keeping track of the memory
accesses we already found read clobbers for. To do so, we just add
memory access causing read-clobbers to a set. Note that marking all
visited accesses as read-clobbers would be to pessimistic, as that might
include accesses not on any path to  the actual read clobber.

If we do not find any read-clobbers, we can add all visited instructions
to another set and use that to skip the same accesses in the next call.
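The two caches described above can be sketched as a generic memoized graph walk. This is an illustrative Python model, not the DSE code: the names are hypothetical, `uses` is a plain adjacency dict standing in for MemorySSA use edges, `is_read_clobber` stands in for the expensive alias query, and the sketch assumes the caches are only reused across queries for the same memory location.

```python
def has_read_clobber(start, uses, is_read_clobber, known_clobbering, known_clean):
    """Return True if an access reachable from `start` is a read clobber.

    known_clobbering: accesses already found to be read clobbers.
    known_clean: accesses already proven not to reach any read clobber.
    """
    visited = set()
    worklist = list(uses.get(start, []))
    while worklist:
        access = worklist.pop()
        if access in visited or access in known_clean:
            continue
        visited.add(access)
        if access in known_clobbering or is_read_clobber(access):
            # Cache only the clobbering access itself; marking everything
            # visited would be too pessimistic.
            known_clobbering.add(access)
            return True
        worklist.extend(uses.get(access, []))
    # No read clobber was reachable, so every visited access is clean.
    known_clean.update(visited)
    return False
```

On a repeated query the first set short-circuits the expensive check for known clobbers, and the second set prunes whole subgraphs that were already explored without finding one.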

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D75025
2020-08-25 08:48:46 +01:00
Roman Lebedev
ed8ecc651f [InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's
As per the statistics, this happens exceedingly rarely,
but I have seen it in exactly the situations that
PHI-aware aggregate reconstruction would eventually have
handled, allowing an invoke -> call fold later on.

So while this might be something that other folds
will have to learn about, I believe we should be
doing this transform in general.

Here, we are okay with adding two PHIs to get both the base aggregate
and the inserted value. I'm not sure it makes much sense to restrict
it to a single PHI (to just the inserted value?), because originally
we'd be receiving the final aggregate already.

llvm test-suite + RawSpeed:
```
| statistic name                             | baseline  | proposed  |    Δ |      % | \|%\| |
|--------------------------------------------|-----------|-----------|-----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues          | 0         | 12        |  12  |  0.00% | 0.00% |
| asm-printer.EmittedInsts                   | 8926643   | 8926595   | -48  |  0.00% | 0.00% |
| instcombine.NumCombined                    | 3846614   | 3846640   |  26  |  0.00% | 0.00% |
| instcombine.NumConstProp                   | 24302     | 24293     |  -9  | -0.04% | 0.04% |
| instcombine.NumDeadInst                    | 1620140   | 1620112   | -28  |  0.00% | 0.00% |
| instcount.NumBrInst                        | 898466    | 898464    |  -2  |  0.00% | 0.00% |
| instcount.NumCallInst                      | 1760819   | 1760875   |  56  |  0.00% | 0.00% |
| instcount.NumExtractValueInst              | 45659     | 45649     | -10  | -0.02% | 0.02% |
| instcount.NumInsertValueInst               | 4991      | 4981      | -10  | -0.20% | 0.20% |
| instcount.NumIntToPtrInst                  | 27084     | 27087     |   3  |  0.01% | 0.01% |
| instcount.NumPHIInst                       | 371435    | 371429    |  -6  |  0.00% | 0.00% |
| instcount.NumStoreInst                     | 906011    | 906019    |   8  |  0.00% | 0.00% |
| instcount.TotalBlocks                      | 1105520   | 1105518   |  -2  |  0.00% | 0.00% |
| instcount.TotalInsts                       | 9795737   | 9795776   |  39  |  0.00% | 0.00% |
| simplifycfg.NumInvokes                     | 2784      | 2786      |   2  |  0.07% | 0.07% |
| simplifycfg.NumSimpl                       | 1001840   | 1001850   |  10  |  0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs            | 15174     | 15170     |  -4  | -0.03% | 0.03% |
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86306
2020-08-25 10:38:11 +03:00
Sam Parker
077f615e10 [NFC][RDA] Add explicit def check
Explicitly check that there is a local def prior to the given
instruction in getReachingLocalMIDef instead of just relying on
a nullptr return from getInstFromId.
2020-08-25 08:37:45 +01:00
Freddy Ye
3ad559cce3 [X86] Support -march=sapphirerapids
Support -march=sapphirerapids for x86.
Compare with Icelake Server, it includes 14 more new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86503
2020-08-25 14:21:21 +08:00
Petr Hosek
2d6ef3a647 [CMake] Fix ncurses/zlib in LLVM_SYSTEM_LIBS for Windows GNU
For the Windows GNU platform, CMAKE_FIND_LIBRARY_PREFIXES is a list
containing an empty string, which ended up in a regex capturing group,
which is invalid in CMake's regex engine. With this change, we get the
following:

  set(CMAKE_FIND_LIBRARY_PREFIXES "lib" "")
  set(CMAKE_FIND_LIBRARY_SUFFIXES ".dll.a" ".a" ".lib")
  get_system_libname(path/to/libz.dll.a zlib)
  message("${zlib}")

outputs z, as expected.
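The intended name-stripping behaviour can be modelled in Python. This is a hypothetical sketch, not the CMake implementation (which uses `string(REGEX REPLACE ...)`): it drops empty entries from the prefix/suffix lists, which sidesteps the empty-alternative problem described above, and tries longer suffixes first so that ".dll.a" wins over ".a".

```python
def system_libname(path, prefixes, suffixes):
    """Strip the directory, one library prefix and one suffix from `path`."""
    name = path.rsplit("/", 1)[-1]
    # Ignore empty prefixes; try longer ones first.
    for prefix in sorted((p for p in prefixes if p), key=len, reverse=True):
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    # Ignore empty suffixes; try longer ones first (".dll.a" before ".a").
    for suffix in sorted((s for s in suffixes if s), key=len, reverse=True):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name


print(system_libname("path/to/libz.dll.a", ["lib", ""], [".dll.a", ".a", ".lib"]))  # z
```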

Patch By: haampie

Differential Revision: https://reviews.llvm.org/D86434
2020-08-24 23:00:54 -07:00
Alexandre Ganea
70a9ce1df9 Disable 'not' test on Windows because 'env' from GnuWin32 cannot be used without arguments. 2020-08-24 21:55:34 -04:00
Mircea Trofin
9c71d4e1d1 [MLInliner] Support training that doesn't require partial rewards
If we use training algorithms that don't need partial rewards, we don't
need to worry about an ir2native model. In that case, training logs
won't contain a 'delta_size' feature either (since that's the partial
reward).

Differential Revision: https://reviews.llvm.org/D86481
2020-08-24 17:36:29 -07:00
Fangrui Song
cf617b95ee [not][test] Fix disable-symbolization.test when 'printenv' is not available
On Windows, 'env' or 'printenv' may not exist.

Also switch back to 'env' which is specified by POSIX.1-2017. 'printenv' is not
standard (I picked it because 'printenv' exists on GnuWin32 but 'env' does not).

Reviewed By: zequanwu

Differential Revision: https://reviews.llvm.org/D86496
2020-08-24 17:27:34 -07:00
Venkataramanan Kumar
255be4506d [DAGCombine]: Fold X/Sqrt(X) to Sqrt(X)
With FMF ("nsz" and "reassoc"), fold X/Sqrt(X) to Sqrt(X).

This is done after targets have the chance to produce a
reciprocal sqrt estimate sequence because that expansion
is probably more efficient than an expansion of a
non-reciprocal sqrt. That is also why we deferred doing
this transform in IR (D85709).
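A quick numeric check in Python illustrates why a fold like this is gated on fast-math flags: for ordinary positive inputs the two forms agree up to rounding, but they are not equivalent for every IEEE input. This only illustrates the value differences; it is not a statement of exactly which flag covers which case.

```python
import math


def x_over_sqrt(x):
    return x / math.sqrt(x)


# Ordinary positive inputs: the results agree up to rounding
# (they may differ in the last bit).
for x in (2.0, 3.0, 10.0, 1e-300, 1e300):
    assert math.isclose(x_over_sqrt(x), math.sqrt(x), rel_tol=1e-15)

# Special values differ: inf / sqrt(inf) is inf / inf.
print(x_over_sqrt(float("inf")))  # nan
print(math.sqrt(float("inf")))    # inf
```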

Differential Revision: https://reviews.llvm.org/D86403
2020-08-24 18:16:13 -04:00
Sanjay Patel
7a4b11cdee [x86][AArch64] adjust fast-math-flags in tests; NFC
This goes with the proposal in D86403.
2020-08-24 18:16:13 -04:00
Matt Arsenault
e5708466b3 AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands.
We would still need to waterfall if the value were somehow an AGPR,
and also need to explicitly copy to a VGPR.
2020-08-24 17:54:34 -04:00
Nemanja Ivanovic
887edb78a5 [PowerPC] Do not use FISel for calls and TOC-based accesses with PC-Rel
PC-Relative addressing introduces a fair bit of complexity for correctly
eliminating TOC accesses. FastISel does not include any of that handling, so we
miscompile code with -mcpu=pwr10 -O0 if it includes an external call that
FastISel does not handle, followed by any of the following:

    Floating point constant materialization
    Materialization of a GlobalValue
    Call that FastISel does handle

This patch switches to SDISel for any of the above.

Differential revision: https://reviews.llvm.org/D86343
2020-08-24 16:51:44 -05:00
Craig Topper
82f73ac58e [X86] Copy the tuning features and scheduler model from pentium4/x86-64 to generic
This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64".

To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586.

I updated one llvm-mca test to check a different CPU since generic has a scheduler model now

Differential Revision: https://reviews.llvm.org/D86312
2020-08-24 14:47:10 -07:00
Matt Arsenault
3b7d6a6aaa AMDGPU: Have a few selection failure tests check both paths
SelectionDAG and GlobalISel take different failure paths for these and
end up producing different failure errors. Check both so the test
passes when the default is switched.
2020-08-24 17:46:31 -04:00
Nemanja Ivanovic
b06fbd740a [PowerPC] Handle SUBFIC in reg+reg -> reg+imm transformation
We initially missed the subtract-immediate in this transformation.
This patch just adds that.

Differential revision: https://reviews.llvm.org/D84659
2020-08-24 16:22:59 -05:00
Sanjay Patel
8f9cb71b9c [InstCombine] improve demanded element analysis for vector insert-of-extract
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-24 17:00:16 -04:00
Sanjay Patel
663055a339 [SLP] avoid 'tmp' names in regression tests; NFC
That can cause problems for update_test_checks.py (it warns when updating this file).
2020-08-24 17:00:16 -04:00
Sanjay Patel
07008dabee [InstCombine] add tests for insert+extract demanded elements; NFC 2020-08-24 17:00:16 -04:00
Shoaib Meenai
a3c544e1b7 [runtimes] Use llvm-libtool-darwin for runtimes build
It's full featured now and we can use it for the runtimes build instead
of relying on an external libtool, which means the CMAKE_HOST_APPLE
restriction serves no purpose either now. Restrict llvm-lipo to Darwin
targets while I'm here, since it's only needed there.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D86367
2020-08-24 13:48:30 -07:00