1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

24077 Commits

Author SHA1 Message Date
Juneyoung Lee
d6be273bbc [ValueTracking] Let propagatesPoison support binops/unaryops/cast/etc.
Summary:
This patch makes propagatesPoison be more accurate by returning true on
more bin ops/unary ops/casts/etc.

The changed test in ScalarEvolution/nsw.ll was introduced by
a19edc4d15 .
IIUC, the goal of the tests is to show that iv.inc's SCEV expression still has
no-overflow flags even if the loop isn't in the wanted form.
It becomes more accurate with this patch, so think this is okay.

Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, sanjoy

Reviewed By: spatel, nikic

Subscribers: regehr, nlopes, efriedma, fhahn, javed.absar, llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78615
2020-05-13 02:51:42 +09:00
Fangrui Song
595401e441 [gcov] Default coverage version to '408*' and delete CC1 option -coverage-exit-block-before-body
gcov 4.8 (r189778) moved the exit block from the last to the second.
The .gcda format is compatible with 4.7 but

* decoding libgcov 4.7 produced .gcda with gcov [4.7,8) can mistake the
  exit block, emit bogus `%s:'%s' has arcs from exit block\n` warnings,
  and print wrong `" returned %s` for branch statistics (-b).
* decoding libgcov 4.8 produced .gcda with gcov 4.7 has similar issues.

Also, rename "return block" to "exit block" because the latter is the
appropriate term.
2020-05-12 09:14:03 -07:00
Eric Christopher
dd6e28d9f9 Fix typos encountered while working on pass pipeline for O1. 2020-05-12 00:45:15 -07:00
Johannes Doerfert
5888e708cd [Attributor][FIX] Disallow function signature rewrite for casted calls
We will now ensure ensure the return type of called function is the type
of all call sites we are going to rewrite. This avoids a problem
partially fixed by D79680. The part that was not covered is a use of
this "weird" casted call site (see `@func3` in `misc_crash.ll`).

misc_crash.ll checks are auto-generated now.
2020-05-11 15:32:47 -05:00
Johannes Doerfert
723f4e91f2 [Attributor] Make AAIsDead dependences optional to prevent top state
We should never give up on AAIsDead as it guards other AAs from
unreachable code (in which SSA properties are meaningless). We did
however use required dependences on some queries in AAIsDead which
caused us to invalidate AAIsDead if the queried AA got invalidated.
We now use optional dependences instead. The bug that exposed this is
added to the liveness.ll test and other test changes show the impact.

Bug report by @sdmitriev.
2020-05-11 15:32:47 -05:00
Johannes Doerfert
6ac18055b2 [Attributor] Force update of "newly live" abstract attributes
During an update of AAIsDead, new instructions become live. If we query
information from them, the result is often just the initial state, e.g.,
for call site `noreturn` and `nounwind`. We will now trigger an update
for cached attributes during the AAIsDead update, though other AAs might
later use the same API.
2020-05-11 15:32:47 -05:00
Sanjay Patel
27365d9ef6 [VectorCombine] account for extra uses in scalarization cost
Follow-up to D79452.
Mimics the extra use cost formula for the inverse transform with extracts.
2020-05-11 15:20:57 -04:00
Mircea Trofin
3cd660b606 [llvm][NFC] Move inlining decision-related APIs in InliningAdvisor.
Summary: Factoring out in preparation to https://reviews.llvm.org/D79042

Reviewers: dblaikie, davidxl

Subscribers: mgorny, eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79613
2020-05-11 09:00:59 -07:00
Sergey Dmitriev
945de9c022 [Attributor] Fix for a crash on RAUW when rewriting function signature
Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: uenoku

Subscribers: hiraditya, uenoku, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79680
2020-05-11 08:06:19 -07:00
Tyker
dd21e1ed58 [AssumeBundles] fix crashes
Summary:
this patch fixe crash/asserts found in the test-suite.
the AssumeptionCache cannot be assumed to have all assumes contrary to what i tought.
prevent generation of information for terminators, because this can create broken IR in transfromation where we insert the new terminator before removing the old one.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79458
2020-05-11 11:52:21 +02:00
OCHyams
ceba7b314a [NFC][DwarfDebug] Add test for variables with a single location which
don't span their entire scope.

The previous commit (6d1c40c171e) is an older version of the test.

Reviewed By: aprantl, vsk

Differential Revision: https://reviews.llvm.org/D79573
2020-05-11 11:49:11 +02:00
Xun Li
01dc1f624a Remove an unused Module param
Summary:
In D65848 the function getFuncNameInModule was refactored to no longer use module.
This diff removes the parameter and rename the function name to avoid confusion.

Reviewers: wenlei, wmi, davidxl

Reviewed By: wenlei

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79310
2020-05-10 22:09:55 -07:00
Johannes Doerfert
0992aa483d [Attributor] Merge the query set into AbstractAttribute
The old QuerriedAAs contained two vectors, one for required one for
optional dependences (=queries). We now use a single vector and encode
the kind directly in the pointer.

This reduces memory consumption and makes the connection between
abstract attributes and their dependences clearer.

No functional change is intended, changes in the test are due to
different order in the query map. Neither the order before nor now is in
any way special.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 543734 (329735/s)
temporary memory allocations: 105895 (64217/s)
peak heap memory consumption: 19.19MB
peak RSS (including heaptrack overhead): 102.26MB
total memory leaked: 269.10KB
```

After:
```
calls to allocation functions: 513292 (341511/s)
temporary memory allocations: 106028 (70544/s)
peak heap memory consumption: 13.35MB
peak RSS (including heaptrack overhead): 95.64MB
total memory leaked: 269.10KB
```

Difference:
```
calls to allocation functions: -30442 (208506/s)
temporary memory allocations: 133 (-910/s)
peak heap memory consumption: -5.84MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```

---

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D78729
2020-05-10 22:27:00 -05:00
Johannes Doerfert
1e8b1544dc [Attributor][FIX] Carefully handle/ignore/forget argmemonly
When we have an existing `argmemonly` or `inaccessiblememorargmemonly`
we used to "know" that information. However, interprocedural constant
propagation can invalidate these attributes. We now ignore and remove
these attributes for internal functions (which may be affected by IP
constant propagation), if we are deriving new attributes for the
function.
2020-05-10 19:06:11 -05:00
Johannes Doerfert
2251bdf0ea [Attributor] Use "simplify to constant" in genericValueTraversal
As we replace values with constants interprocedurally, we also need to
do this "look-through" step during the generic value traversal or we
would derive properties from replaced values. While this is often not
problematic, it is when we use the "kind" of a value for reasoning,
e.g., accesses to arguments allow `argmemonly`.
2020-05-10 19:06:11 -05:00
Johannes Doerfert
340c8cfa95 [Attributor] Ignore illegal accesses to null
When we categorize a pointer value we bailed at `null` before. If we
know `null` is not a valid memory location we can ignore it as there
won't be an access at all.
2020-05-10 19:06:10 -05:00
Johannes Doerfert
cd3449844d [Attributor] Use existing helpers to determine IR facts
We now use getPointerDereferenceableBytes to determine `nonnull` and
`dereferenceable` facts from the IR. We also use getPointerAlignment in
AAAlign for the same reason. The latter can interfere with callbacks so
we do restrict it to non-function-pointers for now.
2020-05-10 19:06:10 -05:00
Johannes Doerfert
8614729425 [Attributor][NFC] Clang format Attributor*.cpp 2020-05-10 19:06:10 -05:00
Fangrui Song
9c7a9b76b2 [gcov] Default coverage version to '407*' and delete CC1 option -coverage-cfg-checksum
Defaulting to -Xclang -coverage-version='407*' makes .gcno/.gcda
compatible with gcov [4.7,8)

In addition, delete clang::CodeGenOptionsBase::CoverageExtraChecksum and GCOVOptions::UseCfgChecksum.
We can infer the information from the version.

With this change, .gcda files produced by `clang --coverage a.o` linked executable can be read by gcov 4.7~7.
We don't need other -Xclang -coverage* options.
There may be a mismatching version warning, though.

(Note, GCC r173147 "split checksum into cfg checksum and line checksum"
 made gcov 4.7 incompatible with previous versions.)
2020-05-10 16:14:07 -07:00
Fangrui Song
83e3089026 [gcov] Delete CC1 option -coverage-no-function-names-in-data
rL144865 incorrectly wrote function names for GCOV_TAG_FUNCTION
(this might be part of the reasons the header says
"We emit files in a corrupt version of GCOV's "gcda" file format").

rL176173 and rL177475 realized the problem and introduced -coverage-no-function-names-in-data
to work around the issue. (However, the description is wrong.
libgcov never writes function names, even before GCC 4.2).

In reality, the linker command line has to look like:

clang --coverage -Xclang -coverage-version='407*' -Xclang -coverage-cfg-checksum -Xclang -coverage-no-function-names-in-data

Failing to pass -coverage-no-function-names-in-data can make gcov 4.7~7
either produce wrong results (for one gcov-4.9 program, I see "No executable lines")
or segfault (gcov-7).
(gcov-8 uses an incompatible format.)

This patch deletes -coverage-no-function-names-in-data and the related
function names support from libclang_rt.profile
2020-05-10 12:37:44 -07:00
Tyker
81116006c1 [AssumeBundles] Remove non-determinisme from assume builder
Summary:
The assume builder was non-deterministic when working on unamed values.
this patch fixes this.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78616
2020-05-10 21:18:33 +02:00
Tyker
0e4a65ecbd [AssumeBundles] Prevent generation of some redundant assumes
Summary: with this patch the assume salvageKnowledge will not generate assume if all knowledge is already available in an assume with valid context. assume bulider can also in some cases update an existing assume with better information.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78014
2020-05-10 19:23:59 +02:00
Florian Hahn
366aeb3a0b [LAA] Move runtime-check generation to Transforms/Utils/loopUtils (NFC)
Currently LAA's uses of ScalarEvolutionExpander blocks moving the
expander from Analysis to Transforms. Conceptually the expander does not
fit into Analysis (it is only used for code generation) and
runtime-check generation also seems to be better suited as a
transformation utility.

Reviewers: Ayal, anemet

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D78460
2020-05-10 17:39:26 +01:00
Sanjay Patel
35504cee33 [InstCombine] canonicalize bitcast after insertelement into undef
We have a transform in the opposite direction only for the x86 MMX type,
Other types are not handled either way before this patch.

The motivating case from PR45748:
https://bugs.llvm.org/show_bug.cgi?id=45748
...is the last test diff. In that example, we are triggering an existing
bitcast transform, so we reduce the number of casts, and that should give
us the ideal x86 codegen.

Differential Revision: https://reviews.llvm.org/D79171
2020-05-10 11:37:47 -04:00
Simon Pilgrim
1c617b6425 [InstCombine] matchOrConcat - match BITREVERSE
Fold or(zext(bitreverse(x)),shl(zext(bitreverse(y)),bw/2) -> bitreverse(or(zext(x),shl(zext(y),bw/2))

Practically this is the same as the BSWAP pattern so we might as well handle it.
2020-05-10 16:00:29 +01:00
Florian Hahn
df2c2d8ea5 Recommit "[LAA] Remove one addRuntimeChecks function (NFC)."
The failing assertion has been fixed and the problematic test case has
been added.

This reverts the revert commit fc44617f28847417e55836193bbe8e9c3f09eca9.
2020-05-10 15:19:57 +01:00
Florian Hahn
08e92a0b48 Revert "[LAA] Remove one addRuntimeChecks function (NFC)."
This reverts commit c28114c8ffde705d7e16cd4c065fd23269661c81.

This causes some bots to fail:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/30596/steps/build%20android%2Faarch64/logs/stdio
2020-05-10 13:28:00 +01:00
Florian Hahn
08756fc67b [LAA] Remove one addRuntimeChecks function (NFC).
In order to reduce the API surface area (preparation for D78460), remove
a addRuntimeChecks() function and do the additional check in the single
caller.

Reviewers: Ayal, anemet

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79679
2020-05-10 12:48:55 +01:00
Sanjay Patel
b3c5529405 [InstCombine] fold fpext into exact integer-to-FP cast
We can combine a floating-point extension cast with a conversion
from integer if we know the earlier cast is exact.

This is an optimization suggested in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19

However, this patch does not change the example suggested there.
This patch only uses the existing analysis to handle cases where
the integer source value magnitude is narrower than the
intermediate FP mantissa (guarantees that the conversion to FP is
exact). Follow-up patches to the analysis function can enable
more cases.

Differential Revision: https://reviews.llvm.org/D79116
2020-05-10 07:04:54 -04:00
Arthur Eubanks
1320fee88a Add missing pass initialization
Summary: This was preventing MemorySanitizerLegacyPass from appearing in --print-after-all.

Reviewers: vitalybuka

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79661
2020-05-09 21:31:52 -07:00
Jinsong Ji
f65d413c3a [sanitizer] Enable whitelist/blacklist in new PM
https://reviews.llvm.org/D63616 added `-fsanitize-coverage-whitelist`
and `-fsanitize-coverage-blacklist` for clang.

However, it was done only for legacy pass manager.
This patch enable it for new pass manager as well.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D79653
2020-05-10 02:34:29 +00:00
Matt Arsenault
a73f815286 InstCombine: Broaden copy-constant-to-alloca optimization
Consider any constant memory type, not just global constants. AMDGPU
kernel parameters are effectively global constants, but appear as
either reads from an intrinsic derived pointer or function argument.
2020-05-09 16:00:27 -04:00
Evgenii Stepanov
5be5626d69 [hwasan] Allow -hwasan-globals flag to appear more than once. 2020-05-08 16:35:48 -07:00
Layton Kifer
1eccff3d80 [TRE][NFC] Refactor shared state into member variables.
Separate functions that require shared state into a class to avoid
needing to pass them though multiple functions just to be available
where needed.

The main motivation for this is that we would like to remove the
limitation that accumulator values be dynamic constant, which would
require additional shared state between call eliminations in the same
function, compounding this issue.

Differential Revision: https://reviews.llvm.org/D79299
2020-05-08 14:36:02 -07:00
Sanjay Patel
ba0bcdfe21 [VectorCombine] scalarize binop of inserted elements into vector constants
As with the extractelement patterns that are currently in vector-combine,
there are going to be several possible variations on this theme. This
should be the clearest, simplest example.

Scalarization is the right direction for target-independent canonicalization,
and InstCombine has some of those folds already, but it doesn't do this.
I proposed a similar transform in D50992. Here in vector-combine, we can
check the cost model to be sure it's profitable, so there should be less risk.

Differential Revision: https://reviews.llvm.org/D79452
2020-05-08 16:31:12 -04:00
Sanjay Patel
243ee772d2 [InstCombine] fix typo in comment; NFC 2020-05-08 15:43:14 -04:00
zoecarver
03bc2a070d Re-commit: Mark values as trivially dead when their only use is a start or end lifetime intrinsic.
Summary:
If the only use of a value is a start or end lifetime intrinsic then mark the intrinsic as trivially dead. This should allow for that value to then be removed as well.

Currently, this only works for allocas, globals, and arguments.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79355
2020-05-08 12:24:10 -07:00
Sanjay Patel
db334742c4 [InstCombine] add helper for known exact cast to FP; NFC
As suggested in D79116 - there's shared logic between the
existing code and potential new folds. This could go in
ValueTracking if it seems generally useful.
2020-05-08 15:22:36 -04:00
Ricky Zhou
30d6c2bb93 [SimplifyCFG] Remap rewritten debug intrinsic operands.
FoldBranchToCommonDest clones instructions to a different basic block,
but handles debug intrinsics in a separate path. Previously, when
cloning debug intrinsics, their operands were not updated to reference
the correct cloned values. As a result, we would emit debug.value
intrinsics with broken operand references which are discarded in later
passes. This leads to incorrect debuginfo that reports incorrect values
for variables.

Fix this by remapping debug intrinsic operands when cloning them.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45667.

Differential Revision: https://reviews.llvm.org/D79602
2020-05-08 11:10:25 -07:00
Sanjay Patel
312650aa90 [InstCombine] clean up foldItoFPtoI; NFC
Mostly cosmetic improvements to variable names and logic to ease
refactoring suggested in D79116.
2020-05-08 12:13:42 -04:00
Sanjay Patel
b78314f414 [InstCombine] simplify code for FP to integer casts; NFCI
FoldIToFPtoI() returns immediately if the operand is not
an opposite cast instruction, so the extra checks in the
callers are redundant.
2020-05-08 10:14:03 -04:00
Benjamin Kramer
518a11bdb4 Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding.""
This reverts commit ae45b4dbe73ffde5fe3119835aa947d5a49635ed. It
causes miscompilations, test case on the mailing list.
2020-05-08 14:49:10 +02:00
Diego Caballero
4968611833 [LoopFusion] Remove unreachable blocks from DT and LI after fusion
This patch removes FC0.ExitBlock and FC1GuardBlock from DT and LI
after fusion of guarded loops. They become unreachable and LI
verification failed when they happened to be inside another loop.

Reviewed By: kbarton

Differential Revision: https://reviews.llvm.org/D78679
2020-05-07 16:44:40 -07:00
Johannes Doerfert
dd4a1a3fb6 [Attributor][FIX] Record dependences for assumed dead abstract attributes
In a recent patch we introduced a problem with abstract attributes that
were assumed dead at some point. Since `Attributor::updateAA` was
introduced in 95e0d28b71e42c9b7cd77c96f728311981a021f6, we did not
remember the dependence on the liveness AA when an abstract attribute
was assumed dead and therefore not updated.

Explicit reproducer added in liveness.ll.

---

Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):

Before:
```
calls to allocation functions: 509242 (345483/s)
temporary memory allocations: 98666 (66937/s)
peak heap memory consumption: 18.60MB
peak RSS (including heaptrack overhead): 103.29MB
total memory leaked: 269.10KB
```

After:
```
calls to allocation functions: 529332 (355494/s)
temporary memory allocations: 102107 (68574/s)
peak heap memory consumption: 19.40MB
peak RSS (including heaptrack overhead): 102.79MB
total memory leaked: 269.10KB
```

Difference:
```
calls to allocation functions: 20090 (1339333/s)
temporary memory allocations: 3441 (229400/s)
peak heap memory consumption: 801.45KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
2020-05-07 17:00:50 -05:00
Johannes Doerfert
c70f94a2a1 [Attributor] Mark dependence as optional 2020-05-07 17:00:50 -05:00
Alina Sbirlea
493d68bb41 [SimpleLoopUnswitch] Update DefaultExit condition to check unreachable is not empty.
Summary:
Update the check for the default exit block to not only check that the
terminator is not unreachable, but also check that unreachable block has
*only* the unreachable instruction.

Reviewers: chandlerc

Subscribers: hiraditya, uabelho, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78277
2020-05-07 13:48:30 -07:00
Huihui Zhang
d6cea89c53 [InstCombine][SVE] Fix visitExtractElementInst for scalable type.
Summary:
This patch fix the following issues with visitExtractElementInst:

      1. Restrict VectorUtils::findScalarElement to fixed-length vector.
         For scalable type, the number of elements in shuffle mask is
         unknown at compile-time.
      2. Fix out-of-range calculation for fixed-length vector.
      3. Skip scalable type when analysis rely on fixed number of elements.
      4. Add unit tests to check functionality of extractelement for scalable type.

Reviewers: sdesmalen, efriedma, spatel, nikic

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78267
2020-05-07 13:03:52 -07:00
Huihui Zhang
2764e2781b [InstCombine][SVE] Fix visitInsertElementInst for scalable type.
Summary:
This patch fixes the following issues in visitInsertElementInst:

      1. Bail out for scalable type when analysis requires fixed size number of vector elements.
      2. Use cast<FixedVectorType> to get vector number of elements. This ensure assertion
          on scalable vector type.
      3. For scalable type, avoid folding a chain of insertelement into splat:
            insertelt(insertelt(insertelt(insertelt X, %k, 0), %k, 1), %k, 2) ...
              ->
            shufflevector(insertelt(X, %k, 0), undef, zero)
          The length of scalable vector is unknown at compile-time, therefore we don't know if
          given insertelement sequence is valid for splat.

Reviewers: sdesmalen, efriedma, spatel, nikic

Reviewed By: sdesmalen, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78895
2020-05-07 12:44:52 -07:00
Sanjay Patel
440448d12c [SLP] add another bailout for load-combine patterns (2nd try)
The original patch (rG86dfbc676ebe) exposed an existing bug:
we could wrongly cast a constant expression to BinaryOperator
because the pattern matching allows that. This adds a check
for that case, and there's a reduced test case to verify no
crashing.

Original commit message:

This builds on the or-reduction bailout that was added with D67841.
We still do not have IR-level load combining, although that could
be a target-specific enhancement for -vector-combiner.

The heuristic is narrowly defined to catch the motivating case from
PR39538:
https://bugs.llvm.org/show_bug.cgi?id=39538
...while preserving existing functionality.

That is, there's an unmodified test of pure load/zext/store that is
not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll.
That's the reason for the logic difference to require the 'or'
instructions. The chances that vectorization would actually help a
memory-bound sequence like that seem small, but it looks nicer with:

  vpmovzxwd     (%rsi), %xmm0
  vmovdqu       %xmm0, (%rdi)

rather than:

  movzwl        (%rsi), %eax
  movl  %eax, (%rdi)
  ...

In the motivating test, we avoid creating a vector mess that is
unrecoverable in the backend, and SDAG forms the expected bswap
instructions after load combining:

  movzbl (%rdi), %eax
  vmovd %eax, %xmm0
  movzbl 1(%rdi), %eax
  vmovd %eax, %xmm1
  movzbl 2(%rdi), %eax
  vpinsrb $4, 4(%rdi), %xmm0, %xmm0
  vpinsrb $8, 8(%rdi), %xmm0, %xmm0
  vpinsrb $12, 12(%rdi), %xmm0, %xmm0
  vmovd %eax, %xmm2
  movzbl 3(%rdi), %eax
  vpinsrb $1, 5(%rdi), %xmm1, %xmm1
  vpinsrb $2, 9(%rdi), %xmm1, %xmm1
  vpinsrb $3, 13(%rdi), %xmm1, %xmm1
  vpslld $24, %xmm0, %xmm0
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpslld $16, %xmm1, %xmm1
  vpor %xmm0, %xmm1, %xmm0
  vpinsrb $1, 6(%rdi), %xmm2, %xmm1
  vmovd %eax, %xmm2
  vpinsrb $2, 10(%rdi), %xmm1, %xmm1
  vpinsrb $3, 14(%rdi), %xmm1, %xmm1
  vpinsrb $1, 7(%rdi), %xmm2, %xmm2
  vpinsrb $2, 11(%rdi), %xmm2, %xmm2
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpinsrb $3, 15(%rdi), %xmm2, %xmm2
  vpslld $8, %xmm1, %xmm1
  vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
  vpor %xmm2, %xmm1, %xmm1
  vpor %xmm1, %xmm0, %xmm0
  vmovdqu %xmm0, (%rsi)

  movl  (%rdi), %eax
  movl  4(%rdi), %ecx
  movl  8(%rdi), %edx
  movbel        %eax, (%rsi)
  movbel        %ecx, 4(%rsi)
  movl  12(%rdi), %ecx
  movbel        %edx, 8(%rsi)
  movbel        %ecx, 12(%rsi)

Differential Revision: https://reviews.llvm.org/D78997
2020-05-07 15:04:37 -04:00
Christopher Tetreault
5ac1d35e81 [SVE] Fix incorrect usage of getNumElements() in InstCombineCalls
Summary:
Remove incorrect usage of getNumElements() from visitCallInst(). The
number of elements was being used to construct a DemandedElts bitfield.
This operation does not make sense for scalable vectors. Cast to
FixedVectorType

Identified by test case Clang :: CodeGen/aarch64-sve-intrinsics/acle_sve_mla.c

Reviewers: rengolin, efriedma, sdesmalen, c-rhodes, david-arm

Reviewed By: david-arm

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79524
2020-05-07 08:46:51 -07:00