1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
Commit Graph

202148 Commits

Author SHA1 Message Date
Matt Arsenault
53df467cf7 GlobalISel: Early continue to reduce loop indentation 2020-08-17 13:51:08 -04:00
Florian Hahn
a97645c804 [DSE,MemorySSA] Account for ScanLimit == 0 on entry.
Currently the code does not account for the fact that getDomMemoryDef
can be called with ScanLimit == 0, if we reached the limit while
processing an earlier access. Also tighten the check a bit more and bump
the scan limit now that it is handled properly.

In some cases, this brings a 2x speedup in terms of compile-time.
2020-08-17 17:55:14 +01:00
Aditya Kumar
dd9bfb0a11 NFC: [GVNHoist] Hoist loop invariant code and rename variables for readability
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86031
2020-08-17 09:43:34 -07:00
Matt Arsenault
b3efd01f95 AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset
We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.

Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.
2020-08-17 12:31:38 -04:00
Steven Perron
7816f6c56b Reset PAL metadata when AMDGPU traget stream finishes
If the same stream object is used for multiple compiles, the PAL metadata from eariler compilations will leak into later one.  See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC.

No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side.  Let me know if there is a good way to test this.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D85667
2020-08-17 10:56:11 -04:00
Matt Arsenault
dacb61a2e0 DAG: Add missing comment for transform 2020-08-17 10:01:12 -04:00
Aleksandr Platonov
9f052b00d7 [clangd] Fix Windows build when remote index is enabled.
CMake log:
```
CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:823 (add_executable):
  Target "clangd" links to target "Threads::Threads" but the target was not
  found.  Perhaps a find_package() call is missing for an IMPORTED target, or
  an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:150 (add_llvm_executable)
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:160 (add_clang_executable)
  D:/llvm-project/clang-tools-extra/clangd/tool/CMakeLists.txt:4 (add_clang_tool)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:821 (add_executable):
  Target "ClangdTests" links to target "Threads::Threads" but the target was
  not found.  Perhaps a find_package() call is missing for an IMPORTED
  target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:1417 (add_llvm_executable)
  D:/llvm-project/clang-tools-extra/clangd/unittests/CMakeLists.txt:32 (add_unittest)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:527 (add_library):
  Target "RemoteIndexProtos" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:103 (llvm_add_library)
  D:/llvm-project/llvm/cmake/modules/FindGRPC.cmake:105 (add_clang_library)
  D:/llvm-project/clang-tools-extra/clangd/index/remote/CMakeLists.txt:2 (generate_grpc_protos)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:527 (add_library):
  Target "clangdRemoteIndex" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:103 (llvm_add_library)
  D:/llvm-project/clang-tools-extra/clangd/index/remote/CMakeLists.txt:11 (add_clang_library)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:527 (add_library):
  Target "clangdRemoteMarshalling" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  D:/llvm-project/clang/cmake/modules/AddClang.cmake:103 (llvm_add_library)
  D:/llvm-project/clang-tools-extra/clangd/index/remote/marshalling/CMakeLists.txt:1 (add_clang_library)

CMake Error at D:/llvm-project/llvm/cmake/modules/AddLLVM.cmake:823 (add_executable):
  Target "clangd-index-server" links to target "Threads::Threads" but the
  target was not found.  Perhaps a find_package() call is missing for an
  IMPORTED target, or an ALIAS target is missing?
```

Reviewed By: kbobyrev

Differential Revision: https://reviews.llvm.org/D86052
2020-08-17 16:55:01 +03:00
Matt Arsenault
630fcdd4be AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping 2020-08-17 09:53:26 -04:00
Matt Arsenault
ccf0f19849 AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics
This should use the default mapping and insert a copy to the vcc bank,
and not try to insert a readfirstlane.
2020-08-17 09:53:25 -04:00
Matt Arsenault
97b374cbad AMDGPU: Don't look at dbg users for foldable operands
These would have always failed to fold, so checking them or adding
them to the fold candidates is useless.
2020-08-17 09:53:25 -04:00
Matt Arsenault
5bcccd9fb4 GlobalISel: Remove unnecessary check for copy type
COPY isn't allowed to change the type, but can mix no type with type.
2020-08-17 09:19:25 -04:00
Matt Arsenault
45fb989d05 AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo 2020-08-17 09:19:22 -04:00
Matt Arsenault
aa9d3db2ef AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics
Global instructions have the signed offsets.
2020-08-17 09:19:15 -04:00
Alex Zinenko
544267f834 [llvm] support graceful failure of DataLayout parsing
Existing implementation always aborts on syntax errors in a DataLayout
description. While this is meaningful for consuming textual IR modules, it is
inconvenient for users that may need fine-grained control over the layout from,
e.g., command-line options. Propagate errors through the parsing functions and
only abort in the top-level parsing function instead.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D85650
2020-08-17 15:10:37 +02:00
Kai Nacke
79d0f5426e [SystemZ/ZOS]__(de)register_frame are not available on z/OS.
The functions `__register_frame`/`__deregister_frame` are not
available on z/OS, so add a guard to not use them.

Reviewed By: lhames, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D84787
2020-08-17 09:00:09 -04:00
Sam Parker
fd5dfeb886 [NFC] run update test script
On Transforms/LoopUnroll/runtime-small-upperbound.ll
2020-08-17 13:54:28 +01:00
Georgii Rymar
fe476e8dfc [llvm-readobj] - Remove unwrapOrError calls from GNUStyle<ELFT>::printRelocations.
This fixes existent FIXMEs: we should not error out when unable to
find the number of relocations.

Differential revision: https://reviews.llvm.org/D85891
2020-08-17 15:16:36 +03:00
Sam Elliott
510c5643ce [RISCV] Enable the use of the old mucounteren name
The RISC-V Privileged Specification 1.11 defines `mcountinhibit`, which
has the same numeric CSR value as `mucounteren` from 1.09.1. This patch
enables the use of the old `mucounteren` name.

Patch by Yuichi Sugiyama.

Reviewed By: lenary, jrtc27, pzheng

Differential Revision: https://reviews.llvm.org/D85067
2020-08-17 13:11:49 +01:00
Sam Elliott
74d7198e6b [RISCV] Indirect branch generation in position independent code
This fixes the "Unable to insert indirect branch" fatal error sometimes
seen when generating position-independent code.

Patch by msizanoen1

Reviewed By: jrtc27

Differential Revision: https://reviews.llvm.org/D84833
2020-08-17 13:09:26 +01:00
LLVM GN Syncbot
089fe26d78 [gn build] Port c1f6ce0c732 2020-08-17 12:02:24 +00:00
Sanjay Patel
7b09ee3ac8 [InstCombine] fold abs(X)/X to cmp+select
The backend can convert the select-of-constants to
bit-hack shift+logic if desirable.

https://alive2.llvm.org/ce/z/pgJT6E

  define i8 @src(i8 %x) {
  %0:
    %a = abs i8 %x, 1
    %d = sdiv i8 %x, %a
    ret i8 %d
  }
  =>
  define i8 @tgt(i8 %x) {
  %0:
    %cond = icmp sgt i8 %x, 255
    %r = select i1 %cond, i8 1, i8 255
    ret i8 %r
  }
  Transformation seems to be correct!
2020-08-17 08:01:28 -04:00
Sanjay Patel
450571ecd1 [InstCombine] add tests for sdiv-of-abs; NFC 2020-08-17 08:01:27 -04:00
Sanjay Patel
0eb5677907 [InstCombine] reduce code duplication; NFC 2020-08-17 08:01:27 -04:00
Georgii Rymar
cd6681160b [llvm-readobj/elf] - Refine the warning about the broken PT_DYNAMIC segment.
Splitted out from D85519.

Currently we report "PT_DYNAMIC segment offset + size exceeds the size of the file",
this changes it to
"PT_DYNAMIC segment offset (0x1234) + file size (0x5678) exceeds the size of the file (0x68ab)"

Differential revision: https://reviews.llvm.org/D85654
2020-08-17 14:57:19 +03:00
Simon Pilgrim
8366289c89 [DemandedBits] Improve accuracy of Add propagator
The current demand propagator for addition will mark all input bits at and right of the alive output bit as alive. But carry won't propagate beyond a bit for which both operands are zero (or one/zero in the case of subtraction) so a more accurate answer is possible given known bits.

I derived a propagator by working through truth tables and using a bit-reversed addition to make demand ripple to the right, but I'm not sure how to make a convincing argument for its correctness in the comments yet. Nevertheless, here's a minimal implementation and test to get feedback.

This would help in a situation where, for example, four bytes (<128) packed into an int are added with four others SIMD-style but only one of the four results is actually read.

Known A:     0_______0_______0_______0_______
Known B:     0_______0_______0_______0_______
AOut:        00000000001000000000000000000000
AB, current: 00000000001111111111111111111111
AB, patch:   00000000001111111000000000000000

Committed on behalf of: @rrika (Erika)

Differential Revision: https://reviews.llvm.org/D72423
2020-08-17 12:54:09 +01:00
Simon Pilgrim
d2501e9a79 [DemandedBits] Reorder addition test checks. NFC.
As suggested on D72423 we should try to keep the same order as the original IR
2020-08-17 12:54:09 +01:00
Sam Parker
113b79c6d1 [NFC] Run update script on test
Update IndVarSimplify/no-iv-rewrite.ll
2020-08-17 12:53:14 +01:00
Simon Pilgrim
8df5ab21af [X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases
Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.

We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004
2020-08-17 11:58:51 +01:00
Cullen Rhodes
ed95f77522 [InlineCost] Fix scalable vectors in visitAlloca
Discovered as part of the VLS type work (see D85128).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85848
2020-08-17 10:34:27 +00:00
Vitaly Buka
42145d2b15 [NFC][StackSafety] Move out sort from the loop 2020-08-17 03:30:14 -07:00
Kazushi (Jam) Marukawa
efd49fca73 [VE] Support f128
Support f128 using VE instructions.  Update regression tests.
I've noticed there is no load or store i128 test, so I add them too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D86035
2020-08-17 17:26:52 +09:00
Craig Topper
1eb4fcbf00 [X86] Reject dirflag in inline asm constraints other than clobber.
Fixes the crash from PR47195.
2020-08-16 23:33:45 -07:00
Chen Zheng
91c84c2226 [PowerPC] Make StartMI ignore COPY like instructions.
Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D85659
2020-08-17 02:12:30 -04:00
Yonghong Song
d99b7697f1 [InstCombine] Fix a compilation bug
With gcc 6.3.0, I hit the following compilation bug.
  ../lib/Transforms/InstCombine/InstCombineVectorOps.cpp:937:2: error: extra ‘;’ [-Werror=pedantic]
   };
    ^
  cc1plus: all warnings being treated as errors

The error is introduced by Commit ae7f08812e09 ("[InstCombine]
Aggregate reconstruction simplification (PR47060)")
2020-08-16 21:56:42 -07:00
Vitaly Buka
6f71d99b21 [StackSafety] Skip ambiguous lifetime analysis
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D84630
2020-08-16 18:05:52 -07:00
Roman Lebedev
f5e7438a4e [NFCI][InstCombine] Pacify GCC builds - don't name variable and enum class identically 2020-08-16 23:37:36 +03:00
Roman Lebedev
e44f2bc0d5 [InstCombine] Aggregate reconstruction simplification (PR47060)
This pattern happens in clang C++ exception lowering code, on unwind branch.
We end up having a `landingpad` block after each `invoke`, where RAII
cleanup is performed, and the elements of an aggregate `{i8*, i32}`
holding exception info are `extractvalue`'d, and we then branch to common block
that takes extracted `i8*` and `i32` elements (via `phi` nodes),
form a new aggregate, and finally `resume`'s the exception.

The problem is that, if the cleanup block is effectively empty,
it shouldn't be there, there shouldn't be that `landingpad` and `resume`,
said `invoke` should be a  `call`.

Indeed, we do that simplification in e.g. SimplifyCFG `SimplifyCFGOpt::simplifyResume()`.
But the thing is, all this extra `extractvalue` + `phi` + `insertvalue` cruft,
while it is pointless, does not look like "empty cleanup block".
So the `SimplifyCFGOpt::simplifyResume()` fails, and the exception is has
higher cost than it could have on unwind branch :S

This doesn't happen *that* often, but it will basically happen once per C++
function with complex CFG that called more than one other function
that isn't known to be `nounwind`.

I think, this is a missing fold in InstCombine, so i've implemented it.

I think, the algorithm/implementation is rather self-explanatory:
1. Find a chain of `insertvalue`'s that fully tell us the initializer of the aggregate.
2. For each element, try to find from which aggregate it was extracted.
   If it was extracted from the aggregate with identical type,
   from identical element index, great.
3. If all elements were found to have been extracted from the same aggregate,
   then we can just use said original source aggregate directly,
   instead of re-creating it.
4. If we fail to find said aggregate when looking only in the current block,
   we need be PHI-aware - we might have different source aggregate when coming
   from each predecessor.

I'm not sure if this already handles everything, and there are some FIXME's,
i'll deal with all that later in followups.

I'd be fine with going with post-commit review here code-wise,
but just in case there are thoughts, i'm posting this.

On RawSpeed, for example, this has the following effect:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |     1253 |  1253 |   0.00% |  0.00% |
| simplifycfg.NumInvokes                            |      948 |     1355 |   407 |  42.93% | 42.93% |
| instcount.NumInsertValueInst                      |     4382 |     3210 | -1172 | -26.75% | 26.75% |
| simplifycfg.NumSinkCommonCode                     |      574 |      458 |  -116 | -20.21% | 20.21% |
| simplifycfg.NumSinkCommonInstrs                   |     1154 |      921 |  -233 | -20.19% | 20.19% |
| instcount.NumExtractValueInst                     |    29017 |    26397 | -2620 |  -9.03% |  9.03% |
| instcombine.NumDeadInst                           |   166618 |   174705 |  8087 |   4.85% |  4.85% |
| instcount.NumPHIInst                              |    51526 |    50678 |  -848 |  -1.65% |  1.65% |
| instcount.NumLandingPadInst                       |    20865 |    20609 |  -256 |  -1.23% |  1.23% |
| instcount.NumInvokeInst                           |    34023 |    33675 |  -348 |  -1.02% |  1.02% |
| simplifycfg.NumSimpl                              |   113634 |   114708 |  1074 |   0.95% |  0.95% |
| instcombine.NumSunkInst                           |    15030 |    14930 |  -100 |  -0.67% |  0.67% |
| instcount.TotalBlocks                             |   219544 |   219024 |  -520 |  -0.24% |  0.24% |
| instcombine.NumCombined                           |   644562 |   645805 |  1243 |   0.19% |  0.19% |
| instcount.TotalInsts                              |  2139506 |  2135377 | -4129 |  -0.19% |  0.19% |
| instcount.NumBrInst                               |   156988 |   156821 |  -167 |  -0.11% |  0.11% |
| instcount.NumCallInst                             |  1206144 |  1207076 |   932 |   0.08% |  0.08% |
| instcount.NumResumeInst                           |     5193 |     5190 |    -3 |  -0.06% |  0.06% |
| asm-printer.EmittedInsts                          |   948580 |   948299 |  -281 |  -0.03% |  0.03% |
| instcount.TotalFuncs                              |    11509 |    11507 |    -2 |  -0.02% |  0.02% |
| inline.NumDeleted                                 |    97595 |    97597 |     2 |   0.00% |  0.00% |
| inline.NumInlined                                 |   210514 |   210522 |     8 |   0.00% |  0.00% |
```
So we manage to increase the amount of `invoke` -> `call` conversions in SimplifyCFG by almost a half,
and there is a very apparent decrease in instruction and basic block count.

On vanilla llvm-test-suite:
```
| statistic name                                    | baseline | proposed |     Δ |       % | abs(%) |
|---------------------------------------------------|---------:|---------:|------:|--------:|-------:|
| instcombine.NumAggregateReconstructionsSimplified |        0 |      744 |   744 |   0.00% |  0.00% |
| instcount.NumInsertValueInst                      |     2705 |     2053 |  -652 | -24.10% | 24.10% |
| simplifycfg.NumInvokes                            |     1212 |     1424 |   212 |  17.49% | 17.49% |
| instcount.NumExtractValueInst                     |    21681 |    20139 | -1542 |  -7.11% |  7.11% |
| simplifycfg.NumSinkCommonInstrs                   |    14575 |    14361 |  -214 |  -1.47% |  1.47% |
| simplifycfg.NumSinkCommonCode                     |     6815 |     6743 |   -72 |  -1.06% |  1.06% |
| instcount.NumLandingPadInst                       |    14851 |    14712 |  -139 |  -0.94% |  0.94% |
| instcount.NumInvokeInst                           |    27510 |    27332 |  -178 |  -0.65% |  0.65% |
| instcombine.NumDeadInst                           |  1438173 |  1443371 |  5198 |   0.36% |  0.36% |
| instcount.NumResumeInst                           |     2880 |     2872 |    -8 |  -0.28% |  0.28% |
| instcombine.NumSunkInst                           |    55187 |    55076 |  -111 |  -0.20% |  0.20% |
| instcount.NumPHIInst                              |   321366 |   320916 |  -450 |  -0.14% |  0.14% |
| instcount.TotalBlocks                             |   886816 |   886493 |  -323 |  -0.04% |  0.04% |
| instcount.TotalInsts                              |  7663845 |  7661108 | -2737 |  -0.04% |  0.04% |
| simplifycfg.NumSimpl                              |   886791 |   887171 |   380 |   0.04% |  0.04% |
| instcount.NumCallInst                             |   553552 |   553733 |   181 |   0.03% |  0.03% |
| instcombine.NumCombined                           |  3200512 |  3201202 |   690 |   0.02% |  0.02% |
| instcount.NumBrInst                               |   741794 |   741656 |  -138 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                  |    14443 |    14445 |     2 |   0.01% |  0.01% |
| asm-printer.EmittedInsts                          |  7978085 |  7977916 |  -169 |   0.00% |  0.00% |
| inline.NumDeleted                                 |    73188 |    73189 |     1 |   0.00% |  0.00% |
| inline.NumInlined                                 |   291959 |   291968 |     9 |   0.00% |  0.00% |
```
Roughly similar effect, less instructions and blocks total.

See also: rGe492f0e03b01a5e4ec4b6333abb02d303c3e479e.

Compile-time wise, this appears to be roughly geomean-neutral:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=instructions

And this is a win size-wize in general:
http://llvm-compile-time-tracker.com/compare.php?from=39617aaed95ac00957979bc1525598c1be80e85e&to=b59866cf30420da8f8e3ca239ed3bec577b23387&stat=size-text

See https://bugs.llvm.org/show_bug.cgi?id=47060

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85787
2020-08-16 23:27:56 +03:00
David Green
078943ca4c [ARM] Tests for tail predicated loads. NFC 2020-08-16 19:46:37 +01:00
Simon Pilgrim
4f05bce9b3 [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
We can now enable this for AVX1 targets can now assist with canonicalizeShuffleMaskWithHorizOp cleanup.

There's still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
2020-08-16 15:00:41 +01:00
Sanjay Patel
7fea2bded5 Revert "[PhaseOrdering] add test for memcpy removal (PR47114); NFC"
This reverts commit babb59496b540583c6951813d1e0b3abdea97e7d.

This test addition was queued up with some unrelated changes,
but it seems more likely that we need to fix something internal
to -memcpyopt. Also, I'm not sure if including target-specifc
attributes in a generic regression test dir will cause bot
problems.
2020-08-16 09:52:33 -04:00
Sanjay Patel
cb1baf71d0 [InstCombine] fold copysign with fabs/fneg operand
We already get this in the backend, but we need to do
it in IR too to consistently get yet more copysign
transforms.
2020-08-16 08:53:47 -04:00
Sanjay Patel
10a491101b [InstCombine] reduce code duplication; NFC 2020-08-16 08:53:47 -04:00
Sanjay Patel
4d018b637f [InstCombine] add tests for copysign; NFC 2020-08-16 08:53:47 -04:00
Sanjay Patel
1d9373b196 [PhaseOrdering] add test for memcpy removal (PR47114); NFC 2020-08-16 08:53:47 -04:00
Vitaly Buka
7e7fd55416 [StackSafety] Change how callee searched in index
Handle other than local linkage types.
2020-08-16 04:37:19 -07:00
Simon Pilgrim
85ebf04519 [X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp
Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.
2020-08-16 12:26:27 +01:00
Simon Pilgrim
94c10564f6 [X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.
Split the isRepeatedTargetShuffleMask into a wrapper variant that takes a MVT describing the mask width, and an internal version that just needs the raw mask element bit size.

This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.
2020-08-16 11:51:44 +01:00
Shoaib Meenai
06643b5db7 [llvm-libtool-darwin] Fix test on all host architectures
By default, if a universal binary has a slice matching the host
architecture, llvm-objdump will only print that slice, otherwise it'll
print all architectures. Explicitly pass `--arch all` to force it to
always print all architectures, as we want for this test.
2020-08-16 00:18:03 -07:00
Fady Ghanim
a11b92e493 [OpenMP][OMPBuilder] Adding support for omp single
This adds support for generating `omp single`, and necessary calls for
`copyprivate` clause.

Differential Revision: https://reviews.llvm.org/D85617
2020-08-16 01:15:16 -04:00
Shoaib Meenai
01b4f59362 [llvm-libtool-darwin] Speculative buildbot fix
http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l is failing
this test. Attempt to explicitly use the Mach-O dump format as a
speculative fix.
2020-08-15 21:32:09 -07:00