1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
Commit Graph

186117 Commits

Author SHA1 Message Date
Thomas Lively
24f1028639 [WebAssembly] v8x16.swizzle and rewrite BUILD_VECTOR lowering
Summary:
Adds the new v8x16.swizzle SIMD instruction as specified at
https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices.
In addition to adding swizzles as a candidate lowering in
LowerBUILD_VECTOR, also rewrites and simplifies the lowering to
minimize the number of replace_lanes necessary rather than trying to
minimize code size. This leads to more uses of v128.const instead of
splats, which is expected to increase performance.

The new code will be easier to tune once V8 implements all the vector
construction operations, and it will also be easier to add new
candidate instructions in the future if necessary.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68527

llvm-svn: 374188
2019-10-09 17:39:19 +00:00
Kevin P. Neal
7b5700c1e4 [FPEnv][NFC] Change test to conform to strictfp attribute rules.
In particular, the function definition is not marked strictfp despite
containing a function marked strictfp. Also, if any function call is marked
strictfp then all function calls in that function must be marked.

This change to move the one strictfp call to a new properly marked function
meets all the new rules.

Tested with a stricter version of D68233.

Reviewed by:	spatel
Approved by:	spatel
Differential Revision:	https://reviews.llvm.org/D68713

llvm-svn: 374186
2019-10-09 17:24:56 +00:00
Sanjay Patel
63fe889fb1 [SLP] respect target register width for GEP vectorization (PR43578)
We failed to account for the target register width (max vector factor)
when vectorizing starting from GEPs. This causes vectorization to
proceed to obviously illegal widths as in:
https://bugs.llvm.org/show_bug.cgi?id=43578

For x86, this also means that SLP can produce rogue AVX or AVX512
code even when the user specifies a narrower vector width.

The AArch64 test in ext-trunc.ll appears to be better using the
narrower width. I'm not exactly sure what getelementptr.ll is trying
to do, but it's testing with "-slp-threshold=-18", so I'm not worried
about those diffs. The x86 test is an over-reduction from SPEC h264;
this patch appears to restore the perf loss caused by SLP when using
-march=haswell.

Differential Revision: https://reviews.llvm.org/D68667

llvm-svn: 374183
2019-10-09 16:32:49 +00:00
Momchil Velikov
19650c96c6 [AArch64] Ensure no tagged memory is left in the unallocated portion of the
stack

This patch makes sure that if we tag some memory, we untag that memory before
the function returns/throws via any exit, reachable from the tag operation. For
that we place the untag operation either at:

  a) the lifetime end call for the alloca, if that call post-dominates the
     lifetime start call (where the tag operation is placed), or it (the
     lifetime end call) dominates all reachable exits, otherwise
  b) at the reachable exits

Differential Revision: https://reviews.llvm.org/D68469

llvm-svn: 374182
2019-10-09 16:31:50 +00:00
Jason Liu
80a5184d6b [NFC] Remove files got accidentally upload in llvm-svn 374179
llvm-svn: 374181
2019-10-09 16:24:25 +00:00
Jason Liu
1b68493fed [AIX][XCOFF][NFC] Change the SectionLen field name of CSect Auxiliary entry to SectionOrLength.
Summary:
According the the XCOFF document,
If
Then
XTY_SD
x_scnlen contains the csect length.
XTY_LD
x_scnlen contains the symbol table index of the containing csect.
XTY_CM
x_scnlen contains the csect length.
XTY_ER
x_scnlen contains 0.

Change the SectionLen member name to SectionOrLength is more reasonable.

Authored By: DiggerLin

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D68650

llvm-svn: 374179
2019-10-09 16:19:39 +00:00
Jonas Devlieghere
8a6c6bbbd4 Re-land "[dsymutil] Fix handling of common symbols in multiple object files."
The original patch got reverted because it hit a long-standing legacy
issue on Windows that prevents files from being named `com`. Thanks
Kristina & Jeremy for pointing this out.

llvm-svn: 374178
2019-10-09 16:19:13 +00:00
Alina Sbirlea
5f91d66ebc [MemorySSA] Make the use of moveAllAfterMergeBlocks consistent.
Summary:
The rule for the moveAllAfterMergeBlocks API si for all instructions
from `From` to have been moved to `To`, while keeping the CFG edges (and
block terminators) unchanged.
Update all the callsites for moveAllAfterMergeBlocks to follow this.

Pending follow-up: since the same behavior is needed everytime, merge
all callsites into one. The common denominator may be the call to
`MergeBlockIntoPredecessor`.

Resolves PR43569.

Reviewers: george.burgess.iv

Subscribers: Prazek, sanjoy.google, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68659

llvm-svn: 374177
2019-10-09 15:54:24 +00:00
Simon Pilgrim
b3d2b15894 Fix Wdocumentation unknown parameter warning. NFCI.
llvm-svn: 374171
2019-10-09 14:26:09 +00:00
Clement Courbet
59fcff6cfd [llvm-exegesis] Ensure that ExecutableFunction are aligned.
Summary: Experiments show that this is the alignment we get (for ELF+Linux), but let's ensure that we have it.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68703

llvm-svn: 374170
2019-10-09 14:25:08 +00:00
David Green
ed6415380f Add and adjust saturating tests. NFC
This adds some extra testing to the existing [su][add/sub]_sat X86 and AArch64
tests and adds equivalent tests for ARM.

llvm-svn: 374169
2019-10-09 14:17:38 +00:00
Sjoerd Meijer
7a73112681 [LV] Emitting SCEV checks with OptForSize
When optimising for size and SCEV runtime checks need to be emitted to check
overflow behaviour, the loop vectorizer can run in this assert:

  LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
  llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize()
  && "Cannot SCEV check stride or overflow when opt

We should not generate predicates while optimising for size because
code will be generated for predicates such as these SCEV overflow runtime
checks.

This should fix PR43371.

Differential Revision: https://reviews.llvm.org/D68082

llvm-svn: 374166
2019-10-09 13:19:41 +00:00
Simon Atanasyan
42e14e84bc [mips] Rename local variable. NFC
llvm-svn: 374165
2019-10-09 13:12:27 +00:00
Simon Atanasyan
5708280dc2 [mips] Split expandLoadImmReal into multiple methods. NFC
The `expandLoadImmReal` handles four different and almost non-overlapping
cases: loading a "single" float immediate into a GPR, loading a "single"
float immediate into a FPR, and the same couple for a "double" float
immediate.

It's better to move each `else if` branch into separate methods.

llvm-svn: 374164
2019-10-09 13:12:21 +00:00
Clement Courbet
b508b3b70b [llvm-exegesis] Fix r374158
Some bots complain about missing 'class':

LlvmState.h:70:40: error: declaration of ‘std::unique_ptr<const llvm::TargetMachine> llvm::exegesis::LLVMState::TargetMachine’ [-fpermissive]
   std::unique_ptr<const TargetMachine> TargetMachine;

llvm-svn: 374162
2019-10-09 12:37:56 +00:00
Simon Pilgrim
bad15ead01 [CostModel][X86] Add tests for insertelement to non-immediate vector element indices
llvm-svn: 374161
2019-10-09 12:36:34 +00:00
Simon Pilgrim
862f561ce8 [CostModel][X86] Add tests for extractelement from non-immediate vector element indices
llvm-svn: 374160
2019-10-09 12:36:22 +00:00
David Green
ea189dfb19 [ARM] Add saturating arithmetic tests for MVE. NFC
llvm-svn: 374159
2019-10-09 12:29:51 +00:00
Clement Courbet
de4e3c1cf7 [llvm-exegesis][NFC] Remove extra llvm:: qualifications.
Summary: Second patch: in the lib.

Reviewers: gchatelet

Subscribers: nemanjai, tschuett, MaskRay, mgrang, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68692

llvm-svn: 374158
2019-10-09 11:58:42 +00:00
Clement Courbet
99479f2b1a [llvm-exegesis][NFC] Remove extra llvm:: qualifications.
Summary: First patch: in unit tests.

Subscribers: nemanjai, tschuett, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68687

llvm-svn: 374157
2019-10-09 11:29:21 +00:00
James Molloy
b8dff35c25 [TableGen] Fix crash when using HwModes in CodeEmitterGen
When an instruction has an encoding definition for only a subset of
the available HwModes, ensure we just avoid generating an encoding
rather than crash.

llvm-svn: 374150
2019-10-09 09:15:34 +00:00
Clement Courbet
9a982602b3 [llvm-exegesis] Add missing std::move in rL374146.
This was breaking some bots:

/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/include/llvm/Support/Error.h:483:5:   required from ‘llvm::Expected<T>::Expected(OtherT&&, typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type*) [with OtherT = std::vector<llvm::exegesis::CodeTemplate>&; T = std::vector<llvm::exegesis::CodeTemplate>; typename std::enable_if<std::is_convertible<_Rep2, _Rep>::value>::type = void]’
/home/buildbots/ppc64le-clang-lnt-test/clang-ppc64le-lnt/llvm/tools/llvm-exegesis/lib/X86/Target.cpp:238:20:   required from here
/usr/include/c++/6/bits/stl_construct.h:75:7: error: use of deleted function ‘llvm::exegesis::CodeTemplate::CodeTemplate(const llvm::exegesis::CodeTemplate&)’
     { ::new(static_cast<void*>(__p)) _T1(std::forward<_Args>(__args)...); }
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

llvm-svn: 374149
2019-10-09 09:07:21 +00:00
Hans Wennborg
70269b4845 Unify the two CRC implementations
David added the JamCRC implementation in r246590. More recently, Eugene
added a CRC-32 implementation in r357901, which falls back to zlib's
crc32 function if present.

These checksums are essentially the same, so having multiple
implementations seems unnecessary. This replaces the CRC-32
implementation with the simpler one from JamCRC, and implements the
JamCRC interface in terms of CRC-32 since this means it can use zlib's
implementation when available, saving a few bytes and potentially making
it faster.

JamCRC took an ArrayRef<char> argument, and CRC-32 took a StringRef.
This patch changes it to ArrayRef<uint8_t> which I think is the best
choice, and simplifies a few of the callers nicely.

Differential revision: https://reviews.llvm.org/D68570

llvm-svn: 374148
2019-10-09 09:06:30 +00:00
Clement Courbet
2a037be6e6 [llvm-exegesis][NFC] Fix rL374146.
Remove extra semicolon: Target.cpp:187:2: warning: extra ‘;’ [-Wpedantic]

llvm-svn: 374147
2019-10-09 09:03:42 +00:00
Clement Courbet
b3877eff38 [llvm-exegesis] Explore LEA addressing modes.
Summary:
This will help for PR32326.

This shows the well-known issue with `RBP` and `R13` as base registers.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits, RKSimon, andreadb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68646

llvm-svn: 374146
2019-10-09 08:49:13 +00:00
Jeremy Morse
c79513d483 Revert r374139, "[dsymutil] Fix handling of common symbols in multiple object files."
The added test files ("com", "com1.o", "com2.o") are reserved names on
Windows, and makes 'git checkout' fail with a filesystem error.

llvm-svn: 374144
2019-10-09 08:27:48 +00:00
Clement Courbet
932bbb565b [llvm-exegesis][NFC] Remove unecessary using llvm:: directives.
We've been in namespace llvm for at least a year.

llvm-svn: 374143
2019-10-09 07:52:07 +00:00
Jonas Devlieghere
5ba2afa18c [dsymutil] Fix handling of common symbols in multiple object files.
For common symbols the linker emits only a single symbol entry in the
debug map. This caused dsymutil to not relocate common symbols when
linking DWARF coming form object files that did not have this entry.
This patch fixes that by keeping track of common symbols in the object
files and synthesizing a debug map entry for them using the address from
the main binary.

Differential revision: https://reviews.llvm.org/D68680

llvm-svn: 374139
2019-10-09 04:16:18 +00:00
Kristina Brooks
68776fe8ce [TypeSize] Fix module builds (cassert)
TypeSize.h uses `assert` statements without including
the <cassert> header first which leads to failures
in modular builds.

llvm-svn: 374138
2019-10-09 04:00:03 +00:00
Nico Weber
6aa0bc0789 gn build: unbreak libcxx build after r374116 by restoring gen_link_script.py for gn
llvm-svn: 374129
2019-10-08 23:08:18 +00:00
DeForest Richards
fef32a3f70 [Docs] Fixes broken sphinx build - undefined label
Removes label ref pointing to non-existent subsystem docs page.

llvm-svn: 374128
2019-10-08 22:45:20 +00:00
Bill Wendling
2a1b7733a8 [IA] Add tests for a few other edge cases
Test with the last eight bits within the range [7F, FF] and with
lower-case hex letters.

llvm-svn: 374124
2019-10-08 22:06:09 +00:00
Jonas Devlieghere
4f0a9f4024 [dsymutil] Improve verbose output (NFC)
The verbose output for finding relocations assumed that we'd always dump
the DIE after (which starts with a newline) and therefore didn't include
one itself. However, this isn't always true, leading to garbled output.

This patch adds a newline to the verbose output and adds a line that
says that the DIE is being kept (which isn't obvious otherwise). It also
adds a 0x prefix to the relocations.

llvm-svn: 374123
2019-10-08 22:03:13 +00:00
David Blaikie
952b16d00e DebugInfo: Move LLE enum handling to .def to match RLE handling
llvm-svn: 374122
2019-10-08 21:48:46 +00:00
Roman Lebedev
1b3874e519 [CVP} Replace SExt with ZExt if the input is known-non-negative
Summary:
zero-extension is far more friendly for further analysis.
While this doesn't directly help with the shift-by-signext problem, this is not unrelated.

This has the following effect on test-suite (numbers collected after the finish of middle-end module pass manager):
| Statistic                            |     old |     new | delta | percent change |
| correlated-value-propagation.NumSExt |       0 |    6026 |  6026 |   +100.00%     |
| instcount.NumAddInst                 |  272860 |  271283 | -1577 |     -0.58%     |
| instcount.NumAllocaInst              |   27227 |   27226 | -1    |      0.00%     |
| instcount.NumAndInst                 |   63502 |   63320 | -182  |     -0.29%     |
| instcount.NumAShrInst                |   13498 |   13407 | -91   |     -0.67%     |
| instcount.NumAtomicCmpXchgInst       |    1159 |    1159 |  0    |      0.00%     |
| instcount.NumAtomicRMWInst           |    5036 |    5036 |  0    |      0.00%     |
| instcount.NumBitCastInst             |  672482 |  672353 | -129  |     -0.02%     |
| instcount.NumBrInst                  |  702768 |  702195 | -573  |     -0.08%     |
| instcount.NumCallInst                |  518285 |  518205 | -80   |     -0.02%     |
| instcount.NumExtractElementInst      |   18481 |   18482 |  1    |      0.01%     |
| instcount.NumExtractValueInst        |   18290 |   18288 | -2    |     -0.01%     |
| instcount.NumFAddInst                |  139035 |  138963 | -72   |     -0.05%     |
| instcount.NumFCmpInst                |   10358 |   10348 | -10   |     -0.10%     |
| instcount.NumFDivInst                |   30310 |   30302 | -8    |     -0.03%     |
| instcount.NumFenceInst               |     387 |     387 |  0    |      0.00%     |
| instcount.NumFMulInst                |   93873 |   93806 | -67   |     -0.07%     |
| instcount.NumFPExtInst               |    7148 |    7144 | -4    |     -0.06%     |
| instcount.NumFPToSIInst              |    2823 |    2838 |  15   |      0.53%     |
| instcount.NumFPToUIInst              |    1251 |    1251 |  0    |      0.00%     |
| instcount.NumFPTruncInst             |    2195 |    2191 | -4    |     -0.18%     |
| instcount.NumFSubInst                |   92109 |   92103 | -6    |     -0.01%     |
| instcount.NumGetElementPtrInst       | 1221423 | 1219157 | -2266 |     -0.19%     |
| instcount.NumICmpInst                |  479140 |  478929 | -211  |     -0.04%     |
| instcount.NumIndirectBrInst          |       2 |       2 |  0    |      0.00%     |
| instcount.NumInsertElementInst       |   66089 |   66094 |  5    |      0.01%     |
| instcount.NumInsertValueInst         |    2032 |    2030 | -2    |     -0.10%     |
| instcount.NumIntToPtrInst            |   19641 |   19641 |  0    |      0.00%     |
| instcount.NumInvokeInst              |   21789 |   21788 | -1    |      0.00%     |
| instcount.NumLandingPadInst          |   12051 |   12051 |  0    |      0.00%     |
| instcount.NumLoadInst                |  880079 |  878673 | -1406 |     -0.16%     |
| instcount.NumLShrInst                |   25919 |   25921 |  2    |      0.01%     |
| instcount.NumMulInst                 |   42416 |   42417 |  1    |      0.00%     |
| instcount.NumOrInst                  |  100826 |  100576 | -250  |     -0.25%     |
| instcount.NumPHIInst                 |  315118 |  314092 | -1026 |     -0.33%     |
| instcount.NumPtrToIntInst            |   15933 |   15939 |  6    |      0.04%     |
| instcount.NumResumeInst              |    2156 |    2156 |  0    |      0.00%     |
| instcount.NumRetInst                 |   84485 |   84484 | -1    |      0.00%     |
| instcount.NumSDivInst                |    8599 |    8597 | -2    |     -0.02%     |
| instcount.NumSelectInst              |   45577 |   45913 |  336  |      0.74%     |
| instcount.NumSExtInst                |   84026 |   78278 | -5748 |     -6.84%     |
| instcount.NumShlInst                 |   39796 |   39726 | -70   |     -0.18%     |
| instcount.NumShuffleVectorInst       |  100272 |  100292 |  20   |      0.02%     |
| instcount.NumSIToFPInst              |   29131 |   29113 | -18   |     -0.06%     |
| instcount.NumSRemInst                |    1543 |    1543 |  0    |      0.00%     |
| instcount.NumStoreInst               |  805394 |  804351 | -1043 |     -0.13%     |
| instcount.NumSubInst                 |   61337 |   61414 |  77   |      0.13%     |
| instcount.NumSwitchInst              |    8527 |    8524 | -3    |     -0.04%     |
| instcount.NumTruncInst               |   60523 |   60484 | -39   |     -0.06%     |
| instcount.NumUDivInst                |    2381 |    2381 |  0    |      0.00%     |
| instcount.NumUIToFPInst              |    5549 |    5549 |  0    |      0.00%     |
| instcount.NumUnreachableInst         |    9855 |    9855 |  0    |      0.00%     |
| instcount.NumURemInst                |    1305 |    1305 |  0    |      0.00%     |
| instcount.NumXorInst                 |   10230 |   10081 | -149  |     -1.46%     |
| instcount.NumZExtInst                |   60353 |   66840 |  6487 |     10.75%     |
| instcount.TotalBlocks                |  829582 |  829004 | -578  |     -0.07%     |
| instcount.TotalFuncs                 |   83818 |   83817 | -1    |      0.00%     |
| instcount.TotalInsts                 | 7316574 | 7308483 | -8091 |     -0.11%     |

TLDR: we produce -0.11% less instructions, -6.84% less `sext`, +10.75% more `zext`.
To be noted, clearly, not all new `zext`'s are produced by this fold.

(And now i guess it might have been interesting to measure this for D68103 :S)

Reviewers: nikic, spatel, reames, dberlin

Reviewed By: nikic

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68654

llvm-svn: 374112
2019-10-08 20:29:48 +00:00
Roman Lebedev
b491f89f30 [CVP][NFC] Revisit sext vs. zext test
llvm-svn: 374111
2019-10-08 20:29:36 +00:00
Jordan Rose
6422c4ff57 Mark several PointerIntPair methods as lvalue-only
No point in mutating 'this' if it's just going to be thrown away.

https://reviews.llvm.org/D63945

llvm-svn: 374102
2019-10-08 19:01:48 +00:00
Daniel Sanders
a562501cd0 [tblgen] Add getOperatorAsDef() to Record
Summary:
While working with DagInit's, it's often the case that you expect the
operator to be a reference to a def. This patch adds a wrapper for this
common case to reduce the amount of boilerplate callers need to duplicate
repeatedly.

getOperatorAsDef() returns the record if the DagInit has an operator that is
a DefInit. Otherwise, it prints a fatal error.

There's only a few pre-existing examples in LLVM at the moment and I've
left a few instances of the code this simplifies as they had more specific
error messages than the generic one this produces. I'm going to be using
this a fair bit in my subsequent patches.

Reviewers: bogner, volkan, nhaehnle

Reviewed By: nhaehnle

Subscribers: nhaehnle, hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68424

llvm-svn: 374101
2019-10-08 18:41:32 +00:00
Yonghong Song
7769861822 [BPF] do compile-once run-everywhere relocation for bitfields
A bpf specific clang intrinsic is introduced:
   u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include relocation kind.
Currently supported relocation kinds are
 enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
 };
for __builtin_preserve_field_info. The old
access offset relocation is covered by
    FIELD_BYTE_OFFSET = 0.

An example:
struct s {
    int a;
    int b1:9;
    int b2:4;
};
enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
 #ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
 #endif
 #else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocation generated for field_read where bpf_probe_read() is
used to access argument data on little endian mode:
        r3 = r1
        r1 = 0
        r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
        r3 += r1
        r1 = r10
        r1 += -8
        r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
        call bpf_probe_read
        r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r2
        r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
        r0 = r1
        r0 >>= r2
        r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
        if r3 == 0 goto LBB0_2
        r1 s>>= r2
        r0 = r1
LBB0_2:
        exit

Compare to the above code between relocations FIELD_LSHIFT_U64 and
FIELD_LSHIFT_U64, the code with big endian mode has four more
instructions.
        r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
        r6 += r1
        r6 += -64
        r6 <<= 32
        r6 >>= 32
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r6
        r2 = 60   <=== relocation (FIELD_RSHIFT_U64)

The code and relocation generated when using direct load.
        r2 = 0
        r3 = 4
        r4 = 4
        if r4 s> 3 goto LBB0_3
        if r4 == 1 goto LBB0_5
        if r4 == 2 goto LBB0_6
        goto LBB0_9
LBB0_6:                                 # %sw.bb1
        r1 += r3
        r2 = *(u16 *)(r1 + 0)
        goto LBB0_9
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
        if r4 == 8 goto LBB0_8
        goto LBB0_9
LBB0_8:                                 # %sw.bb9
        r1 += r3
        r2 = *(u64 *)(r1 + 0)
        goto LBB0_9
LBB0_5:                                 # %sw.bb
        r1 += r3
        r2 = *(u8 *)(r1 + 0)
        goto LBB0_9
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51
        r2 <<= r1
        r1 = 60
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

Considering verifier is able to do limited constant
propogation following branches. The following is the
code actually traversed.
        r2 = 0
        r3 = 4   <=== relocation
        r4 = 4   <=== relocation
        if r4 s> 3 goto LBB0_3
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51   <=== relocation
        r2 <<= r1
        r1 = 60   <=== relocation
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

For native load case, the load size is calculated to be the
same as the size of load width LLVM otherwise used to load
the value which is then used to extract the bitfield value.

Differential Revision: https://reviews.llvm.org/D67980

llvm-svn: 374099
2019-10-08 18:23:17 +00:00
Matt Arsenault
fc005b5d71 AMDGPU: Fix i16 arithmetic pattern redundancy
There were 2 problems here. First, these patterns were duplicated to
handle the inverted shift operands instead of using the commuted
PatFrags.

Second, the point of the zext folding patterns don't apply to the
non-0ing high subtargets. They should be skipped instead of inserting
the extension. The zeroing high code would be emitted when necessary
anyway. This was also emitting unnecessary zexts in cases where the
high bits were undefined.

llvm-svn: 374092
2019-10-08 17:36:38 +00:00
Jinsong Ji
6616c06663 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0e648a006c9f38e11919f181b6c7e0a.
This reverts commit 18b6fe07bcf44294f200bd2b526cb737ed275c04.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Sanjay Patel
edf9c4bf21 [SLP] add test with prefer-vector-width function attribute; NFC (PR43578)
llvm-svn: 374090
2019-10-08 17:18:32 +00:00
Vedant Kumar
6436fd4f1a [CodeExtractor] Factor out and reuse shrinkwrap analysis
Factor out CodeExtractor's analysis of allocas (for shrinkwrapping
purposes), and allow the analysis to be reused.

This resolves a quadratic compile-time bug observed when compiling
AMDGPUDisassembler.cpp.o.

Pre-patch (Release + LTO clang):

```
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  176.5278 ( 57.8%)   0.4915 ( 18.5%)  177.0192 ( 57.4%)  177.4112 ( 57.3%)  Hot Cold Splitting
```

Post-patch (ReleaseAsserts clang):

```
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  1.4051 (  3.3%)   0.0079 (  0.3%)   1.4129 (  3.2%)   1.4129 (  3.2%)  Hot Cold Splitting
```

Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary
pre- vs. post-patch.

An alternate approach is to hide CodeExtractorAnalysisCache from clients
of CodeExtractor, and to recompute the analysis from scratch inside of
CodeExtractor::extractCodeRegion(). This eliminates some redundant work
in the shrinkwrapping legality check. However, some clients continue to
exhibit O(n^2) compile time behavior as computing the analysis is O(n).

rdar://55912966

Differential Revision: https://reviews.llvm.org/D68616

llvm-svn: 374089
2019-10-08 17:17:51 +00:00
Tom Stellard
99acd7b50d AMDGPU: Add offsets to MMO when lowering buffer intrinsics
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor.  This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores.  It also limits
the SILoadStoreOptimizer from merging more instructions.

This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65097

llvm-svn: 374087
2019-10-08 17:04:51 +00:00
Hideto Ueno
3073348b63 [Attributor][Fix] Temporary fix for windows build bot failure
D65402 causes test failure related to attributor-max-iterations.
This commit removes attributor-max-iterations-verify for now.
I'll examine the factor and the flag should be reverted.

llvm-svn: 374086
2019-10-08 17:01:56 +00:00
Simon Pilgrim
a23fa47859 CodeGenPrepare - silence static analyzer dyn_cast<> null dereference warnings. NFCI.
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<> directly and if not assert will fire for us.

llvm-svn: 374085
2019-10-08 17:00:01 +00:00
Stanislav Mekhanoshin
fb4b305275 [AMDGPU] Disable unused gfx10 dpp instructions
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.

Differential Revision: https://reviews.llvm.org/D68607

llvm-svn: 374083
2019-10-08 16:56:01 +00:00
David Greene
bea2f8fa2d [UpdateCCTestChecks] Detect function mangled name on separate line
Sometimes functions with large comment blocks in front of them have their
declarations output on several lines by c-index-test.  Hence the one-line
function name/line/mangled pattern will not work to detect them.  Break the
pattern up into two patterns and keep state after seeing the name/line
information until we finally see the mangled name.

Differential Revision: https://reviews.llvm.org/D68272

llvm-svn: 374078
2019-10-08 16:25:42 +00:00
Roman Lebedev
9a1dbacc31 [NFC][CVP] Add tests where we can replace sext with zext
If the sign bit of the value that is being sign-extended is not set,
i.e. the value is non-negative (s>= 0), then zero-extension will suffice,
and is better for analysis: https://rise4fun.com/Alive/a8PD

llvm-svn: 374075
2019-10-08 16:21:13 +00:00
Amaury Sechet
c46babc9a8 (Re)generate various tests. NFC
llvm-svn: 374074
2019-10-08 16:16:26 +00:00