1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00
Commit Graph

194615 Commits

Author SHA1 Message Date
Alexey Lapshin
1ce931d8ef [DWARFLinker][dsymutil] followup for 88c2137b6d49f88186d0957a4e2d8030a3967334
That patch is a followup for "Move DwarfStreamer into DWARFLinker".
It fixes build with LLVM_LINK_LLVM_DYLIB.
2020-04-08 16:46:52 +03:00
Clement Courbet
1f97534ae0 [llvm-exegesis][NFC] Let the pfm::Counter own the PerfHelper.
A perf helper is always only ever cretaed to be checked for validity
then passed as Counter ctor argument, never to be touched again.
Its lifetime should outlive that of the counter, and there is never any
reason to have two different counters of top of the perf helper.
Make sure these assumptions always hold by making the Counter consume the
PerfHelper.
2020-04-08 15:37:30 +02:00
Stefan Pintilie
bd356ab8f2 [PowerPC][Future] Add Support For Functions That Do Not Use A TOC.
On PowerPC most functions require a valid TOC pointer.

This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.

This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.

Differential Revision: https://reviews.llvm.org/D73664
2020-04-08 08:07:35 -05:00
Sanjay Patel
28cf40e4e3 [LangRef] update text for shufflevector
D72467 updated the shufflevector instruction to include a constant mask
rather than a mask operand. The LangRef text was vague enough to still
make sense, but it is better to update here too, so there's no confusion
about valid mask values. The text here is adapted from the documentation
code comments for "class ShuffleVectorInst".

Differential Revision: https://reviews.llvm.org/D77396
2020-04-08 09:01:01 -04:00
Sanjay Patel
288b81996c [InstCombine] exclude bitcast of ppc_fp128 in icmp signbit fold
Based on the post-commit comments for rG0f56bbc, there might
be a problem with this transform:

(bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0

...and the ppc_fp128 data type, so conservatively bypass if we
are bitcasting a ppc_fp128.

We might be able to account for endian or other differences to
enable this for PowerPC again if that is useful.

Differential Revision: https://reviews.llvm.org/D77642
2020-04-08 08:56:19 -04:00
Clement Courbet
1cb40bb793 [llvm-exegesis][NFC] Remove dead code. 2020-04-08 14:29:26 +02:00
Simon Pilgrim
c945e58715 [AMDGPU] Regenerate vector-extract-insert test checks to fix issue reported on D77354 2020-04-08 13:18:32 +01:00
Simon Pilgrim
4b5bdb8892 [AMDGPU] Regenerate si-annotate-cfg-loop-assert test checks to fix issue reported on D77354 2020-04-08 13:09:08 +01:00
Simon Pilgrim
efa5f69c31 [X86][SSE] Combine PTEST(AND(X,Y),AND(X,Y)) -> PTEST(X,Y) and ANDN equivalents
Tests derived from PR42035 examples
2020-04-08 12:42:22 +01:00
Jeremy Morse
c4f32ed24d [DebugInfo][NFC] Early-exit when analyzing for single-location variables
This is a performance patch that hoists two conditions in DwarfDebug's
validThroughout to avoid a linear-scan of all instructions in a block. We
now exit early if validThrougout will never return true for the variable
location.

The first added clause filters for the two circumstances where
validThroughout will return true. The second added clause should be
identical to the one that's deleted from after the linear-scan.

Differential Revision: https://reviews.llvm.org/D77639
2020-04-08 12:27:11 +01:00
Peter Smith
9411f6a4ad [ELF][AArch64] Add R_AARCH64_PLT32 relocation type.
The R_AARCH64_PLT32 relocation type will be documented in the next release
of ELF for the 64-bit Arm Architecture. It is being added in draft state
for the benefit of the position independent vtable feature.

R_AARCH64_PLT32 is very similar to R_AARCH64_PREL32. The intention is to
provide a signed 32-bit integer representing an offset from the place
to a function.
- It relocates 32-bit data
- The expression is S + A - P
- The overflow check for the expression is -2^31 <= X < 2^31
- The relocation generates Thunks/Veneers/Stubs and PLT entries as per
  R_AArch64_CALL26
- If the symbol S is an undefined weak the ABI does not define its value.

The ABI defines a code for ilp32 for completeness, I have added the code
but have only added to the existing reloc-types-elf-aarch64.text as there
is no ilp32 equivalent.

Differential Revision: https://reviews.llvm.org/D77647
2020-04-08 12:19:35 +01:00
Shengchen Kan
63b7da67c3 [X86][MC] Support enhanced relaxation for branch align
Summary:
Since D75300 has been landed, I want to support enhanced relaxation when we need to align branches and allow prefix padding. "Enhanced Relaxtion" means we allow an instruction that could not be traditionally relaxed to be emitted into RelaxableFragment so that we increase its length by adding prefixes for optimization.

The motivation is straightforward, RelaxFragment is mostly for relative jumps and we can not increase the length of jumps when we need to align them, so if we need to achieve D75300's purpose (reducing the bytes of nops) when need to align jumps, we have to make more instructions "relaxable".

Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76286
2020-04-08 19:08:19 +08:00
Mikael Holmen
d075e066f3 [IfConversion] Disallow TrueBB == FalseBB for valid diamonds
Summary:
This fixes PR45302.

Previously the case

     BB1
     / \
    |   |
   TBB FBB
    |   |
     \ /
     BB2

was treated as a valid diamond also when TBB and FBB was the same basic
block. This then lead to a failed assertion in IfConvertDiamond.

Since TBB == FBB is quite a degenerated case of a diamond, we now
don't treat it as a valid diamond anymore, and thus we will avoid the
trouble of making IfConvertDiamond handle it correctly.

Reviewers: efriedma, kparzysz

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D77651
2020-04-08 12:50:36 +02:00
Anna Welker
5cfdb4f2d2 [ARM][MVE] Optimise offset addresses of gathers/scatters
This patch adds an analysis of the offset addresses used by gathers
and scatters to the MVEGatherScatterLowering pass to find
multiplications and additions that are loop invariant and thus can
be moved into the loop preheader, avoiding to execute them each time.

Differential Revision: https://reviews.llvm.org/D76681
2020-04-08 11:46:57 +01:00
Max Kazantsev
986d90607e [LoopLoadElim] Add test showing that LoopLoadElim doesn't work correctly with new PM 2020-04-08 17:32:03 +07:00
Dominik Montada
f88aa39e68 [GlobalISel] combine trunc(trunc) pattern
Summary:
Legalization can introduce the trunc(trunc) pattern. This can cause
problems if one of these intermediate truncs is not legal.
Combine truncs of this pattern, if the resulting trunc is legal.

Reviewers: arsenm, aemerson, dsanders

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76601
2020-04-08 11:58:28 +02:00
James Henderson
ec45a0d60a [llvm-objdump] Fix unstable disassembly output for sections with same address
When two sections shared the same address, the disassembly code was
using pointer values when sorting (see the SectionRef less than
operator). Since those values aren't guaranteed to have a specific
order, this meant the disassembly code would sometimes change which
section to pick when finding symbols targeted by calls in fully linked
objects.

This change fixes the non-determinism, so that the same section is
always picked. This might have a negative impact in that now a section
without any symbol might be picked over a section with symbols, but this
will be addressed in a later commit.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45411.

Reviewed by: grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D77640
2020-04-08 10:57:12 +01:00
Dominik Montada
8eb4f5335a [GlobalISel] Combine sext([sz]ext) -> [sz]ext, zext(zext) -> zext
Summary:
Combine sext(zext x) to (zext x) since the sign-bit is 0
after the zero-extension.

Combine sext(sext x) to (sext x) and ext(zext x) to (zext x)
since the intermediate step is not needed.

Reviewers: arsenm, volkan, aemerson, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77210
2020-04-08 11:24:29 +02:00
Dominik Montada
c300669e28 [GlobalISel] support narrow G_IMPLICIT_DEF for DstSize % NarrowSize != 0
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.

To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.

Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.

Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76598
2020-04-08 11:00:07 +02:00
Kazushi (Jam) Marukawa
95dabe4c03 [VE] Simplify definitions of uimm6 and simm7
Summary: To prepare continuous changes, simplify uimm6 and simm7 operands.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D77700
2020-04-08 09:53:42 +02:00
Igor Kudrin
afc5a59902 [DebugInfo] Fix reading DWARFv5 type units in DWP.
In DWARFv5, type units are stored in .debug_info sections, along with
compilation units, and they are distinguished by the unit_type field
in the header, not by the name of the section. It is impossible to
associate the correct index section of a DWP file with the unit before
the unit's header is read. This patch fixes reading DWARFv5 type units
by parsing the header first and then applying the index entry according
to the actual unit type.

Differential Revision: https://reviews.llvm.org/D77552
2020-04-08 12:50:58 +07:00
Julian Lettner
037e80c6d7 [lit] Print slowest tests and time histogram before result groups 2020-04-07 22:19:50 -07:00
Julian Lettner
4fcbf321ca [lit] Improve test summary output
This change aligns the test summary output along the longest
category label.  We also properly align test counts.

Before:
```
Testing Time: 10.30s
  Unsupported Tests  : 1
  Expected Passes    : 30
```

After:
```
Testing Time: 10.29s
  Unsupported Tests:  1
  Expected Passes  : 30
```
2020-04-07 22:19:50 -07:00
LLVM GN Syncbot
a02a5ccf8a [gn build] Port f85ae058f58 2020-04-08 04:48:03 +00:00
Stanislav Mekhanoshin
52b04148e1 [AMDGPU] Expand vector trunc stores from i16 to i8
Differential Revision: https://reviews.llvm.org/D77693
2020-04-07 21:47:45 -07:00
Johannes Doerfert
51bfbdce19 [OpenMP] Add match_{all,any,none} declare variant selector extensions.
By default, all traits in the OpenMP context selector have to match for
it to be acceptable. Though, we sometimes want a single property out of
multiple to match (=any) or no match at all (=none). We offer these
choices as extensions via
  `implementation={extension(match_{all,any,none})}`
to the user. The choice will affect the entire context selector not only
the traits following the match property.

The first user will be D75788. There we can replace
```
  #pragma omp begin declare variant match(device={arch(nvptx64)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  // TODO: Hack until we support an extension to the match clause that allows "or".
  #undef __CLANG_CUDA_CMATH_H__

  #undef __CUDA__
  #pragma omp end declare variant

  #pragma omp begin declare variant match(device={arch(nvptx)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  #undef __CUDA__
  #pragma omp end declare variant
```
with the much simpler
```
  #pragma omp begin declare variant match(device={arch(nvptx, nvptx64)}, implementation={extension(match_any)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  #undef __CUDA__
  #pragma omp end declare variant
```

Reviewed By: mikerice

Differential Revision: https://reviews.llvm.org/D77414
2020-04-07 23:33:24 -05:00
Kazu Hirata
2d1b543d4d [JumpThreading] NFC: Simplify ComputeValueKnownInPredecessorsImpl
Summary:
ComputeValueKnownInPredecessorsImpl is the main folding mechanism in
JumpThreading.cpp.  To avoid potential infinite recursion while
chasing use-def chains, it uses:

  DenseSet<std::pair<Value *, BasicBlock *>> &RecursionSet

to keep track of Value-BB pairs that we've processed.

Now, when ComputeValueKnownInPredecessorsImpl recursively calls
itself, it always passes BB as is, so the second element is always BB.

This patch simplifes the function by dropping "BasicBlock *" from
RecursionSet.

Reviewers: wmi, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77699
2020-04-07 18:37:36 -07:00
Julian Lettner
c6a5ea6b85 [lit] Print slowest test first when timing tests
lit supports `--time-tests` which will report the 20 slowest tests and
print a nice histogram for test times.  This change prints this list and
the histogram rows by decreasing test times.  After all, we are most
interested in the slowest tests.
2020-04-07 18:18:33 -07:00
Julian Lettner
da1f80b302 [lit] Improve consistency when printing test results 2020-04-07 18:12:18 -07:00
LLVM GN Syncbot
3fce01ebae [gn build] Port 1adeeabb79a 2020-04-07 23:30:51 +00:00
Eli Friedman
ba79eba868 [NFC] Clean up uses of LoadInst constructor. 2020-04-07 16:28:53 -07:00
Daniel Sanders
418ea606d3 Add MIR-level debugify with only locations support for now
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.

It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivelent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.

There's a few high level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
  changing the checks and still pass them. I.e. Debug info should not change
  codegen. Combining this with a strip-debug pass should enable this. The
  main issue I ran into without the strip-debug pass was instructions with MMO's and
  checks on both the instruction and the MMO as the debug-location is
  between them. I currently have a simple hack in the MIRPrinter to
  resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
  recently found that the legalizer can be unexpectedly lossy in seemingly
  simple cases (e.g. expanding one instr into many). I have a verifier
  (will be posted separately) that can be integrated with passes that use
  the observer interface and will catch location loss (it does not verify
  correctness, just that there's zero lossage). It is a little conservative
  as the line-0 locations that arise from conflicts do not track the
  conflicting locations but it can still catch a fair bit.

Depends on D77439, D77438

Reviewers: aprantl, bogner, vsk

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77446
2020-04-07 16:25:13 -07:00
Fangrui Song
fb5ed21881 [VE] Migrate to the getMachineMemOperand overload using llvm::Align
Just delete the deprecated overload because nothing uses it.
2020-04-07 16:04:54 -07:00
Matt Arsenault
a7ed58a2e2 CodeGen: More conversions to use Register 2020-04-07 18:54:36 -04:00
Fangrui Song
d41ca54775 [ThinLTO] Drop dso_local if a GlobalVariable satisfies isDeclarationForLinker()
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.

If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.

The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D74751
2020-04-07 15:46:01 -07:00
Fangrui Song
c3d83ed19e [VE] Adapt aa26dd985848364df01d3f8f0f3eaccfd5ee80dc and 2481f26ac3f228cc085d4d68ee72dadc07afa48f 2020-04-07 15:45:19 -07:00
Wei Mi
0ca1ed2836 Recommit [SampleFDO] Add flag for partial profile.
Fix the error of show-prof-info.test on some platforms without zlib.

The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 14:28:25 -07:00
Stanislav Mekhanoshin
c46ea22b5c [AMDGPU] Implement copyPhysReg for 16 bit subregs
Differential Revision: https://reviews.llvm.org/D74937
2020-04-07 14:22:46 -07:00
Matt Arsenault
1588abc3c6 CodeGen: Use Register in TargetFrameLowering 2020-04-07 17:07:44 -04:00
Nikita Popov
c3bd7e86bc [BPI] Clear handles when releasing memory (NFC)
This reduces max-rss of sqlite compilation by 2.5%.
2020-04-07 22:51:01 +02:00
David Blaikie
61aac3db1e Remove some top-level const from return values seen in review 2020-04-07 13:22:22 -07:00
George Burgess IV
c5ae33de4f [TLI] fix a function's (commented) signature; NFC
__strlen_chk returns a `size_t`, not a `char *`.
2020-04-07 13:04:54 -07:00
Matt Arsenault
1280465de4 CodeGen: Use Register in more places 2020-04-07 15:59:40 -04:00
Wei Mi
11ae325c37 Revert "[SampleFDO] Add flag for partial profile." show-prof-info.test breaks on some platforms.
This reverts commit e3ba652a1440794eff0b43ce747f1b0488585d22.
2020-04-07 12:54:51 -07:00
Wei Mi
eb03930b3a [SampleFDO] Add flag for partial profile.
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 12:17:56 -07:00
Nemanja Ivanovic
9461d82e65 [NFC][PowerPC] Fix register class for patterns using XXPERMDIs
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
2020-04-07 14:06:08 -05:00
Graham Sellers
080e1d3570 [AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
2020-04-07 14:37:16 -04:00
Craig Topper
bd5b15fc7a [SelectionDAG] Make getZeroExtendInReg take a vector VT if the operand VT is a vector.
This removes a call to getScalarType from a bunch of call sites.
It also makes the behavior consistent with SIGN_EXTEND_INREG.

Differential Revision: https://reviews.llvm.org/D77631
2020-04-07 11:34:08 -07:00
LLVM GN Syncbot
ff12f8060c [gn build] Port 88c2137b6d4 2020-04-07 18:26:53 +00:00
Alexey Lapshin
6e415ad8d8 [DWARFLinker][dsymutil][NFC] Move DwarfStreamer into DWARFLinker.
For implementing "remove obsolete debug info in lld", it is neccesary
to have DWARF generation code implementation. dsymutil uses DwarfStreamer
for that purpose. DwarfStreamer uses AsmPrinter. It is considered OK
to use AsmPrinter based code in lld(D74169). This patch moves
DwarfStreamer implementation into DWARFLinker, so that it could be reused
from lld.

Generally, a better place for such a common DWARF generation code would be
not DWARFLinker but an additional separate library. Such a library could
contain a single version of DWARF generation routines and could also
be independent of AsmPrinter. At the current moment, DwarfStreamer
does not pretend to be such a general implementation of DWARF generation.
So I decided to put it into DWARFLinker since it is the only user
of DwarfStreamer.

Testing: it passes "check-all" lit testing. MD5 checksum for clang .dSYM
bundle matches for the dsymutil with/without that patch.

Reviewed By: JDevlieghere

Differential revision: https://reviews.llvm.org/D77169
2020-04-07 21:21:54 +03:00