1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

184764 Commits

Author SHA1 Message Date
Craig Topper
8ad4b280ee [X86] Add broadcast load unfolding support for VMAXPS/PD and VMINPS/PD.
llvm-svn: 371363
2019-09-09 04:25:01 +00:00
Craig Topper
9348c8589f [X86] Add broadcast load unfolding tests for vmaxps/pd and vminps/pd
llvm-svn: 371362
2019-09-09 04:24:57 +00:00
Craig Topper
7902bb2dbb [X86] Add fp128 test cases for ceil/floor/trunc/nearbyint/rint/round libcalls.
llvm-svn: 371360
2019-09-09 02:44:46 +00:00
Kai Luo
2c450092fa [MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp
Summary:
After tailduplication, we have redundant copies. We can remove these
copies in machine-cp if it's safe to, i.e.
```
$reg0 = OP ...
... <<< No read or clobber of $reg0 and $reg1
$reg1 = COPY $reg0 <<< $reg0 is killed
...
<RET>
```
will be transformed to
```
$reg1 = OP ...
...
<RET>
```

Differential Revision: https://reviews.llvm.org/D65267

llvm-svn: 371359
2019-09-09 02:32:42 +00:00
Craig Topper
ffe15d8f5c [X86] Add test cases for fptoui/fptosi/sitofp/uitofp between fp128 and i128.
llvm-svn: 371358
2019-09-09 01:35:04 +00:00
Craig Topper
93a155d417 [X86] Use xorps to create fp128 +0.0 constants.
This matches what we do for f32/f64. gcc also does this for fp128.

llvm-svn: 371357
2019-09-09 01:35:00 +00:00
Craig Topper
1a27134bac [X86] Add avx and avx512f RUN lines to fp128-cast.ll
llvm-svn: 371356
2019-09-09 01:34:55 +00:00
Douglas Yung
a4ee085bc7 Relax opcode checks in test to check for only a number instead of a specific number.
llvm-svn: 371355
2019-09-09 01:21:33 +00:00
Simon Pilgrim
90410bca18 [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add faux shuffle support.
This patch decodes target and faux shuffles with getTargetShuffleInputs - a reduced version of resolveTargetShuffleInputs that doesn't resolve SM_SentinelZero cases, so we can correctly remove zero vectors if they aren't demanded.

llvm-svn: 371353
2019-09-08 21:38:33 +00:00
Roman Lebedev
876cd9fd22 [InstCombine][NFC] Some tests for usub overflow+nonzero check improvement (PR43251)
https://rise4fun.com/Alive/kHq

https://bugs.llvm.org/show_bug.cgi?id=43251

llvm-svn: 371352
2019-09-08 21:30:34 +00:00
Craig Topper
e77d4d1d59 [X86] Add a hack to combineVSelectWithAllOnesOrZeros to turn selects with two zero/undef vector inputs into an all zeroes vector.
If the two zero vectors have undefs in different places they
won't get combined by simplifySelect.

This fixes a regression from an earlier commit.

llvm-svn: 371351
2019-09-08 20:56:09 +00:00
Craig Topper
112f1fdd6d [X86] Remove call to getZeroVector from materializeVectorConstant. Add isel patterns for zero vectors with all types.
The change to avx512-vec-cmp.ll is a regression, but should be
easy to fix. It occurs because the getZeroVector call was
canonicalizing both sides to the same node, then SimplifySelect
was able to simplify it. But since only called getZeroVector
on some VTs this isn't a robust way to combine this.

The change to vector-shuffle-combining-ssse3.ll is more
instructions, but removes a constant pool load so its unclear
if its a regression or not.

llvm-svn: 371350
2019-09-08 20:56:05 +00:00
Roman Lebedev
ca7b7a7578 [InstSimplify] simplifyUnsignedRangeCheck(): if we know that X != 0, handle more cases (PR43246)
Summary:
This is motivated by D67122 sanitizer check enhancement.
That patch seemingly worsens `-fsanitize=pointer-overflow`
overhead from 25% to 50%, which strongly implies missing folds.

In this particular case, given
```
char* test(char& base, unsigned long offset) {
  return &base + offset;
}
```
it will end up producing something like
https://godbolt.org/z/LK5-iH
which after optimizations reduces down to roughly
```
define i1 @t0(i8* nonnull %base, i64 %offset) {
  %base_int = ptrtoint i8* %base to i64
  %adjusted = add i64 %base_int, %offset
  %non_null_after_adjustment = icmp ne i64 %adjusted, 0
  %no_overflow_during_adjustment = icmp uge i64 %adjusted, %base_int
  %res = and i1 %non_null_after_adjustment, %no_overflow_during_adjustment
  ret i1 %res
}
```
Without D67122 there was no `%non_null_after_adjustment`,
and in this particular case we can get rid of the overhead:

Here we add some offset to a non-null pointer,
and check that the result does not overflow and is not a null pointer.
But since the base pointer is already non-null, and we check for overflow,
that overflow check will already catch the null pointer,
so the separate null check is redundant and can be dropped.

Alive proofs:
https://rise4fun.com/Alive/WRzq

There are more patterns of "unsigned-add-with-overflow", they are not handled here,
but this is the main pattern, that we currently consider canonical,
so it makes sense to handle it.

https://bugs.llvm.org/show_bug.cgi?id=43246

Reviewers: spatel, nikic, vsk

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits, reames

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67332

llvm-svn: 371349
2019-09-08 20:14:15 +00:00
Sanjay Patel
8c0a936e23 [InstCombine] add tests for icmp with srem operand; NFC
llvm-svn: 371348
2019-09-08 19:48:47 +00:00
Roman Lebedev
732380abca [X86] X86DAGToDAGISel::combineIncDecVector(): call getSplatBuildVector() manually
As reported in post-commit review of r370327,
there is some case where the code crashes.

As discussed with Craig Topper, the problem is that getConstant()
internally calls getSplatBuildVector(), so we don't insert
the constant itself.

If we do that manually we're good.

llvm-svn: 371346
2019-09-08 19:36:13 +00:00
Craig Topper
309c97a629 [X86] Use DAG.getConstant instead of getZeroVector in combinePMULDQ.
getZeroVector canonicalizes the type to vXi32, but that's a
legalization action. We should use the most correct type if
possible.

llvm-svn: 371345
2019-09-08 19:24:42 +00:00
Craig Topper
9078ffc0a4 [DAGCombiner][X86][ARM] Teach visitMULO to fold multiplies with 0 to 0 and no carry.
I modified the ARM test to use two inputs instead of 0 so the
test hopefully still tests what was intended.

llvm-svn: 371344
2019-09-08 19:24:39 +00:00
Craig Topper
09af94b030 [X86] Teach materializeVectorConstant to not call getZeroVector/getOnesVector on the types we already have isel patterns for.
llvm-svn: 371343
2019-09-08 19:24:29 +00:00
Sanjay Patel
6736ed2107 [InstCombine] fold extract+insert into identity shuffle
This is similar to the existing fold for splats added with:
rL365379

If we can adjust the shuffle mask to include another element
in an identity mask (if it changes vector length, that's an
extract/insert subvector operation in the backend), then that
can eliminate extractelement/insertelement pairs in IR.

All targets are expected to lower shuffles with identity masks
efficiently.

llvm-svn: 371340
2019-09-08 19:03:01 +00:00
Roman Lebedev
cb7a418ec1 [NFC][InstSimplify] Some tests for dropping null check after uadd.with.overflow of non-null (PR43246)
https://rise4fun.com/Alive/WRzq

Name: C <= Y && Y != 0  -->  C <= Y  iff C != 0
Pre: C != 0
  %y_is_nonnull = icmp ne i64 %y, 0
  %no_overflow = icmp ule i64 C, %y
  %r = and i1 %y_is_nonnull, %no_overflow
=>
  %r = %no_overflow

Name: C <= Y || Y != 0  -->  Y != 0  iff C != 0
Pre: C != 0
  %y_is_nonnull = icmp ne i64 %y, 0
  %no_overflow = icmp ule i64 C, %y
  %r = or i1 %y_is_nonnull, %no_overflow
=>
  %r = %y_is_nonnull

Name: C > Y || Y == 0  -->  C > Y  iff C != 0
Pre: C != 0
  %y_is_null = icmp eq i64 %y, 0
  %overflow = icmp ugt i64 C, %y
  %r = or i1 %y_is_null, %overflow
=>
  %r = %overflow

Name: C > Y && Y == 0  -->  Y == 0  iff C != 0
Pre: C != 0
  %y_is_null = icmp eq i64 %y, 0
  %overflow = icmp ugt i64 C, %y
  %r = and i1 %y_is_null, %overflow
=>
  %r = %y_is_null

https://bugs.llvm.org/show_bug.cgi?id=43246

llvm-svn: 371339
2019-09-08 17:50:40 +00:00
David Stenberg
5557ae7418 [DebugInfo][X86] Describe call site values for zero-valued imms
Summary:
Add zero-materializing XORs to X86's describeLoadedValue() hook in order
to produce call site values.

I have had to change the defs logic in collectCallSiteParameters() a bit
to be able to describe the XORs. The XORs implicitly define $eflags,
which would cause them to never be considered, due to a guard condition
that I->getNumDefs() is one. I have changed that condition so that we
now only consider instructions where a forwarded register overlaps with
the instruction's single explicit define. We still need to collect the implicit
defines of other forwarded registers to remove them from the work list.
I'm not sure how to move towards supporting instructions with multiple
explicit defines, cases where forwarded register are implicitly defined,
and/or cases where an instruction produces values for multiple forwarded
registers. Perhaps the describeLoadedValue() hook should take a register
argument, and we then leave it up to the hook to describe the loaded
value in that register? I have not yet encountered a situation where
that would be necessary though.

Reviewers: aprantl, vsk, djtodoro, NikolaPrica

Reviewed By: vsk

Subscribers: ychen, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D67225

llvm-svn: 371333
2019-09-08 14:22:06 +00:00
David Stenberg
d13a1adf7f [NFC] Make the describeLoadedValue() hook return machine operand objects
Summary:
This changes the ParamLoadedValue pair which the describeLoadedValue()
hook returns so that MachineOperand objects are returned instead of
pointers.

When describing call site values we may need to describe operands which
are not part of the instruction. One such example is zero-materializing
XORs on x86, which I have implemented support for in a child revision.
Instead of having to return a pointer to an operand stored somewhere
outside the instruction, start returning objects directly instead, as
that simplifies the code.

The MachineOperand class only holds POD members, and on x86-64 it is 32
bytes large. That combined with copy elision means that the overhead of
returning a machine operand object from the hook does not become very
large.

I benchmarked this on a 8-thread i7-8650U machine with 32 GB RAM. The
benchmark consisted of building a clang 8.0 binary configured with:

  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_TARGETS_TO_BUILD=X86 \
  -DLLVM_USE_SANITIZER=Address \
  -DCMAKE_CXX_FLAGS="-Xclang -femit-debug-entry-values -stdlib=libc++"

The average wall clock time increased by 4 seconds, from 62:05 to
62:09, which is an 0.1% increase.

Reviewers: aprantl, vsk, djtodoro, NikolaPrica

Reviewed By: vsk

Subscribers: hiraditya, ychen, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D67261

llvm-svn: 371332
2019-09-08 14:05:10 +00:00
David Green
80d9e4304e [ARM] Remove declaration of unimplemented function. NFC.
llvm-svn: 371331
2019-09-08 13:13:15 +00:00
Simon Pilgrim
d79e9b3ca4 [X86][SSE] Fix out of range shift introduced in D67070/rL371328
Use APInt to create the comparison mask instead.

llvm-svn: 371330
2019-09-08 12:44:22 +00:00
Simon Pilgrim
8c4b7a35da [X86] Add test case for PR32546
llvm-svn: 371329
2019-09-08 11:56:07 +00:00
Simon Pilgrim
4e64a95fff [X86][SSE] Add support for <64 x i1> bool reduction
This generalizes the existing <32 x i1> pre-AVX2 split code to support reductions from <64 x i1> as well, we can probably generalize to any larger pow2 case in the future if the (unlikely) need ever arises.

We still need to tweak combineBitcastvxi1 to improve AVX512F codegen as its assumes vXi1 types should be handled on the mask registers even when they aren't legal.

Differential Revision: https://reviews.llvm.org/D67070

llvm-svn: 371328
2019-09-08 11:46:21 +00:00
Xing GUO
061f8c3578 [StackMap] Current stackmap version should be 3. NFC.
llvm-svn: 371327
2019-09-08 11:42:51 +00:00
Craig Topper
49fb7bc9bf [X86] Make getZeroVector return floating point vectors in their native type on SSE2 and later.
isel used to require zero vectors to be canonicalized to a single
type to minimize the number of patterns needed to match. This is
 no longer required.

I plan to do this to integers too, but floating point was simpler
to start with. Integer has a complication where v32i16/v64i8 aren't
legal when the other 512-bit integer types are.

llvm-svn: 371325
2019-09-08 00:43:52 +00:00
Craig Topper
4bf0ebb232 [X86] Add support for unfold broadcast loads from FMA instructions.
llvm-svn: 371323
2019-09-07 21:54:40 +00:00
Craig Topper
50b3ce323c [X86] Add broadcast load unfolding tests for FMA instructions.
llvm-svn: 371322
2019-09-07 21:54:36 +00:00
Sebastian Pop
828d9d0610 [aarch64] Add combine patterns for fp16 fmla
This patch enables generation of fused multiply add/sub for instructions operating on fp16.
Tested on aarch64-linux.

Differential Revision: https://reviews.llvm.org/D67297

llvm-svn: 371321
2019-09-07 20:24:51 +00:00
Craig Topper
322f74c2df [X86] Add prefer-128-bit subtarget feature.
Summary:
Similar to the previous prefer-256-bit flag. We might want to
enable this by default some CPUs. This just starts the initial
work to implement and prove that it effects TTI's vector width.

Reviewers: RKSimon, echristo, spatel, atdt

Reviewed By: RKSimon

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67311

llvm-svn: 371319
2019-09-07 19:54:22 +00:00
George Rimar
1aecd28b57 [llvm-nm] - Fix a bug and unbreak ASan BB.
BB: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/13820/steps/check-llvm%20asan/logs/stdio

rL371074 revealed a bug in llvm-nm.

This patch fixes it.

llvm-svn: 371318
2019-09-07 19:45:27 +00:00
Simon Pilgrim
7b8ba30bb1 Fix typo. NFCI
llvm-svn: 371317
2019-09-07 18:09:09 +00:00
Simon Pilgrim
ee3c42e028 [X86] Avoid uses of getZextValue(). NFCI.
Use getAPIntValue() directly - this is mainly a best practice style issue to help prevent fuzz tests blowing up when a i12345 (or whatever) is generated.

Use getConstantOperandVal/getConstantOperandAPInt wrappers where possible.

llvm-svn: 371315
2019-09-07 16:13:57 +00:00
Simon Pilgrim
85b4d1cf79 [X86][AVX] Add 'f5' v4f64 shuffle test mentioned in D66004
llvm-svn: 371314
2019-09-07 16:13:48 +00:00
Fangrui Song
a1f00c090b [ELF][MC] Set types of aliases of IFunc to STT_GNU_IFUNC
```
.type  foo,@gnu_indirect_function
.set   foo,foo_resolver

.set foo2,foo
.set foo3,foo2
```

The types of foo2 and foo3 should be STT_GNU_IFUNC, but we currently
resolve them to the type of foo_resolver. This patch fixes it.

Differential Revision: https://reviews.llvm.org/D67206
Patch by Senran Zhang

llvm-svn: 371312
2019-09-07 14:58:47 +00:00
Roman Lebedev
287f598299 [SimplifyCFG][NFC] Autogenerate PhiEliminate3.ll
llvm-svn: 371311
2019-09-07 13:53:14 +00:00
Roman Lebedev
e33e2427b7 [SimplifyCFG][NFC] Autogenerate two tests
llvm-svn: 371310
2019-09-07 13:35:54 +00:00
Bjorn Pettersson
556c69a3de [CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul
Summary:
Normally TargetLowering::expandFixedPointMul would handle
SMULFIXSAT with scale zero by using an SMULO to compute the
product and determine if saturation is needed (if overflow
happened). But if SMULO isn't custom/legal it falls through
and uses the same technique, using MULHS/SMUL_LOHI, as used
for non-zero scales.

Problem was that when checking for overflow (handling saturation)
when not using MULO we did not expect to find a zero scale. So
we ended up in an assertion when doing
  APInt::getLowBitsSet(VTSize, Scale - 1)

This patch fixes the problem by adding a new special case for
how saturation is computed when scale is zero.

Reviewers: RKSimon, bevinh, leonardchan, spatel

Reviewed By: RKSimon

Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67071

llvm-svn: 371309
2019-09-07 12:16:23 +00:00
Bjorn Pettersson
f8a47e6d1e [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

llvm-svn: 371308
2019-09-07 12:16:14 +00:00
Nikita Popov
4b082fe0ef [X86] Fix pshuflw formation from repeated shuffle mask (PR43230)
Fix for https://bugs.llvm.org/show_bug.cgi?id=43230.

When creating PSHUFLW from a repeated shuffle mask, we have to apply
the checks to the repeated mask, not the original one. For the test
case from PR43230 the inspected part of the original mask is all undef.

Differential Revision: https://reviews.llvm.org/D67314

llvm-svn: 371307
2019-09-07 12:13:44 +00:00
Nikita Popov
4c7abf2638 [LVI] Look through extractvalue of insertvalue
This addresses the issue mentioned on D19867. When we simplify
with.overflow instructions in CVP, we leave behind extractvalue
of insertvalue sequences that LVI no longer understands. This
means that we can not simplify any instructions based on the
with.overflow anymore (until some over pass like InstCombine
cleans them up).

This patch extends LVI extractvalue handling by calling
SimplifyExtractValueInst (which doesn't do anything more than
constant folding + looking through insertvalue) and using the block
value of the simplification.

A possible alternative would be to do something similar to
SimplifyIndVars, where we instead directly try to replace
extractvalue users of the with.overflow. This would need some
additional structural changes to CVP, as it's currently not legal
to remove anything but the current instruction -- we'd have to
introduce a worklist with instructions scheduled for deletion or similar.

Differential Revision: https://reviews.llvm.org/D67035

llvm-svn: 371306
2019-09-07 12:03:59 +00:00
Nikita Popov
e84eda7a46 [X86] Add test for PR43230; NFC
llvm-svn: 371305
2019-09-07 12:03:48 +00:00
Bjorn Pettersson
7157a84082 [DwarfExpression] Disallow some rewrites to avoid undefined behavior
Summary:
The value operand in DW_OP_plus_uconst/DW_OP_constu value can be
large (it uses uint64_t as representation internally in LLVM).
This means that in the uint64_t to int conversions, previously done
by DwarfExpression::addMachineRegExpression, could lose information.
Also, the negation done in "-Offset" was undefined behavior in case
Offset was exactly INT_MIN.

To avoid the above problems, we now avoid transformation like
 [Reg, DW_OP_plus_uconst, Offset] --> [DW_OP_breg, Offset]
and
 [Reg, DW_OP_constu, Offset, DW_OP_plus]  --> [DW_OP_breg, Offset]
when Offset > INT_MAX.

And we avoid to transform
 [Reg, DW_OP_constu, Offset, DW_OP_minus] --> [DW_OP_breg,-Offset]
when Offset > INT_MAX+1.

The patch also adjusts DwarfCompileUnit::constructVariableDIEImpl
to make sure that "DW_OP_constu, Offset, DW_OP_minus" is used
instead of "DW_OP_plus_uconst, Offset" when creating DIExpressions
with negative frame index offsets.

Notice that this might just be the tip of the iceberg. There
are lots of fishy handling related to these constants. I think both
DIExpression::appendOffset and DIExpression::extractIfOffset may
trigger undefined behavior for certain values.

Reviewers: sdesmalen, rnk, JDevlieghere

Reviewed By: JDevlieghere

Subscribers: jholewinski, aprantl, hiraditya, ychen, uabelho, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D67263

llvm-svn: 371304
2019-09-07 11:40:10 +00:00
Bjorn Pettersson
d5b32b1f1e [DebugInfo] Pre-commit of test case for DW_OP_breg/DW_OP_fbreg folds
This currently triggers undefined behavior if executed with an
ubsan build. It is just a precommit of the test case to show that
we got a problem.

Fix is proposed in https://reviews.llvm.org/D67263 and plan is to
commit the fix directly after this patch.

llvm-svn: 371303
2019-09-07 11:39:57 +00:00
Simon Pilgrim
8ec3d7c47b Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI.
llvm-svn: 371302
2019-09-07 11:04:04 +00:00
Roman Lebedev
96b149816b [SimplifyCFG][NFC] Make merge-cond-stores-cost.ll X86-specific, and rewrite it
We clearly perform store-merging, even though div is really costly.

llvm-svn: 371300
2019-09-07 10:55:04 +00:00
Benjamin Kramer
f037180b93 [Attributor] Make unimplemented method pure virtual.
Otherwise the compiler mistakes it for a vtable anchor.

llvm-svn: 371298
2019-09-07 10:27:13 +00:00
Roman Lebedev
4a89981931 [SimplifyCFG][NFC] Show that we don't consider the cost when merging cond stores
We count instruction count in each BB's separately, not their cost.

llvm-svn: 371297
2019-09-07 09:25:26 +00:00