mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

57 Commits

Author SHA1 Message Date
Wang, Pengfei
72838e8fb2 [X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics.
When we use mask compare intrinsics under the strict FP option, the
masked elements shouldn't raise any exception, so we can't replace the
intrinsic with a full compare + "and" operation.
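
As an illustration, a minimal C sketch of the constraint (function name
is mine; the intrinsic is the standard AVX512 one from immintrin.h):

  #include <immintrin.h>

  /* Under strict FP semantics, lanes masked off by 'k' must not raise
     FP exceptions, so this cannot be lowered as a full-width compare
     followed by an 'and' with the mask: a NaN in a masked-off lane
     would spuriously set an exception flag. */
  __mmask16 cmp_masked(__m512 a, __m512 b, __mmask16 k) {
      return _mm512_mask_cmp_ps_mask(k, a, b, _CMP_LT_OS);
  }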

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85385
2020-08-11 10:28:41 +08:00
Craig Topper
0986b8f2e8 [X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead.
This moves v32i16/v64i8 to a model consistent with how we
treat integer types with avx1.

This does change the ABI for vXi16/vXi8 vector types larger than
512 bits, which now pass in multiple zmms instead of multiple ymms.
We'd already hacked some code to make v64i8/v32i16 pass in zmm.

The cost model is still a bit of a mess. In some places I tried to
match existing behavior, but really we need to account for splitting
and concatenating costs. The cost model for shuffles is especially
pessimistic.
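
A small example of the affected types, as a sketch using Clang vector
extensions (the typedef and function name are mine):

  /* With -mavx512f but not -mavx512bw, v64i8 is now treated as a legal
     type whose operations are custom-split; per the note above, vXi8
     types wider than this would now be passed in multiple zmms. */
  typedef char v64i8 __attribute__((vector_size(64)));

  v64i8 add_bytes(v64i8 a, v64i8 b) {
      return a + b;
  }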

Differential Revision: https://reviews.llvm.org/D76212
2020-04-15 12:17:18 -07:00
Craig Topper
b86ac63633 [X86] Add broadcast load folding patterns to the NoVLX compare patterns.
These patterns use zmm registers for 128/256-bit compares when
the VLX instructions aren't available. Previously we only
supported register operands; as PR36191 notes, we can fold
broadcast loads, but not regular loads.

llvm-svn: 373423
2019-10-02 04:45:02 +00:00
Craig Topper
4c6063ee26 [X86] Simplify b2b KSHIFTL+KSHIFTR using demanded elts.
llvm-svn: 372155
2019-09-17 18:02:56 +00:00
Craig Topper
5973da6f16 [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

This helps our vXi1 code see the full concat operation and allows it to optimize an undef to a zero if there is already a zero in the concat. This let us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts, which uses an additional concat. These changes weren't my original motivation, though.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Differential Revision: https://reviews.llvm.org/D66456

llvm-svn: 369459
2019-08-20 22:12:50 +00:00
Craig Topper
a93395c550 [X86] Add isel patterns for (i64 (zext (i8 (bitcast (v16i1 X))))) to use a KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend.
We already had patterns for extending to i32 to take advantage of
the implicit zeroing of the upper bits of a 32-bit GPR that is
done by KMOVW/KMOVB. But the extend might be all the way to i64,
in which case the existing patterns would fail and we'd get a
KMOVW/B followed by a MOVZX. By adding patterns for i64 we can
use the fact that KMOVW/B zero the upper bits of the 32-bit GPR
and the normal property that 32-bit GPR writes implicitly zero the
upper 32-bits of the full 64-bit GPR.

The anyextend patterns are slightly different since we don't care
about the upper zeros. For the i8->i64 I think this avoids selecting
the anyextend as a MOVZX to prevent a partial register issue that
doesn't exist. For i16->i64 I think we would have just emitted an
insert_subreg on top of the extract_subreg that the vXi16->i16
bitcast pattern emits. The register coalescer or peephole pass
should combine those, but this saves that work and makes i8/16
consistent.
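
A C sketch of the i64 case (function name is mine; the intrinsic is
standard AVX512F):

  #include <immintrin.h>
  #include <stdint.h>

  /* bitcast v16i1 -> i16, then zext to i64: with the new patterns this
     should select a single KMOVW, relying on the implicit zeroing of
     the upper bits when writing a 32-bit GPR. */
  uint64_t mask_to_u64(__m512i a, __m512i b) {
      __mmask16 k = _mm512_cmpeq_epi32_mask(a, b);
      return (uint64_t)k;
  }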

llvm-svn: 369431
2019-08-20 19:43:48 +00:00
Craig Topper
185e5e3022 [X86] Move promotion of vector and/or/xor from legalization to DAG combine
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom-up strategy of LegalizeVectorOps means operands are legalized before their uses, so we promote and/or/xor before we legalize the operands that use them, making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues, one around matching AND, ANDN, and OR into VSELECT: I had to create the ANDN as vXi64, but the other nodes hadn't been legalized yet. I didn't look too hard at fixing that.

This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there; I think we got bad ordering as a result. Now that the promotion is out of the legalizer, we handle this better.

In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53107

llvm-svn: 344487
2018-10-15 01:51:58 +00:00
Craig Topper
4dd98aa412 [X86] Rename the autoupgraded packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name.
I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking built in. When we reach that goal, we should have no intrinsics named "avx512.mask".

llvm-svn: 335744
2018-06-27 15:57:53 +00:00
Craig Topper
92f12ba673 [DAGCombiner] When combining zero_extend of a truncate, only mask before extending for vectors.
Masking first prevents the extend from being combined with loads. It's also interfering with some vXi1 extraction code.

Differential Revision: https://reviews.llvm.org/D42679

llvm-svn: 326500
2018-03-01 22:32:25 +00:00
Craig Topper
38e20ced44 [X86] Custom split v32i16/v64i8 bitcasts when AVX512F is available, but BWI is not.
The test changes you can see are related to the changes in ReplaceNodeResults, though shuffle-vs-trunc-512.ll does have a test that exercises the code in LowerBITCAST. It looks like that test's output didn't change because DAG combining is able to clean up the resulting type legalization; adding the custom hook just makes type legalization work less hard.

Differential Revision: https://reviews.llvm.org/D43447

llvm-svn: 325933
2018-02-23 18:43:36 +00:00
Craig Topper
deb8d1bdf2 [X86] Stop swapping the operands of AVX512 setge.
We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE, where we swap and then invert to use pcmpgt. But with AVX512 we don't want an invert, so we won't use pcmpgt, and there's no need to swap.

llvm-svn: 325527
2018-02-19 19:23:35 +00:00
Craig Topper
598a86fcc7 [X86] Use min/max for vector ult/ugt compares if avoids a sign flip.
Summary:
Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it prevents the sign-bit flip needed to use pcmpgt, at the cost of requiring an invert after the compare.

I also refactored the code so that the max/min handling is self-contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior.

Most of the test cases look OK with this. I did notice that we added instructions when one of the operands being sign-flipped is a constant vector into which we had previously been able to constant-fold the flip.

I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare.
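
A C sketch of the min-based trick for an unsigned less-than (function
name is mine; the intrinsics are the standard SSE2/SSE4.1 ones):

  #include <immintrin.h>

  /* a <u b  <=>  !(min(a,b) == b), so one pminud + pcmpeqd + invert
     replaces the two sign-bit flips otherwise needed to use pcmpgtd. */
  __m128i ult_v4i32(__m128i a, __m128i b) {
      __m128i m  = _mm_min_epu32(a, b);
      __m128i eq = _mm_cmpeq_epi32(m, b);           /* b <=u a */
      return _mm_xor_si128(eq, _mm_set1_epi32(-1)); /* invert -> a <u b */
  }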

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42935

llvm-svn: 324842
2018-02-11 17:11:40 +00:00
Craig Topper
362df7b20a [X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp.
Summary:
This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit 'and' in the IR.

This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares.

Previously the lowering step of isel would turn the intrinsic into an X86-specific ISD node and emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG, we give earlier DAG combines and even InstCombine the chance to see it and optimize it.

This should make any issues with gpr<->mask sequences the same between integer and fp, meaning we only have to fix them once.
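
In source terms, a sketch of what the new IR shape corresponds to
(function name is mine):

  #include <immintrin.h>

  /* The compare now yields a vXi1 mask directly, and the masking is an
     explicit 'and' rather than being baked into the intrinsic. */
  __mmask16 masked_cmp(__m512 a, __m512 b, __mmask16 k) {
      return _mm512_cmp_ps_mask(a, b, _CMP_LT_OS) & k;
  }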

Reviewers: spatel, delena, RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43137

llvm-svn: 324827
2018-02-10 23:33:55 +00:00
Craig Topper
794e34cca4 [X86] Modify a few tests to not use icmps that are provably false.
These used things like unsigned less than zero, which is always false because there is no unsigned number less than zero.

I plan to teach DAG combine to optimize these, so I need to stop using them.
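
For example, a sketch of the kind of compare being removed:

  /* Always false: no unsigned value is less than zero, so DAG combine
     can fold the whole compare away. */
  int always_false(unsigned x) {
      return x < 0u;
  }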

llvm-svn: 324315
2018-02-06 06:44:05 +00:00
Craig Topper
952f7a0d70 [X86] Add DAG combine to turn (bitcast (and/or/xor (bitcast X), Y)) -> (and/or/xor X, (bitcast Y)) when casting between GPRs and mask operations.
This reduces the number of transitions between k-registers and GPRs, and with it the number of instructions.

There's still some room for improvement to remove more transitions, but this is a good start.
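
A C sketch of the kind of code that benefits (function name is mine):

  #include <immintrin.h>
  #include <stdint.h>

  /* Keeping the 'and' on the k-register side (kandw) avoids bouncing
     each compare result through a GPR and back. */
  uint16_t and_masks(__m512i a, __m512i b, __m512i c) {
      __mmask16 k1 = _mm512_cmpgt_epi32_mask(a, b);
      __mmask16 k2 = _mm512_cmpgt_epi32_mask(a, c);
      return (uint16_t)(k1 & k2);
  }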

llvm-svn: 324184
2018-02-04 01:43:48 +00:00
Puyan Lotfi
d4c615be8c Followup on Proposal to move MIR physical register namespace to '$' sigil.
Discussed here:

http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html

In preparation for adding support for named vregs, we are changing the sigil
for physical registers in MIR to '$' from '%'. This will prevent name clashes
of named physical registers with named vregs.

llvm-svn: 323922
2018-01-31 22:04:26 +00:00
Craig Topper
df72b4ab25 [X86] Use vpternlog to implement vector not under AVX512.
Previously we had to materialize all 1s in a register using vpternlog or pcmpeq and then xor with that. By using vpternlog directly we can do it in one operation.

This is implemented using isel patterns, but we should maybe consider creating a generalized vpternlog combiner.
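
A C sketch of the single-instruction NOT (function name is mine; 0x55
is the ternary-logic truth table that yields ~a when all three operands
are the same):

  #include <immintrin.h>

  /* One vpternlogq with immediate 0x55 computes ~a directly, with no
     all-ones constant to materialize first. */
  __m512i not_v8i64(__m512i a) {
      return _mm512_ternarylogic_epi64(a, a, a, 0x55);
  }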

llvm-svn: 323572
2018-01-26 22:17:40 +00:00
Craig Topper
759b479987 [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8.
Summary:
For the most part it's better to keep v32i1 as a narrower mask type than to try to promote it to a ymm register.

I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.

There are still some regressions in here; I definitely saw some around shuffles. I think we probably should move vXi1 shuffle handling from lowering to a DAG combine, where I think the extend and truncate we have to emit would be better combined.

I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc))

Overall this removes something like 13000 CHECK lines from lit tests.

Reviewers: zvi, RKSimon, delena, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42031

llvm-svn: 323201
2018-01-23 14:25:39 +00:00
Craig Topper
7095915282 [X86] Use ISD::TRUNCATE instead of X86ISD::VTRUNC when input and output types have the same number of elements.
llvm-svn: 322455
2018-01-14 08:11:36 +00:00
Craig Topper
297e87e001 [X86] Add DAG combine to promote vXi1 result of a vXi8/vXi16 setcc when we have AVX512 but not BWI.
This avoids having the result type stick around until lowering where we have to extend the setcc and insert a truncate. If we get the types converted early we can do more to optimize it.

llvm-svn: 322432
2018-01-13 06:24:46 +00:00
Craig Topper
971ece4f26 [X86] Make v2i1 and v4i1 legal types without VLX
Summary:
There are a few oddities that occur due to v1i1, v8i1, and v16i1 being legal without v2i1 and v4i1 being legal when we don't have VLX, particularly during legalization of v2i32/v4i32/v2i64/v4i64 masked gather/scatter/load/store. We end up promoting the mask argument to these during type legalization and then have to widen the promoted type to v8iX/v16iX and truncate it to get the element size back down to v8i1/v16i1 to use a 512-bit operation. Since we need to fill the upper bits of the mask, we have to fill with 0s at the promoted type.

It would be better if we could just have the v2i1/v4i1 types as legal so they don't undergo any promotion. Then we can just widen with 0s directly in a k register. There are no real v4i1/v2i1 instructions anyway; everything is done on a larger register.

This also fixes an issue that we couldn't implement a masked vextractf32x4 from zmm to xmm properly.

We now have to support widening more compares to 512-bit to get a mask result out so new tablegen patterns got added.

I had to hack the legalizer for widening the operand of a setcc a bit so it didn't try to create a setcc returning v4i32, extract from it, and then try to promote it using a sign extend to v2i1. Now we create the setcc with v4i1 if the original setcc's result type is v2i1, then extract that and don't sign extend it at all.

There's definitely room for improvement with some follow up patches.

Reviewers: RKSimon, zvi, guyblank

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41560

llvm-svn: 321967
2018-01-07 18:20:37 +00:00
Craig Topper
29a4ad935c [X86] In LowerTruncateVecI1, don't add SHL if the input is known to be all sign bits.
If the input is all sign bits then the LSB through MSB are all the same, so we don't need to move the LSB to the MSB.

llvm-svn: 321617
2018-01-01 04:52:58 +00:00
Craig Topper
df0b012274 [X86] Use CONCAT_VECTORS instead of INSERT_SUBVECTOR for padding v4i1/v2i1 vector to v8i1 pre-legalize.
The CONCAT_VECTORS will be lowered to INSERT_SUBVECTOR later. In the modified cases this seems to be enough to trick a later DAG combine into running in a different order that allows the ANDs to be removed.

I'll admit this is a bit of a hack that happens to work, but using CONCAT_VECTORS is more consistent with other legalization code anyway.

llvm-svn: 321611
2017-12-31 19:17:52 +00:00
Craig Topper
98a2e44b5b [X86] Add a DAG combine to widen (i4 (bitcast (v4i1))) before type legalization sees the i4 and changes to load/store.
Same for v2i1 and i2.

llvm-svn: 321602
2017-12-31 09:50:38 +00:00
Craig Topper
e8a65f8f62 [X86] Add a DAG combine to fix (v4i1 (bitcast (i4))) before type legalization sees the i4 and changes to load/store.
Same for i2 and v2i1.

llvm-svn: 321601
2017-12-31 08:25:50 +00:00
Craig Topper
56d68cd419 [X86] Prevent combining (v8i1 (bitconvert (i8 load)))->(v8i1 load) if we don't have DQI.
We end up using an i8 load via an isel pattern from v8i1 anyway. This just makes it more explicit. This seems to improve codegen in some cases and I'd like to kill off some of the load patterns.

llvm-svn: 321598
2017-12-31 07:38:41 +00:00
Craig Topper
653b40435f [X86] Use SIGN_EXTEND to implement ANY_EXTEND from vXi1.
llvm-svn: 321334
2017-12-22 02:30:26 +00:00
Craig Topper
807e4c5401 [X86] Canonicalize extract_vector_elt from vXi1 to always return MVT::i32.
This allows us to remove some isel patterns that allowed an MVT::i8 result type.

llvm-svn: 320936
2017-12-17 01:35:47 +00:00
Craig Topper
14d377e2bb [X86] Fix a couple bugs in my recent changes to vXi1 insert_subvector lowering.
A couple places didn't use the same SDValue variables to connect everything all the way through.

I don't have a test case for a bug in inserting into the lower bits of a non-zero, non-undef vector, and I'm not sure of the best way to create one. We don't create that case when lowering concat_vectors, which is the main way to get insert_subvectors.

llvm-svn: 320790
2017-12-15 07:16:41 +00:00
Craig Topper
d9543646e4 [X86] Don't zero the upper bits of the k-register before extracting a single bit from a vXi1.
This doesn't match the semantics of the extract_vector_elt operation. Nothing downstream knows the bits were zeroed, so they still get masked or sign extended after the extract anyway.

llvm-svn: 320723
2017-12-14 18:35:25 +00:00
Craig Topper
29c72918cd [X86] Make ANY_EXTEND from vXi1 Custom for more types.
We should be able to support ANY_EXTEND for any types we support ZERO_EXTEND for.

llvm-svn: 320675
2017-12-14 08:26:00 +00:00
Craig Topper
ca1a5a2099 [SelectionDAG][X86] Improve legalization of v32i1 CONCAT_VECTORS of v16i1 for AVX512F.
A v32i1 CONCAT_VECTORS of v16i1 uses promotion to v32i8 to legalize the v32i1. This results in a bunch of extract_vector_elts and a build_vector that ultimately gets scalarized.

This patch checks to see if v16i8 is legal and inserts an any_extend to that type so that we can concat v16i8 to v32i8 and avoid creating the extracts.

llvm-svn: 320674
2017-12-14 08:25:58 +00:00
Craig Topper
1f33f8f1b9 [X86] Handle all versions of vXi1 insert_vector_elt with a constant index without falling back to shuffles.
We previously only supported inserting into the LSB or MSB, where it was easy to zero a bit and perform an OR to insert.

This change effectively extracts the old value and the new value, xors them together, and then xors that single bit into the correct location in the original vector. The first xor cancels out the old value, leaving the new value in its position.

The way I've implemented this uses three shifts and two xors and needs an additional register. We can avoid the additional register at the cost of another shift.
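
A scalar C sketch of the xor-insert trick on a 16-bit mask (names are
mine):

  #include <stdint.h>

  /* Xoring in (old ^ new) at 'pos' cancels the old bit and leaves the
     new one, without having to clear the slot first. */
  uint16_t insert_bit(uint16_t mask, unsigned pos, unsigned bit) {
      uint16_t old  = (mask >> pos) & 1;
      uint16_t diff = old ^ (bit & 1);
      return mask ^ (uint16_t)(diff << pos);
  }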

llvm-svn: 320120
2017-12-08 00:16:09 +00:00
Craig Topper
ec3fb09954 [X86] Fix InsertBitToMaskVector to only issue KSHIFTS of native size so that upper bits are properly zeroed.
There's no v2i1 or v4i1 kshift, and v8i1 is only supported with AVX512DQ. Isel has fake patterns to extend these types to native shifts, but makes no guarantees about the value of any bits shifted in when shifting right.

This patch promotes the vector to a type that supports a native shift first and only allows inserting into the msb of a native sized shift.

I've constructed this in a way that doesn't do the promotion if we're going to fall back to using an xmm/ymm/zmm shuffle. I think I have a plan to remove the shuffle fallback entirely, in which case this can be simplified, but I wanted to fix the correctness issue first.

llvm-svn: 320081
2017-12-07 20:10:04 +00:00
Francis Visoiu Mistrih
a2d7c39420 [CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register.
Work towards the unification of MIR and debug output by refactoring the
interfaces.

For MachineOperand::print, keep a simple version that can be easily called
from `dump()`, and a more complex one which will be called from both the
MIRPrinter and MachineInstr::print.

Add extra checks inside MachineOperand for detached operands (operands
with getParent() == nullptr).

https://reviews.llvm.org/D40836

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ]*,[ ]*dead>/dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ]*,[ ]*dead>/implicit-def dead \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g'

llvm-svn: 320022
2017-12-07 10:40:31 +00:00
Craig Topper
e2dd3ace24 [X86] Use vector widening to support sign extend from i1 when the dest type is not 512-bits and vlx is not enabled.
Previously we used a wider element type and truncated, but it's more efficient to keep the element type and drop unused elements.

If BWI isn't supported and we have an i16 or i8 type, we'll extend it to i32 and still use a truncate.

llvm-svn: 319740
2017-12-05 06:37:21 +00:00
Francis Visoiu Mistrih
30264d4391 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of an MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih
961f3df27b [CodeGen] Print register names in lowercase in both MIR and debug output
As part of the unification of the debug format and the MIR format,
always print registers as lowercase.

* Only debug printing is affected. It now follows MIR.

Differential Revision: https://reviews.llvm.org/D40417

llvm-svn: 319187
2017-11-28 17:15:09 +00:00
Craig Topper
65940a5482 [X86] Teach isel that X86ISD::CMPM_RND zeros the upper bits of the mask register.
llvm-svn: 318933
2017-11-23 18:41:21 +00:00
Reid Kleckner
c37cd8d78d Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317579, originally committed as r317100.

There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.

When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
  int a, c, d, e, f, g, h, i, j, k, l, m;
  void n(int o, int *b) {
    if (g)
      f = 0;
    for (; f < o; f++) {
      m = a;
      if (l > j * k > i)
        j = i = k = d;
      h = b[c] - e;
    }
  }

We get assembly that looks like this:
; BB#1:                                 ; %if.then
Lloh3:
	adrp	x9, _f@GOTPAGE
Lloh4:
	ldr	x9, [x9, _f@GOTPAGEOFF]
	mov	 w8, wzr
Lloh5:
	str		wzr, [x9]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.lt	LBB0_3
	b	LBB0_7
LBB0_2:                                 ; %entry.if.end_crit_edge
Lloh6:
	adrp	x8, _f@GOTPAGE
Lloh7:
	ldr	x8, [x8, _f@GOTPAGEOFF]
Lloh8:
	ldr		w8, [x8]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.ge	LBB0_7
LBB0_3:                                 ; %for.body.lr.ph

Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.

llvm-svn: 317726
2017-11-08 21:31:14 +00:00
Petar Jovanovic
5b6a90db77 Reland "Correct dwarf unwind information in function epilogue for X86"
Reland r317100 with a minor fix to the ComputeCommonTailLength function in
BranchFolding.cpp: skipping the top block of CFI instructions needs to be
done at several more return points in ComputeCommonTailLength().

Original r317100 message:

"Correct dwarf unwind information in function epilogue for X86"

This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

llvm-svn: 317579
2017-11-07 14:40:27 +00:00
Uriel Korach
dc1d0e7db3 [X86][AVX512] Improve lowering of AVX512 test intrinsics
Added TESTM and TESTNM to the list of instructions that already zero the unused upper bits
and so do not need the redundant shift left and shift right instructions afterwards.
Added a pattern for TESTM and TESTNM in ISelLowering, so now icmp(ne, and(X,Y), 0) folds into TESTM
and icmp(eq, and(X,Y), 0) folds into TESTNM.
This commit is a preparation for lowering the test and testn X86 intrinsics to IR.
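
A C sketch of the patterns being matched (function names are mine; the
intrinsics are standard AVX512F):

  #include <immintrin.h>

  /* Per-lane (a & b) != 0 maps to vptestmd, and (a & b) == 0 to
     vptestnmd, with no extra shifts to clean the upper mask bits. */
  __mmask16 testm(__m512i a, __m512i b)  { return _mm512_test_epi32_mask(a, b); }
  __mmask16 testnm(__m512i a, __m512i b) { return _mm512_testn_epi32_mask(a, b); }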

Differential Revision: https://reviews.llvm.org/D38732

llvm-svn: 317465
2017-11-06 09:22:38 +00:00
Petar Jovanovic
8c7ceb5bd9 Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317100 as it introduced a sanitizer-x86_64-linux-autoconf
buildbot failure (build #15606).

llvm-svn: 317136
2017-11-01 23:05:52 +00:00
Petar Jovanovic
d61746acde Correct dwarf unwind information in function epilogue for X86
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.


Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D35844

llvm-svn: 317100
2017-11-01 16:04:11 +00:00
Craig Topper
9e42704879 [AVX512] Add patterns to commute integer comparison instructions during isel.
This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass.

llvm-svn: 315274
2017-10-10 06:36:46 +00:00
Reid Kleckner
bfd9be426f [MC] Suppress .Lcfi labels when emitting textual assembly
Summary:
This suppresses the generation of .Lcfi labels in our textual assembler.
It was annoying that this generated cascading .Lcfi labels:
  llc foo.ll -o - | llvm-mc | llvm-mc

After three trips through MCAsmStreamer, we'd have three labels in the
output when none are necessary. We should only bother creating the
labels and frame data when making a real object file.

This supersedes D38605, which moved the entire .seh_ implementation into
MCObjectStreamer.

This has the advantage that we do more checking when emitting textual
assembly, at a minor efficiency cost. Outputting textual assembly is not
performance critical, so this shouldn't matter.

Reviewers: majnemer, MatzeB

Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D38638

llvm-svn: 315259
2017-10-10 00:57:36 +00:00
Gadi Haber
b229841b1e [X86][SKX][KNL] Updated regression tests to use -mattr instead of -mcpu flag.NFC.
Updated 8 regression tests to use -mattr instead of the -mcpu flag, as follows:
 -mcpu=knl --> -mattr=+avx512f
 -mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq

The updates are part of the preparation for a large commit to add full instruction scheduling for the SKX target.

Reviewers: delena, zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D38222

Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14
llvm-svn: 314306
2017-09-27 14:44:15 +00:00
Craig Topper
bff345b46e [AVX-512] Add pattern for selecting masked version of v8i32/v8f32 compare instructions when VLX isn't available.
We use a v16i32/v16f32 compare instead and truncate the result. We already did this for the unmasked version, but were missing the version with 'and'.

llvm-svn: 314072
2017-09-24 05:24:52 +00:00
Craig Topper
a7dd52dcee [AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register
This enables the use of a smaller encoding by using a VEX instruction when possible.
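
A C sketch of the shape this helps (function name is mine; the
intrinsics are standard AVX2/AVX512F):

  #include <immintrin.h>

  /* Bits [255:128] of a zmm live in the upper half of its ymm alias,
     so a VEX-encoded vextracti128 can grab them with a smaller
     encoding than a 512-bit EVEX extract. */
  __m128i bits_255_128(__m512i v) {
      return _mm256_extracti128_si256(_mm512_castsi512_si256(v), 1);
  }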

Differential Revision: https://reviews.llvm.org/D37092

llvm-svn: 312100
2017-08-30 07:26:12 +00:00
Gadi Haber
9f39237fa7 [X86][Haswell] Updating HSW instruction scheduling information
This patch completely replaces the instruction scheduling information for the Haswell architecture target by modifying the file X86SchedHaswell.td located under the X86 Target.
We used scheduling information obtained from the Haswell architects to replace and modify the existing scheduling.
The patch continues the scheduling replacement effort started with the SNB target in r307529 and r310792.
Information includes latency, number of micro-Ops and used ports by each HSW instruction.

Please expect some performance fluctuations due to code alignment effects.

Reviewers: RKSimon, zvi, aymanmus, craig.topper, m_zuckerman, igorb, dim, chandlerc, aaboud

Differential Revision: https://reviews.llvm.org/D36663

llvm-svn: 311879
2017-08-28 10:04:16 +00:00