1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

94562 Commits

Author SHA1 Message Date
Kyle Butt
01686d3b5f IfConversion: Fix branch predication bug.
This bug shows up with diamonds that share unpredicable, unanalyzable branches.
There's an included test case from Hexagon. What was happening was that we were
attempting to predicate the branch instruction despite the fact that it was
checked to be the same. Now for unanalyzable branches we skip over the branch
instructions when predicating the block.

Differential Revision: https://reviews.llvm.org/D23939

llvm-svn: 279985
2016-08-29 18:27:12 +00:00
Vitaly Buka
da8c27db69 Use store operation to poison allocas for lifetime analysis.
Summary:
Calling __asan_poison_stack_memory and __asan_unpoison_stack_memory for small
variables is too expensive.

Code is disabled by default and can be enabled by -asan-experimental-poisoning.

PR27453

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23947

llvm-svn: 279984
2016-08-29 18:17:21 +00:00
Vitaly Buka
1d9c174ef6 [asan] Separate calculation of ShadowBytes from calculating ASanStackFrameLayout
Summary: No functional changes, just refactoring to make D23947 simpler.

Reviewers: eugenis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23954

llvm-svn: 279982
2016-08-29 17:41:29 +00:00
David Majnemer
485740d256 [SimplifyCFG] Hoisting invalidates metadata
We forgot to remove optimization metadata when performing hosting during
FoldTwoEntryPHINode.

This fixes PR29163.

llvm-svn: 279980
2016-08-29 17:14:08 +00:00
Evandro Menezes
1ef8196cc2 [AArch64] Adjust the scheduling model for Exynos M1.
Further refine the model for loads.

llvm-svn: 279976
2016-08-29 16:04:37 +00:00
Anna Thomas
2c186056d6 [StatepointsForGC] Rematerialize in the presence of PHIs
Summary:
While walking the use chain for identifying rematerializable values in RS4GC,
add the case where the current value and base value are the same PHI nodes.

This will aid rematerialization of geps and casts instead of relocating.

Reviewers: sanjoy, reames, igor

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23920

llvm-svn: 279975
2016-08-29 15:41:59 +00:00
Teresa Johnson
bad28dab29 [LTO] Remove extraneous output
Remove some debugging output to stderr that snuck in with r279576.

llvm-svn: 279974
2016-08-29 15:33:01 +00:00
Sanjay Patel
d0d823d9f1 [Constant] remove fdiv and frem from canTrap()
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.

This matches how we treat the fdiv/frem in IR with isSafeToSpeculativelyExecute() and in 
the backend after:
https://reviews.llvm.org/rL279970

llvm-svn: 279973
2016-08-29 15:27:17 +00:00
Gor Nishanov
cc3ff9c81d [Coroutines] Part 9: Add cleanup subfunction.
Summary:
[Coroutines] Part 9: Add cleanup subfunction.

This patch completes coroutine heap allocation elision. Now, the heap elision example from docs\Coroutines.rst compiles and produces expected result (see test/Transform/Coroutines/ex3.ll)

Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows to check whether a coro.id describes the enclosing function or it belongs to a different function that was later inlined.

CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by a deallocation code
# f$cleanup - just the cleanup code

CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending whether heap elision is performed or not.

Other fixes, improvements:
* Fixed buglet in Shape::buildFrame that was not creating coro.save properly if coroutine has more than one suspend point.

* Switched to using variable width suspend index field (no longer limited to 32 bit index field can be as little as i1 or as large as i<whatever-size_t-is>)

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23844

llvm-svn: 279971
2016-08-29 14:34:12 +00:00
Sanjay Patel
e3955cf340 [TargetLowering] remove fdiv and frem from canOpTrap() (PR29114)
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of 
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.

This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a 
similar bug in Constant::canTrap().

This bug manifests in PR29114:
https://llvm.org/bugs/show_bug.cgi?id=29114
...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float> 
type.

Differential Revision: https://reviews.llvm.org/D23974

llvm-svn: 279970
2016-08-29 13:32:41 +00:00
Krzysztof Parzyszek
3628e5ad57 Do not use MRI::getMaxLaneMaskForVReg as a mask covering whole register
MRI::getMaxLaneMaskForVReg does not always cover the whole register.
For example, on X86 the upper 16 bits of EAX cannot be accessed via
any subregister. Consequently, there is no lane mask that only covers
that part of EAX. The getMaxLaneMaskForVReg will return the union of
the lane masks for all subregisters, and in case of EAX, that union
will not cover the upper 16 bits.

This fixes https://llvm.org/bugs/show_bug.cgi?id=29132

llvm-svn: 279969
2016-08-29 13:15:35 +00:00
Tom Stellard
12f6142ff9 AMDGPU/SI: Improve register allocation hints for sopk instructions
Summary:
For shrinking SOPK instructions, we were creating a hint to tell the
register allocator to use the register allocated for src0 for the dst
operand as well.  However, this seems to not work sometimes depending
on the order virtual registers are assigned physical registers.

To fix this, I've added a second allocation hint which does the reverse,
asks that the register allocated for dst is used for src0.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23862

llvm-svn: 279968
2016-08-29 13:06:10 +00:00
Rafael Espindola
2aeb8bcd81 Use the correct ctor/dtor section for dynamic-no-pic.
llvm-svn: 279967
2016-08-29 12:47:22 +00:00
Rafael Espindola
f7e5d81bb8 Move code only used by codegen out of MC. NFC.
MC itself never needs to know about these sections.

llvm-svn: 279965
2016-08-29 12:33:42 +00:00
Haojian Wu
1ffe62f2c3 Fix -Wunused-but-set-variable warning.
Summary: A follow-up fix on r279958.

Reviewers: bkramer

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23989

llvm-svn: 279964
2016-08-29 12:26:33 +00:00
Tom Stellard
10a0d7df6a AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint()
Summary:
The SILoadStoreOptimizer will need to use AliasAnalysis here in order to
move it before scheduling.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23813

llvm-svn: 279963
2016-08-29 12:05:32 +00:00
Igor Breger
1ff2ecf610 Fixed a bug in type legalizer for masked gather.
The problem occurs when the Node doesn't updated in place , UpdateNodeOperation() return the node that already exist.
In this case assert fail in PromoteIntegerOperand() , N have 2 results ( val + chain).

Differential Revision: http://reviews.llvm.org/D23756

llvm-svn: 279961
2016-08-29 09:12:31 +00:00
Igor Breger
e231cf9ebe [AVX512] In some cases KORTEST instruction may be used instead of ZEXT + TEST sequence.
Differential Revision: http://reviews.llvm.org/D23490

llvm-svn: 279960
2016-08-29 08:52:52 +00:00
Haojian Wu
cb4437094e [InstructionSelect] NumBlocks isn't defined in DEBUG build.
Summary: A follow-up fixing on http://llvm.org/viewvc/llvm-project?view=revision&revision=279905.

Reviewers: bkramer

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23985

llvm-svn: 279959
2016-08-29 08:48:15 +00:00
Craig Topper
b910cf620b [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just create a ConstantFPSDNode and let that be lowered.
This allows broadcast loads to used when available.

llvm-svn: 279958
2016-08-29 04:49:31 +00:00
Craig Topper
dde5d7a8cc [AVX-512] Always use v8i64 when converting 512-bit FAND/FOR/FXOR/FANDN to integer operations when DQI isn't supported. This is consistent with the recent changes to promote logical operations to i64 vectors.
llvm-svn: 279957
2016-08-29 04:49:27 +00:00
Craig Topper
de3199b8e1 [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available.
llvm-svn: 279951
2016-08-28 22:20:51 +00:00
Craig Topper
23bcb14401 [AVX-512] Add patterns for selecting 128/256-bit EVEX VPABS instructions.
llvm-svn: 279950
2016-08-28 22:20:48 +00:00
Sanjay Patel
d4c3ab24ab [InstCombine] use m_APInt to allow icmp (and X, Y), C folds for splat constant vectors
llvm-svn: 279937
2016-08-28 18:18:00 +00:00
Simon Pilgrim
913f5c9084 [X86][AVX512] Only combine EVEX targets shuffles to shuffles of the same number of vector elements
Over eager combing prevents the correct folding of writemasks.

At the moment this occurs for ALL EVEX shuffles, in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask.

llvm-svn: 279934
2016-08-28 17:27:14 +00:00
Hal Finkel
9e501da293 [PowerPC] Implement lowering for atomicrmw min/max/umin/umax
Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818.

llvm-svn: 279933
2016-08-28 16:17:58 +00:00
Elena Demikhovsky
eed0bb3d71 [Loop Vectorizer] Fixed memory confilict checks.
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".

Differential revision: https://reviews.llvm.org/D23176

llvm-svn: 279930
2016-08-28 08:53:53 +00:00
Craig Topper
0f68f0b2ab [AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have AVX512F/AVX512VL.
Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available.

Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases it was previously used. This is the cause of most of the test change. This shouldn't result in any functional change though.

llvm-svn: 279929
2016-08-28 06:06:28 +00:00
Craig Topper
9a9a09e60f [X86] Rename PABSB/D/W instructions to be consistent with SSE/AVX instructions instead of ending 128/256. NFC
llvm-svn: 279927
2016-08-28 06:06:21 +00:00
Jan Vesely
0d9671ace0 AMDGPU/R600: Enable Load combine
Fix and improve tests

Differential Revision: https://reviews.llvm.org/D23899

llvm-svn: 279925
2016-08-27 19:09:43 +00:00
Craig Topper
d6a4523f86 [X86] Rename predicate function that detects if requires one of the REX.B, REX.X or REX.R bits. It's old name conflicted with a function in X8II namespace that doesnt' quite do the same thing. NFC
llvm-svn: 279924
2016-08-27 17:13:43 +00:00
Craig Topper
ce1e9291d5 [X86] Keep looping over operands looking for byte registers even if we already found a register that requires a REX prefix. Otherwise we don't error if a high byte register is used after SPL/BPL/DIL/SIL.
llvm-svn: 279923
2016-08-27 17:13:41 +00:00
Craig Topper
929e8489be [X86] Include XMM/YMM/ZMM16-23 in X86II::isX86_64ExtendedReg. This feels more consistent with its name and simplifies assembler code.
llvm-svn: 279922
2016-08-27 17:13:37 +00:00
Craig Topper
c65627ae77 [X86] Don't allow DR8-DR15 to be assembled in 32-bit mode. Add missing test for CR8-CR15.
llvm-svn: 279921
2016-08-27 17:13:34 +00:00
Craig Topper
c727b74959 [X86] Remove stale comment about FixupBWInsts pass being off by default. NFC
llvm-svn: 279915
2016-08-27 05:26:54 +00:00
Craig Topper
7c9fd2b2be [AVX-512] Allow EVEX encoding unordered/ordered/equal/notequal VCMPPS/PD/SS/SD to be commuted just like the SSE and AVX counterparts.
llvm-svn: 279914
2016-08-27 05:22:15 +00:00
Craig Topper
dc033bf380 [X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted.
llvm-svn: 279913
2016-08-27 05:22:12 +00:00
Craig Topper
1964e92665 [AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd.
llvm-svn: 279912
2016-08-27 05:22:08 +00:00
Teresa Johnson
864ef41e7f [LTO] Don't create a new common unless merged has different size
Summary:
This addresses a regression in common handling from the new LTO
API in r278338. Only create a new common if the size is different.
The type comparison against an array type fails when the size is
different but not an array. GlobalMerge does not handle the
array types as well and we lose some global merging opportunities.

Reviewers: mehdi_amini

Subscribers: junbuml, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23955

llvm-svn: 279911
2016-08-27 04:41:22 +00:00
Matt Arsenault
7d35696bfc AMDGPU: Mark sched model complete
Fixes bug 26800

llvm-svn: 279910
2016-08-27 03:39:27 +00:00
Matt Arsenault
5c1096e49d AMDGPU: Remove unneeded implicit exec uses/defs
SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
SI_IF_BREAK and SI_ELSE_BREAK do not read it either.

llvm-svn: 279909
2016-08-27 03:00:51 +00:00
Sebastian Pop
c6175f06bd GVN-hoist: invalidate MD cache (PR29144)
Without invalidating the entries in the MD cache we would try to access instructions
that were removed in previous iterations of hoisting.

Differential Revision: https://reviews.llvm.org/D23927

llvm-svn: 279907
2016-08-27 02:48:41 +00:00
Quentin Colombet
a56859d74d [RegBankSelect] Do not abort when the target wants to fall back.
llvm-svn: 279906
2016-08-27 02:38:27 +00:00
Quentin Colombet
cd84f181b1 [InstructionSelect] Do not abort when the target wants to fall back.
llvm-svn: 279905
2016-08-27 02:38:24 +00:00
Quentin Colombet
ff81dc2c09 [MachineLegalize] Do not abort when the target wants to fall back.
llvm-svn: 279904
2016-08-27 02:38:21 +00:00
Matt Arsenault
d3892e17d5 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Matt Arsenault
af7cdf6a46 AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Matt Arsenault
9b698596fe AMDGPU: Fix sched type for branches
llvm-svn: 279900
2016-08-27 00:51:02 +00:00
Matt Arsenault
09098bc2b3 AMDGPU: Remove register operand from si_mask_branch
It isn't used for anything, and is also misleading since
it could be spilled at the end of the block, so it can't be relied
on. There ends up being a verifier error about using an undefined
register since the spill kills the register.

llvm-svn: 279899
2016-08-27 00:42:21 +00:00
Matt Arsenault
2922d104bf AMDGPU: Improve error reporting for maximum branch distance
Unfortunately this seems to only help the assembler diagnostic.

llvm-svn: 279895
2016-08-27 00:21:22 +00:00
Quentin Colombet
1636e7c63b [GlobalISel] Add a fallback path to SDISel.
When global-isel fails on a MachineFunction MF, MF will be cleaned up
and given to SDISel.
Thanks to this fallback, we can already perform correctness test even if
we support only a small portion of the functions in a test.

llvm-svn: 279891
2016-08-27 00:18:31 +00:00
Quentin Colombet
7a3f77bdff [AArch64][CallLowering] Do not assert for not implemented part.
When doing the ABI lowering, report a failure to the caller instead of
asserting. This gives a chance for the caller to recover.

llvm-svn: 279890
2016-08-27 00:18:28 +00:00
Quentin Colombet
c8ffecf434 [GlobalISel] Teach the core pipeline not to run if ISel failed.
llvm-svn: 279889
2016-08-27 00:18:24 +00:00
Quentin Colombet
a46582d729 [IRTranslator] Do not abort when the target wants to fall back.
Every pass in the GlobalISel pipeline will need to do something similar.

llvm-svn: 279886
2016-08-26 23:49:05 +00:00
Quentin Colombet
5f3ddf5325 [MFProperties] Introduce a FailedISel property.
This is used to communicate that the instruction selection pipeline
failed at some point.
Another way to achieve that would be to have some kind of conditional
scheduling in the PassManager, such that we only schedule a pass based
on the success/failure of another one. The property approach has the
advantage of being lightweight and solve the problem at stake.

llvm-svn: 279885
2016-08-26 23:49:01 +00:00
Teresa Johnson
048eeff265 [ThinLTO] Move loading of cache entry to client
Summary:
Have the cache pass back the path to the cache entry when it
is ready to be loaded, instead of a buffer.

For gold-plugin we can simply pass this file back to gold directly,
which avoids expensive writing of a separate tmp file. Ensure
the cache entry is not deleted on cleanup by adjusting the setting
of the IsTemporary flags.

Moved the loading of the buffer into llvm-lto2 to maintain current
behavior.

Reviewers: mehdi_amini

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23946

llvm-svn: 279883
2016-08-26 23:29:14 +00:00
Quentin Colombet
9f9ccd2d5a [TargetPassConfig] Add a target hook to know what GlobalISel should do on error.
By default, this hook tells GlobalISel to abort (report a fatal error)
when it encounters an error. The alternative will be to fall back on
SDISel.
This fall back will be removed when the bring-up of GlobalISel is over.

llvm-svn: 279879
2016-08-26 22:32:59 +00:00
Quentin Colombet
0b16bca0b7 [IRTranslator][NFC] Use DEBUG_TYPE instead of repeating the name.
llvm-svn: 279878
2016-08-26 22:32:57 +00:00
Quentin Colombet
2cd9ce7522 [SelectionDAG] Do not run the ISel process on already selected code.
Right now, this cannot happen, but with the fall back path of GlobalISel
it will show up eventually.

llvm-svn: 279877
2016-08-26 22:32:55 +00:00
Quentin Colombet
947cfac2b5 [MachineFunction] Introduce a reset method.
This method allows to reset the state of a MachineFunction as if it was
just created. This will be used during the bring-up of GlobalISel to
provide a way to fallback on SelectionDAG. That way, we can start doing
correctness testing even if we are not able to select all functions via
the global instruction selector.

llvm-svn: 279876
2016-08-26 22:32:53 +00:00
Quentin Colombet
fcb250bf0c [MFProperties] Introduce a reset method with no argument.
This method allows to reset all the properties in one go.

llvm-svn: 279874
2016-08-26 22:09:11 +00:00
Quentin Colombet
e0a55b84fa [MFProperties][NFC] Rename clear into reset to match BitVector naming.
The name clear is used to reset all the bit in bitvectors and using it
to reset just properties was confusing.

llvm-svn: 279873
2016-08-26 22:09:08 +00:00
Tom Stellard
2dbd047f3a AMDGPU/SI: Canonicalize offset order for merged DS instructions
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to scheduler loads together
without out explicitly clustering them, which would give us non-sorted
offsets.

Also, we will want to do this if we move the load/store optimizer before
the scheduler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23776

llvm-svn: 279870
2016-08-26 21:36:47 +00:00
Tom Stellard
8b436348bd XXX
llvm-svn: 279868
2016-08-26 21:16:40 +00:00
Tom Stellard
13f830c61a AMDGPU/SI: Use a better method for determining the largest pressure sets
Summary:
There are a few different sgpr pressure sets, but we only care about
the one which covers all of the sgprs.  We were using hard-coded
register pressure set names to determine the reg set id for the
biggest sgpr set.  However, we were using the wrong name, and this
method is pretty fragile, since the reg pressure set names may
change.

The new method just looks for the pressure set that contains the most
reg units and sets that set as our SGPR pressure set.  We've also
adopted the same technique for determining our VGPR pressure set.

Reviewers: arsenm

Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23687

llvm-svn: 279867
2016-08-26 21:16:37 +00:00
Adam Nemet
68515406cf [Inliner] Report when inlining fails because callee's def is unavailable
Summary:
This is obviously an interesting case because it may motivate code
restructuring or LTO.

Reporting this requires instantiation of ORE in the loop where the call
sites are first gathered.  I've checked compile-time
overhead *with* -Rpass-with-hotness and the worst slow-down was 6% in
mcf and quickly tailing off.  As before without -Rpass-with-hotness
there is no overhead.

Because this could be a pretty noisy diagnostics, it is currently
qualified as 'verbose'.  As of this patch, 'verbose' diagnostics are
only emitted with -Rpass-with-hotness, i.e. when the output is expected
to be filtered.

Reviewers: eraman, chandlerc, davidxl, hfinkel

Subscribers: tejohnson, Prazek, davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D23415

llvm-svn: 279860
2016-08-26 20:21:05 +00:00
Rafael Espindola
26c1b18156 Make writeToResolutionFile a static helper.
llvm-svn: 279859
2016-08-26 20:19:35 +00:00
Kyle Butt
160e4d9563 TailDuplication: Record blocks that received the duplicated block. NFC.
This will allow tail duplication during layout to handle the cfg changes more
cleanly.

llvm-svn: 279858
2016-08-26 20:12:40 +00:00
Kevin Enderby
9991a7696a Next set of additional error checks for invalid Mach-O files for bad LC_SYMTAB’s.
This contains the missing checks for LC_SYMTAB load command fields.

llvm-svn: 279854
2016-08-26 19:34:07 +00:00
Manman Ren
23c5a589f3 Swift Calling Convetion: add support for AArch64.
It will just be the same as the regular calling convention.

rdar://28029509

llvm-svn: 279853
2016-08-26 19:28:17 +00:00
Tim Northover
0f9e84c16f AArch64: avoid assertion on illegal types in performFDivCombine.
In the code to detect fixed-point conversions and make use of AArch64's special
instructions, we weren't prepared for weird types. The fptosi direction got
fixed recently, but not the similar sitofp code.

llvm-svn: 279852
2016-08-26 18:52:31 +00:00
Sanjay Patel
66b7ce5846 [InstCombine] add helper function for icmp (and (sh X, Y), C2), C1 ; NFC
Like other recent changes near here, the goal is to allow vector types for
all of these folds. Splitting things up makes it easier to incrementally 
enhance the code and easier to read.

llvm-svn: 279851
2016-08-26 18:28:46 +00:00
Chad Rosier
ab1bc0d119 [AArch64] Avoid materializing constant values when generating csel instructions.
Differential Revision: https://reviews.llvm.org/D23677

llvm-svn: 279849
2016-08-26 18:05:50 +00:00
Davide Italiano
f0d80a35f9 [AsmParser] Placate a -Wmisleading-indentantion warning (GCC7).
llvm-svn: 279848
2016-08-26 18:05:03 +00:00
Reid Kleckner
7500758e97 [MC] Move .cv_loc management logic out of MCContext
MCContext already has many tasks, and separating CodeView out from it is
probably a good idea. The .cv_loc tracking was modelled on the DWARF
tracking which lived directly in MCContext.

Removes the inclusion of MCCodeView.h from MCContext.h, so now there are
only 10 build actions while I hack on CodeView support instead of 265.

llvm-svn: 279847
2016-08-26 17:58:37 +00:00
Tim Northover
4f35a13881 GlobalISel: mark G_FPEXT legal from float to double.
llvm-svn: 279845
2016-08-26 17:46:22 +00:00
Tim Northover
2f9c75fe2f GlobalISel: mark G_FCMP legal on float & double.
llvm-svn: 279844
2016-08-26 17:46:19 +00:00
Tim Northover
ac577545eb GlobalISel: simplify G_ICMP legalization regime.
It's unclear how the old

    %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1

is actually different from an s1 verison

    %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1

so we'll remove it for now.

llvm-svn: 279843
2016-08-26 17:46:17 +00:00
Tim Northover
a9938048f7 GlobalISel: legalize sdiv and srem operations.
llvm-svn: 279842
2016-08-26 17:46:13 +00:00
Tim Northover
94b2dc6476 GlobalISel: legalize under-width divisions.
llvm-svn: 279841
2016-08-26 17:46:06 +00:00
Tim Northover
e9b45b8f4e GlobalISel: mark selects legal
llvm-svn: 279840
2016-08-26 17:46:03 +00:00
Tim Northover
aa17b67046 GlobalISel: mark float/int conversions legal
llvm-svn: 279839
2016-08-26 17:45:58 +00:00
Sanjay Patel
327342b2d5 [InstCombine] clean up foldICmpAndConstConst(); NFC
1. Early exit to reduce indent
2. Fix comments and variable names to match
3. Reformat comments / clang-format code

llvm-svn: 279837
2016-08-26 17:15:22 +00:00
Krzysztof Parzyszek
fdbb18927f Missed a semicolon in r279835
llvm-svn: 279836
2016-08-26 16:50:57 +00:00
Krzysztof Parzyszek
d4f2333399 Add some more detailed debugging information in RegisterCoalescer
llvm-svn: 279835
2016-08-26 16:46:14 +00:00
Sanjay Patel
c810f30220 [InstCombine] add helper function for folding of icmp (and X, C2), C; NFC
llvm-svn: 279834
2016-08-26 16:42:33 +00:00
Bob Haarman
7dc400a765 limit the number of instructions per block examined by dead store elimination
Summary: Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.

Reviewers: dexonsmith, bruno, george.burgess.iv, dberlin, reames, davidxl

Subscribers: davide, chandlerc, dberlin, davidxl, eraman, tejohnson, mbodart, llvm-commits

Differential Revision: https://reviews.llvm.org/D15537

llvm-svn: 279833
2016-08-26 16:34:27 +00:00
Sanjay Patel
a6a044e2b8 [InstCombine] rename variables in foldICmpAndConstant(); NFC
llvm-svn: 279831
2016-08-26 16:14:06 +00:00
Bob Haarman
278b19017e test commit
llvm-svn: 279830
2016-08-26 16:00:04 +00:00
Adam Nemet
c4e22f3757 [LoopUnroll] Use OptimizationRemarkEmitter directly not via the analysis pass
We can't mark ORE (a function pass) preserved as required by the loop
passes because that is how we ensure that the required passes like
LazyBFI are all available any time ORE is used.  See the new comments in
the patch.

Instead we use it directly just like the inliner does in D22694.

As expected there is some additional overhead after removing the caching
provided by analysis passes.  The worst case, I measured was
LNT/CINT2006_ref/401.bzip2 which regresses by 12%.  As before, this only
affects -Rpass-with-hotness and not default compilation.

llvm-svn: 279829
2016-08-26 15:58:34 +00:00
Sanjay Patel
807e452fc0 [InstCombine] rename variables in foldICmpDivConstant(); NFC
Removing the redundant 'CmpRHSV' local variable exposes a bug in the caller
foldICmpShrConstant() - it was sending in the div constant instead of the
cmp constant. But I have not been able to expose this in a regression test
yet - the affected folds all appear to be handled before we ever reach this
code. I'll keep trying to find a case as I make changes to allow vector folds
in both functions.

llvm-svn: 279828
2016-08-26 15:53:01 +00:00
Davide Italiano
f463cb5866 [lib/LTO] Add an assertion to catch invalid opt levels.
llvm-svn: 279823
2016-08-26 15:22:59 +00:00
Chad Rosier
ae57150dc8 [AArch64] Avoid materializing constant 1 by using csinc, rather than csel.
This is similar to what was done in r261675, but for CSINC rather than CSINV.

Differential Revision: https://reviews.llvm.org/D23892

llvm-svn: 279822
2016-08-26 14:01:55 +00:00
Pablo Barrio
1f84662941 Handle empty functions with debug info in load/store opt pass
Summary:
In fuctions that contained debug info but were empty otherwise,
the ARM load/store optimizer could abort. This was because
function MergeReturnIntoLDM handled the special case where a
Machine Basic BLock is empty by calling MBB.empty(). However, this
returns false in presence of debug info, although the function
should be considered empty in the eyes of the load/store optimizer.
This has been fixed by handling the case where searching through the
block finds only debug instructions.

Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy

Subscribers: t.p.northover, aemerson, rengolin, samparker

Differential Revision: https://reviews.llvm.org/D23847

llvm-svn: 279820
2016-08-26 13:00:39 +00:00
Simon Pilgrim
8d9b313c40 [X86][SSE4A] The EXTRQ/INSERTQ bit extraction/insertion ops should be in the integer domain
llvm-svn: 279811
2016-08-26 09:55:41 +00:00
Eugene Leviant
eb33dbc270 Implement getRandomBytes() function
This function allows getting arbitrary sized block of random bytes.
Primary motivation is support for --build-id=uuid in lld.

Differential revision: https://reviews.llvm.org/D23671

llvm-svn: 279807
2016-08-26 08:14:54 +00:00
Craig Topper
471febb7d3 [X86][SSE] Add CMPSS/CMPSD intrinsic scalar load folding support.
llvm-svn: 279806
2016-08-26 07:08:00 +00:00
Matt Arsenault
7621e40810 Replace subregister uses when processing tied operands
This was for some reason skipping operands that are subregisters
instead of keeping the same subregister index.

v_movreld_b32 expects src0 to be the subregister of the tied
super register use/def.

e.g.

v_movreld_b32 v0, v9, <imp-def, tied3> v[0:3], <imp-use, tied2> v[0:3]

was being replaced with

v[4:7] = copy v[0:3]
v_movreld_b32 v0, v9, <imp-def, tied3> v[4:7], <imp-use, tied2> v[4:7],

which really writes to v[0:3]

llvm-svn: 279804
2016-08-26 06:31:32 +00:00
Kostya Serebryany
70186ece8e [libFuzzer] simplify a test to make it pass on the bot
llvm-svn: 279796
2016-08-26 00:18:16 +00:00
Kostya Serebryany
d4cdf49632 [libFuzzer] make sure we have symbols on fuzzer tests
llvm-svn: 279792
2016-08-25 23:30:02 +00:00
Michael Kuperstein
0f3d6c7984 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 (AKA r274626) and its follow-ups (r276347, r277289),
due to miscompiles in the test suite. The FastISel change was left in, because
it apparently fixes an unrelated issue.

(Recommit of r279782 which was broken due to a bad merge.)

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279788
2016-08-25 22:48:11 +00:00
Kostya Serebryany
25e0e96b53 [libFizzer] rename -print_new_cov_pcs=1 into -print_pcs=1 and make it more useful: print PCs only after the initial corpus has been read and symbolize them
llvm-svn: 279787
2016-08-25 22:35:08 +00:00
Michael Kuperstein
9e11987d29 Revert r279782 due to debug buildbot breakage.
llvm-svn: 279785
2016-08-25 22:14:45 +00:00
Michael Kuperstein
041c1cdc10 Revert r274613 because it breaks the test suite with AVX512
This reverts most of r274613 and its follow-ups (r276347, r277289), due to
miscompiles in the test suite. The FastISel change was left in, because it
apparently fixes an unrelated issue.

This fixes 4 out of the 5 test failures in PR29112.

llvm-svn: 279782
2016-08-25 21:55:41 +00:00
Tim Shen
a8864abefc [MemCpy] Add comments for r279769
Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279778
2016-08-25 21:03:46 +00:00
Tim Northover
35c35dd4f1 ARM: by default don't set the Thumb bit on MachO relocated values.
Its existence is largely historical, apparently we tried to make ARM object
files look maybe-almost-possibly runnable by putting our best guess at the
actual value into relocated locations. Of course, the real linker then comes
along and can completely change things.

But it should only be there for word-sized and movw/movt relocations. It can't
be encoded in branch relocations, and I've seen it mess up validity
calculations twice in the last couple of weeks so the default is clearly problematic.

llvm-svn: 279773
2016-08-25 20:41:30 +00:00
Tim Shen
358f130893 [MemCpy] Check for alias in performMemCpyToMemSetOptzn, instead of the identity of two operands
Summary:
This fixes pr29105. The reason is that lifetime marks creates new
aliasing pointers the original ones, but before this patch aliases
were not checked in performMemCpyToMemSetOptzn.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23846

llvm-svn: 279769
2016-08-25 19:27:26 +00:00
Michael Kuperstein
815b7bc975 Reuse an SDLoc throughout a function. NFC.
llvm-svn: 279767
2016-08-25 18:50:56 +00:00
Tim Northover
1e63f9c419 GlobalISel: add missing type to G_UADDE instructions
llvm-svn: 279762
2016-08-25 17:37:44 +00:00
Tim Northover
3892589cf4 GlobalISel: mark overflow bit of overflow ops legal.
It's expected this will map to NZCV register class and be properly selectable.

llvm-svn: 279761
2016-08-25 17:37:41 +00:00
Tim Northover
5ba7519ed1 GlobalISel: mark simple ops legal even on types < 32-bit.
The 32-bit variants of these operations don't depend on the bits not being
operated on, so they also naturally model operations narrower than the actual
register width.

llvm-svn: 279760
2016-08-25 17:37:39 +00:00
Tim Northover
ae0d153231 GlobalISel: mark pointer constants as legal on AArch64.
llvm-svn: 279759
2016-08-25 17:37:35 +00:00
Tim Northover
73171f5329 GlobalISel: perform multi-step legalization
llvm-svn: 279758
2016-08-25 17:37:32 +00:00
Tim Northover
49be408e74 GlobalISel: mark small extends as legal on AArch64
llvm-svn: 279757
2016-08-25 17:37:25 +00:00
Michael Kuperstein
9544cc08df [X86] 512-bit VPAVG requires AVX512BW
Fix VPAVG detection to require AVX512BW, not AVX512F for 512-bit widths,
and change associated asserts to assert in the right direction...

This fixes PR29111.

llvm-svn: 279755
2016-08-25 17:17:46 +00:00
Simon Pilgrim
0936d75baa [X86][SSE] INSERTPS is only combined on v4f32 types. NFCI.
llvm-svn: 279751
2016-08-25 17:02:00 +00:00
Ron Lieberman
abbe6a467a [Hexagon] Remove extraneous debug output from HexagonCopyToCombine.cpp
BB# ...

llvm-svn: 279750
2016-08-25 16:46:09 +00:00
Wei Mi
d25dea67a3 [UNROLL] Postpone ScalarEvolution::forgetLoop after TripCountSC is expanded
when unroll runtime iteration loop.

In llvm::UnrollRuntimeLoopRemainder, if the loop to be unrolled is the inner
loop inside a loop nest, the scalar evolution needs to be dropped for its
parent loop which is done by ScalarEvolution::forgetLoop. However, we can
postpone forgetLoop to the end of UnrollRuntimeLoopRemainder so TripCountSC
expansion can still reuse existing value.

Differential Revision: https://reviews.llvm.org/D23572

llvm-svn: 279748
2016-08-25 16:17:18 +00:00
Simon Pilgrim
3245a54b10 Fix line endings
llvm-svn: 279745
2016-08-25 15:45:27 +00:00
Ron Lieberman
75c1625a5f [Hexagon] vector store print tracing.
Add vector store print tracing option for hexagon vector instructions.

https://reviews.llvm.org/D23870

llvm-svn: 279739
2016-08-25 13:35:48 +00:00
Simon Pilgrim
e1493096c9 [X86][AVX] Provide SubVectorBroadcast fallback if load fold fails (PR29133)
Fix for PR29133, matching the approach that was taken for AVX1 scalar broadcasts.

llvm-svn: 279735
2016-08-25 12:45:16 +00:00
Sebastian Pop
54c907eff6 GVN-hoist: fix hoistingFromAllPaths for loops (PR29034)
It is invalid to hoist stores or loads if they are not executed on all paths
from the hoisting point to the exit of the function. In the testcase, there are
paths in the loop that do not execute the stores or the loads, and so hoisting
them within the loop is unsafe.

The problem is that the current implementation of hoistingFromAllPaths is
incomplete: it walks all blocks dominated by the hoisting point, and does not
return false when the loop contains a path on which the hoisted ld/st is
not executed.

Differential Revision: https://reviews.llvm.org/D23843

llvm-svn: 279732
2016-08-25 11:55:47 +00:00
Craig Topper
4201cced25 [X86] Simplify getOperandBias as a bit. NFC
There's no reason for it to return a signed type. Just return the operand bias in each if instead of starting from 0 and adding in the 'if'.

llvm-svn: 279720
2016-08-25 04:16:10 +00:00
Craig Topper
7bd7c47934 [X86] Fix indentation per coding standards. NFC
llvm-svn: 279719
2016-08-25 04:16:08 +00:00
George Burgess IV
499f035638 Make buildbots happy.
"warning: extra ‘;’ [-Wpedantic]"

llvm-svn: 279703
2016-08-25 02:15:54 +00:00
Kyle Butt
d2d132562f TailDuplication: Don't pass MMI separately from MF. NFC
MMI must match the function passed, and MF has a handle on MMI. Use that instead
of accepting it as separate argument. No Functional Change.

llvm-svn: 279701
2016-08-25 01:37:07 +00:00
Kyle Butt
b2a5fef7df TailDuplication: Save MF and reduce number of parameters. NFC
Save the function in the class, and then don't pass it around. This reduces the
number of parameters and makes calls to member functions simpler.
No Functional Change.

llvm-svn: 279700
2016-08-25 01:37:03 +00:00
George Burgess IV
7056845909 Update a comment.
r279696, which changed `LLVM_CONSTEXPR AliasAttr` to `const AliasAttr`,
made this comment make less sense.

llvm-svn: 279699
2016-08-25 01:29:55 +00:00
Matthias Braun
923da8d677 MachineFunctionProperties/MIRParser: Rename AllVRegsAllocated->NoVRegs, compute it
Rename AllVRegsAllocated to NoVRegs. This avoids the connotation of
running after register and simply describes that no vregs are used in
a machine function. With that we can simply compute the property and do
not need to dump/parse it in .mir files.

Differential Revision: http://reviews.llvm.org/D23850

llvm-svn: 279698
2016-08-25 01:27:13 +00:00
Kostya Serebryany
d93b4b7761 [libFuzzer] simplify the code, NFC
llvm-svn: 279697
2016-08-25 01:25:03 +00:00
George Burgess IV
dd6439368a Make some LLVM_CONSTEXPR variables const. NFC.
This patch changes LLVM_CONSTEXPR variable declarations to const
variable declarations, since LLVM_CONSTEXPR expands to nothing if the
current compiler doesn't support constexpr. In all of the changed
cases, it looks like the code intended the variable to be const instead
of sometimes-constexpr sometimes-not.

llvm-svn: 279696
2016-08-25 01:05:08 +00:00
Eugene Zelenko
5c80b0e4f8 Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes.
Differential revision: https://reviews.llvm.org/D23861

llvm-svn: 279695
2016-08-25 00:45:04 +00:00
Xinliang David Li
69ca8eb5f7 [Profile] Propagate branch metadata properly in instcombine
Differential Revision: http://reviews.llvm.org/D23590

llvm-svn: 279693
2016-08-25 00:26:32 +00:00
Kostya Serebryany
2374a83857 [libFuzzer] make a test more deterministic
llvm-svn: 279686
2016-08-24 23:10:17 +00:00
Sanjay Patel
f01c1dc0b6 [InstCombine] move foldICmpDivConstConst() contents to foldICmpDivConstant(); NFCI
There was no logic in foldICmpDivConstant, so no need for a separate function.
The code is directly copy/pasted, so further cleanups to follow.

llvm-svn: 279685
2016-08-24 23:03:36 +00:00
Evgeny Stupachenko
a52e9dab82 The patch improves ValueTracking on left shift with nsw flag.
Summary:
The patch fixes PR28946.

Reviewers: majnemer, sanjoy

Differential Revision: http://reviews.llvm.org/D23296

From: Li Huang
llvm-svn: 279684
2016-08-24 23:01:33 +00:00
Heejin Ahn
6c053cb727 [WebAssembly] Change a comment line
Test for commit access.

llvm-svn: 279683
2016-08-24 22:53:00 +00:00
Krzysztof Parzyszek
3484206164 [Hexagon] Check for block end when skipping debug instructions
llvm-svn: 279681
2016-08-24 22:36:35 +00:00
Matthias Braun
097d0e97f1 MIRParser/MIRPrinter: Compute HasInlineAsm instead of printing/parsing it
llvm-svn: 279680
2016-08-24 22:34:06 +00:00
Krzysztof Parzyszek
b0a1532712 [Hexagon] Change insertion of expand-condsets pass to avoid memory leaks
llvm-svn: 279678
2016-08-24 22:27:36 +00:00
Sanjay Patel
2f27429f46 [InstCombine] use m_APInt to allow icmp eq/ne (shr X, C2), C folds for splat constant vectors
llvm-svn: 279677
2016-08-24 22:22:06 +00:00
Matthias Braun
a42c8d848a MachineRegisterInfo/MIR: Initialize tracksSubRegLiveness early, do not print/parser it
tracksSubRegLiveness only depends on the Subtarget and a cl::opt, there
is not need to change it or save/parse it in a .mir file.
Make the field const and move the initialization LiveIntervalAnalysis to the
MachineRegisterInfo constructor. Also cleanup some code and fix some
instances which better use MachineRegisterInfo::subRegLivenessEnabled() instead
of TargetSubtargetInfo::enableSubRegLiveness().

llvm-svn: 279676
2016-08-24 22:17:45 +00:00
Kyle Butt
3fc7c8cb26 CodeGen: If Convert blocks that would form a diamond when tail-merged.
The following function currently relies on tail-merging for if
conversion to succeed. The common tail of cond_true and cond_false is
extracted, and this then forms a diamond pattern that can be
successfully if converted.

If this block does not get extracted, either because tail-merging is
disabled or the threshold is higher, we should still recognize this
pattern and if-convert it.

Fixed a regression in the original commit. Need to un-reverse branches after
reversing them, or other conversions go awry.

define i32 @t2(i32 %a, i32 %b) nounwind {
entry:
        %tmp1434 = icmp eq i32 %a, %b           ; <i1> [#uses=1]
        br i1 %tmp1434, label %bb17, label %bb.outer

bb.outer:               ; preds = %cond_false, %entry
        %b_addr.021.0.ph = phi i32 [ %b, %entry ], [ %tmp10, %cond_false ]
        %a_addr.026.0.ph = phi i32 [ %a, %entry ], [ %a_addr.026.0, %cond_false ]
        br label %bb

bb:             ; preds = %cond_true, %bb.outer
        %indvar = phi i32 [ 0, %bb.outer ], [ %indvar.next, %cond_true ]
        %tmp. = sub i32 0, %b_addr.021.0.ph
        %tmp.40 = mul i32 %indvar, %tmp.
        %a_addr.026.0 = add i32 %tmp.40, %a_addr.026.0.ph
        %tmp3 = icmp sgt i32 %a_addr.026.0, %b_addr.021.0.ph
        br i1 %tmp3, label %cond_true, label %cond_false

cond_true:              ; preds = %bb
        %tmp7 = sub i32 %a_addr.026.0, %b_addr.021.0.ph
        %tmp1437 = icmp eq i32 %tmp7, %b_addr.021.0.ph
        %indvar.next = add i32 %indvar, 1
        br i1 %tmp1437, label %bb17, label %bb

cond_false:             ; preds = %bb
        %tmp10 = sub i32 %b_addr.021.0.ph, %a_addr.026.0
        %tmp14 = icmp eq i32 %a_addr.026.0, %tmp10
        br i1 %tmp14, label %bb17, label %bb.outer

bb17:           ; preds = %cond_false, %cond_true, %entry
        %a_addr.026.1 = phi i32 [ %a, %entry ], [ %tmp7, %cond_true ], [ %a_addr.026.0, %cond_false ]
        ret i32 %a_addr.026.1
}

Without tail-merging or diamond-tail if conversion:
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ble     LBB1_3
@ BB#2:                                 @ %cond_true
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r0, r0, r1
        cmp     r1, r0
        it      ne
        cmpne   r0, r1
        bgt     LBB1_4
LBB1_3:                                 @ %cond_false
                                        @   in Loop: Header=BB1_1 Depth=1
        subs    r1, r1, r0
        cmp     r1, r0
        bne     LBB1_1
LBB1_4:                                 @ %bb17
        bx      lr

With diamond-tail if conversion, but without tail-merging:
@ BB#0:                                 @ %entry
        cmp     r0, r1
        it      eq
        bxeq    lr
LBB1_1:                                 @ %bb
                                        @ =>This Inner Loop Header: Depth=1
        cmp     r0, r1
        ite     le
        suble   r1, r1, r0
        subgt   r0, r0, r1
        cmp     r1, r0
        bne     LBB1_1
@ BB#2:                                 @ %bb17
        bx      lr

llvm-svn: 279671
2016-08-24 21:34:27 +00:00
Kyle Butt
2a20a3f170 IfConversion: Rescan diamonds.
The cost of predicating a diamond is only the instructions that are not shared
between the two branches. Additionally If a predicate clobbering instruction
occurs in the shared portion of the branches (e.g. a cond move), it may still
be possible to if convert the sub-cfg. This change handles these two facts by
rescanning the non-shared portion of a diamond sub-cfg to recalculate both the
predication cost and whether both blocks are pred-clobbering.

Fixed 2 bugs before recommitting. Branch instructions must be compared and found
identical before diamond conversion. Also, predicate-clobbering instructions in
the shared prefix disqualifies a potential diamond conversion. Includes tests
for both.

llvm-svn: 279670
2016-08-24 21:34:24 +00:00
Tim Northover
67e1ff026d ARM: don't diagnose cbz/cbnz to Thumb functions.
A branch-distance to a Thumb function shouldn't be forced to be odd for
CBZ/CBNZ instructions because (assuming it's within range), it's going to be a
valid, even offset.

llvm-svn: 279665
2016-08-24 21:21:29 +00:00
Changpeng Fang
351c66ab91 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Rafael Espindola
92d1fa90a3 Use isTargetMachO instead of isTargetDarwin.
llvm-svn: 279655
2016-08-24 19:02:29 +00:00
Simon Pilgrim
2b849153ad [X86][SSE] Add MINSD/MAXSD/MINSS/MAXSS intrinsic scalar load folding support
These are no different in load behaviour to the existing ADD/SUB/MUL/DIV scalar ops but were missing from isNonFoldablePartialRegisterLoad

llvm-svn: 279652
2016-08-24 18:40:53 +00:00
David Blaikie
22b8e86371 DebugInfo: Add flag to CU to disable emission of inline debug info into the skeleton CU
In cases where .dwo/.dwp files are guaranteed to be available, skipping
the extra online (in the .o file) inline info can save a substantial
amount of space - see the original r221306 for more details there.

llvm-svn: 279650
2016-08-24 18:29:49 +00:00
Matthew Simpson
e16dab59f1 [LV] Unify vector and scalar maps
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.

The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose along side getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.

The mapping has also be made less permissive. Entries can be added to
VectorLoopValue map with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.

There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.

Differential Revision: https://reviews.llvm.org/D23169

llvm-svn: 279649
2016-08-24 18:23:17 +00:00
Evandro Menezes
903b947a0b [AArch64] Adjust the feature set for Exynos M1.
Enable zero cycle zeroing.

llvm-svn: 279648
2016-08-24 18:17:30 +00:00
Sanjoy Das
08f56c4db3 [SCCP] Don't delete side-effecting instructions
I'm not sure if the `!isa<CallInst>(Inst) &&
!isa<TerminatorInst>(Inst))` bit is correct either, but this fixes the
case we know is broken.

llvm-svn: 279647
2016-08-24 18:10:21 +00:00
Simon Pilgrim
17526c99b1 [X86][SSE] Add support for combining VZEXT_MOVL target shuffles
Includes adding more general support for the pattern: VZEXT_MOVL(VZEXT_LOAD(ptr)) -> VZEXT_LOAD(ptr)

This has unearthed a couple of latent poor codegen issues (MINSS/MAXSS scalar load folding and MOVDDUP/BROADCAST load folding patterns), which will be fixed shortly.

Its also reduced a couple of tests so that they no longer reach the instruction threshold necessary to be combined to PSHUFB (see PR26183).

llvm-svn: 279646
2016-08-24 18:07:53 +00:00
Krzysztof Parzyszek
6bcd799b5d [Hexagon] Enable subregister liveness tracking
llvm-svn: 279642
2016-08-24 17:17:39 +00:00
Krzysztof Parzyszek
7299b24172 [Hexagon] Remove the utilization of IMPLICIT_DEFs from expand-condsets
This is no longer necessary, because since r279625 the subregister
liveness properly accounts for read-undefs.

llvm-svn: 279637
2016-08-24 16:36:37 +00:00
Nico Weber
3170ff6896 fix typo 'varaible' in assert
llvm-svn: 279636
2016-08-24 16:34:54 +00:00
Wei Ding
73862a15f7 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Sanjay Patel
556137fa8f [InstCombine] add assert and explanatory comment for fold removed in r279568; NFC
I deleted a fold from InstCombine at:
https://reviews.llvm.org/rL279568

because it (like any InstCombine to a constant?) should always happen in InstSimplify,
however, it's not obvious what the assumptions are in the remaining code.

Add a comment and assert to make it clearer.

Differential Revision: https://reviews.llvm.org/D23819

llvm-svn: 279626
2016-08-24 13:55:55 +00:00
Krzysztof Parzyszek
b56b4c886f Create subranges for new intervals resulting from live interval splitting
The register allocator can split a live interval of a register into a set
of smaller intervals. After the allocation of registers is complete, the
rewriter will modify the IR to replace virtual registers with the corres-
ponding physical registers. At this stage, if a register corresponding
to a subregister of a virtual register is used, the rewriter will check
if that subregister is undefined, and if so, it will add the <undef> flag
to the machine operand. The function verifying liveness of the subregis-
ter would assume that it is undefined, unless any of the subranges of the
live interval proves otherwise.
The problem is that the live intervals created during splitting do not
have any subranges, even if the original parent interval did. This could
result in the <undef> flag placed on a register that is actually defined.

Differential Revision: http://reviews.llvm.org/D21189

llvm-svn: 279625
2016-08-24 13:37:55 +00:00
Simon Dardis
a5ac225887 [mips] Preparatory work for a generic scheduler
Extend instruction definitions from nearly all ISAs to include
appropriate instruction itineraries. Change MIPS16s gp prologue
generation to use real instructions instead of using a pseudo
instruction.

Reviewers: dsanders, vkalintiris

Differential Review: https://reviews.llvm.org/D23548

llvm-svn: 279623
2016-08-24 13:00:47 +00:00
Simon Pilgrim
f39b229a31 [X86][AVX2] Ensure on 32-bit targets that we broadcast f64 types not i64 (PR29101)
llvm-svn: 279622
2016-08-24 12:42:31 +00:00
Gil Rapaport
06e1775b78 [Loop Vectorizer] Support predication of div/rem
div/rem instructions in basic blocks that require predication currently prevent
vectorization. This patch extends the existing mechanism for predicating stores
to handle other instructions and leverages it to predicate divs and rems.

Differential Revision: https://reviews.llvm.org/D22918

llvm-svn: 279620
2016-08-24 11:37:57 +00:00
Simon Pilgrim
f8172f3de2 [X86][SSE] Add support for 32-bit element vectors to X86ISD::VZEXT_LOAD
Consecutive load matching (EltsFromConsecutiveLoads) currently uses VZEXT_LOAD (load scalar into lowest element and zero uppers) for vXi64 / vXf64 vectors only.

For vXi32 / vXf32 vectors it instead creates a scalar load, SCALAR_TO_VECTOR and finally VZEXT_MOVL (zero upper vector elements), relying on tablegen patterns to match this into an equivalent of VZEXT_LOAD.

This patch adds the VZEXT_LOAD patterns for vXi32 / vXf32 vectors directly and updates EltsFromConsecutiveLoads to use this.

This has proven necessary to allow us to easily make VZEXT_MOVL a full member of the target shuffle set - without this change the call to combineShuffle (which is the main caller of EltsFromConsecutiveLoads) tended to recursively recreate VZEXT_MOVL nodes......

Differential Revision: https://reviews.llvm.org/D23673

llvm-svn: 279619
2016-08-24 10:46:40 +00:00
Chandler Carruth
94942f5c00 [PM] Introduce basic update capabilities to the new PM's CGSCC pass
manager, including both plumbing and logic to handle function pass
updates.

There are three fundamentally tied changes here:
1) Plumbing *some* mechanism for updating the CGSCC pass manager as the
   CG changes while passes are running.
2) Changing the CGSCC pass manager infrastructure to have support for
   the underlying graph to mutate mid-pass run.
3) Actually updating the CG after function passes run.

I can separate them if necessary, but I think its really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.

The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC passes's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.

I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.

The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:

- We operate at three levels within the infrastructure: RefSCC, SCC, and
  Node. In each case, we are working bottom up and so we want to
  continue to iterate on the "lowest" node as the graph changes. Look at
  how we iterate over nodes in an SCC running function passes as those
  function passes mutate the CG. We continue to iterate on the "lowest"
  SCC, which is the one that continues to contain the function just
  processed.

- The call graph structure re-uses SCCs (and RefSCCs) during mutation
  events for the *highest* entry in the resulting new subgraph, not the
  lowest. This means that it is necessary to continually update the
  current SCC or RefSCC as it shifts. This is really surprising and
  subtle, and took a long time for me to work out. I actually tried
  changing the call graph to provide the opposite behavior, and it
  breaks *EVERYTHING*. The graph update algorithms are really deeply
  tied to this particualr pattern.

- When SCCs or RefSCCs are split apart and refined and we continually
  re-pin our processing to the bottom one in the subgraph, we need to
  enqueue the newly formed SCCs and RefSCCs for subsequent processing.
  Queuing them presents a few challenges:
  1) SCCs and RefSCCs use wildly different iteration strategies at
     a high level. We end up needing to converge them on worklist
     approaches that can be extended in order to be able to handle the
     mutations.
  2) The order of the enqueuing need to remain bottom-up post-order so
     that we don't get surprising order of visitation for things like
     the inliner.
  3) We need the worklists to have set semantics so we don't duplicate
     things endlessly. We don't need a *persistent* set though because
     we always keep processing the bottom node!!!! This is super, super
     surprising to me and took a long time to convince myself this is
     correct, but I'm pretty sure it is... Once we sink down to the
     bottom node, we can't re-split out the same node in any way, and
     the postorder of the current queue is fixed and unchanging.
  4) We need to make sure that the "current" SCC or RefSCC actually gets
     enqueued here such that we re-visit it because we continue
     processing a *new*, *bottom* SCC/RefSCC.

- We also need the ability to *skip* SCCs and RefSCCs that get merged
  into a larger component. We even need the ability to skip *nodes* from
  an SCC that are no longer part of that SCC.

This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.

We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.

Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:

- It is really nice to do this a function at a time because that
  function is likely hot in the cache. This means we want even the
  function pass adaptor to support online updates to the call graph!

- To update the call graph after arbitrary function pass mutations is
  quite hard. We have to build a fairly comprehensive set of
  data structures and then process them. Fortunately, some of this code
  is related to the code for building the cal graph in the first place.
  Unfortunately, very little of it makes any sense to share because the
  nature of what we're doing is so very different. I've factored out the
  one part that made sense at least.

- We need to transfer these updates into the various structures for the
  CGSCC pass manager. Once those were more sanely worked out, this
  became relatively easier. But some of those needs necessitated changes
  to the LazyCallGraph interface to make it significantly easier to
  extract the changed SCCs from an update operation.

- We also need to update the CGSCC analysis manager as the shape of the
  graph changes. When an SCC is merged away we need to clear analyses
  associated with it from the analysis manager which we didn't have
  support for in the analysis manager infrsatructure. New SCCs are easy!
  But then we have the case that the original SCC has its shape changed
  but remains in the call graph. There we need to *invalidate* the
  analyses associated with it.

- We also need to invalidate analyses after we *finish* processing an
  SCC. But the analyses we need to invalidate here are *only those for
  the newly updated SCC*!!! Because we only continue processing the
  bottom SCC, if we split SCCs apart the original one gets invalidated
  once when its shape changes and is not processed farther so its
  analyses will be correct. It is the bottom SCC which continues being
  processed and needs to have the "normal" invalidation done based on
  the preserved analyses set.

All of this is mostly background and context for the changes here.

Many thanks to all the reviewers who helped here. Especially Sanjoy who
caught several interesting bugs in the graph algorithms, David, Sean,
and others who all helped with feedback.

Differential Revision: http://reviews.llvm.org/D21464

llvm-svn: 279618
2016-08-24 09:37:14 +00:00
Gor Nishanov
0201e08361 [Coroutines] Fix unused var warning in release build
llvm-svn: 279610
2016-08-24 05:20:30 +00:00
Gor Nishanov
2a69048889 [Coroutines] Part 8: Coroutine Frame Building algorithm
Summary:
This patch adds coroutine frame building algorithm. Now, simple coroutines such as ex0.ll and ex1.ll (first examples from docs\Coroutines.rst can be compiled).

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
...

7. Split coroutine into subfunctions. (https://reviews.llvm.org/D23461)
8. Coroutine Frame Building algorithm  <= we are here
9. Add f.cleanup subfunction.
10+. The rest of the logic

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23586

llvm-svn: 279609
2016-08-24 04:44:35 +00:00
Chandler Carruth
3595c24e57 Preserve a pointer to the newly allocated signal stack as well. That too
is flagged by LSan at least among leak detectors.

llvm-svn: 279605
2016-08-24 03:42:51 +00:00
Matthias Braun
d234079959 TargetSchedule: Do not consider subregister definitions as reads.
We should not consider subregister definitions as reads for schedule
model purposes (they are just modeled as reads of the overal vreg for
liveness calculation purposes, the CPU instructions are not actually
reading).

Unfortunately I cannot submit a test for this as it requires a target
which uses ReadAdvance annotation in the scheduling model and has
subregister liveness enabled at the same time, which is only the case on
an out of tree target.

llvm-svn: 279604
2016-08-24 02:32:29 +00:00
Matthias Braun
f96b4d234c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Kostya Serebryany
ed73edee45 [libFuzzer] use __attribute__((target("popcnt"))) only on x86_64
llvm-svn: 279601
2016-08-24 01:38:42 +00:00
Matthias Braun
7b825391db MIRParser/MIRPrinter: Compute isSSA instead of printing/parsing it.
Specifying isSSA is an extra line at best and results in invalid MI at
worst. Compute the value instead.

Differential Revision: http://reviews.llvm.org/D22722

llvm-svn: 279600
2016-08-24 01:32:41 +00:00
Richard Smith
0b5fef23d4 Increase the size of the sigaltstack used by LLVM signal handlers. 8KB is not
sufficient in some cases; increase to 64KB, which should be enough for anyone :)

Patch by github.com/bryant!

llvm-svn: 279599
2016-08-24 00:54:49 +00:00
Matthias Braun
13d789ce38 MachineModuleInfo: Avoid dummy constructor, use INITIALIZE_TM_PASS
Change this pass constructor to just accept a const TargetMachine * and
use INITIALIZE_TM_PASS, that way we can get rid of the dummy
constructor. The pass will still fail when calling the default
constructor leading to TM == nullptr, this is no different than before
but is more in line what other codegen passes are doing and avoids the
dummy constructor.

llvm-svn: 279598
2016-08-24 00:42:05 +00:00
David Callahan
919a213064 [ADCE] Add control dependence computation
Summary:
This is part of a serious of patches to evolve ADCE.cpp to support
removing of unnecessary control flow.

This patch adds the ability to compute control dependences using
the iterated dominance frontier. We extend the liveness propagation
to alternate between data and control dependences until convergences.

Modify the pass manager intergation to compute the post-dominator tree
needed for iterator dominance frontier.

We still force all terminators live for now until we add code to
handlinge removing control flow in a later patch.

No changes to effective behavior with this patch

Previous patches:

D23225 [ADCE] Modify data structures to support removing control flow
D23065 [ADCE] Refactor anticipating new functionality (NFC)
D23102 [ADCE] Refactoring for new functionality (NFC)

Reviewers: nadav, majnemer, mehdi_amini

Subscribers: twoh, freik, llvm-commits

Differential Revision: https://reviews.llvm.org/D23559

llvm-svn: 279594
2016-08-24 00:10:06 +00:00
Philip Reames
e0cd757614 [stackmaps] Remove an unneeded member variable [NFC]
llvm-svn: 279590
2016-08-23 23:58:08 +00:00
Kostya Serebryany
b0ba8a2254 [libFuzzer] collect 64 states for value profile, not 65
llvm-svn: 279588
2016-08-23 23:37:37 +00:00
Philip Reames
9893373aa7 [stackmaps] More extraction of common code [NFCI]
General cleanup before starting to work on the part I want to actually change.

llvm-svn: 279586
2016-08-23 23:33:29 +00:00
Michael Zolotukhin
8f5cdd38cc [LoopUnroll] By default disable unrolling when optimizing for size.
Summary:
In clang commit r268509 we started to invoke loop-unroll pass from the
driver even under -Os. However, we happen to not initialize optsize
thresholds properly, which si fixed with this change.

r268509 led to some big compile time regressions, because we started to
unroll some loops that we didn't unroll before. With this change I hope
to recover most of the regressions. We still are slightly slower than
before, because we do some checks here and there in loop-unrolling
before we bail out, but at least the slowdown is not that huge now.

Reviewers: hfinkel, chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23388

llvm-svn: 279585
2016-08-23 23:13:15 +00:00
Richard Smith
5a63bc1b55 Don't use "return {...}" to initialize a std::tuple. This has only been valid
since 2015 (n4387), though it's allowed by a library DR so new implementations
accept it in their C++11 modes...

This should unbreak the build with libstdc++ 4.9.

llvm-svn: 279583
2016-08-23 22:21:58 +00:00
Richard Smith
c4bb029b55 #ifdef out validation code when asserts are disabled to remove unused variable
warnings.

llvm-svn: 279582
2016-08-23 22:14:15 +00:00
Richard Smith
785a4eccfd Remove unused data member to unbreak -Werror builds.
llvm-svn: 279581
2016-08-23 22:10:46 +00:00
Richard Smith
d13903d090 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Sanjay Patel
c44c06d7a3 [InstCombine] use local variables for repeated values; NFCI
llvm-svn: 279578
2016-08-23 22:05:55 +00:00
Petr Hosek
46d79cf83a [MC] Support .dc directives in assembler parser
While these directives are mostly aliases for the existing integer
and float value directives, some of them like .dc.a have no direct
equivalents and are sometimes being used for convenience.

Differential Revision: https://reviews.llvm.org/D23810

llvm-svn: 279577
2016-08-23 21:34:53 +00:00
Mehdi Amini
54cea79c7b [ThinLTO] Add caching to the new LTO API
Add the ability to plug a cache on the LTO API.
I tried to write such that a linker implementation can
control the cache backend. This is intrusive and I'm
not totally happy with it, but I can't figure out a
better design right now.

Differential Revision: https://reviews.llvm.org/D23599

llvm-svn: 279576
2016-08-23 21:30:12 +00:00
Sanjay Patel
fd8c14ece2 [InstCombine] move foldICmpShrConstConst() contents to foldICmpShrConst(); NFCI
There will only be 3 lines of code in foldICmpShrConst() when the cleanup is done,
so it doesn't make much sense to have a separate function for a single fold.

llvm-svn: 279575
2016-08-23 21:25:13 +00:00
Philip Reames
4d9558b08e [stackmaps] Extract out magic constants [NFCI]
This is a first step towards clarifying the exact MI semantics of stackmap's "live values".  

llvm-svn: 279574
2016-08-23 21:21:43 +00:00
Matthias Braun
c10c0e4ef4 MachineFunction: Introduce NoPHIs property
I want to compute the SSA property of .mir files automatically in
upcoming patches. The problem with this is that some inputs will be
reported as static single assignment with some passes claiming not to
support SSA form.  In reality though those passes do not support PHI
instructions => Track the presence of PHI instructions separate from the
SSA property.

Differential Revision: https://reviews.llvm.org/D22719

llvm-svn: 279573
2016-08-23 21:19:49 +00:00
Sanjay Patel
f52d2911c1 [InstCombine] remove icmp shr folds that are already handled by InstSimplify
AFAICT, these already worked in all cases for scalar types, and I enhanced
the code to work for vector types in:
https://reviews.llvm.org/rL279543

llvm-svn: 279568
2016-08-23 21:01:35 +00:00
Tim Northover
274ad76abc GlobalISel: make truncate/extend casts uniform
They really should have both types represented, but early variants were created
before MachineInstrs could have multiple types so they're rather ambiguous.

llvm-svn: 279567
2016-08-23 21:01:33 +00:00
Tim Northover
3991ec90ca GlobalISel: legalize integer comparisons on AArch64.
Next step is doing both legalizations at the same time! Marvel at GlobalISel's
cunning.

llvm-svn: 279566
2016-08-23 21:01:26 +00:00
Tim Northover
c3b004b87b GlobalISel: legalize conditional branches on AArch64.
llvm-svn: 279565
2016-08-23 21:01:20 +00:00
Matthias Braun
d483fd6c76 CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
David Majnemer
70561a7ec3 [ValueTracking] Use a function_ref to avoid multiple instantiations
No functional change intended, this should just be a code size
improvement.

llvm-svn: 279563
2016-08-23 20:52:00 +00:00
Matthew Simpson
b74ea1ef12 [SLP] Avoid signed integer overflow
The test case included with r279125 exposed an existing signed integer
overflow. Since getTreeCost can return INT_MAX, we can't sum this cost together
with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we
were previously using INT_MAX as an indicator for "should not vectorize", we
now explicitly check this condition with "isTreeTinyAndNotFullyVectorizable"
before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we
don't vectorize. Previously, this line would vectorize the test case by chance
due to undefined behavior in the cost calculation.

Differential Revision: https://reviews.llvm.org/D23723

llvm-svn: 279562
2016-08-23 20:48:50 +00:00
Zachary Turner
585b950a97 Remove unused translation unit.
llvm-svn: 279561
2016-08-23 20:08:02 +00:00
Tim Northover
38c89ee55c GlobalISel: extend legalizer interface to handle multiple types.
Instructions like G_ICMP have multiple types that may need to be legalized (the
boolean output and nearly arbitrary inputs in this case). So the legalizer must
be capable of deciding what to do for each of them separately.

llvm-svn: 279554
2016-08-23 19:30:42 +00:00
Tim Northover
8eff7e11dc GlobalISel: mark pointer casts legal on AArch64.
llvm-svn: 279553
2016-08-23 19:30:38 +00:00
Mehdi Amini
1a6d8014ee Stop always creating and running an LTO compilation if there is not a single LTO object
Summary:
I assume there was a use case, so maybe this strawman patch will help
clarifying if it is legit.
In any case the current situation is not legit: a ThinLTO compilation
should not trigger an unexpected full LTO compilation.
Right now, adding a --save-temps option triggers this and makes the
number of output differs.

Reviewers: tejohnson

Subscribers: pcc, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23600

llvm-svn: 279550
2016-08-23 18:39:12 +00:00
Tim Northover
685dd8eded GlobalISel: legalize 1-bit load/store and mark 8/16 bit variants legal on AArch64.
llvm-svn: 279548
2016-08-23 18:20:09 +00:00
Sanjay Patel
af4e2d5037 [InstSimplify] allow icmp with constant folds for splat vectors, part 2
Completes the m_APInt changes for simplifyICmpWithConstant().

Other commits in this series:
https://reviews.llvm.org/rL279492
https://reviews.llvm.org/rL279530
https://reviews.llvm.org/rL279534
https://reviews.llvm.org/rL279538

llvm-svn: 279543
2016-08-23 18:00:51 +00:00
Xinliang David Li
4ee46b690b Possible fix of test failures on win bots
llvm-svn: 279542
2016-08-23 18:00:41 +00:00
Sanjay Patel
16b202aa5c [InstSimplify] allow icmp with constant folds for splat vectors, part 1
llvm-svn: 279538
2016-08-23 17:30:56 +00:00
Justin Lebar
75ed28e4bb [SelectionDAG] Use a union of bitfield structs for SDNode::SubclassData.
Summary:
This greatly simplifies our handling of SDNode::SubclassData.

NFC, hopefully.  :)

See discussion in D23035 for discussion about the design API of these
bitfields.

Reviewers: chandlerc

Subscribers: llvm-commits, rnk

Differential Revision: https://reviews.llvm.org/D23036

llvm-svn: 279537
2016-08-23 17:18:11 +00:00
Justin Lebar
33d12a9ade [CodeGen] Convert a loop to a for-each loop. NFC
llvm-svn: 279536
2016-08-23 17:18:07 +00:00
Eugene Zelenko
e710ddeef7 Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes.
Differential revision: https://reviews.llvm.org/D23789

llvm-svn: 279535
2016-08-23 17:14:32 +00:00
Mehdi Amini
6dfb67d640 [ThinLTO] Make sure the Context used for the ThinLTO backend has all the appropriate options
An important performance setting on the LLVMContext for LTO is
enableDebugTypeODRUniquing(), this adds an automatic merging of
debug information in the context based on type ids.

Also, the lto::Config includes a diagnostic handler that needs to
be set on the Context, as well as the setDiscardValueNames() setting.

llvm-svn: 279532
2016-08-23 16:53:34 +00:00
Pete Cooper
819dc8b79a Fix some more asserts after r279466.
That commit added a new version of Intrinsic::getName which should only
be called when the intrinsic has no overloaded types.  There are several
debugging paths, such as SDNode::dump which are printing the name of the
intrinsic but don't have the overloaded types.  These paths should be ok
to just print the name instead of crashing.

The fix here is ultimately to just add a 'None' second argument as that
calls the overload capable getName, which is less efficient, but this is a
debugging path anyway, and not perf critical.

Thanks to Björn Pettersson for pointing out that there were more crashes.

llvm-svn: 279528
2016-08-23 16:23:45 +00:00
Krzysztof Parzyszek
e2ff2bde18 [Hexagon] Packetize return value setup with the return instruction
Commit r279241 unintentionally reverted that ability.

llvm-svn: 279526
2016-08-23 16:01:01 +00:00
Xinliang David Li
df1f2a779a [Profile] refactor meta data copying/swapping code
Differential Revision: http://reviews.llvm.org/D23619

llvm-svn: 279523
2016-08-23 15:39:03 +00:00
Jacques Pienaar
c6f0b33662 [lanai] Use const instead of constexpr
The windows build bot did not like constexpr.

llvm-svn: 279517
2016-08-23 14:36:53 +00:00
Elliot Colp
5de866bfba Fix SystemZ hang caused by r279105
The change in r279105 causes an infinite loop in some cases, as it sets the upper bits of an AND mask constant, which DAGCombiner::SimplifyDemandedBits then unsets.
This patch reverts that part of the behaviour, instead relying on .td peepholes to perform the transformation to NILL. I reapplied my original fix for the problem addressed by r279105 (unsetting the upper bits, which prevents a compiler abort for a different reason).

Differential Revision: https://reviews.llvm.org/D23781

llvm-svn: 279515
2016-08-23 14:03:02 +00:00
Davide Italiano
7033df4633 [LTOCodeGenerator] Reduce code duplication. NFCI.
llvm-svn: 279514
2016-08-23 12:32:57 +00:00
NAKAMURA Takumi
7b5eac14fa LLVMLanaDesc: Update libdesp.
llvm-svn: 279510
2016-08-23 10:47:40 +00:00
NAKAMURA Takumi
c45aaefa81 Change the target's name, s/LanaiMCTargetDesc/LanaiDesc/g.
"AllTargetsDescs" in llvm-mc/CMakeLists.txt expects not ${target}MCTargetDesc, but ${target}Desc.

llvm-svn: 279509
2016-08-23 10:43:01 +00:00
Oliver Stannard
930aaa18e9 [ARM] Generate consistent frame records for Thumb2
There is not an official documented ABI for frame pointers in Thumb2,
but we should try to emit something which is useful.

We use r7 as the frame pointer for Thumb code, which currently means
that if a function needs to save a high register (r8-r11), it will get
pushed to the stack between the frame pointer (r7) and link register
(r14). This means that while a stack unwinder can follow the chain of
frame pointers up the stack, it cannot know the offset to lr, so does
not know which functions correspond to the stack frames.

To fix this, we need to push the callee-saved registers in two batches,
with the first push saving the low registers, fp and lr, and the second
push saving the high registers. This is already implemented, but
previously only used for iOS. This patch turns it on for all Thumb2
targets when frame pointers are required by the ABI, and the frame
pointer is r7 (Windows uses r11, so this isn't a problem there). If
frame pointer elimination is enabled we still emit a single push/pop
even if we need a frame pointer for other reasons, to avoid increasing
code size.

We must also ensure that lr is pushed to the stack when using a frame
pointer, so that we end up with a complete frame record. Situations that
could cause this were rare, because we already push lr in most
situations so that we can return using the pop instruction.

Differential Revision: https://reviews.llvm.org/D23516

llvm-svn: 279506
2016-08-23 09:19:22 +00:00
Daniel Berlin
4e0dc3166c GVNHoist: Use the pass version of MemorySSA and preserve it.
Summary: GVNHoist: Use the pass version of MemorySSA and preserve it.

Reviewers: sebpop, george.burgess.iv

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23782

llvm-svn: 279504
2016-08-23 05:42:41 +00:00
Matthias Braun
9b8c833657 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun
8a769f61fb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Matt Arsenault
8bf97ed2ca BranchRelaxation: Fix handling of blocks with multiple conditional
branches

Looping over all terminators exposed AArch64 tests hitting
an assert from analyzeBranch failing. I believe these cases
were miscompiled before.

e.g.
  fcmp s0, s1
  b.ne LBB0_1
  b.vc LBB0_2
  b LBB0_2
LBB0_1:
  ; Large block
LBB0_2:
 ; ...

Both of the individual conditional branches need to
be expanded, since neither can reach the final block.

Split the original block into ones which analyzeBranch
will be able to understand.

llvm-svn: 279499
2016-08-23 01:30:30 +00:00
Jacques Pienaar
0caec29fcd [lanai] Exit early in Mem Alu combiner if sentinel reach.
LanaiMemAluCombiner could try to query the debug value of a list sentinel. Add check to exit early instead.

llvm-svn: 279497
2016-08-23 01:04:41 +00:00
George Burgess IV
03d63961c6 [MemorySSA] Remove unused field. NFC.
Given that we're not currently using blocker info, and whether or not we
will end up using it it is unclear, don't waste 8 (or 4) bytes of memory
per path node.

llvm-svn: 279493
2016-08-22 23:40:01 +00:00
Sanjay Patel
9f87f9ae09 [InstSimplify] add helper function for SimplifyICmpInst(); NFCI
And add a FIXME because the helper excludes folds for vectors. It's
not clear yet how many of these are actually testable (and therefore
necessary?) because later analysis uses computeKnownBits and other
methods to catch many of these cases.

llvm-svn: 279492
2016-08-22 23:12:02 +00:00
Pete Cooper
0c1334c4c9 Fix crash from assert in r279466.
The assert in r279466 checks that we call the correct version of
Intrinsic::getName.  The version which accepts only an ID should not
be used for intrinsics with overloaded types.  The global-isel
code was calling the wrong version.  The test CodeGen/AArch64/GlobalISel/arm64-irtranslator.ll
will ensure that we call the correct version from now on.

llvm-svn: 279487
2016-08-22 22:27:05 +00:00
Sanjay Patel
27f786ba44 [InstCombine] change param type from Instruction to BinaryOperator for icmp helpers; NFCI
This saves some casting in the helper functions and eases some further refactoring.

llvm-svn: 279478
2016-08-22 21:24:29 +00:00
Tim Shen
33e4d80307 [GraphTraits] Replace all NodeType usage with NodeRef
This should finish the GraphTraits migration.

Differential Revision: http://reviews.llvm.org/D23730

llvm-svn: 279475
2016-08-22 21:09:30 +00:00
Duncan P. N. Exon Smith
68b6058de5 ADT: Remove ilist_*sentinel_traits, NFC
Remove all the dead code around ilist_*sentinel_traits.  This is a
follow-up to gutting them as part of r279314 (originally r278974),
staged to prevent broken builds in sub-projects.

Uses were removed from clang in r279457 and lld in r279458.

llvm-svn: 279473
2016-08-22 20:51:00 +00:00
Sanjay Patel
1634b27769 [InstCombine] use m_APInt to allow icmp (shr exact X, Y), 0 folds for splat constant vectors
llvm-svn: 279472
2016-08-22 20:45:06 +00:00
Pete Cooper
6715a28437 Add ADT headers to the cmake headers directory for LLVMSupport. NFC.
Xcode and MSVC list the headers and source files for each library.

LLVMSupport lists included the source files for ADT but not the headers.  This
add the ADT headers so that they are browsable by the UI.

llvm-svn: 279470
2016-08-22 20:38:53 +00:00
Pete Cooper
27f0062b01 Add comments and an assert to follow-up on r279113. NFC.
Philip commented on r279113 to ask for better comments as to
when to use the different versions of getName.  Its also possible
to assert in the simple case that we aren't an overloaded intrinsic
as those have to use the more capable version of getName.

Thanks for the comments Philip.

llvm-svn: 279466
2016-08-22 20:18:28 +00:00
Matt Arsenault
321978a22d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
Daniel Berlin
0d6d72f476 MSSA: Factor out phi node placement
llvm-svn: 279462
2016-08-22 19:14:30 +00:00
Daniel Berlin
56831e2edc MSSA: Only rename accesses whose defining access is nullptr
llvm-svn: 279461
2016-08-22 19:14:16 +00:00
James Molloy
b2a8710b5c [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
James Molloy
d2a1a41c55 [SROA] Remove incorrect assertion
Confirmed with aprantl, this assertion is incorrect - code can get here (for example 80-bit FP types) and if it does it's benign. This is exposed by a completely unrelated patch of mine, so stop the compiler falling over.

Original differential: http://reviews.llvm.org/D16187
aprantl's advice to remove assertion: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160815/382129.html

llvm-svn: 279454
2016-08-22 18:49:42 +00:00
Tim Shen
d19ed3317f [SSP] Do not set __guard_local to hidden for OpenBSD SSP
__guard_local is defined as long on OpenBSD. If the source file contains
a definition of __guard_local, it mismatches with the int8 pointer type
used in LLVM. In that case, Module::getOrInsertGlobal() returns a
cast operation instead of a GlobalVariable. Trying to set the
visibility on the cast operation leads to random segfaults (seen when
compiling the OpenBSD kernel, which also runs with stack protection).

In the kernel, the hidden attribute does not matter. For userspace code,
__guard_local is defined as hidden in the startup code. If a program
re-defines __guard_local, the definition from the startup code will
either win or the linker complains about multiple definitions
(depending on whether the re-defined __guard_local is placed in the
common segment or not).

It also matches what gcc on OpenBSD does.

Thanks Stefan Kempf <sisnkemp@gmail.com> for the patch!

Differential Revision: http://reviews.llvm.org/D23674

llvm-svn: 279449
2016-08-22 18:26:27 +00:00
Jun Bum Lim
3135fa5afa [InstCombine] Allow sinking from unique predecessor with multiple edges
Summary: We can allow sinking if the single user block has only one unique predecessor, regardless of the number of edges. Note that a switch statement with multiple cases can have the same destination.

Reviewers: mcrosier, majnemer, spatel, reames

Subscribers: reames, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23722

llvm-svn: 279448
2016-08-22 18:21:56 +00:00
James Molloy
b575e6cf41 Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy
d99f6d6d8b [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Simon Pilgrim
d0c67378d9 [X86][AVX] Don't use SubVectorBroadcast if there are additional users of the chain (PR29088)
We could improve on this by making X86SubVBroadcast a full memory intrinsic similar to X86vzload

llvm-svn: 279441
2016-08-22 16:47:55 +00:00
Simon Atanasyan
53bc9e3773 [mips][ias] Support .dtprel[d]word and .tprel[d]word directives
Assembler directives .dtprelword, .dtpreldword, .tprelword, and
.tpreldword generates relocations R_MIPS_TLS_DTPREL32, R_MIPS_TLS_DTPREL64,
R_MIPS_TLS_TPREL32, and R_MIPS_TLS_TPREL64 respectively.

The main motivation for this patch is to be able to write test cases
for checking correctness of the LLD linker's behaviour.

Differential Revision: https://reviews.llvm.org/D23669

llvm-svn: 279439
2016-08-22 16:18:42 +00:00
Mehdi Amini
91a6c36524 [LTO] Constify the Module Hook function (NFC)
It use to be non-const for the sole purpose of custom handling of
commons symbol. This is moved now in the regular LTO handling now
and such we can constify the callback.

llvm-svn: 279438
2016-08-22 16:17:40 +00:00
Krzysztof Parzyszek
8a69174992 Reset isUndef when removing subreg from a def operand
llvm-svn: 279437
2016-08-22 14:50:12 +00:00
Simon Pilgrim
dc666729a5 [X86] Only accept SM_SentinelUndef (-1) as an undefined shuffle mask in range
As discussed on D23027 we should be trying to be more strict on what is an undefined mask value.

llvm-svn: 279435
2016-08-22 13:18:56 +00:00
Artur Pilipenko
7913289c6e Revert -r278267 [ValueTracking] An improvement to IR ValueTracking on Non-negative Integers
This change cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See https://reviews.llvm.org/D18777 for details.

llvm-svn: 279433
2016-08-22 13:14:07 +00:00
Artur Pilipenko
c66cf988a6 Revert -r278269 [IndVarSimplify] Eliminate zext of a signed IV when the IV is known to be non-negative
This change needs to be reverted in order to revert -r278267 which cause performance regression on MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt from LNT and some other bechmarks.

See comments on https://reviews.llvm.org/D18777 for details.

llvm-svn: 279432
2016-08-22 13:12:07 +00:00
Simon Pilgrim
5184b1aafe [X86][SSE] Avoid specifying unused arguments in SHUFPD lowering
As discussed on PR26491, we are missing the opportunity to make use of the smaller MOVHLPS instruction because we set both arguments of a SHUFPD when using it to lower a single input shuffle.

This patch sets the lowered argument to UNDEF if that shuffle element is undefined. This in turn makes it easier for target shuffle combining to decode UNDEF shuffle elements, allowing combines to MOVHLPS to occur.

A fix to match against MOVHPD stores was necessary as well.

This builds on the improved MOVLHPS/MOVHLPS lowering and memory folding support added in D16956

Adding similar support for SHUFPS will have to wait until have better support for target combining of binary shuffles.

Differential Revision: https://reviews.llvm.org/D23027

llvm-svn: 279430
2016-08-22 12:56:54 +00:00
Hrvoje Varga
8d40d14d1e [mips][microMIPS] Implement BLTZC, BLEZC, BGEZC and BGTZC instructions, fix disassembly and add operand checking to existing B<cond>C implementations
Differential Revision: https://reviews.llvm.org/D22667

llvm-svn: 279429
2016-08-22 12:17:59 +00:00
Davide Italiano
63fca85488 [MC] Remove guard(s). NFCI.
All the methods are already marked with
LLVM_DUMP_METHOD.

llvm-svn: 279428
2016-08-22 11:55:22 +00:00
Craig Topper
21e185006b [X86] Create a new instruction format to handle 4VOp3 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279424
2016-08-22 07:38:50 +00:00