1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 10:32:48 +02:00
Commit Graph

74 Commits

Author SHA1 Message Date
Stephen Tozer
27b2ac2965 [DebugInfo] Correctly update debug users of SSA values in tail duplication
During tail duplication, SSA values may be updated and have their uses
replaced with a virtual register, and any debug instructions that use
that value are deleted. This patch fixes the implementation of the debug
instruction deletion to work correctly for debug instructions that use
the SSA value multiple times, by batching deletions so that we don't
attempt to delete the same instruction twice.

Differential Revision: https://reviews.llvm.org/D106557
2021-07-26 17:27:57 +01:00
Hongtao Yu
7fbb587058 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Hongtao Yu
f2b87eabed [CSSPGO] Unblocking optimizations by dangling pseudo probes.
This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

	1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
	2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
	3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D97481
2021-03-03 22:44:42 -08:00
Kazu Hirata
63f16fef3a [CodeGen] Use range-based for loops (NFC) 2021-02-19 22:44:14 -08:00
Kazu Hirata
183de26ac5 [CodeGen] Construct SmallVector with iterator ranges (NFC) 2020-12-31 09:39:11 -08:00
Bill Wendling
0be97dba3f [CodeGen][TailDuplicator] Don't duplicate blocks with INLINEASM_BR
Tail duplication of a block with an INLINEASM_BR may result in a PHI
node on the indirect branch. This is okay, but it also introduces a copy
for that PHI node *after* the INLINEASM_BR, which is not okay.

See: https://github.com/ClangBuiltLinux/linux/issues/1125

Differential Revision: https://reviews.llvm.org/D88823
2020-10-06 18:44:59 -07:00
James Y Knight
af0734bc33 Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Matt Arsenault
692c52cacb TailDuplicator: Use Register 2020-06-30 12:13:08 -04:00
James Y Knight
f92fad214d MachineBasicBlock::updateTerminator now requires an explicit layout successor.
Previously, it tried to infer the correct destination block from the
successor list, but this is a rather tricky propspect, given the
existence of successors that occur mid-block, such as invoke, and
potentially in the future, callbr/INLINEASM_BR. (INLINEASM_BR, in
particular would be problematic, because its successor blocks are not
distinct from "normal" successors, as EHPads are.)

Instead, require the caller to pass in the expected fallthrough
successor explicitly. In most callers, the correct block is
immediately clear. But, in MachineBlockPlacement, we do need to record
the original ordering, before starting to reorder blocks.

Unfortunately, the goal of decoupling the behavior of end-of-block
jumps from the successor list has not been fully accomplished in this
patch, as there is currently no other way to determine whether a block
is intended to fall-through, or end as unreachable. Further work is
needed there.

Differential Revision: https://reviews.llvm.org/D79605
2020-06-06 22:30:51 -04:00
Djordje Todorovic
ca5f7d0a2f [CallSiteInfo] Handle bundles when updating call site info
This will address the issue: P8198 and P8199 (from D73534).

The methods was not handle bundles properly.

Differential Revision: https://reviews.llvm.org/D74904
2020-02-27 13:57:06 +01:00
Djordje Todorovic
2799d4faca [CSInfo][TailDuplicator] Delete the call site info when removing dead MBBs
This is needed for the debug entry values feature.

Differential Revision: https://reviews.llvm.org/D74702
2020-02-18 12:29:51 +01:00
Guozhi Wei
af1f058dcb [MBP] Partial tail duplication into hot predecessors
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.

This patch improves tail duplication in 3 aspects:

  A successor can be duplicated into part of its predecessors.
  A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only.
  If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.

Differential Revision: https://reviews.llvm.org/D73387
2020-02-12 15:22:33 -08:00
Hiroshi Yamauchi
ae6be9df60 [PGO][PGSO] Handle MBFIWrapper
Some code gen passes use MBFIWrapper to keep track of the frequency of new
blocks. This was not taken into account and could lead to incorrect frequencies
as MBFI silently returns zero frequency for unknown/new blocks.

Add a variant for MBFIWrapper in the PGSO query interface.

Depends on D73494.
2020-01-31 09:36:55 -08:00
Stanislav Mekhanoshin
33946d17b3 Process BUNDLE in tail duplication
When tail duplication estimates a size of tail it uses instruction
count. Account for a number of instrictions in a bundle too.

Differential Revision: https://reviews.llvm.org/D72783
2020-01-15 15:46:57 -08:00
Hiroshi Yamauchi
1aa49b7b60 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after reverted D71072.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Hiroshi Yamauchi
7e81b42fc1 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e14075a1f42a72eedb66fd4fde7985d37ac.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00
Hiroshi Yamauchi
800ab3625d [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072
2019-12-06 10:43:39 -08:00
Daniel Sanders
f8a414589e Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Daniel Sanders
61bbdae349 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Amara Emerson
207f6805ea [TailDuplicator] Fix copy instruction emitting into the wrong block.
The code for duplicating instructions could sometimes try to emit copies
intended to deal with unconstrainable register classes to the tail block of the
original instruction, rather than before the newly cloned instruction in the
predecessor block.

This was exposed by GlobalISel on arm64.

Differential Revision: https://reviews.llvm.org/D64049

llvm-svn: 364888
2019-07-02 06:04:46 +00:00
Matt Arsenault
75b036d04d TailDuplicator: Remove no-op analyzeBranch call
This could fail, which looked concerning. However nothing was actually
using the results of this. I assume this was intended to use the
anti-feature of analyzeBranch of removing instructions, but wasn't
actually calling it with AllowModify = true.

Fixes bug 42162.

llvm-svn: 362800
2019-06-07 13:33:34 +00:00
Evandro Menezes
1fd81231ce [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Nicola Zaghen
9667127c14 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Petar Jovanovic
f59a462ea6 [DWARF] Allow duplication of tails with CFI instructions
This commit came as a result for revert of patch r317579 (originally
committed as r317100). The patch made CFI instructions duplicable, because
their existence in the epilogue block was affecting the Tail duplication
pass. However, duplicating blocks with CFI instructions was an issue for
compact unwind info on Darwin, which is why the patch was reverted.

This patch allows duplicating tails with CFI instructions, though they are
not duplicable, by copying them 'manually'.


Patch by Djordje Kovacevic.

Differential Revision: https://reviews.llvm.org/D40979

llvm-svn: 323883
2018-01-31 15:57:57 +00:00
Matthias Braun
ddd8ed6709 MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Francis Visoiu Mistrih
30264d4391 [CodeGen] Unify MBB reference format in both MIR and debug output
As part of the unification of the debug format and the MIR format, print
MBB references as '%bb.5'.

The MIR printer prints the IR name of a MBB only for block definitions.

* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber\(\)/" << printMBBReference(*\1)/g'
* find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber\(\)/" << printMBBReference(\1)/g'
* find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g'
* grep -nr 'BB#' and fix

Differential Revision: https://reviews.llvm.org/D40422

llvm-svn: 319665
2017-12-04 17:18:51 +00:00
David Blaikie
e01dc73ad2 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Reid Kleckner
c37cd8d78d Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317579, originally committed as r317100.

There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.

When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
  int a, c, d, e, f, g, h, i, j, k, l, m;
  void n(int o, int *b) {
    if (g)
      f = 0;
    for (; f < o; f++) {
      m = a;
      if (l > j * k > i)
        j = i = k = d;
      h = b[c] - e;
    }
  }

We get assembly that looks like this:
; BB#1:                                 ; %if.then
Lloh3:
	adrp	x9, _f@GOTPAGE
Lloh4:
	ldr	x9, [x9, _f@GOTPAGEOFF]
	mov	 w8, wzr
Lloh5:
	str		wzr, [x9]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.lt	LBB0_3
	b	LBB0_7
LBB0_2:                                 ; %entry.if.end_crit_edge
Lloh6:
	adrp	x8, _f@GOTPAGE
Lloh7:
	ldr	x8, [x8, _f@GOTPAGEOFF]
Lloh8:
	ldr		w8, [x8]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.ge	LBB0_7
LBB0_3:                                 ; %for.body.lr.ph

Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.

llvm-svn: 317726
2017-11-08 21:31:14 +00:00
David Blaikie
45b647d5eb Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering
This header includes CodeGen headers, and is not, itself, included by
any Target headers, so move it into CodeGen to match the layering of its
implementation.

llvm-svn: 317647
2017-11-08 01:01:31 +00:00
Petar Jovanovic
5b6a90db77 Reland "Correct dwarf unwind information in function epilogue for X86"
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().

Original r317100 message:

"Correct dwarf unwind information in function epilogue for X86"

This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

llvm-svn: 317579
2017-11-07 14:40:27 +00:00
Petar Jovanovic
8c7ceb5bd9 Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf
buildbot failure (build #15606).

llvm-svn: 317136
2017-11-01 23:05:52 +00:00
Petar Jovanovic
d61746acde Correct dwarf unwind information in function epilogue for X86
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.


Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D35844

llvm-svn: 317100
2017-11-01 16:04:11 +00:00
Matthias Braun
733301855b Add test case for r311511
This also changes the TailDuplicator to be configured explicitely
pre/post regalloc rather than relying on the isSSA() flag. This was
necessary to have `llc -run-pass` work reliably.

llvm-svn: 311520
2017-08-23 03:17:59 +00:00
Matthias Braun
ccf939b56f TargetInstrInfo: Change duplicate() to work on bundles.
Adds infrastructure to clone whole instruction bundles rather than just
single instructions. This fixes a bug where tail duplication would
unbundle instructions while cloning.

This should unbreak the "Clang Stage 1: cmake, RA, with expensive checks
enabled" build on greendragon. The bot broke with r311139 hitting this
pre-existing bug.

A proper testcase will come next.

llvm-svn: 311511
2017-08-22 23:56:30 +00:00
Daniel Jasper
8d76d09d77 Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue"
I am 99% sure that this breaks the PPC ASAN build bot:
http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio

If it doesn't go back to green, we can recommit (and fix the original
commit message at the same time :) ).

llvm-svn: 306676
2017-06-29 13:58:24 +00:00
Petar Jovanovic
0199002e6e [X86] Correct dwarf unwind information in function epilogue
CFI instructions that set appropriate cfa offset and cfa register are now
inserted in emitEpilogue() in X86FrameLowering.

Majority of the changes in this patch:

1. Ensure that CFI instructions do not affect code generation.
2. Enable maintaining correct information about cfa offset and cfa register
in a function when basic blocks are reordered, merged, split, duplicated.

These changes are target independent and described below.

Changed CFI instructions so that they:

1. are duplicable
2. are not counted as instructions when tail duplicating or tail merging
3. can be compared as equal

Add information to each MachineBasicBlock about cfa offset and cfa register
that are valid at its entry and exit (incoming and outgoing CFI info). Add
support for updating this information when basic blocks are merged, split,
duplicated, created. Add a verification pass (CFIInfoVerifier) that checks
that outgoing cfa offset and register of predecessor blocks match incoming
values of their successors.

Incoming and outgoing CFI information is used by a late pass
(CFIInstrInserter) that corrects CFA calculation rule for a basic block if
needed. That means that additional CFI instructions get inserted at basic
block beginning to correct the rule for calculating CFA. Having CFI
instructions in function epilogue can cause incorrect CFA calculation rule
for some basic blocks. This can happen if, due to basic block reordering,
or the existence of multiple epilogue blocks, some of the blocks have wrong
cfa offset and register values set by the epilogue block above them.

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D18046

llvm-svn: 306529
2017-06-28 10:21:17 +00:00
Eugene Zelenko
88025fac90 [CodeGen] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 304954
2017-06-07 23:53:32 +00:00
Vitaly Buka
4a37d0821b [CodeGen] Fix uninitialized variables exposed by r303084
All other calls of analyzeBranch reset PredTBB and PredFBB, so I assume it's
expected behavior.

llvm-svn: 303581
2017-05-22 21:33:54 +00:00
Taewook Oh
b736eafd3f [TailDuplicator] Maintain DebugLoc for branch instructions
Summary: Existing implementation of duplicateSimpleBB function drops DebugLoc metadata of branch instructions during the transformation. This patch addresses this issue by making newly created branch instructions to keep the metadata of replaced branch instructions.

Reviewers: qcolombet, craig.topper, aprantl, MatzeB, sanjoy, dblaikie

Reviewed By: dblaikie

Subscribers: dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D30026

llvm-svn: 296371
2017-02-27 19:30:01 +00:00
Sanjoy Das
970ff026f0 [MachineBlockPlacement] Don't make blocks "uneditable"
Summary:
This fixes an issue with MachineBlockPlacement due to a badly timed call
to `analyzeBranch` with `AllowModify` set to true.  The timeline is as
follows:

 1. `MachineBlockPlacement::maybeTailDuplicateBlock` calls
    `TailDup.shouldTailDuplicate` on its argument, which in turn calls
    `analyzeBranch` with `AllowModify` set to true.

 2. This `analyzeBranch` call edits the terminator sequence of the block
    based on the physical layout of the machine function, turning an
    unanalyzable non-fallthrough block to a unanalyzable fallthrough
    block.  Normally MBP bails out of rearranging such blocks, but this
    block was unanalyzable non-fallthrough (and thus rearrangeable) the
    first time MBP looked at it, and so it goes ahead and decides where
    it should be placed in the function.

 3. When placing this block MBP fails to analyze and thus update the
    block in keeping with the new physical layout.

Concretely, before (1) we have something like:

```
LBL0:
  < unknown terminator op that may branch to LBL1 >
  jmp LBL1

LBL1:
  ... A

LBL2:
  ... B
```

In (2), analyze branch simplifies this to

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump removed

LBL1:
  ... A

LBL2:
  ... B
```

In (3), MachineBlockPlacement goes ahead with its plan of putting LBL2
after the first block since that is profitable.

```
LBL0:
  < unknown terminator op that may branch to LBL2 >
  ;; jmp LBL1 <- redundant jump

LBL2:
  ... B

LBL1:
  ... A
```

and the program now has incorrect behavior (we no longer fall-through
from `LBL0` to `LBL1`) because MBP can no longer edit LBL0.

There are several possible solutions, but I went with removing the teeth
off of the `analyzeBranch` calls in TailDuplicator.  That makes thinking
about the result of these calls easier, and breaks nothing in the lit
test suite.

I've also added some bookkeeping to the MachineBlockPlacement pass and
used that to write an assert that would have caught this.

Reviewers: chandlerc, gberry, MatzeB, iteratee

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D27783

llvm-svn: 289764
2016-12-15 05:08:57 +00:00
Kyle Butt
abb4b9d9ee Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283934
2016-10-11 20:36:43 +00:00
Daniel Jasper
80e95dce89 Revert "Codegen: Tail-duplicate during placement."
This reverts commit r283842.

test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.

llvm-svn: 283857
2016-10-11 07:36:11 +00:00
Kyle Butt
f6ea6695ce Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283842
2016-10-11 01:20:33 +00:00
Kyle Butt
db90994d08 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 71c312652c10f1855b28d06697c08d47e7a243e4.

llvm-svn: 283647
2016-10-08 01:47:05 +00:00
Kyle Butt
71644ee1e5 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.

Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283619
2016-10-07 22:33:20 +00:00
Kyle Butt
83411c9c60 Revert "Codegen: Tail-duplicate during placement."
This reverts commit 062ace9764953e9769142c1099281a345f9b6bdc.

Issue with loop info and block removal revealed by polly.
I have a fix for this issue already in another patch, I'll re-roll this
together with that fix, and a test case.

llvm-svn: 283292
2016-10-05 01:39:29 +00:00
Kyle Butt
8590c55a7e Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

Issue from previous rollback fixed, and a new test was added for that
case as well.

Differential revision: https://reviews.llvm.org/D18226

llvm-svn: 283274
2016-10-04 23:54:18 +00:00
Kyle Butt
821db2945e Revert "Codegen: Tail-duplicate during placement."
This reverts commit ff234efbe23528e4f4c80c78057b920a51f434b2.

Causing crashes on aarch64 build.

llvm-svn: 283172
2016-10-04 00:38:23 +00:00
Kyle Butt
632cfe0b14 Codegen: Tail-duplicate during placement.
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.

In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.

This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.

llvm-svn: 283164
2016-10-04 00:00:09 +00:00