[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
//===---- BDCE.cpp - Bit-tracking dead code elimination -------------------===//
|
|
|
|
//
|
2019-01-19 09:50:56 +01:00
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
//
|
|
|
|
// This file implements the Bit-Tracking Dead Code Elimination pass. Some
|
|
|
|
// instructions (shifts, some ands, ors, etc.) kill some of their input bits.
|
|
|
|
// We track these dead bits and remove instructions that compute only these
|
2020-07-10 09:01:04 +02:00
|
|
|
// dead bits. We also simplify sext that generates unused extension bits,
|
|
|
|
// converting it to a zext.
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2016-05-25 03:57:04 +02:00
|
|
|
#include "llvm/Transforms/Scalar/BDCE.h"
|
2017-08-12 18:41:08 +02:00
|
|
|
#include "llvm/ADT/SmallPtrSet.h"
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
#include "llvm/ADT/SmallVector.h"
|
|
|
|
#include "llvm/ADT/Statistic.h"
|
2015-08-14 13:09:09 +02:00
|
|
|
#include "llvm/Analysis/DemandedBits.h"
|
2016-05-25 03:57:04 +02:00
|
|
|
#include "llvm/Analysis/GlobalsModRef.h"
|
2020-07-10 09:01:04 +02:00
|
|
|
#include "llvm/IR/IRBuilder.h"
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
#include "llvm/IR/InstIterator.h"
|
|
|
|
#include "llvm/IR/Instructions.h"
|
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 22:15:01 +01:00
|
|
|
#include "llvm/InitializePasses.h"
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
#include "llvm/Pass.h"
|
|
|
|
#include "llvm/Support/Debug.h"
|
|
|
|
#include "llvm/Support/raw_ostream.h"
|
2016-05-25 03:57:04 +02:00
|
|
|
#include "llvm/Transforms/Scalar.h"
|
Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 22:15:01 +01:00
|
|
|
#include "llvm/Transforms/Utils/Local.h"
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
using namespace llvm;
|
|
|
|
|
|
|
|
#define DEBUG_TYPE "bdce"
|
|
|
|
|
|
|
|
STATISTIC(NumRemoved, "Number of instructions removed (unused)");
|
|
|
|
STATISTIC(NumSimplified, "Number of instructions trivialized (dead bits)");
|
2020-07-10 09:01:04 +02:00
|
|
|
STATISTIC(NumSExt2ZExt,
|
|
|
|
"Number of sign extension instructions converted to zero extension");
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
|
2017-08-12 18:41:08 +02:00
|
|
|
/// If an instruction is trivialized (dead), then the chain of users of that
|
|
|
|
/// instruction may need to be cleared of assumptions that can no longer be
|
|
|
|
/// guaranteed correct.
|
|
|
|
static void clearAssumptionsOfUsers(Instruction *I, DemandedBits &DB) {
|
2018-12-07 16:38:13 +01:00
|
|
|
assert(I->getType()->isIntOrIntVectorTy() &&
|
|
|
|
"Trivializing a non-integer value?");
|
2017-08-12 18:41:08 +02:00
|
|
|
|
|
|
|
// Initialize the worklist with eligible direct users.
|
2019-03-07 07:38:03 +01:00
|
|
|
SmallPtrSet<Instruction *, 16> Visited;
|
2017-08-12 18:41:08 +02:00
|
|
|
SmallVector<Instruction *, 16> WorkList;
|
|
|
|
for (User *JU : I->users()) {
|
2017-08-14 17:13:46 +02:00
|
|
|
// If all bits of a user are demanded, then we know that nothing below that
|
|
|
|
// in the def-use chain needs to be changed.
|
2017-08-12 18:41:08 +02:00
|
|
|
auto *J = dyn_cast<Instruction>(JU);
|
2018-12-07 16:38:13 +01:00
|
|
|
if (J && J->getType()->isIntOrIntVectorTy() &&
|
2019-03-07 07:38:03 +01:00
|
|
|
!DB.getDemandedBits(J).isAllOnesValue()) {
|
|
|
|
Visited.insert(J);
|
2017-08-12 18:41:08 +02:00
|
|
|
WorkList.push_back(J);
|
2019-03-07 07:38:03 +01:00
|
|
|
}
|
2017-08-16 18:09:22 +02:00
|
|
|
|
2018-12-07 16:38:13 +01:00
|
|
|
// Note that we need to check for non-int types above before asking for
|
2017-08-16 18:09:22 +02:00
|
|
|
// demanded bits. Normally, the only way to reach an instruction with an
|
2018-12-07 16:38:13 +01:00
|
|
|
// non-int type is via an instruction that has side effects (or otherwise
|
2017-08-16 18:09:22 +02:00
|
|
|
// will demand its input bits). However, if we have a readnone function
|
|
|
|
// that returns an unsized type (e.g., void), we must avoid asking for the
|
|
|
|
// demanded bits of the function call's return value. A void-returning
|
|
|
|
// readnone function is always dead (and so we can stop walking the use/def
|
|
|
|
// chain here), but the check is necessary to avoid asserting.
|
2017-08-12 18:41:08 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
// DFS through subsequent users while tracking visits to avoid cycles.
|
|
|
|
while (!WorkList.empty()) {
|
|
|
|
Instruction *J = WorkList.pop_back_val();
|
|
|
|
|
|
|
|
// NSW, NUW, and exact are based on operands that might have changed.
|
|
|
|
J->dropPoisonGeneratingFlags();
|
|
|
|
|
|
|
|
// We do not have to worry about llvm.assume or range metadata:
|
|
|
|
// 1. llvm.assume demands its operand, so trivializing can't change it.
|
|
|
|
// 2. range metadata only applies to memory accesses which demand all bits.
|
|
|
|
|
|
|
|
for (User *KU : J->users()) {
|
2017-08-14 17:13:46 +02:00
|
|
|
// If all bits of a user are demanded, then we know that nothing below
|
|
|
|
// that in the def-use chain needs to be changed.
|
2017-08-12 18:41:08 +02:00
|
|
|
auto *K = dyn_cast<Instruction>(KU);
|
2019-03-07 07:38:03 +01:00
|
|
|
if (K && Visited.insert(K).second && K->getType()->isIntOrIntVectorTy() &&
|
2017-08-16 18:09:22 +02:00
|
|
|
!DB.getDemandedBits(K).isAllOnesValue())
|
2017-08-12 18:41:08 +02:00
|
|
|
WorkList.push_back(K);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-25 03:57:04 +02:00
|
|
|
static bool bitTrackingDCE(Function &F, DemandedBits &DB) {
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
SmallVector<Instruction*, 128> Worklist;
|
|
|
|
bool Changed = false;
|
2015-08-06 21:10:45 +02:00
|
|
|
for (Instruction &I : instructions(F)) {
|
[BDCE/DebugInfo] Preserve llvm.dbg.value's argument.
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects etc..) and if so, eliminates it.
Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17
but not eliminating the call because it may have arbitrary side effects.
In other words, we lose some debug informations.
This patch fixes the problem making sure that BDCE does nothing with the instruction if
it has side effects and no non-dbg uses.
Differential Revision: https://reviews.llvm.org/D27471
llvm-svn: 288851
2016-12-06 22:52:47 +01:00
|
|
|
// If the instruction has side effects and no non-dbg uses,
|
2016-12-07 22:47:32 +01:00
|
|
|
// skip it. This way we avoid computing known bits on an instruction
|
|
|
|
// that will not help us.
|
[BDCE/DebugInfo] Preserve llvm.dbg.value's argument.
BDCE has two phases:
1. It asks SimplifyDemandedBits if all the bits of an instruction are dead, and if so,
replaces all its uses with the constant zero.
2. Then, it asks SimplifyDemandedBits again if the instruction is really dead
(no side effects etc..) and if so, eliminates it.
Now, in 1) if all the bits of an instruction are dead, we may end up replacing a dbg use:
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 %call, i64 0, metadata !8, metadata !16), !dbg !17
->
%call = tail call i32 (...) @g() #4, !dbg !15
tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !8, metadata !16), !dbg !17
but not eliminating the call because it may have arbitrary side effects.
In other words, we lose some debug informations.
This patch fixes the problem making sure that BDCE does nothing with the instruction if
it has side effects and no non-dbg uses.
Differential Revision: https://reviews.llvm.org/D27471
llvm-svn: 288851
2016-12-06 22:52:47 +01:00
|
|
|
if (I.mayHaveSideEffects() && I.use_empty())
|
|
|
|
continue;
|
|
|
|
|
2019-01-02 21:02:14 +01:00
|
|
|
// Remove instructions that are dead, either because they were not reached
|
|
|
|
// during analysis or have no demanded bits.
|
|
|
|
if (DB.isInstructionDead(&I) ||
|
|
|
|
(I.getType()->isIntOrIntVectorTy() &&
|
|
|
|
DB.getDemandedBits(&I).isNullValue() &&
|
|
|
|
wouldInstructionBeTriviallyDead(&I))) {
|
2020-06-08 19:44:11 +02:00
|
|
|
salvageDebugInfo(I);
|
Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
2019-01-01 11:05:26 +01:00
|
|
|
Worklist.push_back(&I);
|
|
|
|
I.dropAllReferences();
|
|
|
|
Changed = true;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2020-07-10 09:01:04 +02:00
|
|
|
// Convert SExt into ZExt if none of the extension bits is required
|
|
|
|
if (SExtInst *SE = dyn_cast<SExtInst>(&I)) {
|
|
|
|
APInt Demanded = DB.getDemandedBits(SE);
|
|
|
|
const uint32_t SrcBitSize = SE->getSrcTy()->getScalarSizeInBits();
|
|
|
|
auto *const DstTy = SE->getDestTy();
|
|
|
|
const uint32_t DestBitSize = DstTy->getScalarSizeInBits();
|
|
|
|
if (Demanded.countLeadingZeros() >= (DestBitSize - SrcBitSize)) {
|
|
|
|
clearAssumptionsOfUsers(SE, DB);
|
|
|
|
IRBuilder<> Builder(SE);
|
|
|
|
I.replaceAllUsesWith(
|
|
|
|
Builder.CreateZExt(SE->getOperand(0), DstTy, SE->getName()));
|
|
|
|
Worklist.push_back(SE);
|
|
|
|
Changed = true;
|
|
|
|
NumSExt2ZExt++;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
2019-01-01 11:05:26 +01:00
|
|
|
for (Use &U : I.operands()) {
|
|
|
|
// DemandedBits only detects dead integer uses.
|
|
|
|
if (!U->getType()->isIntOrIntVectorTy())
|
|
|
|
continue;
|
|
|
|
|
2019-01-04 22:21:43 +01:00
|
|
|
if (!isa<Instruction>(U) && !isa<Argument>(U))
|
Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
2019-01-01 11:05:26 +01:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!DB.isUseDead(&U))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
LLVM_DEBUG(dbgs() << "BDCE: Trivializing: " << U << " (all bits dead)\n");
|
2017-08-12 18:41:08 +02:00
|
|
|
|
|
|
|
clearAssumptionsOfUsers(&I, DB);
|
|
|
|
|
2015-08-14 13:09:09 +02:00
|
|
|
// FIXME: In theory we could substitute undef here instead of zero.
|
|
|
|
// This should be reconsidered once we settle on the semantics of
|
|
|
|
// undef, poison, etc.
|
Reapply "[BDCE][DemandedBits] Detect dead uses of undead instructions"
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.
BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
those user is also dead. This case is checked separately instead.
The previous attempt to land this lead to miscompiles, because cases
where uses were initially dead but were later found to be live during
further analysis were not always correctly removed from the DeadUses
set. This is fixed now and the added test case demanstrates such an
instance.
Differential Revision: https://reviews.llvm.org/D55563
llvm-svn: 350188
2019-01-01 11:05:26 +01:00
|
|
|
U.set(ConstantInt::get(U->getType(), 0));
|
2015-08-14 13:09:09 +02:00
|
|
|
++NumSimplified;
|
|
|
|
Changed = true;
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
for (Instruction *&I : Worklist) {
|
|
|
|
++NumRemoved;
|
|
|
|
I->eraseFromParent();
|
|
|
|
}
|
|
|
|
|
|
|
|
return Changed;
|
|
|
|
}
|
|
|
|
|
2016-05-25 03:57:04 +02:00
|
|
|
PreservedAnalyses BDCEPass::run(Function &F, FunctionAnalysisManager &AM) {
|
|
|
|
auto &DB = AM.getResult<DemandedBitsAnalysis>(F);
|
2016-05-31 19:53:22 +02:00
|
|
|
if (!bitTrackingDCE(F, DB))
|
|
|
|
return PreservedAnalyses::all();
|
|
|
|
|
2017-01-15 07:32:49 +01:00
|
|
|
PreservedAnalyses PA;
|
|
|
|
PA.preserveSet<CFGAnalyses>();
|
2016-05-31 19:53:22 +02:00
|
|
|
PA.preserve<GlobalsAA>();
|
|
|
|
return PA;
|
[BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.
Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).
The motivation for this is a case like:
int __attribute__((const)) foo(int i);
int bar(int x) {
x |= (4 & foo(5));
x |= (8 & foo(3));
x |= (16 & foo(2));
x |= (32 & foo(1));
x |= (64 & foo(0));
x |= (128& foo(4));
return x >> 4;
}
As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).
I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).
I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark
llvm-svn: 229462
2015-02-17 02:36:59 +01:00
|
|
|
}
|
|
|
|
|
2016-05-25 03:57:04 +02:00
|
|
|
namespace {
|
|
|
|
struct BDCELegacyPass : public FunctionPass {
|
|
|
|
static char ID; // Pass identification, replacement for typeid
|
|
|
|
BDCELegacyPass() : FunctionPass(ID) {
|
|
|
|
initializeBDCELegacyPassPass(*PassRegistry::getPassRegistry());
|
|
|
|
}
|
|
|
|
|
|
|
|
bool runOnFunction(Function &F) override {
|
|
|
|
if (skipFunction(F))
|
|
|
|
return false;
|
|
|
|
auto &DB = getAnalysis<DemandedBitsWrapperPass>().getDemandedBits();
|
|
|
|
return bitTrackingDCE(F, DB);
|
|
|
|
}
|
|
|
|
|
|
|
|
void getAnalysisUsage(AnalysisUsage &AU) const override {
|
|
|
|
AU.setPreservesCFG();
|
|
|
|
AU.addRequired<DemandedBitsWrapperPass>();
|
|
|
|
AU.addPreserved<GlobalsAAWrapperPass>();
|
|
|
|
}
|
|
|
|
};
|
|
|
|
}
|
|
|
|
|
|
|
|
char BDCELegacyPass::ID = 0;
|
|
|
|
INITIALIZE_PASS_BEGIN(BDCELegacyPass, "bdce",
|
|
|
|
"Bit-Tracking Dead Code Elimination", false, false)
|
|
|
|
INITIALIZE_PASS_DEPENDENCY(DemandedBitsWrapperPass)
|
|
|
|
INITIALIZE_PASS_END(BDCELegacyPass, "bdce",
|
|
|
|
"Bit-Tracking Dead Code Elimination", false, false)
|
|
|
|
|
|
|
|
FunctionPass *llvm::createBitTrackingDCEPass() { return new BDCELegacyPass(); }
|