This is faster and avoids the stream and SmallString state synchronization issue.
resync() is a no-op and may be safely deleted. I'll do so in a follow-up commit.
Reviewed by Rafael Espindola.
llvm-svn: 244870
Summary: This patch moves the check of OptimizeForSize before traversing over all basic blocks in current loop. If OptimizeForSize is set to true, no non-trivial unswitch is ever allowed. Therefore, the early exit will help reduce compilation time. This patch should be NFC.
Reviewers: reames, weimingz, broune
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11997
llvm-svn: 244868
Now that we can properly promote mismatched FCOPYSIGNs (r244858), we
can mark the FP_ROUND on the result as truncating, to expose folding.
FCOPYSIGN doesn't change anything but the sign bit, so
(fp_round (fcopysign (fpext a), b))
is equivalent to (modulo the sign bit):
(fp_round (fpext a))
which is a no-op.
llvm-svn: 244862
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.
Follow-up to r243924, r243926, and r244858.
llvm-svn: 244860
We don't care about its type, and there's even a combine that'll fold
away the FP_EXTEND if we let it run. However, until it does, we'll have
something broken like:
(f32 (fp_extend (f64 v)))
Scalar f16 follow-up to r243924.
llvm-svn: 244858
We already check that vectors have the same number of elements, we
don't need to use the scalar types explicitly: comparing the size of
the whole vector is enough.
llvm-svn: 244857
code into methods on LoopIdiomRecognize.
This simplifies the code somewhat and also makes it much easier to move
the analyses around. Ultimately, the separate class wasn't providing
significant value over methods -- it contained the precondition basic
block and the current loop. The current loop is already available and
the precondition block wasn't needed everywhere and is easy to pass
around.
In several cases I just moved things to be static functions because they
already accepted most of their inputs as arguments.
This doesn't fix the way we manage analyses yet, that will be the next
patch, but it already makes the code over 50 lines shorter.
No functionality changed.
llvm-svn: 244851
complexity.
There is only one function that was called from multiple locations, and
that was 'getBranch' which has a reasonable one-line spelling already:
dyn_cast<BranchInst>(BB->getTerminator). We could make this shorter, but
it doesn't seem to add much value. Instead, we should avoid calling it
so many times on the same basic blocks, but that will be in a subsequent
patch.
The other functions are only called in one location, so inline them
there, and take advantage of this to use direct early exit and reduce
indentation. This makes it much more clear what is being tested for, and
in fact makes it clear now to me that there are simpler ways to do this
work. However, this patch just does the mechanical inlining. I'll clean
up the functionality of the code to leverage loop simplified form more
effectively in a follow-up.
Despite lots of early line breaks due to early-exit, this is still
shorter than it was before.
llvm-svn: 244841
This causes the other special members (like move and copy construction,
and move assignment) to come through for free. Some code in clang was
depending on the (deprecated, in the original code) copy ctor. Now that
there's no user-defined special members, they're all available without
any deprecation concerns.
llvm-svn: 244835
a significant code cleanup here.
The handling of analyses in this pass is overly complex and can be
simplified significantly, but the right way to do that is to simplify
all of the code not just the analyses, and that'll require pretty
extensive edits that would be noisy with formatting changes mixed into
them.
llvm-svn: 244828
This debugger was designed to catch places where the old update API was
failing to be used correctly. As I've removed the update API, it no
longer serves any purpose. We can introduce new debugging aid passes
around any future work w.r.t. updating AAs.
Note that I've updated the documentation here, but really I need to
rewrite the documentation to carefully spell out the ideas around
stateful AA and how things are changing in the AA world. However, I'm
hoping to do that as a follow-up to the refactoring of the AA
infrastructure to work in both old and new pass managers so that I can
write the documentation specific to that world.
Differential Revision: http://reviews.llvm.org/D11984
llvm-svn: 244825
To be clear: this is an *optimization* not a correctness change.
CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuze many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll.
One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust.
I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're extending the live range of it's inputs potentially.
Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly.
Differential Revision: http://reviews.llvm.org/D11819
llvm-svn: 244821
This commit moves the code that parses the frame indices for the fixed stack
objects from the method 'parseFixedStackObjectOperand' to a new method named
'parseFixedStackFrameIndex', so that it can be reused when parsing fixed stack
pseudo source values.
llvm-svn: 244814
When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :)
This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed. Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement. That change will follow in the near future.
llvm-svn: 244808
AliasAnalysis.
Same as the other commits, the TLI access from an alias analysis is
going away and isn't very clean -- it is better to explicitly mark the
dependencies.
llvm-svn: 244785
just depend on it directly.
This was particularly frustrating because there was a really wide
mixture of using a member variable and re-extracting it from the AA that
happened to be around. I think the result is much more clear.
I've also deleted all of the pointless null checks and used references
across the APIs where I could to make it explicit that this cannot be
null in a useful fashion.
llvm-svn: 244780
Summary:
D11924 implemented part of the floating-point comparisons, this patch implements the rest:
* Tell ISelLowering that all booleans are either 0 or 1.
* Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
* Add tests for ord/uno.
* Add tests for ueq/one/ult/ule/ugt/uge.
* Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D11970
llvm-svn: 244779
relying on sneaking it out of its AliasAnalysis.
This abuse of AA (to shuffle TLI around rather than explicitly depending
on it) is going away with my refactor of AA.
llvm-svn: 244778
r243382 changed the behavior to always require a set of memchecks to be
passed to LoopVer. This change restores the prior behavior as an
alternative to the new behavior. This allows the checks to be
implicitly taken from the LAA object.
Patch by Ashutosh Nema!
llvm-svn: 244763
r242520 was reverted in r244313 as the expected behaviour of the alias
attribute in C is that the alias has the same size as the aliasee. However
we can re-introduce adding the size on the alias when the aliasee does not,
from a source code or object perspective, exist as a discrete entity. This
happens when the aliasee is not a symbol, or when that symbol is private.
Differential Revision: http://reviews.llvm.org/D11943
llvm-svn: 244752
On Mach-O emitting aliases for the variables that make up a MergedGlobals
variable can cause problems when linking with dead stripping enabled so don't
do that, except for external variables where we must emit an alias.
llvm-svn: 244748