The loop vectorizer usually vectorizes any instruction it can and then
extracts the elements for a scalarized use. On SystemZ, all elements
containing addresses must be extracted into address registers (GRs). Since
this extraction is not free, it is better to have the address in a suitable
register to begin with. By forcing address arithmetic instructions and loads
of addresses to be scalar after vectorization, two benefits result:
* No need to extract the register
* LSR optimizations trigger (LSR isn't handling vector addresses currently)
Benchmarking show improvements on SystemZ with this new behaviour.
Any other target could try this by returning false in the new hook
prefersVectorizedAddressing().
Review: Renato Golin, Elena Demikhovsky, Ulrich Weigand
https://reviews.llvm.org/D32422
llvm-svn: 303744
When folding arguments of AddExpr or MulExpr with recurrences, we rely on the fact that
the loop of our base recurrency is the bottom-lost in terms of domination. This assumption
may be broken by an expression which is treated as invariant, and which depends on a complex
Phi for which SCEVUnknown was created. If such Phi is a loop Phi, and this loop is lower than
the chosen AddRecExpr's loop, it is invalid to fold our expression with the recurrence.
Another reason why it might be invalid to fold SCEVUnknown into Phi start value is that unlike
other SCEVs, SCEVUnknown are sometimes position-bound. For example, here:
for (...) { // loop
phi = {A,+,B}
}
X = load ...
Folding phi + X into {A+X,+,B}<loop> actually makes no sense, because X does not exist and cannot
exist while we are iterating in loop (this memory can be even not allocated and not filled by this moment).
It is only valid to make such folding if X is defined before the loop. In this case the recurrence {A+X,+,B}<loop>
may be existant.
This patch prohibits folding of SCEVUnknown (and those who use them) into the start value of an AddRecExpr,
if this instruction is dominated by the loop. Merging the dominating unknown values is still valid. Some tests that
relied on the fact that some SCEVUnknown should be folded into AddRec's are changed so that they no longer
expect such behavior.
llvm-svn: 303730
LazyRandomTypeCollection is designed for random access, and in
order to provide this it lazily indexes ranges of types. In the
case of types from an object file, there is no partial index
to build off of, so it has to index the full stream up front.
However, merging types only requires sequential access, and when
that is needed, this extra work is simply wasted. Changing the
algorithm to work on sequential arrays of types rather than
random access type collections eliminates this up front scan.
llvm-svn: 303707
When writing field list records, we would construct a temporary
type serializer that shared a bump ptr allocator with the rest
of the application, so anything allocated from here would live
forever. Furthermore, this temporary serializer had all the
properties of a full blown serializer including record hashing
and de-duplication.
These features are required when you're merging multiple type
streams into each other, because different streams may contain
identical records, but records from the same type stream will
never collide with each other. So all of this hashing was
unnecessary.
To solve this, two fixes are made:
1) The temporary serializer keeps its own bump ptr allocator
instead of sharing a global one. When it's finished, all of
its memory is freed.
2) Instead of using the same temporary serializer for the life
of an entire type stream, we use it only for the life of a single
field list record and delete it when the field list record is
completed. This way the hash table will not grow as other
records from the same type stream get inserted. Further improvements
could eliminate hashing entirely from this codepath.
This reduces the link time by 85% in my test, from 1 minute to 9
seconds.
llvm-svn: 303676
Summary:
This is a first patch for GSoC project, bash-completion for clang.
To use this on bash, please run `source clang/utils/bash-autocomplete.sh`.
bash-autocomplete.sh is code for bash-completion.
Simple flag completion and path completion is available in this patch.
Reviewers: teemperor, v.g.vassilev, ruiu, Bigcheese, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33237
llvm-svn: 303670
Summary:
First, StringMap uses llvm::HashString, which is only good for short
identifiers and really bad for large blobs of binary data like type
records. Moving to `DenseMap<StringRef, TypeIndex>` with some tricks for
memory allocation fixes that.
Unfortunately, that didn't buy very much performance. Profiling showed
that we spend a long time during DenseMap growth rehashing existing
entries. Also, in general, DenseMap is faster when the keys are small.
This change takes that to the logical conclusion by introducing a small
wrapper value type around a pointer to key data. The key data contains a
precomputed hash, the original record data (pointer and size), and the
type index, which is the "value" of our original map.
This reduces the time to produce llvm-as.exe and llvm-as.pdb from ~15s
on my machine to 3.5s, which is about a 4x improvement.
Reviewers: zturner, inglorion, ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33428
llvm-svn: 303665
Summary:
Before this change, AttributeLists stored a pair of index and
AttributeSet. This is memory efficient if most arguments do not have
attributes. However, it requires doing a search over the pairs to test
an argument or function attribute. Profiling shows that this loop was
0.76% of the time in 'opt -O2' of sqlite3.c, because LLVM constantly
tests values for nullability.
This was worth about 2.5% of mid-level optimization cycles on the
sqlite3 amalgamation. Here are the full perf results:
https://reviews.llvm.org/P7995
Here are just the before and after cycle counts:
```
$ perf stat -r 5 ./opt_before -O2 sqlite3.bc -o /dev/null
13,274,181,184 cycles # 3.047 GHz ( +- 0.28% )
$ perf stat -r 5 ./opt_after -O2 sqlite3.bc -o /dev/null
12,906,927,263 cycles # 3.043 GHz ( +- 0.51% )
```
This patch *does not* change the indices used to query attributes, as
requested by reviewers. Tracking whether an index is usable for array
indexing is a huge pain that affects many of the internal APIs, so it
would be good to come back later and do a cleanup to remove this
internal adjustment.
Reviewers: pete, chandlerc
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D32819
llvm-svn: 303654
This patch builds over https://reviews.llvm.org/rL303349 and replaces
the use of the condition only if it is safe to do so.
We should not blindly RAUW the condition if experimental.guard or assume
is a use of that
condition. This is because LVI may have used the guard/assume to
identify the
value of the condition, and RUAWing will fold the guard/assume and uses
before the guards/assumes.
Reviewers: sanjoy, reames, trentxintong, mkazantsev
Reviewed by: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33257
llvm-svn: 303633
Summary:
Add Max ModFlagBehavior, which can be used to take the max of two
module flag values when merging modules. Use it for the PIE and PIC
levels.
This avoids an error when we try to import from a module built -fpic
into a module built -fPIC, for example. For both PIE and PIC levels,
this will be legal, since the code generation gets more conservative
as the level is increased. Therefore we can take the max instead of
somehow trying to block importing between modules compiled with
different levels.
Reviewers: tmsriram, pcc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33418
llvm-svn: 303590
The forward declarations and the SimplifyQuery class at the beginning of the namespace weren't indented. But the closing brace for SimplifyQuery and everything after it were indented.
This commit makes the whole file consistent to no identation per coding standards. The signature of every function in this file changed a few weeks ago so this isn't a big disturbance to the revision history.
llvm-svn: 303588
Previous algotirhm assumed that types and ids are in a single
unified stream. For inputs that come from object files, this
is the case. But if the input is already a PDB, or is the result
of a previous merge, then the types and ids will already have
been split up, in which case we need an algorithm that can
accept operate on independent streams of types and ids that
refer across stream boundaries to each other.
Differential Revision: https://reviews.llvm.org/D33417
llvm-svn: 303577
MachineInstructions that don't generate any code (such as
IMPLICIT_DEFs) should not generate any debug info either.
Fixes PR33107.
https://bugs.llvm.org/show_bug.cgi?id=33107
This reapplies r303566 without any modifications. The stage2 build
failures persisted even after reverting this patch, and looking back
through history, it looks like these tests are flaky.
llvm-svn: 303575
MachineInstructions that don't generate any code (such as
IMPLICIT_DEFs) should not generate any debug info either.
Fixes PR33107.
https://bugs.llvm.org/show_bug.cgi?id=33107
llvm-svn: 303566
It's causing some buildbots to timeout whenever tablegen needs re-compilation,
particularly those with -fsanitize=memory but not only them. A compile time
regression was expected since it triples the amount of SelectionDAG rules we
are able to import but it's currently too high.
llvm-svn: 303542
Re-applying now that PR32825 which was raised on the commit this fixed up is now known to have also been fixed by this commit.
Original commit message:
Multiple ldr pseudoinstructions with the same constant value will
reuse the same constant pool entry. However, if the constant pool
is explicitly flushed with a .ltorg directive, we should not try
to reference constants in the previous pool any longer, since they
may be out of range.
This fixes assembling hand-written assembler source which repeatedly
loads the same constant value, across a binary size larger than the
pc-relative fixup range for ldr instructions (4096 bytes). Such
assembler source already uses explicit .ltorg instructions to emit
constant pools with regular intervals. However if we try to reuse
constants emitted in earlier pools, they end up out of range.
This makes the output of the testcase match what binutils gas does
(prior to this patch, it would fail to assemble).
Differential Revision: https://reviews.llvm.org/D32847
llvm-svn: 303540
Re-applying now that the open bug on this commit, PR32825, is known to be fixed.
Original commit message:
Summary: This patch returns the same label if the CP entry with the same value has been created.
Reviewers: eli.friedman, rengolin, jmolloy
Subscribers: majnemer, jmolloy, llvm-commits
Differential Revision: https://reviews.llvm.org/D25804
llvm-svn: 303539
This reverts commit r302416. This was a fixup for r286006, which has now been reverted so this doesn't apply (either in concept or in code).
This commit itself has no problems, but the underlying issue it was fixing has now disappeared from the codebase.
llvm-svn: 303536
llvm-symbolizer would fail to symbolize addresses in unlinked object
files when handling .dwo file data because the addresses would not be
relocated in the same way as the ranges in the skeleton CU in the object
file.
Fix that so object files can be symbolized the same as executables.
llvm-svn: 303532
This is a re-application of a r303497 that was reverted in r303498.
I thought it had broken a bot when it had not (the breakage did not
go away with the revert).
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303531
This change makes the split between the "exact" backedge taken count
and the "maximum" backedge taken count a bit more obvious. Both of
these are upper bounds on the number of times the loop header
executes (since SCEV does not account for most kinds of abnormal
control flow), but the latter is guaranteed to be a constant.
There were a few places where the max backedge taken count *was* a
non-constant; I've changed those to compute constants instead.
At this point, I'm not sure if the constant max backedge count can be
computed by calling `getUnsignedRange(Exact).getUnsignedMax()` without
losing precision. If it can, we can simplify even further by making
`getMaxBackedgeTakenCount` a thin wrapper around
`getBackedgeTakenCount` and `getUnsignedRange`.
llvm-svn: 303497
Summary: This allows pthread_self to be pulled out of a loop by LICM.
Reviewers: hfinkel, arsenm, davide
Reviewed By: davide
Subscribers: davide, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D32782
llvm-svn: 303495
This is split up into two commits.
The will create the DEF parser in LLVM.
Check the following commit to see the removal from LLD
Reviewers: ruiu
Differential Revision: https://reviews.llvm.org/D32689
llvm-svn: 303490
Summary: Added the new modules in the Object/ folder. Updated the
llvm-cvtres interface as well, and added additional tests.
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33180
llvm-svn: 303480
This reapplies commit r303438 modified to not verify cross-imported
bitcode in FunctionImporter.
rdar://problem/31233625
Differential Revision: https://reviews.llvm.org/D33370
llvm-svn: 303470
Refactor the strlen optimization code to work for both strlen and wcslen.
This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.
This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.
Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.
The refactoring "accidentally" fixes http://llvm.org/PR32124
Differential Revision: https://reviews.llvm.org/D32839
llvm-svn: 303461
This is a complicated bug involving two issues:
1. What do we do with phi nodes when we prove all arguments are not
live?
2. When is it safe to use value leaders to determine if we can ignore
an argumnet?
llvm-svn: 303453
This was originally reverted because it was a breaking a bunch
of bots and the breakage was not surfacing on Windows. After much
head-scratching this was ultimately traced back to a bug in the
lit test runner related to its pipe handling. Now that the bug
in lit is fixed, Windows correctly reports these test failures,
and as such I have finally (hopefully) fixed all of them in this
patch.
llvm-svn: 303446
Summary:
This patch adds udiv/sdiv/urem/srem/udivrem/sdivrem methods that can divide by a uint64_t. This makes division consistent with all the other arithmetic operations.
This modifies the interface of the divide helper method to work on raw arrays instead of APInts. This way we can pass the uint64_t in for the RHS without wrapping it in an APInt. This required moving all the Quotient and Remainder allocation handling up to the callers. For udiv/urem this was as simple as just creating the Quotient/Remainder with the right size when they were declared. For udivrem we have to rely on reallocate not changing the contents of the variable LHS or RHS is aliased with the Quotient or Remainder APInts. We also have to zero the upper bits of Remainder and Quotient that divide doesn't write to if lhsWords/rhsWords is smaller than the width.
I've update the toString method to use the new udivrem.
Reviewers: hans, dblaikie, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33310
llvm-svn: 303431
This is a squash of ~5 reverts of, well, pretty much everything
I did today. Something is seriously broken with lit on Windows
right now, and as a result assertions that fire in tests are
triggering failures. I've been breaking non-Windows bots all
day which has seriously confused me because all my tests have
been passing, and after running lit with -a to view the output
even on successful runs, I find out that the tool is crashing
and yet lit is still reporting it as a success!
At this point I don't even know where to start, so rather than
leave the tree broken for who knows how long, I will get this
back to green, and then once lit is fixed on Windows, hopefully
hopefully fix the remaining set of problems for real.
llvm-svn: 303409
We were using a BumpPtrAllocator to allocate stable storage for
a record, then trying to insert that into a hash table. If a
collision occurred, the bytes were never inserted and the
allocation was unnecessary. At the cost of an extra hash
computation, check first if it exists, and only if it does do
we allocate and insert.
llvm-svn: 303407
This reverts commit r303383.
This breaks the modules-enabled macOS build with:
lib/Support/LockFileManager.cpp:86:7: error: declaration of 'gethostuuid' must be imported from module 'Darwin.POSIX.unistd' before it is required
llvm-svn: 303402