Function names in ObjC can have spaces in them. This interacts poorly
with name compression, which uses spaces to separate PGO names. Fix the
issue by using a different separator and update a test.
I chose "\01" as the separator because 1) it's non-printable, 2) we
strip it from PGO names, and 3) it's the next natural choice once "\00"
is discarded (that one's overloaded).
This reverts the revert commit beaf3d18. What's changed?
- I fixed up the covmap-V2 binary format tests using a linux VM.
- I updated the expected counts in instrprof-comdat.h to account for
the fact that there have been bugfixes to clang coverage.
- I added an assert to make sure we don't get bitten by this again.
Differential Revision: http://reviews.llvm.org/D18516
llvm-svn: 264641
Instead of using a bit to detect if they are "dynamic", just look at
sh_link.
This is a simplification on its own, and will help with using
llvm-objdump in dynamic objects.
llvm-svn: 264624
Summary:
Hopefully this will make it easier for the next person to figure all
this out...
Reviewers: bogner, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18490
llvm-svn: 264611
They do have a def machine operand.
Fixing the definition is necessary for an upcoming patch.
Differential Revision: http://reviews.llvm.org/D18384
llvm-svn: 264607
When using Visual Studio 2015, cmake now puts the native visualizers in llvm.sln, so the developer automatically sees custom visualizations.
Much thanks to ariccio who provided extensive help on this change. (manual installation still needed on VS2013)
llvm-svn: 264601
The A2 cores support the popcntw/popcntd instructions, but they're microcoded,
and slower than our default software emulation. Specifically, popcnt[dw] take
approximately 74 cycles, whereas our software emulation takes only 24-28
cycles.
I've added a new target feature to indicate a slow popcnt[dw], instead of just
removing the existing target feature from the a2/a2q processor models, because:
1. This allows us to return more accurate information via the TTI interface
(I recognize that this currently makes no practical difference)
2. Is hopefully easier to understand (it allows the core's features to match
its manual while still having the desired effect).
llvm-svn: 264600
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being
applied (e.g., transforming nested loops into simplified forms and loop vectorization).
The patch augments the existing PHI node-based check by adding a pre-test if the BB actually
belongs to a set of loop headers and not eliminating it if yes.
llvm-svn: 264596
Don't set the function hotness attribute on the fly. This changes the CFG
branch probability of the caller function, which leads to inconsistent BB
ordering. This patch moves the attribute setting to a separated loop after
the counts in all functions are populated.
Fixes PR27024 - PGO instrumentation profile data is not reflected in correct
basic blocks.
Differential Revision: http://reviews.llvm.org/D18491
llvm-svn: 264594
MachineFunctionProperties represents a set of properties that a MachineFunction
can have at particular points in time. Existing examples of this idea are
MachineRegisterInfo::isSSA() and MachineRegisterInfo::tracksLiveness() which
will eventually be switched to use this mechanism.
This change introduces the AllVRegsAllocated property; i.e. the property that
all virtual registers have been allocated and there are no VReg operands
left.
With this mechanism, passes can declare that they require a particular property
to be set, or that they set or clear properties by implementing e.g.
MachineFunctionPass::getRequiredProperties(). The MachineFunctionPass base class
verifies that the requirements are met, and handles the setting and clearing
based on the delcarations. Passes can also directly query and update the current
properties of the MF if they want to have conditional behavior.
This change annotates the target-independent post-regalloc passes; future
changes will also annotate target-specific ones.
Reviewers: qcolombet, hfinkel
Differential Revision: http://reviews.llvm.org/D18421
llvm-svn: 264593
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together. The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.
This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.
shader-db stats:
2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18451
llvm-svn: 264589
Function names in ObjC can have spaces in them. This interacts poorly
with name compression, which uses spaces to separate PGO names. Fix the
issue by using a different separator and update a test.
I chose "\01" as the separator because 1) it's non-printable, 2) we
strip it from PGO names, and 3) it's the next natural choice once "\00"
is discarded (that one's overloaded).
Differential Revision: http://reviews.llvm.org/D18516
llvm-svn: 264587
- Do not optimize stack slots in optnone functions.
- Get aligned-base register from HexagonMachineFunctionInfo instead of
looking for ALIGNA instruction in the function's body.
llvm-svn: 264580
ICMP instruction selection fails on SKX and KNL for i1 operand.
I use XOR to resolve:
(A == B) is equivalent to (A xor B) == 0
Differential Revision: http://reviews.llvm.org/D18511
llvm-svn: 264566
When emitting coverage mappings for functions with local linkage and an
unknown filename, we use "<unknown>:func" for the PGO function name. The
problem is that we don't strip "<unknown>" from the name when loading
coverage data, like we do for other file names. Fix that and add a test.
llvm-svn: 264559
Change writeFunctionMetadata to call writeMetadataRecords. For now
there's no functionality change, but makes it easy to serialize other
types of metadata in the function block in the future.
llvm-svn: 264557
To match writeMetadataRecords, writeNamedMetadata and
writeMetadataStrings, change:
WriteModuleMetadata => writeModuleMetadata
WriteFunctionLocalMetadata => writeFunctionMetadata
Write##CLASS => write##CLASS
The only major change is "FunctionLocal" => "Function". The point is to
be less specific, in preparation for emitting normal metadata records
inside function metadata blocks (currently we only emit
`LocalAsMetadata` there).
llvm-svn: 264556
We don't really need a separate vector here; instead, point at a range
inside the main MDs array. This matches how r264551 references the
ranges of strings and non-strings.
llvm-svn: 264552
Spiritually reapply commit r264409 (reverted in r264410), albeit with a
bit of a redesign.
Firstly, avoid splitting the big blob into multiple chunks of strings.
r264409 imposed an arbitrary limit to avoid a massive allocation on the
shared 'Record' SmallVector. The bug with that commit only reproduced
when there were more than "chunk-size" strings. A test for this would
have been useless long-term, since we're liable to adjust the chunk-size
in the future.
Thus, eliminate the motivation for chunk-ing by storing the string sizes
in the blob. Here's the layout:
vbr6: # of strings
vbr6: offset-to-blob
blob:
[vbr6]: string lengths
[char]: concatenated strings
Secondly, make the output of llvm-bcanalyzer readable.
I noticed when debugging r264409 that llvm-bcanalyzer was outputting a
massive blob all in one line. Past a small number, the strings were
impossible to split in my head, and the lines were way too long. This
version adds support in llvm-bcanalyzer for pretty-printing.
<STRINGS abbrevid=4 op0=3 op1=9/> num-strings = 3 {
'abc'
'def'
'ghi'
}
From the original commit:
Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
should (a) slightly reduce bitcode size, since there is less record
overhead, and (b) greatly improve reading speed, since blobs are super
cheap to deserialize.
llvm-svn: 264551