All of ELF, COFF and MachO now manipulate the flags through helpers, so
nothing needs to read the flags directly anymore; everything goes via those helpers.
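As a rough illustration of the pattern (the class name, mask and shift below are hypothetical, not the actual MCSymbol code):

  #include <cstdint>

  // Hypothetical symbol class: callers use the helpers instead of reading
  // or writing the raw flag bits and shifts themselves.
  class Symbol {
    uint32_t Flags = 0;

    // Hypothetical layout: visibility lives in the low two bits.
    enum : uint32_t { VisibilityMask = 0x3, VisibilityShift = 0 };

  public:
    unsigned getVisibility() const {
      return (Flags & VisibilityMask) >> VisibilityShift;
    }
    void setVisibility(unsigned V) {
      Flags = (Flags & ~VisibilityMask) |
              ((V << VisibilityShift) & VisibilityMask);
    }
  };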
Reviewed by Rafael Espíndola.
llvm-svn: 239317
Also delete the now-unused MCMachOSymbolFlags.h header, as the only enum in it was moved to MCSymbolMachO.
As with ELF and COFF, flag manipulation is now done via helpers instead of
being spread throughout the codebase.
Reviewed by Rafael Espíndola.
llvm-svn: 239316
All flag setting/getting is now done in the class via helper methods,
instead of users having to assemble the bits in the correct order.
Reviewed by Rafael Espíndola.
llvm-svn: 239314
The flags field in MCSymbol only needs to be 16 bits wide on ELF and MachO.
This moves the 16-bit Type out of the flags field so that the field can be shrunk in a future commit.
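A minimal sketch of the kind of layout change involved (field names and widths are illustrative, not the actual MCSymbol definition):

  #include <cstdint>

  // Before: the type shares a 32-bit field with the flags.
  struct SymbolBefore {
    uint32_t FlagsAndType; // 16 bits of flags + 16 bits of type
  };

  // After: Type is its own member, so Flags can shrink to 16 bits later.
  struct SymbolAfter {
    uint16_t Flags;
    uint16_t Type;
  };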
Reviewed by Rafael Espíndola.
llvm-svn: 239313
Summary:
We need to add a runtime memcheck for each pair of accesses (x,y) where at
least one of x and y is a write.
Assuming we have w writes and r reads, this number is currently estimated as
w*(w+r-1). That estimate counts (write,write) pairs twice and therefore
overestimates the number of checks required.
This change adds a getNumberOfChecks method to RuntimePointerCheck, which
counts the number of runtime checks needed (similar in implementation to
needsAnyChecking) and uses it to produce the correct number of runtime checks.
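A minimal sketch of the pair counting (ignoring the alias-set and dependence-set filtering that the real needsAnyChecking also applies; names are simplified):

  #include <cstddef>
  #include <vector>

  struct PointerInfo {
    bool IsWritePtr; // does this access write?
  };

  // Count unordered pairs (x, y) where at least one side is a write. Each
  // pair is counted once, unlike the old w*(w+r-1) estimate, which counted
  // (write, write) pairs twice.
  static unsigned getNumberOfChecks(const std::vector<PointerInfo> &Pointers) {
    unsigned NumChecks = 0;
    for (std::size_t I = 0, E = Pointers.size(); I != E; ++I)
      for (std::size_t J = I + 1; J != E; ++J)
        if (Pointers[I].IsWritePtr || Pointers[J].IsWritePtr)
          ++NumChecks;
    return NumChecks;
  }

For example, with w=3 writes and r=2 reads this yields 3*2/2 + 3*2 = 9 checks, while the old estimate gives 3*(3+2-1) = 12.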
Test Plan:
llvm test suite
spec2k
spec2k6
Performance results: no changes observed (not surprising, since the formula for a single writer is basically the same, and that covers most cases, at least with the current check limit).
Reviewers: anemet
Reviewed By: anemet
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D10217
llvm-svn: 239295
Interleaved memory accesses are grouped and vectorized into vector load/store and shufflevector.
E.g.:
  for (i = 0; i < N; i += 2) {
    a = A[i];    // load of even element
    b = A[i+1];  // load of odd element
    ...          // operations on a, b, c, d
    A[i] = c;    // store of even element
    A[i+1] = d;  // store of odd element
  }
The loads of even and odd elements are identified as an interleave load group, which will be transformed into vectorized IR like:
%wide.vec = load <8 x i32>, <8 x i32>* %ptr
%vec.even = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%vec.odd = shufflevector <8 x i32> %wide.vec, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
The stores of even and odd elements are identified as an interleave store group, which will be transformed into vectorized IR like:
%interleaved.vec = shufflevector <4 x i32> %vec.even, <4 x i32> %vec.odd, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>
store <8 x i32> %interleaved.vec, <8 x i32>* %ptr
This optimization is currently disabled by default. To try it, pass '-enable-interleaved-mem-accesses=true'.
llvm-svn: 239291
No functional change intended, other than some minor changes to certain
diagnostics.
Differential Revision: http://reviews.llvm.org/D10296
llvm-svn: 239278
The new naming is (to me) much easier to understand. Here is a summary
of the new state of the world:
- '*Threshold' is the threshold for full unrolling. It is measured
against the estimated unrolled cost as computed by getUserCost in TTI
(or CodeMetrics, etc). We will exceed this threshold when unrolling
loops where unrolling exposes a significant degree of simplification
of the logic within the loop.
- '*PercentDynamicCostSavedThreshold' is the percentage of the loop's
estimated dynamic execution cost which needs to be saved by unrolling
to apply a discount to the estimated unrolled cost.
- '*DynamicCostSavingsDiscount' is the discount applied to the estimated
unrolling cost when the dynamic savings are expected to be high.
When actually analyzing the loop, we now produce both an estimated
unrolled cost, and an estimated rolled cost. The rolled cost is notably
a dynamic estimate based on our analysis of the expected execution of
each iteration.
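A minimal sketch of how the renamed knobs interact (a simplification of the heuristic, not the exact analysis code):

  #include <cstdint>

  // Decide whether to fully unroll given the two estimates described above.
  // The last three parameters correspond to '*Threshold',
  // '*PercentDynamicCostSavedThreshold' and '*DynamicCostSavingsDiscount'.
  static bool shouldFullyUnroll(uint64_t UnrolledCost,
                                uint64_t RolledDynamicCost,
                                uint64_t Threshold,
                                unsigned PercentDynamicCostSavedThreshold,
                                uint64_t DynamicCostSavingsDiscount) {
    // Cheap enough after simplification: always unroll.
    if (UnrolledCost <= Threshold)
      return true;

    // Otherwise only exceed the threshold when unrolling saves a large
    // fraction of the estimated dynamic execution cost.
    if (UnrolledCost >= RolledDynamicCost)
      return false;
    uint64_t Saved = RolledDynamicCost - UnrolledCost;
    unsigned PercentSaved = (unsigned)(Saved * 100 / RolledDynamicCost);
    return PercentSaved >= PercentDynamicCostSavedThreshold &&
           UnrolledCost <= Threshold + DynamicCostSavingsDiscount;
  }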
While we're still working to build up the infrastructure for making
these estimates, to me it is much more clear *how* to make them better
when they have reasonably descriptive names. For example, we may want to
apply estimated (from heuristics or profiles) dynamic execution weights
to the *dynamic* cost estimates. If we start doing that, we would also
need to track the static unrolled cost and the dynamic unrolled cost, as
only the latter could reasonably be weighted by profile information.
This patch is sadly not without functionality change for the new unroll
analysis logic. Buried in the heuristic management were several things
that surprised me. For example, we never subtracted the optimized
instruction count when comparing against the unroll heuristics!
I don't know if this just got lost somewhere along the way or what, but
with the new accounting, this is much easier to keep track of: we compare
the post-simplification cost estimate against the thresholds, and use the
dynamic cost reduction ratio to decide whether we can exceed the baseline
threshold.
The old values of these flags also don't necessarily make sense. My
impression is that none of these thresholds or discounts have been tuned
yet, so they're just arbitrary placeholder numbers. As such, I've not
bothered to adjust for the fact that this is now a discount and not
a two-tier threshold model. We need to tune all these values once the
logic is ready to be enabled.
Differential Revision: http://reviews.llvm.org/D9966
llvm-svn: 239164
These are added mainly for the benefit of clang, but this also means that they
are now allowed in .fpu directives and we emit the correct .fpu directive when
single-precision-only is used.
Differential Revision: http://reviews.llvm.org/D10238
llvm-svn: 239151
Add getFPUFeatures to TargetParser, which gets the list of subtarget features
that are enabled/disabled for each FPU, and use it when handling the .fpu
directive.
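A rough sketch of the mapping this enables (the FPU kind and feature names here are illustrative stand-ins; the real TargetParser API and tables differ):

  #include <string>
  #include <vector>

  // Hypothetical stand-in for the TargetParser query: map an FPU kind to
  // the subtarget features it implies ('+' enables, '-' disables).
  enum class FPUKind { Invalid, FPV4_SP_D16 };

  static bool getFPUFeatures(FPUKind Kind,
                             std::vector<std::string> &Features) {
    switch (Kind) {
    case FPUKind::FPV4_SP_D16:
      // Single-precision-only VFPv4 with 16 D registers.
      Features = {"+vfp4", "+fp-only-sp", "+d16", "-neon"};
      return true;
    default:
      return false;
    }
  }

The .fpu handler can then apply each returned feature string to the subtarget instead of hard-coding per-FPU feature lists.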
No functional change in this commit, though clang will start behaving
differently once it starts using this.
Differential Revision: http://reviews.llvm.org/D10237
llvm-svn: 239150
Summary:
Only restoring AvailableFeatures is not enough and will lead to buggy behaviour.
For example, if we have a feature enabled and we ".set pop", the next time we try
to ".set" that feature nothing will happen, because the "!(STI.getFeatureBits()[Feature])"
check will be false, since we never restored STI.FeatureBits.
To fix this, we need to make MipsAssemblerOptions remember STI.FeatureBits
instead of the AvailableFeatures, and then regenerate AvailableFeatures each time we ".set pop".
AFAIK there is no way to convert from AvailableFeatures back to STI.FeatureBits,
but the reverse is possible by using ComputeAvailableFeatures(STI.FeatureBits).
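A simplified sketch of the push/pop bookkeeping (FeatureBitset here is a stand-in type, and the options class is reduced to the relevant field):

  #include <bitset>
  #include <vector>

  using FeatureBitset = std::bitset<64>; // stand-in for the real type

  struct AssemblerOptions {
    FeatureBitset Features; // snapshot of STI.FeatureBits, not AvailableFeatures
  };

  static std::vector<AssemblerOptions> OptionStack;
  static FeatureBitset CurrentFeatureBits;

  static void setPush() { OptionStack.push_back({CurrentFeatureBits}); }

  static void setPop() {
    // Restore the full feature bits, then regenerate AvailableFeatures
    // from them; the reverse direction (AvailableFeatures -> FeatureBits)
    // is not possible, which is why the snapshot must hold FeatureBits.
    CurrentFeatureBits = OptionStack.back().Features;
    OptionStack.pop_back();
    // setAvailableFeatures(ComputeAvailableFeatures(CurrentFeatureBits));
  }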
I also moved the updating of AssemblerOptions inside the "if" statement in
setFeatureBits() and clearFeatureBits(), as there is no reason to update if
nothing changes.
Reviewers: dsanders, mkuper
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9156
llvm-svn: 239144
When we generate coverage data, we explicitly set each coverage map's
alignment to 8 (See InstrProfiling::lowerCoverageData), but when we
read the coverage data, we assume consecutive maps are exactly
adjacent. When we're dealing with 32-bit, maps can end on a 4-byte
boundary, causing us to think the padding is part of the next record.
Fix this by adjusting the buffer to an appropriately aligned address
between records.
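A minimal sketch of the adjustment, assuming each record must start on the next 8-byte boundary:

  #include <cstdint>

  // Advance Buf past any padding so that the next coverage record starts
  // at the next 8-byte-aligned address (coverage maps are emitted with
  // alignment 8 by lowerCoverageData).
  static const char *alignToNextRecord(const char *Buf) {
    uintptr_t Addr = reinterpret_cast<uintptr_t>(Buf);
    uintptr_t Aligned = (Addr + 7) & ~uintptr_t(7);
    return Buf + (Aligned - Addr);
  }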
This is pretty awkward to test, as it requires a binary with multiple
coverage maps to hit, so we'd need to check in multiple source files
and a binary blob as inputs.
llvm-svn: 239129
Apparently this functionality isn't used in-tree (or I would go & make
the explicit unique_ptr constructions into implicit constructions to
make them more self-documenting now that clone doesn't return a raw
owning pointer anymore) but only by the Julia frontend. This isn't
ideal.
llvm-svn: 239091
Originally committed in r237975, GCC 4.7 gave a compilation error
regarding "looser throw specification" though it seemed to be pointing
to a virtual defaulted dtor in a base class and an override defaulted
dtor in a derived class - so I'm not quite sure why/how they could end
up with different throw specifications. To simplify and reduce the risk
of this, I've just removed the pointless override in the derived class;
the base class's should be sufficient. *fingers crossed*
llvm-svn: 239088
Now that we can look at users, we can trivially do this: when we would
have otherwise disabled GlobalMerge (currently -O<3), we can just run
it for minsize functions, as it's usually a codesize win.
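A sketch of the gating described above (the OptLevel plumbing is illustrative; Attribute::MinSize is the real LLVM attribute):

  #include "llvm/IR/Function.h"

  // Below -O3, only run GlobalMerge on functions optimizing for size,
  // where merging globals is usually a codesize win.
  static bool shouldRunGlobalMergeOn(const llvm::Function &F,
                                     unsigned OptLevel) {
    if (OptLevel >= 3)
      return true;
    return F.hasFnAttribute(llvm::Attribute::MinSize);
  }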
Differential Revision: http://reviews.llvm.org/D10054
llvm-svn: 239087
Fix the FIXME and remove this old as(1) compat option. It was useful for
bringup of the integrated assembler to diff object files, but now it's
just causing more relocations than strictly necessary to be generated.
rdar://21201804
llvm-svn: 239084
I made a few changes here in a couple of commits - breaking them out
into smaller ones in case I hit the GCC oddities again.
I'm still not /entirely/ sure what the issues were, so apologies if any
of these experiments break things again. Feel free to revert
immediately.
llvm-svn: 239083
Summary:
Properly report the error in segment load commands from the MachOObjectFile
constructor instead of crashing the program.
Adjust the test case accordingly.
Test Plan: regression test suite
Reviewers: rafael, filcab
Subscribers: llvm-commits
llvm-svn: 239081
Summary:
Currently all load commands are parsed in the MachOObjectFile constructor.
If the next load command cannot be parsed, or if command size is too
small, properly report it through the error code and fail to construct
the object, instead of crashing the program.
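A sketch of the validation pattern, assuming the constructor reports failure through a std::error_code out-parameter (the exact helpers and error values are approximations):

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <system_error>

  // Walk the load commands, failing cleanly on a truncated or undersized
  // command instead of reading past the end of the buffer.
  static void parseLoadCommands(const char *Ptr, const char *End,
                                uint32_t NumCmds, std::error_code &EC) {
    for (uint32_t I = 0; I < NumCmds; ++I) {
      // Every load command starts with 'cmd' and 'cmdsize' uint32_t fields.
      if (End - Ptr < (std::ptrdiff_t)(2 * sizeof(uint32_t))) {
        EC = std::make_error_code(std::errc::invalid_argument);
        return; // header truncated
      }
      uint32_t CmdSize;
      std::memcpy(&CmdSize, Ptr + sizeof(uint32_t), sizeof(CmdSize));
      if (CmdSize < 2 * sizeof(uint32_t) ||
          CmdSize > (uint64_t)(End - Ptr)) {
        EC = std::make_error_code(std::errc::invalid_argument);
        return; // command size too small or runs past the buffer
      }
      Ptr += CmdSize;
    }
    EC = std::error_code(); // success
  }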
Test Plan: regression test suite
Reviewers: rafael, filcab
Subscribers: llvm-commits
llvm-svn: 239080