This is necessary not only for representing empty ranges, but for handling
multibyte characters in the input. (If the end pointer in a range refers to
a multibyte character, should it point to the beginning or the end of the
character in a char array?) Some of the code in the asm parsers was already
assuming this anyway.
llvm-svn: 171765
turning a code like this:
if (foo)
free(foo)
into that:
free(foo)
Move a call to free from basic block FB into FB's predecessor, P,
when the path from P to FB is taken only if the argument of free is
not equal to NULL.
Some restrictions apply on P and FB to be sure that this code motion
is profitable. Namely:
1. FB must have only one predecessor P.
2. FB must contain only the call to free plus an unconditional
branch to S.
3. P's successors are FB and S.
Because of 1., we will not increase the code size when moving the call
to free from FB to P.
Because of 2., FB will be empty after the move.
Because of 2. and 3., P's branch instruction becomes useless, so as FB
(simplifycfg will do the job).
llvm-svn: 171762
peculiar headers under include/llvm.
This struct still doesn't make a lot of sense, but it makes more sense
down in TargetLowering than it did before.
llvm-svn: 171739
TargetTransformInfo rather than TargetLowering, removing one of the
primary instances of the layering violation of Transforms depending
directly on Target.
This is a really big deal because LSR used to be a "special" pass that
could only be tested fully using llc and by looking at the full output
of it. It also couldn't run with any other loop passes because it had to
be created by the backend. No longer is this true. LSR is now just
a normal pass and we should probably lift the creation of LSR out of
lib/CodeGen/Passes.cpp and into the PassManagerBuilder. =] I've not done
this, or updated all of the tests to use opt and a triple, because
I suspect someone more familiar with LSR would do a better job. This
change should be essentially without functional impact for normal
compilations, and only change behvaior of targetless compilations.
The conversion required changing all of the LSR code to refer to the TTI
interfaces, which fortunately are very similar to TargetLowering's
interfaces. However, it also allowed us to *always* expect to have some
implementation around. I've pushed that simplification through the pass,
and leveraged it to simplify code somewhat. It required some test
updates for one of two things: either we used to skip some checks
altogether but now we get the default "no" answer for them, or we used
to have no information about the target and now we do have some.
I've also started the process of removing AddrMode, as the TTI interface
doesn't use it any longer. In some cases this simplifies code, and in
others it adds some complexity, but I think it's not a bad tradeoff even
there. Subsequent patches will try to clean this up even further and use
other (more appropriate) abstractions.
Yet again, almost all of the formatting changes brought to you by
clang-format. =]
llvm-svn: 171735
This c'tor takes the AttributeSet class as the parameter. It will eventually
grab the attributes from the specified index and create a new attribute builder
with those attributes.
llvm-svn: 171712
This works fine with GDB for member variable pointers, but GDB's support for
member function pointers seems to be quite unrelated to
DW_TAG_ptr_to_member_type. (see GDB bug 14998 for details)
llvm-svn: 171698
through as a reference rather than a pointer. There is always *some*
implementation of this available, so this simplifies code by not having
to test for whether it is available or not.
Further, it turns out there were piles of places where SimplifyCFG was
recursing and not passing down either TD or TTI. These are fixed to be
more pedantically consistent even though I don't have any particular
cases where it would matter.
llvm-svn: 171691
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681
pass into the SelectionDAG itself rather than snooping on the
implementation of that pass as exposed by the TargetMachine. This
removes the last direct client of the ScalarTargetTransformInfo class
outside of the TTI pass implementation.
llvm-svn: 171625
interfaces which could be extracted from it, and must be provided on
construction, to a chained analysis group.
The end goal here is that TTI works much like AA -- there is a baseline
"no-op" and target independent pass which is in the group, and each
target can expose a target-specific pass in the group. These passes will
naturally chain allowing each target-specific pass to delegate to the
generic pass as needed.
In particular, this will allow a much simpler interface for passes that
would like to use TTI -- they can have a hard dependency on TTI and it
will just be satisfied by the stub implementation when that is all that
is available.
This patch is a WIP however. In particular, the "stub" pass is actually
the one and only pass, and everything there is implemented by delegating
to the target-provided interfaces. As a consequence the tools still have
to explicitly construct the pass. Switching targets to provide custom
passes and sinking the stub behavior into the NoTTI pass is the next
step.
llvm-svn: 171621
VectorTargetTransformInfo into the TargetTransformInfo pass,
implementing them be delegating back out to the two subobjects.
This is the first step to folding the interfaces together and making
TargetTransformInfo a normal analysis pass (specifically an analysis
group which targets can provide target-specific analysis pass
implementations of).
No callers are migrated here, this just stubs out the interface. Next
step will be to migrate all the callers to directly operate on TTI
instead of STTI or VTTI respectively. That will allow replacing the
machinery for delivering TTI without changing every caller at once.
WIP, I promise all the duplicated interfaces will be removed in the end,
this just decouples the steps of the process.
llvm-svn: 171615
values -- that's not required to fix the bug that was cropping up, and
the values selected made the enumeration's underlying type signed and
introduced some warnings. This fixes the -Werror build.
The underlying issue here was that the DenseMapInfo was casting values
completely outside the range of the underlying storage of the
enumeration to the enumeration's type. GCC went and "optimized" that
into infloops and other misbehavior. By providing designated special
values for these keys in the dense map, we ensure they are indeed
representable and that they won't be used for anything else.
It might be better to reuse None for the empty key and have the
tombstone share the value of the sentinel enumerator, but honestly
having 2 extra enumerators seemed not to matter and this seems a bit
simpler. I'll let Bill shuffle this around (or ask me to shuffle it
around) if he prefers it to look a different way.
I also made the switch a bit more clear (and produce a better assert)
that the enumerators are *never* going to show up and are errors if they
do.
llvm-svn: 171614
With DenseMapInfo<Enum>, it is miscompiled on g++-4.4.
static inline Enum getEmptyKey() { return Enum(<arbitrary int/unsigned value>); }
isEauql(getEmptyKey(), ...)
The compiler mis-assumes the return value is not aliased to Enum.
llvm-svn: 171600
The series of patches leading up to this one makes llc -O0 run 8% faster.
When deallocating a MachineFunction, there is no need to visit all
MachineInstr and MachineOperand objects to deallocate them. All their
memory come from a BumpPtrAllocator that is about to be purged, and they
have empty destructors anyway.
This only applies when deallocating the MachineFunction.
DeleteMachineInstr() should still be used to recycle MI memory during
the codegen passes.
Remove the LeakDetector support for MachineInstr. I've never seen it
used before, and now it definitely doesn't work. With this patch, leaked
MachineInstrs would be much less of a problem since all of their memory
will be reclaimed by ~MachineFunction().
llvm-svn: 171599
Instead of an std::vector<MachineOperand>, use MachineOperand arrays
from an ArrayRecycler living in MachineFunction.
This has several advantages:
- MachineInstr now has a trivial destructor, making it possible to
delete them in batches when destroying MachineFunction. This will be
enabled in a later patch.
- Bypassing malloc() and free() can be faster, depending on the system
library.
- MachineInstr objects and their operands are allocated from the same
BumpPtrAllocator, so they will usually be next to each other in
memory, providing better locality of reference.
- Reduce MachineInstr footprint. A std::vector is 24 bytes, the new
operand array representation only uses 8+4+1 bytes in MachineInstr.
- Better control over operand array reallocations. In the old
representation, the use-def chains would be reordered whenever a
std::vector reached its capacity. The new implementation never changes
the use-def chain order.
Note that some decisions in the code generator depend on the use-def
chain orders, so this patch may cause different assembly to be produced
in a few cases.
llvm-svn: 171598
This function works like memmove() for MachineOperands, except it also
updates any use-def chains containing the moved operands.
The use-def chains are updated without affecting the order of operands
in the list. That isn't possible when using the
removeRegOperandFromUseList() and addRegOperandToUseList() functions.
Callers to follow soon.
llvm-svn: 171597
legality of an address mode to not use a struct of four values and
instead to accept them as parameters. I'd love to have named parameters
here as most callers only care about one or two of these, but the
defaults aren't terribly scary to write out.
That said, there is no real impact of this as the passes aren't yet
using STTI for this and are still relying upon TargetLowering.
llvm-svn: 171595
next to its only user. This helper relies on TargetLowering information
that shouldn't be generally used throughout the Transfoms library, and
so it made little sense as a generic utility.
This also consolidates the file where we need to remove the remaining
uses of TargetLowering in favor of the IR-layer abstract interface in
TargetTransformInfo.
llvm-svn: 171590
The Attribute class is eventually going to represent one attribute. So we need
this class to create the set of attributes. Add some iterator methods to the
builder to access its internal bits in a nice way.
llvm-svn: 171586
This is similar to the existing Recycler allocator, but instead of
recycling individual objects from a BumpPtrAllocator, arrays of
different sizes can be allocated.
llvm-svn: 171581
The bit mask thing will be a thing of the past. It's not extensible enough. Get
rid of its use here. Opt instead for using a vector to hold the attributes.
Note: Some of this code will become obsolete once the rewrite is further along.
llvm-svn: 171553
wall time, user time, and system time since a process started.
For walltime, we currently use TimeValue's interface and a global
initializer to compute a close approximation of total process runtime.
For user time, this adds support for an somewhat more precise timing
mechanism -- clock_gettime with the CLOCK_PROCESS_CPUTIME_ID clock
selected.
For system time, we have to do a full getrusage call to extract the
system time from the OS. This is expensive but unavoidable.
In passing, clean up the implementation of the old APIs and fix some
latent bugs in the Windows code. This might have manifested on Windows
ARM systems or other systems with strange 64-bit integer behavior.
The old API for this both user time and system time simultaneously from
a single getrusage call. While this results in fewer system calls, it
also results in a lower precision user time and if only user time is
desired, it introduces a higher overhead. It may be worthwhile to switch
some of the pass timers to not track system time and directly track user
and wall time. The old API also tracked walltime in a confusing way --
it just set it to the current walltime rather than providing any measure
of wall time since the process started the way buth user and system time
are tracked. The new API is more consistent here.
The plan is to eventually implement these methods for a *child* process
by using the wait3(2) system call to populate an rusage struct
representing the whole subprocess execution. That way, after waiting on
a child process its stats will become accurate and cheap to query.
llvm-svn: 171551
A BumpPtrAllocator has an empty Deallocate() method, but
Recycler::clear() would still call it for every single object ever
allocated, bringing all those objects into cache. As a bonus,
iplist::remove() will also write to the Prev/Next pointers on all the
objects, so all those cache lines have to be written back to RAM before
the pages are given back to the OS.
Stop wasting time and memory bandwith by using the new
clearAndLeakUnsafely() function to jettison all the recycled objects.
llvm-svn: 171541
The iplist::clear() function can be quite expensive because it traverses
the entire list, calling deleteNode() and removeNodeFromList() on each
element. If node destruction and deallocation can be handled some other
way, clearAndLeakNodesUnsafely() can be used to jettison all nodes
without bringing them into cache.
The function name is meant to be ominous.
llvm-svn: 171540
* Remove dead methods.
* Use the 'operator==' method instead of 'contains', which isn't needed.
* Fix some comments.
No functionality change.
llvm-svn: 171523
This patch fixes the PPC eh_frame definitions for the personality and
frame unwinding for PIC objects. It makes PIC build correctly creates
relative relocations in the '.rela.eh_frame' segments and thus avoiding
a text relocation that generates a DT_TEXTREL segments in link phase.
llvm-svn: 171506
1. Add code to estimate register pressure.
2. Add code to select the unroll factor based on register pressure.
3. Add bits to TargetTransformInfo to provide the number of registers.
llvm-svn: 171469
Users of LLVM_BUILTIN_UNREACHABLE should be responsible in the case when LLVM_BUILTIN_UNREACHABLE is undefined.
Actually, (0, (p)) in LLVM_ASSUME_ALIGNED(p, a) caused thousands of warnings on g++-4.4. It was a motivation in this commit.
llvm-svn: 171455
In order to cost subvector insertion and extraction, we need to know
the type of the subvector being extracted.
No functionality change.
llvm-svn: 171453
before the last time.
--- Reverse-merging r171442 into '.':
U include/llvm/IR/Attributes.h
U lib/IR/Attributes.cpp
U lib/IR/AttributeImpl.h
llvm-svn: 171448
The 'operator==' method is a bit clearer and much less verbose for somethings
that should have only one value. Remove from the AttrBuilder for consistency.
llvm-svn: 171442
Modify the AttrBuilder class to store the attributes as a set instead of as a
bit mask. The Attribute class will represent only one attribute instead of a
collection of attributes.
This is the wave of the future!
llvm-svn: 171427
* Add support for specifying the alignment to use.
* Add the concept of native endianness. Used for unaligned native types.
The native alignment and read/write simplification is based on a patch by Richard Smith.
llvm-svn: 171406
code that includes Intrinsics.gen directly.
This never showed up in my testing because the old Intrinsics.gen was
still kicking around in the make build system and was correct there. =[
Thankfully, some of the bots to clean rebuilds and that caught this.
llvm-svn: 171373
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
utils/sort_includes.py script.
Most of these are updating the new R600 target and fixing up a few
regressions that have creeped in since the last time I sorted the
includes.
llvm-svn: 171362
through the static helper functions. This is already true throughout the
codebase.
Slowly, I'm going to re-implement these static helpers in terms of a new
process based interface which can expose more information, and remove
the program object entirely.
llvm-svn: 171335
Implement the old API in terms of the new one. This simplifies the
implementation on Windows which can now re-use the self_process's once
initialization.
llvm-svn: 171330
a union. These don't actually work for by-value function arguments, and
MSVC warns if they exist even while (we hope) it aligns the argument
correctly due to the other union member.
This means MSVC will miss out on optimizations based on the alignment of
the buffer, but really, there aren't that many for x86 and MSVC is
likely not doing a great job of optimizing LLVM and Clang anyways.
llvm-svn: 171328
This adds AlignedCharArray<Alignment, Size>. A templated struct that contains
a member named buffer of type char[Size] that is aligned to Alignment.
llvm-svn: 171319
The coding style used here is not LLVM's style because this is modeled
after a Boost interface and thus done in the style of a candidate C++
standard library interface. I'll probably end up proposing it as
a standard C++ library if it proves to be reasonably portable and
useful.
This is just the most basic parts of the interface -- getting the
process ID out of it. However, it helps sketch out some of the boiler
plate such as the base class, derived class, shared code, and static
factory function. It also introduces a unittest so that I can
incrementally ensure this stuff works.
However, I've not even compiled this code for Windows yet. I'll try to
fix any Windows fallout from the bots, and if I can't fix it I'll revert
and get someone on Windows to help out. There isn't a lot more that is
mandatory, so soon I'll switch to just stubbing out the Windows side and
get Michael Spencer to help with implementation as he can test it
directly.
llvm-svn: 171289
LLVM libraries. Also, clean up the doxygen and formatting of the
existing interfaces.
With this change I'm calling the existing interface "legacy" because I'd
like to replace it with something much better. My end goal is to expose
a common set of interfaces for inspecting various properties of
a process, and implementations to expose those both for the current
process and for child processes. This will also expose more rich
interfaces for spawning and controling a subprocess, notably to use
system calls like wait3 and wait4 where available and gather detailed
resource usage stats about the subprocess.
My plan (discussed with Michael Spencer on IRC) is to base this loosely
around the proposed Boost.Process interface, but to implement
a relatively small subset of that functionality based around the needs
of LLVM, Clang, the Clang driver, etc.
llvm-svn: 171285
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
constant folding calls. Add the initial tests for this which show that
now instsimplify can simplify blindingly obvious code patterns expressed
with both intrinsics and library calls.
llvm-svn: 171194
are nice and decomposed so that we can simplify synthesized calls as
easily as actually call instructions. The internal utility still has the
same behavior, it just now operates on a more generic interface so that
I can extend the set of call simplifications that instsimplify knows
about.
llvm-svn: 171189
information doesn't return an addend for Rel relocations. Go ahead
and use this information to fix relocation handling inside dwarfdump
for 32-bit ELF REL.
llvm-svn: 171126
As with the prefetch intrinsic to which it maps, simply have dcbt
marked as reading from and writing to its arguments instead of having
unmodeled side effects. While this might cause unwanted code motion
(because aliasing checks don't really capture cache-line sharing),
it is more important that prefetches in unrolled loops don't block
the scheduler from rearranging the unrolled loop body.
llvm-svn: 171073
These are now generally used for all diagnostics from the backend, not just
for inline assembly, so this drops the "InlineAsm" from the names. No
functional change. (I've left aliases for the old names but only for long
enough to let me switch over clang to use the new ones.)
llvm-svn: 171047
When the backend is used from clang, it should produce proper diagnostics
instead of just printing messages to errs(). Other clients may also want to
register their own error handlers with the LLVMContext, and the same handler
should work for warnings in the same way as the existing emitError methods.
llvm-svn: 171041
the cost of arithmetic functions. We now assume that the cost of arithmetic
operations that are marked as Legal or Promote is low, but ops that are
marked as custom are higher.
llvm-svn: 171002
On MachO, sections also have segment names. When a tool looking at a .o file
prints a segment name, this is what they mean. In reality, a .o has only one
anonymous, segment.
This patch adds a MachO only function to fetch that segment name. I named it
getSectionFinalSegmentName since the main use for the name seems to be inform
the linker with segment this section should go to.
The patch also changes MachOObjectFile::getSectionName to return just the
section name instead of computing SegmentName,SectionName.
The main difference from the previous patch is that it doesn't use
InMemoryStruct. It is extremely dangerous: if the endians match it returns
a pointer to the file buffer, if not, it returns a pointer to an internal buffer
that is overwritten in the next API call.
We should change all of this code to use
support::detail::packed_endian_specific_integral like ELF, but since these
functions only handle strings, they work with big and little endian machines
as is.
I have tested this by installing ubuntu 12.10 ppc on qemu, that is why it took
so long :-)
llvm-svn: 170838
Instructions that are inserted in a basic block can still be decorated
with addOperand(MO).
Make the two-argument addOperand() function contain the actual
implementation. This function will now always have a valid MF reference
that it can use for memory allocation.
llvm-svn: 170798
This function is often used to decorate dangling instructions, so a
context reference is required to allocate memory for the operands.
Also add a corresponding MachineInstrBuilder method.
llvm-svn: 170797
Rename the AttributeImpl* from Attrs to pImpl to be consistent with other code.
Add comments where none were before. Or doxygen-ify other comments.
llvm-svn: 170767
This is supposed to be a mechanical change with no functional effects.
InstrEmitter can generate all types of MachineOperands which revealed
that MachineInstrBuilder was missing a few methods, added by this patch.
Besides providing a context pointer to MI::addOperand(),
MachineInstrBuilder seems like a better fit for this code.
llvm-svn: 170712
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).
llvm-svn: 170704
MC disassembler clients (LLDB) are interested in querying if an
instruction may affect control flow other than by virtue of being
an explicit branch instruction. For example, instructions which
write directly to the PC on some architectures.
llvm-svn: 170610
These were defined on TargetRegisterInfo, but they don't use any information
that's not available in MCRegisterInfo, so sink them down to be available
at the MC layer.
llvm-svn: 170608
Use the version that also takes an MF reference instead.
It would technically be possible to extract an MF reference from the MI
as MI->getParent()->getParent(), but that would not work for MIs that
are not inserted into any basic block.
Given the reasonably small number of places this constructor was used at
all, I preferred the compile time check to a run time assertion.
llvm-svn: 170588
Just like for addMemOperand(), the function pointer provides a context
for allocating memory. This will make it possible to use a better memory
allocation strategy for the MI operand list, which is currently a slow
std::vector.
Most calls to addOperand() come from MachineInstrBuilder, so give that
class an MF reference as well. Code using BuildMI() won't need changing
at all since the MF reference is already required to allocate a
MachineInstr.
Future patches will fix code that calls MI::addOperand(Op) directly, as
well as code that uses the now deprecated MachineInstrBuilder(MI)
constructor.
llvm-svn: 170574
- An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist)
- Return the scalar value when an MVT is scalarized (v1i64 -> i64)
Fixes PR14639ff.
llvm-svn: 170546
I cannot reproduce it the failures locally, so I will keep an eye at the ppc
bots. This patch does add the change to the "Disassembly of section" message,
but that is not what was failing on the bots.
Original message:
Add a funciton to get the segment name of a section.
On MachO, sections also have segment names. When a tool looking at a .o file
prints a segment name, this is what they mean. In reality, a .o has only one
anonymous, segment.
This patch adds a MachO only function to fetch that segment name. I named it
getSectionFinalSegmentName since the main use for the name seems to be infor
the linker with segment this section should go to.
The patch also changes MachOObjectFile::getSectionName to return just the
section name instead of computing SegmentName,SectionName.
llvm-svn: 170545
The bundle flags are now maintained by the slightly higher-level
functions bundleWithPred() / bundleWithSucc() which enforce consistent
bundle flags between neighboring instructions.
See also MIBundleBuilder for an even higher-level approach to building
bundles.
llvm-svn: 170475
The bundle_iterator::operator++ function now doesn't need to dig out the
basic block and check against end(). It can use the isBundledWithSucc()
flag to find the last bundled instruction safely.
Similarly, MachineInstr::isBundled() no longer needs to look at
iterators etc. It only has to look at flags.
llvm-svn: 170473
The bundle-related MI flags need to be kept in sync with the neighboring
instructions. Don't allow the bulk flag-setting setFlags() function to
change them.
Also don't copy MI flags when cloning an instruction. The clone's bundle
flags will be set when it is explicitly inserted into a bundle.
llvm-svn: 170459
Remove the instr_iterator versions of the splice() functions. It doesn't
seem useful to be able to splice sequences of instructions that don't
consist of full bundles.
The normal splice functions that take MBB::iterator arguments are not
changed, and they can move whole bundles around without any problems.
llvm-svn: 170456
The single-element ilist::splice() function supports a noop move:
List.splice(I, List, I);
The corresponding std::list function doesn't allow that, so add a unit
test to document that behavior.
This also means that
List.splice(I, List, F);
is somewhat surprisingly not equivalent to
List.splice(I, List, F, next(F));
This patch adds an assertion to catch the illegal case I == F above.
Alternatively, we could make I == F a legal noop, but that would make
ilist differ even more from std::list.
llvm-svn: 170443
The normal insert() function takes an MBB::iterator position, and
inserts a stand-alone MachineInstr as before.
The insert() function that takes an MBB::instr_iterator position can
insert instructions inside a bundle, and will now update the bundle
flags correctly when that happens.
When the insert position is between two bundles, it is unclear whether
the instruction should be appended to the previous bundle, prepended to
the next bundle, or stand on its own. The MBB::insert() function doesn't
bundle the instruction in that case, use the MIBundleBuilder class for
that.
llvm-svn: 170437
Most code is oblivious to bundles and uses the MBB::iterator which only
visits whole bundles. MBB::erase() operates on whole bundles at a time
as before.
MBB::remove() now refuses to remove bundled instructions. It is not safe
to remove all instructions in a bundle without deleting them since there
is no way of returning pointers to all the removed instructions.
MBB::remove_instr() and MBB::erase_instr() will now update bundle flags
correctly, lifting individual instructions out of bundles while leaving
the remaining bundle intact.
The MachineInstr convenience functions are updated so
eraseFromParent() erases a whole bundle as before
eraseFromBundle() erases a single instruction, leaving the rest of its bundle.
removeFromParent() refuses to operate on bundled instructions, and
removeFromBundle() lifts a single instruction out of its bundle.
These functions will no longer accidentally split or coalesce bundles -
bundle flags are updated to preserve the existing bundling, and explicit
bundleWith* / unbundleFrom* functions should be used to change the
instruction bundling.
This API update is still a work in progress. I am going to update APIs
first so they maintain bundle flags automatically when possible. Then
I'll add stricter verification of the bundle flags.
llvm-svn: 170384
compilation directory.
This defaults to the current working directory, just as it always has,
but now an assembler can choose to override it with a custom directory.
I've taught llvm-mc about this option and added a test case.
llvm-svn: 170371
Mips16 is really a processor decoding mode (ala thumb 1) and in the same
program, mips16 and mips32 functions can exist and can call each other.
If a jal type instruction encounters an address with the lower bit set, then
the processor switches to mips16 mode (if it is not already in it). If the
lower bit is not set, then it switches to mips32 mode.
The linker knows which functions are mips16 and which are mips32.
When relocation is performed on code labels, this lower order bit is
set if the code label is a mips16 code label.
In general this works just fine, however when creating exception handling
tables and dwarf, there are cases where you don't want this lower order
bit added in.
This has been traditionally distinguished in gas assembly source by using a
different syntax for the label.
lab1: ; this will cause the lower order bit to be added
lab2=. ; this will not cause the lower order bit to be added
In some cases, it does not matter because in dwarf and debug tables
the difference of two labels is used and in that case the lower order
bits subtract each other out.
To fix this, I have added to mcstreamer the notion of a debuglabel.
The default is for label and debug label to be the same. So calling
EmitLabel and EmitDebugLabel produce the same result.
For various reasons, there is only one set of labels that needs to be
modified for the mips exceptions to work. These are the "$eh_func_beginXXX"
labels.
Mips overrides the debug label suffix from ":" to "=." .
This initial patch fixes exceptions. More changes most likely
will be needed to DwarfCFException to make all of this work
for actual debugging. These changes will be to emit debug labels in some
places where a simple label is emitted now.
Some historical discussion on this from gcc can be found at:
http://gcc.gnu.org/ml/gcc-patches/2008-08/msg00623.htmlhttp://gcc.gnu.org/ml/gcc-patches/2008-11/msg01273.html
llvm-svn: 170279
for a wider range of GOT entries that can hold thread-relative offsets.
This matches the behavior of GCC, which was not documented in the PPC64 TLS
ABI. The ABI will be updated with the new code sequence.
Former sequence:
ld 9,x@got@tprel(2)
add 9,9,x@tls
New sequence:
addis 9,2,x@got@tprel@ha
ld 9,x@got@tprel@l(9)
add 9,9,x@tls
Note that a linker optimization exists to transform the new sequence into
the shorter sequence when appropriate, by replacing the addis with a nop
and modifying the base register and relocation type of the ld.
llvm-svn: 170209
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
This is the second attempt. In the first attempt (r169837), a few
getSimpleVT() were hoisted too far, detected by bootstrap failures.
llvm-svn: 170104
On MachO, sections also have segment names. When a tool looking at a .o file
prints a segment name, this is what they mean. In reality, a .o has only one,
anonymous, segment.
This patch adds a MachO only function to fetch that segment name. I named it
getSectionFinalSegmentName since the main use for the name seems to be informing
the linker with segment this section should go to.
The patch also changes MachOObjectFile::getSectionName to return just the
section name instead of computing SegmentName,SectionName.
llvm-svn: 170095
In a previous thread it was pointed out that isPowerOfTwo is not a very precise
name since it can return false for powers of two if it is unable to show that
they are powers of two.
llvm-svn: 170093
Provides m_Argument that allows matching against a CallSite's specified argument. Provides m_Intrinsic pattern that can be templatized over the intrinsic id and bind/match arguments similarly to other pattern matchers. Implementations provided for 0 to 4 arguments, though it's very simple to extend for more. Also provides example template specialization for bswap (m_BSwap) and example of code cleanup for its use.
llvm-svn: 170091
Also add an MIBundleBuilder constructor that takes an existing bundle.
Together these functions make it possible to add instructions to
existing bundles.
llvm-svn: 170063
structures to and from YAML using traits. The first client will
be the test suite of lld. The documentation will show up at:
http://llvm.org/docs/YamlIO.html
llvm-svn: 170019
PowerPC target. This is the last of the four models, so we now have
full TLS support.
This is mostly a straightforward extension of the general dynamic model.
I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the
register copy following ADDI_TLSLD_L; otherwise everything above the
ADDIS_DTPREL_HA appeared dead and was removed.
As before, there are new test cases to test the assembly generation, and
the relocations output during integrated assembly. The expected code
gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll.
There are a couple of things I think can be done more efficiently in the
overall TLS code, so there will likely be a clean-up patch forthcoming;
but for now I want to be sure the functionality is in place.
Bill
llvm-svn: 170003
been used in the first place. It simply was passed to the function and to the
recursive invocations. Simply drop the parameter and update the callers for the
new signature.
Patch by Saleem Abdulrasool!
llvm-svn: 169988
When ASan replaces <alloca instruction> with
<offset into a common large alloca>, it should also patch
llvm.dbg.declare calls and replace debug info descriptors to mark
that we've replaced alloca with a value that stores an address
of the user variable, not the user variable itself.
See PR11818 for more context.
llvm-svn: 169984
Add R_ARM_NONE and R_ARM_PREL31 relocation types
to MCExpr. Both of them will be used while
generating .ARM.extab and .ARM.exidx sections.
llvm-svn: 169965
mention the inline memcpy / memset expansion code is a mess?
This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The later is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.
llvm-svn: 169959
Also added more comments to explain why it is generally ok to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for loaded source (memcpy) or zero constants (memset). The poor name
choice is probably some kind of legacy issue.
llvm-svn: 169954
fsub X, +0 ==> X
fsub X, -0 ==> X, when we know X is not -0
fsub +/-0.0, (fsub -0.0, X) ==> X
fsub nsz +/-0.0, (fsub +/-0.0, X) ==> X
fsub nnan ninf X, X ==> 0.0
fadd nsz X, 0 ==> X
fadd [nnan ninf] X, (fsub [nnan ninf] 0, X) ==> 0
where nnan and ninf have to occur at least once somewhere in this expression
fmul X, 1.0 ==> X
llvm-svn: 169940
m_ConstantFP - match and bind a float constant
m_SpecificConstantFP - match a specific floating point value or vector of floats of that value
m_FPOne - match a floating point 1.0 or vector of 1.0s
m_NegZero - match -0.0
m_AnyZero - match 0 or -0.0
llvm-svn: 169939
ScalarTargetTransformInfo::getIntImmCost() instead. "Legal" is a poorly defined
term for something like integer immediate materialization. It is always possible
to materialize an integer immediate. Whether to use it for memcpy expansion is
more a "cost" conceern.
llvm-svn: 169929
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:
Instruction Relocation Symbol
addis ra,r2,x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x
addi r3,ra,x@got@tlsgd@l R_PPC64_GOT_TLSGD16_L x
bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x
R_PPC64_REL24 __tls_get_addr
nop
<use address in r3>
The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation. This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr. Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation. So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.
Most of the code is pretty straightforward. I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call. Something in the
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations. This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().
Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.
Comments welcome!
Thanks,
Bill
llvm-svn: 169910
instead of the instruction. I've left a forwarding wrapper for the
instruction so users with the instruction don't need to create
a GEPOperator themselves.
This lets us remove the copy of this code in instsimplify.
I've looked at most of the other copies of similar code, and this is the
only one I've found that is actually exactly the same. The one in
InlineCost is very close, but it requires re-mapping non-constant
indices through the cost analysis value simplification map. I could add
direct support for this to the generic routine, but it seems overly
specific.
llvm-svn: 169853
the GEP instruction class.
This is part of the continued refactoring and cleaning of the
infrastructure used by SROA. This particular operation is also done in
a few other places which I'll try to refactor to share this
implementation.
llvm-svn: 169852
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
llvm-svn: 169837
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
llvm-svn: 169796
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
if it's not possible to materialize an integer immediate with a single
instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
llvm-svn: 169791
InitSections is called before the MCContext is initialized it could cause
duplicate temporary symbols to be emitted later (after context initialization
resets the temporary label counter).
llvm-svn: 169785