Introduction:
In case when stack alignment is 8 and GPRs parameter part size is not N*8:
we add padding to GPRs part, so part's last byte must be recovered at
address K*8-1.
We need to do it, since remained (stack) part of parameter starts from
address K*8, and we need to "attach" "GPRs head" without gaps to it:
Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack ------ ...
FIX:
Note, once we added padding we need to correct *all* Arg offsets that are going
after padded one. That's why we need this fix: Arg offsets were never corrected
before this patch. See new test-cases included in patch.
We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need pad only last byval parameter and only in case it outsides GPRs
and stack alignment = 8.
Though, stack area, allocated for recovered byval params, must satisfy
"Size mod 8 = 0" restriction.
This patch reduces stack usage for some cases:
We can reduce ArgRegsSaveArea since inner N*4 bytes sized byval params my be
"packed" with alignment 4 in some cases.
llvm-svn: 182237
The export list for this test requires the following symbols to be available:
JITTest_AvailableExternallyFunction
JITTest_AvailableExternallyGlobal
The change in r181200 commented them out, which caused the test to fail to
link, at least on Darwin. I have only reverted the change for arm, since I
can't test the other targets and since it sounds like that change was fixing
real problems for those other targets. It should be possible to rearrange the
code to keep those definitions outside the #ifdefs, but that should be done by
someone who can reproduce the problems that r181200 was trying to fix.
llvm-svn: 182233
This fixes a bootstrapping problem with builds for Apple ARM targets.
Clang had the wrong prototype for __clear_cache with ARM targets. Rafael
fixed that in clang svn r181784 and r181810, but without those changes,
we can't build this code for ARM because clang reports an error about the
declaration in Memory.inc not matching the builtin declaration. Some of our
buildbots need to use an older compiler that doesn't have the clang fix.
Since __clear_cache is never used here when __APPLE__ is defined, I'm just
conditionalizing the declaration to match that. I also moved the declaration
of sys_icache_invalidate inside the conditional for __APPLE__ while I was at
it.
llvm-svn: 182223
AArch64 ELF uses .rela relocations so there's no need to actually make
use of the bits we're setting in the destination However, we should
make sure all bits are cleared properly since multiple runs of
resolveRelocations are possible and these could combine to produce
invalid results if stale versions remain in the code.
llvm-svn: 182214
lli's remote MCJIT code calls setExecutable just prior to running
code. In line with Darwin behaviour this seems to be the place to
invalidate any caches needed so that relocations can take effect
properly.
llvm-svn: 182213
On 32-bit hosts %p can print garbage when given a uint64_t, we should
use %llx instead. This only affects the output of the debugging text
produced by lli.
llvm-svn: 182209
We don't need to reject all inline asm as using the counter register (most does
not). Only those that explicitly clobber the counter register need to prevent
the transformation.
llvm-svn: 182191
The peephole tries to reorder MOV32r0 instructions such that they are
before the instruction that modifies EFLAGS.
The problem is that the peephole does not consider the case where the
instruction that modifies EFLAGS also depends on the previous state of
EFLAGS.
Instead, walk backwards until we find an instruction that has a def for
EFLAGS but does not have a use.
If we find such an instruction, insert the MOV32r0 before it.
If it cannot find such an instruction, skip the optimization.
llvm-svn: 182184
This patch matches GCC behavior: the code used to only allow unaligned
load/store on ARM for v6+ Darwin, it will now allow unaligned load/store
for v6+ Darwin as well as for v7+ on Linux and NaCl.
The distinction is made because v6 doesn't guarantee support (but LLVM
assumes that Apple controls hardware+kernel and therefore have
conformant v6 CPUs), whereas v7 does provide this guarantee (and
Linux/NaCl behave sanely).
The patch keeps the -arm-strict-align command line option, and adds
-arm-no-strict-align. They behave similarly to GCC's -mstrict-align and
-mnostrict-align.
I originally encountered this discrepancy in FastIsel tests which expect
unaligned load/store generation. Overall this should slightly improve
performance in most cases because of reduced I$ pressure.
llvm-svn: 182175
The errors were:
non-constant-expression cannot be narrowed from type 'int64_t' (aka 'long') to 'uint32_t' (aka 'unsigned int') in initializer list
and
non-constant-expression cannot be narrowed from type 'long' to 'uint32_t' (aka 'unsigned int') in initializer list
llvm-svn: 182168
Dot4 now uses 8 scalar operands instead of 2 vectors one which allows register
coalescer to remove some unneeded COPY.
This patch also defines some structures/functions that can be used to handle
every vector instructions (CUBE, Cayman special instructions...) in a similar
fashion.
llvm-svn: 182126
Almost all instructions that takes a 128 bits reg as input (fetch, export...)
have the abilities to swizzle their argument and output. Instead of printing
default swizzle for each 128 bits reg, rename T*.XYZW to T* and let instructions
print potentially optimized swizzles themselves.
llvm-svn: 182124