Summary:
Currently an error is thrown if bundle alignment mode is set more than once
per module (either via the API or the .bundle_align_mode directive). This
change allows setting it multiple times as long as the alignment doesn't
change.
Also nested bundle_lock groups are currently not allowed. This change allows
them, with the effect that the group stays open until all nests are exited,
and if any of the bundle_lock directives has the align_to_end flag, the
group becomes align_to_end.
These changes make the bundle aligment simpler to use in the compiler, and
also better match the corresponding support in GNU as.
Reviewers: jvoung, eliben
Differential Revision: http://reviews.llvm.org/D5801
llvm-svn: 219811
These derive from the new asm-only masking definitions.
Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants. The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking. These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly. We can revisit this question later
if we have a more pressing need to express something like this.
So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.
llvm-svn: 219361
Nico Rieck added support for this 32-bit COFF relocation some time ago
for Win64 stuff. It appears that as an oversight, the assembly output
used "foo"@IMGREL32 instead of "foo"@IMGREL, which is what we can parse.
Sadly, there were actually tests that took in IMGREL and put out
IMGREL32, and we didn't notice the inconsistency. Oh well. Now LLVM can
assemble it's own output with slightly more fidelity.
llvm-svn: 218437
parsing (and latent bug in the instruction definitions).
This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.
The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:
insertps $192, %xmm0, %xmm1
insertps $-64, %xmm0, %xmm1
These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.
The reason why the first instruction failed to be handled is because
prior to r136287 the operands ere marked 'i32i8imm' which forces them to
be sign-extenable. Clearly, that won't work for 192 in a single byte.
However, making thim zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.
Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.
The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.
In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.
I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.
llvm-svn: 217310
Instructions like 'fxsave' and control flow instructions like 'jne'
match any operand size. The loop I added to the Intel syntax matcher
assumed that using a different size would give a different instruction.
Now it handles the case where we get the same instruction for different
memory operand sizes.
This also allows us to remove the hack we had for unsized absolute
memory operands, because we can successfully match things like 'jnz'
without reporting ambiguity. Removing this hack uncovered test case
involving 'fadd' that was ambiguous. The memory operand could have been
single or double precision.
llvm-svn: 216604
The existing matcher has lots of AT&T assembly dialect assumptions baked
into it. In particular, the hack for resolving the size of a memory
operand by appending the four most common suffixes doesn't work at all.
The Intel assembly dialect mnemonic table has ambiguous entries, so we
need to try matching multiple times with different operand sizes, since
that's the only way to choose different instruction variants.
This makes us more compatible with gas's implementation of Intel
assembly syntax. MSVC assumes you want byte-sized operations for the
instructions that we reject as ambiguous.
Reviewed By: grosbach
Differential Revision: http://reviews.llvm.org/D4747
llvm-svn: 216481
Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction.
Added encoding and lowering tests.
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>
llvm-svn: 215536
Fixes PR18916. I don't think we need to implement support for either
hybrid syntax. Nobody should write Intel assembly with '%' prefixes on
their registers or AT&T assembly without them.
llvm-svn: 215031
This is similar to what I did with the two-source permutation recently. (It's
almost too similar so that we should consider generating the masking variants
with some tablegen help.)
Both encoding and intrinsic tests are added as well. For the latter, this is
what the IR that the intrinsic test on the clang side generates.
Part of <rdar://problem/17688758>
llvm-svn: 214890
This allows assembling the two new instructions, encls and enclu for the
SKX processor model.
Note the diffs are a bigger than what might think, but to fit the new
MRM_CF and MRM_D7 in things in the right places things had to be
renumbered and shuffled down causing a bit more diffs.
rdar://16228228
llvm-svn: 214460
This patch minimizes the number of nops that must be emitted on X86 to satisfy
stackmap shadow constraints.
To minimize the number of nops inserted, the X86AsmPrinter now records the
size of the most recent stackmap's shadow in the StackMapShadowTracker class,
and tracks the number of instruction bytes emitted since the that stackmap
instruction was encountered. Padding is emitted (if it is required at all)
immediately before the next stackmap/patchpoint instruction, or at the end of
the basic block.
This optimization should reduce code-size and improve performance for people
using the llvm stackmap intrinsic on X86.
<rdar://problem/14959522>
llvm-svn: 213892
This target is identical to the Windows MSVC (and follows Microsoft ABI for C).
Correct the library call setup for this target. The same set of library calls
are missing on this environment.
llvm-svn: 213883
As destination k0 is allowed but not as predicate/writemask.
I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.
llvm-svn: 212504
Silvermont can only decode one instruction per cycle if the instruction exceeds 8 bytes.
Also in Silvermont instructions with more than 3 prefixes will cause 3 cycle penalty.
Maximum nop length is limited to 7 bytes when used for padding on Silvermont.
For other x86 processors max nop length remains unchanged 15 bytes.
Differential Revision: http://reviews.llvm.org/D4374
llvm-svn: 212321
This includes assembler and codegen support (see the new tests in
avx512-encodings.s and avx512-shuffle.ll).
<rdar://problem/17492620>
llvm-svn: 212221
For now I only updated the _alt variants. The main variants are used by
codegen and that will need a bit more work to trigger.
<rdar://problem/17492620>
llvm-svn: 212114
For now I used a separate template for these sub-vector/tuple broadcasts
rather than sharing the mem variants with avx512_int_broadcast_rm.
<rdar://problem/17402869>
llvm-svn: 211828
The *_alt defs for vcmp are used by the InstParser (the asm string in the main
def is used by the InstPrinter) . The former was accepting vector registers
as destination rather than mask registers.
llvm-svn: 211750
We would get confused by '@' characters in symbol names, we would
mistake the text following them for the variant kind.
When an identifier a string, the variant kind will never show up inside
of it. Instead, check to see if there is a variant following the
string.
This fixes PR19965.
llvm-svn: 211249
Note that I followed the AVX2 convention here and didn't add LLVM intrinsics
for stores. These can be generated with the nontemporal hint on LLVM IR
stores (see new test). The GCC builtins are lowered directly into nontemporal
stores.
<rdar://problem/17082571>
llvm-svn: 211176
Given
bar = foo + 4
.long bar
MC would eat the 4. GNU as includes it in the relocation. The rule seems to be
that a variable that defines a symbol is used in the relocation and one that
does not define a symbol is evaluated and the result included in the relocation.
Fixing this unfortunately required some other changes:
* Since the variable is now evaluated, it would prevent the ELF writer from
noticing the weakref marker the elf streamer uses. This patch then replaces
that with a VariantKind in MCSymbolRefExpr.
* Using VariantKind then requires us to look past other VariantKind to see
.weakref bar,foo
call bar@PLT
doing this also fixes
zed = foo +2
call zed@PLT
so that is a good thing.
* Looking past VariantKind means that the relocation selection has to use
the fixup instead of the target.
This is a reboot of the previous fixes for MC. I will watch the sanitizer
buildbot and wait for a build before adding back the previous fixes.
llvm-svn: 204294
This changes the implementation of local directional labels to use a dedicated
map. With that it can then just use CreateTempSymbol, which is what the rest
of MC uses.
CreateTempSymbol doesn't do a great job at making sure the names are unique
(or being efficient when the names are not needed), but that should probably
be fixed in a followup patch.
This fixes pr18928.
llvm-svn: 203826
Original commits messages:
Add MRMXr/MRMXm form to X86 for use by instructions which treat the 'reg' field of modrm byte as a don't care value. Will allow for simplification of disassembler code.
Simplify a bunch of code by removing the need for the x86 disassembler table builder to know about extended opcodes. The modrm forms are sufficient to convey the information.
llvm-svn: 201065
These should end up (in ELF) as R_X86_64_32S relocs, not R_X86_64_32.
Kill the horrid and incomplete special case and FIXME in
EncodeInstruction() and set things up so it can infer the signedness
from the ImmType just like it can the size and whether it's PC-relative.
llvm-svn: 200495
Placed the MC variant diagnostics in the wrong directory accidentally. Move
them into their respective architecture specific directories.
llvm-svn: 200161
scale factors in memory addresses. As it does for .att_syntax.
It was producing:
Assertion failed: (((Scale == 1 || Scale == 2 || Scale == 4 || Scale == 8)) && "Invalid scale!"), function CreateMem, file /Volumes/SandBox/llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp, line 1133.
rdar://14967214
llvm-svn: 199942
This finishes the job started in r198756, and creates separate opcodes for
64-bit vs. 32-bit versions of the rest of the RET instructions too.
LRETL/LRETQ are interesting... I can't see any justification for their
existence in the SDM. There should be no 'LRETL' in 64-bit mode, and no
need for a REX.W prefix for LRETQ. But this is what GAS does, and my
Sandybridge CPU and an Opteron 6376 concur when tested as follows:
asm __volatile__("pushq $0x1234\nmovq $0x33,%rax\nsalq $32,%rax\norq $1f,%rax\npushq %rax\nlretl $8\n1:");
asm __volatile__("pushq $1234\npushq $0x33\npushq $1f\nlretq $8\n1:");
asm __volatile__("pushq $0x33\npushq $1f\nlretq\n1:");
asm __volatile__("pushq $0x1234\npushq $0x33\npushq $1f\nlretq $8\n1:");
cf. PR8592 and commit r118903, which added LRETQ. I only added LRETIQ to
match it.
I don't quite understand how the Intel syntax parsing for ret
instructions is working, despite r154468 allegedly fixing it. Aren't the
explicitly sized 'retw', 'retd' and 'retq' supposed to work? I have at
least made the 'lretq' work with (and indeed *require*) the 'q'.
llvm-svn: 199106
The target specific parser should return `false' if the target AsmParser handles
the directive, and `true' if the generic parser should handle the directive.
Many of the target specific directive handlers would `return Error' which does
not follow these semantics. This change simply changes the target specific
routines to conform to the semantis of the ParseDirective correctly.
Conformance to the semantics improves diagnostics emitted for the invalid
directives. X86 is taken as a sample to ensure that multiple diagnostics are
not presented for a single error.
llvm-svn: 199068
We can't do a perfect job here. We *have* to allow (%dx) even in 64-bit
mode, for example, because it might be used for an unofficial form of
the in/out instructions. We actually want to do a better job of validation
*later*. Perhaps *instead* of doing it where we are at the moment.
But for now, doing what validation we *can* do in the place that the code
already has its validation, is an improvement.
llvm-svn: 198760
It seems there is no separate instruction class for having AdSize *and*
OpSize bits set, which is required in order to disambiguate between all
these instructions. So add that to the disassembler.
Hm, perhaps we do need an AdSize16 bit after all?
llvm-svn: 198759
Where "where possible" means that it's an immediate value and it's below
0x10000. In fact GAS will either truncate or error with larger values,
and will insist on using the addr32 prefix to get 32-bit addressing. So
perhaps we should do that, in a later patch.
llvm-svn: 198758
JCXZ should have the 0x67 prefix only if we're in 32-bit mode, so make that
appropriately conditional. And JECXZ needs the prefix instead.
llvm-svn: 198757
I couldn't see how to do this sanely without splitting RETQ from RETL.
Eric says: "sad about the inability to roundtrip them now, but...".
I have no idea what that means, but perhaps it wants preserving in the
commit comment.
llvm-svn: 198756
This fixes the bulk of 16-bit output, and the corresponding test case
x86-16.s now looks mostly like the x86-32.s test case that it was
originally based on. A few irrelevant instructions have been dropped,
and there are still some corner cases to be fixed in subsequent patches.
llvm-svn: 198752
The 0x66 prefix toggles between 16-bit and 32-bit addressing mode.
So in 32-bit mode it is used to switch to 16-bit addressing mode for the
following instruction, while in 16-bit mode it's the other way round — it's
used to switch to 32-bit mode instead.
Thus, emit the 0x66 prefix byte for OpSize only in 32-bit (and 64-bit) mode,
and introduce a new OpSize16 bit which is used in 16-bit mode instead.
This is just the basic infrastructure for that change; a subsequent patch
will add the new OpSize16 bit to the 32-bit instructions that need it.
Patch from David Woodhouse.
llvm-svn: 198586
This is not really expected to work right yet. Mostly because we will
still emit the OpSize (0x66) prefix in all the wrong places, along with
a number of other corner cases. Those will all be fixed in the subsequent
commits.
Patch from David Woodhouse.
llvm-svn: 198584
Add some tests to validate correct register selection, including a fix
to an existing test which was requiring the *wrong* output.
Patch from David Woodhouse.
llvm-svn: 198566
this commit as the only one on the Blamelist so I quickly reverted this.
However it was actually Nick's change who has since fixed that issue.
Original commit message:
Changed the X86 assembler for intel syntax to work with directional labels.
The X86 assembler as a separate code to parser the intel assembly syntax
in X86AsmParser::ParseIntelOperand(). This did not parse directional labels.
And if something like 1f was used as a branch target it would get an
"Unexpected token" error.
The fix starts in X86AsmParser::ParseIntelExpression() in the case for
AsmToken::Integer, it needs to grab the IntVal from the current token
then look for a 'b' or 'f' following an Integer. Then it basically needs to
do what is done in AsmParser::parsePrimaryExpr() for directional
labels. It saves the MCExpr it creates in the IntelExprStateMachine
in the Sym field.
When it returns to X86AsmParser::ParseIntelOperand() it looks
for a non-zero Sym field in the IntelExprStateMachine and if
set it creates a memory operand not an immediate operand
it would normally do for the Integer.
rdar://14961158
llvm-svn: 197744
The X86 assembler has a separate code to parser the intel assembly syntax
in X86AsmParser::ParseIntelOperand(). This did not parse directional labels.
And if something like 1f was used as a branch target it would get an
"Unexpected token" error.
The fix starts in X86AsmParser::ParseIntelExpression() in the case for
AsmToken::Integer, it needs to grab the IntVal from the current token
then look for a 'b' or 'f' following the Integer. Then it basically needs to
do what is done in AsmParser::parsePrimaryExpr() for directional
labels. It saves the MCExpr it creates in the IntelExprStateMachine
in the Sym field.
When it returns to X86AsmParser::ParseIntelOperand() it looks
for a non-zero Sym field in the IntelExprStateMachine and if
set it creates a memory operand not an immediate operand
it would normally do for the Integer.
rdar://14961158
llvm-svn: 197728
Patch by Mikulas Patocka. I added the test. I checked that for cpu names that
gas knows about, it also doesn't generate nopl.
The modified cpus:
i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta
Crusoe, Microsoft VirtualBox - see
https://bbs.archlinux.org/viewtopic.php?pid=775414
k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs
via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that
Via c3 and c3-Nehemiah don't have nopl
llvm-svn: 195679
On darwin, when trying to create compact unwind info, a .cfi_cfa_def
directive would case an llvm_unreachable() to be hit. Back off when we
see this directive and generate the regular DWARF style eh_frame.
rdar://15406518
llvm-svn: 194285
This allows the instruction to be encoded using the 2-byte VEX form instead of the 3-byte VEX form. The GNU assembler has similar behavior and instruction selection already does this.
llvm-svn: 192088
Add basic assembly/disassembly support for the first Intel SHA
instruction 'sha1rnds4'. Also includes feature flag, and test cases.
Support for the remaining instructions will follow in a separate patch.
llvm-svn: 190611
- Instead of setting the suffixes in a bunch of places, just set one master
list in the top-level config. We now only modify the suffix list in a few
suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py).
- Aside from removing the need for a bunch of lit.local.cfg files, this enables
4 tests that were inadvertently being skipped (one in
Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and
CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been
XFAILED).
- This commit also fixes a bunch of config files to use config.root instead of
older copy-pasted code.
llvm-svn: 188513
For decoding, keep the current behavior of always decoding these as their REP
versions. In the future, this could be improved to recognize the cases where
these behave as XACQUIRE and XRELEASE and decode them as such.
llvm-svn: 184207
The issue was that the MatchingInlineAsm and VariantID args to the
MatchInstructionImpl function weren't being set properly. Specifically, when
parsing intel syntax, the parser thought it was parsing inline assembly in the
at&t dialect; that will never be the case.
The crash was caused when the emitter tried to emit the instruction, but the
operands weren't set. When parsing inline assembly we only set the opcode, not
the operands, which is used to lookup the instruction descriptor.
rdar://13854391 and PR15945
Also, this commit reverts r176036. Now that we're correctly parsing the intel
syntax the pushad/popad don't match properly. I've reimplemented that fix using
a MnemonicAlias.
llvm-svn: 181620
unable to handle cases such as __asm mov eax, 8*-8.
This patch also attempts to simplify the state machine. Further, the error
reporting has been improved. Test cases included, but more will be added to
the clang side shortly.
rdar://13668445
llvm-svn: 179719
As these two instructions in AVX extension are privileged instructions for
special purpose, it's only expected to be used in inlined assembly.
llvm-svn: 179266
memory operands.
Essentially, this layers an infix calculator on top of the parsing state
machine. The scale on the index register is still expected to be an immediate
__asm mov eax, [eax + ebx*4]
and will not work with more complex expressions. For example,
__asm mov eax, [eax + ebx*(2*2)]
The plus and minus binary operators assume the numeric value of a register is
zero so as to not change the displacement. Register operands should never
be an operand for a multiply or divide operation; the scale*indexreg
expression is always replaced with a zero on the operand stack to prevent
such a case.
rdar://13521380
llvm-svn: 178881
one-byte NOPs. If the processor actually executes those NOPs, as it sometimes
does with aligned bundling, this can have a performance impact. From my
micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve
one-byte NOPs is about 20% worse than a 15 followed by a 12. This patch
changes NOP emission to emit as many 15-byte (the maximum) as possible followed
by at most one shorter NOP.
llvm-svn: 176464
This is complicated by backward labels (e.g., 0b can be both a backward label
and a binary zero). The current implementation assumes [0-9]b is always a
label and thus it's possible for 0b and 1b to not be interpreted correctly for
ms-style inline assembly. However, this is relatively simple to fix in the
inline assembly (i.e., drop the [bB]).
This patch also limits backward labels to [0-9]b, so that only 0b and 1b are
ambiguous.
Part of rdar://12470373
llvm-svn: 174983
Currently, when a fragment is relaxed, its size is modified, but its
offset is not (it gets laid out as a side effect of checking whether
it needs relaxation), then all subsequent fragments are invalidated
because their offsets need to change. When bundling is enabled,
relaxed fragments need to get laid out again, because the increase in
size may push it over a bundle boundary. So instead of only
invalidating subsequent fragments, also invalidate the fragment that
gets relaxed, which causes it to get laid out again.
This patch also fixes some trailing whitespace and fixes the
bundling-related debug output of MCFragments.
llvm-svn: 174401