2001-06-06 22:29:01 +02:00
From: Chris Lattner <sabre@nondot.org>
To: "Vikram S. Adve" <vadve@cs.uiuc.edu>
Subject: Re: LLVM Feedback

I've included your feedback in the /home/vadve/lattner/llvm/docs directory
so that it will live in CVS eventually with the rest of LLVM. I've
significantly updated the documentation to reflect the changes you
suggested, as specified below:

> We should consider eliminating the type annotation in cases where it is
> essentially obvious from the instruction type:
>        br bool <cond>, label <iftrue>, label <iffalse>
> I think your point was that making all types explicit improves clarity
> and readability. I agree to some extent, but it also comes at the
> cost of verbosity. And when the types are obvious from people's
> experience (e.g., in the br instruction), it doesn't seem to help as
> much.

Very true. We should discuss this more, but my reasoning is more of a
consistency argument. There are VERY few instructions that can have all
of the types eliminated, and doing so when available unnecessarily makes
the language more difficult to handle. Especially when you see 'int
%this' and 'bool %that' all over the place, I think it would be
disorienting to see:

        br %predicate, %iftrue, %iffalse

for branches. Even just typing that once gives me the creeps. ;) Like I
said, we should probably discuss this further in person...

> On reflection, I really like your idea of having the two different
> switch types (even though they encode implementation techniques rather
> than semantics). It should simplify building the CFG and my guess is it
> could enable some significant optimizations, though we should think
> about which.

Great. I added a note to the switch section commenting on how the VM
should just use the instruction type as a hint, and that the
implementation may choose alternate representations (such as predicated
branches).

> In the lookup-indirect form of the switch, is there a reason not to
> make the val-type uint?

No. This was something I was debating for a while, and didn't really feel
strongly about either way. It is common to switch on other types in HLLs
(for example signed ints are particularly common), but in this case, all
that will be added is an additional 'cast' instruction. I removed that
from the spec.

> I agree with your comment that we don't need 'neg'

Removed.

> There's a trade-off with the cast instruction:
>  + it avoids having to define all the upcasts and downcasts that are
>    valid for the operands of each instruction (you probably have
>    thought of other benefits also)
>  - it could make the bytecode significantly larger because there could
>    be a lot of cast operations

 + You NEED casts to represent things like:
     void foo(float);
     ...
     int x;
     ...
     foo(x);
   in a language like C. Even in a Java-like language, you need upcasts
   and some way to implement dynamic downcasts.
 + Not all forms of instructions take every type (for example you can't
   shift by a floating point number of bits), thus SOME programs will need
   implicit casts.

To be efficient and to avoid your '-' point above, we just have to be
careful to specify that the instructions shall operate on all common
types, therefore casting should be relatively uncommon. For example all
of the arithmetic operations work on almost all data types.

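As a concrete illustration of the first '+' point, here is a small C sketch
of the same call written twice: once with the conversion left implicit, as
the C programmer would write it, and once with the cast spelled out, as a
typed representation would need it. This is only an illustration and not
part of the spec; foo() returns a value here (unlike the void foo(float)
above) purely so the result can be observed, and all function names are made
up for the example.

```c
#include <assert.h>

/* Hypothetical callee: like the email's foo(float), but returning a value
   so the effect of the call is visible. */
static float foo(float f) { return f * 0.5f; }

/* What the C programmer writes: the int-to-float conversion is implicit. */
static float call_implicit(int x) { return foo(x); }

/* What a fully typed representation has to spell out: a cast, then the call. */
static float call_explicit(int x) {
    float tmp = (float)x;   /* stands in for the 'cast' instruction */
    return foo(tmp);
}

/* Both spellings describe the same computation. */
static void check_casts_agree(int x) {
    assert(call_implicit(x) == call_explicit(x));
}
```

The only difference between the two versions is where the conversion is
written down, which is the information a 'cast' instruction carries.
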
> Making the second arg. to 'shl' a ubyte seems good enough to me.
> 255 positions seems adequate for several generations of machines

Okay, that comment is removed.

> and is more compact than uint.

No, it isn't. Remember that the bytecode encoding saves value slots into
the bytecode instructions themselves, not constant values. This is
another case where we may introduce more cast instructions (but we will
also reduce the number of opcode variants that must be supported by a
virtual machine). Because most shifts are by constant values, I don't
think that we'll have to cast many shifts. :)

> I still have some major concerns about including malloc and free in the
> language (either as builtin functions or instructions).

Agreed. How about this proposal:

malloc/free are either built-in functions or actual opcodes. They provide
all of the type safety that the document would indicate, blah blah
blah. :)

Now, because of all of the excellent points that you raised, an
implementation may want to override the default malloc/free behavior of
the program. To do this, they simply implement a "malloc" and
"free" function. The virtual machine will then be defined to use the user
defined malloc/free function (which return/take void*'s, not typed
pointers like the builtin function would) if one is available, otherwise
fall back on a system malloc/free.

Does this sound like a good compromise? It would give us all of the
type safety/elegance in the language while still allowing the user to do
all the cool stuff they want to...

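A rough C sketch of the fallback rule in this proposal, under stated
assumptions: the struct, the function names and the counting allocator are
all hypothetical, and a NULL hook stands for "no user-defined function is
available". It is meant to show the shape of the idea, not how a VM would
actually have to implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types: user overrides take and return void*. */
typedef void *(*malloc_fn)(size_t);
typedef void (*free_fn)(void *);

/* Hypothetical VM state: NULL means no user-defined function was supplied. */
struct vm_allocator {
    malloc_fn user_malloc;
    free_fn   user_free;
};

/* Prefer the user's malloc if one is available, else use the system one. */
static void *vm_malloc(const struct vm_allocator *a, size_t size) {
    assert(a != NULL);
    return a->user_malloc ? a->user_malloc(size) : malloc(size);
}

/* Same rule for free. */
static void vm_free(const struct vm_allocator *a, void *p) {
    assert(a != NULL);
    if (a->user_free) a->user_free(p); else free(p);
}

/* Typed wrapper standing in for the builtin: callers see int*, not void*. */
static int *vm_malloc_int(const struct vm_allocator *a, size_t count) {
    return (int *)vm_malloc(a, count * sizeof(int));
}

/* Example user override: counts calls, then defers to the system. */
static int user_calls = 0;
static void *counting_malloc(size_t n) { user_calls++; return malloc(n); }
static void counting_free(void *p) { user_calls++; free(p); }
```

The typed wrapper is where the type safety lives; the untyped hooks
underneath can be swapped without the program noticing.
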
> 'alloca' on the other hand sounds like a good idea, and the
> implementation seems fairly language-independent so it doesn't have the
> problems with malloc listed above.

Okay, once we get the above stuff figured out, I'll put it all in the
spec.

> About indirect call:
> Your option #2 sounded good to me. I'm not sure I understand your
> concern about an explicit 'icall' instruction?

I worry too much. :) The other alternative has been removed. 'icall' is
now up in the instruction list next to 'call'.

> I believe tail calls are relatively easy to identify; do you know why
> .NET has a tailcall instruction?

Although I am just guessing, I believe it probably has to do with the fact
that they want languages like Haskell and Lisp to be efficiently runnable
on their VM. Of course this means that the VM MUST implement tail calls
'correctly', or else life will suck. :) I would put this into a future
feature bin, because it could be pretty handy...

> A pair of important synchronization instr'ns to think about:
>   load-linked
>   store-conditional

What is 'load-linked'? I think that (at least for now) I should add these
to the 'possible extensions' section, because they are not immediately
needed...

> Other classes of instructions that are valuable for pipeline
> performance:
>   conditional-move
>   predicated instructions

Conditional move is effectively a special case of a predicated
instruction... and I think that all predicated instructions can possibly
be implemented later in LLVM. It would significantly change things, and
it doesn't seem to be very necessary right now. It would seem to
complicate flow control analysis a LOT in the virtual machine. I would
tend to prefer that a predicated architecture like IA64 convert from a
"basic block" representation to a predicated rep as part of its dynamic
compilation phase. Also, if a basic block contains ONLY a move, then
that can be trivially translated into a conditional move...

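The last remark can be pictured in C, as a sketch only: a branch whose
taken block contains nothing but a move computes the same value as a
branch-free select, which is the form a code generator could emit as a
conditional move. The function names are made up for the example, and
whether a given compiler really emits a cmov for the second form is not
something this sketch guarantees.

```c
#include <assert.h>

/* Branchy form: the 'then' block contains ONLY a move. */
static int select_branch(int cond, int a, int b) {
    int r = b;
    if (cond) {
        r = a;      /* the whole basic block is this one move */
    }
    return r;
}

/* Branch-free form: a select, i.e. a candidate for a conditional move. */
static int select_cmov(int cond, int a, int b) {
    return cond ? a : b;
}

/* Both forms agree for either value of the condition. */
static void check_select(int a, int b) {
    assert(select_branch(0, a, b) == select_cmov(0, a, b));
    assert(select_branch(1, a, b) == select_cmov(1, a, b));
}
```
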
> I agree that we need a static data space. Otherwise, emulating global
> data gets unnecessarily complex.

Definitely. Also a later item though. :)

> We once talked about adding a symbolic thread-id field to each
> ..
> Instead, it could be a great topic for a separate study.

Agreed. :)

> What is the semantics of the IA64 stop bit?

Basically, the IA64 writes instructions like this:
mov ...
add ...
sub ...
op xxx
op xxx
;;
mov ...
add ...
sub ...
op xxx
op xxx
;;

Where the ;; delimits a group of instructions with no dependencies between
them, which can all be executed concurrently (to the limits of the
available functional units). The ;; gets translated into a bit set in one
of the opcodes.

The advantages of this representation are that you don't have to do some
kind of 'thread id scheduling' pass by having to specify ahead of time how
many threads to use, and the representation doesn't have a per-instruction
overhead...

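To make the grouping idea concrete, here is a simplified C sketch that
decides where the ;; stops would go in a straight-line sequence. It
assumes a made-up three-register instruction format and a deliberately
crude rule: start a new group whenever an instruction reads or writes a
register that was written earlier in the current group. The real IA64
rules have many more cases, so treat this as an illustration of the
concept rather than a description of the architecture.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-register instruction: dst = op(src1, src2). */
struct insn { int dst, src1, src2; };

/* Sets stop[i] = 1 when a group ends after instruction i (where ';;' goes).
   Returns the number of groups. */
static size_t place_stops(const struct insn *code, size_t n, int *stop) {
    size_t groups = 0, start = 0;   /* start = first insn of current group */
    assert(n == 0 || (code != NULL && stop != NULL));
    for (size_t i = 0; i < n; i++) {
        stop[i] = 0;
        for (size_t j = start; j < i; j++) {
            int w = code[j].dst;    /* register written earlier in the group */
            if (w == code[i].src1 || w == code[i].src2 || w == code[i].dst) {
                stop[i - 1] = 1;    /* close the group just before insn i */
                start = i;
                groups++;
                break;
            }
        }
    }
    if (n > 0) {                    /* the final group always ends with a stop */
        stop[n - 1] = 1;
        groups++;
    }
    return groups;
}
```

Note that the output is one bit per instruction and nothing else, which
matches the point about there being no per-instruction thread id to carry
around.
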
> And finally, another thought about the syntax for arrays :-)
> Although this syntax:
>         array <dimension-list> of <type>
> is verbose, it will be used only in the human-readable assembly code so
> size should not matter. I think we should consider it because I find it
> to be the clearest syntax. It could even make arrays of function
> pointers somewhat readable.

My only comment will be to give you an example of why this is a bad
idea. :)

Here is an example of using the switch statement (with my recommended
syntax):

switch uint %val, label %otherwise,
       [%3 x {uint, label}] [ { uint %57, label %l1 },
                              { uint %20, label %l2 },
                              { uint %14, label %l3 } ]

Here it is with the syntax you are proposing:

switch uint %val, label %otherwise,
       array %3 of {uint, label}
              array of {uint, label}
                              { uint %57, label %l1 },
                              { uint %20, label %l2 },
                              { uint %14, label %l3 }

Which is ambiguous and very verbose. It would be possible to specify
constants with [] brackets as in my syntax, which would look like this:

switch uint %val, label %otherwise,
       array %3 of {uint, label} [ { uint %57, label %l1 },
                                   { uint %20, label %l2 },
                                   { uint %14, label %l3 } ]

But then the syntax is inconsistent between type definition and constant
definition (why do []'s enclose the constants but not the types??).

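Whichever spelling wins, the thing being written down is a table of
{uint, label} pairs plus a default destination. A minimal C sketch of
that data structure and its lookup follows. It assumes the %57, %20 and
%14 in the examples stand for the constants 57, 20 and 14, models labels
as small integers, and uses names invented for the illustration; a real
implementation would be free to pick a different strategy than a linear
scan.

```c
#include <assert.h>
#include <stddef.h>

/* One {uint, label} pair; labels are modelled as small integers here. */
struct case_entry {
    unsigned value;
    int      label;
};

enum { L_OTHERWISE = 0, L1 = 1, L2 = 2, L3 = 3 };

/* The table from the examples: 57 -> l1, 20 -> l2, 14 -> l3. */
static const struct case_entry switch_table[] = {
    { 57u, L1 }, { 20u, L2 }, { 14u, L3 },
};

/* Linear lookup: return the matching label, else the default destination. */
static int lookup_switch(unsigned val, const struct case_entry *t,
                         size_t n, int otherwise) {
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].value == val) {
            return t[i].label;
        }
    }
    return otherwise;
}
```
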
Anyways, I'm sure that there is much debate still to be had over
this... :)

-Chris

http://www.nondot.org/~sabre/os/
http://www.nondot.org/MagicStats/
http://korbit.sourceforge.net/