From: Chris Lattner <sabre@nondot.org> | |
To: "Vikram S. Adve" <vadve@cs.uiuc.edu> | |
Subject: Re: LLVM Feedback | |
I've included your feedback in the /home/vadve/lattner/llvm/docs directory | |
so that it will live in CVS eventually with the rest of LLVM. I've | |
significantly updated the documentation to reflect the changes you | |
suggested, as specified below: | |
> We should consider eliminating the type annotation in cases where it is | |
> essentially obvious from the instruction type: | |
> br bool <cond>, label <iftrue>, label <iffalse> | |
> I think your point was that making all types explicit improves clarity | |
> and readability. I agree to some extent, but it also comes at the | |
> cost of verbosity. And when the types are obvious from people's | |
> experience (e.g., in the br instruction), it doesn't seem to help as | |
> much. | |
Very true. We should discuss this more, but my reasoning is more of a | |
consistency argument. There are VERY few instructions that can have all | |
of the types eliminated, and doing so when available unnecessarily makes | |
the language more difficult to handle. Especially when you see 'int | |
%this' and 'bool %that' all over the place, I think it would be | |
disorienting to see: | |
br %predicate, %iftrue, %iffalse | |
for branches. Even just typing that once gives me the creeps. ;) Like I | |
said, we should probably discuss this further in person... | |
> On reflection, I really like your idea of having the two different | |
> switch types (even though they encode implementation techniques rather | |
> than semantics). It should simplify building the CFG and my guess is it | |
> could enable some significant optimizations, though we should think | |
> about which. | |
Great. I added a note to the switch section commenting on how the VM | |
should just use the instruction type as a hint, and that the | |
implementation may choose altermate representations (such as predicated | |
branches). | |
> In the lookup-indirect form of the switch, is there a reason not to | |
> make the val-type uint? | |
No. This was something I was debating for a while, and didn't really feel | |
strongly about either way. It is common to switch on other types in HLL's | |
(for example signed int's are particularly common), but in this case, all | |
that will be added is an additional 'cast' instruction. I removed that | |
from the spec. | |
> I agree with your comment that we don't need 'neg' | |
Removed. | |
> There's a trade-off with the cast instruction: | |
> + it avoids having to define all the upcasts and downcasts that are | |
> valid for the operands of each instruction (you probably have | |
> thought of other benefits also) | |
> - it could make the bytecode significantly larger because there could | |
> be a lot of cast operations | |
+ You NEED casts to represent things like: | |
void foo(float); | |
... | |
int x; | |
... | |
foo(x); | |
in a language like C. Even in a Java like language, you need upcasts | |
and some way to implement dynamic downcasts. | |
+ Not all forms of instructions take every type (for example you can't | |
shift by a floating point number of bits), thus SOME programs will need | |
implicit casts. | |
To be efficient and to avoid your '-' point above, we just have to be | |
careful to specify that the instructions shall operate on all common | |
types, therefore casting should be relatively uncommon. For example all | |
of the arithmetic operations work on almost all data types. | |
> Making the second arg. to 'shl' a ubyte seems good enough to me. | |
> 255 positions seems adequate for several generations of machines | |
Okay, that comment is removed. | |
> and is more compact than uint. | |
No, it isn't. Remember that the bytecode encoding saves value slots into | |
the bytecode instructions themselves, not constant values. This is | |
another case where we may introduce more cast instructions (but we will | |
also reduce the number of opcode variants that must be supported by a | |
virtual machine). Because most shifts are by constant values, I don't | |
think that we'll have to cast many shifts. :) | |
> I still have some major concerns about including malloc and free in the | |
> language (either as builtin functions or instructions). | |
Agreed. How about this proposal: | |
malloc/free are either built in functions or actual opcodes. They provide | |
all of the type safety that the document would indicate, blah blah | |
blah. :) | |
Now, because of all of the excellent points that you raised, an | |
implementation may want to override the default malloc/free behavior of | |
the program. To do this, they simply implement a "malloc" and | |
"free" function. The virtual machine will then be defined to use the user | |
defined malloc/free function (which return/take void*'s, not type'd | |
pointers like the builtin function would) if one is available, otherwise | |
fall back on a system malloc/free. | |
Does this sound like a good compromise? It would give us all of the | |
typesafety/elegance in the language while still allowing the user to do | |
all the cool stuff they want to... | |
> 'alloca' on the other hand sounds like a good idea, and the | |
> implementation seems fairly language-independent so it doesn't have the | |
> problems with malloc listed above. | |
Okay, once we get the above stuff figured out, I'll put it all in the | |
spec. | |
> About indirect call: | |
> Your option #2 sounded good to me. I'm not sure I understand your | |
> concern about an explicit 'icall' instruction? | |
I worry too much. :) The other alternative has been removed. 'icall' is | |
now up in the instruction list next to 'call'. | |
> I believe tail calls are relatively easy to identify; do you know why | |
> .NET has a tailcall instruction? | |
Although I am just guessing, I believe it probably has to do with the fact | |
that they want languages like Haskell and lisp to be efficiently runnable | |
on their VM. Of course this means that the VM MUST implement tail calls | |
'correctly', or else life will suck. :) I would put this into a future | |
feature bin, because it could be pretty handy... | |
> A pair of important synchronization instr'ns to think about: | |
> load-linked | |
> store-conditional | |
What is 'load-linked'? I think that (at least for now) I should add these | |
to the 'possible extensions' section, because they are not immediately | |
needed... | |
> Other classes of instructions that are valuable for pipeline | |
> performance: | |
> conditional-move | |
> predicated instructions | |
Conditional move is effectly a special case of a predicated | |
instruction... and I think that all predicated instructions can possibly | |
be implemented later in LLVM. It would significantly change things, and | |
it doesn't seem to be very necessary right now. It would seem to | |
complicate flow control analysis a LOT in the virtual machine. I would | |
tend to prefer that a predicated architecture like IA64 convert from a | |
"basic block" representation to a predicated rep as part of it's dynamic | |
complication phase. Also, if a basic block contains ONLY a move, then | |
that can be trivally translated into a conditional move... | |
> I agree that we need a static data space. Otherwise, emulating global | |
> data gets unnecessarily complex. | |
Definitely. Also a later item though. :) | |
> We once talked about adding a symbolic thread-id field to each | |
> .. | |
> Instead, it could a great topic for a separate study. | |
Agreed. :) | |
> What is the semantics of the IA64 stop bit? | |
Basically, the IA64 writes instructions like this: | |
mov ... | |
add ... | |
sub ... | |
op xxx | |
op xxx | |
;; | |
mov ... | |
add ... | |
sub ... | |
op xxx | |
op xxx | |
;; | |
Where the ;; delimits a group of instruction with no dependencies between | |
them, which can all be executed concurrently (to the limits of the | |
available functional units). The ;; gets translated into a bit set in one | |
of the opcodes. | |
The advantages of this representation is that you don't have to do some | |
kind of 'thread id scheduling' pass by having to specify ahead of time how | |
many threads to use, and the representation doesn't have a per instruction | |
overhead... | |
> And finally, another thought about the syntax for arrays :-) | |
> Although this syntax: | |
> array <dimension-list> of <type> | |
> is verbose, it will be used only in the human-readable assembly code so | |
> size should not matter. I think we should consider it because I find it | |
> to be the clearest syntax. It could even make arrays of function | |
> pointers somewhat readable. | |
My only comment will be to give you an example of why this is a bad | |
idea. :) | |
Here is an example of using the switch statement (with my recommended | |
syntax): | |
switch uint %val, label %otherwise, | |
[%3 x {uint, label}] [ { uint %57, label %l1 }, | |
{ uint %20, label %l2 }, | |
{ uint %14, label %l3 } ] | |
Here it is with the syntax you are proposing: | |
switch uint %val, label %otherwise, | |
array %3 of {uint, label} | |
array of {uint, label} | |
{ uint %57, label %l1 }, | |
{ uint %20, label %l2 }, | |
{ uint %14, label %l3 } | |
Which is ambiguous and very verbose. It would be possible to specify | |
constants with [] brackets as in my syntax, which would look like this: | |
switch uint %val, label %otherwise, | |
array %3 of {uint, label} [ { uint %57, label %l1 }, | |
{ uint %20, label %l2 }, | |
{ uint %14, label %l3 } ] | |
But then the syntax is inconsistent between type definition and constant | |
definition (why do []'s enclose the constants but not the types??). | |
Anyways, I'm sure that there is much debate still to be had over | |
this... :) | |
-Chris | |
http://www.nondot.org/~sabre/os/ | |
http://www.nondot.org/MagicStats/ | |
http://korbit.sourceforge.net/ | |