Fetch-decode-execute-store. This is the repeating, endless, tireless, repetition of the machine. Like some gigantic mobile that is always in motion, the processor of today is cycling through those motions at an other-dimensional speed. Like the science fiction character who moves so fast he is invisible, the computer’s motion seems beyond comprehension.
But, as you can see, our mind’s imagination can slow down the cycle, single stepping it to examine how it works. We can slow the machine down to a crawl so we can see what the blur of levers and gears of this electronic machine are doing.
In last week’s blog, I was vague about specific instructions, those orders for the machine. I didn’t give concrete examples of instructions, just the vague notion of how instructions are processed. In this blog, I will put some meat on those bones by giving you specifics of a real machine, the real orders for the operations, a real set of codes that are loaded into the memory of our computer of the mind’s eye.
I do all this because in order to understand a virtual machine you have to understand a real one. I won’t dump so much information on you that it will cease to be interesting, I hope. I hope instead to give you just a bit of real instructions for a real machine so that you can then see clearly the basic operation, not so much that you can be a programmer but an informed observer. Luckily, it only takes a small part of instruction set to fully explain how the computer works, for you to get an intuitive feel for the machine.
As I mentioned in the last blog post, I will use the IBM System/360 architecture as my example. This is because it remains one of the best instruction sets ever invented, both simple and complex, clean and pragmatic, theoretical and practical. It was designed by three exceptional people, Gene Amdahl, Fred Brooks, and Gerry Blaauw, in the early 1960s, over 56 years ago! Back then, IBM realized that they needed a single architecture to replace their six different instruction sets of their six different computer lines. The result was the New Product Line project to design a new, single architecture for all the use cases. They worked hard to design an instruction set that would work for scientific number crunching, commercial applications, and systems work, such as operating systems and compilers. And they succeeded beyond all expectations.
The result was the System/360, an all encompassing architecture that could be implemented in several different widely differing machines. In the late 1960s, they updated it a little, illogically calling it the System/370. All encompassing and then ten more degrees?
It was designed originally for only a maximum of sixteen megabytes of store, but soon was expanded to two gigabytes. Several updates were made to the architecture over the years, and it did get a bit thick around its middle, but it still carried on. In fact, it’s still used to this day in IBM’s z14 mainframe, a $2B business for the company.
That’s right. It is still being produced and used today. Many thousands of Linux servers run on a single z14, all using the 360’s architecture.
But the reason I chose it for this exercise is because, while it is classified as a Complex Instruction Set Computer (CISC) architecture, it remains particularly clean one. It is fully documented in a 168 page Principles of Operation, as compared with Intel x86 hefty tome of 4,922 pages, as of May 2019. It was also my first architecture, which may bias my viewpoint somewhat. But it is an architecture that plays an important role in the history of virtualization. Now, on with the machine description.
The System/360
The System/360 architecture has a set of sixteen 32 bit registers, numbered 0 through 15. In hexadecimal, that’s 0 through F. (For those who have yet to learn the binary numbering systems, hexadecimal uses sixteen digits, 0-9 and A-F, where A is ten, B is eleven, and F is fifteen. This lets us use a single character to describe four bits of a byte. At this point in my career I see the bits of each hex digit in my mind automatically. F is binary 1111, 9 is 1001, A is 1010, 5 is 0101, and so forth. There can be two hex digits in an eight bit byte. You don’t have to see the bits, just understand that B is eleven, C is twelve, and so forth.)
The model for a register machine like the 360 is simple. Load data into registers, perform operations on them, and store the results back to memory. To do this there are load and store register instructions. Here is the load instruction:
L 12,PLACE LOAD THE VALUE FROM PLACE INTO REGISTER 12
Let me explain this notation a bit. This is a source line for an assembler program. The assembler is the program that reads the source lines and turns them into the binary instructions. In UNIX and Unix-like systems, such as Linux, assembler source files end in .s (dot s), which originally stood for “source.” You can see that it’s divided into fields separated by blanks. The first field, the “L” is a mnemonic for the “load” instruction. The operation code for load is 58 hexadecimal.
The “12,PLACE” field specifies the operands of the instruction, saying that we want to set register 12 to the 32 bit binary value in the memory location specified by the address PLACE. Anything else left on the line is ignored and can be used to make comments, only occasionally snide, about the code. The assembler would turn this line into the following binary:
58C00F42It would be those four bytes that the machine would fetch, decode, and execute to move the datum at PLACE into register 12. The 58 is hex for the load operation. PLACE seems to be at location F42, which it is for our purposes, but there is a bit more detail there that we need not delved into. The “C” in the above is the 12, the number for our register.
The image at the top of this blog shows all the different instruction formats. If you’re interested in digging more into it, just drop me an email, and I can tell you more.
So, how does the assembler know where “PLACE” is? As the assembler program processes our source, it generates bytes for the instructions and other data. As it does, it keeps track of where it is in memory with a location counter variable. To allocate space for our variable PLACE we would have entered the following line:
PLACE DS Fwhich says that at that location in the program, we have to set aside a full word, that is 32 bits, and set the value of the label PLACE to that memory location. The assembler knows that PLACE is a label because it started at the left margin. The load line above had no label. Here we have a new mnemonic, “DS,” which means Declare Storage, and a new operand, “F,” which means full word--32 bits. The DS is called a pseudo-operation in this case. Any line in the source can have a label.
There is another frequently used pseudo-operation you’ll see if you look at some 360 code. The EQU pseudo-op sets the named label to the value of the expression in the operands. You see something like:
PI EQU 31416 1000 * THE MATH CONST PIOr, for labels in code:
HERE EQU * THE ASTERISK MEANS THE LOCATION COUNTERLabels in code often are written that way to make changing the first line after the label easy.
Now, let’s do something simple, like calculating the following expression: A = B + C + D - E. Each of the letters are variable in our assembler program. We assume values are already in B through E. Here the program:
... L 4,B GET THE VALUE IN VARIABLE B INTO 4 A 4,C ADD C TO IT A 4,D ADD D TO THAT S 4,E SUBTRACT E FROM THE TOTAL ST 4,A AND SAVE THE RESULTS ... A DS F VARIABLES A, B, C, D, & E HAVE SOMETHING B DS F STORED IN THEM FROM OTHER PARTS OF OUR C DS F IMAGINARY PROGRAM. D DS F E DS FThe first five lines do the work, and the last five declare the variables in memory. The mnemonic “A” adds, and “S” subtracts. The “ST” stores the answer. As you can see, the actual program is very easy to follow. We load the value of B from memory into register 4. We then do the operations from memory into that register. More complex expression might use more registers, which is okay. We have plenty.
Speaking of using more registers, multiplication has an interesting feature. It requires a larger place than a single register for the result. After all, if I multiply 1000 by 1000 I get 1,000,000. Just add the zeros. If I multiply 32 bits by 32 bits I’ll need 64 bits to capture the result. Our architecture does this by setting aside two registers for the 64 bit result. We put the multiplicand in the odd register, multiply the multiplier to it, and the result winds up in the even/odd pair. Load the value in register 1. Multiply to register 0. The 64 bit result is in 0 and 1, high bits going in 0. Here is an example:
L 1,HOURS GET THE HOURS WORKED IN 0/1 REGISTER PAIR M 0,RATE MULTIPLY IT BY THE RATE. ST 1,PAY SAVE IN THE PAY VARIABLEIt’s a little funny at first, but all other systems have to do the same to have enough bits for the product of two 32 bit numbers. In the Intel x86 the result is put in EDX:EAX.
Divide is similar.
But things would quickly come to an end if we could only execute a single, straight line sequence of instructions. We need the ability to repeat a sequence of instructions, and the ability to selectively execute different sequences of instructions. In other words, we need to be able to branch.
Branching is altering the sequence of instructions via the use of an instruction that changes the value in the program counter register. You’ll remember that the program counter is used by the “fetch” mechanism to specify the memory location of the next instruction. In higher level languages, like C, we use various language constructs for branching. One example is the for loop.
In the “for” loop we have three phrases: the initial, the condition, and the iterative. A made up example is as follows:
for (i = 0; i < 100; i++) DO STUFF HERE;The phrase “i = 0” is the initial phrase, and it gets executed once as we fall into the loop. We loop as long as the second phrase, the condition phrase, is true, while “i” is less than 100 in this example. That is, we check the condition phrase, and if it is true, we DO STUFF HERE, and return to the top of the loop and check the condition again. The iterative phrase increments our variable “i” by one each time we loop.
Here is how a C compiler for our architecture might translate the above code:
(1) S 1,1 USE REGISTER 1 FOR THE VARIABLE I; SET TO ZERO (2) LOOP EQU * (3) C 1,=F’100’ COMPARE INSTRUCTION; SET CONDITION CODE (4) BNL DONE WAS NOT LESS THAN, WE’RE DONE (5) DO STUFF HERE (6) A 1,=F’1’ ADD ONE TO I (7) B LOOP DO IT AGAIN (8) DONE EQU *Here we’ve used register 1 instead of a variable “i” in memory. This is called the registerisation of the variable “i.” You can see the string “DO STUFF HERE” as the body of the loop, which gives you a sense of what’s going on around it. Line 1 sets our register 1 to zero. It subtracts it from itself, a surefire way to zero out a register. Line 2 has a label LOOP using the pseudo-operation EQU.
The operation on line 3 is a compare instruction. A compare instruction subtracts two numbers discarding the result. Its sole purpose is to set something called the condition code (cc) register. In this example, the value 100 is subtracted from the contents of register 1. The first time through the loop, this will result in a -100, which is less than zero and set the cc register to 1. When the value of register 1 becomes 100, by looping that many times, the comparison will result in a zero which will set the cc to 0. If somehow the register 1 got changed to 101, the result would be greater than 0 and the cc would be set to 2. The cc gets set as things move through the arithmetic/logical unit; some instructions cause the cc to be set. Other instructions, like load and store, do not.
What do we do with this cc? We can change the instruction sequence based on the cc’s value. On line 4, the BNL is a conditional branch. The NL part of the mnemonic means NOT LESS THAN. If “i” is not less than 100, we will take the branch. If the value of the cc is 0 or 2, the value in the operand is loaded into the program counter and the next instruction is taken from there, the address of DONE in our case. If not, the instruction following the branch on condition is fetched.
For now, let’s assume we did not take the branch. After DOING STUFF, we have an “add” instruction that adds a constant 1 to register 1 bringing it closer to that magic value of 100. It is our loop counter. Line 7 is an unconditional branch back to the top of our for loop, so we do it all over again.
That was quite a lot, but now we have a good feel for what the machine actually does, even if there are a lot more instructions and things we haven’t covered. Those, in a way, are mere details to be added to the ideas we have just learned. We see how instructions are executed and can be used to create loops, conditional execution, do calculations, everything we could do with the machine.
But so far it’s all from a user program point of view. The IBM System/360 defines two states of the machine, problem and privileged. We’ve only talked about the problem state. The privileged state is where the operating system, what some call the kernel, and what IBM called the control program, runs. Mostly, it is just like the problem state, most of the instructions being the same. But there are some new ideas down there. And some new instructions that can only be executed in privileged mode. Next week, we’ll talk some about them.
The virtual machine ideas will start to creep in as we do.