Instruction Level Parallelism (ILP) and Out of Order execution (OoO)
--------------------------------------------------------------------

Suppose we want to compute "P = Q + R". Let's imagine this sequence of instructions, where "P", "Q", "R", and "adder" are symbols for slot numbers:

    Q -> adder
    R -> adder
    adder -> P
    etc -> etc      // lots more instructions
    etc -> etc      // which have nothing to do with the adder

The first two instructions might take 10 ps each, dropping off data to be added. The third instruction must then wait for the adder to complete, say 1000 ps.

Now suppose we rearrange the program like this:

    Q -> adder
    R -> adder
    etc -> etc      // lots more instructions
    etc -> etc      // which have nothing to do with the adder
    adder -> P

After the first two instructions, the adder begins computing its sum. It does this independently of the main WIZ circuit, which goes on to execute "etc -> etc". That is, devices "plugged in" to the main WIZ circuit, like the adder here, operate independently of it. An instruction which drops off data to a device is finished as soon as the drop-off is made. The main WIZ circuit is then free to move on to other instructions (like "etc -> etc"), while the connected device independently starts or continues to "do its thing".

Now the "lots more instructions" in this example may take a while. Say they take, collectively, 800 ps. All that time the adder has been "chewing" on its data. So when we get to the "adder -> P" instruction, the adder is already 800 ps into its computation. Thus the "adder -> P" instruction now only has to wait another 200 ps. And of course, if the "lots more instructions" take more than 1000 ps, then "adder -> P" will have no wait at all and will execute in just 10 ps.

This is called ILP or "Instruction Level Parallelism". It is possible because the WIZ architecture separates every device operation into two parts: (1) copy data to the device, and (2) retrieve the answer from it -- AT ANY LATER TIME.
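
The arithmetic above can be checked with a small sketch. It is a toy model, not a description of real WIZ timing: it assumes every "src -> dst" copy costs 10 ps, that the adder starts as soon as its second operand arrives, and that a retrieval costs any remaining wait plus one 10 ps copy. The names retrieve_wait and total_time are hypothetical, introduced only for illustration.

```python
COPY_PS = 10      # cost of one "src -> dst" copy (figure used in the text)
ADDER_PS = 1000   # adder latency once both operands have arrived (figure used in the text)


def retrieve_wait(elapsed_ps, latency_ps=ADDER_PS):
    """Stall time for "adder -> P" when issued elapsed_ps after the adder started."""
    return max(0, latency_ps - elapsed_ps)


def total_time(other_work_ps, reordered):
    """Total time for the whole sequence under this toy model (an assumption)."""
    t = 2 * COPY_PS                                   # Q -> adder, R -> adder
    if reordered:
        t += other_work_ps                            # unrelated work overlaps the add
        t += retrieve_wait(other_work_ps) + COPY_PS   # adder -> P, partly pre-paid
    else:
        t += retrieve_wait(0) + COPY_PS               # adder -> P waits the full latency
        t += other_work_ps                            # unrelated work runs afterwards
    return t


if __name__ == "__main__":
    print("wait after 800 ps of other work:", retrieve_wait(800), "ps")
    print("original order: ", total_time(800, reordered=False), "ps")
    print("rearranged order:", total_time(800, reordered=True), "ps")
```

Under these assumptions the rearranged program hides 800 ps of the adder's latency behind the unrelated instructions, so only the remaining 200 ps shows up as a stall.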

This is also an example of "OoO" or "Out of Order" instruction sequencing. If we consider the first version of the sequence above to be the "standard" order, where we start the add and retrieve its answer immediately, then the second version, which moves the two "etc -> etc" instructions in between the drop-offs and the retrieval, is running them "out of order".

Contemporary processors (like those from Intel, Arm, etc), in introducing new chips year after year, have the liability that they MUST REMAIN ABSOLUTELY COMPATIBLE with the prior year's models. In the beginning these processors ran "in order", and thus they are now stuck in the "run-in-order" paradigm. However, many modern processors give only the external appearance of being in order, while internally breaking apart the instructions just as we do here. But they are forced -- by the constraint of software compatibility -- to do it in real time, with hardware.

On Intel processors, even when the programmer or compiler "knows" a parallelizable instruction is coming, there is no way to express that in the software. For example, take two multiplies written in a schematic three-operand notation (not literal x86 syntax):

    MUL A,B,C
    MUL D,E,F

The instruction set has no way to say something like:

    MUL A,B,-
    MUL D,E,-
    MULRETRIEVE C
    MULRETRIEVE F

Instead we must code the two MULs and then let the hardware take them apart and put them back together. This costs these processors tens or hundreds of millions of extra transistors, a huge increase in power usage, and years of complex design effort. On the WIZ we do it up-front, in the program itself, using software (like smart compilers) instead of hardware. (See chapter "ILP and OOO" later in this book.)
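
The split form "MUL A,B,-; MUL D,E,-; MULRETRIEVE C; MULRETRIEVE F" can be modelled roughly as follows. This is a sketch under loud assumptions: the class SplitPhaseMultiplier, the 10 ps issue and retrieve costs, the 1000 ps multiplier latency, and the idea that the multiplier accepts a new operand pair while an earlier one is still in flight are all hypothetical, chosen only to mirror the adder example. The "fused" case models a strictly in-order machine with no hardware reordering at all.

```python
from collections import deque

ISSUE_PS = 10      # assumed cost of dropping off one operand pair
RETRIEVE_PS = 10   # assumed cost of copying one result out
MUL_PS = 1000      # assumed multiplier latency (hypothetical figure)


class SplitPhaseMultiplier:
    """Toy device: issue() drops off operands, retrieve() collects results in FIFO order."""

    def __init__(self):
        self.now = 0               # simulated time in ps
        self.pending = deque()     # (ready_time, product) for each issued multiply

    def issue(self, a, b):
        self.now += ISSUE_PS
        self.pending.append((self.now + MUL_PS, a * b))

    def retrieve(self):
        ready, product = self.pending.popleft()
        # Stall only if the result is not ready yet, then pay the copy cost.
        self.now = max(self.now, ready) + RETRIEVE_PS
        return product


def fused(pairs):
    """Each multiply is issued and retrieved immediately, as in "MUL A,B,C"."""
    m = SplitPhaseMultiplier()
    results = []
    for a, b in pairs:
        m.issue(a, b)
        results.append(m.retrieve())
    return results, m.now


def split(pairs):
    """All multiplies are issued first, then retrieved, as in the MULRETRIEVE form."""
    m = SplitPhaseMultiplier()
    for a, b in pairs:
        m.issue(a, b)
    results = [m.retrieve() for _ in pairs]
    return results, m.now


if __name__ == "__main__":
    pairs = [(2, 3), (4, 5)]
    print("fused:", fused(pairs))
    print("split:", split(pairs))
```

In this model both orderings produce the same products, but the split ordering overlaps the two multiplier latencies, which is the overlap an out-of-order processor has to discover in hardware and a WIZ program states directly.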