Inter-WIZ function calls - page 3

Here is a more complex function, showing multiple inputs and a function calling other functions.

    // someFunct(x,y) = cos(pi*x) + sqrt(y+3)
    gateway, pi -> multiplier -> cosine
    gateway, 3 -> adder -> square-root
    cosine, square-root -> adder -> gateway
    1 -> repeat;

This would run in a WIZ with adder and multiplier objects behind registers on its bus. The symbols "cosine" and "square-root" would be the serial numbers of the gateways of two other WIZes, running cosine and square-root algorithms respectively.

The first line starts with "gateway, pi -> multiplier", so this WIZ will sleep indefinitely until some data appears in its gateway. The first line ends with "multiplier -> cosine". This instruction will likely wait for the multiplier to complete. Then, if the cosine WIZ is not ready to accept data (perhaps because it is processing a cosine for another caller), this instruction waits again. When the cosine WIZ becomes ready to accept this argument, the instruction completes and the data is sent to the cosine. The cosine WIZ then wakes up from its wait and begins its own computation.

I'm giving you all this detail to emphasize the power savings over traditional clocked computers. We are handshaking our way down a continuous pipeline of instructions, rather than clocking our way through. This is what Ivan Sutherland called a "micropipeline". His 1988 Turing Award lecture on the subject, "Micropipelines" (published in 1989), can be found in our appendix. And we are using zero power during each wait for the next or prior stage of the pipeline.

Likewise, the second line above waits for a second argument in the gateway, passes it to an adder (which adds 3), waits for the adder, then waits for the square-root WIZ to be able to accept the sum, and then passes it along. That wakes up the square-root WIZ, which begins its own set of actions.
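To make the handshaking concrete, here is a minimal Python sketch of the program above. It assumes each WIZ can be modeled as a thread and each gateway as a pair of one-slot blocking mailboxes (Python's `queue.Queue(maxsize=1)`), one for arguments flowing in and one for results flowing out; that split is our assumption, not part of the WIZ design. A blocked `put()` or `get()` stands in for the zero-power wait, although Python threads still run on a clocked CPU, so this models only the ordering of the handshakes, not the power behavior. The names `Gateway`, `function_wiz`, `some_funct_wiz`, `call_some_funct` and the `STOP` token are hypothetical, and the multiplier and adder are folded into plain arithmetic rather than modeled as separate register objects.

```python
import math
import queue
import threading

STOP = None  # hypothetical shutdown token; the WIZ notation has no equivalent


class Gateway:
    """Hypothetical WIZ gateway: two one-slot mailboxes.

    put() blocks until the receiver has taken the previous item, and
    get() blocks until data arrives, giving a simple handshake.
    """

    def __init__(self):
        self.inbox = queue.Queue(maxsize=1)   # arguments flowing in
        self.outbox = queue.Queue(maxsize=1)  # results flowing out


def function_wiz(fn, gw):
    """A WIZ running one algorithm (e.g. cosine) behind its gateway."""
    while (x := gw.inbox.get()) is not STOP:  # sleep until an argument arrives
        gw.outbox.put(fn(x))                  # wait until the caller takes the answer


def some_funct_wiz(gw, cos_gw, sqrt_gw):
    """someFunct(x, y) = cos(pi*x) + sqrt(y+3), one step per instruction line."""
    while (x := gw.inbox.get()) is not STOP:  # "1 -> repeat" loop
        cos_gw.inbox.put(x * math.pi)         # gateway, pi -> multiplier -> cosine
        y = gw.inbox.get()
        sqrt_gw.inbox.put(y + 3)              # gateway, 3 -> adder -> square-root
        # cosine, square-root -> adder -> gateway (waits on cosine first)
        gw.outbox.put(cos_gw.outbox.get() + sqrt_gw.outbox.get())
    cos_gw.inbox.put(STOP)   # pass the shutdown on to the helper WIZes
    sqrt_gw.inbox.put(STOP)


def call_some_funct(pairs):
    """Start the three WIZ threads, feed (x, y) pairs, collect the results."""
    gw, cos_gw, sqrt_gw = Gateway(), Gateway(), Gateway()
    wizes = [
        threading.Thread(target=function_wiz, args=(math.cos, cos_gw)),
        threading.Thread(target=function_wiz, args=(math.sqrt, sqrt_gw)),
        threading.Thread(target=some_funct_wiz, args=(gw, cos_gw, sqrt_gw)),
    ]
    for w in wizes:
        w.start()
    results = []
    for x, y in pairs:
        gw.inbox.put(x)                 # first argument into the gateway
        gw.inbox.put(y)                 # blocks until the first one was taken
        results.append(gw.outbox.get())
    gw.inbox.put(STOP)
    for w in wizes:
        w.join()
    return results


print(call_some_funct([(1, 1), (0, 6)]))  # [1.0, 4.0]
```

Because every `put()` blocks until the receiver is ready, no stage runs ahead of the next one, and the cosine and square-root threads are both in flight by the time the third line starts collecting their answers.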
ILP (instruction-level parallelism) alert: as the third line above starts, the cosine and square-root WIZes are both executing their algorithms, independently and simultaneously. The third line starts with "cosine -> adder". Since the cosine WIZ was started (woken) only a few instructions ago, this instruction will likely wait until the cosine completes its algorithm and puts an answer back into its gateway. The next instruction in the third line is "square-root -> adder". If we assume that the square-root algorithm takes about the same time as the cosine algorithm, this instruction will wait little, if at all, because the square-root computation will be done or nearly done. If the square-root algorithm takes longer than the cosine, this instruction waits only for the difference between the two times. The order does not matter: whether we write "cosine, square-root -> adder" or "square-root, cosine -> adder", the total wait on that pair of instructions is the same, equal to the longer of the two remaining computation times.

Thus we have a new form of ILP: function-level parallelism. Traditional processors would at best split off a "thread", and might gain some overlap between the threads, but in the end they are using a single processor core across all the function threads, whereas we are using multiple processors, one for each function. This is true parallelism. We are not "multi-threading", we are "multi-WIZing".
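The claim that the order of the two moves does not change the total wait can be checked with a small timing model. This is an idealized sketch, assuming each function WIZ puts its answer in its gateway at a known time and each move waits only if that answer is not there yet; times are in arbitrary units, and the function names and timings are hypothetical. For contrast it includes an idealized single core that runs both functions' work back to back, ignoring any threading overhead.

```python
def pair_wait(now, ready_first, ready_second):
    """Total time the caller WIZ is blocked on two back-to-back moves,
    e.g. "cosine -> adder" followed by "square-root -> adder".

    now          -- time at which the third line starts
    ready_first  -- time the first-named WIZ puts its answer in its gateway
    ready_second -- the same for the second-named WIZ
    """
    wait_first = max(0, ready_first - now)   # block until the first answer exists
    t = now + wait_first                     # first move completes here
    wait_second = max(0, ready_second - t)   # only the leftover difference, if any
    return wait_first + wait_second


def single_core_time(cos_time, sqrt_time):
    """Idealized single core: both functions' work runs one after the other."""
    return cos_time + sqrt_time


# Hypothetical timings: cosine woken at t=0 and takes 40 units,
# square-root woken at t=5 and takes 50 units; the third line starts at t=10.
cos_ready, sqrt_ready, now = 0 + 40, 5 + 50, 10

print(pair_wait(now, cos_ready, sqrt_ready))  # 45
print(pair_wait(now, sqrt_ready, cos_ready))  # 45, same total in either order
print(single_core_time(40, 50))               # 90
```

In both orders the pair finishes at t = 10 + 45 = 55, the later of the two ready times, which is max(ready_first, ready_second) - now in general. The idealized single core, by contrast, has to spend 40 + 50 = 90 units on the same two functions.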