Instruction-level parallelism - page 2

But we don't have to do a multiply all at once, because the operation is not "atomic". Instead of:

    A => multiplier ; B => multiplier ; multiplier => C

we can separate it like this:

    A => multiplier ; B => multiplier
    ... other instructions here ...
    multiplier => C

The first two instructions will generally take 1 tu each. Once they are done, it will be another 400 tu before the multiplier's output-ready bit comes on. If we do "multiplier => C" immediately, we will wait 400 tu. In the separated version, however, any time spent on the "other instructions" overlaps the multiplier's computation.

For example, if the other instructions take 350 tu in total, the "multiplier => C" instruction will wait only 50 more tu. If the other instructions take 400 tu or more, the multiplier will have finished before "multiplier => C" even executes, so no wait occurs and that last instruction also takes just 1 tu.

This is an example of what is called "instruction-level parallelism", or ILP: the "other instructions" and the multiply happen at the same time, in parallel. We don't have to do anything special to get ILP; it "just happens" all the time.
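To make the timing arithmetic concrete, here is a small Python sketch of the separated sequence. It is a simplified model, not a description of any real processor: it assumes each transfer instruction takes exactly 1 tu, that the 400-tu multiply latency starts counting once "B => multiplier" completes, and that the other instructions do not depend on C. The names MULT_LATENCY, TRANSFER_TU and multiply_timing are hypothetical.

```python
# Hypothetical timing model for:
#   A => multiplier ; B => multiplier ; <other instructions> ; multiplier => C
MULT_LATENCY = 400  # tu from the second operand transfer until output-ready (assumed)
TRANSFER_TU = 1     # tu for each simple transfer instruction (assumed)


def multiply_timing(other_tu):
    """Return (stall, total) in tu when the other instructions take other_tu tu."""
    t = 0
    t += TRANSFER_TU                 # A => multiplier
    t += TRANSFER_TU                 # B => multiplier; the multiply starts now
    ready_at = t + MULT_LATENCY      # when the output-ready bit comes on
    t += other_tu                    # independent work overlaps the multiply
    stall = max(0, ready_at - t)     # wait only for whatever latency is left
    t += stall + TRANSFER_TU         # multiplier => C
    return stall, t


if __name__ == "__main__":
    for other in (0, 350, 400, 500):
        stall, total = multiply_timing(other)
        print(f"other={other:3d} tu  stall={stall:3d} tu  total={total} tu")
    # other=  0 tu  stall=400 tu  total=403 tu
    # other=350 tu  stall= 50 tu  total=403 tu
    # other=400 tu  stall=  0 tu  total=403 tu
    # other=500 tu  stall=  0 tu  total=503 tu
```

Under these assumptions the total stays at 403 tu until the other instructions exceed 400 tu. Up to that point they fit entirely inside time the processor would otherwise have spent waiting for the multiplier.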