Often, a device can determine in real time when it is done. An adder, for example, can put logic on its carry chain to actually determine when the add is done. It is no doubt much faster to add 0 or any power of two than to add an arbitrary large number. Alternatively, an adder (or multiplier or any similar device) could just implement a simple maximum delay every time, as the above circuit does. This is a very poor implementation, a "lazy" way out. I offer it up here as an example to demonstrate the principle.
The above circuit depends on a pair of buffers whose delay is equal to the maximum possible delay of the given device (labelled "max delay" above). The adder's inputs, registers 1 and 2 here, are always ready for both read and write, so their ready lines would simply be tied high. Only the output, register 3 here, needs a controlled ready bit.
When a frontend instruction writes to register 1, its "enable-in" signal comes on, indicating that new data is arriving. Likewise for register 2. The backend delay circuit taps into these two enable-in signals; each causes an SR latch to go to 0 and return to 1 after a delay, which is designed to be the maximum delay of the device. The outputs of the two latches are ANDed to produce the overall read-ready signal. Thus when either register 1 or register 2 receives new data, register 3's read-ready goes low for a fixed time while the device does its computation. In other words, this circuit simply gives the ready signal after the device's maximum delay time.
But note: THIS IS EXACTLY WHAT ALL (TRADITIONAL) CLOCKED SYSTEMS DO TOO. It is just not so obvious. They must calibrate their clock period to the worst-case timing of all devices being clocked, and this is fixed (hard-coded) into their hardware too, in the choice of their clock rate. The above circuit's delay buffers are calibrated to the same kind of worst-case time.
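
To make the behavior of the fixed-delay ready signal concrete, here is a minimal behavioral sketch in Python. It is not a model of the actual gates: the class name `FixedDelayReady`, its method names, and the time units are all assumptions made up for illustration. Each write to register 1 or 2 drives its latch low for `max_delay` time units, and read-ready for register 3 is the AND of the two latches.

```python
from dataclasses import dataclass


@dataclass
class FixedDelayReady:
    """Behavioral sketch (hypothetical names) of the max-delay ready circuit."""

    max_delay: float                       # worst-case delay of the device
    last_write_1: float = float("-inf")    # time of last write to register 1
    last_write_2: float = float("-inf")    # time of last write to register 2

    def write_reg1(self, t: float) -> None:
        # enable-in of register 1 fires: latch 1 goes to 0 at time t
        self.last_write_1 = t

    def write_reg2(self, t: float) -> None:
        # enable-in of register 2 fires: latch 2 goes to 0 at time t
        self.last_write_2 = t

    def latch1(self, t: float) -> bool:
        # latch 1 returns to 1 once max_delay has elapsed since the write
        return t >= self.last_write_1 + self.max_delay

    def latch2(self, t: float) -> bool:
        return t >= self.last_write_2 + self.max_delay

    def read_ready(self, t: float) -> bool:
        # register 3's read-ready is the AND of the two latches
        return self.latch1(t) and self.latch2(t)


adder = FixedDelayReady(max_delay=10.0)
adder.write_reg1(1.0)   # latch 1 is low from t=1 until t=11
adder.write_reg2(8.0)   # latch 2 is low from t=8 until t=18
print(adder.read_ready(5.0))    # False: latch 1 still low
print(adder.read_ready(11.0))   # False: latch 1 is back, latch 2 still low
print(adder.read_ready(18.0))   # True: both latches have returned to 1
```

Note that the result is declared ready only after the full worst-case delay from the most recent write, no matter what values were actually written.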
Note that we calibrate each device separately, so each has its own worst-case time instead of the worst case across all devices included in the same clock domain. Clocked systems have no choice but to use the latter. Thus even this poor implementation is slightly faster overall. But we have a better option: each device can provide its own data-dependent or context-dependent ready signal.
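
The difference between per-device calibration and a shared clock can be sketched with a little arithmetic. The delay figures and device names below are invented purely for illustration, and the sketch assumes the simplest case, in which every operation in the clocked system takes exactly one clock period.

```python
# Hypothetical worst-case delays per device, in nanoseconds (made-up numbers).
WORST_CASE_NS = {"adder": 4, "shifter": 2, "multiplier": 12}


def clocked_total(ops):
    """Every operation takes one clock period, set by the slowest device."""
    period = max(WORST_CASE_NS.values())
    return period * len(ops)


def per_device_total(ops):
    """Each operation takes only its own device's worst-case delay."""
    return sum(WORST_CASE_NS[op] for op in ops)


ops = ["adder", "shifter", "adder", "multiplier"]
print(clocked_total(ops))     # 48
print(per_device_total(ops))  # 22
```

How large the gap is depends entirely on how far apart the devices' worst cases are and on the mix of operations. Data-dependent ready signals would reduce the second total further, since a device would then rarely need its full worst-case time.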