The instruction cycle is essentially a four-part micro-pipeline. (1) The executor releases an instruction word and sets "copy start", which reaches all RCLs. (2) One RCL sees its own address in the upper 8 bits of the instruction word and sets "iAmSource". If its backend device is readReady, this RCL puts its register's bits onto the bus and sets "copy in progress". (3) In identical manner, the destination register's RCL finds its own address in the lower 8 bits of the instruction word. When the "copy in progress" signal reaches it, and if its backend device is writeReady, it latches the bits from the bus into its register and sets "copy done". (4) Finally, "copy done" flows back to the executor and the instruction cycle ends. If the source is readReady and the destination is writeReady, this is essentially a loop of four gate delays. On top of those four gate delays, the cycle time includes the delay for the source to drive the data bus and the setup and hold time of the destination register as it receives from the data bus. In short, the cycle is just a register-to-register transfer across a bus, which is why it is very fast. We estimate that in 2020, on a 10 nm process, this instruction cycle would run at 100 GHz.
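The four steps can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the actual hardware design: the `RCL` class, the `decode` and `run_cycle` functions, and all field names are hypothetical. The instruction word is assumed to be exactly 16 bits (source address in the upper 8, destination address in the lower 8). A device that is not ready is modeled by returning `False`, whereas the real circuit would presumably wait for the ready signal.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class RCL:
    """Hypothetical model of one register's control logic."""
    address: int               # 8-bit register address (assumed width)
    value: int = 0             # current contents of the attached register
    read_ready: bool = True    # backend device can be read (readReady)
    write_ready: bool = True   # backend device can be written (writeReady)


def decode(instruction: int) -> tuple[int, int]:
    """Split an assumed 16-bit word into (source, destination) addresses."""
    return (instruction >> 8) & 0xFF, instruction & 0xFF


def run_cycle(instruction: int, rcls: list[RCL]) -> bool:
    """Model one instruction cycle; return True if "copy done" was raised."""
    src_addr, dst_addr = decode(instruction)

    # (1) The executor releases the word and raises "copy start" to all RCLs.
    copy_start = True

    # (2) The RCL whose address matches the upper 8 bits is the source.
    #     If its device is readReady, it drives the bus.
    bus: Optional[int] = None
    copy_in_progress = False
    for rcl in rcls:
        i_am_source = copy_start and rcl.address == src_addr
        if i_am_source and rcl.read_ready:
            bus = rcl.value
            copy_in_progress = True

    # (3) The RCL whose address matches the lower 8 bits is the destination.
    #     Once "copy in progress" arrives and it is writeReady, it latches.
    copy_done = False
    for rcl in rcls:
        i_am_destination = rcl.address == dst_addr
        if i_am_destination and copy_in_progress and rcl.write_ready:
            rcl.value = bus
            copy_done = True

    # (4) "copy done" flows back to the executor, ending the cycle.
    return copy_done


if __name__ == "__main__":
    regs = [RCL(address=0x01, value=0xBEEF), RCL(address=0x02)]
    done = run_cycle(0x0102, regs)   # copy register 0x01 into register 0x02
    print(done, hex(regs[1].value))  # prints: True 0xbeef
```

The model is sequential, so it captures only the ordering of the handshake signals, not the gate delays or bus timing that determine the actual cycle time.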