Wednesday, July 13, 2011

Flux Capacitors might help prevent pipeline hazards

First off, this is not serious at all. Okay, you've been warned.

Ever wondered about alternative approaches to overcome Pipeline Hazards?

Well, looking at this xkcd schematic I started to wonder about the Flux Capacitor's potential in an actual circuit. I'll tell you what made sense in my head, but first, let me explain the basics. If you already know the basics, you may skip to "Flux Capacitors in Computers".

About Pipelining
It's a way of speeding up code execution by dividing it into several stages that perform sub-tasks simultaneously by means of dedicated hardware for each stage. Four consecutive stages are typically used: Fetching the instruction from memory, Decoding it (identifying the operation), Executing the instruction, and Writing Back the result. This ultimately speeds execution up (ideally in a rate equal to the number of stages in the pipeline). Read more.
Pipelined execution. Numbers shown in the graph are the instructions at the left.


About The Hazards
Hazards are problems related to the use of a pipeline in an instruction execution unit. Two Hazards are considered here:
  • Data Hazards: It happens when an instruction needs data that's produced by another instruction that previously entered the pipeline, but hasn't finished yet. For example let's say instruction i writes data into register A, and instruction i+1 (the next one) uses register A as an operand to calculate something else. What happens when instruction i+1 is in the Execute stage and i is in the Write Back stage? The value of register A used by i+1 as an operand will be the original value prior to the execution of instruction i because it's not done yet. This is called a Read After Write (RAW) or true data dependency. This is a problem because the data used by instruction i+1 is outdated. 
Data Hazard between Execute and WriteBack (may happen between other stages)
  • Control Hazards: This concerns branches, calls, interrupts, exceptions, resets, etc. When sequential execution must change due to a branch or whenever the next instruction to fetch is not the next stored instruction, the Fetch stage must be aware of the address of the next instruction (branch target, interrupt vector, etc). Furthermore, most branches are conditional, and it's not until the Execute stage that the processor knows whether or not the branch should be taken. This is a problem because when the processor becomes aware of a branch, all of the instructions previously loaded into the pipeline might have to be aborted (execution may have to jump elsewhere). This abort depends on the condition of the branch, which will be known after Decode. If the branch must be taken, the pipeline must be flushed or bubbled (instructions previously entered replaced by NOPs [no operations]). If not, no problem. Flushing/bubbling the pipeline is a problem because it's a waste of time, and the pipeline is supposed to speed things up. 
    Control Hazard with bubble
There are many approaches to overcome these hazards, mainly divided in two kinds of strategies: Static and Dynamic. Static approaches are performed by the compiler, conditioning the resulting code (rearranging instructions, loop unrolling, and so on). Dynamic approaches are performed by additional hardware (storing data in temporary registers or keeping a record of previous hazards for prediction).

Read more about pipeline hazards.

About the Flux Capacitor
"The flux capacitor is what makes time travel possible", Doc. Brown.

The Flux capacitor is the Y-shaped device inside the DeLorean Time Machine in the Back to the Future films.
Read more.


Flux Capacitors in Computers
So! Let's consider the unlikely event of an actual Flux Capacitor being invented, tested, manufactured, and even miniaturized to a VLSI level. Not happening, bear with me.

My intention is to get useful data from the pipeline and send it back in time in order to correct the hazards.


In the video above, Doc Brown said that the Flux Capacitor from the DeLorean requires 1.21GW to operate. It was capable of reaching ±30 years of time travel for anything inside the whole DeLorean. So, taken to a very small size, hopefully a MEMS Flux Capacitor requiring a couple of watts to achieve a time travel of no more than ±1 second for a cage of a cubic micron wouldn't be so much to ask (would it?).
Even if the proof of concept for time travel at this level is only able to achieve microseconds for cubic nanometers, this would be good enough because going some clock cycles back in time is all that's needed.

As described in the films, the time machine is only capable of taking objects back (or forth) in time, when they are somehow attached to the flux capacitor inside a stainless steel body. So our little (tiny) time machine would be a stainless steel cage containing a miniaturized Flux Capacitor, a time-traveling register, some status and control registers and a charge pump to initiate time travel by providing the required power for the Flux Capacitor to start "Fluxing" (lol).

Time Traveling Register Module (TTRM) Block Diagram

The amount of time to go back must be written to the TTRMSC register (SC=Status and Control). Some clock cycles must be allowed for the charge pump to operate. The amount of clock cycles required is irrelevant as the register will leave this timeline anyway. The future of this timeline is of no interest anymore.

So, Data Hazards won't be a problem anymore with something like the following arrangement, which is some kind of actual Register Forwarding:
Register LookAhead (RLA) Block Diagram

Thus, when a RAW data dependency is detected, the "GO!" signal stores the present WriteBack data into the Time Traveling Register, and activates the Charge Pump to start Time Travel a few (previously configured) clock cycles back in time. In the past, the "Incoming" signal replaces one operand of the ALU with the future data, eliminating the dependency setbacks.


On the other hand, Control Hazards could be taken care of nicely with a similar technique:
Branch LookAhead (BLA) Block Diagram

If a branch should be taken, the "GO!" signal loads the target address into the TTRM. In the past, the "Incoming" signal sets the Program Counter to the Target Address so the correct instruction fetch is made.

After the target address is sent back in time, actually loading the target to the program counter in this timeline wouldn't be completely necessary, as time travel would take care of that, correcting the events in the universe of interest (the one in the past). Watch a cool video about time travel here.

This brings up some questions that must have lots of discussion already: What happens to all the split universes generated by each time travel? Is there such thing as Universe Leaking? Is a free(Universe) function reasonable?

Considering all the split universes generated by this technique, it's important to point out that it will only be always useful for one and only one universe. All other universes will experience the hazards. (?) Oh well. 


That's all. I hope I'm not losing my mind.

No comments:

Post a Comment