5980 words, short story
The Sentry Branch Predictor Spec: A Fairy Tale
Once upon a time1, one single processor filled five whole rooms, one per pipeline stage. Myriad pinballs flooded through the processor and (so I’ve been told) you could chase them from one stage to another to see its machinations in action. The pinballs clicked, clacked, and crunched through gates and slammed into the maws of traps built into the walls of each room. At the end of one clock cycle, their maws snapped shut. The traps flipped over so that, instead of facing the end of one stage, they faced the start of the next. At the start of the next clock cycle, their maws opened and the traps spit out their pinballs then flopped back. As pinballs careened from stage to stage, the processor constructed its results in assembly-line fashion: fetch an instruction, crack open the instruction, do the operation inside the instruction, get data from memory, retire the instruction. Each stage did the same thing again and again while instructions marched down the pipeline.
Over time, pinballs shrank into ball bearings. Ball bearings shrank into nanodots. Processors now fit on a fingernail, their gates, switches, and traps too small to see with the naked eye. Clocks cycled so quickly that the clicks, clacks, and crunches of a processor computing blurred into an ultrasonic hum. Meanwhile, designers and architects2 partitioned the work to fetch, decode, and execute instructions into ever greater numbers of ever shorter stages. Rather than just the five classic pipeline stages mentioned above, we divided the work into twenty stages or more3.
The reason we do that is throughput. Even if it takes twenty cycles for an instruction to get through the pipeline, one leaves the pipeline every cycle. The shorter you make each stage, the faster the clock can cycle, the more instructions the processor completes every second, the better Sentry performs4.
There is a cost, of course. Lots of costs, actually, but the one that matters here is this: the Sentry pipeline must be fed. It is a ravenous worm whose segments must stay filled. The worm slithers through memory consuming instructions, digesting them with data, and excreting their results. To get the correct results, it has to consume the correct instructions. If they were lined up one after another in memory, this would be easy. The worm would simply crawl through memory in a straight line.
Unfortunately, the correct instructions lie on a path that zigs and zags. The possible paths are spread like a web in space. To find the correct path, the worm has to feel its way through memory, consuming instructions and then being guided by their results. Branch instructions tell the worm where to go. The trick is that they won’t say until they are digested. By then, though, the worm will have already consumed more instructions. It doesn’t wait to find out where it should go first. Remember, the worm must always stay sated. Whenever it goes down a wrong path, it has to back up then go again in the right direction. That wastes time. The more time it wastes, the fewer computations it makes. The thing to do then, of course, is make sure it never goes down a wrong path.
Sentry’s Instruction Fetch Unit (IFU)5 has one job: steer the worm so that it always consumes the correct instructions. It is the marriage of Brady6, the branch predictor, and Ian7, the instruction cache. Every cycle, Brady places an address, where the worm should go, into a trap. It shuts at the end of a cycle, flips over to Ian, then spits the address out before flopping back over to Brady. Ian steers the worm then pushes the instructions at that address into the worm’s mouth. Being apoplectic to Ian8 is not an option. Brady always has to have something to say to him9.
Once a branch instruction is digested, Brady does find out the truth of where to go, but that shows up too late to do anything besides correct his mistakes. The worm only ever tells him where he should have gone, not where he should go.
Brady’s job, then, is really to divine the truth, to know where the worm should go before the worm figures it out for itself. How does he do it? He consults the three oracles. To do that, he must walk the three rings, one for each oracle. Nano-filaments guide him from segment to segment inside the worm. The rings are nested but they all meet at the trap that Brady uses to tell Ian where to steer the worm. This is where Brady’s journey starts and ends. He’ll encounter machines, monsters, and magic. And when he’s done, he’ll finally know what to say to Ian.
Next Address Table (NAT)
The first oracle dwells in the innermost of the three rings. To tell Ian an address every cycle, Brady has to make it all the way around the ring before the trap that sends the address to Ian shuts. Rugged walls soar on either side of Brady. An ever-shifting tangle of chutes latched to the walls blocks his way. Nanodots flood through the walls, gears clack, and one end of a chute unhooks from the wall to his left. The chute swings. It whooshes over Brady and swerves around the other chutes. With a thud, it lands on some other crevice on the wall it’d left. The wall ripples with nanodots and the chute hooks itself back onto the wall.
The walls and chutes together form NAT. What it does is remember for Brady. Just before he walked this ring, he told Ian an address. Let’s call it A. What NAT remembers is the next address Brady told Ian after the last time Brady told Ian address A. Where Brady steers the worm from A this time is where he steered the worm from A last time. He just has to find that address on the wall.
Ducking chutes and grasping holds, he climbs the wall on his right. The lower portion of A is his guide. He follows it bit by bit, twisting left or right as he pushes his way towards the correct chute. He doesn’t use the whole address, just the bottom ten bits. If he used the full address to find the right chute, the trap that pushes his address to Ian would have closed and opened again before he got around the ring. He’d arrive too late to tell Ian anything.
When he finds the right chute, he steps inside then slides. He is turned and tossed with the chute’s every curve until he slams into a crevice in the other wall. Within the crevice are two tokens that represent the next address. The crevice is too small to hold the full address. What it stores instead are the bottom bits of the next address and an index that will tell Ian which ITLB entry to look at to find the address’ top bits. ITLB lives in Ian’s domain and holding the top bits is just one of its jobs10. Ian must reconstitute the address before searching for the instructions that live there.
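For readers who think better in code than in chutes, here is a minimal Python sketch of NAT. It rests on assumptions the story doesn’t pin down: a direct-mapped, untagged table of 1024 entries (one per ten-bit index), a crevice whose “bottom bits” are the 13-bit offset within an 8K page, and an ITLB reduced to a list of page numbers. Every name in it is hypothetical.

```python
from dataclasses import dataclass

PAGE_BITS = 13        # 8K pages
NAT_INDEX_BITS = 10   # Brady uses only the bottom ten bits of A


@dataclass
class NatEntry:
    page_offset: int  # bottom bits of the next address (assumed: the 13-bit page offset)
    itlb_index: int   # tells Ian which ITLB entry holds the top bits


class NextAddressTable:
    """Hypothetical model of NAT: direct-mapped, untagged, one entry per chute."""

    def __init__(self) -> None:
        self.entries = [NatEntry(0, 0) for _ in range(1 << NAT_INDEX_BITS)]

    def index(self, fetch_addr: int) -> int:
        # Only the bottom ten bits of the fetch address choose the chute.
        return fetch_addr & ((1 << NAT_INDEX_BITS) - 1)

    def predict(self, fetch_addr: int) -> NatEntry:
        return self.entries[self.index(fetch_addr)]

    def train(self, fetch_addr: int, entry: NatEntry) -> None:
        self.entries[self.index(fetch_addr)] = entry


def reconstitute(entry: NatEntry, itlb_pages: list[int]) -> int:
    """Ian's job: put the page number from the ITLB back on top of the offset."""
    return (itlb_pages[entry.itlb_index] << PAGE_BITS) | entry.page_offset


itlb_pages = [0x40, 0x41]  # hypothetical ITLB contents (virtual page numbers)
nat = NextAddressTable()
nat.train(0x80120, NatEntry(page_offset=0x0A40, itlb_index=1))
print(hex(reconstitute(nat.predict(0x80120), itlb_pages)))  # 0x82a40
```

Any two fetch addresses that agree in their bottom ten bits, such as 0x80120 and 0x90120, land on the same entry in this sketch.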
Once Brady has those tokens, it’s a climb down to the floor then a sprint back to the trap where the ring starts and ends. The walls and the tangle of chutes cover the entire ring. He dodges every step of the way and, if it’s not too cold and the nanodots flow fast enough, he hits that trap just in time.
There’s a faster way around this ring. The IFU Configuration Register (IFUCR)11 has a bit that opens up a ramp that leads to a tunnel below the wall. Inside the tunnel lives an adder. Brady tells the adder the address he told Ian and, in return, the adder tells Brady the sequential address. That is, where to go when the worm gets to slide straight ahead.
What Brady tells Ian is actually the address of an instruction bundle. The worm consumes instructions not one at a time but one bundle at a time. The whole of memory is partitioned into 32-byte lines (which hold 8 instructions strung together like beads) and 8K pages12. All the instructions in a bundle come from the same line. Ian ends bundles either at the end of a line or right after a branch instruction. The adder always assumes Ian will break the bundle at the end of the line and give Brady a token for the line immediately ahead of the worm. For the other token, Brady just keeps the ITLB index he already has.
The ITLB index tells Ian which page to fetch from. This means that every time the worm tries to slither from one page to another, Brady will steer wrong. He will have to back up, get the ITLB index for the page the worm is going into, then try again.
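The adder’s guess fits in a few lines of Python. This sketch works on full byte addresses for readability, whereas Brady himself only holds an offset token and an ITLB index, and the function names are made up for illustration.

```python
LINE_BYTES = 32          # 32-byte lines
PAGE_BYTES = 8 * 1024    # 8K pages


def adder_predict(fetch_addr: int, itlb_index: int) -> tuple[int, int]:
    """Assume the bundle runs to the end of its line and fall through to the next line."""
    next_line = (fetch_addr & ~(LINE_BYTES - 1)) + LINE_BYTES
    # The adder has no way to look up a new page, so it reuses the current ITLB index.
    return next_line, itlb_index


def steers_wrong_page(fetch_addr: int, next_addr: int) -> bool:
    """True when the reused ITLB index must be wrong: the worm slithered onto a new page."""
    return fetch_addr // PAGE_BYTES != next_addr // PAGE_BYTES


nxt, idx = adder_predict(0x80124, itlb_index=0)
print(hex(nxt), steers_wrong_page(0x80124, nxt))    # 0x80140 False
nxt, idx = adder_predict(0x81FE8, itlb_index=0)
print(hex(nxt), steers_wrong_page(0x81FE8, nxt))    # 0x82000 True
```

The second call is the page-crossing case: the next line sits on a new page, so the recycled ITLB index points Ian at the wrong one.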
The adder doesn’t predict very accurately. It assumes the worm should go straight ahead and the instruction stream is bound to have branches in it. Even when it doesn’t, the adder can still get it wrong whenever the worm crosses into a new page. However, Brady can sprint through the tunnel and back up in plenty of time. Trade-offs13.
That said, is the NAT prediction accurate? More so than the adder, but not particularly. Its main job is to remember what better predictors have come up with. Those predictions can collide, though. Too many addresses can lead Brady to the same chute. Those addresses are probably all parts of completely different paths, but the chute can only take Brady to one crevice.
NAT is the only oracle Brady can visit within one cycle14. It’s how Brady can tell Ian an address every cycle. Because NAT isn’t very accurate, though, Brady has to visit two other oracles. Their predictions may correct the NAT prediction.
Gshare
The second oracle Brady visits hauls out the serious magic15. Gshare is the sort of thing my undergraduate logic design professor would have harrumphed at then dismissed as heuristics. She’s right, of course.
Her solution for where to fetch from while you’re waiting for the truth to arrive was to have delay slots. Basically, you declare that the branch doesn’t take effect until some number of instructions after the branch instruction. The instructions between the branch instruction and when the branch takes effect are the delay slots16. To say any more about them is to make them more important than they are.
Like NAT, Gshare operates under the principle of historiomancy: how the branch behaved last time is how it will behave this time. Gshare just takes longer and consults way more history to divine a branch’s future. In particular, it curates and references PHR, BHT, and BTB.
Pattern History Register (PHR) remembers whether each of the ten most recent branches was taken or not. It’s a segmented snake. Taken branches twist its segments in one direction while non-taken branches twist them in the other. Branches enter through the tail, shift along the body, then leave through the head. As PHR meets branch after branch, switches clatter and traps rock back and forth on their hinges. The traps slingshot dots from their maws to their neighbors’. With each new instruction bundle, PHR wriggles and writhes. Its shape remembers the path of Sentry’s journey through memory.
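In logic-design terms, the snake is a ten-bit shift register. A minimal Python sketch, assuming a taken branch is recorded as 1 and a not-taken branch as 0 (the story only says the segments twist in opposite directions):

```python
PHR_BITS = 10  # PHR remembers the ten most recent branches


def phr_shift(phr: int, taken: bool) -> int:
    """Push one outcome in at the tail (bit 0); the oldest outcome falls off the head."""
    return ((phr << 1) | int(taken)) & ((1 << PHR_BITS) - 1)


phr = 0
for taken in [True, False, True]:
    phr = phr_shift(phr, taken)
print(format(phr, "010b"))  # 0000000101
```

After ten more branches, the three outcomes above will have slid out through the head and been forgotten.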
Branch History Table (BHT) remembers what happened to every branch Brady has met. It’s a grid of 1024 arrows, each one mounted on its own base and shaft. When a branch is taken, its arrow swings further to the left. When it isn’t, its arrow swings further to the right. Backstops prevent arrows from swinging too far in any direction. An arrow can only tell you one of four things about its branch: strongly taken, weakly taken, weakly not taken, or strongly not taken.
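Each arrow behaves like a two-bit saturating counter. A Python sketch, assuming the four positions are encoded 0 through 3 with higher values standing for “further to the left,” that is, toward taken:

```python
# One arrow per BHT entry: a two-bit saturating counter.
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = range(4)


def swing(arrow: int, taken: bool) -> int:
    """Swing one step toward the truth; the backstops stop it at either end."""
    if taken:
        return min(arrow + 1, STRONGLY_TAKEN)
    return max(arrow - 1, STRONGLY_NOT_TAKEN)


def points_taken(arrow: int) -> bool:
    return arrow >= WEAKLY_TAKEN


arrow = STRONGLY_TAKEN
arrow = swing(arrow, taken=False)                  # one surprise only weakens the arrow
print(arrow == WEAKLY_TAKEN, points_taken(arrow))  # True True
```

The two “strongly” positions are what let an arrow shrug off a single odd outcome without flipping its prediction.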
Branch Target Buffer (BTB) remembers where taken branches told Brady to go next. Each address lives in the base that supports the corresponding BHT arrow. When Brady learns where a branch takes him, not only does its arrow swing to the left but the base shudders as it swallows that address.
Brady has two cycles to run around the Gshare ring. In the first cycle, he wrestles PHR. It shudders under his grasp as an incoming branch ripples through its body. He weaves the bottom ten bits of the current fetch address through the ten PHR segments. In the second cycle, the now trussed PHR guides him to the correct arrow in BHT.
Which arrow is the correct arrow depends both on where the branch is in memory and on the path Brady fetched to get there. What Gshare predicts isn’t just the result once a branch is digested. It’s actually predicting which path the worm will take through memory. Historiomancy says where we went from here the last time we fetched down this exact path is where we will go this time17. The instruction stream is chaotic. Reaching this branch by some other path may take us to some other place. Also, weaving the address and PHR segments together spreads branches throughout the BHT. This makes it less likely that more than one branch will land on the same arrow. It makes it more likely that the history Brady consults will actually be for the branch he’s trying to predict.
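The textbook gshare scheme, from McFarling’s work on combining branch predictors, forms its table index by XORing global branch history with the branch address. The story only says Brady “weaves” the two, so treat the XOR below as an assumption, along with the starting arrow positions and the class name.

```python
INDEX_BITS = 10
INDEX_MASK = (1 << INDEX_BITS) - 1


def gshare_index(fetch_addr: int, phr: int) -> int:
    """Weave the bottom ten address bits with the ten PHR bits (assumed: XOR)."""
    return (fetch_addr ^ phr) & INDEX_MASK


class Gshare:
    """Hypothetical model: each BHT arrow shares an index with the BTB address in its base."""

    def __init__(self) -> None:
        self.bht = [1] * (1 << INDEX_BITS)  # arrows start weakly not taken (assumption)
        self.btb = [0] * (1 << INDEX_BITS)  # last taken target seen at each index

    def lookup(self, fetch_addr: int, phr: int) -> tuple[int, int]:
        """Return (arrow, target). Brady carries both, whatever the arrow says."""
        i = gshare_index(fetch_addr, phr)
        return self.bht[i], self.btb[i]


# The same branch address reached along two different paths uses two different arrows.
print(hex(gshare_index(0x80120, 0b0000000000)))  # 0x120
print(hex(gshare_index(0x80120, 0b1111111111)))  # 0x2df
```

Weaving doesn’t make collisions impossible: address bits 0x121 with history 0b1 land exactly where 0x120 with an empty history does.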
Note that Brady hasn’t used the arrow yet. It’s actually the tribute he will give to the third oracle. Regardless of where it points, Brady always races the address from the base back to the start of the ring.
Gshare is more accurate than NAT. When their predictions disagree, Brady tells Ian the Gshare prediction. Of course, Gshare takes a cycle longer to consult than NAT. By the time he knows the Gshare prediction, he has already told Ian the NAT prediction. To correct the fetch path, the worm has to back up and the instructions Ian had forced down its mouth are flushed out.
Like NAT, Gshare doesn’t know anything about the instructions Ian fetches. It divines the future using a squirming snake and a bunch of arrows. Given a fetch address, Ian will ship down the pipeline a bundle of instructions starting at that address. Gshare may predict that the last instruction in that bundle will be a branch that will be taken. That bundle, however, may not even have a branch in it18. Despite PHR, fetch paths may have collided in the BHT and stolen each other’s arrows.
In a way, this is fetching with one hand tied behind the back. At the very least, Brady should predict only the branches that exist and not the ones that don’t. Ian handles every instruction that he ships down the pipeline. Once he finds them, he knows which ones are branches. In the time that shards of Brady take to consult NAT and Gshare, other shards of Brady are embarking on the long journey around the third and largest of the rings. By the time he reaches the third oracle, Ian will have found for Brady some of the answers Brady wants.
Branch Target Address Calculator (BTAC)
As the name suggests, the third oracle Brady consults doesn’t actually predict. It calculates. By the time Brady reaches BTAC, Ian has seen the instructions that live at the address A that Brady told him. From that, finding the address to go after A is just an addition, assuming the BHT arrow is pointing the correct way.
BTAC actually lives in Ian’s portions of the worm, not Brady’s19. The ring that Brady walks leads out of his own domain, into Ian’s, then back. Just around the bend, gears gnash against each other, conveyor belts whir, and hammers pound. Instructions are conveyed through a tunnel on their way down the pipeline. Inside, they are cracked and counted.
Brady can only want one of two addresses: the sequential address or the target address. BTAC works out both of them as Brady approaches the oracle.
The sequential address is the address that makes the worm go straight ahead. It’s the right address to use when the bundle of instructions Ian sends has no branches or the bundle ends with a branch that the BHT arrow will insist is not taken. It’s the address A plus the number of instructions in the bundle. That’s what the counting is for.
The target address is where the worm should go next if it’s not straight ahead. It’s the right address when a branch ends the bundle and the BHT arrow says that branch is taken. That’s what the cracking is for. Inside the branch instruction is a token that tells Ian how far away the instruction the worm should consume next is. Add the token to address A and Ian has a target address.
Ian deals with full addresses. Brady deals with an ITLB index and an offset. After Ian calculates his addresses, he needs to find the ITLB entry for the 8K page that holds each one. If a page isn’t in the ITLB, then Ian goes on his own epic journey. But that’s another story…
Brady approaches BTAC as a supplicant with the BHT arrow. Ian is there, waiting for him. Brady gives him the arrow. If it points to not taken, Ian gives Brady the sequential address. If it points to taken, Ian gives Brady the target address. Actually, if there are no branches in the instruction bundle, Ian just scoffs and tosses the arrow away20. Then he gives Brady the sequential address.
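Here is BTAC’s choice as a Python sketch. The 4-byte instruction size follows from eight instructions per 32-byte line. Adding the branch’s token straight to A follows the story’s own simplification rather than any particular instruction set, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

INSN_BYTES = 4  # 8 instructions in a 32-byte line


@dataclass
class Bundle:
    start: int                          # address A that Brady told Ian
    count: int                          # instructions in the bundle (the counting)
    ends_in_branch: bool                # bundles end at a branch or at the end of a line
    displacement: Optional[int] = None  # token from the cracked branch, in bytes


def btac(bundle: Bundle, arrow_says_taken: bool) -> int:
    """Hand Brady the sequential or target address, depending on the arrow."""
    sequential = bundle.start + bundle.count * INSN_BYTES
    if not bundle.ends_in_branch:
        return sequential  # no branch: Ian tosses the arrow away
    if bundle.displacement is None:
        raise ValueError("a bundle that ends in a branch needs its displacement")
    if arrow_says_taken:
        return bundle.start + bundle.displacement  # the token added to A, as in the story
    return sequential


print(hex(btac(Bundle(0x80128, 6, False), arrow_says_taken=True)))         # 0x80140
print(hex(btac(Bundle(0x80120, 3, True, 0x200), arrow_says_taken=True)))   # 0x80320
print(hex(btac(Bundle(0x80120, 3, True, 0x200), arrow_says_taken=False)))  # 0x8012c
```

The first call shows Ian’s scoff: with no branch in the bundle, the arrow is ignored and the sequential address lands on the next line.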
Brady sprints back to the start. BTAC is the most accurate of the oracles. It also takes Brady four cycles to run this ring. By the time Brady can use this address, he has already told Ian three more fetch addresses.
This is the price, because there’s always a price. To use the BTAC prediction, the worm has to first flush out the three fetch addresses Ian has been told in the meantime, wasting whatever effort Sentry has already put into them. So much for keeping the pipeline full. Also, it may turn out that he doesn’t use the BTAC prediction. The fetch address it’s based on may itself have been flushed out. This is why we want to get to the correct address as quickly as possible. The more we flush, the more empty the pipeline, the poorer Sentry performs.
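The price can be tallied with a little arithmetic. This toy Python model is an illustration, not anything from the Sentry spec: it charges each fetch for the addresses Ian had already been told before the oracle that finally got it right could speak, using the ring lengths from the story, and it ignores fetches that get flushed for other reasons.

```python
# Cycles each oracle takes to run its ring, per the story.
RING_CYCLES = {"NAT": 1, "Gshare": 2, "BTAC": 4}


def fetches_flushed(corrector: str) -> int:
    """Fetch addresses Ian has already been told by the time this oracle's answer arrives."""
    return RING_CYCLES[corrector] - RING_CYCLES["NAT"]


def wasted_fetches(first_right: list[str]) -> int:
    """Total flushed fetches when each prediction is first right at the given oracle."""
    return sum(fetches_flushed(oracle) for oracle in first_right)


print(fetches_flushed("Gshare"), fetches_flushed("BTAC"))  # 1 3
print(wasted_fetches(["NAT", "NAT", "Gshare", "BTAC"]))     # 4
```

Every fetch that only BTAC gets right costs three times what a Gshare correction does, which is why it pays to be right early.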
As things turned out, instructions were tougher to crack than we expected. Ian can’t get it done before Brady shows up. Instead, Ian partially pre-cracks them when he first caches them away. For every instruction, he figures out whether it’s a branch and how far it is to the next. This way, computing the sequential address mostly comes down to fishing out this extra information and doing an addition21.
Furthermore, branches come in sixteen flavors. Each one has a different condition that has to be met in order for that branch to be taken. Ian can’t distinguish between flavors. He doesn’t have the time. On one hand, this is OK because Ian doesn’t have the space to cache that away with each instruction anyway22. On the other hand, the Branch Always instruction is always taken and the Branch Never instruction is never taken. The BHT arrow may point in the wrong direction and Brady will get the wrong address even though the information that tells them which address to use is sitting inside the partially cracked branch instruction23.
After BTAC, there’s nothing left for Brady to do for this fetch address except wait for the truth to come to confirm or deny his prediction. In order for this journey to be worth taking, Brady has to find the truth way more often than the truth finds him. Otherwise, he might as well just have waited.
The magic of branch prediction, of course, is that NAT, Gshare, and BTAC together are surprisingly accurate once they are properly trained. “Properly” is the key word.
Branch Status Table
Historiomancy insists that what has happened will happen. Inside NAT, BHT, and BTB, Brady remembers what has happened. He consults them to divine what will happen. To remember, though, he inevitably makes a mistake first. A NAT entry or BTB entry has to be wrong before it can be repaired with a corrected address. The truth sways the BHT arrow that foretold (or didn’t foretell) the branch direction. As Brady sees the same branches again and again, correct predictions are swept inward to faster predictors. BTAC calculations find themselves inside BTB and BTB predictions find themselves inside NAT. The truth, of course, repairs all predictions.
The Branch Status Table (BST)24 tracks every branch from when Ian is told its address to when an execution unit retires it. When Brady sends his prediction to Ian, he stores it in a BST entry along with the fetch address he predicted from and the PHR used in the prediction (if any).
When a branch resolves, the truth comes to Brady with an index into BST. Actually, Ian sees the truth first because—and this seems to be a familiar refrain—the truth is an address. Ian needs to convert the upper bits into an ITLB index. Then Brady sees the truth. If we call the cycle where Brady tells Ian an address F1, then the truth arrives two cycles before that at FN1, because zero is a number25.
Brady compares his prediction with the truth. If he predicted wrong, not only does he need to re-steer the worm down the correct path, but he has to fix NAT, BHT, and BTB so that the next time the worm slides down this way, those oracles predict the path it is sliding down now. From what he’s stored in the BST for that branch, Brady can figure out which entries of NAT, BHT, and BTB to repair and with what. In addition, the PHR stored in the BST replaces the current PHR, then the actual direction of the branch is shifted into it. This makes the pattern of taken and not-taken branches in PHR what it would have been if we had predicted correctly.
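A minimal Python sketch of that repair, under assumptions the story leaves open: the Gshare index is an XOR of address and PHR, NAT stores a full next address rather than an offset and an ITLB index, the arrow swings one step toward the truth, and a BST entry holds only the fetch address, the prediction, and the PHR. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MASK = (1 << 10) - 1  # ten-bit indices and a ten-bit PHR


@dataclass
class BstEntry:
    fetch_addr: int      # the fetch address Brady predicted from
    predicted_next: int  # the address Brady told Ian to fetch next
    phr: int             # the PHR used for the prediction


def shift_phr(phr: int, taken: bool) -> int:
    return ((phr << 1) | int(taken)) & MASK


def resolve(entry: BstEntry, actual_taken: bool, actual_next: int,
            nat: list[int], bht: list[int], btb: list[int]) -> Optional[int]:
    """Repair the oracles on a misprediction; return the PHR to install, or None."""
    if entry.predicted_next == actual_next:
        return None  # predicted correctly: nothing to repair
    i = (entry.fetch_addr ^ entry.phr) & MASK   # the arrow Gshare used (XOR assumed)
    bht[i] = min(bht[i] + 1, 3) if actual_taken else max(bht[i] - 1, 0)
    if actual_taken:
        btb[i] = actual_next                    # the base swallows the true target
    nat[entry.fetch_addr & MASK] = actual_next  # full address here, for simplicity
    return shift_phr(entry.phr, actual_taken)   # saved PHR plus the true direction


nat, bht, btb = [0] * 1024, [1] * 1024, [0] * 1024
entry = BstEntry(fetch_addr=0x80120, predicted_next=0x80140, phr=0b11)
print(format(resolve(entry, True, 0x80400, nat, bht, btb), "03b"))  # 111
```

Note that the arrow it repairs is found with the saved PHR, not the current one, since the current PHR has moved on by the time the truth shows up.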
One, Brady only uses BST to deal with the truth. He has no idea when the truth will arrive. It just shows up from out of the blue. When that happens, he has to remind himself what must be repaired. When he corrects himself, his repairs are based on which predictor is making the correction. For example, a Gshare prediction always corrects the NAT prediction Brady told Ian the previous cycle26. In these cases, Brady remembers the past by passing along the fetch address and PHR from stage to stage.
A maze of tracks and maws surrounds the PHR. It pushes copies of itself onto the tracks, cycle after cycle. Like misshapen cars on intertwined roller coasters, they rush down the tracks, looping back to the start. At any cycle, Brady can revert to the PHR of two cycles ago or four cycles ago then go on as though he’d predicted correctly.
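The tracks amount to a short buffer of PHR snapshots. A Python sketch, assuming one copy per cycle and a depth of four, enough for the two- and four-cycle reverts above; the class name is invented for illustration.

```python
from collections import deque


class PhrTracks:
    """Hypothetical model of the roller-coaster tracks: recent PHR copies, newest last."""

    def __init__(self, depth: int = 4) -> None:
        self.copies = deque(maxlen=depth + 1)  # current PHR plus `depth` older copies

    def tick(self, phr: int) -> None:
        self.copies.append(phr)  # each cycle pushes a copy; the oldest rolls off

    def cycles_ago(self, n: int) -> int:
        return self.copies[-1 - n]


tracks = PhrTracks()
for phr in [0b000, 0b001, 0b011, 0b110, 0b101, 0b010]:
    tracks.tick(phr)
print(bin(tracks.cycles_ago(2)), bin(tracks.cycles_ago(4)))  # 0b110 0b1
```

Anything older than four cycles has already rolled off the tracks, which is one reason a late truth has to bring its own PHR from the BST.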
This all sounds really simple, except there’s a new prediction every cycle. That prediction may be for a new branch or it may correct an old branch. Meanwhile, predictions may be flushed out for so many possible reasons. Lots of PHRs and fetch addresses are careening through the tracks but not all of them are valid. The logic to throw the correct snake into PHR turned out to be surprisingly hard to get right, especially when the result has to make it into the trap before it shuts or it is never remembered27.
Two, repairs are never undone. By the time the truth arrives, Brady may have repaired himself who knows how many times for other branches fetched in the interim. The latest truth may flush out those other branches, but the repairs they caused stay in Brady. The Brady repaired by the truth still remembers walking down those false paths that now never happened.
If Brady is accurate enough, this may not happen often enough to matter. The paths that are false now may be paths Sentry will walk later. Maybe Brady has just trained them ahead of time. These may be good reasons to leave those repairs in. These may just be the rationalizations we tell ourselves because there’s nothing else we can do. We don’t have the room to remember enough so that we can recreate NAT, BHT, and BTB exactly as they were after any given fetch. As a result, we can never forget our mistakes. If we’re lucky, we overwrite them.
The Things We Don’t Talk About
This is the branch predictor that never was. There’s a lot I’ve avoided talking about: entire classes of control-transfer instructions, for example. Brady didn’t just predict branches, but also jumps, calls, and returns. Hell, the architecture Sentry implemented specified one delay slot immediately following each branch28. That delay slot can be in the following line, requiring its own fetch. The instruction in the delay slot may itself be a branch (whose delay slot, as it turns out, is located at the target of the original branch)29.
Truth turned out to be much more ephemeral in real life. Multiple branches were being digested in the worm at any given time. The execution units generally resolved them out of order relative to how they arrived at the execution units. Sometimes, they resolved the same branch multiple times with a different result each time. Like the IFU, then, the execution units made their own gambles that sometimes did not pay off. That whiplashed the worm, jerking it from one path to another then back again. The truth was not the final word. It was just something that was right for now, but might be wrong later, or not. The result the final time a branch resolved is always correct, but how do you know, at the time, that it’s the final time? You only realize too late when the branch retires. That is, when the worm has excreted the result.
Sentry, of course, had no real resolution. At least not a satisfying one. We never got to see whether it would have worked for real. Part of me still thinks that execution units that calculate the world are still speculating, that Sentry can still resolve another way. The world will flush out the past decade or so of our lives, start down another path and, this time, we’ll ship the damn thing. Or maybe we’ve already tried that and we’ve whiplashed back. There’s no way to know until we retire30.
1 - Sentry doesn’t exist. The processor was canceled just as we were ready to build it for real. I’ve gone on to design processors that do exist. I’ve gone on to other canceled projects. I’ve gone on to state-of-the-art work. Sentry was just my first processor, but also the one where I wonder about the other paths we might have taken.
2 - Yes, I’m being pedantic. Architects, like Ajay and me, tell the builders and verifiers fairy tales about how the processor works. Designers, like Marie, take the fairy tales and make them real. They spin nano-filaments and hammer them into place. They make sure the nanodots actually make it from one trap to another before the traps shut. Verifiers, like Hongwen, make sure the fairy tales are believable.
3 - I never found a diagram of the entire Sentry pipeline, just portions for a few of its units. The ones drawn by my partner-in-crime, Ajay, were, of course, exquisite.
4 - And the better Sentry performs, the more likely you can keep doing the job you love with the people you love.
5 - Yes, we really called it that. (Say it one letter after the other, like we did. I-F-U.) Gallows humor. Sentry was always in danger of being canceled.
6 - By the time the project was canceled, I’d invested so much of myself, I might as well have been Brady.
7 - In my mind’s eye, I always imagine Ian looks like Ajay. Tall, broad, and, frankly, somewhat daunting. Ian is, by all rights, though, Ajay’s story to tell, not mine.
8 - Or Ajay, for that matter.
9 - Sometimes, knowing what to say to Ajay was as hard as predicting the outcome of branches.
10 - Hmm. It’s going to be hard to talk about Brady without talking about Ian. Figures. It’s not like Ajay had nothing to do with Brady or I had nothing to do with Ian.
11 - Yes, the IFUCR. It’s pronounced exactly the way you think it’s pronounced.
Ajay joined the project about a year in. He had this tendency to loom. I knew I’d like him when, about two weeks after he started, he walked into my cube and told me, deadpan, “I need a bit in the eye fucker.” Then, he smiled.
As it turned out, though, he really did need a bit in the IFUCR.
12 - Ajay was so the right person to design Ian. I’ve never seen anyone so handy with a spreadsheet or so disappointed when things didn’t go exactly according to his Gantt chart.
He’d go lift whenever he got frustrated. Sometimes, I’d run into him in the locker room. He was much more approachable tired.
It probably says something that Sentry changed what flavor of imposing he presented over time. He started off very “Come home with me to my private island in the South Pacific.” After a couple years on the project, he was much more “Fight by my side in our quest for truth and justice.” Given that I was the one born outside the US—on an island in the South Pacific, no less—I found the latter far more appealing.
13 - Did you know that around 2AM, campus security would walk the parking lots and take down license plate numbers? 2AM was also about when Ajay would show up at my cube and gently suggest that it was time to go home.
I’m no longer dedicated enough to pull hours like that. Maybe if I were, there would be another Ajay at my current job for me to lean against as we walk out of the office and into the parking lot.
14 - Marie basically parked herself in my cube until I came up with something that fit within one cycle. Hongwen liked NAT because it was simple. We could build it and we could verify it. Now if only we could have also made it predict accurately.
15 - If Ajay was the one who was precise and regular, then I was the one who ran down all the blind alleys then backtracked. Maybe, in a way, that made me the right person to architect Brady.
16 - History has not been kind to the delay slot. I tried to explain to Ajay once what happens if a delay slot itself is a branch. He fell asleep in my lap on my couch. In his defense, we were both hammered.
17 - Ian worked so closely with Brady that Ajay had to understand Brady, too. I’d try to explain historiomancy to Ajay over pizza and all I’d get back from him is this penetrating stare, as though I was daring to lie to the embodiment of all that’s true and just. He simply couldn’t see how a snake and a pile of arrows could guess right so often.
Honestly, I don’t think I could have lied to him even if I wanted to. No matter how skeptical his gaze, his body language was always gentle. His large, thick-fingered hands would rest open on his thighs, not balled into fists across his chest. (My hands and forearms, of course, were scarred. So much for that concert piano career.)
He listened, even when it was obvious that I didn’t understand how a snake and a pile of arrows were right so often either. Just that they were.
18 - I still remember Ajay’s expression when I started to point out the stupid ways Brady can go wrong. Who knew it was possible for eyes to roll that high?
19 - This part of the story doesn’t make sense without Ajay’s. Ian does the calculation, then Brady figures out what to do with it. By this point in the design, Ajay and I were one mind in two bodies.
20 - Ian brooks about as little nonsense as Ajay. Shocking, I know.
21 - This scheme seems obvious in retrospect, but we didn’t come up with it until after five workouts, three dinners, and Mission Impossible (the movie). That said, I don’t know that we were trying to minimize the amount of time we spent together outside the office.
22 - Ajay and I tried, but one did not mess with Marie. Her first processor was in the ball bearing days. If she said there wasn’t room, there wasn’t.
23 - When I realized this, I told Ajay that this was my baptism into the compromises of real-world computer architecture. He just smiled and handed me my cell phone. I’d left it at his apartment the previous night.
24 - Yes, a table of BS. Keeping morale in Sentry’s final days wasn’t easy.
25 - You’d be surprised who didn’t think F0 was a thing. For an embarrassingly long time, Ajay and I were a cycle off until we realized what was happening. It turns out when we both think the other one is wrong, anything could set one or the other of us off. (No, of course it wasn’t really about whether zero was a number. That was just the handy linchpin.)
In retrospect, I don’t know that we ever really worked it out. Or maybe we got back on the right path, but we couldn’t erase going down all those wrong paths and that took its toll.
26 - Hongwen should be thrilled that I finally asserted a property she can test. Yes, over a decade too late.
27 - Trying to get this right literally drove me to tears. And some things we had to get wrong because you can only put nanodots through so much in one cycle. Hongwen was never thrilled when something deviated from the platonic ideal.
I may have spent months, if not years, with my head nestled against Ajay’s shoulders or his hands massaging my shoulders. Ajay, as previously mentioned, never dealt with his frustrations by, say, talking to me. He always fled to the gym.
28 - But I don’t have to admit to delay slots if I don’t want to. Just like I don’t have to admit that Ajay left. The long-threatened cancelation axe fell not only on Sentry but on everyone who worked on Sentry. It’s ironic that he was the one who foresaw this while I was caught by surprise. He found another job, then moved to California before it all fell apart. I, caught off-guard by both Ajay and the cancelation, stayed with Sentry to its sputtering end.
29 - Making delay slots work correctly might have been bearable with Ajay’s help, not to mention his strong hands, broad shoulders, and capable mind.
30 - I never looked up Ajay in California. It wouldn’t have been that hard. This is a small field we work in. I mean, Marie and I are working at the same company again. In fact, she’s my boss’ boss.
At first, I was too angry to look Ajay up. Now, it’d be too weird. Part of me thinks it’s better to remember him as that guy who made me want to fight for truth and justice by his side. Part of me wonders whether there was some other path I missed. In any case, even if I saw him, I haven’t walked my three rings yet. I wouldn’t know what to say to him. Oh well.
John Chu is a microprocessor architect by day, a writer, translator, and podcast narrator by night. His fiction has appeared or is forthcoming at Boston Review, Uncanny, Asimov's Science Fiction, and Tor.com among other venues. His translations have been published or are forthcoming at Clarkesworld, The Big Book of SF, and other venues. He has narrated for podcasts such as EscapePod, PodCastle, and Lightspeed. His story "The Water That Falls on You from Nowhere" won the 2014 Hugo Award for Best Short Story.