When is Intel "IA-32" (aka Intel "x86") complex instruction set
computation (CISC) going to finally die? That question has been asked
ever since MIPS R2000 processors hit the market in the mid-80s. While
the debate of "kill x86" v. "x86 forever" rages on, the company behind
the latter might actually be the best one to kill the former, as we
will see.
Overview:
- IA-64: When Reality Breaks Theory
- Athlon: The Re-programmable Pentium
- AMD x86-64 and Intel Yamhill
- Digital FX Flashbacks
- Transmetting the Future
- IA-64: WHEN REALITY BREAKS THEORY
The 60s introduced complex instruction set computation (CISC), which
was quickly followed by the birth of the microprocessor at Intel, on
which all its subsequent products would be based. As CISC moved into
superscalar and pipelined designs, it became obviously difficult to
optimize. So the 80s brought us reduced instruction set computation
(RISC), drastically reducing logic size and design times, and many CISC
vendors made their switch then and there. Unfortunately, RISC still
didn't solve the issue of only 50% of pipelines being utilized at any
time. So when Intel, having skipped the RISC generation, finally
decided to move away from its CISC backbone, it moved to address the
shortcomings of RISC with an approach for the 21st century known as
explicitly parallel instruction-set computation (EPIC).
EPIC is extremely innovative. It uses heavy compile-time optimization
to assemble traditionally 32-bit RISC instruction words into a 128-bit
very long instruction word (VLIW) bundle of three 41-bit RISC words
plus a 5-bit template. This eliminates a lot of overhead in the
run-time design of the processor, making RISC even more RISC. And in
an effort to completely eliminate the dreadful event of a processor
stall caused by branch misprediction, it introduced the concept of
branch "predication," where both sides of a branch are executed and the
result of the road not taken is discarded when the branch is resolved.
Unfortunately, EPIC wasn't as good in silicon as Intel thought it would
be.
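The arithmetic works out exactly: three 41-bit slots plus a 5-bit
template fill 128 bits with nothing left over. A toy packer
(hypothetical helper names, not Intel tooling) makes the layout
concrete:

```python
# Toy sketch of the IA-64 bundle layout described above: three 41-bit
# instruction slots plus a 5-bit template in one 128-bit bundle.
# Helper names are hypothetical; this is an illustration, not an encoder.

SLOT_BITS = 41
TEMPLATE_BITS = 5

def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into a 128-bit int."""
    assert 0 <= template < (1 << TEMPLATE_BITS)
    assert len(slots) == 3 and all(0 <= s < (1 << SLOT_BITS) for s in slots)
    bundle = template  # template sits in the low bits
    for i, s in enumerate(slots):
        bundle |= s << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle

def unpack_bundle(bundle):
    """Recover the template and the three slots from a 128-bit bundle."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
             for i in range(3)]
    return template, slots

# 3 * 41 + 5 = 128 bits exactly -- no padding needed
assert 3 * SLOT_BITS + TEMPLATE_BITS == 128
```

The template field is what carries the compiler's explicit parallelism
hints -- the "E" in EPIC -- so the hardware doesn't have to rediscover
instruction groupings at run time.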
The first Intel IA-64 processor, Itanium, wasn't a flop just because it
did not run older IA-32 CISC code well. It failed to keep its pipelines
90% full like the EPIC approach promised -- despite heavy compiler
optimization development. And when it came to branch predication, the
savings in stalls were not worth the extra, useless work the processor
committed itself to doing by executing the branch that would not be
taken. Intel is addressing the utilization issue with the addition of
traditional run-time optimization, and even some traditional branch
prediction, in its 2nd generation IA-64 processor, "McKinley" -- but
even Intel itself is wondering whether it has taken the right approach
to transitioning away from CISC IA-32.
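The predication trade-off is easy to see in miniature. This sketch
computes both sides of a conditional unconditionally and then selects
by the predicate; the final select stands in for the hardware's
predicate registers (in Python it is of course still a conditional
expression -- the point is the wasted work, not the mechanism):

```python
# Predication in miniature: do the work of BOTH paths, then keep only
# the result the predicate selects. No branch means no misprediction
# stall, but the discarded path's work is always paid for -- the cost
# the text complains about on Itanium.

def predicated_abs(x):
    p = x < 0      # predicate, computed once
    pos = x        # "then" path executes unconditionally
    neg = -x       # "else" path executes unconditionally
    return neg if p else pos   # discard the road not taken

assert predicated_abs(-5) == 5
assert predicated_abs(3) == 3
```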
- ATHLON: THE RE-PROGRAMMABLE PENTIUM
You've never heard an Intel engineer curse more than when they speak of
Matrix Math eXtensions (MMX) or Streaming SIMD Extensions (SSE). Intel
has not only bloated its CISC IA-32 instruction set with such
concoctions, but has ended up giving its engineering teams all kinds of
tangent designs to figure out how to slap onto their cores. Instead of
evolving its now-aged Pentium core design with more general arithmetic
logic unit (ALU) and floating point unit (FPU) pipes and registers,
Intel slaps on more "lossy, application-specific" integer-float
interpolating logic along with dedicated registers for it. Worse yet,
Intel still hasn't addressed its "less-than-ideal" out-of-order and
branch prediction units, because the whole Pentium series was supposed
to have been superseded by IA-64 EPIC/predication by now.
The result is a chip that excels at specific, visual applications where
accuracy is not necessary, but one that is not so fast at general
applications, let alone engineering and scientific ones.
While the well-funded Moore and co. design teams were busy either
adding accessories to their Mustang or polishing a prototype only
millionaires could afford, his former Fairchild colleague Sanders was
off spending the few R&D dollars AMD had to build a Viper. They took
the aged muscle car approach they knew worked and refined and
modernized it -- more pipes, better branch prediction, lots of
buffering -- into a solid, efficient, 9-issue core in a few years
instead of a decade. Not the most efficient design, easily double the
size of an original RISC core, but it was built to run code written for
a four-decade-old approach. The result would be known to end-users as
the Athlon -- a core design that would serve them a good 5 years before
needing an overhaul.
AMD has always led Intel in ALU performance and memory loads, and its
branch prediction unit was based on lessons learned in the K6 (which
was overkill). But the Athlon's greatest strength was its 3-issue FPU,
which causes Intel headaches to this day. Whenever Intel adds another
50+ opcodes for some fancy-schmancy multimedia niche, AMD just writes
some microcode to leverage its FPU (or ALU, in some cases) to do it.
So while Intel has to slap on yet another execution unit and more
registers, AMD just figures out which FPU pipes to use and which
registers to dedicate to it. The effort is far less, and more time can
be spent optimizing the accommodation within the existing design,
instead of rushing to finish the "slap on" design, doing timing
resolution of the new logic with the old, etc.
Although IA-64 also uses microcode to execute the bloated CISC IA-32
instruction set on its EPIC design, it wasn't designed for it like the
Athlon was. Seeing the Intel IA-32 team add more and more junk to its
product without giving the IA-64 team a thought reminds me a lot of
another company, whose "Chicago" team did the same with their products
without consulting the other guys in the same company.
- AMD x86-64 AND INTEL YAMHILL
The Athlon also did one more thing for AMD: it gave them their own
hardware platform. No longer did AMD need to wait on Intel to move on
the OEM end; they moved the platform themselves. Sure, the first 6
months were marked by few products, poor 3rd-party support, and even
poor end-product reliability, but the platform boomed in no time, and
by the end of the first year, few OEMs were limiting themselves to
Intel. Now AMD is going to finish the job.
AMD x86-64 brings 64-bit addressing to IA-32 in a fully
backward-compatible, similarly performing way. In fact, x86-64 is
nothing special: it's just an Athlon with 64-bit addressing, another
pipeline, and more registers, now 64 bits wide. Nothing major to
address in the overall design, other than adding the
addressing/register extensions and making sure it handles run-time
resolution of switching between legacy and 64-bit modes.
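For illustration, here's a toy model (Python, not silicon) of the
widened register file: the original eight GPRs grow to 64 bits and
eight new ones (R8-R15) appear. The one real architectural rule modeled
is that a 32-bit write zero-extends into the full 64-bit register,
which is what lets legacy 32-bit code run cleanly in the new mode:

```python
# Toy model of the x86-64 general-purpose register file. The register
# names and the zero-extension rule are real x86-64; everything else
# (dict, helper names) is just illustration.

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

# 8 legacy GPRs widened to 64 bits, plus 8 new ones: 16 total.
regs = {name: 0 for name in
        ["RAX", "RBX", "RCX", "RDX", "RSI", "RDI", "RBP", "RSP"]
        + [f"R{i}" for i in range(8, 16)]}

def write64(name, value):
    """A 64-bit write fills the whole register."""
    regs[name] = value & MASK64

def write32(name, value):
    """A 32-bit write (e.g. to EAX) zero-extends into the 64-bit
    register, so legacy code never leaves stale high bits behind."""
    regs[name] = value & MASK32

write64("RAX", 0xFFFF_FFFF_FFFF_FFFF)
write32("RAX", 0x1234)            # legacy-style EAX write
assert regs["RAX"] == 0x1234      # upper 32 bits were cleared
```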
Since IA-64 "McKinley" won't arrive until x86-64 does as well, Intel
realized it had far too many of its eggs in one basket. Although Intel
has not confirmed it, their "Yamhill" project is an effort to build an
x86-64 compatible processor. That means Intel had to license AMD
x86-64, which AMD has confirmed. And that spells engineering bliss for
the future of IA-32. Why?
AMD has a history of not bloating IA-32. Only once have they introduced
instruction set extensions of their own (3DNow!), and those were done
to address the _shortcomings_ of a marketing-driven extension set from
Intel (MMX). Later refinements of those extensions were often just
adoptions of Intel introductions and, as discussed before, done in a
way where microcode was added using the existing ALU/FPU pipes. Now
that AMD controls the ISA as well as its own platform, IA-32 will
finally "stabilize" under AMD's x86-64 leadership. Even Intel marketing
will take a "back seat" for a while, as they cannot even hope to have
an x86-64 competitor out until late 2003 -- a good year behind AMD.
- DIGITAL FX FLASHBACKS
AMD doesn't have the R&D dollars of Intel. Even though they spend a
greater percentage on R&D than Intel (who spends a lot of that on
marketing-related R&D projects), they cannot make a dent in comparison.
So they rely on industry partnerships that contribute and proliferate
their combined concepts, innovations, ideas and products into a
community-designed platform. Nowhere is this more apparent than in the
introduction of their ultra-flexible HyperTransport interconnect, which
is being used by basically everyone outside of Intel, even for Intel
platform systems in some cases.
At the forefront of this are employees of the former company known as
Digital, now owned by Compaq, now owned by HP. These employees built
the most anal of RISC designs, the Alpha microprocessor (uP), and the
most practical of microcontroller (uC) designs, the StrongARM. They
dominated the design of pretty much all of the enterprise-level system
and bus logic and other interconnects -- EV6/7, PCI bridges, etc. And
they seeded much of the commodity Ethernet market with their popular
design, the Tulip. Although that collective engineering resource is
gone, its footprint on history continues even today at AMD and partners
like API Networks (fka Alpha Processor, Inc.). And one major technology
they introduced continues to be undervalued.
When Digital created the Alpha, they created an ultra-clean 64-bit
platform for _only_ 32/64-bit computing -- no 8- or 16-bit. This wasn't
by mistake, nor was it just to show how efficient RISC could be when
taken to a level of "analness" like the Alpha's. It was a hardware
conduit for an innovative software concept and an associated set of
tools. Those tools were FX!32, which silently won award after award for
its approach.
FX!32 was a "binary compiler" (if I may call it that) that not only
run-time emulated software written for another architecture or "byte
code," but did run-time _conversion_ of binary executables and
libraries from one architecture into Alpha. It then further did
post-conversion optimizations on the new Alpha binaries each time they
were run -- to try to further match the execution speed of the original
-- and boy did it come close! It was a brilliant piece of work -- one
that Digital needed not just to sell the NT/Alpha platform, since it
could run NT/x86 binaries but, more importantly, to allow users to run
VAX/VMS binaries on the accompanying Alpha/VMS platform. Digital would
even go as far as to introduce FX!32 software for Linux/x86 ->
Linux/Alpha and even some limited UNIX/MIPS -> UNIX/Alpha.
Digital realized that software runs on an operating system platform,
not just an architecture. While it is common to emulate other software
platforms via library calls or even semi-virtualized hardware on the
same architecture or "byte code" (e.g., VMware or WINE on x86), Digital
found it far easier to emulate various other architectures (MIPS, VAX,
x86) on the same software platform (UNIX, VMS, and Windows/Linux,
respectively) as theirs (Alpha). And it didn't stop there, because they
could _permanently_convert_ the binaries of those other architectures
to Alpha. Binaries are built for a software platform -- the
architecture is just an instance of it.
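In miniature, the FX!32 approach looks something like the sketch below:
emulate a block of foreign code the first time, translate it once it
recurs, and run the converted version thereafter. All names here are
hypothetical stand-ins; the real FX!32 profiled during emulation and
performed its translation and re-optimization between runs.

```python
# Toy sketch of an emulate-then-translate pipeline in the spirit of
# FX!32. "Translation" here is a stand-in (upper-casing the block name);
# the structure -- slow emulated path, persistent translation cache,
# fast native path -- is the idea the text describes.

translated = {}   # block -> converted "native" code (persists across runs)
hot_counts = {}   # how often each block has been emulated

def run_block(block):
    if block in translated:
        return f"native:{translated[block]}"      # fast converted path
    hot_counts[block] = hot_counts.get(block, 0) + 1
    if hot_counts[block] >= 2:                    # block recurs: convert it
        translated[block] = block.upper()         # stand-in for translation
    return f"emulated:{block}"                    # slow emulated path

assert run_block("addl").startswith("emulated")   # first sight: emulate
assert run_block("addl").startswith("emulated")   # translated after this run
assert run_block("addl").startswith("native")     # now runs converted code
```

The key property, as the text notes, is that the conversion is
cumulative: each run leaves behind faster binaries than the last.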
The Digital Alpha technology was licensed to Samsung, AMD and Intel,
with Intel now the owner of the platform. One has to wonder: if Intel
had known back then how IA-64 would perform today, would they have
bought Alpha long ago and used it as their nexgen, non-CISC platform?
Alpha has _always_ been the highest performing architecture. I mean,
while an 800MHz, 0.18um Itanium toasts even a 2.4GHz, 0.13um Pentium 4
at floating point, even a 3-year-old, 600MHz, 0.35um Alpha 264
_outperforms_ that same Itanium by an even wider margin! Add in the
fact that FX!32 on Alpha _greatly_outperforms_ Itanium when it comes to
running x86 binaries, and one can only wonder whether we wouldn't have
64-bit Intel Alpha chips now, running at 4GHz at 0.13um, with fully
supported FX!32 software for running legacy Windows and Linux binaries.
And instead of talking about "fixing" IA-64 with "McKinley," we'd be
talking about a new Alpha 364 design that is the best of both worlds --
adopting Intel EPIC ideas like compile-time optimization to improve
RISC run-time utilization.
- TRANSMETTING THE FUTURE
So what's my point? The main reason we have NOT seen something like
FX!32 take over is that Intel keeps extending IA-32 and toying with
IA-64. Yeah, Intel finally owns Alpha now, and while McKinley and later
IA-64s will benefit, it's far too little, far too late. Now that AMD is
commanding IA-32 c/o its 64-bit x86-64 -- maybe, just maybe, the
AMD-API guys are thinking about going beyond legacy CISC IA-32. Maybe
they are thinking of building their own 128-bit VLIW design. Or doesn't
someone else already have one???
Yes, one company does. In fact, they looked at it a little differently.
Instead of writing some add-on systems software that lets one
architecture run the software written for the same platform on another,
this company put it in the firmware -- and that's all it does! It
doesn't even market its own natively running software but _always_ runs
the foreign bytecode. The Transmeta Crusoe architecture is a 128-bit
VLIW RISC design that has virtually _no_ microcode at the core, but
uses a software/firmware-driven principle known as "code morphing" to
take another bytecode and break it down into its raw, native VLIW words
at run-time. "Code morphing" is yet another innovative approach built
on the simple fact that x86 bytecode rules the landscape -- and, like
FX!32, on the fact that it is easier to run the same software platform
on a different architecture than a different software platform on the
same architecture. Furthermore, why else do you think they hired the
guy who wrote the first operating system against the full Intel
i386+MMU specification, Linus Torvalds -- because he knew x86 bytecode
in and out! And guess who is also a licensee of the Transmeta IP?
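The code morphing idea can be sketched as a two-stage run-time
pipeline: decode each foreign (x86-style) operation into native
micro-ops, then pack those into fixed-width VLIW words. Transmeta's own
terms for these were "atoms" and "molecules"; everything else below
(the opcode table, the 4-slot width, the greedy packing that ignores
dependences) is a hypothetical simplification, not Crusoe's firmware:

```python
# Toy sketch of run-time "code morphing": decode foreign ops into
# native atoms, then pack the atoms into fixed-width VLIW molecules.
# Real code-morphing software also schedules around dependences,
# caches translations, and re-optimizes hot regions -- all omitted here.

MOLECULE_SLOTS = 4  # assumed issue width for this sketch

def decode(x86_op):
    """Break one foreign op into native atoms (stand-in mapping)."""
    table = {"inc": ["load", "add", "store"],
             "mov": ["load", "store"]}
    return table.get(x86_op, [x86_op])  # unknown ops pass through

def morph(x86_stream):
    """Flatten decoded atoms, then greedily pack them into molecules."""
    atoms = [a for op in x86_stream for a in decode(op)]
    return [tuple(atoms[i:i + MOLECULE_SLOTS])
            for i in range(0, len(atoms), MOLECULE_SLOTS)]

mols = morph(["inc", "mov"])
assert mols == [("load", "add", "store", "load"), ("store",)]
```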
Yeah, the same company that is now in control of IA-32: AMD. Makes you
wonder where this is all leading. Let me piece together my predictions
for you ...
- As the new leader in x86-64, AMD will "permanently stabilize" the
IA-32 ISA. x86 bytecode will now be a "standard" that doesn't change.
- A new, 2nd generation 128-bit VLIW using HyperTransport will be born
out of the AMD-API-Transmeta alliance. This chip, unlike Crusoe, will
have native versions of 64-bit Windows and Linux released for it.
- A merger of FX!32 and Code Morphing concepts will lead to an improved
"binary compiler" for both Windows and Linux. You will still have to
run Windows/VLIW2 to run Windows/x86[-64] and Linux/VLIW2 to run
Linux/x86[-64] binaries, respectively, but it will finally move people
away from IA-32/x86 by 2006-2007.
IMHO, if this happens, Intel will have its issues go exponential. Not
only will they have a tough time proving to people that IA-64 is viable
versus this new VLIW2, but their other strategy revolves around the
"now dying" x86-64 ISA. Since IA-64 hasn't "caught on" yet and there
is a very good chance that even the 2nd gen "McKinley" won't either
(the consumer version isn't due until late 2003), the only chance Intel
has is to go x86-64 "full bore" and keep people from moving off it. So
we're back to Intel actually being the "x86 forever" guys!
I could be wrong about AMD looking at VLIW. But something tells me all
those former Alpha engineers are salivating over the Transmeta
technology -- or at least thinking about it when making improvements to
what they have already done. If this new "binary compiler" becomes
available, x86 may very well die regardless of Linux adoption. In fact,
Linux desktop adoption helps Intel with IA-64, so maybe continued
Windows/x86 usage is in Transmeta-AMD's favor? So maybe AMD's support
of Microsoft is not so blind, eh?
It's just hard to tell. But it's harder to sit by and watch good ideas
and innovations that could easily move us away from x86 inefficiency to
a new, RISC-like, VLIW bytecode platform not happen in the next 5
years. Because if it is going to happen, there is more chance of it
coming from the AMD-Transmeta partnership than from Intel and its
IA-64, IMHO. The irony is that it is AMD who is keeping x86 alive with
x86-64, because their "seizing control" of it is our best chance of
stabilizing it and getting off it. Because, like Microsoft with its
Win/NT-ignorant Win/DOS marketeers, Intel cannot keep its IA-32
marketeers from ruining any chance IA-64 has.
-- Bryan