Itanium

Development: 1989–2001 Inception: 1989–1994 In 1989, HP started to research an architecture that would exceed the expected limits of the reduced instruction set computer (RISC) architectures caused by the great increase in complexity needed for executing multiple instructions per cycle due to the need for dynamic dependency checking and precise exception handling. HP hired Bob Rau of Cydrome and Josh Fisher of Multiflow, the pioneers of very long instruction word (VLIW) computing. One VLIW instruction word can contain several independent instructions, which can be executed in parallel without having to evaluate them for independence. A compiler must attempt to find valid combinations of instructions that can be executed at the same time, effectively performing the instruction scheduling that conventional superscalar processors must do in hardware at runtime. HP researchers modified the classic VLIW into a new type of architecture, later named Explicitly Parallel Instruction Computing (EPIC), which differs by: having template bits which show which instructions are independent inside and between the bundles of three instructions, which enables the explicitly parallel execution of multiple bundles and increasing the processors' issue width without the need to recompile; by predication of instructions to reduce the need for branches; and by full interlocking to eliminate the delay slots. In EPIC the assignment of execution units to instructions and the timing of their issuing can be decided by hardware, unlike in the classic VLIW. HP intended to use these features in PA-WideWord, the planned successor to their PA-RISC ISA. EPIC was intended to provide the best balance between the efficient use of silicon area and electricity, and general-purpose flexibility. In 1993 HP held an internal competition to design the best (simulated) microarchitectures of a RISC and an EPIC type, led by Jerry Huck and Rajiv Gupta respectively. The EPIC team won, with over double the simulated performance of the RISC competitor. At the same time Intel was also looking for ways to make better ISAs. In 1989 Intel had launched the i860, which it marketed for workstations, servers, and iPSC and Paragon supercomputers. It differed from other RISCs by being able to switch between the normal single instruction per cycle mode, and a mode where pairs of instructions are explicitly defined as parallel so as to execute them in the same cycle without having to do dependency checking. Another distinguishing feature were the instructions for an exposed floating-point pipeline, that enabled the tripling of throughput compared to the conventional floating-point instructions. Both of these features were left largely unused because compilers didn't support them, a problem that later challenged Itanium, too. Without them, i860's parallelism (and thus performance) was no better than other RISCs, so it failed in the market. Itanium would adopt a more flexible form of explicit parallelism than i860 had. In November 1993 HP approached Intel, seeking collaboration on an innovative future architecture. At the time Intel was looking to extend x86 to 64 bits in a processor codenamed P7, which they found challenging. Later Intel claimed that four different design teams had explored 64-bit extensions, but each of them concluded that it was not economically feasible. At the meeting with HP, Intel's engineers were impressed when Jerry Huck and Rajiv Gupta presented the PA-WideWord architecture they had designed to replace PA-RISC. "When we saw WideWord, we saw a lot of things we had only been looking at doing, already in their full glory", said Intel's John Crawford, who in 1994 became the chief architect of Merced, and who had earlier argued against extending the x86 with P7. HP's Gupta recalled: "I looked Albert Yu [Intel's general manager for microprocessors] in the eyes and showed him we could run circles around PowerPC, that we could kill PowerPC, that we could kill the x86." Convinced of the superiority of the new project, in 1994 Intel canceled their existing plans for P7. In June 1994 Intel and HP announced their joint effort to make a new ISA that would adopt ideas of Wide Word and VLIW. Yu declared: "If I were competitors, I'd be really worried. If you think you have a future, you don't." On P7's future, Intel said the alliance would impact it, but "it is not clear" whether it would "fully encompass the new architecture". Later the same month, Intel said that some of the first features of the new architecture would start appearing on Intel chips as early as the P7, but the full version would appear sometime later. In August 1994 EE Times reported that Intel told investors that P7 was being re-evaluated and possibly canceled in favor of the HP processor. Intel immediately issued a clarification, saying that P7 is still being defined, and that HP may contribute to its architecture. Later it was confirmed that the P7 codename had indeed passed to the HP-Intel processor. By early 1996 Intel revealed its new codename, Merced. HP believed that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors, so it partnered with Intel in 1994 to develop the IA-64 architecture, derived from EPIC. Intel was willing to undertake the very large development effort on IA-64 in the expectation that the resulting microprocessor would be used by the majority of enterprise systems manufacturers. HP and Intel initiated a large joint development effort with a goal of delivering the first product, Merced, in 1998. Shortly before the reveal of EPIC at the Microprocessor Forum in October 1997, an analyst of the Microprocessor Report said that Itanium would "not show the competitive performance until 2001. It will take the second version of the chip for the performance to get shown". At the Forum, Intel's Fred Pollack originated the "wait for McKinley" mantra when he said that it would double the Merced's performance and would "knock your socks off", while using the same 180 nm process as Merced. Pollack also said that Merced's x86 performance would be lower than the fastest x86 processors, and that x86 would "continue to grow at its historical rates". Later it was reported that HP's motivation when starting to design McKinley in 1996 was to have more control over the project so as to avoid the issues affecting Merced's performance and schedule. The design team finalized McKinley's project goals in 1997. In late May 1998 Merced was delayed to mid-2000, and by August 1998 analysts were questioning its commercial viability, given that McKinley would arrive shortly after with double the performance, as delays were causing Merced to turn into simply a development vehicle for the Itanium ecosystem. The "wait for McKinley" narrative was becoming prevalent. The same day it was reported that due to the delays, HP would extend its line of PA-RISC PA-8000 series processors from PA-8500 to as far as PA-8900. In October 1998 HP announced its plans for four more generations of PA-RISC processors, with PA-8900 set to reach 1.2 GHz in 2003. By March 1999 some analysts expected Merced to ship in volume only in 2001, but the volume was widely expected to be low as most customers would wait for McKinley. In July 1999, upon reports that the first silicon would be made in late August, analysts predicted a delay to late 2000, and came into agreement that Merced would be used chiefly for debugging and testing the IA-64 software. Linley Gwennap of MPR said of Merced that "at this point, everyone is expecting it's going to be late and slow, and the real advance is going to come from McKinley. What this does is puts a lot more pressure on McKinley and for that team to deliver". By then, Intel had revealed that Merced would be initially priced at $5000. In August 1999 HP advised some of their customers to skip Merced and wait for McKinley. By July 2000 HP told the press that the first Itanium systems would be for niche uses, and that "You're not going to put this stuff near your data center for several years."; HP expected its Itanium systems to outsell the PA-RISC systems only in 2005. The same July Intel told of another delay, due to a stepping change to fix bugs. Now only "pilot systems" would ship that year, while the general availability was pushed to the "first half of 2001". Server makers had largely forgone spending on the R&D for the Merced-based systems, instead using motherboards or whole servers of Intel's design. To foster a wide ecosystem, by mid-2000 Intel had provided 15,000 Itaniums in 5,000 systems to software developers and hardware designers. In March 2001 Intel said Itanium systems would begin shipping to customers in the second quarter, followed by a broader deployment in the second half of the year. By then even Intel publicly acknowledged that many customers would wait for McKinley. Expectations During development, Intel, HP, and industry analysts predicted that IA-64 would dominate first in 64-bit servers and workstations, then expand to the lower-end servers, supplanting Xeon, and finally penetrate into the personal computers, eventually to supplant RISC and complex instruction set computing (CISC) architectures for all general-purpose applications, though not replacing x86 "for the foreseeable future" according to Intel. In 1997-1998, Intel CEO Andy Grove predicted that Itanium would not come to the desktop computers for four of five years after launch, and said "I don't see Merced appearing on a mainstream desktop inside of a decade". Already in 1998 Itanium's focus on the high end of the computer market was criticized for making it vulnerable to challengers expanding from the lower-end market segments, but many people in the computer industry feared voicing doubts about Itanium in the fear of Intel's retaliation. Several groups ported operating systems for the architecture, including Microsoft Windows, OpenVMS, Linux, HP-UX, Solaris, Tru64 UNIX, The latter three were canceled before reaching the market. By 1997, it was apparent that the IA-64 architecture and the compiler were much more difficult to implement than originally thought, and the delivery timeframe of Merced began slipping. Within hours, the name Itanic had been coined on a Usenet newsgroup, a reference to the RMS Titanic, the "unsinkable" ocean liner that sank on her maiden voyage in 1912. "Itanic" was then used often by The Register, and others, to imply that the multibillion-dollar investment in Itanium—and the early hype associated with it—would be followed by its relatively quick demise. Itanium (Merced): 2001 After having sampled 40,000 chips to the partners, Intel launched Itanium on May 29, 2001, with first OEM systems from HP, IBM and Dell shipping to customers in June. By then Itanium's performance was not superior to competing RISC and CISC processors. Itanium competed at the low-end (primarily four-CPU and smaller systems) with servers based on x86 processors, and at the high-end with IBM POWER and Sun Microsystems SPARC processors. Intel repositioned Itanium to focus on the high-end business and HPC computing markets, attempting to duplicate the x86's successful "horizontal" market (i.e., single architecture, multiple systems vendors). The success of this initial processor version was limited to replacing the PA-RISC in HP systems, Alpha in Compaq systems and MIPS in SGI systems, though IBM also delivered a supercomputer based on this processor. POWER and SPARC remained strong, while the 32-bit x86 architecture continued to grow into the enterprise space, building on the economies of scale fueled by its enormous installed base. Only a few thousand systems using the original Merced Itanium processor were sold, due to relatively poor performance, high cost and limited software availability. Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to the market a year later. Few of the microarchitectural features of Merced would be carried over to all the subsequent Itanium designs, including the 16+16 KB L1 cache size and the 6-wide (two-bundle) instruction decoding. Itanium 2 (McKinley and Madison): 2002–2006 The Itanium 2 processor was released in July 2002, and was marketed for enterprise servers rather than for the whole gamut of high-end computing. The first Itanium 2, code-named McKinley, was jointly developed by HP and Intel, led by the HP team at Fort Collins, Colorado, taping out in December 2000. It relieved many of the performance problems of the original Itanium processor, which were mostly caused by an inefficient memory subsystem by approximately halving the latency and doubling the fill bandwidth of each of the three levels of cache, while expanding the L2 cache from 96 to 256 KB. Floating-point data is excluded from the L1 cache, because the L2 cache's higher bandwidth is more beneficial to typical floating-point applications than low latency. The L3 cache is now integrated on-chip rather than on a separate die, tripling in associativity and doubling in bus width. McKinley also greatly increases the number of possible instruction combinations in a VLIW-bundle and reaches 25% higher frequency, despite having only eight pipeline stages versus Merced's ten. In May 2003 it was disclosed that some McKinley processors can suffer from a critical-path erratum leading to a system's crashing. It can be avoided by lowering the processor frequency from the original 1 GHz to 800 MHz. In 2003, AMD released the Opteron CPU, which implements its own 64-bit architecture called AMD64. The Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Under the influence of Microsoft, Intel responded by implementing AMD's x86-64 instruction set architecture (instead of IA-64) in its Xeon microprocessors in 2004, resulting in a new industry-wide de facto standard. The Alliance announced that its members would invest $10 billion in the Itanium Solutions Alliance by the end of the decade. Itanium 2 9000 and Itanium 9100: 2006 and 2007 In early 2003, due to the success of IBM's dual-core POWER4, Intel announced that the first 90 nm Itanium processor, codenamed Montecito, would be delayed to 2005 so as to change it into a dual-core, thus merging it with the Chivano project. In September 2004 Intel demonstrated a working Montecito system, and claimed that the inclusion of hyper-threading increases Montecito's performance by 10-20% and that its frequency could reach 2 GHz. After a delay to "mid-2006" and reduction of the frequency to 1.6 GHz, on July 18 Intel delivered Montecito (marketed as the Itanium 2 9000 series), a dual-core processor with a switch-on-event multithreading and split 256 KB + 1 MB L2 caches that roughly doubled the performance and decreased the energy consumption by about 20 percent. At 596 mm² die size and 1.72 billion transistors it was the largest microprocessor at the time. It was supposed to feature Foxton Technology, a very sophisticated frequency regulator, which failed to pass validation and was thus not enabled for customers. Intel released the Itanium 9100 series, codenamed Montvale, in November 2007, retiring the "Itanium 2" brand. Originally intended to use the 65 nm process, it was changed into a fix of Montecito, enabling the demand-based switching (like EIST) and up to 667 MT/s front-side bus, which were intended for Montecito, plus a core-level lockstep. Itanium 9300 (Tukwila): 2010 The original code name for the first Itanium with more than two cores was Tanglewood, but it was changed to Tukwila in late 2003 due to trademark issues. Intel discussed a "middle-of-the-decade Itanium" to succeed Montecito, achieving ten times the performance of Madison. In early 2004 Intel told of "plans to achieve up to double the performance over the Intel Xeon processor family at platform cost parity by 2007". By early 2005 Tukwila was redefined, now having fewer cores but focusing on single-threaded performance and multiprocessor scalability. In March 2005, Intel disclosed some details of Tukwila, the next Itanium processor after Montvale, to be released in 2007. Tukwila would have four processor cores and would replace the Itanium bus with a new Common System Interface, which would also be used by a new Xeon processor. Tukwila was to have a "common platform architecture" with a Xeon codenamed Whitefield, when Intel revised Tukwila's delivery date to late 2008. In May 2009, the schedule for Tukwila, was revised again, with the release to OEMs planned for the first quarter of 2010. The Itanium 9300 series processor, codenamed Tukwila, was released on February 8, 2010, with greater performance and memory capacity. The device uses a 65 nm process, includes two to four cores, up to 24 MB on-die caches, Hyper-Threading technology and integrated memory controllers. It implements double-device data correction, which helps to fix memory errors. Tukwila also implements Intel QuickPath Interconnect (QPI) to replace the Itanium bus-based architecture. It has a peak interprocessor bandwidth of 96 GB/s and a peak memory bandwidth of 34 GB/s. With QuickPath, the processor has integrated memory controllers and interfaces the memory directly, using QPI interfaces to directly connect to other processors and I/O hubs. QuickPath is also used on Intel x86-64 processors using the Nehalem microarchitecture, which possibly enabled Tukwila and Nehalem to use the same chipsets. Tukwila incorporates two memory controllers, each of which has two links to Scalable Memory Buffers, which in turn support multiple DDR3 DIMMs, much like the Nehalem-based Xeon processor code-named Beckton. HP vs. Oracle During the 2012 Hewlett-Packard Co. v. Oracle Corp. support lawsuit, court documents unsealed by a Santa Clara County Court judge revealed that in 2008, Hewlett-Packard had paid Intel around $440 million to keep producing and updating Itanium microprocessors from 2009 to 2014. In 2010, the two companies signed another $250 million deal, which obliged Intel to continue making Itanium CPUs for HP's machines until 2017. Under the terms of the agreements, HP had to pay for chips it gets from Intel, while Intel launches Tukwila, Poulson, Kittson, and Kittson+ chips in a bid to gradually boost performance of the platform. Itanium 9500 (Poulson): 2012 Intel first mentioned Poulson on March 1, 2005, at the Spring IDF. In June 2007 Intel said that Poulson would use a 32 nm process technology, skipping the 45 nm process. Analyst David Kanter speculated that Poulson would use a new microarchitecture, with a more advanced form of multithreading that uses up to two threads, to improve performance for single threaded and multithreaded workloads. Some information was also released at the Hot Chips conference. Information presented improvements in multithreading, resiliency improvements (Intel Instruction Replay RAS) and few new instructions (thread priority, integer instruction, cache prefetching, and data access hints). Poulson was released on November 8, 2012, as the Itanium 9500 series processor. It is the follow-on processor to Tukwila. It features eight cores and has a 12-wide issue architecture, multithreading enhancements, and new instructions to take advantage of parallelism, especially in virtualization. The Poulson L3 cache size is 32 MB and common for all cores, not divided like previously. L2 cache size is 6 MB, 512 I KB, 256 D KB per core. Die size is 544 mm², less than its predecessor Tukwila (698.75 mm²). Intel's Product Change Notification (PCN) 111456-01 lists four models of Itanium 9500 series CPU, which was later removed in a revised document. The parts were later listed in Intel's Material Declaration Data Sheets (MDDS) database. Intel later posted Itanium 9500 reference manual. The models are the following: : Itanium 9700 (Kittson): 2017 Intel had committed to at least one more generation after Poulson, first mentioning Kittson on 14 June 2007. Kittson was supposed to be on a 22 nm process and use the same LGA2011 socket and platform as Xeons. On 31 January 2013 Intel issued an update to their plans for Kittson: it would have the same LGA1248 socket and 32 nm process as Poulson, thus effectively halting any further development of Itanium processors. In April 2015, Intel, although it had not yet confirmed formal specifications, did confirm that it continued to work on the project. Meanwhile, the aggressively multicore Xeon E7 platform displaced Itanium-based solutions in the Intel roadmap. Even Hewlett-Packard, the main proponent and customer for Itanium, began selling x86-based Superdome and NonStop servers, and started to treat the Itanium-based versions as legacy products. Intel officially launched the Itanium 9700 series processor family on May 11, 2017. Intel announced that the 9700 series would be the last Itanium chips produced. : == Market share ==