Nvidia In the Valley – Stratechery by Ben Thompson

Nvidia investors have been in the valley before:

This chart, though, is not from the last two years, but rather from the beginning of 2017 to the beginning of 2019; here is 2017 to today:

Three big things happened to Nvidia’s business over the last three years that drove the price to unprecedented heights:

The pandemic led to an explosion in PC buying generally and gaming cards specifically, as customers had both the need for new computers and a huge increase in discretionary income with nowhere to spend it beyond better game experiences.
Machine learning applications, which were trained on Nvidia GPUs, exploded amongst the hyperscalers.
The crypto bubble led to skyrocketing demand for Nvidia chips to solve Ethereum proof-of-work equations to earn — i.e. mine — Ether.

Crypto isn’t so much a valley as it is a cliff: Ethereum successfully transitioned to a proof-of-stake model, rendering entire mining operations, built with thousands of Nvidia GPUs, worthless overnight; given that Bitcoin, the other major crypto network to use proof-of-work, is almost exclusively mined on custom-designed chips, all of those old GPUs are flooding the second-hand market. This is particularly bad timing for Nvidia given that the pandemic buying spree ended just as the company’s attempt to catch up on demand for its 3000-series of chips was coming to fruition. Needless to say, too much new inventory plus too much used inventory is terrible for a company’s financial results, particularly when you’re trying to clear the channel for a new series:

Nvidia CEO Jensen Huang told me in a Stratechery Interview last week that the company didn’t see this coming:

I don’t think we could have seen it. I don’t think I would’ve done anything different, but what I did learn from previous examples is that when it finally happens to you, just take the hard medicine and get it behind you…We’ve had two bad quarters and two bad quarters in the context of a company, it’s frustrating for all the investors, it’s difficult on all the employees.

We’ve been here before at Nvidia.

We just have to deal with it and not be overly emotional about it, realize how it happened, keep the company as agile as possible. But when the facts presented itself, we just made cold, hard decisions. We took care of our partners, we took care of our channel, we took care of making sure that everybody had plenty of time. By delaying Ada, we made sure that everybody had plenty of time for and we repriced all the products such that even in the context of Ada, even if Ada were available, the products that after it’s been repriced is actually a really good value. I think we took care of as many things as we could, it resulted in two fairly horrific quarters. But I think in the grand scheme of things, we’ll come right back so I think that was probably the lessons from the past.

This may be a bit generous; analysts like Tae Kim and Doug O’Laughlin forecast the stock plunge earlier this year, although that was probably already too late to avoid this perfect storm of slowing PC sales and Ethereum’s transition, given that Nvidia ordered all of those extra 3000-series GPUs in the middle of the pandemic (Huang also cited the increasing lead times for chips as a big reason why Nvidia got this so wrong).

What is more concerning for Nvidia, though, is that while its inventory and Ethereum issues are the biggest drivers of its “fairly horrific quarters”, they are not the only valley its gaming business is navigating. I’m reminded of John Bunyan’s Pilgrim’s Progress:

Now Christian had not gone far in this Valley of Humiliation before he was severely tested, for he noticed a very foul fiend coming over the field to meet him; his name was Apollyon [Destroyer].

Call Apollyon inventory issues; Christian defeated him, as Nvidia eventually will.

Now at the end of this valley there was another called the Valley of the Shadow of Death; and it was necessary for Christian to pass through it because the way to the Celestial City was in that direction. Now this valley was a very solitary and lonely place. The prophet Jeremiah describes it as, “A wilderness, a land of deserts and of pits, a land of drought and of the shadow of death, a land that no man” (except a Christian) “passes through, and where no man dwells.”

What was striking about Nvidia’s GTC keynote last week was the extent to which this allegory seems to fit Nvidia’s ambitions: the company is setting off on what appears to be a fairly solitary journey to define the future of gaming, and it’s not clear when or if the rest of the industry will come along. Moreover, the company is pursuing a similarly audacious strategy in the data center and with its metaverse ambitions as well: in all three cases the company is pursuing heights even greater than those achieved over the last two years, but the path is surprisingly uncertain.

Gaming in the Valley: Ray-Tracing and AI

The presentation of 3D games has long depended on a series of hacks, particularly in terms of lighting. First, a game determines what you actually see (i.e. there is no use rendering an object that is occluded by another); then the correct texture is applied to the object (i.e. a tree, or grass, or whatever else you might imagine). Finally light is applied based on the position of a pre-determined light source, with a shadow map on top of that. The complete scene is then translated into individual pixels and rendered onto your 2D screen; this process is known as rasterization.

Ray tracing handles light completely differently: instead of starting with a pre-determined light source and applying light and shadow maps, ray tracing starts with your eye (or more precisely, the camera through which you are viewing the scene). It then traces the line of sight to every pixel on the screen, bounces it off that pixel (based on what type of object it represents), and continues following that ray until it either hits a light source (and thus computes the lighting) or discards it. This produces phenomenally realistic lighting, particularly in terms of reflections and shadows. Look closely at these images from PC Magazine:

Let’s see how ray tracing can visually improve a game. I took the following screenshot pairs in Square Enix’s Shadow of the Tomb Raider for PC, which supports ray-traced shadows on Nvidia GeForce RTX graphics cards. Specifically, look at the shadows on the ground.

Rasterized shadows

Ray-traced shadows

[…]The ray-traced shadows are softer and more realistic compared with the harsher rasterized versions. Their darkness varies depending on how much light an object is blocking and even within the shadow itself, while rasterization seems to give every object a hard edge. The rasterized shadows still don’t look bad, but after playing the game with ray traced shadows, it’s tough to go back.
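
To make the ray-tracing loop described above concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how any shipping game or Nvidia's hardware does it: the scene (one sphere, a ground plane, a single point light), the constants, and every function name are made up for illustration, and each pixel gets only one camera ray plus one "shadow ray" toward the light, with no reflections.

```python
import math

# Illustrative scene; every name and number below is an assumption for this sketch.
CAMERA = (0.0, 1.0, -3.0)
LIGHT = (5.0, 5.0, -5.0)           # a single point light
SPHERE_CENTER = (0.0, 1.0, 0.0)    # unit sphere resting on the ground plane y = 0
SPHERE_RADIUS = 1.0
AMBIENT = 0.1                      # brightness of anything the light cannot reach


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def length(a):
    return math.sqrt(dot(a, a))


def normalize(a):
    return scale(a, 1.0 / length(a))


def hit_sphere(origin, direction):
    """Distance along a unit-length ray to the sphere, or None on a miss."""
    oc = sub(origin, SPHERE_CENTER)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - SPHERE_RADIUS ** 2
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-4 else None


def hit_ground(origin, direction):
    """Distance along the ray to the plane y = 0, or None on a miss."""
    if abs(direction[1]) < 1e-9:
        return None
    t = -origin[1] / direction[1]
    return t if t > 1e-4 else None


def trace(origin, direction):
    """Brightness in [0, 1] seen along one camera ray (direction must be unit length)."""
    hits = []
    t = hit_sphere(origin, direction)
    if t is not None:
        hits.append((t, "sphere"))
    t = hit_ground(origin, direction)
    if t is not None:
        hits.append((t, "ground"))
    if not hits:
        return 0.0  # the ray leaves the scene, so it is discarded
    t, surface = min(hits)  # nearest surface wins
    point = add(origin, scale(direction, t))
    normal = (0.0, 1.0, 0.0) if surface == "ground" else normalize(sub(point, SPHERE_CENTER))
    to_light = sub(LIGHT, point)
    light_dir = normalize(to_light)
    # Shadow ray: keep following the path toward the light; a hit on the way means shadow.
    blocker = hit_sphere(point, light_dir)
    if blocker is not None and blocker < length(to_light):
        return AMBIENT
    return max(AMBIENT, dot(normal, light_dir))


def render(width=40, height=20):
    """Cast one ray per pixel and return the image as rows of ASCII shades."""
    ramp = " .:-=+*#%@"
    rows = []
    for j in range(height):
        row = ""
        for i in range(width):
            # Image plane one unit in front of the camera, spanning [-1, 1] in x and y.
            x = (i + 0.5) / width * 2.0 - 1.0
            y = 1.0 - (j + 0.5) / height * 2.0
            shade = trace(CAMERA, normalize((x, y, 1.0)))
            row += ramp[min(len(ramp) - 1, int(shade * len(ramp)))]
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

Nothing in the sketch stores a shadow map: the shadow on the ground falls out of the same ray test used to find the sphere. The price is one or more intersection tests per pixel per frame, which is why the technique is so computationally expensive.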

Nvidia first announced API support for ray tracing back in 2009; however, few if any games used it because it is so computationally expensive (ray-tracing is used in movie CGI; however, those scenes can be rendered over hours or even days; games have to be rendered in real-time). That is why Nvidia introduced dedicated ray tracing hardware in its GeForce 2000-series line of cards (which were thus christened “RTX”) which came out in 2018. AMD went a different direction, adding ray-tracing capabilities to its core shader units (which also handle rasterization); this is slower than Nvidia’s pure hardware solution, but it works, and, importantly, since AMD makes the graphics chips for the PS5 and Xbox, it means that ray tracing support is now industry-wide. More and more games will support ray tracing going forward, although most applications are still fairly limited because of performance concerns.

Here’s the important thing about ray tracing, though: by virtue of calculating light dynamically, instead of via light and shadow maps, it is something developers can get “for free.” A game or 3D environment that depended completely on ray tracing should be easier and cheaper to develop; more importantly, it means that environments could change in dynamic ways that the developer never anticipated, all while having more realistic lighting than the most labored-over pre-drawn environment.

This is particularly compelling in two emerging contexts: the first is in simulation games like Minecraft. With ray tracing it will be increasingly realistic to have highly detailed 3D worlds that are constructed on the fly and lit perfectly. Future games could go further: the keynote opened with a game called Racer RTX where every single part of the game was fully simulated, including objects; the same sort of calculations for light were used for in-game physics as well.

The second context is a future of AI-generated content I discussed in DALL-E, the Metaverse, and Zero Marginal Cost Content. All of those textures I noted above are currently drawn by hand; as graphical capabilities — largely driven by Nvidia — have increased, so has the cost of creating new games, thanks to the need to create high resolution assets. One can imagine a future where asset creation is fully automated and done on the fly, and then lit appropriately via ray tracing.

For now, though, Nvidia is already using AI to render images: the company also announced version 3 of its Deep Learning Super Sampling (DLSS) technology, which predicts and pre-renders frames, meaning they don’t have to be computed at all (previous versions of DLSS predicted and pre-rendered individual pixels). Moreover, Nvidia is, as with ray-tracing, backing up DLSS with dedicated hardware to make it much more performant. These new approaches, matched with dedicated cores on Nvidia’s GPUs, make Nvidia very well-placed for an entirely new paradigm in not just gaming but immersive 3D experiences generally (like a metaverse).

Here’s the problem, though: all of that dedicated hardware comes at a cost. Nvidia’s new GPUs are big chips — the top-of-the-line AD102, sold as the RTX 4090, is a fully integrated system-on-a-chip that measures 608.4mm² on TSMC’s N4 process;¹ the top-of-the-line Navi 31 chip in AMD’s upcoming RDNA 3 graphics line, in comparison, is a chiplet design with a 308mm² graphics chip on TSMC’s N5 process,² plus six 37.5mm² memory chips on TSMC’s N6 process.³ In short, Nvidia’s chip is much larger (which means much more expensive), and it’s on a slightly more modern process (which likely costs more). Dylan Patel explains the implications at SemiAnalysis:

In short, AMD saves a lot on die costs by forgoing AI and ray tracing fixed function accelerators and moving to smaller dies with advanced packaging. The advanced packaging cost is up significantly with AMD’s RDNA 3 N31 and N32 GPUs, but the small fan-out RDL packages are still very cheap relative to wafer and yield costs. Ultimately, AMD’s increased packaging costs are dwarfed by the savings they get from disaggregating memory controllers/infinity cache, utilizing cheaper N6 instead of N5, and higher yields…Nvidia likely has a worse cost structure in traditional rasterization gaming performance for the first time in nearly a decade.
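
The die sizes quoted above can be turned into a rough back-of-the-envelope comparison. The sketch below is an illustration only: the 300mm wafer diameter and the gross-dies-per-wafer formula are assumptions brought in from common industry rules of thumb, not figures from Nvidia, AMD, TSMC, or SemiAnalysis. The estimate ignores scribe lines, edge exclusion, defect yield, wafer prices, and packaging, so it says nothing definitive about actual cost.

```python
import math

# Die areas quoted in the text above (mm^2).
AD102_MM2 = 608.4          # Nvidia, one monolithic die on N4
NAVI31_GCD_MM2 = 308.0     # AMD graphics die on N5
NAVI31_MCD_MM2 = 37.5      # AMD memory die on N6
NAVI31_MCD_COUNT = 6

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300mm wafer


def navi31_total_mm2():
    """Total silicon in one Navi 31 package: one graphics die plus six memory dies."""
    return NAVI31_GCD_MM2 + NAVI31_MCD_COUNT * NAVI31_MCD_MM2


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """First-order estimate: wafer area over die area, minus a term for edge loss."""
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)


if __name__ == "__main__":
    print("Navi 31 total silicon (mm^2):", navi31_total_mm2())
    print("AD102 / Navi 31 area ratio:", round(AD102_MM2 / navi31_total_mm2(), 2))
    for name, area in [("AD102", AD102_MM2),
                       ("Navi 31 graphics die", NAVI31_GCD_MM2),
                       ("Navi 31 memory die", NAVI31_MCD_MM2)]:
        print(name, "gross dies per wafer:", gross_dies_per_wafer(area))
```

Under these assumptions a wafer holds roughly twice as many Navi 31 graphics dies as AD102 dies, and the small memory dies sit on the cheaper N6 process. Smaller dies also tend to yield better, since a single defect ruins less silicon. That is the direction of the cost argument in the quote above, even though the real numbers depend on data this sketch does not have.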

This is the valley that Nvidia is entering. Gamers were immediately up in arms after Nvidia’s keynote because of the 4000-series’ high prices, particularly when the fine print on Nvidia’s website revealed that one of the tier-two chips Nvidia announced was much more akin to a rebranded tier-three chip, with the suspicion being that Nvidia was playing marketing games to obscure a major price increase. Nvidia’s cards may have the best performance, and are without question the best placed for a future of ray tracing and AI-generated content, but that comes at the cost of being the best value for games as they are played today. Reaching the heights of purely simulated virtual worlds requires making it through a generation of charging for capabilities that most gamers don’t yet care about.

AI in the Valley: Systems, not Chips

One reason to be optimistic about Nvidia’s approach in gaming is that the company made a similar bet on the future when it invented shaders; I explained shaders after last year’s GTC in a Daily Update:

Nvidia first came to prominence with the Riva and TNT line of video cards that were hard-coded to accelerate 3D libraries like Microsoft’s Direct3D:

The GeForce line, though, was fully programmable via a type of computer program called a “shader” (I explained more about shaders in this Daily Update). This meant that a GeForce card could be improved even after it was manufactured, simply by programming new shaders (perhaps to support a new version of Direct3D, for example).

[…]More importantly, shaders didn’t necessarily need to render graphics; any sort of software — ideally programs with simple calculations that could be run in parallel — could be programmed as shaders; the trick was figuring out how to write them, which is where CUDA came in. I explained in 2020’s Nvidia’s Integration Dreams:

This increased level of abstraction meant the underlying graphics processing unit could be much simpler, which meant that a graphics chip could have many more of them. The most advanced versions of Nvidia’s just-announced GeForce RTX 30 Series, for example, has an incredible 10,496 cores.

This level of scalability makes sense for video cards because graphics processing is embarrassingly parallel: a screen can be divided up into an arbitrary number of sections, and each section computed individually, all at the same time. This means that performance scales horizontally, which is to say that every additional core increases performance. It turns out, though, that graphics are not the only embarrassingly parallel problem in computing…

This is why Nvidia transformed itself from a modular component maker to an integrated maker of hardware and software; the former were its video cards, and the latter was a platform called CUDA. The CUDA platform allows programmers to access the parallel processing power of Nvidia’s video cards via a wide number of languages, without needing to understand how to program graphics.

Now the Nvidia “stack” had three levels:

The important thing to understand about CUDA, though, is that it didn’t simply enable external programmers to write programs for Nvidia chips; it enabled Nvidia itself.
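
The "embarrassingly parallel" point in the excerpt above can be shown with a small sketch: split a screen into horizontal bands, compute each band on its own, and stitch the results back together. Everything here is hypothetical, including the screen size, the `shade_pixel` function, and the choice of a thread pool. Python threads will not actually make this faster; the point is only that no band needs anything from any other band, which is the property that lets a GPU spread the same work across thousands of cores.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 64, 48  # illustrative screen size


def shade_pixel(x, y):
    """Hypothetical per-pixel work: depends only on the pixel's own coordinates."""
    return (x * x + y * y) % 256


def render_rows(row_range):
    """Render one horizontal band of the screen, independent of every other band."""
    start, stop = row_range
    return [[shade_pixel(x, y) for x in range(WIDTH)] for y in range(start, stop)]


def split_rows(height, sections):
    """Divide `height` rows into `sections` contiguous bands of near-equal size."""
    bounds = [height * i // sections for i in range(sections + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def render(sections):
    """Render the full screen using `sections` independent workers."""
    with ThreadPoolExecutor(max_workers=sections) as pool:
        # map() returns results in submission order, so the bands stay in place.
        bands = pool.map(render_rows, split_rows(HEIGHT, sections))
        return [row for band in bands for row in band]


if __name__ == "__main__":
    # The image is identical no matter how many sections the screen is split into.
    print(render(1) == render(8))
```

Because the output does not depend on the number of sections, adding workers changes only how the work is divided, never the result. That is what "performance scales horizontally" means in practice.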

Much of this happened out of desperation; Huang explained in a Stratechery interview last spring that introducing shaders, which he saw as essential for the future, almost killed the company:

The disadvantage of programmability is that it’s less efficient. As I mentioned before, a fixed function thing is just more efficient. Anything that’s programmable, anything that could do more than one thing just by definition carries a burden that is not necessary for any particular one task, and so the question is “When do we do it?” Well, there was also an inspiration at the time that everything looks like OpenGL Flight Simulator. Everything was blurry textures and trilinear mipmapped, and there was no life to anything, and we felt that if you didn’t bring life to the medium and you didn’t allow the artist to be able to create different games and different genres and tell different stories, eventually the medium would cease to exist. We were driven by simultaneously this ambition of wanting to create a more programmable palette so that the game and the artist could do something great with it. At the same time, we also were driven to not go out of business someday because it would be commoditized. So somewhere in that kind of soup, we created programmable shaders, so I think the motivation to do it was very clear. The punishment afterwards was what we didn’t expect.

What was that?

Well, the punishment is all of a sudden, all the things that we expected about programmability and the overhead of unnecessary functionality because the current games don’t need it, you created something for the future, which means that the current applications don’t benefit. Until you have new applications, your chip is just too expensive and the market is competitive.

Nvidia survived because its ability to do direct acceleration was still the best; it thrived in the long run because it took it upon itself to build the entire CUDA infrastructure to leverage shaders. This is where that data center growth comes from; Huang explained:

On the day that you become processor company, you have to internalize that this processor architecture is brand new. There’s never been a programmable pixel shader or a programmable GPU processor and a programming model like that before, and so we internalize. You have to internalize that this is a brand new programming model and everything that’s associated with being a program processor company or a computing platform company had to be created. So we had to create a compiler team, we have to think about SDKs, we have to think about libraries, we had to reach out to developers and evangelize our architecture and help people realize the benefits of it, and if not, even come close to practically doing it ourselves by creating new libraries that make it easy for them to port their application onto our libraries and get to see the benefits of it.

The first reason to recount this story is to note the parallels between the cost of shader complexity and the cost of ray tracing and AI in terms of current games; the second is to note that Nvidia’s approach to problem-solving has always been to do everything itself. Back then that meant developing CUDA for programming those shaders; today it means building out entire systems for AI.

Huang said during last week’s keynote:

Nvidia is dedicated to advancing science and industry with accelerated computing. The days of no-work performance scaling are over. Unaccelerated software will no longer enjoy performance scaling without a disproportionate increase in costs. With nearly three decades of a singular focus, Nvidia is expert at accelerating software and scaling computer by a 1,000,000x, going well beyond Moore’s Law.

Accelerated computing is a full-stack challenge. It demands deep understanding of the problem domain, optimizing across every layer of computing, and all three chips: CPU, GPU, and DPU. Scaling across multi-GPUs on multi-nodes is a datacenter-scale challenge, and requires treating the network and storage as part of the computing fabric, and developers and customers want to run software in many places, from PCs to super-computing centers, enterprise data centers, cloud, to edge. Different applications want to run in different locations, and in different ways.

Today, we’re going to talk about accelerated computing across the stack. New chips and how they will boost performance, far beyond the number of transistors, new libraries, and how it accelerates critical workloads to science and industry, new domain-specific frameworks, to help develop performant and easily deployable software. And new platforms, to let you deploy your software securely, safely, and with order-of-magnitude gains.

In Huang’s view, simply having fast chips is no longer enough for the workloads of the future: that is why Nvidia is building out entire data centers using all of its own equipment. Here again, though, a future where every company needs accelerated computing generally, and Nvidia to build it for them specifically — Nvidia’s Celestial City — is in contrast to the present where the biggest users of Nvidia chips in the data center are hyperscalers who have their own systems already in place.

A company like Meta, for example, doesn’t need Nvidia’s networking; they invented their own. What they do need are a lot of massively parallel chips to train their machine learning algorithms on, which means they have to pay Nvidia and its high margins. Small wonder that Meta, like Google before them, is building its own chip.

This is the course that all of the biggest companies will likely follow: they don’t need an Nvidia system, they need a chip that works in their system for their needs. That is why Nvidia is so invested in the democratization of AI and accelerated computing: the long term key to scale will be in building systems for everyone but the largest players. The trick to making it through the valley will be in seeing that ecosystem develop before Nvidia’s current big customers stop buying Nvidia’s expensive chips. Huang once saw that 3D accelerators would be commoditized and took a leap with shaders; one gets the sense he has the same fear with chips and is thus leaping into systems.

Metaverse in the Valley: Omniverse Nucleus

In the interview last spring I asked Huang if Nvidia would ever build a cloud service:

If we ever do services, we will run it all over the world on the GPUs that are in everybody’s clouds, in addition to building something ourselves, if we have to. One of the rules of our company is to not squander the resources of our company to do something that already exists. If something already exists, for example, an x86 CPU, we’ll just use it. If something already exists, we’ll partner with them, because let’s not squander our rare resources on that. And so if something already exists in the cloud, we just absolutely use that or let them do it, which is even better. However, if there’s something that makes sense for us to do and it doesn’t make for them to do, we even approach them to do it, other people don’t want to do it then we might decide to do it. We try to be very selective about the things that we do, we’re quite determined not to do things that other people do.

It turns out there was something no one else wanted to do, and that was create a universal database for 3D objects for use in what Nvidia is calling the Omniverse. These objects could be highly detailed millimeter-precise objects for use in manufacturing or supply chains, or they could be fantastical objects and buildings generated for virtual worlds; in Huang’s vision they would be available to anyone building on Omniverse Nucleus.

Here the Celestial City is a world of 3D experiences used across industry and entertainment — an Omniverse of metaverses, if you will, all connected to Nvidia’s cloud — and it’s ambitious enough to make Mark Zuckerberg blush! This valley, by the same token, seems even longer and darker: not only do all of these assets and 3D experiences need to be created, but entire markets need to be convinced of their utility and necessity. Building a cloud for a world that doesn’t yet exist is to reach for heights still out of sight.

There certainly is no questioning Huang and Nvidia’s ambition, although some may quibble with the wisdom of navigating three valleys all at once; it’s perhaps appropriate that the stock is in a valley itself, above and beyond that perfect storm in gaming.

What is worth considering, though, is that the number one reason why Nvidia customers — both in the consumer market and the enterprise one — get frustrated with the company is price: Nvidia GPUs are expensive, and the company’s margins — other than the last couple of quarters — are very high. Pricing power in Nvidia’s case, though, is directly downstream from Nvidia’s own innovations, both in terms of sheer performance in established workloads, and also in its investment in the CUDA ecosystem creating the tools for entirely new ones.

In other words, Nvidia has earned the right to be hated by taking the exact sort of risks in the past it is embarking on now. Suppose, for example, the expectation for all games in the future is not just ray tracing but full-on simulation of all particles: Nvidia’s investment in hardware will mean it dominates the era just as it did the rasterized one. Similarly, if AI applications become democratized and accessible to all enterprises, not just the hyperscalers, then it is Nvidia who will be positioned to pick up the entirety of the long tail. And, if we get to a world of metaverses, then Nvidia’s head start on not just infrastructure but on the essential library of objects necessary to make that world real (objects that will be lit by ray-tracing in AI-generated spaces, of course), will make it the most essential infrastructure in the space.

These bets may not all pay off; I do, though, appreciate the audacity of the vision, and won’t begrudge the future margins that may result in the Celestial City if Nvidia makes it through the valley.