SAN JOSE, Calif. -- Nvidia Chief Executive Jensen Huang ushered in the Age of Inference at the company's annual GTC conference Monday, outlining a sweeping array of new hardware and software products geared toward running AI models more quickly and efficiently.
In front of a crowd of more than 30,000 at the SAP Center, home of the San Jose Sharks hockey team, Huang unveiled Nvidia's new flagship product, which he said would revolutionize inference, the form of AI computing that allows models to respond to user queries.
For years, Nvidia has dominated the business of selling graphics processing units, or GPUs, the powerful chips used to train most large AI models. But over the last year, as AI companies have moved quickly to monetize their models and the AI tools built on top of them, customers have asked for chips customized for inference computing rather than training.
Known as the Nvidia Groq 3 LPX rack, the new system will combine 72 of Nvidia's next-generation Vera Rubin chips with 256 units of a new chip called an LPU, or language processing unit, developed by Groq, a startup whose top leadership team Nvidia acquired in a $20 billion technology-licensing deal in December.
"This is the AI future. This is where AI wants to go," Huang said. "It's designed for inference, this one workload. And this workload is what drives AI factories."
Nvidia said the new system can generate 700 million tokens -- the basic units of data that AI models process and generate -- per second, a rate 350 times that of Nvidia's second-to-last generation of GPUs, known as Hopper.
Huang has been signaling for most of the last year that Nvidia would increasingly focus on inference computing. The company's traditional GPUs have typically not been regarded as ideal for inference because they consume huge amounts of energy and don't come with enough attached memory to let models quickly draw on the vast stores of information absorbed during training.
The combined Vera Rubin and Groq servers will have 500 times as much high-bandwidth memory as the Hopper generation, helping ease that memory bottleneck.
"The inference inflection has arrived," Huang said in his keynote speech. "This is the secret sauce."
Huang said Nvidia expected to sell $1 trillion worth of Blackwell and Rubin chips by the end of 2027, up from earlier guidance of $500 billion worth by the end of 2026.
Huang used the speech to announce a host of partnerships aimed at bolstering Nvidia's business in designing "digital twins" and other types of simulations. The company also announced a coalition of software companies, including Cursor, Mistral, Perplexity, Reflection and Thinking Machines, aimed at making it easier to develop frontier open-source AI models.
The coalition's work would put the development of enterprise software tools into hyperdrive, Huang said, helping speed the transformation of the world's software-as-a-service industry into an agentic-AI-as-a-service industry.
As Huang spoke, Nscale, a U.K.-based cloud-computing startup backed by Nvidia, announced that it would build a 1.35-gigawatt data center cluster in West Virginia using the new Vera Rubin servers. The company described the project, known as the Monarch Compute Campus, as one of the largest AI computing installations in the world.
Nvidia also announced an expansion of its autonomous driving business, including four new partners for Nvidia's robotaxi computing system -- China's BYD and Geely Auto, South Korea's Hyundai and Japan's Nissan. Using Nvidia's chips and simulation models, the automakers are expected to significantly increase the number of autonomous ride-share vehicles on the road, Huang said.
Toward the end of the presentation, a robotic version of Olaf, the snowman from Disney's "Frozen" animated franchise, developed through a partnership among Nvidia, DeepMind and Disney, ambled onstage and had a stilted conversation with Huang about the company's Omniverse division, which develops physical AI products for machines like robots.
"You learned how to walk inside Omniverse," Huang told the robot.
"I really love walking!" it replied enthusiastically.
"Can you imagine this?" Huang asked, before leaving the stage. "The future of Disneyland: all these robots, all these characters wandering around."
