Former Google and Apple Researchers in Talks to Raise $50M for New Visual AI Startup

Deep News · 01-11 17:12

Veteran AI researcher Andrew Dai has left Google DeepMind after a 14-year tenure to found a startup building artificial intelligence models that can understand and process text, images, video, and audio simultaneously. According to Dai and another person familiar with the matter, the new company, named Elorian, is negotiating a seed round of approximately $50 million. The source said Striker Venture Partners is in talks to lead the round; the firm was founded last October by Max Gazor, formerly a general partner at venture capital firm CRV.

The source also identified Yinfei Yang as a co-founder of Elorian. Yang previously worked as a research scientist on Apple's AI models and left the company in December of last year. Both Dai and Yang have updated their LinkedIn profiles to show employment at a company in "stealth mode," with Dai listed as Chief Executive Officer.

In a phone interview on Saturday, Dai said Elorian's core business is developing AI models that can interpret and analyze the physical world at a visual level by processing image, video, and audio data together. He cited robotics as one potential application and said the startup has several other directions planned, though he declined to elaborate. Yang did not respond to requests for comment.

Early AI models from companies such as OpenAI were trained solely on text, but the industry has shifted in recent years toward models trained on image and video data as well. This research area, known as visual reasoning, has become a key focus for major AI companies and startups alike, including Google, OpenAI, and Anthropic. Amazon also launched a similar model last month at its annual cloud computing conference.

Visual reasoning models are aimed at complex applications such as robotic systems. Because they integrate multiple modalities in a single system, they can spare developers from having to stitch together separate AI models. Some researchers note that the technology is particularly valuable for AI agents, which must interpret visual information such as screenshots to complete complex tasks like processing retail product returns or reviewing legal documents.

According to Dai's LinkedIn profile, he co-led pre-training data work for the Gemini family of models during his time at Google DeepMind; pre-training is a core stage in building those models. He also co-authored research papers with several prominent Google researchers, including Quoc V. Le and Jeff Dean, Chief Scientist at Google DeepMind and Google Research.

The person familiar with the matter described Dai as a pioneer in the field of language models who has worked on pre-training research for the past two decades. His research, the source added, has largely focused on two areas: developing techniques to assess the quality of AI training data, and ensuring that a model's training data is drawn from multiple, diverse sources.

