OpenAI Reportedly Uses Google Data to Power ChatGPT
As ChatGPT becomes a go-to tool for information, new reports indicate that its developer, OpenAI, is using data from Google Search to generate answers for its users.
Bridging the Information Gap
According to a detailed report from The Information, OpenAI is specifically using Google's search index to address topics where its own proprietary tools have limitations. These areas primarily include fast-moving subjects like current news, sports results, and financial market data. By tapping into Google's vast and constantly updated index, ChatGPT can provide more timely and accurate responses for these real-time inquiries.
This development highlights a fascinating dynamic, as Google Search is one of ChatGPT's most significant competitors. The rise of AI chatbots has pushed Google to accelerate the development of its own AI-powered search summaries, which has in turn impacted web traffic for many online publishers.
The Technology Behind the Scrape
The report also sheds light on the method OpenAI allegedly uses to access this data. It appears OpenAI retrieves the information through SerpApi, a third-party web-scraping company. According to the SerpApi website, the firm specializes in extracting search engine results data, which is highly valuable for training large machine learning models.
An Engineer's Clever Experiment
To add weight to these claims, former Google engineer Abhishek Iyer ran an experiment. As detailed in a post on X, Iyer invented a unique, nonsensical word. He then placed this word on a hidden webpage and ensured it was indexed only by Google Search, keeping it out of other search engines like Bing. Shortly after the term was indexed, he prompted ChatGPT and found it could define the fake word using the exact phrasing from his hidden page, strongly suggesting the information was sourced from Google's index.
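For readers who want to see the shape of such a test, here is a minimal Python sketch of the general "canary word" idea. It is not Iyer's actual setup, which the post does not spell out. The function names are hypothetical, and using robots.txt to admit only Googlebot is an assumption about how a page might be kept out of other search engines; robots.txt is advisory, so crawlers that ignore it could still fetch the page.

```python
import secrets
import string


def make_canary_word(length: int = 12) -> str:
    """Return a random lowercase string that is unlikely to exist anywhere on the web."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def google_only_robots_txt() -> str:
    """Build a robots.txt that admits Googlebot and asks every other crawler to stay out.

    Assumption: this is one plausible way to limit indexing to Google,
    not necessarily the method used in the experiment.
    """
    return (
        "User-agent: Googlebot\n"
        "Allow: /\n"
        "\n"
        "User-agent: *\n"
        "Disallow: /\n"
    )


def echoes_page(response: str, page_definition: str, min_words: int = 5) -> bool:
    """Return True if the response repeats any run of `min_words` consecutive
    words from the page's definition (case-insensitive, whitespace-normalized)."""
    words = page_definition.lower().split()
    text = " ".join(response.lower().split())
    for i in range(len(words) - min_words + 1):
        if " ".join(words[i:i + min_words]) in text:
            return True
    return False
```

A chatbot answer that reuses a five-word run from the hidden page would make `echoes_page` return `True`. On its own that shows only that the page's text reached the model; tying it to one search engine depends on the page really having been indexed nowhere else.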
This evidence contradicts OpenAI's previous public statements, in which the company maintained that its search capabilities are fueled by its own in-house web crawler and by data from partnered publishers. Google was not known to be one of these partners, which makes the reported use of its search index notable, though it remains unconfirmed.