What are the tipping points for an AI boom?
Some are clear in hindsight.
The open-source release of Stable Diffusion, still one of the most impressive image generators out there, was the beginning of the end for the closed-access model that had dominated the AI world until then. It arrived when the image generator Dall-E 2 was still limited to a handful of people who had been vetted by OpenAI, and offered an alternative: powerful image creation for anyone who wanted it.
That prompted the next tipping point: the launch of ChatGPT, the Ford Model T of AI. It was open-access, easy to use and powerfully capable, and its appearance captured imaginations and propelled the technology to the peak of the hype cycle.
Now, just a few months later, we’re seeing the arrival of a third, as AI systems shift from being a standalone service to something deeply integrated with the tools and apps we already use to work and live.
Copilot (and Google)
Last Tuesday, Google announced a swathe of AI tools for its productivity suite. Eventually, users will be able to use the company’s large language model (LLM) to generate text directly in Gmail or Google Docs; generate images, audio and video in Slides; and ask complex natural language questions to manipulate data in Google Sheets.
The company was evasive on when these features would roll out, saying only that it plans to bring them to “trusted testers on a rolling basis throughout the year, before making them available publicly”. In true Google style, the company seemed more concerned with showing off its undeniable ability than with actually shipping products.
But never discount light corporate espionage as a motive. Just a couple of days later, the motivation for announcing the features became clear when Microsoft held a launch event for its new Copilot feature for Microsoft 365 (still better known as MS Office, a brand that was technically retired at the beginning of this year). From the Verge:
The Copilot, powered by GPT-4 from OpenAI, will sit alongside Microsoft 365 apps much like an assistant (remember Clippy?), appearing in the sidebar as a chatbot that allows Office users to summon it to generate text in documents, create PowerPoint presentations based on Word documents, or even help use features like PivotTables in Excel.
The features Microsoft demonstrated on Thursday are wildly impressive. You can join a Teams video chat and ask not only for a brief summary of what was discussed so far, but even for a sense of how a specific proposal was received by the other members of the call. Copilot can not only draft an email inviting people to a birthday party, it can also include a request for them to reply with anecdotes to use in a speech, then automatically pull out the three best stories from those replies, edit them for length and throw them directly into your notes for the talk itself.
Microsoft says that Copilot isn’t just a version of GPT-4 awkwardly stuck on to Office. The company says it is closely integrated with the raw data that lies behind everything you do, and can be much more precise as a result.
But I think that matters less than the simple presence of an AI system built into the corporate behemoth that is Office. Once these features roll out – and when Google flicks the switch in its own web apps – millions of people around the world will have the ability to pull a powerful AI in as a co-worker, without having to convince management to sign off on it, without having to experiment with and trust a new provider, and without anyone consciously deciding to “pivot to AI”.
Adobe
Microsoft was just the beginning. Today, Adobe announced a similar overhaul of its own products, bringing AI image generation to its Creative Cloud (best known for Photoshop). The new service, called Firefly, is in part a similar move to Microsoft’s, bringing AI-powered generation into the processes and workflows that the company’s customers are already used to.
That means users will be able to spin up Firefly to generate whole new imagery, like other image generators such as Midjourney and Stable Diffusion, or to create text effects for lettering. The company is also planning to introduce AI-powered video editing (“make this scene look as if it was filmed in winter”), 3D modelling and digital image manipulation.
Adobe’s been one of the leading commercial providers of AI-powered tools for some time now. Photoshop’s “content-aware fill”, which used proto-AI techniques to seamlessly fill in removed objects and gaps in edited pictures, was a landmark in image editing when it was released more than a decade ago.
But the company’s offering this time is more than just building the same AI generation into its own software. A core plank of Firefly is that the company is offering “safe” generation: its generative model is, it says, “trained on Adobe Stock images, openly licensed content, and public domain content where copyright has expired”. In other words, if you work with Firefly-created images, you know for certain that there is no nasty copyright lawsuit coming down the line.
That stands in stark contrast to GPT-4, which is trained on … well, no one actually knows. (In a very telling interview, OpenAI’s chief scientist “did not reply when asked if OpenAI could state definitively that its training data does not include pirated material”. In surely unrelated news, one of the largest LLM training datasets, an 800GB collection of text called the Pile, includes 196,640 books downloaded from a popular BitTorrent site called Bibliotik. The copyright notice on the site that hosts the Pile is a video of a choir of women pretending to masturbate.)
Adobe’s plans to distinguish itself here go further still. In 2019, the company founded the Content Authenticity Initiative, which aims to fight misinformation by building a standard for images and other media to embed proof of their provenance. Now, it’s expanding that by introducing a “do not train” tag to images, allowing creators to ensure that their media doesn’t get incorporated into future models. It’s not as strong as some critics would like – an opt-out system will always catch more people unawares than an opt-in one – but it’s a clear push for respectability.
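To make the idea concrete, here is a minimal sketch of how a machine-readable opt-out flag could travel with an image file, written in Python with the Pillow library. The real Content Authenticity Initiative work is built around cryptographically signed provenance credentials rather than a bare metadata field, and the “do-not-train” key and field names below are purely hypothetical, but the sketch shows the basic shape: the creator attaches the flag, and a well-behaved scraper checks for it before adding the image to a training dataset.

```python
# Illustrative sketch only: stores a hypothetical "do-not-train" flag as a PNG
# text chunk. The real CAI/Content Credentials scheme embeds signed provenance
# data rather than a plain key-value pair like this.
from PIL import Image
from PIL.PngImagePlugin import PngInfo


def tag_do_not_train(src: str, dst: str) -> None:
    """Save a copy of a PNG image carrying a (hypothetical) do-not-train marker."""
    meta = PngInfo()
    meta.add_text("do-not-train", "true")       # hypothetical opt-out key
    meta.add_text("creator", "Example Artist")  # illustrative provenance field
    with Image.open(src) as img:
        img.save(dst, pnginfo=meta)


def is_opted_out(path: str) -> bool:
    """What a well-behaved scraper might check before ingesting a PNG image."""
    with Image.open(path) as img:
        return img.text.get("do-not-train", "").lower() == "true"


if __name__ == "__main__":
    tag_do_not_train("artwork.png", "artwork_tagged.png")
    print(is_opted_out("artwork_tagged.png"))  # True
```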
Once more to Google
Just an hour after Adobe’s announcement (and the reason why today’s newsletter is published a little later than usual), Google unveiled the project that could kill the goose that lays the golden eggs – or save it.
Bard, Google’s ChatGPT-esque conversational AI that was announced earlier this year (again, just days before Microsoft announced-and-shipped its own Bing Chat), is now real, with the company rolling out access to users through a waitlist.
Compared with the competition, there’s nothing immediately stunning in what Bard can do. But a few features distinguish it: the ability to automatically generate multiple drafts of a longer response so you can pick the one you prefer; the distinction between simple factual answers, which arrive with footnotes to their sources, and longer generative ones, which don’t; and the ability to automatically turn your query into a Google search.
But it will take time to uncover where Bard excels, thanks to a phenomenon known in the industry as “capability overhang”: we typically find out what AI models can do in the weeks and months after they’re released, as simple queries give way to more elaborate and practised commands. For now, and with only a quick live demo to go on, it seems to be roughly on a par with the competition, although it opened its first reply with an error: responding to a request for a list of child-friendly activities in Tokyo, it failed to mention that the market it suggested had substantially relocated since 2018.
Equally unclear, and more existential, is whether Bard can coexist with Google Search. The company wouldn’t answer questions about how much a Bard query costs to run, instead talking up the efficiency improvements it has made, but a ballpark figure of 10 to 100 times the cost of a single Google search is a safe bet. Bard, however, doesn’t show users any adverts (yet), so it’s not clear how it could earn even a fraction of the revenue of a normal search results page.
The company insists that some queries will be better answered through a search (including my own question about Japan), and if Bard serves as an expensive traffic-acquisition strategy to stave off a potential flood of users to Bing and ChatGPT, that could be good enough for the short term. But it feels like a holding strategy where Google should be shooting for the stars.