A few frameworks to think about the models that will change the future and the companies that will be built on and around them.
I’ve been thinking about AI a lot in the last few years. In particular, since 2020 when it became very obvious how radically AI is going to change the world with the release of GPT-3.
Here are a few frameworks I have been using to think about the models that will change the future and the companies that will be built on and around them. Current and future AI models fall into three categories, roughly ordered by how new the technologies behind them are:
I will expand on what I mean by Classic, Generative, and Active AI and what new companies I believe will be relevant to each one. I’ll particularly focus on the middle one - Generative AI - since that is the one getting the most hype currently.
Classic AI refers to the stuff that we’ve been able to do well for a while now: ML models whose output is constrained to some set of labels, such as models used for image classification, spam recognition, and image segmentation.
New companies in this area are in niches where either:
A good example of the former is Spoor.ai. They bring an ornithologist to every wind farm using AI. Their models ingest video feeds to monitor bird populations and identify the number, and species of, birds in the area over time.
There are lots of other corners of the world where it would be interesting to do real-time monitoring like this (e.g. identifying poachers in nature reserves), but using human labor has been prohibitively expensive.
Examples of the latter include AI diagnostics for radiology. Radiologists' time is very expensive so using AI to make the decision process simpler and faster is extremely attractive.
Systems that can generate content, from text to images and 3D models. This is the natural next step from Classic AI, as the constraint of having a finite set of labels is removed. (Technically the output is still constrained to a set of tokens or pixel values but, in practice, it feels infinite since the combinatorics make the output space so vast).
There are three distinct types of use cases that are starting to crop up here:
Jasper.ai and Stability.ai are two high-profile examples using large models to generate text and images respectively. The focus of businesses here is on the output of the model - the input is only relevant insofar as it gets you the output you want. This is where prompt engineering is important and the UI is designed to enable creative exploration. The tool I am writing this essay in, lex.page, is another example, as is Github Copilot for code generation.
The tools for creation are almost always being used directly by humans and the output, thus, has a human quality check built-in.
The main difference to Classic AI is that by using Generative models, you have much fewer constraints than e.g. a labeling task. Examples here might include Supernormal, which takes full transcripts from meetings and maps them to useful meeting notes, action items, and decisions.
The core premise is that it is important for the output to accurately reflect the input - nothing important should be lost nor added.
Other examples in this category might include models for automatically writing product descriptions from product photos, intelligently summarizing the content of legal documents, explaining code, etc.
The main difference here is that humans are not necessarily part of the loop anymore - the transformation needs to be so accurate that the users can trust it enough to only verify correctness.
These models will appear on a spectrum of complexity and risk. On the simple, low-risk, side of the spectrum are models that can generate content for mostly entertainment value, such as ChatGPT and Replika.ai.
“Correctness” here doesn’t matter too much, although hallucinations and bias from the model may lead to the spread of false information and offense.
On the more complex, high-risk side, we have what I call “Expert AI”. These could be models that are trained to be tax experts, lawyers, and doctors. They can answer any question you throw at them better than the most knowledgeable accountant, lawyer, or doctor in the world.
But it may take some time before we get models like this - there is currently no reliable way to ensure that a model will only output answers from within a certain corpus, without “hallucinations” and without “prompt injection” issues.
There are already examples in the middle of the risk spectrum: the many apps with chatbots that provide advice on Cognitive Behavioral Therapy. But, they generally have heavy content filtering to avoid dealing with too complex and sensitive situations (suicidal thoughts, clinical depression, eating disorders, etc).
Active AI systems take actions in the world. This may look similar to Generative AI, but the main difference is needing ongoing feedback on the effects of its actions — i.e. these systems have a component that is a reinforcement learning problem.
Some Active AI systems are already being created using language models, providing feedback successively via the input/prompt. For example, GPT-3 can be prompted to use an iPython kernel to answer questions. Similarly, Adept.ai is working on ACT-1, a transformer model for turning a text description of a task into a set of computer actions to execute that task.
New architectures are likely needed to get further in this area - e.g. for tasks that involve many steps (booking flights) or that require fast, high-fidelity feedback (controlling robotic arms).
Using a reinforcement learning approach, such as DeepMind’s work on AlphaGo and AlphaStar, may not be feasible in areas where data is hard to generate or acquire.
This is one of the big difficulties in making AI models for self-driving cars. Simulating driving conditions to the fidelity required to generalize to the real world is hard and requires a detailed understanding of the millions of edge cases that the real world contains.
The big area for Active AI is controlling robots to perform high-level tasks while accounting for ongoing inputs from sensors. One example is Rios.ai which uses an AI model to figure out how to grip irregularly shaped objects with robotic arms.
Code has sharp edges. Learning to write code is hard because computers work fundamentally differently from human intuition. A compiler does not understand “what you mean”. It will do exactly what you tell it to do. If you accidentally left room for ambiguity in your code, the code will either fail immediately or fail later, in a way that will feel bizarre to you as a beginner.
This is most noticeable in the interfaces that code has with the real world. Taking text input from a user requires code to handle lots of possible edge cases, errors, typos, and hack attempts.
Similarly, it can be bewildering to find out that information laid out in a way humans easily understand may take weeks of work to input into a computer.
For example, we can easily skim through an annual report pdf to find yearly revenue figures from a table. But to algorithmically extract this information from an arbitrary PDF, we need thousands and thousands of lines of code to do it reliably.
On the output side, code output has so far mostly involved strictly mathematical transformations of the input, in a very particular format. Database manipulations and calculations are extremely useful and have changed the world drastically, but they have so far been limited to the very narrow domain for which a program has been designed.
My Uber Eats app lets me use a filter to find the best-rated Indian restaurants. But I can’t tell the app to filter out dishes containing lemongrass unless an engineer has spent time to build that particular feature.
AI makes these sharp edges fuzzier.
With AI models, we can start getting to a world where computers work in a way that comes closer to the way we intuitively think they should. Code writing models are already starting to understand “intent” from text descriptions. Input methods are expanding rapidly. Instead of writing this document, I can describe what I want to be written to an AI model which then writes a paragraph for me - or I can use a speech-to-text model and dictate what I want written.
Within the next decade, I think you’ll be able to ask your computer “what was that article I read a few years ago by some VC talking about a framework for AI models” and it will point you back here.