A few years ago, Greg Brockman (at the time the CTO of Stripe), Sam Altman (at the time the President of Y Combinator), and Elon Musk (at the time slightly less crazy, but still Elon Musk) got together for dinner to discuss the future of AI. It's 2015: AlphaGo has yet to beat Lee Sedol at one of the most computationally challenging games humans have invented, and ImageNet has only scratched the surface of what it would go on to do for image recognition.

Yet despite limited advances over the previous 15 years or so, there was a prevailing sense that some new change was just beyond the horizon. Just a year earlier, in 2014, the Swedish philosopher Nick Bostrom had written a book titled Superintelligence, which describes the implications of creating an agent capable not only of superhuman levels of cognition but of rapidly improving those abilities. It posited that such an agent could represent an existential threat to the survival of humanity were it to be created without proper safeguards. It's a terrifying and thrilling read, and it was clearly an influential book for Elon, Greg, and Sam.

They put together some money from Elon, Sam, Reid Hoffman (of PayPal and LinkedIn fame), and a few institutional investors to create a foundation to study what it would mean to create a 'safe' artificial intelligence. And thus OpenAI was born, its charter: to ensure that any artificial general intelligence that is created benefits all of humanity.

Over the last few years they've attracted top-notch talent and published a number of seminal works, including a few that I feel are underrated within the AI community:

  1. Evolution Strategies as a Scalable Alternative to Reinforcement Learning: https://openai.com/blog/evolution-strategies/

  2. Evolved Policy Gradients: https://arxiv.org/abs/1802.04821

  3. Generating Long Sequences with Sparse Transformers: https://arxiv.org/abs/1904.10509

This week OpenAI released access to an API that serves as an interface to their new language model, GPT-3. GPT-3 builds, unsurprisingly, on the work of GPT-1 and GPT-2, but also on a surge of work in Natural Language Processing (NLP) following the 2017 paper Attention Is All You Need.

I was lucky enough to get access to the API and spent a few hours exploring how I could use it for a project I am working on.

(I'll leave a deep dive into the specific implementation details of this model for a different blog post, and instead focus on some of the cool things I was able to get it to do.)

Interesting use case 1:

I prompted it with a passage of text (specifically, the abstract of the paper on which the API is based) and asked it to summarize it:

https://youtu.be/wzgaQaSygRM

It correctly summarized the abstract in a way that a second grader could understand. I did a few other experiments with this format and found I could get it to create a summary for most age ranges. Interestingly, when I asked it to create a summary for "my Japanese friend" versus "my friend", in the former case it wrote a summary that was not only longer but more technically in-depth.
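A prompt like this can be assembled as plain text and sent to the completions endpoint. The sketch below is an assumption about how such a call might look with the 2020-era OpenAI Python client, not my exact code; the engine name, parameters, and prompt wording are all illustrative.

```python
def build_summary_prompt(passage: str, audience: str) -> str:
    """Frame a passage and ask the model to summarize it for an audience
    ("a second grader", "my Japanese friend", etc.)."""
    return f"{passage}\n\nSummarize the passage above for {audience}:"

prompt = build_summary_prompt(
    "Recent work has demonstrated substantial gains on many NLP tasks...",
    "a second grader",
)

# The actual API call would look roughly like this (requires an API key;
# parameter values here are guesses, not the ones I used):
#
# import openai
# response = openai.Completion.create(
#     engine="davinci",
#     prompt=prompt,
#     max_tokens=150,
#     temperature=0.3,
# )
# summary = response["choices"][0]["text"]
```

Swapping out the audience string is all it takes to retarget the summary, which is what made experimenting with different age ranges so quick.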

Interesting use case 2:

I prompted it with a set of pairs. The first component of each pair was a description of what a website element might look like; the second was the corresponding HTML.
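This is the classic few-shot pattern: stack up description/HTML pairs, then end the prompt with a new description and let the model complete the HTML. The example pairs and the separator format below are my own illustrative assumptions, not the ones from my experiment.

```python
# Hypothetical few-shot examples: plain-English description -> HTML.
PAIRS = [
    ("a big red button that says Submit",
     '<button style="color: red;">Submit</button>'),
    ("an image of a cat with a black border",
     '<img src="cat.png" style="border: 1px solid black;">'),
]

def build_html_prompt(description: str) -> str:
    """Concatenate the example pairs, then append the new description
    with a trailing 'html:' cue for the model to complete."""
    shots = "".join(f"description: {d}\nhtml: {h}\n\n" for d, h in PAIRS)
    return f"{shots}description: {description}\nhtml:"

prompt = build_html_prompt("a blue link to the homepage")
```

The completion the model returns after the final `html:` is the generated markup for the new element.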