I'm a little obsessed with optimization engines. Put simply, an optimization engine is something that takes feedback on some process and uses that feedback to make refinements to the process.

Machine learning is full of these. The models you see hyped up in the news over the last few years have largely relied on an optimization scheme called stochastic gradient descent (SGD), where you have a model that tries to perform a task (e.g. identifying if something is a cat) and some loss function, which acts a bit like a score or feedback. The SGD algorithm then makes tiny changes to the model using gradients computed in a process called backpropagation. 3Blue1Brown has a great video series on the topic if you're interested in a tiny bit more depth than I've provided here.
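To make that loop concrete, here's a minimal sketch of SGD fitting a single parameter (the slope of a line) to noisy data. The toy dataset, learning rate, and loss function are illustrative choices of mine, not anything from a real cat-identifying model:

```python
import numpy as np

# Toy problem: learn w so that y ≈ w * x, using squared error as the "feedback".
rng = np.random.default_rng(0)
true_w = 3.0
xs = rng.uniform(-1, 1, size=100)
ys = true_w * xs + rng.normal(0, 0.1, size=100)

w = 0.0    # the model's single parameter, starting from a guess
lr = 0.1   # learning rate: how "tiny" each change is

for epoch in range(50):
    for x, y in zip(xs, ys):
        pred = w * x
        grad = 2 * (pred - y) * x   # gradient of the squared-error loss w.r.t. w
        w -= lr * grad              # the tiny change SGD makes after each example

print("learned w:", round(w, 3), "| true w:", true_w)
```

Backpropagation is just the bookkeeping that computes that `grad` term efficiently when the model has millions of parameters instead of one.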

There are, of course, other means of training; one family I'm particularly excited about and interested in is evolutionary methods.

The real magic of evolutionary methods is that they don't need to compute a costly gradient: you simply score a set of models (a population), average their parameters weighted by score, and boom, you have a new set of parameters to begin the next round of training on. CGP Grey has a good treatment of how to think about these kinds of models, and David Ha from Google Brain - Tokyo has a really great blog post about it.
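Here's a rough sketch of that idea in the spirit of simple evolution strategies: perturb the current parameters to create a population of clones, score each clone, then blend the perturbations back into the parent weighted by score. The scoring function and hyperparameters below are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(params):
    # Toy fitness: higher is better, with a peak at params = [3, -2].
    return -np.sum((params - np.array([3.0, -2.0])) ** 2)

params = np.zeros(2)   # the current "parent" parameters
pop_size = 50          # how many perturbed clones we test each generation
sigma = 0.5            # how far each clone wanders from the parent
lr = 0.1

for generation in range(200):
    noise = rng.normal(size=(pop_size, params.size))   # one perturbation per clone
    scores = np.array([score(params + sigma * n) for n in noise])
    # Normalize scores so better-than-average clones pull the parent toward them.
    weights = (scores - scores.mean()) / (scores.std() + 1e-8)
    params = params + lr / (pop_size * sigma) * (noise.T @ weights)

print("learned parameters:", np.round(params, 2))  # ends up close to [3, -2]
```

Notice there are no gradients anywhere: the only thing the loop ever asks of each clone is "what was your score?", which is a big part of why these methods are so easy to parallelize.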

What both of these techniques have in common, though, is testing and feedback. For the purposes of this article, I think it's useful to define some terms.

Testing we will define as the process of evaluating an individual's (or a population of individuals') current skill at a task. It can be an essay, a spelling bee, a soccer match, identifying if something is a cat or not, a game of Go, or even the power utilization of a data center.

Feedback we will define as simply the result from such a test. Feedback can be either high- or low-dimensional, and can be a measure of either how successful or how unsuccessful you were.

That's a bit abstract, so I feel it's worth making the idea of dimensionality in feedback more concrete with some examples.

Dimensionality in feedback:

  1. 1-D feedback: you write an essay. The teacher gives you only a score. You have to infer from that score which parts of your essay were bad and which were good in order to make improvements on the next round. This is the worst kind of feedback because it makes it extremely challenging to identify what to work on to improve.

  2. 2-D feedback: you take a multiple-choice test on a variety of topics, or on a very complex and nuanced topic. You are given an aggregate score as well as a marking of which answers you got wrong. Now you have a global sense of how much you need to improve as well as the specific topics you are weak on. You don't, however, have good feedback on how well you know individual topics, just that you know certain ones below a threshold represented by the quality of the question.

  3. 3-D feedback: you take a test with multiple short-answer or essay questions. You get an aggregate score, as well as an individual score on each question.

  4. 4-D feedback: same as 3-D feedback, but you also get a specific explanation of why each question was scored the way it was (see the sketch after this list).
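To make those levels concrete, here's the same imaginary test result encoded at each level of richness. The questions, scores, and comments are invented for the example:

```python
# 1-D: a single aggregate score -- you have to guess what to improve.
feedback_1d = 72

# 2-D: the aggregate score plus which questions were wrong.
feedback_2d = {"total": 72, "wrong": ["q3", "q7"]}

# 3-D: the aggregate score plus a score on every individual question.
feedback_3d = {"total": 72, "per_question": {"q1": 10, "q2": 9, "q3": 4, "q7": 2}}

# 4-D: per-question scores plus an explanation of why each was scored that way.
feedback_4d = {
    "total": 72,
    "per_question": {
        "q3": {"score": 4, "why": "the argument wasn't supported by evidence"},
        "q7": {"score": 2, "why": "misread the prompt entirely"},
    },
}
```

Each richer level hands the learner more structure to act on, at the cost of more work for whoever is doing the grading.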

Generally, the richer the feedback, the better a model can update its priors. The same, it turns out, is true of people.

A human operates and grows under a similar system. Sure, we don't compute gradients or have thousands of clones to generate a new set of parameters, but our learning process goes through a similar cycle of adapting to feedback and reorienting for the next test.

It would be helpful to define a more general framework for learning, human or machine.

The OODA loop is an idea that came out of military strategy, and, whether you think about it or not, it's how you learn how to do new things.

It stands for Observe, Orient, Decide, and Act.

  1. Observe: gather information about the world. What is the problem, what is the goal, and what were the results of my past actions?

  2. Orient: synthesize the information gathered in the observe step.