What Is the Probably Approximately Correct Learning Theory?
Evolution is one of those theories that never rests, stirring up new ideas that conflict with many a world-view. Its success cannot be denied, nor can some of its enduring mysteries. How do organisms actually make the changes they need to sustain themselves and evolve? What time frame does it take for an evolutionary change to take hold? Mutations are often the key to talking about these, but Leslie Valiant, a computer scientist at Harvard, wanted a different explanation. And so he developed his ideas on ecorithms and the Probably Approximately Correct (PAC) theory. Through this, I hope you may come to view evolution in a new light: a system that is learning just as we do.
Understanding How to Learn with Ecorithms
It is important to recognize that most life forms seem to learn mainly through a non-mathematical process, sometimes with trial and error and sometimes with false notions. It is the ability of a life form to cope with what life hands it that determines its ability to survive. But is there actually a mathematical way to describe this learning ability? For Valiant, there most certainly is, and it is through computer science that we can glean insights. As he puts it, "We have to ask what computers already teach us about ourselves" (Valiant 2-3).
It is by analyzing how computers operate, and extending that analysis to life forms, that Valiant hopes to demonstrate the idea of an ecorithm: an algorithm that gives one the ability to gain knowledge from one's surroundings in an effort to adapt to them. Humans are great at implementing ecorithms, having taken nature's resources and extended them to our purposes. We generalize and max out our ecorithmic ability, but how can we actually describe the process algorithmically? Can we use math to go about this? (4-6)
How do ecorithms lead to the PAC framework, which, simply put, takes our ecorithms and modifies them according to our situation? Through some assumptions. First, we take for granted that life forms adapt to their environment via ecorithmic mechanisms. These adaptations can be either mental or genetic in nature, for "ecorithms are defined broadly enough that they encompass any mechanistic process," a consequence of the Church-Turing Hypothesis (under which anything mechanistic can be generalized via algorithms or computations) (7-8).
And here is where we get to the bedrock of this ecorithmic work. Alan Turing and his theories on machine learning are still influential to this day. The search for artificial intelligence has been driven by machine learning, where patterns are discerned from a mine of data and yield predictive power, but without an underlying theory. Hmm, sounds familiar, doesn't it? Learning algorithms are not restricted to this domain, but thus far most lack universal application. Many depend on their environment for practicality, and this is where ecorithms will be useful, being purposefully tuned to the environment. We, like a machine, develop a pattern based on past experiences without context as to why it works, caring only about the utility behind it (8-9).
Now, we have discussed the properties of an ecorithm, but we should also tread with care. We have expectations of our ecorithm, including being able to define it so it's not too broad. We want it to apply to the theoryless, the complex, the chaotic. On the flip side, it can't be so narrow as to be impractical in application. And finally, it has to be biological in nature so as to explain evolutionary traits such as gene expression and environmental adaptations. We have to have the ability to see "that there are many possible worlds" and that we cannot be "assuming that they are all the same," nor can we fix ourselves onto a single track (9, 13).
Turing hinted as much when he showed in the 1930s that there are well-defined problems for which no step-by-step procedure can produce an answer in every case. With ecorithms, we need results in a short span of time, so it's reasonable to think that a blow-by-blow account of each step would be difficult if not impossible. We may best examine this with a Turing machine, which performs the step-by-step calculations for a given situation. It should give a reasonable answer, and one could hypothetically extrapolate and build a universal Turing machine that can carry out any (mechanical) process desired. But an interesting kink to a Turing machine is that "not all well-defined mathematical problems can be solved mechanically," something that many advanced math students can attest to. The machine tries to break the calculation into finitely many steps, but it may run on forever as it tries and tries, and there is no general way to tell in advance whether it will ever stop. This is known as the Halting Problem (Valiant 24-5, Frenkel).
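The step-by-step machinery described above is simple enough to sketch in code. Below is a minimal, hypothetical Turing machine simulator (the rule table and state names are my own illustration, not Valiant's): it executes a rule table one step at a time, and the `max_steps` guard is exactly where the Halting Problem bites — in general we cannot know beforehand whether that budget will ever be exceeded.

```python
def run_turing_machine(rules, tape, state="right", max_steps=1000):
    # Minimal Turing machine: rules map (state, symbol) ->
    # (symbol_to_write, move_direction, next_state).
    tape = dict(enumerate(tape))          # sparse tape; "_" means blank
    head, steps = 0, 0
    while state != "halt":
        if steps >= max_steps:            # we cannot decide halting in general,
            raise RuntimeError("step budget exceeded")  # so we impose a budget
        symbol = tape.get(head, "_")
        write, move, state = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    return "".join(tape[i] for i in sorted(tape)).strip("_")

# Example rule table: binary increment. Scan right to the end of the
# number, then carry leftward (1 -> 0 until a 0 or blank becomes 1).
rules = {
    ("right", "0"): ("0", "R", "right"),
    ("right", "1"): ("1", "R", "right"),
    ("right", "_"): ("_", "L", "carry"),
    ("carry", "1"): ("0", "L", "carry"),
    ("carry", "0"): ("1", "R", "halt"),
    ("carry", "_"): ("1", "R", "halt"),
}

result = run_turing_machine(rules, "1011")  # binary 11 -> binary 12
```

Running this on the tape `"1011"` (eleven) yields `"1100"` (twelve), each transition a purely mechanical lookup — the kind of process the Church-Turing Hypothesis says covers anything mechanistic.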
If our problem is expressed fully, then we can see where these issues lie and identify them, but Turing showed that impossibilities for Turing machines still exist. Could a different mechanism aid us, then? Possibly; it depends on its set-up and methodology. All of these pieces contribute to our goal of evaluating a computation of a real-world scenario, with the possible and impossible conclusions our model can reach. Now, it should be mentioned that Turing machines have a well-established track record when it comes to modeling real-world scenarios. Other models are good, but Turing machines work best. It is this robustness that gives us confidence in utilizing Turing machines to help us out (Valiant 25-8).
However, computational modeling has limits, studied under the name computational complexity. These can be mathematical in nature, like modeling exponential growth or logarithmic decay. They can involve the number of finite steps required to model the situation, or even the number of computers running the simulation. They can even concern the feasibility of the situation, for the machine performs a deterministic calculation at each step that builds on prior steps. Goof up early and you can forget about the effectiveness of the whole run. How about randomly aiming for a solution? It can work, but such a machine runs in "bounded probabilistic polynomial" time, unlike the standard polynomial time we associate with a deterministic process. There is even "bounded quantum polynomial" time, which is based on a quantum Turing machine (and who even knows how one could be built). Could any of these classes be equivalent, letting us substitute one method for another? Unknown at this time (Valiant 31-5, Davis).
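To make "bounded probabilistic polynomial" time a little more concrete, here is a sketch of Freivalds' algorithm — my example, not one from Valiant's book — a classic randomized check of whether A x B = C for matrices. Each random trial misses a wrong product with probability at most 1/2, so the error bound shrinks to 2^-trials: a fast answer that is probably, not certainly, right.

```python
import random

def freivalds_check(A, B, C, trials=20):
    # Randomized check that A @ B == C without computing the full product.
    # A wrong C survives each trial with probability <= 1/2, so the
    # chance of a false "True" is at most 2**-trials.
    n = len(A)
    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]  # random 0/1 vector
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False   # definitely wrong
    return True            # probably right

random.seed(0)  # fixed seed so the demonstration is reproducible
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]   # the true product of A and B
ok = freivalds_check(A, B, C)
bad = freivalds_check(A, B, [[19, 22], [43, 51]])  # one wrong entry
```

Each trial costs only a few matrix-vector multiplications, far cheaper than multiplying the matrices outright — speed bought at the price of a strictly bounded chance of error.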
Generalization seems to be the basis for many learning methods (non-academic ones, that is). If you encounter a situation that hurts you, then you become wary if anything remotely like it arises again. It is from this initial situation that we then specialize and narrow down into disciplines. But how would this work inductively? How do I take past experiences and use them to inform me about things I haven't yet experienced? Deducing everything takes more time than one has, so something inductive has to be occurring at least some of the time. But another problem arises when we consider a false starting point: many times our initial approach is wrong, throwing everything else off too. How much do I need to know before I have reduced the error to a functional level? (Valiant 59-60)
For Valiant, two things are key for an inductive process to be effective. One is the invariance assumption: that problems from location to location should be relatively the same. Even if the world changes, the changes should alter everything they impact and consistently leave other things the same. This lets me venture into new situations with confidence. The other key is the learnable regularity assumption, where the criteria I use to make judgments remain consistent. Any such standard that has no application isn't useful and should be discarded. I get regularity out of this (61-2).
But errors crop up; it's just a part of the scientific process. They cannot be fully removed, but we can certainly minimize their effects, making our answer probably right. Having a large sample size, for example, can minimize the noise our data gives us, making our work approximately right. The rate of our interactions also matters, for we make many quick calls that don't afford us the luxury of time. By making our inputs binary, we can limit the choices and therefore the possible wrong choices present. Hence the PAC learning method (Valiant 65-7, Kun).
Biology Meets Learnability
Biology does have some network structures like computers do. For example, humans have about 20,000 genes in our protein expression network, and our DNA specifies how to make those proteins as well as how much of each. But how did this start in the first place? Do ecorithms change this network? Can they also be used to describe neuron behavior? It would make sense for these systems to be ecorithmic, learning from the past (either an ancestor's or our own) and adapting to new conditions. Could we be sitting on the actual model for learning? (Valiant 6-7, Frenkel)
Turing and von Neumann felt that the connections between biology and computers were more than superficial. But they both realized that logical math wouldn't be enough to provide "a computational description of either thinking or life." The battleground between common sense and computation doesn't have much common (see what I did there?) ground (Valiant 57-8).
Darwin’s theory of evolution hit upon two central ideas: variation and natural selection. Plenty of evidence for it in action has been spotted, but issues remain. What is the link between DNA and the external changes to an organism? Is it a one-way change or a back-and-forth between the two? Darwin didn’t know about DNA, so it wasn’t in his purview to even provide a how. Even computers, when given the parameters to mimic nature, fail to do so. Most computer simulations show it would take 1,000,000 times longer than life has existed for evolution to create us. As Valiant puts it, “No one has yet shown that any version of variation and selection can account quantitatively for what we see on Earth.” It’s just too inefficient according to the models (Valiant 16, Frenkel, Davis).
Darwin’s work, however, does hint at an ecorithmic solution being required. All the things a life form does with reality, involving physics, chemistry, and so forth, are not describable via natural selection alone. Genes simply aren’t keeping tabs on all of these things, yet clearly organisms react to them. And the computer models failing to produce even remotely accurate results hint at a missing element. That shouldn’t be surprising, given the complexities involved. What we need is something that will be nearly right and reasonably accurate without resorting to brute force. We have to take in data and act on it in a probably approximately correct manner (Valiant 16-20).
DNA seems to be the base layer of evolutionary changes, with over 20,000 proteins to activate. But our DNA isn’t always in the pilot’s seat, for sometimes it is influenced by our parents’ life choices prior to our existence, environmental elements, and so on. But this doesn’t mean that PAC learning must be altered, since all of this is still in the purview of evolution (91-2).
A key subtlety to our PAC argument is that a goal, a target, is the objective here. Evolution, if it is to follow the PAC model, must also have a defined goal. Many would say this is survival of the fittest, to pass along one’s genes, but is this the goal or a by-product of living? If a trait allows us to perform better, then it’s desirable, and we can model performance in several different ways. With an ideal function based on ecorithms, we can do this, modeling performance via the probabilities of outcomes for a given environment and species. Sounds simple enough, right? (Valiant 93-6, Feldman, Davis)
Let’s finally talk (abstractly) about some of the calculations that may be going on here. We first define a function that can be idealized by an evolutionary ecorithm. We can then say that the “course of evolution corresponds to the cause of a learning algorithm converging towards a target of evolution.” The math here would be Boolean, for I would want to define x1, …, xn as the concentrations of proteins p1, …, pn. It’s binary: either on or off. Our function would then be fn(x1, …, xn) = x1 or … or xn, where the solution depends on the given situation. Now, is there a Darwinian mechanism that takes this function and naturally optimizes it for any situation? Plenty: natural selection, choices, habits, and so on. We can define the overall performance as Perf_f(g, D) = the sum over all inputs x of f(x)g(x)D(x), where f is that ideal function, g is our genome’s function, and D is the distribution describing our current conditions. By making f(x) and g(x) Boolean (+1/-1), we can say that f(x)g(x) = 1 if both agree and -1 if they disagree. And since the D(x) values sum to one, Perf is a number from -1 to 1. We have the makings of a mathematical model, people. We can use this to evaluate a genome for a given environment and quantify its usefulness, or lack thereof (Valiant 100-104, Kun).
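The performance calculation above can be computed directly. In this hypothetical toy (the OR ideal function and the one-input genome are my own illustrative choices, not Valiant's), f is the ideal OR function over the protein switches, g is a candidate genome that only senses the first protein, and D is a uniform environment over all inputs:

```python
from itertools import product

def ideal_f(x):
    # Ideal function: Boolean OR over the protein switches, encoded as +/-1.
    return 1 if any(x) else -1

def genome_g(x):
    # A hypothetical candidate genome that only senses the first protein.
    return 1 if x[0] else -1

def performance(f, g, n, dist=None):
    # Perf_f(g, D) = sum over all inputs x of f(x) * g(x) * D(x).
    # f(x)*g(x) is +1 on agreement and -1 on disagreement, so the
    # result always lies in [-1, 1].
    xs = list(product([0, 1], repeat=n))
    if dist is None:
        dist = {x: 1 / len(xs) for x in xs}  # uniform environment D
    return sum(f(x) * g(x) * dist[x] for x in xs)

perf = performance(ideal_f, genome_g, n=3)
```

With three proteins, the genome agrees with the ideal function on 5 of the 8 inputs and disagrees on 3, so Perf = (5 - 3)/8 = 0.25: better than chance, but well short of the ideal value of 1 — exactly the kind of gap evolution, viewed as a learning algorithm, would then narrow.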
But what are the full mechanics of this? That remains unknown, and frustratingly so. It is hoped that further research into computer science will yield more comparisons, but that has not yet materialized. But who knows: the person who may crack the code could already be PAC learning, using those ecorithms to find a solution…
Davis, Ernest. “Review of Probably Approximately Correct.” Cs.nyu.edu. New York University. Web. 08 Mar. 2019.
Feldman, Marcus. “Probably Approximately Correct Book Review.” Ams.org. American Mathematical Society, Vol. 61 No. 10. Web. 08 Mar. 2019.
Frenkel, Edward. “Evolution, Speeded by Computation.” Nytimes.com. The New York Times, 30 Sept. 2013. Web. 08 Mar. 2019.
Kun, Jeremy. “Probably Approximately Correct – a Formal Theory of Learning.” Jeremykun.com. 02 Jan. 2014. Web. 08 Mar. 2019.
Valiant, Leslie. Probably Approximately Correct. New York: Basic Books, 2013. Print. 2-9, 13, 16-20, 24-8, 31-5, 57-62, 65-7, 91-6, 100-4.
© 2020 Leonard Kelley