Algorithms to Live By
Metadata
Highlights
If you want the best odds of getting the best apartment, spend 37% of your apartment hunt (eleven days, if you’ve given yourself a month for the search) noncommittally exploring options. Leave the checkbook at home; you’re just calibrating. But after that point, be prepared to immediately commit—deposit and all—to the very first place you see that beats whatever you’ve already seen. This is not merely an intuitively satisfying compromise between looking and leaping. It is the provably optimal solution. We know this because finding an apartment belongs to a class of mathematical problems known as “optimal stopping” problems. The 37% rule defines a simple series of steps—what computer scientists call an “algorithm”—for solving these problems. And as it turns out, apartment hunting is just one of the ways that optimal stopping rears its head in daily life. Committing to or forgoing a succession of options is a structure that appears in life again and again, in slightly different incarnations. How many times to circle the block before pulling into a parking space? How far to push your luck with a risky business venture before cashing out? How long to hold out for a better offer on that house or car? The same challenge also appears in an even more fraught setting: dating. Optimal stopping is the science of serial monogamy. Simple algorithms offer solutions not only to an apartment hunt but to all such situations in life where we confront the question of optimal stopping. People grapple with these issues every day—although surely poets have spilled more ink on the tribulations of courtship than of parking—and they do so with, in some cases, considerable anguish. But the anguish is unnecessary. Mathematically, at least, these are solved problems.
There is a particular set of problems that all people face, problems that are a direct result of the fact that our lives are carried out in finite space and time. What should we do, and leave undone, in a day or in a decade? What degree of mess should we embrace—and how much order is excessive? What balance between new experiences and favored ones makes for the most fulfilling life?
But an algorithm is just a finite sequence of steps used to solve a problem, and algorithms are much broader—and older by far—than the computer. Long before algorithms were ever used by machines, they were used by people.
As Carl Sagan put it, “Science is a way of thinking much more than it is a body of knowledge.” Even in cases where life is too messy for us to expect a strict numerical analysis or a ready answer, using intuitions and concepts honed on the simpler forms of these problems offers us a way to understand the key issues and make progress.
look at the first 37% of the applicants,* choosing none, then be ready to leap for anyone better than all those you’ve seen so far.
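A minimal simulation sketch of this look-then-leap rule (the function names, pool size, and trial count are my own; it assumes candidates arrive in random order with distinct qualities, and counts only landing the single best candidate as success):

```python
import random

def look_then_leap(candidates, look_fraction=0.37):
    """Pass over the first look_fraction of candidates without committing,
    then commit to the first one better than everything seen so far."""
    n = len(candidates)
    cutoff = int(n * look_fraction)
    best_seen = max(candidates[:cutoff]) if cutoff else float("-inf")
    for value in candidates[cutoff:]:
        if value > best_seen:
            return value
    return candidates[-1]  # never beat the benchmark: stuck with the last option

def success_rate(n=100, trials=50_000):
    """Estimate how often the rule lands the single best candidate."""
    wins = 0
    for _ in range(trials):
        pool = random.sample(range(10 * n), n)  # distinct candidate qualities
        if look_then_leap(pool) == max(pool):
            wins += 1
    return wins / trials

print(f"Chose the best candidate in about {success_rate():.0%} of trials")
```

For large pools, both the optimal fraction of the search to spend looking and the resulting chance of ending up with the very best option converge to 1/e, roughly 37%.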
moments. A family gathering together on the holidays is exploitation.
If you’re thinking not just about the next decision, but about all the decisions you are going to make about the same options in the future, the explore/exploit tradeoff is crucial to the process.
Seizing a day and seizing a lifetime are two entirely different endeavors. We have the expression “Eat, drink, and be merry, for tomorrow we die,” but perhaps we should also have its inverse: “Start learning a new language or an instrument, and make small talk with a stranger, because life is long, and who knows what joy could blossom over many years’ time.” When balancing favorite experiences and new ones, nothing matters as much as the interval over which we plan to enjoy them.
Carpe diem doesn't work with a variable interval of time available to us. It is less a matter of seizing the day, every day, than of making our time extraordinary.
A sobering property of trying new things is that the value of exploration, of finding a new favorite, can only go down over time, as the remaining opportunities to savor it dwindle. Discovering an enchanting café on your last night in town doesn’t give you the opportunity to return. The flip side is that the value of exploitation can only go up over time. The loveliest café that you know about today is, by definition, at least as lovely as the loveliest café you knew about last month. (And if you’ve found another favorite since then, it might just be more so.) So explore when you will have time to use the resulting knowledge, exploit when you’re ready to cash in. The interval makes the strategy.
Novelty is good, if we are given the time to then exploit it.
For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again. This number—which Gittins called the “dynamic allocation index,” and which the world now knows as the Gittins index—suggests an obvious strategy on the casino floor: always play the arm with the highest index.*
The Gittins index takes into account both the payout itself and its probability; as a result, we can use the index to compare options directly in order to maximize our returns.
But perhaps the most interesting part of the table is the top-left entry. A record of 0–0—an arm that’s a complete unknown—has an expected value of 0.5000 but a Gittins index of 0.7029. In other words, something you have no experience with whatsoever is more attractive than a machine that you know pays out seven times out of ten! As you go down the diagonal, notice that a record of 1–1 yields an index of 0.6346, a record of 2–2 yields 0.6010, and so on. If such 50%-successful performance persists, the index does ultimately converge on 0.5000, as experience confirms that the machine is indeed nothing special and takes away the “bonus” that spurs further exploration.
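A rough sketch of how such index values can be computed (a truncated-horizon dynamic program with a binary search for the indifference point, an approximation under the book's assumptions of a uniform prior and geometric discounting, not Gittins' own derivation; the function names, horizon, and tolerance are my own choices). At a 90% discount it should land close to the table's values, e.g. roughly 0.70 for a 0–0 record and 0.63 for 1–1:

```python
from functools import lru_cache

def gittins_index(wins, losses, gamma=0.9, horizon=200, tol=1e-4):
    """Approximate the Gittins index of a Bernoulli arm with the given
    win/loss record, assuming a uniform (Beta(1,1)) prior and geometric
    discounting by gamma. Finds the guaranteed payout rate at which pulling
    the uncertain arm and taking the sure thing forever are equally good."""

    def value_of_pulling(lam):
        retire = lam / (1 - gamma)  # value of taking the sure thing forever

        @lru_cache(maxsize=None)
        def value(w, l, depth):
            # Value of holding the arm with record (w, l), keeping the
            # option to retire onto the sure thing at any later point.
            if depth == horizon:
                return retire  # truncate: the far future is discounted away
            p = (w + 1) / (w + l + 2)  # posterior mean chance of a win
            pull = (p * (1 + gamma * value(w + 1, l, depth + 1))
                    + (1 - p) * gamma * value(w, l + 1, depth + 1))
            return max(retire, pull)

        p = (wins + 1) / (wins + losses + 2)
        return (p * (1 + gamma * value(wins + 1, losses, 1))
                + (1 - p) * gamma * value(wins, losses + 1, 1))

    lo, hi = 0.0, 1.0  # binary search for the indifference payout rate
    while hi - lo > tol:
        lam = (lo + hi) / 2
        if value_of_pulling(lam) > lam / (1 - gamma):
            lo = lam  # still worth pulling, so the index exceeds lam
        else:
            hi = lam
    return (lo + hi) / 2

for record in [(0, 0), (1, 1), (2, 2)]:
    print(record, round(gittins_index(*record), 4))
```

Once the indices are in hand, the whole strategy is simply to play whichever arm's index is currently highest.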
Something we know nothing about is still worth exploring.
Using this system allows us to account for mistakes in a process. For example, if you have had good luck ordering food from an Indian restaurant 4 times but the 5th time fails, the restaurant is still worth returning to over exploring somewhere new (compare the 4–1 index to the 0–0 index).
makes for a Gittins index that’s still above 50%. We can also see how the explore/exploit tradeoff changes as we change the way we’re discounting the future. The following table presents exactly the same information as the preceding one, but assumes that a payoff next time is worth 99% of one now, rather than 90%. With the future weighted nearly as heavily as the present, the value of making a chance discovery, relative to taking a sure thing, goes up even more. Here, a totally untested machine with a 0–0 record is worth a guaranteed 86.99% chance of a payout! Gittins index values as a function of wins and losses, assuming that a payoff next time is worth 99% of a payoff now.
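Re-running the sketch above with gamma=0.99 (and a much longer horizon, since with so little discounting rewards hundreds of pulls away still matter) should push the 0–0 index up toward the 86.99% figure quoted here.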
How does one determine the discount value???
Using the table for job choices: enjoyed 4 good years as a pilot, 4 bad ones. Thus the index is 0.6697, lower than moving into something new, which starts at the 0–0 index.
The old adage tells us that “the grass is always greener on the other side of the fence,” but the math tells us why: the unknown has a chance of being better, even if we actually expect it to be no different, or if it’s just as likely to be worse. The untested rookie is worth more (early in the season, anyway) than the veteran of seemingly equal ability, precisely because we know less about him. Exploration in itself has value, since trying new things increases our chances of finding the best.
Because a new thing has a 0–0 record, assuming its discounted value is not too low, it is generally worth at least trying.
For one, the Gittins index is optimal only under some strong assumptions. It’s based on geometric discounting of future reward, valuing each pull at a constant fraction of the previous one, which is something that a variety of experiments in behavioral economics and psychology suggest people don’t do. And if there’s a cost to switching among options, the Gittins strategy is no longer optimal either.
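For reference, geometric discounting is what makes the "sure thing" comparison above tidy: a guaranteed payout of λ per round, with each round worth a constant fraction γ of the one before, is worth λ(1 + γ + γ² + …) = λ/(1 − γ) in total, i.e. 10λ at γ = 0.9 and 100λ at γ = 0.99. Drop that assumption, or add a cost for switching as above, and the index's optimality guarantee goes with it.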
Given the cost of going to UBC, is it worth switching? Consider the cost of the program itself as well as the loss in wages.
First, assuming you’re not omniscient, your total amount of regret will probably never stop increasing, even if you pick the best possible strategy—because even the best strategy isn’t perfect every time. Second, regret will increase at a slower rate if you pick the best strategy than if you pick others; what’s more, with a good strategy regret’s rate of growth will go down over time, as you learn more about the problem and are able to make better choices. Third, and most specifically, the minimum possible regret—again assuming non-omniscience—is regret that increases at a logarithmic rate with every pull of the handle.
Minimal regret is regret that grows logarithmically. We will never be truly satisfied; however, choosing a good course of action will decrease how dissatisfied we are, or at least slow the rate at which our dissatisfaction grows.
The Upper Confidence Bound algorithm says, quite simply, to pick the option for which the top of the confidence interval is highest. Like the Gittins index, therefore, Upper Confidence Bound algorithms assign a single number to each arm of the multi-armed bandit. And that number is set to the highest value that the arm could reasonably have, based on the information available so far. So an Upper Confidence Bound algorithm doesn’t care which arm has performed best so far; instead, it chooses the arm that could reasonably perform best in the future. If you have never been to a restaurant before, for example, then for all you know it could be great. Even if you have gone there once or twice, and tried a couple of their dishes, you might not have enough information to rule out the possibility that it could yet prove better than your regular favorite. Like the Gittins index, the Upper Confidence Bound is always greater than the expected value, but by less and less as we gain more experience with a particular option. (A restaurant with a single mediocre review still retains a potential for greatness that’s absent in a restaurant with hundreds of such reviews.)
Similarly, this can account for review averaging in movies and books. Rather than simply use the IMDB formula, we can establish upper confidence bounds for movie recommendations.
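In that spirit, a minimal UCB1-style sketch (one common variant of the idea; the exploration constant, the cap at 1.0, and the toy review counts are my own choices for illustration): the score is the observed average plus an uncertainty bonus that shrinks as evidence accumulates, so a barely reviewed option keeps most of its upside. Algorithms in this family are among those whose regret grows only logarithmically, as described a few highlights back.

```python
import math

def ucb_score(successes, failures, total_plays, c=2.0):
    """UCB1-style optimistic estimate for an option with a success/failure
    record: its observed average plus a confidence bonus that shrinks as
    the option accumulates evidence. Capped at 1.0 since payouts are 0/1."""
    n = successes + failures
    if n == 0:
        return 1.0  # never tried: for all we know, it could be perfect
    mean = successes / n
    bonus = math.sqrt(c * math.log(total_plays) / n)
    return min(mean + bonus, 1.0)

# Toy comparison in the spirit of the note above: one mediocre review
# versus hundreds of equally mediocre reviews.
total = 1 + 300
print(ucb_score(0, 1, total))      # 1.0 after the cap: one bad review rules out little
print(ucb_score(150, 150, total))  # ~0.70: hundreds of reviews leave less upside
```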
I had reached a juncture in my reading life that is familiar to those who have been there: in the allotted time left to me on earth, should I read more and more new books, or should I cease with that vain consumption—vain because it is endless—and begin to reread those books that had given me the intensest pleasure in my past.
How do multi-armed bandits apply to reading and selecting books?
her collaborator Barbara Fredrickson asked people to choose who they’d rather spend thirty minutes with: an immediate family member, the author of a book they’d recently read, or somebody they had met recently who seemed to share their interests. Older people preferred the family member; young people were just as excited to meet the author or make a new friend. But in a critical twist, if the young people were asked to imagine that they were about to move across the country, they preferred the family member too. In another study, Carstensen and her colleagues found the same result in the other direction as well: if older people were asked to imagine that a medical breakthrough would allow them to live twenty years longer, their preferences became indistinguishable from those of young people. The point is that these differences in social preference are not about age as such—they’re about where people perceive themselves to be on the interval relevant to their decision.
Time always plays a factor. Becoming captain tomorrow for an interval of one year makes sense; but is it reasonable to do so when thinking about 40 years?