I’ve recently been appointed post-doc at MIT under the supervision of Patrick Jaillet. Patrick’s world-known specialty is online optimization, and I’m extremely thrilled to start to work with this topic… even though it means that I’ve had (and still have) a lot to learn! And what better way to learn than to take online optimization as a game? Thanks to the wonderful work of Pavan Mirla, I’m very excited to present to you the (first?) online optimization game ever. Your goal here is to pick the best toilet, given that you cannot go back on earlier decisions of acceptation/rejection of previously visited toilets:
This game has been solved only a few decades ago, after it has been made popular by the great Martin Gardner in 1960. It quickly became known as the secretary problem, although it is also known as the game of googol, the fiancée problem or the sultan’s dowry problem. In our case, it’ll be the toilet problem. And what’s amazing about this game is what happens as the number of toilets goes to infinity (which, fortunately for you, I won’t have you playing!).
Quite surprisingly, no! As it turns out, the best strategy will guarantee that you’ll always find the best toilet more than 1/3 of the times, no matter how many toilets there are! This is what’s brilliantly explained by Dr Ria Symonds in the great Numberphile video below:
More generally, online optimization is all about making clever irrevocable decisions as information pours in. And, these days, that’s a hot, very hot topic!
Because that’s basically what Google’s business is all about. Namely, around 96% of Google’s incomes come from Adwords, which consists of adequately allocating ad display areas accordingly to users’ search words and advertisers’ budgets. Optimizing Adwords by a teeny tiny fraction would lead to literally billions of increase in income! Plus, evidently, Google is not the only company that earns its bread through optimized online decision-making with incomplete information…
The Secretary Toilet Problem
Let’s get back to serious business. I mean… to our toilet problem.
Sure thing! In the toilet problem, we have, say, 100 toilets available. We visit them one at the time, but once we enter a toilet, we need to make an irrevocable decision: Use it or leave it. But evidently, most toilets are very dirty, and we’d rather use the absolute best toilet. So, here’s the problem: Is there a reasonable chance for me to find the absolute best toilet? And if yes, how can I? What if there were 1,000 toilets? Or a billion?
I know! But amazingly, there’s a way to have a very reasonable chance to finding the best toilet, even if there were as many toilets as the number of stars in the universe! No matter how many toilets there are, I can devise a strategy to find the best of them over 36% of the time!
Hehe… To determine this probability, let’s first reflect on the strategy I can devise. A strategy is a rule to accept or reject the proposed toilet, based on how it compares to other observed toilets and to the number of toilets we’ve already rejected. Now, let’s say we’re searching for the absolute best toilet. Evidently, we’ll reject all toilets that are not better than the best toilet we’ve visited so far.
That’s when we’re faced with a dilemma. Intuitively, if we’ve only visited a small number of toilets, then it’s unlikely that the next toilet is the absolute best one, even if it is better than all visited toilets. But if we keep rejecting toilets, the probability that the absolute best toilet is one that we’ve already rejected increases. Therefore, there’s a tradeoff between exploring/learning more about what toilets are out there, and missing out on an opportunity.
Exactly. And if we can determine this optimal stopping time, we can solve the problem.
Suppose we have $n$ toilets to visit, and that our strategy consists in rejecting the first $k$ toilets, and then accepting the first toilet that’s better than all previously visited toilets. Now, consider a toilet $t > k$. What is the probability $p_t$ that we end up accepting toilet $t$ and that it is the best toilet?
Yes. So far so good. But now, given that toilet $t$ is the best toilet, what’s the probability that we do accept it?
Yes! That’s because it will obviously be better than all previously visited toilets.
Very good!
So, what are the conditions under which we do reject all toilets between $k+1$ and $t-1$?
Exactly! If the best of the $t-1$ first toilets is within the $k$ first toilets, then any toilet between $k+1$ and $t-1$ will be worse than the best toilet of the $k$ first toilets. Thus, all the toilets between $k+1$ and $t-1$ will be rejected. Conversely, if the best of the $t-1$ first toilets is not within the $k$ first toilets, then it is a toilet between $k+1$ and $t-1$. And since this toilet is better than all previous toilet, there is not a chance that we reject it.
Very good. So, overall, the probability that we accept toilet $t$ and that it is the best toilet is $p_t = \, ^k/_{n(t-1)}$. Naturally, the greater the stopping time $k$, the more likely that we will not wrongly choose a toilet that was not the best one. The greater $t$, the more toilets need to be rejected to get to toilet $t$, and thus the lower the probability $p_t$. And finally, obviously, the greater $n$, the smaller the chance that $t$ is the best toilet.
Yes. And to compute this probability $p$, all we need to add up the $p_t$’s for $t \geq k+1$. And here, we have an opposite phenomenon. Namely, the greater $k$ is, the fewer terms $p_t$ we can add up. That’s where the tradeoff is. For large $k$’s, we add up few large terms. For small $k$’s, we add up many small terms.
At this point, we need a bit of calculus. Using the approximation of the harmonic series $1+\,^1/_2+\,^1/_3+\,^1/_4+\ldots +\,^1/_n \approx \ln n$ proved by Euler, we have $p = p_k + p_{k+1} + \ldots + p_n = \,^k/_{nk} + \,^k/_{n(k+1)} + \ldots + \,^k/_{n \times n} = \,^k/_n (\,^1/_k + \,^1/_{k+1} \ldots +\, ^1/_n) = \,^k/_n ( (1+ \,^1/_2 + \ldots + \,^1/_n) – (1+ \,^1/_2 + \ldots + \,^1/_{k-1})) \approx (\,^k/_n) \ln (\,^n/_k)$. This is maximized for $k/n \approx 1/e$, where $e =\, ^1/_1 + \,^1/_{1 \times 2} + \,^1/_{1 \times 2 \times 3} + \ldots \approx 2.718$ is Euler’s constant. Choosing $k \approx n/e$ then yields a probability $p \approx 1/e \approx 36.78\%$ of accepting the best toilet.
I’ll do one better. I’ll let you listen to Dr Ria Symonds’ explanations:
Now, you may be pleasantly surprised that we’ve found a strategy that guarantees a good chance of finding the best toilet. But I’d advise you not to apply it to finding your soulmate (yes, you can replace toilets by dates, and the same reasoning holds). This stopping strategy is very risky!
Here’s the problem. If the best toilet is within the first $k$ toilets of the learning phase, then you’ll just reject all the toilets! Eventually, you will be forced to settle with the last toilet you’ve visited, or, if you’re applying this to dating, you will probably end up dying sad and lonely… And sadly, the probability that this worst case occurs is exactly the same as the probability of finding the best toilet: 36.78%.
Indeed it is not. In particular, this toilet solution is not a good idea in practice, and I hope that no online web company uses it. To give relevant practical solution to online decision-making, we need to make our problem more realistic.
Online Optimization
There are a few details about the classical version of the secretary problem that make it unrealistic. The most crucial of all is about the goal we want to achieve. In practice, it’s much better to have a strategy that always guarantee accepting a near-best toilet than one that finds the absolute best toilet with high probability. After all, the absolute best toilet might be better than the next best ones just because it has four rolls of toilet papers instead of three. Well guess what… I hereby dare to declare that I am personally quite fine with (clean) toilets with only three rolls of toilet papers!
Yes I am!
Exactly! The problem should be about what happens in average. To model that, let’s say that toilets can be scored between 0 to 100. We’ll say that a strategy that always finds a toilet of score 90 is better than one that finds a toilet of score 100 with probability 37% and a toilet of score 80 the rest of the time. Why? Because the average score of the latter strategy is $10 0 \times ^{37}/_{100} + 80 \times ^{63}/_{100} \approx 87$, which is worse than the average score of the former strategy, $90$.
To make the problem more interesting, we will as well assume that we’re accompanying 5 kids, and we’re searching for appropriate toilets for them to use. This means that instead of accepting 1 toilet, we now need to accept 5 toilets. Moreover, let’s imagine that we have to pay to use the toilets, but that we only have one dollar on us at the moment. Since not all toilets cost the same (classier ones are probably more expensive), we’ll need to manage our limited budget.
And this is what online optimization really is about: Budget management. And it’s not necessarily a financial budget. Imagine you’re going shopping for food. You have both a financial budget and a stomach budget to manage: You don’t want to be full when you’re stumbling upon a delicious ice cream shop…
Yes. The setting I’ve described here is an online packing/covering problem. Namely we need to “pack” our dollar consumptions, while “covering” the requirement of 5 toilets. Such problems are so ubiquitous in applied mathematics that we call them mathematical programs, and we write them as follows:
The mathematical program above is what we want to solve. Unfortunately, we can’t really solve it, because we don’t yet know the scores and the costs of the toilets we haven’t visited yet. So we need to do the next best thing: Design an online strategy that guarantees a total score in average that is almost the score we’d get if we knew about all the toilets.
This question has been addressed by no less than 10 papers within the last 5 years (and most have come out last year!). I cannot discuss the specifics here (I’ll do that in a future article), but here’s roughly their answer: If I visit the toilets in a random order, I can design an online strategy whose average score is very close to the best possible score. More precisely, the competitive ratio defined as the ratio of the online strategy’s average score by the best possible score is $1-\epsilon$, where $\epsilon$ is very small when the budgets are sufficiently high.
Roughly, there are two extreme strategies, and plenty of others in-between. While these other in-between strategies are quite hard to explain, I find the two extreme strategies very intuitive and natural.
The Economist’s Strategy
Historically (and by that, I mean, in 2009), the first proposed online strategy is sort of an economist’s strategy. Namely, I’m going to give a value to the budgets I have. More precisely, I’ll estimate how many toilet scores I get by paying 1 dollar, for instance. Or, in the case of food shopping, how much happier I’ll be if I eat the equivalent of one big meal. And then, when I see a next toilet, I’ll see if it’s worth its price.
For instance, in our toilet problem, I may expect a toilet to gain 10 point out of 100 if its price increases by 3 cents. So, the value of 3 cents would be 10 points out of 100. Then, if I’m faced with a toilet with score 50/100 and cost 20 cents, I’ll know it’s not worth it, because each of the 10 points out of 100 is valued 3 cents, and thus, the toilet should only cost 15 cents.
The key to this economist’s strategy is to learn the values of the toilets by first observing the first few toilets. In this strategy, like in the original secretary toilet problem, the first few toilets are all rejected. They just serve as a benchmark to judge future toilets.
I know. And crucially, if the first few toilets are somehow representative of all the toilets, then I’m likely to have a great estimation of the value of 10 points of 100. And since, with high probability, I will have a good estimation of that, my future decisions will be near-optimal in average!
The Entrepreneur’s Strategy
While the economist’s strategy is indeed near optimal, last year, Kesselheim, Radke, Tönnis and Vöcking proved that one can do better, with a more “entrepreneurial” strategy. Roughly, in this strategy, an investor allocates a fraction $k/n$ of the total budget to the first $k$ toilets. Then, given this allocated budget, the entrepreneur makes the decision that would be optimal if the problem it had to solve was restricted to these first $k$ toilets.
No he can’t, and he won’t. But, crucially, his decision for the $k^{th}$ toilet is the one he should have made if he could go back on previous decisions. In some sense, the entrepreneur is constantly questioning himself and adapting himself to new data, even though he cannot change the past decisions. In particular, if the $k^{th}$ toilet is really worth it, the entrepreneur might spend all the budget allocated by the investors for the $k$ first toilets on this $k^{th}$ toilet, even if he had already spent some of this budget on the $k-1$ previous toilets. Thus, because of poor past decisions, it’s common for the entrepreneur to spend more budget than is allocated by the investor. But the entrepreneur can’t afford to regret the past.
Sure! Imagine there are 100 toilets and that you’re visiting the 10th toilet. Since you had 1 dollar to start with, and since you’ve only visited 10% of the toilets, the investors only allocate you 10 cents for the first 10 toilets. Now, imagine that the 10th toilet has the best ratio score over cost, so that you wish you had spent all of your first 10% budget on this 10th toilet. Then, here’s what the entrepreneur’s strategy would have you doing: Spend all of the investor’s allocated budget on this 10th toilet (10 cents), even though you may have already spent some of these 10 cents on previous toilets!
You’re right! It’s more of an indicator of how much should be spent on the first toilets. But still, you cannot spend more than the allocated budget for the first 10 toilets on the 10th toilet.
That’s a very good question! Let’s say it costs 20 cents, which is twice the budget allocated for the 10 first toilets. You cannot afford it. However, the entrepreneur’s strategy does not want you to give up on this toilet! It really wants you to spend the investor’s allocated budget on this lovely 10th toilet. So, ideally, it would want you to pay 10 cents to use half of the toilet!
Well, in practice… I’ve never asked to use half of a toilet, but I doubt they’d let you do so!
You can flip a coin! If it’s heads, you’ll spend the 20 cents required to use the toilet. If it’s tails, you’ll give up on this lovely toilet (what a shame!). Crucially though, in average, you’ll have spent 10 cents on the 10th toilet.
I know! When I started to read about it, I knew it was the better strategy, and I was deeply troubled… I thought it couldn’t be the better strategy! But amazingly, after five or six pages of proof, its authors did come to an astounding conclusion: Not only is this strategy better than the economist’s strategy, it is in fact the best online strategy!
Hehe… Sadly (at least for me), online packing with a random order of the visits is indeed a closed subject, in the sense that we know what the best online strategy is, and we know how well it performs. But there are plenty of variants of the problem that remain unanswered. For one thing, mixtures of the economist’s and the entrepreneur’s strategies yield an interesting mixture of quality competitive ratio and fast computation, as opposed to the optimal entrepreneur’s strategy which requires a lot of computation.
But there’s more…
The happy news (at least for me) is that I’ve stumbled upon several interesting open questions whose answers I do not have (and sometimes do not even have a guess of what the answers might be!). But, I’ll keep those for myself… for now.
Let’s Conclude
There’s a lot, lot more to say about online optimization. Evidently, my presentation of this rapidly expanding area of research is deeply skewed by my own personal interests. Let me briefly mention the particular case of online matching, which is the online version of the marriage assignment problem I wrote about years ago. This problem has been given major importance due to applications to online markets and, more surprisingly/amusingly, to kidney transplants. At any point of time, we have a set of donors and receivers. The difficulty lies in the fact that not all donors and receivers are compatible, and that some are more compatible than others. Thus the question: When someone needs a kidney, should we give him one that’s already available, or should we wait for more compatible donors?
Another important variant of online optimization is sometimes known as online learning, where the focus is given on the regret rather than on the competitive ratio. This regret is the difference (as opposed to a ratio) between the best that could have been done as opposed to what the online algorithm has done. I’d love to tell you more about it, but unfortunately, I’m still hugely ignorant of that field of research…