I once wrote about how you can use Gradient Descent to find the perfect cafe without driving around wasting gas. The theory is beautiful on paper: you calculate the cost function (disappointment), then you take slow steps in the direction that makes that cost go down (local minimum).

But last Saturday night, my algorithm crashed completely.

My friends and I tried to implement the theory. We called it a field experiment. But we forgot to include one crucial variable into our model: noise from real-time data, aka illegal parking attendants, the traffic jam at the Dago intersection, and the sudden drizzle.

In machine learning, if you use standard Gradient Descent (Batch Gradient Descent), you calculate the error from the entire dataset before taking one step. In the real world, this is like pulling your motorcycle to the side of the road, opening Google Maps, reading 100 reviews from the 10 nearest cafes, before deciding which way to turn. Extremely slow. By the time you decide to go to cafe A, the rain drops and their outdoor tables become unusable.

Because we felt smart, we switched our approach to Stochastic Gradient Descent (SGD). We chose a random direction, then updated our route at every intersection based on instinct and the immediate situation (just one batch of data). If the road is jammed, we turn left. If the road is empty, we hit the gas.

At first it was fun. But eventually we realized the fatal flaw of SGD: the path was a messy zigzag because it was too sensitive to noise. We entered a small alley that looked empty, only to find the end was blocked by a portal because there was a wedding celebration. We turned back (cost function exploded). Then we found a smooth alternative road, only to realize it was a one-way street and we were going against traffic. Got yelled at by public transport drivers (cost function exploded even more).

Our system failed to find convergence (a stable meeting point). In the end, after an hour of driving around zigzagging like a drunk algorithm, we went back to the exact same coffee stand where we started.

From this failure, I realized one thing. No matter how cool a mathematical algorithm is, it will never be able to handle a chaotic environment 100 percent. Saturday night in a big city is a highly non-linear system full of anomalies.

If you want to deal with a super chaotic system, you need something called Momentum. In the context of dating or hanging out, momentum is "just stop being so picky, grab what is right in front of you and pretend to be happy."

Momentum helps your algorithm keep moving straight past small obstacles without having to recalculate the route every second. When you find a cafe with only a 3.5 rating but it happens to have an empty parking spot and comfortable tables for smoking, you should immediately stop your motorcycle and order an iced coffee milk. No need to think that there might be a 4.8 rated cafe up ahead.

The point is, in this life, sometimes excessive optimization actually leaves you stuck on the street, rained on, and fighting with your own friends. Learn to brake your algorithm's ego, and just enjoy the coffee stand in front of your dorm that has fast wifi.

Khay