There can’t be only one

August 27, 2024 Django, Programming, Python

There’s a concept that I’ve heard called by a lot of different names, but my favorite name for it is “the Highlander problem”, which refers to the catchphrase of the campy-yet-still-quite-fun Highlander movie/TV franchise. In Highlander, immortal beings secretly live amongst us and sword-fight each other in hopes of being the last one standing, who will then get to rule the world forever. And when one of them is about to eliminate another, they often repeat the formulaic phrase: “There can be only one!”

The basic idea of the Highlander problem is that you can cause yourself a lot of trouble, as a programmer, by introducing a “there can be only one!” limit into your code. To take a real example: I once worked at a company that managed medical visits via video calls. A simplistic “there can be only one” view of that would model a visit as being between one doctor and one patient, perhaps with foreign keys at the database level. But that becomes a problem when the actual visit in front of you involves the patient, a primary doctor, a nurse who does triage, a second doctor who consults on the case, a translator who helps them all communicate with the patient, and the patient’s legal guardian/decision-maker. A better model would accept that the visit has multiple participants and multiple possible roles, and keep track of who was in which visit and in which roles.
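To make that concrete, here is a minimal sketch of the "multiple participants, multiple roles" shape using plain Python dataclasses. The names (`Role`, `Participation`, `Visit`) and the sample people are hypothetical, invented for illustration; this is not the schema that company actually used.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Hypothetical role list; a real system would likely keep this in data.
    PATIENT = "patient"
    PRIMARY_DOCTOR = "primary doctor"
    TRIAGE_NURSE = "triage nurse"
    CONSULTING_DOCTOR = "consulting doctor"
    TRANSLATOR = "translator"
    GUARDIAN = "guardian"


@dataclass(frozen=True)
class Participation:
    """One person taking part in one visit in one role."""
    person: str
    role: Role


@dataclass
class Visit:
    """A visit holds any number of participations, not one doctor plus one patient."""
    participations: list[Participation] = field(default_factory=list)

    def add(self, person: str, role: Role) -> None:
        self.participations.append(Participation(person, role))

    def people_in_role(self, role: Role) -> list[str]:
        return [p.person for p in self.participations if p.role is role]


visit = Visit()
visit.add("Pat", Role.PATIENT)
visit.add("Dr. A", Role.PRIMARY_DOCTOR)
visit.add("Nurse B", Role.TRIAGE_NURSE)
visit.add("Dr. C", Role.CONSULTING_DOCTOR)
visit.add("Translator D", Role.TRANSLATOR)
visit.add("Guardian E", Role.GUARDIAN)

print(visit.people_in_role(Role.PRIMARY_DOCTOR))
```

Nothing here stops a visit from having two consulting doctors, or one person in two roles, which is the point: the "who and in what role" question is answered by rows of data rather than by a fixed pair of fields.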

Similar issues occur all the time. Systems assume someone will only have one of a thing — common examples include physical addresses, email addresses, phone numbers, and payment methods — when many people will quite reasonably have or want to have more than one. And it’s often a lot of trouble to try to fix the system after the fact, because assumptions like these tend to be deeply embedded in the code and often in database schemas.

The result of a few experiences with this is that I’m automatically suspicious of any system which hard-codes a “there can be only one!” assumption. This week, amusingly, I got a new example to use and I expect to get a lot of mileage out of it.

Who’s at bat?

A bit of background: baseball is, by tradition, an outdoor sport. And in the United States, although there are some stadiums with retractable roofs (and, currently, one major-league stadium — Tropicana Field in St. Petersburg, Florida, home of the Tampa Bay Rays — with a permanent domed roof), most at the professional level are open to the elements. And unlike some other sports, baseball does not work well in inclement weather: when it’s raining too much, the ball becomes much more difficult to grip, fielding and running and sliding are affected, and the game becomes a bit of a mess.

The rules allow, under some circumstances, for a game to be ended early due to weather, if certain specific conditions (like a minimum number of completed innings) are met. Or a game can be suspended — literally paused, as-is, and scheduled for completion at a later date.

Teams can also trade players with each other. There are rules about who can be traded (players with a certain amount of major-league playing time start gaining the right to refuse trades), and when (the exact date of the major-league “trade deadline” varies from year to year for game-scheduling reasons, but falls in the last week of July or first week of August). But within those rules there’s a lot of freedom for teams to swap players.

These two factors, combined, created a possibility which had always been theoretical, until yesterday.

On June 26, 2024, the Toronto Blue Jays and Boston Red Sox started a game at Boston’s Fenway Park which was suspended due to rain in the second inning. At the time the game was suspended, Danny Jansen — catcher for the Blue Jays — was at bat.

On July 27, 2024, Jansen was traded to the Red Sox. You may now be able to see where this is going.

Yesterday, August 26, 2024, the schedule had the Blue Jays back in Boston, and the game from June was finally played to completion. As is common for a suspended game being resumed, the official scorer’s listing includes a flurry of simultaneous player substitutions in the second inning, among which two are notable:

Pinch-hitter Daulton Varsho replaces Danny Jansen

Danny Jansen replaces Enmanuel Valdez, batting 7th, playing catcher

In other words, Jansen — who, remember, had been at bat when the game was suspended — was substituted out of the game by the Blue Jays (who he no longer plays for) and replaced with another player. And at the same time, Jansen was substituted into the game, as the new catcher, by the Red Sox (who he does now play for — when the game began, Reese McGuire was catching for the Red Sox, but he’s now in the minor leagues). Usually, a player who has been substituted out cannot re-enter the same game, but this case is an exception.

And thus, for the first time in the over 150-year-long history of major-league baseball in the United States, a single player — Danny Jansen — played for multiple teams in the same game (though it had also happened at least once in lower-level leagues; as this article about the Danny Jansen situation points out, a minor-league player did it in 1986).

Baseball statistics sites, to their credit, already have to handle a lot of weird things. For example, a player can be listed as playing in multiple games for multiple teams on the same day, which is uncommon but not unheard-of: it’s happened several times that two teams playing a doubleheader (two games against each other in one day) have traded players between the two games. And there are a lot of other fun quirks around suspended games: for record-keeping purposes, they occur completely on the day they began, rather than on the day they completed, so it’s possible for a player who took part in the completion of the game to seemingly “time travel” and join a team months earlier than they actually did, due to being officially listed as playing on the original date of the suspended game.

But this situation, with a single player playing for multiple teams in the same major-league game (most stats sites don’t keep the same kind of comprehensive records on minor-league games), was entirely new, and appears to have broken some assumptions that a player can only play for one team per game. This Reddit thread points out problems that baseball-reference, probably the largest and most popular baseball statistics site, is having. A representative of the site showed up to say they’re working on it, and that they apparently have a whole internal thread of issues surfaced by Danny Jansen’s accomplishment.

“There can be only one!” has struck again.
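I have no idea how any stats site actually stores its data, so treat the following as a purely hypothetical sketch of how a "one team per player per game" assumption could end up baked into a schema. The table names and the game identifier are made up; the only thing taken from the story above is that the same player needs a row for each team.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumes a player appears for exactly one team in a game.
    CREATE TABLE appearance_one_team (
        game_id TEXT NOT NULL,
        player  TEXT NOT NULL,
        team    TEXT NOT NULL,
        UNIQUE (game_id, player)
    );

    -- Allows one row per player per team in a game.
    CREATE TABLE appearance_per_team (
        game_id TEXT NOT NULL,
        player  TEXT NOT NULL,
        team    TEXT NOT NULL,
        UNIQUE (game_id, player, team)
    );
""")

# Hypothetical game identifier; the two rows are the Jansen situation.
rows = [
    ("TOR@BOS 2024-06-26", "Danny Jansen", "TOR"),
    ("TOR@BOS 2024-06-26", "Danny Jansen", "BOS"),
]


def try_insert(table):
    """Insert both rows and return how many the schema accepted."""
    accepted = 0
    for row in rows:
        try:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # the UNIQUE constraint rejected this row
    return accepted


strict_count = try_insert("appearance_one_team")
flexible_count = try_insert("appearance_per_team")
print(strict_count, flexible_count)
```

The first table quietly drops the second appearance, and every report built on top of it inherits the gap. The second table costs one extra column in the constraint.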

A piece of advice

I’ve said many times before that I don’t like the “falsehoods programmers believe about…” genre of posts, because they tend to be contextless lists of things that you’re just told are wrong, with no explanation of why they’re wrong or what you could do instead that would be right. In that spirit, I’d like to offer a bit of guidance.

Wikipedia helpfully mentions the “zero one infinity” rule — the idea that a system should allow either exactly zero of a thing, or exactly one, or an unbounded number limited only by available resources — but I personally tend to lean toward a “zero or infinity” version of the rule: if a thing is worth allowing, it’s almost always worth not putting a hard limit on.

In database schemas, this means you should almost never have a column that is both a foreign key and the only column referenced in a UNIQUE constraint (creating a “one-to-one” instead of “many-to-one” relation — for Django users, this means you should almost never use its OneToOneField).
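Here is a small sketch of the difference, using Python's built-in `sqlite3` module rather than Django so that it runs on its own. The table and column names are hypothetical. The first phone table makes the foreign key the only column in a UNIQUE constraint (the one-to-one shape); the second leaves the UNIQUE constraint off (the many-to-one shape).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- One-to-one: the foreign key is also the only UNIQUE column.
    CREATE TABLE phone_one_to_one (
        user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
        number  TEXT NOT NULL
    );

    -- Many-to-one: same foreign key, no UNIQUE constraint on it.
    CREATE TABLE phone_many_to_one (
        user_id INTEGER NOT NULL REFERENCES users(id),
        number  TEXT NOT NULL
    );

    INSERT INTO users (id, name) VALUES (1, 'Example User');
""")

# The one-to-one table accepts a first number and rejects a second.
conn.execute("INSERT INTO phone_one_to_one VALUES (1, '555-0100')")
try:
    conn.execute("INSERT INTO phone_one_to_one VALUES (1, '555-0101')")
    second_number_allowed = True
except sqlite3.IntegrityError:
    second_number_allowed = False

# The many-to-one table accepts both.
conn.execute("INSERT INTO phone_many_to_one VALUES (1, '555-0100')")
conn.execute("INSERT INTO phone_many_to_one VALUES (1, '555-0101')")
count = conn.execute(
    "SELECT COUNT(*) FROM phone_many_to_one WHERE user_id = 1"
).fetchone()[0]

print(second_number_allowed, count)
```

Going from the second shape to the first later is one added constraint. Going from the first to the second usually means finding every piece of code that assumed there would only ever be one row.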

Allowing multiple and, at most, designating one as “primary”, is usually a better approach. Better still is allowing multiple and adding labels; for example, labeling each of a user’s multiple phone numbers as having different purposes and allowing the user to choose which one to use in a given circumstance.
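As a rough sketch of both ideas together, again with `sqlite3` and hypothetical table, column, and label names: each user can have any number of phone numbers, each number carries a label, and a partial unique index keeps the "primary" flag to at most one row per user. Partial indexes are a real SQLite feature, but whether your own database supports an equivalent is something to check rather than assume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phone_numbers (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        number     TEXT NOT NULL,
        label      TEXT NOT NULL,
        is_primary INTEGER NOT NULL DEFAULT 0
    );

    -- At most one primary number per user; non-primary rows are unrestricted.
    CREATE UNIQUE INDEX one_primary_per_user
        ON phone_numbers (user_id) WHERE is_primary = 1;
""")

insert_sql = (
    "INSERT INTO phone_numbers (user_id, number, label, is_primary) "
    "VALUES (?, ?, ?, ?)"
)
conn.executemany(insert_sql, [
    (1, "555-0100", "mobile", 1),
    (1, "555-0101", "work", 0),
    (1, "555-0102", "home", 0),
])


def number_for(user_id, label):
    """Return the first number with this label for the user, or None."""
    row = conn.execute(
        "SELECT number FROM phone_numbers "
        "WHERE user_id = ? AND label = ? ORDER BY id",
        (user_id, label),
    ).fetchone()
    return row[0] if row else None


# A second primary number for the same user is rejected by the index.
try:
    conn.execute(insert_sql, (1, "555-0103", "backup", 1))
    second_primary_allowed = True
except sqlite3.IntegrityError:
    second_primary_allowed = False

print(number_for(1, "work"), second_primary_allowed)
```

Note that the labels themselves are not unique here, so a user could have two "work" numbers; the lookup simply returns the first one it finds.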