Bolton’s book and Goodhart’s law

Artwork by Aparna Baxi.

Donald Trump’s obsession with re-election illustrates the problems caused by metrics becoming targets.

John Bolton, erstwhile National Security Advisor to the Trump White House, released his tell-all memoir “The Room Where It Happened” on June 23rd. Like much of the reporting surrounding the current US administration, the book’s lurid disclosures of disorganisation, misinformation, and a transactional approach to foreign policy are more confirmatory than eye-opening, given the near-constant diet of sensation and outrage being fed to the public.

The book offers one original insight, however. It reveals the extent of President Trump’s obsession with being re-elected at all costs and his willingness to direct all the tools of his office to that end, apparently even soliciting Chinese purchases of farm goods to secure the loyalty of a demographic that supported him in 2016.

Of course, there’s nothing unusual about that request in and of itself. All governments hope to be re-elected, and politics has never been a wholly clean undertaking either. As with so much of his presidency, however, Trump’s monomania and his disregard for the niceties of geopolitical manoeuvring have made that desire more overt than is usually the case.

Most governments seek to reward their supporters – keeping promises made to voters is a sound way of securing their continued loyalty – but Trump’s apparent willingness to seek assistance from an overseas power with which he was simultaneously embroiled in a trade war is revealing. Whatever American strategic concerns were being addressed by fronting up to China were ultimately subsidiary to ensuring a payoff for past and potentially future Trump voters.

Such behaviour is in fact a fine example of Goodhart’s law: when a metric becomes a goal, it ceases to be a reliable metric.

A classic example of Goodhart’s law (very familiar to those in the UK) is hospital waiting times. In theory, the waiting time at a hospital is indicative of how efficient that hospital is, and so a good hospital providing prompt care for patients will generally have a short waiting time for treatment. Consequently, hospital waiting times are a reasonable metric for determining the level of service and thereby the quality of healthcare provided.

But when that metric, the waiting time itself, is reconfigured as a target for hospitals to meet – as it has been by a succession of UK governments – then hospitals are inevitably incentivised to cut waiting times regardless of the effect that has on the overall quality of healthcare provided. Once the metric is turned into a goal, it ceases to be informative, and the odds are that society isn’t being better served by the change.

By the same token, governments should be trying to do a good job and if they do, then they get re-elected. A good government doing a good job will attract the most votes in an election, and so re-election becomes a metric (one of many possible ones) for the public’s satisfaction with the job it’s doing. But if the focus of the government’s energy becomes re-election itself and good governance is secondary – in other words, if Goodhart’s law takes hold – then there’s a problem.

It is nowadays arguably more important to be a successful politician than to be a good politician, with Trump’s ascent to the US presidency and Boris Johnson’s elevation to UK Prime Minister being standout examples. Johnson’s actual politics are rather undefined – infamously, he wrote positions both for and against EU membership before deciding to opt for the latter in the Brexit referendum – and a far cry from politicians of earlier generations (and many contemporary independents), who entered Parliament with a sense of principle and a determination to do something for the common good, instead of simply being there, and staying there.

In the USA, “staying there” in political terms is the defining symptom of Potomac fever. Over the years, increasingly underhand tools have been employed to facilitate this overriding desire to remain entrenched in Washington no matter what – gerrymandering and voter suppression being two currently relevant examples. Over in the UK, the Johnson/Cummings No. 10 obsession with focus groups and polling instead of actually having policies is a symptom of the same malaise. The three-word slogan trumps the three-year plan (never mind a five-year or ten-year one).

All signals then are pointing in a direction where the governance of democratically elected societies is in reality focused on amassing and retaining power (for what purpose?), rather than serving the interests of voters.

This is a problem that contemporary democracies, and especially liberal democracies, are going to have to resolve if they wish to adhere to the values they claim to represent, instead of deteriorating into an ugly kind of authoritarian mob rule. But once Goodhart’s law has taken hold of a system, how can it be negated?

Academia has actually been wrestling with the same problem for years. Science in particular offers a salutary lesson in the difficulties of assessment without incurring Goodhart’s law, and the constant underlying tension between being successful and being good.

The struggles in science stem from the fact that it’s very hard to put an objective value on the typical roles of an academic – the definitive ones of teaching and research (i.e. the dissemination and acquisition of knowledge, respectively), and the more political ones of raising funds and administrative duties. Outreach, regrettably, is still an optional extra.

It’s easier to define simple objective proxies for value in research and fundraising than it is for teaching and administration – research brings citations, and funding brings money. And because research performance is usually correlated with the ability to attract funding, and because money brings power – to hire more people, to buy equipment and reagents, to contribute more overheads to institutional coffers – it’s also easy to see why these metrics are often given more weight in career terms.

But with so much riding on research assessment, how then should research quality actually be evaluated? Citations, as noted above, can be used as a measure of good science – it’s certainly true that good work gets cited more than substandard efforts. But if the primary aim of research becomes to be highly cited (rather than to do good science), then science itself will be poorer. Goodhart’s law again.

It’s a system that has led to the current tyranny of the Journal Impact Factor (JIF), a system in which an individual scientist’s worth is supposedly determined by their ability to get their work into prestige journals. The primacy of the JIF metric is predicated on the false premise that work published in high JIF journals will automatically be more highly cited and therefore be of perceived greater value. Most papers published in such journals actually don’t get more citations than they would elsewhere, but the credulous treat them as if they do. The JIF is therefore also a manifestation of the halo effect, in which a favourable first impression (“Wow, they have a paper in [prestige journal]!”) outweighs slower and more considered evaluation. There is also the corollary that servitude to the JIF transfers power over scientists’ careers from their peers to journal editors.

Any single metric can be easily gamed if the aim is to satisfy that metric (another manifestation: schools can improve their performance by getting rid of low-performing pupils). Even multi-parametric assessment can be reduced to the level of a box-ticking exercise if it’s presented as binary yes/no.

For a reasonable measure of value, you need both multi-parametric and scaled evaluation that has actually been tailored to assessment of the quality being measured. Only by taking the time and effort to think about what you want, then defining the characteristics that exemplify that aim, and then designing measures that quantify those characteristics, will you arrive at a plausible estimate. As is so often the case when it comes to pop summaries of decision architectures, Daniel Kahneman’s “Thinking, Fast and Slow” offers a salutary example – the section on recruit evaluation in the Israeli military is a brilliant illustration of the combined worth of multi-parametric and intuitive evaluation working in tandem.

The Declaration On Research Assessment (DORA) in 2012 represented the first push by the scientific community to break out of the stranglehold of Goodhart’s law imposed and perpetuated by the prestige publishing model. It’s a struggle within science that is far from concluded, but DORA, together with its advocates and adherents, represents a huge step forward and the construction of a genuine coalition for change that will undoubtedly alter the scientific community for the better.

In terms of politics…it’s not for TIR to make recommendations, but one thing is clear: good politics should be about good governance, not good polling.

 

Artwork: This week’s artwork is from first-time contributor Aparna Baxi. Aparna is an interdisciplinary PhD candidate at the George Washington University (USA), studying Molecular Medicine through the application of mass spectrometry.
