Machine Learning to Verify Maduro's Loss
I trained a model to assess whether the opposition, as the CNE claims, is attempting fraud. The numbers did not lie.
In the last two months, Venezuela has attracted the attention of scholars, journalists and experts from international organisations around the question of the presidential election’s results. The analyses and reports published in that time point towards the opposition’s version of the results. They differ in depth and complexity, but complement each other by looking into different dimensions of the issue.
This article describes the results of an analysis using supervised machine learning methods that found, once again, support for the opposition’s accounts of the electoral event – offering new findings to the increasing body of analysis and adding more evidence to verify that Maduro, contrary to the official version, did not win the elections of July 28th.
This alternative methodology leads to the same result as the other studies, strengthening the theory that Venezuela voted for a new political path.
Testing the only sample we have
When it comes to clearing the uncertainty around the results of an election – and helping prevent the social and political instability that comes from it – political science has tools at its disposal to test whether electoral data might be fraudulent. And Venezuela’s presidential elections on July 28 make such a test pertinent, for several reasons.
One reason lies in the reports from Juan Carlos Delpino, a principal rector of the National Electoral Council (CNE), who cited irregularities before, during, and after election day that compromise the transparency and reliability of the official results. Another is the significant inconsistency between the official results announced by the electoral board president, Elvis Amoroso, and the projected voting intentions based on surveys by CEPyG UCAB and the Delphos Institute, as well as exit polls from Edison Research. Even the Carter Center, the election observer invited by the government, reported that the elections failed to meet democratic standards due to government action.
Thus, fraud detection tests are clearly indispensable. However, since the CNE published no disaggregated data that supports Maduro’s win, testing it directly is not an option. The opposition, on the other hand, did publish 83.50% of the CNE-printed voting tallies, or actas, gathered by electoral witnesses all across the country, and these show a win for Edmundo González with 67.08% of the vote. Their publication allows them to be put to the test in search of evidence of their integrity.
Unexpected frequency in first digits as a bad sign
By looking closely at the numbers shown in the actas, and following the scientific approach of stating a clear hypothesis and working to sustain or discard it, we can help answer the question at the heart of an election dispute: if a fraud has been committed, which party is responsible?
One of the tools we political scientists often use to test the integrity of a data set is Benford’s Law. It specifies how often the digits from 1 to 9 come up as the first digit of numerical data, including voting records. Take, for example, the leading digit 2 in the value 2458 for the total number of votes cast in a voting centre; collecting the leading digit for every centre generates a data set. The law is based on the observation that, in voting records, the digits from 1 to 9 do not come up equally often as the leading digit of the vote count. Rather, smaller digits lead more often than bigger ones. The digit 1, for example, will come up more often than 2, and far more often than 9. This pattern has been found in unmanipulated, naturally occurring data sets, and deviations from it can indicate fraudulent manipulation.
Concretely, this law can be used as a tool to detect fraud where a party “steals” votes from another party and/or makes up votes from non-voters. This means that it only tests fraud types where data sets have been fabricated or distorted by humans.
If the digits from 1 to 9 lead total vote counts more or less often than Benford’s Law says they should, this can be a symptom of election fraud.
The law expects, for example, that 1 appears as the first digit in 30.1% of the vote counts, 2 in 17.6%, and so on, with a decreasing share for each bigger digit. So if a party claims electoral victory with a data set that violates Benford’s Law, even though the data meet the statistical criteria under which the law is expected to hold, this is a hint that the party is trying to steal an election.
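To make the comparison concrete, here is a minimal Python sketch of the first-digit check described above. It assumes the usual statement of Benford’s Law, in which the digit d leads a value with probability log10(1 + 1/d), and it uses a Pearson chi-square statistic as one common way of measuring the deviation. The function names and the sample vote counts are illustrative assumptions, not part of the analysis reported in this article.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d = 1..9: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First decimal digit of a positive integer vote count."""
    return int(str(value)[0])


def chi_square_vs_benford(vote_counts):
    """Pearson chi-square statistic: observed leading digits vs Benford."""
    positive = [v for v in vote_counts if v > 0]  # zeros have no leading digit
    observed = Counter(leading_digit(v) for v in positive)
    n = len(positive)
    return sum(
        (observed[d] - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


# Made-up vote totals for a handful of hypothetical voting centres.
sample_counts = [2458, 1310, 187, 942, 1675, 120, 3051, 1499]
print(chi_square_vs_benford(sample_counts))
```

With nine possible digits there are eight degrees of freedom, so a statistic above roughly 15.5 would reject conformity at the 5% level. With only a handful of invented values, as here, the number itself means little; the check only becomes informative across thousands of tallies.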
The CNE claims that Maduro won, so that claim was tested as a hypothesis: did the opposition “steal” his votes to give them to González, even with its limited resources within the Venezuelan autocratic context? If the hypothesis were true, Benford’s Law would not hold in the opposition’s data published on resultadosconvzla.com.
What machine learning found
To analyse the opposition’s data, I adapted the methodology of an analysis I studied in my master’s program. Francisco Cantú and Sebastián Saiegh applied Benford’s Law, using machine learning, to Buenos Aires’ gubernatorial and national congressional elections from the 1930s, and identified both fraudulent and non-fraudulent cases that matched the expectations based on anecdotal information. This is similar to the agreement between polls, exit polls and actas that we see for Venezuela’s July 28.
Those who attempt electoral fraud rarely publish data, as is the case in Venezuela. Machine learning helps to solve this problem by simulating election data, based on what fraudulent and clean election results would look like. To detect fraud, a learner – the statistical model that classifies election cases – must know what fraudulent and clean elections look like. The learner is not limited to detecting fraud in the specific data set under study: what it learns from that data also lets it detect fraud in further ones. During the learning process, it picks up how the characteristics of vote counts can tell us whether electoral data has integrity or not. Thus, in the Venezuelan case, the simulated data trains the learner so that it can check the opposition’s data for fraud, or the lack thereof.
After the simulation and learning process, the analysis shows the opposite of the hypothetical scenario of fraudulent manipulation by the opposition: the learner classified the opposition’s data set as “clean”. This supports the assumption that the opposition did not tamper with the electoral data of July’s presidential election to González’s advantage.
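The simulate-then-learn idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the model used in this analysis nor the one in the Cantú and Saiegh study: clean vote counts are drawn from a log-normal distribution, “fraud” is modelled as a share of centres reporting invented counts between 400 and 799, and a nearest-centroid rule on first-digit shares stands in for the learner. All function names and parameters are hypothetical.

```python
import math
import random


def digit_profile(counts):
    """Share of each leading digit 1..9 among positive vote counts."""
    leading = [int(str(c)[0]) for c in counts if c > 0]
    return [leading.count(d) / len(leading) for d in range(1, 10)]


def simulate_election(rng, n_centres=500, fabricated_share=0.0):
    """Toy generator of per-centre vote counts (assumed distributions).

    Honest centres draw a log-normal count spanning several orders of
    magnitude; fabricated centres report a number invented in a narrow band.
    """
    counts = []
    for _ in range(n_centres):
        if rng.random() < fabricated_share:
            counts.append(rng.randint(400, 799))  # invented tally
        else:
            counts.append(int(rng.lognormvariate(5.0, 1.5)) + 1)
    return counts


def centroid(profiles):
    """Component-wise mean of a list of digit profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def train(rng, n_elections=200):
    """Learn one average digit profile per label from simulated elections."""
    clean = [digit_profile(simulate_election(rng)) for _ in range(n_elections)]
    fraud = [
        digit_profile(
            simulate_election(rng, fabricated_share=rng.uniform(0.2, 0.6))
        )
        for _ in range(n_elections)
    ]
    return {"clean": centroid(clean), "fraudulent": centroid(fraud)}


def classify(model, counts):
    """Return the label whose centroid is closest to the observed profile."""
    profile = digit_profile(counts)
    return min(model, key=lambda label: math.dist(profile, model[label]))


rng = random.Random(28)
model = train(rng)
# Classify a fresh simulated election with no fabricated centres.
print(classify(model, simulate_election(rng, fabricated_share=0.0)))
```

The point of the sketch is the workflow: the learner never sees real fraudulent data, only simulated examples of both kinds, and is then asked to grade a data set it has not seen before. A real application would need a far more careful simulation of how votes are stolen or invented, and richer features than first-digit shares alone.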
Results claiming Nicolás Maduro as winner could be put to such a test as well once – and if – they get published by the CNE.
In summary, this adds to findings by other actors, including international organisations, that point towards the integrity of these data. By reviewing a sample of the acta scans uploaded to resultadosconvzla.com, the UN Panel of Experts found evidence that their security features match those known to belong to official protocols of Venezuela’s election system. Another scholar, University of Michigan Professor Walter Mebane, found evidence that there “may be no (…) fraudulent votes among the votes for González at all” in his analysis from August 2024, which applies comprehensive quantitative methods.
The assessment and evaluation of these data conditions the recognition of candidate Edmundo González’s election, and its present relevance shows how much academics can contribute to important debates in transition processes that could initiate a democratisation process. Such recognition is vital not only when it comes from countries and organisations in the international community, but also – and most importantly, as political scientist John Magdaleno often explains – when it comes from relevant actors within Venezuela who can make a transition possible.