On the Elections Part 1: Election Fingerprints

By TJ PALANCA

Published May 30, 2016 6:48pm

In this elections series, we'll explore various aspects of the 2016 Philippine National Elections, from fraud detection to the differences in how our country votes. In this first instalment, we learn about election fingerprints and how they may be used to detect fraud in the form of ballot stuffing or vote padding.

Election data geekery

For the first time, the data geeks have finally gotten some love. Highly detailed election results, broken down all the way to the precinct level, have been published online by the Commission on Elections (COMELEC) as well as poll watchers and the media.

I imagine there are many things we could do with this data, but one of the most popular uses is to assess the risk of election irregularities. For the first few parts of this series, we'll try to do that carefully and scientifically.

Going back to the methodology highlighted in a 2014 post, part 1 of this series will focus on detecting election irregularities in the form of vote padding, defined as the adding of fraudulent votes to the count to increase a candidate's probability of winning, or, conversely, the shaving of legitimate votes from the count to decrease a candidate's probability of winning.

Statistical detection of vote padding

Vote padding, sometimes called ballot stuffing, is a form of electoral fraud that involves adding fake votes or shaving legitimate votes to favor a particular candidate. This may not be detectable in the final aggregated election results. However, if vote padding occurs only in a subset of jurisdictions, it can change the distribution of voter turnout and vote share in a way that allows detection from granular election data.

This method was demonstrated in this PNAS paper¹, whose authors showed that Russian and Ugandan elections, known to be marred by electoral fraud, contained "election fingerprints" that were smeared towards the top left:

Let's think about this: what happens when fake votes are added to the count?

Increase in voter turnout - because more votes are counted than were actually cast, the percentage of registered voters recorded as having voted rises in the affected cities/municipalities.
Increase in candidate vote share - the favored candidate will see an increase in the percentage of votes won.

When you have a significant proportion of areas that have this high turnout, high vote share combination, there is an increased risk that electoral irregularities have occurred.
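To make the idea concrete, here is a minimal sketch of how a fingerprint could be tabulated: each city/municipality is placed in a grid cell according to its turnout and the candidate's vote share, and the cells are counted. The function name, the tuple layout, and the sample numbers are all made up for illustration; this is not the code used for the plots in this post.

```python
from collections import Counter


def election_fingerprint(units, bins=10):
    """Count units falling in each (turnout bin, vote-share bin) cell.

    `units` holds (registered_voters, votes_cast, candidate_votes) tuples,
    one per city/municipality. This layout is hypothetical.
    """
    grid = Counter()
    for registered, cast, candidate in units:
        if registered <= 0 or cast <= 0:
            continue  # no usable turnout or vote share for this unit
        # Integer arithmetic avoids floating-point surprises at bin edges;
        # min() puts values of 100% (or more) into the last bin.
        turnout_bin = min(cast * bins // registered, bins - 1)
        share_bin = min(candidate * bins // cast, bins - 1)
        grid[(turnout_bin, share_bin)] += 1
    return grid


# Made-up numbers: two ordinary units, then two with very high
# turnout and very high vote share for the candidate.
sample = [
    (1000, 750, 330),
    (2000, 1500, 690),
    (1000, 980, 950),
    (500, 500, 500),
]
fingerprint = election_fingerprint(sample)
```

In this toy example, the two ordinary units share one cell in the middle of the grid, while the other two land in the cell for the highest turnout and highest vote share. A heavy concentration of units in that corner is the kind of "smearing" to look out for.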

If we replicate this analysis for our elections, we find that there isn't anything that jumps out immediately. You can explore the plots in the following section:

For the presidential race, nothing seems to be out of order, as most of the fingerprints are concentrated around a central mass with minimal "smearing." For the vice presidential race, you can see a bit of bimodality in terms of the winning percentage for MARCOS, BONGBONG, but the voter turnout is not high enough to cause "smearing." This is a symptom of a polarizing candidate: some areas voted heavily for the candidate, and some did not at all. For the senatorial race, nothing is out of order.

What if the fraud was not as widespread, and so is not immediately detectable by simple visual inspection? Perhaps constructing a single index of vote padding risk can allow us to tease out the subtle differences.

Creating a vote padding risk score

The authors of the PNAS¹ paper have devised a simple logarithmic transformation for the vote counts. The distribution of this transformed variable is most likely to be normal (i.e. bell-shaped) for elections with minimal irregularity. Details of this transformation are outlined in the paper. As expected, logarithmic vote counts from the Russian and Ugandan elections show highly negative skewness and highly positive kurtosis, inconsistent with a normal distribution, which has skewness and excess kurtosis of 0.
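As a rough sketch of the computation, assume the transformation takes the form log((N - W) / W), where W is a candidate's votes in a city/municipality and N is the number of voters there. That form and the meaning of N are assumptions here, so check the paper for the exact definition. The function names and data layout below are hypothetical, and the moments are the plain moment-based estimates, not necessarily the ones used for the chart in this post.

```python
import math


def log_vote_rates(units):
    """Return log((N - W) / W) per unit, skipping units where it is undefined.

    `units` holds (n_voters, candidate_votes) tuples (a hypothetical layout).
    """
    return [math.log((n - w) / w) for n, w in units if 0 < w < n]


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    count = len(values)
    if count < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / count
    m2 = sum((v - mean) ** 2 for v in values) / count  # variance
    if m2 == 0:
        raise ValueError("need at least two distinct values")
    m3 = sum((v - mean) ** 3 for v in values) / count
    m4 = sum((v - mean) ** 4 for v in values) / count
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0
```

Under this assumed form, units where a candidate's votes approach the voter count produce very negative values, so a cluster of such units would show up as a long left tail, that is, negative skewness.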

So what does it mean in this case? When we compute the skewness and kurtosis of the logarithmic vote counts, the further they are from 0 (negative skewness and positive kurtosis), the higher the risk of vote padding. Computing these values for all national-level candidates, we can construct the following chart:

How to read this chart: The closer the values are to the top left corner, the higher the risk of vote padding.

Apart from a few party list and senatorial candidates that have understandably strong vote shares in one particular group of cities/municipalities but fall extremely flat in others (BALIGOD, LEVITO, ALONA, KGB, ANG KASANGGA), there seem to be no particular candidates that stand out.

What does this mean?

Let me be clear: This does not mean that there was no electoral fraud; it simply means that the risk of fraud in this particular form (vote padding or ballot stuffing) is low. Remember, data cannot serve as definitive proof; it can only guide investigation and quantify risk. I highly encourage you to go through these important caveats.

Interactive: View the underlying data

If you're interested in finding out more (and potentially sniffing out vote padding for yourself), I highly encourage you to play around with this small Shiny widget. If it does not respond, it might mean that there is too much load on the server. Hover over the points to see more information about how each city/municipality voted.

Important caveats

I've used careful language in presenting this analysis, and that's mainly to avoid misinterpretation; these elections have been very heated, both between candidates and among the general public. I have to make certain things clear:

Statistics can neither prove nor disprove fraud. At most, they can assess the risk of fraud and guide investigation.
The results of an analysis should be taken in the context of its scope, limitations, and assumptions. Sometimes, these are more important than the findings themselves.
Just because this particular analysis shows/does not show signs of electoral irregularity does not mean that there was/wasn't fraud committed. Each analysis is designed to detect a particular kind of fraud only.

Data notes

The data was scraped from the COMELEC's public election results page, as of May 25, 2016. At that time, 96.69% of election returns had been transmitted, and 99.93% of city/municipality certificates of canvass had been received. For a full list of cities and municipalities that have no results, see here.
Data, code, and computations are available on Github.

¹ Klimek, Yegorov, Hanel, Thurner (2012). Statistical detection of systematic election irregularities. Proceedings of the National Academy of Sciences of the United States of America 109(41).

Troy is a data nerd who wants you to give numbers a chance and is passionate about data-driven decision making in businesses and government. City Operations at Uber and data science blogger at tjpalanca.com. The opinions expressed herein are those of the author only and not of his employer nor of GMA News Online.

Tags: eleksyon2016, datajournalism
