| Virgil.GRiffith:: Security : Errata for "Messin' with Texas - Deriving Mother's Maiden Names from Public Records" | [Index] |
In our paper " Messin' with Texas: Deriving Mother's Maiden Names from Public Records ", we did something suboptimal in the way we quantified the guessability of Mother's Maiden Names (MMNs). The major results do not change when switching to the optimal method for characterizing the guessability of MMNs, but the graphs in the paper are more useful.
The mistake is how we quantified the number of guesses it would take to know someone's MMN given certain data about them (their last name, year they were born, county in which they were born, etc.). In the paper we used Shannon Entropy to quantify the guessability of the correct MMN. We never directly stated that the Shannon Entropy of the distribution of MMNs is the expected number of guesses it would take guess the MMN, but it is implied.
Shanon Entropy gives you the expected number of guesses it would take a compromise a MMN if you can ask yes/no questions about arbitrary subsets of x. I.e. "Is the MMN Brown, Smith, Johnson, ... ?" However, the proposed attack scenario is someone calling a bank over the phone and attempting to guess someone's MMN by guessing the high-probability MMNs first. In this scenario, the attacker cannot ask yes/no questions about an arbitrary subset of x, the attacker can only ask about one maiden name at a time. Ex: "My mother's maiden name is 'Smith' -- No? Err.... I mean 'Brown'".
For this attack scenario, the correct metric to use is the algorithm that outputs the expected number of guesses before MMN compromise guessing one MMN at a time, guessing the most probable MMNs first. This algorithm is:
Where the R_n's are sorted in non-increasing order. That is R_{n} >= R_{n+1} for all n. And sum_n( R_n / |x| ) = 1.0 .
To understand this intuitively, the expected number of guesses before MMN compromise is the expected value of compromising the MMN on the n'th guess. So, the probability of guessing the MMN correctly on the first guess is R_1 / number_of_records = R_1 / |x|. Likewise, the probability of guessing the MMN correctly on the second guess is R_2 / number_of_records = R_2 / |x|.
Exchanging Shannon Entropy with expected #guesses until compromise does alter the graphs in the paper, but the correction does not change the major results because the part we focused on, the most vulnerable end where the attacker is only allowed 1 or 2 guesses, the suboptimal measure and the optimal measure are one-to-one.