title Security: Errata for "Messin' with Texas - Deriving Mother's Maiden Names from Public Records" user virgil ip 131.215.44.206 vol 1 lock ******** This is an erratum for academics who have read the paper I wrote with /link http://markus-jakobsson.com Markus Jakobsson on deriving Mother's Maiden Names from public records (published in Winter 2007). In Feb 2008, I discovered a suboptimality in the paper. Apparently no one else has spotted it yet, but here's the fix. I've already corrected the demo at /link http://mmn.virgil.gr/ to use the optimal measure. /bar In our paper " /link http://virgil.gr/50+CryptoBytes-Winter07.pdf Messin' with Texas: Deriving Mother's Maiden Names from Public Records ", we did something suboptimal in the way we quantified the guessability of Mother's Maiden Names (MMNs). The major results do not change when switching to the optimal method for characterizing the guessability of MMNs, but the graphs in the paper are more useful. The mistake is how we quantified the number of guesses it would take to know someone's MMN given certain data about them (their last name, year they were born, county in which they were born, etc.). In the paper we used Shannon Entropy to quantify the guessability of the correct MMN. We never directly stated that the Shannon Entropy of the distribution of MMNs is the expected number of guesses it would take guess the MMN, but it is implied. /center Example plot using Entropy: /center /imagefile CryptoBytes-Winter07-8.png /center( Where Entropy = /imagefile entropy.png /center) Shanon Entropy gives you the expected number of guesses it would take a compromise a MMN if you can ask yes/no questions about /i arbitrary subsets of x. I.e. "Is the MMN Brown, Smith, Johnson, ... ?" However, the proposed attack scenario is someone calling a bank over the phone and attempting to guess someone's MMN by guessing the high-probability MMNs first. In this scenario, the attacker cannot ask yes/no questions about an arbitrary subset of x, the attacker can only ask about one maiden name at a time. Ex: /i "My mother's maiden name is 'Smith' -- No? Err.... I mean 'Brown'". For this attack scenario, the correct metric to use is the algorithm that outputs the expected number of guesses before MMN compromise guessing one MMN at a time, guessing the most probable MMNs first. This algorithm is: /imagefile expectedguesses.png Where the R_n's are sorted in non-increasing order. That is R_{n} >= R_{n+1} for all n. And sum_n( R_n / |x| ) = 1.0 . To understand this intuitively, the expected number of guesses before MMN compromise is the expected value of compromising the MMN on the n'th guess. So, the probability of guessing the MMN correctly on the first guess is R_1 / number_of_records = R_1 / |x|. Likewise, the probability of guessing the MMN correctly on the second guess is R_2 / number_of_records = R_2 / |x|. Exchanging Shannon Entropy with expected #guesses until compromise does alter the graphs in the paper, but the correction does not change the major results because the part we focused on, the most vulnerable end where the attacker is only allowed 1 or 2 guesses, the suboptimal measure and the optimal measure are one-to-one.