When a digital lock's buttons become worn or dirty after the same code has been entered a number of times, knowing that these characters are used in the code greatly reduces the number of sequences one would have to try to break the code (when I say "digital" lock, I mean any lock that uses buttons to enter the digits of a code, which includes electronic keypads on doors or copy machines, as well as mechanical push button locks). This article takes a look at exactly how many sequences N are possible given a certain number x of known characters in a sequence of specified length n. While we can determine x by counting the dirty/worn buttons, n can be determined by either knowing that a particular model lock uses a certain length code, or by listening to clicks or beeps when someone enters the code. If we don't know what n is, to crack the code we would try all sequences starting with the smallest possible n (nx, so start with n = x), then progress to sequences with one added digit until the code is found.

Consider a digital lock which has x dirty or worn buttons; that is, we know the code which opens the lock uses exactly x different characters, but we do not know how many times each character is used, or where in the code they are located. We want to know how many possible sequences there are if the code is n digits long. This is given by the expression below:

The equation grows with x, picking up an extra term every time x increases. These extra terms account for and remove invalid sequences which do not include all x characters in the n long sequence.

The first quantity in the equation is the total number of unique n digit long sequences, with x possible characters available. The total number of sequences includes, for example, sequences where only one character is used n times. But we are only interested in sequences where all x characters are used (how many times each character is used is unknown, we only require that each character be used at least once). When x = 1, only the first term in the equation is needed, and the number of sequences is always 1 (there is only 1 way to make an n long sequence with 1 character).

The second quantity is needed for x >= 2, and eliminates all sequences where only one character is used. The third quantity is needed for x >= 3, and it enumerates all sequences n long with only 2 characters used. Similarly, the third expression is needed for x >= 4, and it enumerates sequences with 3 characters. Similar expressions are added for x >= 5, 6, 7..., and it can be seen that these expressions follow a simple pattern.

The third term, which counts up all sequences that are n long and use exactly 2 characters from an available x characters, is the first term which is not immediately obvious. The quotient in the summation is a binomial coefficient (n choose j), and this tells us how many permutations of j of one character and n - j of another character there are in an n long sequence. The summation counts through all possible values for j and n - j (for example, in a 4 character long sequence, with characters A and B, we can have 1 A and 3 B's, 2 A's and 2 B's, or 3 A's and 1 B). The factor in front of the summation is another binomial coefficient, which counts the number of distinct combinations of 2 different characters out of x available.

The fourth and later quantities operate similar to the third, where multinomial coefficients are now used to count permutations. For the case x = n, the expression reduces to a factorial, where N = x! = n! can be used instead.

Below is a table enumerating possible sequences, with sequences as long as 10 digits using up to 10 different characters.

  Length of sequence (n)
1 2 3 4 5 6 7 8 9 10
Number of characters used (x) 1 1 1 1 1 1 1 1 1 1 1
2 0 2 6 14 30 62 126 254 510 1022
3 0 0 6 36 150 540 1,806 5,796 18,150 55,980
4 0 0 0 24 240 1,560 8,400 40824 186,480 818,520
5 0 0 0 0 120 1,800 16,800 126,000 834,120 5,103,000
6 0 0 0 0 0 720 15,120 191,520 1,905,120 16,435,440
7 0 0 0 0 0 0 5,040 141,120 2,328,480 29,635,200
8 0 0 0 0 0 0 0 40,320 1,451,520 30,240,000
9 0 0 0 0 0 0 0 0 362,880 16,329,600
10 0 0 0 0 0 0 0 0 0 3,628,800

It can be seen that if one were to pick a code n digits long for their lock, it would be wise to pick a code that uses x characters such that x maximizes the number of codes possible if n is publicly known. If a certain lock is known to use codes 5 digits long, for example, it would be best to pick a code using 4 characters. This way, when the buttons become worn or dirty so that the characters used are known, it leaves the maximum possible number of sequences that one would have to try before cracking the code.

The following Python program can be used to calculate the number of possible sequences given an arbitrary sequence length and number of characters used.


#!/usr/bin/env python
# Nick Masluk
# 2010-09-18

from math import factorial

# return the product of all elements in a list
def product(row):
    p = 1
    for value in row:
        p = p * value
    return p

# Define binomial/multinomial coefficients, n choose k.
# Check the type of k (int or list) with an if statement to implement
# polymorphism.
def choose(n, k):
    """Returns the binomial/multinomial coefficient for n choose k."""
    if type(k) == int:
        return factorial(n) / \
        (factorial(k) * factorial(n - k))
    if type(k) == list:
        return factorial(n) / \
        (product(map(factorial, k)) * factorial(n - sum(k)))

# multinomial summations for counting invalid sequences in dse()
def mult_sum(level_index, n, k):
    summ = 0
    if level_index == 0: # if final summation
        for j in range(1, n - sum(k)):
            summ = summ + choose(n, k + [j])
        for j in range(1, n - sum(k) - level_index):
            summ = summ + mult_sum(level_index - 1, n, k + [j])
    return summ

# digital lock sequence enumerator
def dse(n, x):
    """Returns the number of n-long sequences possible using x unique
    if x > n:
        return 0
    if x == n:
        return factorial(n)
    combos = x**n
    if x >= 2:
        combos = combos - x
    if x >= 3:
        for level in range(0, x - 2):
            combos = combos - choose(x, level + 2) * mult_sum(level, n, [])
    return combos

n = input("Length of sequence: ")
x = input("Number of characters used: ")
combos = dse(n, x)
print str(combos) + " possible combination" + ("s" if combos != 1 else "")