Generating the finite subsets of {0, 1}*

{0, 1}* is the set of all strings consisting of the digits 0 and 1:

{ ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, ... }.

Here the ε symbol stands for the zero-length string.

Consider the set containing all the finite subsets of {0, 1}*; that is, the set of all the finite sets whose members are only strings of 0s and 1s:

{}, {ε}, {0}, {ε, 0}, {1}, {ε, 1}, {0, 1}, {ε, 0, 1},
{00}, {ε, 00}, {0, 00}, {ε, 0, 00},
{1, 00}, {ε, 1, 00}, {0, 1, 00}, {ε, 0, 1, 00}, ... }.

Suppose we want to write a program that generates this set, i.e. a program that enumerates all the finite subsets of {0, 1}*. (Of course, since there are infinitely many such subsets, it cannot really store them all explicitly in a list or something of that sort, but it can in principle generate as many as we wish, or, given an n, generate the n-th according to some ordering. That is what we mean by generating this set.) In addition, we want this program to be as short as possible, and it should preferably avoid generating the same set more than once. This web page presents some attempts in various programming languages.

We welcome solutions in other languages, as well as alternative solutions in languages already covered here. Just mail them to me.

Haskell

(Author: Uros Jovanovic)

p [] i=[[i]]
p (x:y) i=[x++[i]]++(p y i)
q (x:y) s=(p s x)++(q y ((p s x)++s))
z []=[]
z (x:y)=["0"++x]++["1"++x]++(z y)
w=["0","1"]++(z w)
g=[[]]++(q w [])

[ToDo: explanation]

Python

def a():
  yield ""
  for s in a(): yield "0" + s; yield "1" + s
def b(M):
  yield [M]
  for s in a():
    if s==M: break
    for t in b(s): yield t + [M]
print []
for s in a():
  for t in b(s): print t

Here a generates the set {0, 1}* and b(M) generates all the sets that contain only the string M and zero or more of the strings that a generates before generating M. The last three lines then call b(M) for every string M generated by a.

If we wanted a real Python generator for the finite subsets of {0, 1}*, we could replace the last three lines by a third generator:

def c():
  yield []
  for s in a():
    for t in b(s): yield t

Note: because this program uses generators, it requires Python 2.2 or later. In fact it also requires the from __future__ import generators statement, though eventually this won't be necessary as generators will be available by default.

Python again

def c(n):
  s = ""
  while n > 0: s = "01"[n%2] + s; n >>= 2
  return s
def x():
  i = 0
  while 1:
    s = c(i);
    i += 1;
    yield [c(j + 1)[1:] for j in range(len(s)) if s[-1 - j] == "1"]

Here, c(n) returns the binary representation of the integer n, except for n = 0 where it returns an empty string. Thus, n = 0, 1, 2, 3, 4, 5, 6, ... is mapped into ε, 1, 10, 11, 100, 101, 110, etc. By taking values of n from 1 onwards and removing the first character, we get ε, 0, 1, 00, 01, 10, etc., which is exactly an enumeration of {0, 1}*. We do not need to worry about overflows because Python (from version 2.2 onwards) automatically promotes ints to longs when necessary, and longs allow us to work with arbitrarily large numbers.

In addition, the binary representation of a number can be used to define which strings from {0, 1}* are present in some subset -- each bit corresponds to a string, and the string is included if the corresponding bit is set. This is used by x() to generate the subsets in turn.

The following is a slightly improved version where c(n) produces the reverse string, allowing us to save a few characters by using s += ... instead of s = ... + s and by accessing its characters as s[j] instead of s[-1-j]. In addition, it avoids whitespace wherever possible.

def c(n):
 s=""
 while n>0:s+="01"[n%2];n>>=2
 return s
def x(i=0):
 while 1:s=c(i);i+=1;yield[c(j+1)[:-1]for j in range(len(s))if s[j]=="1"]

We might save another character by using /= instead of >>=, but Python will eventually begin converting results of division into floats if the actual mathematical result of the division would not be an integer, and to avoid this we would then have to use the //= operator, which isn't any shorter.

 

This page is maintained by Janez Brank.
Last changed on 15 May 2002.

[To the index.]