Comparisons

All Python objects also respond to comparisons: tests for equality, relative magnitude, and so on. Python comparisons always inspect all parts of compound objects until a result can be determined. In fact, when nested objects are present, Python automatically traverses data structures to apply comparisons from left to right, and as deeply as needed. The first difference found along the way determines the comparison result.

This is sometimes called a recursive comparison—the same comparison requested on the top-level objects is applied to each of the nested objects, and to each of their nested objects, and so on, until a result is found. Later in this book—in Chapter 19—we’ll see how to write recursive functions of our own that work similarly on nested structures. For now, think about comparing all the linked pages at two websites if you want a metaphor for such structures, and a reason for writing recursive functions to process them.

In terms of core types, the recursion is automatic. For instance, a comparison of list objects compares all their components automatically until a mismatch is found or the end is reached:

>>> L1 = [1, ('a', 3)]           # Same value, unique objects
>>> L2 = [1, ('a', 3)]
>>> L1 == L2, L1 is L2           # Equivalent? Same object?
(True, False)

Here, L1 and L2 are assigned lists that are equivalent but distinct objects. As a review of what we saw in Chapter 6, because of the nature of Python references, there are two ways to test for equality:

The == operator tests value equivalence. Python performs an equivalence test, comparing all nested objects recursively.
The is operator tests object identity. Python tests whether the two are really the same object (i.e., live at the same address in memory).

In the preceding example, L1 and L2 pass the == test (they have equivalent values because all their components are equivalent) but fail the is check (they reference two different objects, and hence two different pieces of memory). Notice what happens for short strings, though:

>>> S1 = 'spam'
>>> S2 = 'spam'
>>> S1 == S2, S1 is S2
(True, True)

Here, we should again have two distinct objects that happen to have the same value: == should be true, and is should be false. But because Python internally caches and reuses some strings as an optimization, there really is just a single string 'spam' in memory, shared by S1 and S2; hence, the is identity test reports a true result. To trigger the normal behavior, we need to use longer strings:

>>> S1 = 'a longer string'
>>> S2 = 'a longer string'
>>> S1 == S2, S1 is S2
(True, False)

Of course, because strings are immutable, the object caching mechanism is irrelevant to your code—strings can’t be changed in place, regardless of how many variables refer to them. If identity tests seem confusing, see Chapter 6 for a refresher on object reference concepts.

As a rule of thumb, the == operator is what you will want to use for almost all equality checks; is is reserved for highly specialized roles. We’ll see cases later in the book where both operators are put to use.

Relative magnitude comparisons are also applied recursively to nested data structures:

>>> L1 = [1, ('a', 3)]
>>> L2 = [1, ('a', 2)]
>>> L1 < L2, L1 == L2, L1 > L2        # Less, equal, greater: tuple of results
(False, False, True)

Here, L1 is greater than L2 because the nested 3 is greater than 2. By now you should know that the result of the last line is really a tuple of three objects—the results of the three expressions typed (an example of a tuple without its enclosing parentheses).

More specifically, Python compares types as follows:

Numbers are compared by relative magnitude, after conversion to the common highest type if needed.
Strings are compared lexicographically (by the character set code point values returned by ord), and character by character until the end or first mismatch ("abc" < "ac").
Lists and tuples are compared by comparing each component from left to right, and recursively for nested structures, until the end or first mismatch ([2] > [1, 2]).
Sets are equal if both contain the same items (formally, if each is a subset of the other), and set relative magnitude comparisons apply subset and superset tests.
Dictionaries compare as equal if their sorted (key, value) lists are equal. Relative magnitude comparisons are not supported for dictionaries in Python 3.X, but they work in 2.X as though comparing sorted (key, value) lists.
Nonnumeric mixed-type magnitude comparisons (e.g., 1 < 'spam') are errors in Python 3.X. They are allowed in Python 2.X, but use a fixed but arbitrary ordering rule based on type name string. By proxy, this also applies to sorts, which use comparisons internally: nonnumeric mixed-type collections cannot be sorted in 3.X.

In general, comparisons of structured objects proceed as though you had written the objects as literals and compared all their parts one at a time from left to right. In later chapters, we’ll see other object types that can change the way they get compared.

Python 2.X and 3.X mixed-type comparisons and sorts

Per the last point in the preceding section’s list, the change in Python 3.X for nonnumeric mixed-type comparisons applies to magnitude tests, not equality, but it also applies by proxy to sorting, which does magnitude testing internally. In Python 2.X these all work, though mixed types compare by an arbitrary ordering:

c:\code> c:\python27\python
>>> 11 == '11'                          # Equality does not convert non-numbers
False
>>> 11 >= '11'                          # 2.X compares by type name string: int, str
False
>>> ['11', '22'].sort()                 # Ditto for sorts
>>> [11, '11'].sort()

But Python 3.X disallows mixed-type magnitude testing, except numeric types and manually converted types:

c:\code> c:\python33\python
>>> 11 == '11'                          # 3.X: equality works but magnitude does not
False
>>> 11 >= '11'
TypeError: unorderable types: int() > str()

>>> ['11', '22'].sort()                 # Ditto for sorts
>>> [11, '11'].sort()
TypeError: unorderable types: str() < int()

>>> 11 > 9.123                          # Mixed numbers convert to highest type
True
>>> str(11) >= '11', 11 >= int('11')    # Manual conversions force the issue
(True, True)

Python 2.X and 3.X dictionary comparisons

The second-to-last point in the preceding section also merits illustration. In Python 2.X, dictionaries support magnitude comparisons, as though you were comparing sorted key/value lists:

C:\code> c:\python27\python
>>> D1 = {'a':1, 'b':2}
>>> D2 = {'a':1, 'b':3}
>>> D1 == D2                            # Dictionary equality: 2.X + 3.X
False
>>> D1 < D2                             # Dictionary magnitude: 2.X only
True

As noted briefly in Chapter 8, though, magnitude comparisons for dictionaries are removed in Python 3.X because they incur too much overhead when equality is desired (equality uses an optimized scheme in 3.X that doesn’t literally compare sorted key/value lists):

C:\code> c:\python33\python
>>> D1 = {'a':1, 'b':2}
>>> D2 = {'a':1, 'b':3}
>>> D1 == D2
False
>>> D1 < D2
TypeError: unorderable types: dict() < dict()

The alternative in 3.X is to either write loops to compare values by key, or compare the sorted key/value lists manually—the items dictionary method and sorted built-in suffice:

>>> list(D1.items())
[('b', 2), ('a', 1)]
>>> sorted(D1.items())
[('a', 1), ('b', 2)]
>>>
>>> sorted(D1.items()) < sorted(D2.items())          # Magnitude test in 3.X
True
>>> sorted(D1.items()) > sorted(D2.items())
False

This takes more code, but in practice, most programs requiring this behavior will develop more efficient ways to compare data in dictionaries than either this workaround or the original behavior in Python 2.X.