Introduction
In programming languages and mathematical notation, there are a few
different ways to do array indexing - the first choice is to start at 0
or at 1. If you want to refer to an interval (or slice) [ a :
b ] of an array, there is a second choice - should the interval
include or exclude point b. And thirdly, there is the option of using
negative array indexing, and deciding what that should mean.
The node "Array indexes as measure of a language's analness" mentions
that there are different schools of thought on this
subject. Personally I believe there is One True Way of indexing, and
all the rest are just bad design. In this node I will explain what the
right way to do indexing and slicing is, and why. Not incidentally,
Python designer Guido van Rossum had just the same thoughts, and I
will give Python examples.
One True Way of Indexing
Given an array/list (from now on list, since that's the Python
term) L, the One True Way of indexing is:
- L[0] is the first
item of the list.
- L[a:b] is the slice of all the items L[x],
for a <= x < b. So this excludes the endpoint, it is the half-open
interval that is often written [ a, b ) in math.
- Negative indices count from the end of the list. That is, L[-x] =
L[length(L)-x] for x > 0.
What makes one way of indexing better than the other? It's better if it can lead
to simpler expressions in common situations. Off-by-one errors are an extremely common
source of bugs. The fewer "-1" or "+1" terms clutter your expressions
the better. The One True Way of indexing leads to very elegant
expressions, especially when intervals and negative indexing are
involved. Without the intervals, I don't think there is much
difference between starting with 0 or 1. With them...
Python writes a slice L[0:a] as L[:a], and
L[a:len(L)] as L[a:] (and L[:] for the
identity slice, or copy of the original list).
Examples
In the following list, the equations hold for the One True Way;
in italics are the equivalent expressions assuming that we
start counting at 1, an interval includes the end point, and there is
no negative indexing:
- The length of a slice L[a:b] is b-a (if b >= a, 0 otherwise) (b-a+1)
- The first n characters of L: L[:n] (also L[:n])
- The last n characters of L: L[-n:] (L[len(L)-n+1:])
- The identity slice. L == L[0:len(L)] == L[:] (L[1:len(L)])
- The empty slice: L[a:a] is empty for any a. (perhaps L[a:a-1]?)
- A slice of length n, from point a, is L[a:a+n] (L[a:a+n-1])
- For any index a, L == L[0,a]+L[a,len(L)] == L[:a]+L[a:] (L[1:a]+L[len(L)-a+1])
- Given a slice L[a:b] and an index c between them, L[a:b] == L[a:c]+L[c:b] (works for negative indices as well) (L[a:c-1]+L[c:b])
The negative indexing makes it easy to do array operations
'backwards', and most of the above equations still hold for negative a
and b. Not a +1 or -1 in sight for the True Way! On the other hand,
I'm even in doubt whether I wrote all the expressions correctly for
the other option...
Visualization help
One thing people often complain about is that this style of
indexing is counter-intuitive. People start counting items at 1, and
when they say "items 1 to 3" they mean items 1, 2 and 3.
If you have trouble with this, you should visualize lists a little
differently; the indices refer to points between the items in the
list. This way, there is no ambiguity possible (no different interpretations), and the math works out exactly the same. Consider the array with the letters of the alphabet:
+---+---+---+---+---+---+---+---+---+-------
| a | b | c | d | e | f | g | h | i | ....
+---+---+---+---+---+---+---+---+---+------
0 1 2 3 4 5 6 7 8 9
Here, it is clear that [2:5] refers to c, d, and e, and nothing else. It's also natural that you would slice between numbers and not right through them.
The same ideas work well in Computer Science, correctness proofs and the like; this is not just for programming languages. Use half-open intervals.