Chapter 11: Arrays

In the last chapter we had a long sequence of code that looked like this:

if sign == "Aries":
    adjectives = "assertive, impulsive, defensive"
elif sign == "Taurus":
    adjectives = "resourceful, thorough, indulgent"
elif sign == "Gemini":
    adjectives = "logical, inquisitive, fast"
elif sign == "Cancer":
    ...

The whole if/else sequence is verbose, repetitive, and tedious to write. That should suggest to you that the computer should be doing the work for you instead. (Computers excel at repetitive and tedious tasks.) What we need here is an array. You can use an array to associate one thing with another. In this case we want to associate the string “Aries” with the string “assertive, impulsive, defensive”. We also want to associate the string “Taurus” with the string “resourceful, thorough, indulgent”. And so on for the other 10 signs.

Once you’ve made the 12 associations, you can just ask the computer for what’s associated with “Gemini” and you’ll get the adjectives for it. An array is a variable. So far we’ve dealt with variables that stored numbers (like 25) and strings (like “Yogurt”). You can store an array in a variable too. Let’s say that our array is called signs and we want to tell the computer about the Gemini association. We’d write it like this:

signs["Gemini"] = "logical, inquisitive, fast"

Admittedly, this syntax looks strange, but it makes sense once you start using arrays more. Here we’re saying that in the array called signs, we want to associate the string “Gemini” with the string “logical, inquisitive, fast”. It’s a little bit as though we had made a special variable like this:

signs_gemini = "logical, inquisitive, fast"

This would be a normal variable called signs_gemini that had the value “logical, inquisitive, fast”. The advantage of using an array is that, later, when you want to look up the adjectives for a sign, you can do this:

print "What's your sign?"
sign = input()
adjectives = signs[sign]
print adjectives

You get a string from the user, using the “input()” function, and put the results into the variable sign. You then look up the string associated with that string in the array signs, and put the result into the variable adjectives.

It’s pretty subtle what’s going on here, so we’ll go over it one more time. Normally when you store a string into a variable, like this:

signs_gemini = "logical, inquisitive, fast"

you can only get the value back by referencing the variable by name, like this:

print signs_gemini

The name of the variable is set in stone when you write the program. If the user is not a Gemini, then this print statement won’t display the correct thing. That’s why we needed that long sequence of if/else statements: to make sure we were displaying the right adjectives.

With arrays, we only need a single variable. Instead of associating adjectives with a variable called signs_gemini, we associate them with the string “Gemini” in the array signs. The fact that we’re using a string with the contents “Gemini” is the important part. Since we’re setting up the association with a string, that means that we can get the value out using a string, the string we get from the user. So no more if/else statements.
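If you want to try this in a real language, here is a minimal sketch in Python 3, where this kind of array is called a dictionary. Two things in it are not from the examples above: the array has to be created empty (with {}) before anything can be stored in it, and the function name lookup_adjectives is made up for this illustration.

```python
# Python 3 calls this kind of array a "dictionary".
# Unlike the chapter's examples, Python needs the empty array created first.
signs = {}
signs["Aries"] = "assertive, impulsive, defensive"
signs["Taurus"] = "resourceful, thorough, indulgent"
signs["Gemini"] = "logical, inquisitive, fast"


def lookup_adjectives(sign):
    # Hypothetical helper: the string stored in `sign` is the key we look up.
    return signs[sign]


# The key can come from a variable, such as one filled in by the user.
sign = "Gemini"
print(lookup_adjectives(sign))
```

The same line, signs[sign], works for every sign in the array. Nothing in the lookup mentions “Gemini” by name.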

Perhaps we should write a small program to illustrate how arrays work. Let’s write a program that, given rock, paper, or scissors, tells us what object it beats. Without arrays we’d write it like this:

print "What object?"
object = input()
if object == "rock":
    rule = "Breaks scissors."
elif object == "paper":
    rule = "Covers rock."
elif object == "scissors":
    rule = "Cuts paper."
print rule

With arrays we’d write it like this:

rules["rock"] = "Breaks scissors."
rules["paper"] = "Covers rock."
rules["scissors"] = "Cuts paper."

print "What object?"
object = input()
rule = rules[object]
print rule

The first section sets up the associations in the array rules. There are three associations, one for each rule. Then we get the name of the object from the user, look up the rule associated with that object, and print it out. Let’s “play computer”. In the first line, we make a new variable called rules, which will be an array. In it we associate the string “rock” with the string “Breaks scissors.” (The association is only one way. You can only go from the first string to the second.) The second and third lines work the same way. After the third line, the array rules contains three associations. After we get the object from the user, we reach this line:

rule = rules[object]

Let’s say the object variable contains the string “paper”. In this line we’re asking the computer to find the array called rules, and see if the string “paper” was ever associated with anything. In this case the computer finds (in memory) that indeed the string “paper” was associated with the string “Covers rock.” That gets put into the variable rule, and in the next line gets printed out.
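The lookup, and the fact that the association only goes one way, can be checked with a short sketch in Python 3 (where the array is called a dictionary and must be created with {} first). The variable is named obj here only to avoid clashing with a name Python already uses. The word in is a Python feature for asking whether a key has an association.

```python
rules = {}
rules["rock"] = "Breaks scissors."
rules["paper"] = "Covers rock."
rules["scissors"] = "Cuts paper."

# Pretend the user typed "paper".
obj = "paper"
rule = rules[obj]
print(rule)

# The association only goes from the first string to the second.
print("paper" in rules)         # True: "paper" was associated with something
print("Covers rock." in rules)  # False: nothing is associated with the rule text
```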

The second version of this program, the one that uses an array, is much nicer to look at. It’s cleaner, shorter, and easier to read. You can glance at the first three lines to see what the rules are. No need to follow an awkward sequence of if/else statements. For those reasons alone it would be worth using an array. But there are other advantages, too. With a sequence of if/else statements, the sequence is set in stone (hard-coded) when you write the program. But an array is just a variable like any other, so it can be set when the program is run! Instead of hard-coding our rules, we could get them from the user. Here’s a program that does exactly that:

while true:
    print "New rule. What object?"
    object = input()

    if object == "done":
        break

    print "What rule?"
    rule = input()

    rules[object] = rule

print "What object?"
object = input()
rule = rules[object]
print rule

The first part of this program (the while loop) is new. The second part is identical to the program we had above. So we’ve replaced the three lines that hard-coded the three rules with a while loop that creates the rules on the fly. Let’s walk through that code, because what it’s doing is pretty important to understanding the power of programming and of arrays.

We had previously seen a while loop, like this:

while count >= 1:
    ...

This meant, “Do the indented statements as long as the value of the count variable is greater than or equal to 1.” Strictly speaking, what it’s really saying is, “Do the indented statements as long as the formula count >= 1 is true.” Here our while statement looks like this:

while true:
    ...

This means, “Do the indented statements as long as true is true.” Well, the formula true is always true. This is called an infinite loop. It’s really saying, “Do the indented statements forever.” We’ll deal with the “infinite” part later.

Let’s take a look at the indented statements. We’re trying to build a new rule. Each rule has two parts: the object (a string) and a description of what it beats (another string). We must get both from the user in order to build up our set of rules. We first get the object:

    print "New rule. What object?"
    object = input()

Let’s skip the if statement for now. We then get the rule:

    print "What rule?"
    rule = input()

and we associate the object with the rule in the rules array:

    rules[object] = rule

That last line is just like the lines we had before:

rules["rock"] = "Breaks scissors."

except that the strings are given to us by the user instead of being hard-coded in the program. After making this association, we reach the end of the indented statements, and the while loop starts back at the top again, forever. We don’t really want it to go on forever, or we’d never get to the second part of the program where we make use of the rules array. So we say that if the user enters the object “done”, then we stop the loop and move on with our program. That’s the if statement right after getting the object:

    if object == "done":
        break

If the object that the user entered is the string “done”, then do the indented statement, which is the statement break. We haven’t seen this statement before. It means, “Stop the loop right now, jump to the statements right after it.” It breaks the loop. That’s how we get around the problem of having an infinite loop. You can use break statements with non-infinite loops too.
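As a small illustration of break in a loop that is not infinite, here is a sketch in Python 3 based on the counting loop from earlier. The function count_down and its stop_at parameter are invented for this example.

```python
def count_down(start, stop_at):
    # Count down from start toward 1, but leave early if we reach stop_at.
    count = start
    while count >= 1:
        if count == stop_at:
            break  # Stop the loop right now, even though count >= 1 is still true.
        print(count)
        count = count - 1
    # Return where the counting stopped, so the caller can tell what happened.
    return count


count_down(5, 2)   # displays 5, 4, 3 and then breaks when count is 2
count_down(3, 10)  # never reaches 10, so the loop ends normally after 3, 2, 1
```

In the first call the break ends the loop. In the second call the break never runs, and the loop stops the usual way, when the condition at the top becomes false.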

You might think it’s strange to set up an infinite loop and then to break out of it half-way through. We do it because the while loop is set up to test the condition (like count >= 1) at the top of the loop. Then the whole set of indented statements is run, and then the condition (the formula after while) is checked again. The while loop can only stop on its own at the top of the loop. But in our case we don’t know that we want to stop until the user has entered “done” for the object, and that’s half-way through the set of indented statements. While loops aren’t set up to check for conditions there. So we trick the while loop into going on forever, and break out of it on our own terms half-way through.

We can run the program and enter any set of rules we want:

New rule. What object?
bear
What rule?
Mauls ninja.
New rule. What object?
cowboy
What rule?
Shoots bear.
New rule. What object?
ninja
What rule?
Dodges bullets of cowboy.
New rule. What object?
done
What object?
ninja
Dodges bullets of cowboy.

Try that with an if/else sequence!
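The session above can be reproduced with a sketch in Python 3. So that it runs without anyone typing, the user’s answers are supplied as a list and handed out one at a time. That substitution, and the names build_rules, typed, and answers, are assumptions made for this illustration. In a real program each next(answers) would be a call to input().

```python
def build_rules(lines):
    # `lines` stands in for what the user would type, one entry per question.
    answers = iter(lines)
    rules = {}
    while True:
        obj = next(answers)    # answer to "New rule. What object?"
        if obj == "done":
            break              # leave the loop half-way through
        rule = next(answers)   # answer to "What rule?"
        rules[obj] = rule
    return rules


typed = ["bear", "Mauls ninja.",
         "cowboy", "Shoots bear.",
         "ninja", "Dodges bullets of cowboy.",
         "done"]
rules = build_rules(typed)
print(rules["ninja"])
```

If the list of answers runs out before “done” appears, this sketch stops with an error. The interactive version would simply keep waiting for the user.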

Those are the basics of arrays. They’re a bit like a database that you can build up on the fly and then use later. There are a few more details to discuss. Firstly, what happens when we look up something that was never put into the array? Our original array example looked like this:

rules["rock"] = "Breaks scissors."
rules["paper"] = "Covers rock."
rules["scissors"] = "Cuts paper."

print "What object?"
object = input()
rule = rules[object]
print rule

What if the user enters the string “duck”? That was never associated with anything in the array rules. What should the computer do when it tries to run this line:

rule = rules[object]

and the object variable is the string “duck”? This depends on the language. Some languages stop the program and show an error message, and some just set the rule variable to an empty string. In some languages it’s possible to check whether a particular association exists. In Python, for example, you could write:

if rules.has_key(object):
    rule = rules[object]
else:
    print "That's not a valid object."

That first line has a very strange syntax, with the has_key function after a period. We won’t explain it here, but basically the computer is being asked whether the rules array has an association for the key object. (The key of an array is the string that you look up. The value is what the key is associated with.) Note that has_key exists only in Python 2; Python 3 removed it, and there you would write if object in rules: instead.
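Here is a sketch of both behaviors in Python 3. Python is one of the languages that stops with an error (it calls the error a KeyError) when you look up a missing key with rules["duck"], so the sketch checks first. The function name find_rule is made up for this example.

```python
rules = {}
rules["rock"] = "Breaks scissors."
rules["paper"] = "Covers rock."
rules["scissors"] = "Cuts paper."


def find_rule(obj):
    # "obj in rules" asks whether the key obj has an association.
    if obj in rules:
        return rules[obj]
    else:
        return "That's not a valid object."


print(find_rule("paper"))
print(find_rule("duck"))

# get() hands back a fallback value (here an empty string) for a missing key,
# which imitates the languages that quietly give you an empty string.
print(rules.get("duck", ""))
```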

So far we have been using arrays with strings. Both the keys and the values have been strings:

rules["rock"] = "Breaks scissors."

Can we use anything else but strings? The value (in this case “Breaks scissors.”) can be anything you could assign to a normal variable, such as a string, an integer, a real number, or even another array. The key (in this case “rock”) can be a string or an integer, like 25. If you wanted to keep some information about various years, you could write:

years[1066] = "Battle of Hastings"

Or if you wanted to keep track of your favorite five songs, in order, you could write:

songs[1] = "Imagine"
songs[2] = "Baby I Love You"
songs[3] = "Hallelujah I Just Love Her So"
songs[4] = "Only The Good Die Young"
songs[5] = "Holiday"

You could then display them, in order:

for i = 1 to 5:
    print songs[i]

The second example shows the advantage of using an integer key: you can use math (like a loop counter) to calculate the key. In this case we set i to 1 and then increment it, displaying each song, until we reach 5. In fact, arrays that have small integers for keys are treated specially by the computer, because it can make them run much faster. Many languages have special rules regarding these kinds of arrays, with extra restrictions (like a fixed number of entries) but extra speed. Conceptually, though, they’re just like the other arrays we’ve been talking about.
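In Python 3 the two kinds of array look like this. This is only a sketch: the dictionary version follows the songs example directly, while the list version shows Python’s special integer-keyed array, whose keys are assigned automatically and start at 0 rather than 1. The name song_list is invented for this example.

```python
# General array (a dictionary) with integer keys chosen by us.
songs = {}
songs[1] = "Imagine"
songs[2] = "Baby I Love You"
songs[3] = "Hallelujah I Just Love Her So"
songs[4] = "Only The Good Die Young"
songs[5] = "Holiday"

# range(1, 6) produces 1, 2, 3, 4, 5 (it stops just before 6).
for i in range(1, 6):
    print(songs[i])

# Special integer-keyed array (a list). Python numbers the entries itself,
# starting from 0, so the first song is song_list[0].
song_list = ["Imagine",
             "Baby I Love You",
             "Hallelujah I Just Love Her So",
             "Only The Good Die Young",
             "Holiday"]
print(song_list[0])
```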

Let’s finish this chapter by re-writing our astrological program using arrays. We’ll use Python’s strange has_key function to detect that an invalid sign has been given to us:

def get_sign_information(sign):
    signs["Aries"] = "assertive, impulsive, defensive"
    signs["Taurus"] = "resourceful, thorough, indulgent"
    signs["Gemini"] = "logical, inquisitive, fast"
    signs["Cancer"] = "protective, sensitive, clinging"
    signs["Leo"] = "generous, proud, theatrical"
    signs["Virgo"] = "practical, efficient, critical"
    signs["Libra"] = "co-operative, fair, lazy"
    signs["Scorpio"] = "passionate, sensitive, anxious"
    signs["Sagittarius"] = "free, straightforward, careless"
    signs["Capricorn"] = "prudent, cautious, suspicious"
    signs["Aquarius"] = "democratic, unconventional, detached"
    signs["Pisces"] = "imaginative, sensitive, distracted"

    if signs.has_key(sign):
        adjectives = signs[sign]
    else:
        print "That's not a valid sign."
        stop()

    return adjectives

def display_sign_information(whose, prefix):
    print "What's " + whose + " sign?"
    sign = input()
    adjectives = get_sign_information(sign)
    print prefix + " " + adjectives + "."

display_sign_information("your", "You are")
display_sign_information("your mate's", "Your mate is")
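For comparison, here is one way the same program might look in Python 3. It is a sketch, not a line-for-line translation. The array is built once, outside the function, instead of being rebuilt on every call. The check uses in rather than has_key. Instead of stopping the program, an invalid sign produces a message. The sign is also passed in directly rather than read from the user, so that the sketch runs on its own, and the function name describe_sign is made up.

```python
# Build the array once, when the program starts.
signs = {}
signs["Aries"] = "assertive, impulsive, defensive"
signs["Taurus"] = "resourceful, thorough, indulgent"
signs["Gemini"] = "logical, inquisitive, fast"
signs["Cancer"] = "protective, sensitive, clinging"
signs["Leo"] = "generous, proud, theatrical"
signs["Virgo"] = "practical, efficient, critical"
signs["Libra"] = "co-operative, fair, lazy"
signs["Scorpio"] = "passionate, sensitive, anxious"
signs["Sagittarius"] = "free, straightforward, careless"
signs["Capricorn"] = "prudent, cautious, suspicious"
signs["Aquarius"] = "democratic, unconventional, detached"
signs["Pisces"] = "imaginative, sensitive, distracted"


def describe_sign(prefix, sign):
    # Hypothetical helper: returns the sentence to display for this sign.
    if sign in signs:
        return prefix + " " + signs[sign] + "."
    else:
        return "That's not a valid sign."


print(describe_sign("You are", "Gemini"))
print(describe_sign("Your mate is", "Duck"))
```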