Ruby Iterators and Closures

One of the most beloved features of Ruby is its closure-based iterator syntax. This kind of functionality can be replicated in some other languages, but it tends not to be because the language itself makes it cumbersome. There is a particular magic in closures that can be hard to grasp for programmers coming from languages such as PHP or Java, which for a long time had no closures at all, and even for a lot of Perl and JavaScript programmers, because those languages have closures but don't emphasize them. When I first came to Ruby I soon learned how to use its iterators, but it was several months before I actually did the mental gymnastics needed to understand the concept beneath the syntactic sugar.

First we need some brief definitions (the respective nodes have much more):

Closure: a function along with the environment (variables) in which it was defined.

Iterator: a means of looping through a collection of some sort (array, hash, etc.).

In Ruby an iterator is simply a method that somehow loops over the contents of an object. The object is often a collection type, such as Array or Hash, but an iterator can easily be defined to iterate over anything. For instance, Ruby's built-in File.foreach method works as an iterator that hands back one line from the file on each iteration.

The magic of Ruby iterators is that they can call an anonymous function (known as a block in Ruby) that is passed in when the iterator is called. A block looks much like a block in many other languages. Example:

a = [ 1, 2, 3 ]
a.each { |x| print x }

would produce output:

123

So, to clarify the syntax:

  • a is an array object
  • each is a basic iterator method defined for the Array class
  • { } define the block (anonymous function)
  • |x| is the parameter passed to the block inside the iterator
  • print x is the body of the block.
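The same syntax works with iterators over things that are not arrays or hashes at all. Here is a small sketch using a few of Ruby's built-in iterators on integers and strings. The variable names are mine, chosen only for illustration:

```ruby
# Integer#times calls the block with 0, 1, 2.
total = 0
3.times { |i| total += i }

# Integer#upto walks from 1 up to 3, inclusive.
squares = []
1.upto(3) { |n| squares << n * n }

# String#each_line hands the block one line at a time.
lines = []
"first\nsecond\n".each_line { |line| lines << line.chomp }
```

Each call has the same shape as a.each above: an object, an iterator method, and a block that says what to do with each item.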

Other than being concise, what's so great about this syntax? How would it be any different from a traditional loop such as:

a = [ 1, 2, 3 ]
for i in 0...a.size
  print a[i]
end

The iterator is a method that defines the overall semantics of an operation, such as how to loop over the members of the collection and what to return. By accepting a block, that method can then be customized by the programmer at every call. In effect you are decoupling the overall action to be performed on the collection (in this case: loop through it) from the action to be performed on each individual member (in this case: print it). However, there's more to it than that.

Ruby Iterators

Let's look at how iterators are implemented. The each method of the Array class could be defined internally like this:

def each
  for i in 0...size
    yield(self[i])
  end
end

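The critical part here is the yield keyword. This is what calls the block that was passed in, handing it self[i] as its parameter. As you can see, each is pretty straightforward. each is called for its side effects rather than for a useful return value (the built-in Array#each simply returns the array itself), but most other iterators do return something meaningful. Take find, for example: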

def find
  for i in 0...size
    if yield(self[i])
      return self[i]
    end
  end
  return nil
end

To continue with our previous example, you can call find like this:

a = [ 1, 2, 3 ]
if a.find { |x| x == 2 }
  print "FOUND 2!"
else
  print "DIDN'T FIND 2!"
end

Just to stem any confusion over how this is interpreted, note that the value of the last expression evaluated in a Ruby method or block is returned automatically, even if there is no explicit return statement. So in this case the array items are passed to the block one by one, and for each item the expression x == 2 is evaluated and its result is returned to find. find returns the item immediately if the block returns a value that counts as true; if it reaches the end without finding such an element, it returns nil.
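To try the each and find definitions above without touching the real Array class, they can be dropped into a small class of their own. This is only a sketch: NumberList is a hypothetical wrapper I made up for illustration, and Ruby's actual built-in implementations are not written this way.

```ruby
# NumberList is a hypothetical class, used only to host the two iterators.
class NumberList
  def initialize(*items)
    @items = items
  end

  # Call the block once per element, in order.
  def each
    for i in 0...@items.size
      yield(@items[i])
    end
    self # mirrors the built-in Array#each, which returns the array itself
  end

  # Return the first element for which the block returns a true value.
  def find
    for i in 0...@items.size
      return @items[i] if yield(@items[i])
    end
    nil # reached the end without a match
  end
end

list = NumberList.new(1, 2, 3)

seen = []
list.each { |x| seen << x }

found   = list.find { |x| x == 2 }   # the block's last expression is its result
missing = list.find { |x| x > 10 }   # no element satisfies the block
```

Here found ends up as 2 and missing as nil, which is why the result of find can be used directly as the condition of an if.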

Closures

Hopefully by now you are starting to see the utility of Ruby's block-based iterators. Certainly blocks used this way can be valuable. However, so far nothing I've shown has required closures. That is to say, we aren't using any previously defined variables in our blocks. So the block could just as well be a traditional method or function defined with a name, and passed to the iterator by reference—something you could do in PHP or Java although it wouldn't be pretty.

No, the true value of Ruby blocks comes from the closure property. Consider the following example:

birthdays = ["June 10", "August 20", "December 19"]
my_birthday = database.get_my_birthday()
birthdays.find { |x| x == my_birthday }

Without closures, my_birthday would have to be a parameter passed into the block. But that would require the find method to pass two arguments in its yield statement. Even worse, find would need to have my_birthday passed into it so it could in turn pass it to the block. So we would have something like:

birthdays = ["June 10", "August 20", "December 19"]
my_birthday = database.get_my_birthday()
birthdays.find_equal_to(my_birthday) { |x,y| x == y }

This is not only longer and uglier than the original with closures, it is far too specific to be of any general use. At this point you may as well hardcode the comparison into the find_equal_to method (renamed for clarity). Sure, you can still customize the block, but you've lost nearly all of the utility of the find method, so why bother when you could just write a handful of such special-purpose functions to cover everything your program needs?
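The closure version can be run as-is once the database call is swapped for a literal. In this sketch the value "August 20" is made up and simply stands in for database.get_my_birthday(), and the comparisons counter is my own addition to show that a block can write to outer variables as well as read them:

```ruby
birthdays = ["June 10", "August 20", "December 19"]
# Stand-in for database.get_my_birthday() from the text; the value is made up.
my_birthday = "August 20"

# The block reads my_birthday straight from the surrounding scope.
match = birthdays.find { |x| x == my_birthday }

# A block can also update outer variables, here counting comparisons made.
comparisons = 0
birthdays.find { |x|
  comparisons += 1
  x == my_birthday
}
```

find itself never learns that my_birthday or comparisons exist. It just yields each element, and the block carries the rest of its environment along with it.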

Blocks in General

Earlier I mentioned decoupling the operations on a collection from the operations on its individual members. This is the Ruby iterator paradigm. However, it's critically important to understand that Ruby blocks allow this decoupling to occur anywhere, not just between collections and their members. Any method can take a block as its last parameter, and can yield to that block whenever it chooses. Iterators are just the most common use of blocks in Ruby, but they are far from the only one.
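As one illustration of a block used outside of iteration, here is a sketch of a method that decides when, and how many times, to run the block it is given. with_retries is a hypothetical helper of my own, not part of Ruby's standard library. The do ... end form used to call it is simply the multi-line spelling of { }.

```ruby
# Hypothetical helper: run the block, retrying on errors up to `attempts` times.
def with_retries(attempts)
  tries = 0
  begin
    tries += 1
    yield(tries)                  # hand the attempt number to the block
  rescue StandardError
    retry if tries < attempts     # run the begin section again
    raise                         # out of attempts: re-raise the last error
  end
end

calls = 0
result = with_retries(3) do |attempt|
  calls += 1
  raise "flaky" if attempt < 3    # simulate a failure on the first two tries
  "ok on attempt #{attempt}"
end
```

The retry policy lives in the method and the work to be retried lives in the block, which is the same split as between each and print x earlier.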

In general, blocks allow on-the-fly specialization of methods. Without closures a block would be nothing more than an anonymous function, so you could specialize the code of the method but not the data. With closures, all the local variables in scope where the block is written come along for the ride, so your custom code has its whole surrounding context to work with. I would argue that customizing on the data is the more useful half of the equation, since algorithms tend to be general whereas data is unique to each application.
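A small sketch of specializing on data: the block below is the same code on every call, and only the captured value changes. names_longer_than is a hypothetical helper of mine, while select is Ruby's built-in iterator that keeps the elements for which the block returns a true value.

```ruby
# Hypothetical helper: the block's code is fixed, but it closes over min_length.
def names_longer_than(names, min_length)
  names.select { |name| name.length > min_length }
end

names = ["Al", "Bea", "Carol"]
longer_than_two   = names_longer_than(names, 2)
longer_than_three = names_longer_than(names, 3)
```

select knows nothing about min_length, yet each call filters against a different limit because the block brings that variable with it.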