Lessons from reading ostruct.rb

During this month's Scottish Ruby User Group meeting we paired up read some code. We chose the source for OpenStruct as it was small and self-contained enough to get through in the hour or so available.

I expected it to be dull, but it was great fun and we all learnt a lot, mostly about stuff I should have known about Ruby, but had missed or forgotten. Here's some highlights:-

`OpenStruct` implementation

First let's quickly revise what on what an OpenStruct does. From the documentation:

An OpenStruct is a data structure, similar to a Hash, that allows the definition of arbitrary attributes with their accompanying values. This is accomplished by using Ruby's metaprogramming to define methods on the class itself.

So you can do the following:-

require 'ostruct'

o = OpenStruct.new
o.name = "Mavis" # arbitrarily create an attribute (name) and assign a value
puts o.name

While OpenStruct is similar to Hash, it isn't a Hash; it does not extend Hash (or include Enumerable). The attributes are stored in a Hash member variable (@table) (see the initialize method). New attributes are captured using method_messing and the accessors are defined as methods on the object.

Freezing an `OpenStruct`

Freezing a Ruby object is supposed to prevent modifications. By default, this is achieved by disallowing assignment to instance variables. As the OpenStructs attributes are stored within a Hash that is assigned on initialisation(@table), then this alone would not prevent assigning values to an OpenStruct; while the OpenStruct would be frozen, @table would not be.

OpenStruct prevents assigning to frozen objects by all write operations accessing @table through the method modifiable.

def modifiable
  begin
    @modifiable = true
  rescue
    raise RuntimeError,
      "can't modify frozen #{self.class}", caller(3)
  end
  @table
end

protected :modifiable

Assigning a value to @modifiable will raise an error, if the object has been frozen.

Another way of ensuring an OpenStruct is properly frozen might be to override the freeze method.

#NOT COPIED FROM ostruct.rb
def freeze
  @table.freeze
  super
end

My guess is that this method was not followed as it would have made it harder to control the error message and stack; the error would be "can't modify frozen Hash", not "can't modify frozen OpenStruct".

Massaging the backtrace

When errors are raised (in modifiable and method_missing) the backtrace is modified to start at the offending piece of client code. I like this — that's where the debugging programmer needs to look to work out a fix, not in the middle of the library code which has had its contract violated.

`define_singleton_method`

define_singleton_method is method on Object that was introduced in Ruby 1.9, but had passed all us ScotRUG members by. It does what it says — defines a method on an object's singleton class: that is it defines a method on an object instance without affecting other instances of its class. Prior to 1.9, the method would need to be retrieved — messy business.

This is the current way OpenStruct dynamically defines methods:-

define_singleton_method(name) { @table[name] }
define_singleton_method("#{name}=") { |x| modifiable[name] = x }

The 1.8.7 way is a little less readable:-

meta = class << self; self; end
meta.send(:define_method, name) { @table[name] }
meta.send(:define_method, :"#{name}=") { |x| @table[name] = x }

`singleton_class`

There's no opposite of define_singleton_method; remove_singleton_method isn't a thing. So, delete_field finds itself dealing directly with the object's singleton class.

def delete_field(name)
  sym = name.to_sym
  singleton_class.__send__(:remove_method, sym,
    "#{sym}=")
  @table.delete sym
end

singleton_class was introduced in 1.9.2 to be used in place of

(class << self; self; end)

This is the feature request thread for singleton_class.

id2name

In method_missing we found:-

  def method_missing(mid, *args) # :nodoc:
    mname = mid.id2name

I have never seen id2name before. It is a method on Symbol that returns the string corresponding to the symbol. I've always used to_s for that, which apparently is a synonym for id2name.

¯\(ツ)/¯

`to_enum`

Being a bit like a Hash, OpenStruct provides the each_pair method for iterating over the key-value pairs:-

def each_pair
  return to_enum(__method__) { @table.size }
    unless block_given?
  @table.each_pair{|p| yield p}
end

Delegating to the @table Hash is straightforward enough. Using to_enum to return an enumerator needed a bit more reading.

to_enum is defined on object and creates a new enumerator, by calling the passed-in method. So by getting an enumerator from each_pair, here's what happens:-

Call each_pair without a block
to_enum on the instance is called passing in each_pair as the method_name.
This time a block will be passed in, allowing the iteration (delegated to @table)

The number of attributes stored (@table.size) is given to to_enum as the return value of a block, because that's how it is optionally done.

Using the return value of a block to get an optional value is a bit unusual. to_enum uses this, as it already has optional values in its method signature — arguments to pass to the method that takes the block.

`initialize_copy`

This is a private method on Object which is called when dup or clone are used to create a copy (or clone). See Jon Leighton's blog post.

def initialize_copy(orig)
  super
  @table = @table.dup
  @table.each_key{|key| new_ostruct_member(key)}
end

OpenStruct overrides this initialize_copy to ensure that a copied object, gets a duplicate version of the @table Hash holding the key value pairs; otherwise the copy would share that data store, which would get weird. It also ensures that the dynamic methods are defined on the new copy; copy (unlike clone) does not duplicate the singleton class, so they would otherwise be missing.

protected members

I don't see the protected keyword used much in application ruby code. I think being able to override encapsulation with send has made us a bit lazy. Allowing the @table data store to be read through a protected accessor, means it can be accessed by other OpenStruct instances when checking equality.

attr_reader :table # :nodoc:
protected :table

def ==(other)
  return false unless other.kind_of?(OpenStruct)
  @table == other.table
end

`inspect`

Inspect shows the contents of the OpenStruct in "key=value" form, where inspect is called on each of the values. Straightforward? You would think so, but here's the implementation:

InspectKey = :__inspect_key__ # :nodoc:
def inspect
  str = "#<#{self.class}"

  ids = (Thread.current[InspectKey] ||= [])
  if ids.include?(object_id)
    return str << ' ...>'
  end

  ids << object_id
  begin
    first = true
    for k,v in @table
      str << "," unless first
      first = false
      str << " #{k}=#{v.inspect}"
    end
    return str << '>'
  ensure
    ids.pop
  end
end

The thread current storage is a bit confusing at first. It's purpose is to guard against infinite recursion, if an OpenStruct instance is stored in itself.

>> o = OpenStruct.new
=> #<OpenStruct>
>> o.o=o
=> #<OpenStruct o=#<OpenStruct ...>>
>>

The object ids of all the OpenStructs currently being inspected are stored in the Thread.current, to ensure that they are only inspected once.

if ids.include?(object_id)
    return str << ' ...>'
end

Evan Phoenix suggested that we should read code, in his keynote at this year's Scottish Ruby Conference. Picking apart some well-written code is a great way to pick up on all the things you should know, but have somehow missed or forgotten.

OpenStruct implementation

Freezing an OpenStruct

Massaging the backtrace

define_singleton_method

singleton_class