Object IDs and Fixnums

I’ve been poking around Ruby’s internals, and I found some cool things about the way Ruby represents integers in memory.

Let’s start with the obvious: in Ruby, everything is an object, including integer literals.


irb> 32.class.name
=> "Fixnum"

Every object has an object_id, which you can think of as representing its location in memory, though since Ruby is a VM it doesn’t correspond to a physical memory address.

irb> "hello".object_id
=> 3347120
irb> {:foo => :bar, :halo => :whirled}.object_id
=> 23332010

For most objects, Ruby creates an (effectively random) object_id when the object is instantiated. But some objects, including true, false, and nil, have fixed object_ids that are the same every time the program runs.

irb> false.object_id
=> 0
irb> true.object_id
=> 2
irb> nil.object_id
=> 4

Fixnum object_ids are also, uh, fixed.

irb> 32.object_id
=> 65

But as you can see, the object_id is not the same as the Fixnum itself. But it’s also not hardcoded into the Ruby source like 0 is for false. Instead, it’s calculated – it’s (Fixnum * 2) + 1.

To understand why it’s done this way, let’s take a closer look at how Ruby stores integers.

In languages that are closer to the hardware like C, the size of an integer is the size of the architecture. So on a 32-bit machine, integers are 32 bits. On 64-bit machines, they’re 64 bits.

In VM-based languages, the implementation is free to mess with this convention. In Ruby, on a 32-bit machine, Fixnums are actually 32 bits, but only use 31 bits to represent the number.

This means a Fixnum can hold any number from -((2**30)-1) to (2**30)-1, which is -1073741824 to 1073741824. An integer literal larger or smaller than that is a Bignum.

irb> 1073741823.class.name
=> "Fixnum"
irb> 1073741824.class.name
=> "Fixnum"
irb> 1073741825.class.name
=> "Bignum"
irb> -1073741824.class.name
=> "Fixnum"
irb> -1073741825.class.name
=> "Bignum"

So…what about that other bit?

Here’s where it gets interesting. It’s that last bit that tells us whether we’re looking at the Fixnum itself, or the Fixnum’s object_id. If that last bit is zero, that bit is ignored and the Fixnum is evaluated from the other 31 bits. If that last bit is 1, it uses all 32 bits to get the Fixnum’s object_id.

An example will make this clearer. The Fixnum 4 has an object_id of 9. 4 in binary is 100. When you put that in a Fixnum, it’s got 28 zeroes in front of it for a total of 31 bits.

0000000000000000000000000000100

Now, what happens if you put another 1 at the end to make 32 bits?

00000000000000000000000000001001

That's binary for 9, the object_id of 4. By adding a 1 to the end we shifted the number left, effectively multiplying the Fixnum by 2 and adding 1. So to find the object_id of any Fixnum, just write it out in binary and add a 1 to the end.

There are a couple interesting side effects to this method of calculating the object_id from the Fixnum. First, all Fixnum object_ids end in 1, so they're all odd. This means all the other object_ids in the system will be even.

Second, only the odd Fixnums are used as Fixnum object_ids. Half the Fixnums, then, have Fixnum object_ids, while the other half have object_ids that are too big to fit in 31 bits. Their object_ids are BigNums. If your system instantiates enough objects that you run out of even Fixnums, your regular objects will get Bignum object_ids too.

Who says Ruby is too high-level? :)

3 comments to Object IDs and Fixnums

  • My

    Title…

    Very interesting post. I would like to link back to it….

  • Jeff Read

    Hi,

    Congratulations on discovering some Ruby internals. There’s a certain peculiar joy that comes from hacking around the innards of your favorite runtime, seeing how it all works under the hood. :)

    If Ruby is anything like Lisp (from whose tradition the names “fixnum” and “bignum” come), then the object id employs tag bits to determine the type of an object; and the rightmost bit is such a tag bit to represent small integer constants (fixnums). Various other tag bits would mark an object as true, false, nil, or a character constant.

    Furthermore, odds are pretty good that once you mask the tag bits that identify it as an aggregate object, the object ids of various other objects in the Ruby system indeed become pointers; because such objects are usually aligned to machine-word boundaries, it becomes useful to assume that the last four bits of the pointed-to address are zero, and use those bits as tag bits.

    Again, I don’t know that much about Ruby; I’m only extrapolating from what I’ve learned from Lisp. Nevertheless I hope I’ve opened up new avenues of exploration. :)

  • Jeff Read

    Sorry, above I meant “the last two or four bits, depending on machine-word size”.