V8pedia

Tagged values & Smis

Every JavaScript value V8 manipulates — a number, an object, undefined, a string — is, at the machine level, a single word called a tagged value. This page explains the encoding, because almost everything else (the object model, the inline caches, the compilers, the GC) is built on top of it.

::: info Ubiquitous language
Tagged: a machine word that carries a low-bit tag telling V8 whether it is a pointer or an immediate integer.

Smi ("small integer", say "smee"): an integer stored directly in a tagged word, no heap object involved.
:::

The problem

A dynamically-typed language must answer "what is this value?" constantly, and at runtime. Boxing every value as a heap object would be correct but ruinously slow: a simple i + 1 in a loop would allocate. V8's answer is pointer tagging — steal the low bits of a word (which are always zero on aligned pointers) to encode a type tag.

The encoding

The tags are defined in the public-ish internals header:

// Tag information for HeapObject.
const int kHeapObjectTag = 1;
const int kWeakHeapObjectTag = 3;
const int kHeapObjectTagSize = 2;
const intptr_t kHeapObjectTagMask = (1 << kHeapObjectTagSize) - 1;
// Tag information for Smi.
const int kSmiTag = 0;
const int kSmiTagSize = 1;
const intptr_t kSmiTagMask = (1 << kSmiTagSize) - 1;

include/v8-internal.h#L57-L74

Read off the scheme from the low bits:

  • …xxxx0 — low bit 0 → a Smi. The rest of the word is the integer.

  • …xx01 — low bits 01 → a strong pointer to a HeapObject.

  • …xx11 — low bits 11 → a weak pointer to a HeapObject.

So the test "is this a Smi?" is a single AND against kSmiTagMask (1) and a compare against 0: one or two instructions, with no memory access.
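
As a rough illustration of the scheme above, the following C++ sketch classifies a raw word by its low bits. The constants are the ones quoted from the header; the WordKind enum and the Classify function are hypothetical names invented for this example, not V8 APIs.

```cpp
#include <cstdint>

// Constants as quoted from include/v8-internal.h above.
constexpr int kHeapObjectTag = 1;
constexpr int kWeakHeapObjectTag = 3;
constexpr int kHeapObjectTagSize = 2;
constexpr intptr_t kHeapObjectTagMask = (1 << kHeapObjectTagSize) - 1;
constexpr int kSmiTag = 0;
constexpr int kSmiTagSize = 1;
constexpr intptr_t kSmiTagMask = (1 << kSmiTagSize) - 1;

// Hypothetical enum, for illustration only.
enum class WordKind { kSmi, kStrongPointer, kWeakPointer };

// Hypothetical helper: decide what a tagged word holds from its low bits.
constexpr WordKind Classify(intptr_t word) {
  // Low bit 0: the word is a Smi.
  if ((word & kSmiTagMask) == kSmiTag) return WordKind::kSmi;
  // Low bit 1: the low two bits are either 01 (strong) or 11 (weak).
  return (word & kHeapObjectTagMask) == kWeakHeapObjectTag
             ? WordKind::kWeakPointer
             : WordKind::kStrongPointer;
}

// Compile-time checks of the three bit patterns.
static_assert(Classify(0x10) == WordKind::kSmi, "low bit 0");
static_assert(Classify(0x1001) == WordKind::kStrongPointer, "low bits 01");
static_assert(Classify(0x1003) == WordKind::kWeakPointer, "low bits 11");
```

The Smi check comes first because it needs only the 1-bit mask; the 2-bit mask is consulted only once the word is known to be a pointer.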

Smi: an integer that never allocates

Because the Smi tag is just the low bit being 0, an integer is stored shifted left. The width depends on the platform, and this is where it gets interesting on 64-bit:

// 32-bit tagged value:
struct SmiTagging<4> { enum { kSmiShiftSize = 0, kSmiValueSize = 31 }; };
// 64-bit tagged value:
struct SmiTagging<8> { enum { kSmiShiftSize = 31, kSmiValueSize = 32 }; };

include/v8-internal.h#L83-L162

  • On a 32-bit tagged word, a Smi is a 31-bit signed integer (1 bit lost to the tag).

  • On a full 64-bit tagged word (a 64-bit build without pointer compression), the value occupies the upper 32 bits: kSmiShiftSize is 31, and together with the 1-bit tag the total shift is 32. This gives a full 32-bit Smi range while keeping decode to a single arithmetic shift. On 64-bit builds with pointer compression the tagged word is 32 bits, so the 31-bit layout above applies instead.
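
The two layouts can be sketched as plain shift arithmetic. This is a minimal illustration, not V8's implementation: the function names (EncodeSmi31, DecodeSmi31, EncodeSmi64, DecodeSmi64, FitsInSmi31) and the constants kSmiShiftSize32 and kSmiShiftSize64 are made up for this example, and only the shift sizes come from the SmiTagging definitions above.

```cpp
#include <cstdint>

constexpr int kSmiTag = 0;
constexpr int kSmiTagSize = 1;
constexpr int kSmiShiftSize32 = 0;   // SmiTagging<4>::kSmiShiftSize
constexpr int kSmiShiftSize64 = 31;  // SmiTagging<8>::kSmiShiftSize

// 32-bit tagged word: 31-bit payload above a single tag bit.
// The left shift is done on an unsigned value because left-shifting a
// negative signed integer is undefined before C++20.
constexpr int32_t EncodeSmi31(int32_t value) {
  return static_cast<int32_t>(static_cast<uint32_t>(value)
                              << (kSmiTagSize + kSmiShiftSize32)) |
         kSmiTag;
}

// Right-shifting a negative signed integer is an arithmetic shift on
// mainstream compilers (and guaranteed from C++20), which restores the sign.
constexpr int32_t DecodeSmi31(int32_t word) {
  return word >> (kSmiTagSize + kSmiShiftSize32);
}

// Only values in [-2^30, 2^30 - 1] survive the 31-bit encoding.
constexpr bool FitsInSmi31(int32_t value) {
  return value >= -(1 << 30) && value <= (1 << 30) - 1;
}

// Full 64-bit tagged word: payload in the upper 32 bits, low 32 bits zero.
constexpr int64_t EncodeSmi64(int32_t value) {
  return static_cast<int64_t>(static_cast<uint64_t>(static_cast<int64_t>(value))
                              << (kSmiTagSize + kSmiShiftSize64)) |
         kSmiTag;
}

constexpr int32_t DecodeSmi64(int64_t word) {
  return static_cast<int32_t>(word >> (kSmiTagSize + kSmiShiftSize64));
}

static_assert(EncodeSmi31(21) == 42, "31-bit layout shifts left by one");
static_assert(DecodeSmi31(EncodeSmi31(-5)) == -5, "sign survives the round trip");
static_assert(EncodeSmi64(1) == (int64_t{1} << 32), "payload sits in the upper half");
static_assert((EncodeSmi64(-5) & 1) == kSmiTag, "low bit stays 0");
```

In both layouts the low bit of the encoded word is 0, so the same Smi check works regardless of word size.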

::: tip Why you see Smi::FromInt everywhere
Creating a Smi is (value << shift) | tag: pure arithmetic, often folded at compile time (constexpr). Reading one is a shift. Compare that to allocating a HeapNumber: request memory, write a map word, write the double, and later have the GC trace and possibly move it. This is why keeping array indices and counters in Smi range is a real, measurable win.
:::

HeapObject: the other branch

If the value is not a Smi, it is a tagged pointer to a HeapObject: the object's address plus kHeapObjectTag. Converting an address to a tagged value is literally an add:

static inline Tagged<HeapObject> FromAddress(Address address) {
  return Tagged<HeapObject>(address + kHeapObjectTag);
}

src/objects/heap-object.h
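
To make the arithmetic concrete, here is a small sketch of tagging and untagging an address. It is an illustration only: the Address alias stands in for V8's type of the same name, and TagAddress, UntagAddress, and FieldAddress are hypothetical helpers written for this example.

```cpp
#include <cstdint>

using Address = uintptr_t;  // stand-in for V8's Address type

constexpr int kHeapObjectTag = 1;
constexpr Address kHeapObjectTagMask = 3;

// Tagging: add the tag to an aligned address (mirrors FromAddress above).
constexpr Address TagAddress(Address address) {
  return address + kHeapObjectTag;
}

// Untagging: subtract the tag to recover the real address.
constexpr Address UntagAddress(Address tagged) {
  return tagged - kHeapObjectTag;
}

// Hypothetical helper: address of the field at `offset` inside the object.
// The tag is folded into the offset, so no separate untag step is needed.
constexpr Address FieldAddress(Address tagged, int offset) {
  return tagged + offset - kHeapObjectTag;
}

static_assert(TagAddress(0x1000) == 0x1001, "tagging sets the low bit");
static_assert(UntagAddress(TagAddress(0x1000)) == 0x1000, "round trip");
static_assert((TagAddress(0x1000) & kHeapObjectTagMask) == kHeapObjectTag,
              "low bits read 01");
static_assert(FieldAddress(TagAddress(0x1000), 8) == 0x1008,
              "field access folds the tag into the offset");
```

Because the tag is a constant, it can be absorbed into the displacement of a load or store, so a tagged pointer costs nothing extra to dereference.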

Every HeapObject begins with a pointer to its Map at offset 0. That is the first thing nearly every operation reads. We cover it next.

::: details C++ aside: Tagged<T> vs Handle<T>
Tagged<T> is the raw tagged value: fast, but invalidated if the GC moves the object. When code can trigger GC, it must hold a Handle<T> instead, which the GC updates. See the C++ primer.
:::

Why this design matters for performance

  1. No allocation for integers. The vast majority of numbers in real programs are small integers; Smis make them free.

  2. Type checks are bit tests. "Smi or pointer?" is one AND. This check is emitted inline in interpreter handlers, IC stubs, and JIT code millions of times a second.

  3. Cheaper GC. Smis are not pointers, so the GC skips them entirely when tracing — no dereference, no marking, no relocation.

  4. It composes with pointer compression. The same 32-bit word can hold either a 31-bit Smi or a compressed pointer, halving the size of every tagged field relative to full 64-bit words. See Pointer compression.
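
Point 3 can be sketched as a loop over an object's slots. This is a toy model, not V8's marking code: CountSlotsToTrace and kExampleSlots are hypothetical names, and a real collector would follow each pointer rather than just count it.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int kSmiTag = 0;
constexpr intptr_t kSmiTagMask = 1;

// Hypothetical helper: count the slots a tracing pass would have to follow.
// Smis are filtered out with a bit test, without touching the heap.
constexpr std::size_t CountSlotsToTrace(const intptr_t* slots,
                                        std::size_t count) {
  std::size_t to_trace = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if ((slots[i] & kSmiTagMask) == kSmiTag) continue;  // Smi: nothing to follow
    ++to_trace;  // tagged pointer: a real GC would visit the target here
  }
  return to_trace;
}

// Made-up slot contents: three Smis (42, 0, 7 in the 31-bit layout),
// one strong pointer (low bits 01), and one weak pointer (low bits 11).
constexpr intptr_t kExampleSlots[] = {42 << 1, 0x1001, 0, 0x2003, 7 << 1};

static_assert(CountSlotsToTrace(kExampleSlots, 5) == 2,
              "only the two pointers need tracing");
```

The filter is the same single AND used for type checks, which is why integer-heavy objects are cheap to scan.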

See also