Using JavaScript FinalizationRegistry to implement a Copy-on-Write Optimization
- Eldan Ben Haim
- Apr 1, 2023
- 12 min read
Meculcalator is a programmable RPN calculator inspired by the HP-48. It is written mostly in TypeScript, with a wrapper for iOS and a now-defunct Web version which I should probably get back to.
With an RPN calculator, operation is based on a stack of operands. Every command, or function, of the calculator pops any number of operands from the stack (depending on the operation, this may be 0 operands or any other number), computes a result and pushes any number of results. To use an RPN calculator the user alternates between pushing data to the stack and issuing commands. So, if we use [ENTER] to denote a "push" operation, if I want to calculate 3 + 2 with an RPN calculator, I would follow the following sequence:
No. | Step | Resulting Stack |
1 | 3 [ENTER] | 3 |
2 | 2 [ENTER] | 3 2 |
3 | + | 5 |
So, if on a regular calculator I'd push the buttons "3 + 2 =", with RPN that would be "3 [ENTER] 2 [ENTER] +". This order of actions, which is "unintuitive to some", is why they're called Reverse Polish notation calculators. With a more complex example, such as (3 + 2 * 4 + 5 * 9) * 8, the benefits of using this approach become more evident (or do they?)
No. | Step | Resulting Stack |
1 | 3 [ENTER] | 3 |
2 | 2 [ENTER] | 3 2 |
3 | 4 [ENTER] | 3 2 4 |
4 | * | 3 8 |
5 | + | 11 |
6 | 5 [ENTER] | 11 5 |
7 | 9 [ENTER] | 11 5 9 |
8 | * | 11 45 |
9 | + | 56 |
10 | 8 [ENTER] | 56 8 |
11 | * | 448 |
Or, in shorthand, "3 [ENTER] 2 [ENTER] 4 * + 5 [ENTER] 9 * + 8 *". Alternately, once could opt for "3 [ENTER] 2 [ENTER] 4 *5 [ENTER] 9 * + + 8 *". In Meculcalator, you would just just type the following command: "3 2 4 *5 9 * + + 8 *". Easy ;)
Stacks And Objects
In addition to arithmetic operations, most RPN calculators -- Meculcalator included -- offer a set of "stack manipulation" commands. Examples may include a DUP command that creates a duplicate of the stack's head (so the stack "3 2 1" becomes "3 2 1 1"), DROP that pops an item from the stack without pushing any result, SWAP that exchanges the two topmost items, and many others. These operations are fundamental to working with RPN calculators, both when doing manual calculations and when writing programs to automate computation. It is very common for RPN programs to perform stack manipulation to, for example, store temporary values and fetch them at a later stage, or perform multiple computations with the same operands.
With Meculcalator, inspired by the HP-48, objects on the stack don't have to be numbers. Additional objects that may be put on the stack include strings, lists, programs, matrices and vectors. And so, one can, for example, push two matrices on the stack and multiply them. Or a matrix and a scalar. While many objects are small scalars, some may turn to be quite big. A matrix is one example; a graphics bitmap is another.
Objects on the stack are all immutable -- meaning once they're on the stack their value cannot change. Consider for example a vector PUT command, that pops a vector x, index i and value v from the stack, and pushes a new vector, x' which is the original vector x with value v put in index i. So, "[1 2 3 4 5] [ENTER] 3 [ENTER] 9 PUT" results in the vector [1 2 9 4 5] on the stack. Now suppose that we have a vector x in the stack, and we issue a DUP command. Our stack now looks like "[1 2 3 4 5] [1 2 3 4 5]". Issuing a PUT command now will now replace the copy at the stack-head with a new vector, but the original vector (the one we DUPed) remains untouched; the stack now contains "[1 2 3 4 5] [1 2 9 4 5]". At last, some intuitive behavior!
Eliminating Copies
From a functional perspective, the DUP command created a copy of the head of the stack and pushed it. This means that there are two independent copies of the object. Moreover, the PUT operation, conceptually, consumed a vector and then created a third copy consisting of the original vector with a replaced element. For a big enough object, both these "copying" operations may become costly, and it's interesting to understand if we can optimize them away.
Let's start with the second copy operation we discussed -- the one done by the PUT command. In a world where DUP clones an object, the PUT operation's implementation can assume that once it popped the vector it will no longer be used by the system (as it's no longer on the stack). With this assumption, PUT's implementation may actually modify the vector it popped in-place, and push it again to the stack. With this simple optimization, when we perform a DUP we copy the object, but PUT can actually skip the copy and mutate the popped object. The same principle can apply to many other commands that "mutate" a big object on the stack -- these commands may modify the object they pop in-place and then push it again on the stack, thus avoiding creating a copy of the mutated object. In all of these cases, the optimization relies on the fact that once popped, an object is no longer used by the system and hence may be modified and reused.
Now we turn to the copy operation performed as part of the DUP command. Observe that issuing a DUP command doesn't necessitate a future modification of the duplicated object. Consider, for example, the following sequence of operations issued to sum the first two elements of a vector: "[ 2 3 4 ] DUP 1 GET DUP 2 GET +" (I'm sure you can guess what GET does). At the end of this sequence, the stack contains "[2 3 4] 5". During the computation we made 2 copies of the vector [2 3 4], and none of them was mutated. Having to copy a vector just to retrieve one of its elements definitely looks wasteful. An immediate trivial optimization here would be to implement DUP such that it just adds another reference to the same object we already have on the stack -- rather than copying the entire object. Copying a reference is obviously much cheaper than copying an entire big object. The same optimization may be applied to various other stack manipulation commands that may create copies of objects.
Introducing Copy-on-Write
What if we apply both the optimization that eliminates the copy during the mutating call and the optimization the prevents copying objects on DUP by pushing additional references to the same object on the stack? Well, here we run into a problem: the eliminate-copy-on-mutation optimization described above assumes that once popped the object is not used by the program hence it can be mutated in-place. But if that object was DUPed previously, then there's another reference to that object on the stack, which means that changes to the mutated object will impact the existing reference as well. If we naively apply both optimizations then the sequence "[1 2 3] DUP 2 3 PUT" will end with the stack: "[1 3 3] [1 3 3]" which is clearly not what a reasonable user expects from a reasonable, albeit RPN, calculator (notice how the first vector was mutated too).
What we would really like to do is to avoid copying an object as part of a DUP operation, and defer this copying only when we're sure it's needed -- specifically if there's an attempt to mutate the object through one of multiple references. This means that "[1 2 3] 2 9 PUT" shouldn't perform a copy (only a single reference to the vector exists so it can be modified in-place), and "[1 2 3] DUP" shouldn't perform a copy (we never copy on DUP), however "[1 2 3] DUP 2 9 PUT" should copy the input to PUT rather than modifying it in place since it has multiple references.
This approach is called Copy-on-Write, and is a common optimization in de-duplication systems. Operating systems use this approach when they share writable pages across processes; for example when forking a process in many Unix flavors, most memory pages of the child process are shared with the parent to avoid duplicating the amount of memory needed (and the cost of copying) when forking. However, as either process (child or parent) writes to these memory pages shared between them, the operating system will first create a copy of the mutated page and then update the page table to refer to the copied page for the mutating process thus preserving memory isolation between the processes.
So, the pseudo-code for a mutating operation such as PUT with a copy-on-write approach would look like the following:
Pop from the stack, in this order: v -- a new value for an element in the vector, i -- an index into the vector and x the vector to mutate.
If x has additional references to it in the stack, let x' be a copy of x. Otherwise let x' be x itself.
Set x'[i] := v
Push x'.
The way step 2 above is stated, it is overly simplified. The reality of things is that references to objects may reside in different places other than the stack. In Meculcalator, for example, these references be stored in named variables which are not part of the stack. References may also be embedded within other objects (e.g one can create a list object that refers to a vector as one of its elements). It quickly becomes impractical to enumerate all references to an object, thus we need a more scalable approach.
Reference Counting
Instead of trying to enumerate all references to an object, we can try to simply keep track of how many references the object has using a good-old reference counter. Of course, with reference counting we are required to be diligent about... counting references. Whenever a new reference to an object is formed its reference counter should be incremented, and when the reference no longer exists the counter should be decremented.
Some languages allow creating constructs that make it very easy to apply such discipline. In C++, for example, through the use of constructors and destructors we can track the creation and destruction of references. In this setup the "reference" is in fact an object by itself that points to the referenced object (and possibly to a reference counter). Using operator overloading and user-defined conversions in C++ this reference object can be made to behave syntactically very much like a regular pointer to the underlying object. This pattern is called smart pointers. Whenever a reference counting smart pointer is constructed, assigned from a different smart pointer, or destructed, the corresponding reference counters are updated. Indeed in C++ the standard library includes a reference-counting smart pointer implementation (std::shared_ptr) that allows managing the lifecycle of an object through reference counting (so the referenced object is deleted when the last reference to it is destructed) -- without the underlying object having to implement anything to benefit from this functionality.
Unfortunately in JavaScript we don't have destructors, assignment operator overloading and in general the pathway to implement such reference counting mechanisms is less obvious. The rest of this post attempts to describe how I'm approaching this in Meculcalator.
Manual Reference Counting
An obvious approach to perform reference counting is to do this manually. So, any code that creates another reference to an object will invoke a call to increment the reference count on that object. So, when pushing an object to the stack, we'll do something like:
function push(object) {
stack.push(object);
object.addRef();
}
Or when adding an object to a list:
function listSet(listObject, index, element) {
listObject[index] = element;
element.addRef();
}
We could stop right here. We don't really have to track when references are released to gain some optimization value; we already gain some improvements by just tracking when an object was ever referenced by more than one reference. With that tracking alone, for "[1 2 3] DUP" there will be no copy, and for "[1 2 3] 2 9 PUT" there will be no copy. For "[1 2 3] DUP 2 9 PUT" there will be an unavoidable copy (since DUP incremented the reference count when the reference to the object was pushed to the stack). However since we don't track reference release -- "[1 2 3] DUP DROP 2 9 PUT" will still copy the object even though we could do without that copy. Granted, this is probably a less common case. But we're not here to let these arguments stop us from continuing our exploration :) So let's try to tackle this too.
Here to we could take a manual approach: just make sure to remove the reference when it's no longer valid. Unfortunately, this is easier said than done. We would first need to have all "root" references, such as the stack, make sure they release references. This is pretty much as easy as adding addRef calls where it makes sense. However, any object that may hold a reference to another object (e.g a List object) must also, when its last reference is released, release all object references it holds. This is where things get contrived; any new class implemented needs to follow this convention which is not always trivial todo.
Can we do better?
Automatic Reference Counting Using FinalizationRegistry
It turns out that we can automate reference count release if we're willing to accept some inaccuracies where reference release is counted later than it actually occurred. To do this, we turn to the FinalizationRegistry JavaScript API.
FinalizationRegistry allows JavaScript code to register a callback to invoke when an object is garbage-collected. Client code first constructs a FinalizationRegistry object, passing it a callback to invoke for each garbage-collected object, and then registers objects the garbage collection of which should be tracked by invoking FinalizationRegistry.register(object, held). At the time of registration an object associated with the object to track is passed to register() as the held parameter; this object will be passed as an argument to the callback when the registered object is deleted. This is done so that the callback won't need to receive an object that's been garbage-collected. To illustrate, the code below will log "hello" when object c is garbage collected:
const fr = new FinalizationRegistry((m) => console.log(m.message));
const c = {};
fr.register(c, {"message": "hello"});
With this tool in our belt, we can try the following approach:
Create a "reference" class (CowRef) that represents a counted reference to an object.
Create a "reference counter" class (CowRefcountInfo) that holds the reference count for an object.
Maintain a map between reference counted objects and their CowRefcountInfo objects. This is a weak map so that it won't keep the reference counted objects alive.
When a new "reference" class is constructed for a referenced object:
Fetch the CowRefcountInfo object for the referenced object if one exists or create a new one if it doesn't.
Increment the reference count in the RefCountInfo object.
Register the new reference class with a FinalizationRegistry, with the CowRefcountInfo object as its held object to pass to the finalization callback.
The FinalizationRegistry callback decreases the reference count of the CowRefcountInfo it receives.
The code below illustrates this:
class CowRef1<T extends object> {
constructor(private _referenced: T, private _refcountInfo: CowRefcountInfo) {}
newRef(): CowRef1<T> {
return CowSupport1.newRef(this._referenced);
}
refCount(): number {
return this._refcountInfo.refCount;
}
get(): T {
return this._referenced;
}
}
type CowRefcountInfo = {
refCount: number;
}
class CowSupport1 {
static newRef<T extends object>(subject: T): CowRef1<T> {
let refcountInfo = CowSupport.existingRefManagersMap.get(subject);
if(!refcountInfo) {
refcountInfo = {
refCount: 0
}
CowSupport.existingRefManagersMap.set(subject, refcountInfo);
}
const cowRef = new CowRef1<T>(subject, refcountInfo);
refcountInfo.refCount ++;
CowSupport.finalizationRegistry.register(cowRef, refcountInfo);
return cowRef;
}
static finalizationRegistry = new FinalizationRegistry((heldValue: CowRefcountInfo) => {
heldValue.refCount --;
});
static existingRefManagersMap = new WeakMap<object, CowRefcountInfo>();
}
Client code of the example above needs to follow only these conventions for this to work:
Use CowRef1<T> to hold reference counted references.
Use CowRef1.get to access the referenced object.
Use CowRef1.refCount to query the reference count of the referenced object, and based on that result decide whether an object should be cloned or not.
We can also implement a CowRef1.ensureWritable() call if we assert that an object supports a cloning operation:
class CowRef1<T extends ICloneable> {
...
ensureWritable(): CowRef1<T> {
return !this.refCount()
? this.newRef()
: CowSupport1.newRef(this.get().clone());
}
...
}
Note that decrementing a reference count with this mechanism depends on the reference being garbage collected. This means that in some cases the reference may be counted for some time after it no longer exists. This, in turn could lead to "redundant" copies. We could add an explicit reference release call -- and hope that client code would call that correctly. Again -- if some of these calls are missing, the impact will only be reduced gain from the optimization.
class CowRef1<T extends ICloneable> {
...
constructor(private _referenced: T, private _refcountInfo: CowRefcountInfo) {
this._token = {};
}
release() {
this._referenced = null;
this._refcountInfo.refCount--;
this._refcountInfo = null;
CowSupport1.releaseRef(this._token);
}
...
}
class CowSupport1 {
...
static releaseRef<T extends object>(token: object): void {
CowSupport.finalizationRegistry.unregister(token);
}
...
}
In Meculcalator, since I've started with fully manual reference counting and experienced how difficult it is to maintain proper manual reference deletion, I intend to only use the automated release pattern and not have explicit release() calls throughout most of the code.
Closing Notes
Note that our "reference" objects vary in their interface from smart pointers in C++: they are not syntactically transparent. Client code using a reference needs to go through the get() call to grab a hold of the underlying object (so, code looks like ref.get().invokeSomeMethod() rather than ref.invokeSomeMethod()). One possible improvement here could be using proxies to make the reference transparent and forward all underlying calls to the referenced object. Care needs to be taken when implementing such a proxy to preserve prototype chains so that code like ref instanceof SomeBaseClass works as client code expects it to. This can be done by implementing the Proxy handler's getPrototypeOf.
JavaScript FinalizationRegistry's API may seem odd at first sight: the fact we need to register a "holdedValue" that will be passed to the callback instead of just receiving the finalized object may raise an eyebrow. Other managed heap runtimes take a seemingly less awkward -- e.g Java with its finalize() method. In fact, JavaScript's approach actually reduces confusion: the use of a holded-value provided during registration and passed to the finalization callback means that the finalization callback cannot retain a reference to the finalized object. This is in contrary to runtimes like Java where finalize(), being is a method of the reclaimed object, may make a reference to the object visible to the program again (e.g by adding it to a global list of references). This is known as "resurrection" and has some confusing semantics (for example, once resurrected the object's finalizer will no longer be invoked even if it's reclaimed again).
This is one of those rare cases where FinalizationRegistry may be useful. The many caveats of how this mechanism behaves in a GC'ed environment make it very risky to rely on for most uses (e.g consider that there's no guarantee for finalizer callbacks to be ever invoked); tread safely here.
I haven't started embedding this mechanism in Meculcalator just yet, but knowing how much code I have lying there that implements manual reference counting, and how many times I scratched my head trying to identify what's the proper points to manually release references I'm really excited with this solution.
Comments