FUGC is the GC of Fil-C, a C/C++ language extension for memory-safe, here's the previous post to introduce Fil-C. In this post, we will introduce FUGC briefly.
Non-stop, concurrent, mostly parallel
FUGC (Fil Unified Garbage Collector) is a non-stop, concurrent, and mostly parallel garbage collector. This means the program, often called the mutator, continues executing while garbage collection takes place, with only minimal pauses. By overlapping collection work with normal program execution, FUGC avoids the long stop-the-world interruptions typical of traditional collectors, while parallelism among GC threads helps it reclaim memory efficiently.
On-the-fly grey-stack Dijkstra
On-the-fly grey-stack Dijkstra is a variant of Dijkstra’s algorithm that avoids building and storing the full search structure in memory. It trades some performance overhead for lower memory requirements, making it practical in settings with huge graphs or limited memory.
Imagine you have a big box of toys (your RAMs) and you want to throw away the toys you’re not playing with. But some toys are connected to others with strings, and you don’t want to accidentally throw away a toy that’s still connected.
- White toys = Toys you might throw away.
- Grey toys = Toys you’re checking to see if they are safe to throw away.
- Black toys = Toys you know you’re still playing with, so you definitely keep them.
The grey-stack is like a special helper stack: it keeps track of all the grey toys that need checking. You pick a grey toy, check all the toys it’s connected to. Any connected toy that you haven’t checked yet will be colored to grey and added to the stack. Once you’re done with a toy, you color it black (safe to keep).
“On-the-fly” means you do this while playing, without stopping your game. Even if you grab a toy in the middle of cleaning, the system can handle it safely.
So basically:
- grey-stack: the list of toys to check
- on-the-fly: cleaning while still playing.
If you want to speed up the grey-stack Dijkstra GC, you can run multiple threads in parallel. This is what FUGC does, and there's a thread pool to manage these GC threads.
Moving vs. Non-moving GCs
FUGC is a non-moving GC. It means that once an object is allocated, its address never changes. This is good for C/C++, FFI (Foreign Function Interface), mmap, and fork. Because pointers remain valid. If objects could move, you’d need write barriers or indirection tables to update pointers everywhere. Non-moving avoids this headache.
However, non-moving GCs have downsides, for example, during running and reclaiming memory, fragmentation can occur, and non-moving GC can't compact memory since the pointers are fixed; another issue is that cache locality may be worse than moving GCs, which can rearrange objects to improve locality.
One of the purposes of Fil-C is to provide a drop-in replacement for malloc/free for existing C/C++ codebase, so the non-moving design is a good fit.
Before free:
p -> [ Foo object data ]
Normal free (no redirect):
p -> [ garbage / reused memory ] (X unsafe)
FUGC free with redirect:
p -> [ -> free_singleton ] (O safe)
But for stack root:
stack: p -> old Foo address (not patched) (X unsafe)
GC ignores it after marking phase
But why free_singleton? In our previous experience, people usually set the freed pointer to NULL to avoid use-after-free bug. However, FUGC replace NULL with a forged free_singleton pointer for the convenience of debugging.
About Libpas
Libpas is a Lego set for building allocators, designed for WebKit but usable in other projects like FUGC, giving you isoheaps, security, JIT-safe heaps, and flexible allocation strategies. I have to mention that the current Fil-C doesn't support JIT in a regular way due to strict memory-safe design principle, so you can't expect to implement you JIT compiler with Fil-C.
Moreover, Libpas provides SIMD-powered sweeping capability named Turbosweep.
Turbosweep
SIMD (Single Instruction, Multiple Data) is used to process those mark bits / free bits in bulk:
e.g., instead of looping through 64 objects and checking them one by one, the GC loads a vector register (128, 256, or 512 bits) and checks multiple mark bits simultaneously.
This usually bring ~10x speedup in practice.
for (i = 0; i < N; i++) {
if (!is_marked(i)) free(i);
}
This loop burns CPU cache and branch prediction.
The SIMD turbosweep doesn’t rely on special SIMD vector instructions; it simply uses 32-bit integers. Each bit encodes the live-or-dead state of one object, so a single instruction can process up to 32 objects at once. It’s SIMD in principle—single instruction, multiple data—just implemented with integers rather than vector registers.
FUGC is non-moving, so freed slots must be quickly reusable.
By using SIMD:
- The GC can sweep whole pages faster.
- Cache locality improves: mark bits are scanned in blocks.
- Free slots are batched to free-lists with minimal branching.
This is especially important because:
- The collector is parallel — each worker thread sweeps a chunk.
- SIMD makes each worker’s job dramatically faster.
- That keeps up with high mutator allocation rates (so GC doesn’t fall behind).
FUGC uses SIMD instructions (likely AVX2/AVX-512) via libpas for bulk sweeping of mark bits / free slots. It matters because sweeping is O(n) over heap size, and SIMD turns it into O(n / word-size), giving a major throughput boost.
This is what makes *turbosweep* fast enough to pair with a concurrent mutator.
A better way to memory-safe?
People can use Fil-C to replace malloc/free for memory-safe. This is usually more like a drop-in replacement without many code modification, or even rewrite with Rust.
Or even further, you can implment your own memory-safe languge with Fil-C without design your own GC with a fair performance. We really need more benchmarks and new experimental project with Fil-C before any conclusion.
Refs:
Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens.
On-the-fly Garbage Collection: An Exercise in Cooperation.