SNAP Sync Resolution — Fukuii Alpha Branch¶
Branch: alpha
Authors: Christopher Mercer + Claude Opus 4.6
Date: 2026-03-07
Commits: eeb814779 (optimization suite) + 6b029ed09 (stale peer fix) + pending (stale-root guard)
Background: SNAP Sync Before This Work¶
Fukuii inherited its SNAP sync implementation from Mantis (IOHK). The original design had the basic protocol structure in place — account range requests, Merkle proof verification, storage downloads, bytecode fetching, and trie healing — but had never been tested against a live ETC mainnet peer serving the snap/1 protocol.
Original Architecture¶
SNAPSyncController
├── AccountRangeCoordinator (N workers)
│   └── AccountRangeWorker → GetAccountRange requests
├── StorageRangeCoordinator (M workers)
│   └── StorageRangeWorker → GetStorageRanges requests
├── ByteCodeCoordinator (K workers)
│   └── ByteCodeWorker → GetByteCodes requests
└── TrieNodeHealingCoordinator (L workers)
    └── HealingWorker → GetTrieNodes requests
The sync process divides the 256-bit address keyspace into ranges (originally 4, now 16), assigns each to a worker, and downloads accounts in batches. Each response includes Merkle proofs for verification. When accounts are done, it proceeds to bytecode, then storage, then trie healing.
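The keyspace split described above can be sketched as follows. This is a minimal illustration, not Fukuii's actual code: `KeyspaceSplit`, `HashRange`, and `split` are hypothetical names, and hashes are modelled as `BigInt` rather than `ByteString` to keep the example self-contained.

```scala
object KeyspaceSplit {
  // The account keyspace is every 256-bit hash: [0, 2^256 - 1].
  val MaxHash: BigInt = (BigInt(1) << 256) - 1

  // Hypothetical range type: inclusive bounds of one worker's slice.
  final case class HashRange(start: BigInt, last: BigInt)

  /** Split [0, 2^256 - 1] into `n` contiguous, non-overlapping ranges.
    * The final range absorbs any remainder so the whole keyspace is covered.
    */
  def split(n: Int): Vector[HashRange] = {
    require(n > 0, "need at least one range")
    val step = (MaxHash + 1) / n
    Vector.tabulate(n) { i =>
      val start = step * i
      val last  = if (i == n - 1) MaxHash else start + step - 1
      HashRange(start, last)
    }
  }
}
```

With `n = 16` each range spans 2^252 hashes, which is why a single range represents 1/16th (6.25%) of the keyspace.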
What Worked¶
- Protocol message encoding/decoding (GetAccountRange, AccountRange, etc.)
- Merkle proof verification
- Account storage into RocksDB MPT
- Basic worker lifecycle (create, dispatch, handle response)
- SNAP-to-fast-sync fallback (eventually)
- Integration with the broader sync pipeline
What Was Broken¶
When we began testing on ETC mainnet with a local core-geth peer serving snap/1, we found 8 distinct problems that collectively made SNAP sync non-functional for production use:
- SNAP→Fast Sync fallback too slow — when SNAP failed, it took ~75 minutes (5 pivot refresh cycles × 15 min each) to fall back to fast sync
- No snap capability detection — sync started before verifying any peer supported snap/1
- False stagnation detection — the watchdog killed sync prematurely
- No progress preservation across pivot changes — all downloaded accounts lost on pivot refresh
- Fixed concurrency regardless of peer count — flooded single peers with parallel requests
- Stop/restart on pivot refresh — destroyed coordinator state unnecessarily
- Stale peer accumulation — inflated peer count from reconnections
- False stateless marking after pivot refresh — in-flight stale-root requests falsely marked peers as stateless for new root
Fix 0: SNAP→Fast Sync Fallback Acceleration (Bug 2)¶
Problem¶
When SNAP sync couldn't make progress (e.g., no snap-capable peers, or all peers became stateless repeatedly), the fallback to fast sync took ~75 minutes. The original code required 5 full pivot refresh cycles before triggering the fallback, and each cycle lasted ~15 minutes (the snap serve window expiry + restart overhead).
Root Cause¶
Two issues combined:
- No consecutive pivot tracking: The controller restarted SNAP sync from scratch on each pivot change via restartSnapSync(). This reset all internal state, including any failure counters. There was no memory of how many times SNAP had failed across restarts.
- Counter reset in restartSnapSync(): Even after we added a consecutive empty pivot counter, restartSnapSync() reset it to zero, defeating the purpose.
Fix¶
Added consecutivePivotRefreshes counter that persists across restartSnapSync() calls. After 3 consecutive unproductive pivot refreshes (configurable), the controller falls back to fast sync immediately instead of retrying SNAP. This reduced worst-case fallback time from ~75 minutes to ~6 minutes.
private var consecutivePivotRefreshes = 0 // NOT reset in restartSnapSync()
// In pivot refresh handler:
consecutivePivotRefreshes += 1
if (consecutivePivotRefreshes >= maxConsecutivePivotRefreshes) {
fallbackToFastSync(s"$consecutivePivotRefreshes consecutive stateless pivots")
}
Where¶
SNAPSyncController.scala: consecutivePivotRefreshes, PivotStateUnservable handler, restartSnapSync()
Fix 1: SNAP Capability Check (Bug 11)¶
Problem¶
When Fukuii connected to the ETC network, it found peers via the eth/67 handshake, but many of those peers did not support snap/1 (the ETC Coop bootnodes initially didn't have --snapshot enabled). The controller launched account range workers immediately, and they then timed out sending GetAccountRange requests to peers that couldn't handle them.
Root Cause¶
launchAccountRangeWorkers() counted all peersToDownloadFrom without filtering for snap capability. The controller assumed all handshaked peers could serve snap requests.
Fix¶
Added snap peer count check at sync start:
val snapPeerCount = peersToDownloadFrom.count { case (_, p) =>
p.peerInfo.remoteStatus.supportsSnap
}
If snapPeerCount == 0, schedule a grace period check (snapCapabilityGracePeriod, default 30s) before falling back to fast sync. This gives time for snap-capable peers to connect after discovery.
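The grace-period logic can be expressed as a small pure decision function. The sketch below is illustrative only: `SnapCapabilityCheck`, `Decision`, and `decide` are hypothetical names, and the real controller drives this through the CheckSnapCapability message rather than a direct call.

```scala
import scala.concurrent.duration._

object SnapCapabilityCheck {
  sealed trait Decision
  case object StartSnapSync      extends Decision
  case object WaitForSnapPeers   extends Decision
  case object FallBackToFastSync extends Decision

  /** Decide what to do given the number of snap-capable peers and how long
    * the controller has already waited since sync start.
    */
  def decide(
      snapPeerCount: Int,
      waited: FiniteDuration,
      gracePeriod: FiniteDuration = 30.seconds // mirrors the documented default
  ): Decision =
    if (snapPeerCount > 0) StartSnapSync
    else if (waited < gracePeriod) WaitForSnapPeers
    else FallBackToFastSync
}
```

Keeping the decision separate from the actor makes the three outcomes easy to unit-test without a running peer manager.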
Where¶
SNAPSyncController.scala — launchAccountRangeWorkers() and new CheckSnapCapability message handler.
Fix 2: Stagnation Watchdog (Bug 12)¶
Problem¶
The stagnation watchdog timer (default 180s) checked whether any progress had been made since the last check. But the liveness signal it relied on was only updated when a full task completed, meaning when an entire 1/16th of the keyspace had been traversed. On ETC mainnet, each range takes ~200 seconds to complete, which exceeded the 180-second threshold. The watchdog therefore reported stagnation even though accounts were actively being downloaded.
Root Cause¶
lastCompletedTaskCount was the sole liveness metric, and it incremented only on task completion (not on intermediate progress).
Fix¶
Changed the liveness signal to track accountsDownloaded (the total account count) instead of task completions. Since each response downloads ~32K accounts, this counter advances every few seconds — well within the 180s window.
// Before: only task completions counted as liveness
val stagnantBefore = completedTasks.size == lastCompletedTaskCount
// After: account downloads count as liveness
val stagnantAfter = accountsDownloaded == lastAccountsDownloaded
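To make the difference between the two signals concrete, here is a self-contained sketch of a counter-based watchdog. `StagnationWatchdogSketch` and `Watchdog` are hypothetical names; the real check lives in the CheckAccountStagnation handler.

```scala
object StagnationWatchdogSketch {
  /** Remembers the counter value seen at the previous check. */
  final class Watchdog {
    private var lastSeen: Long = 0L

    /** Call once per watchdog interval with the current value of the
      * liveness counter. Returns true when the counter has not moved
      * since the previous call.
      */
    def isStagnant(current: Long): Boolean = {
      val stagnant = current == lastSeen
      lastSeen = current
      stagnant
    }
  }
}
```

Fed with completed-task counts, this reports stagnation whenever no range finished inside one interval. Fed with the running account total, it only reports stagnation when responses have actually stopped arriving.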
Observed Behavior¶
On the previous run, the watchdog had falsely triggered at ~5M accounts because no single 1/16th range had completed yet. After the fix, the watchdog correctly detected actual stagnation (when peers stop responding) while ignoring the slow-but-steady progress within ranges.
Where¶
AccountRangeCoordinator.scala — CheckAccountStagnation handler.
Fix 3: Partial Range Resume (Bug 13)¶
Problem¶
The most critical issue. ETC mainnet's snap serve window is approximately 128 blocks (~10-16 minutes). After this window, peers stop serving the old state root and return empty responses. The coordinator detected this, marked peers stateless, and requested a pivot refresh.
The original code preserved progress using preservedCompletedRanges: Set[ByteString] — a set of range endpoints (task.last) for fully-completed ranges. On restart, these ranges were skipped.
The problem: With 16 ranges each covering 1/16th of the 256-bit keyspace, and only ~5% of the keyspace downloadable per window, no range ever fully completed before the window expired. Therefore preservedCompletedRanges was always empty, and every restart re-downloaded everything from scratch. Progress was impossible.
Core-geth's Approach¶
We studied core-geth's source code to understand how it handles this:
- 16 account tasks, each with Next and Last fields
- Next advances after each successful response: task.Next = incHash(lastHash)
- Progress persisted to DB via saveSyncStatus(), which serializes all 16 tasks with their current Next positions
- On pivot change: start a new sync with the new root but load saved task progress, resuming from the saved Next positions
- Content-addressed MPT means accounts stored under an old root are valid under a new root (within ~256 blocks)
Fix¶
Replaced preservedCompletedRanges: Set[ByteString] with preservedRangeProgress: Map[ByteString, ByteString] — a map of task.last → task.next for ALL ranges (pending, active, and completed).
On coordinator stop (postStop()):
override def postStop(): Unit = {
  sendProgressSnapshot()
  super.postStop()
}

private def sendProgressSnapshot(): Unit = {
  val progress: Map[ByteString, ByteString] =
    (pendingTasks.iterator ++ activeTasks.values.map(_._1) ++ completedTasks)
      .map(t => t.last -> t.next)
      .toMap
  snapSyncController ! AccountRangeProgress(progress)
}
On restart (launchAccountRangeWorkers()):
- Pass preservedRangeProgress to the new coordinator
- Coordinator creates tasks with next = resumeProgress.getOrElse(task.last, task.originalStart)
- Already-traversed keyspace is skipped — accounts are already in the MPT
Safety valve: If pivot drifts more than 256 blocks (MaxPreservedPivotDistance), clear preserved progress — the MPT data may be stale.
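The restart path and its safety valve can be sketched as two small functions. This is an illustration under stated assumptions: keys and positions are `BigInt` instead of `ByteString`, pivots are represented by block numbers, and `RangeResumeSketch`, `Task`, `usableProgress`, and `buildTasks` are hypothetical names.

```scala
object RangeResumeSketch {
  // Mirrors the documented limit; beyond this the stored MPT data may be stale.
  val MaxPreservedPivotDistance: BigInt = BigInt(256)

  // Simplified task: `next` is the first hash that still has to be requested.
  final case class Task(originalStart: BigInt, last: BigInt, next: BigInt)

  /** Discard preserved progress when the pivot has drifted too far. */
  def usableProgress(
      progress: Map[BigInt, BigInt],
      oldPivotBlock: BigInt,
      newPivotBlock: BigInt
  ): Map[BigInt, BigInt] =
    if ((newPivotBlock - oldPivotBlock).abs > MaxPreservedPivotDistance) Map.empty
    else progress

  /** Build one task per (start, last) range, resuming from the saved `next`
    * position when one exists for that range's `last` key.
    */
  def buildTasks(ranges: Seq[(BigInt, BigInt)], resume: Map[BigInt, BigInt]): Seq[Task] =
    ranges.map { case (start, last) =>
      Task(originalStart = start, last = last, next = resume.getOrElse(last, start))
    }
}
```

Keying the map by `last` works because the range endpoints are fixed by the keyspace split and do not change between pivots, whereas `next` moves with every response.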
Where¶
- SNAPSyncController.scala: preservedRangeProgress, AccountRangeProgress handler
- AccountRangeCoordinator.scala: resumeProgress constructor param, postStop(), sendProgressSnapshot()
- Messages.scala: AccountRangeProgress(progress: Map[ByteString, ByteString]) message
- AccountTask.scala: remainingKeyspace method for priority queue ordering
Fix 4: Dynamic Concurrency (Bug 14)¶
Problem¶
The original code launched a fixed number of workers based on accountConcurrency config (default 16). With only 1-4 actual snap-capable peers, this meant multiple workers sent requests to the same peer simultaneously. Peers like core-geth handle snap requests sequentially — concurrent requests to the same peer queue up and slow down, creating the appearance of stagnation.
Fix¶
Cap workers to the actual number of snap-capable peers:
val snapPeerCount = peersToDownloadFrom.count { case (_, p) =>
p.peerInfo.remoteStatus.supportsSnap
}
val effectiveConcurrency = math.min(snapSyncConfig.accountConcurrency, snapPeerCount).max(1)
This creates a 1:1 worker-to-peer mapping. With 1 snap peer, 1 worker; with 4 snap peers, 4 workers; with 50 snap peers, capped at 16 (the configured max).
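The cap is easy to check in isolation. The sketch below wraps the same expression in a hypothetical helper (`ConcurrencyCap.effectiveConcurrency`) so the cases listed above can be verified directly.

```scala
object ConcurrencyCap {
  /** At most one worker per snap-capable peer, never more than the configured
    * maximum, and never fewer than one.
    */
  def effectiveConcurrency(configured: Int, snapPeerCount: Int): Int =
    math.min(configured, snapPeerCount).max(1)
}
```

The trailing `.max(1)` covers the case where the snap peer count momentarily drops to zero, so the coordinator still keeps one worker.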
Priority Queue Dispatching¶
As part of this change, we also switched from FIFO task dispatching to a priority queue ordered by remaining keyspace (smallest-remaining-first):
private val pendingTasks = mutable.PriorityQueue[AccountTask](remainingTasks: _*)(
Ordering.by[AccountTask, BigInt](_.remainingKeyspace).reverse
)
This ensures nearly-complete ranges finish first, freeing workers for other work. Each completed range represents a guaranteed piece of progress that doesn't need re-downloading.
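Because Scala's `mutable.PriorityQueue` dequeues the largest element under its ordering, the `.reverse` is what turns it into smallest-remaining-first. A minimal sketch, with a simplified `AccountTask` and a hypothetical `dispatchOrder` helper standing in for the coordinator:

```scala
import scala.collection.mutable

object SmallestRemainingFirst {
  // Simplified task: only the fields needed to compute remaining keyspace.
  final case class AccountTask(name: String, next: BigInt, last: BigInt) {
    def remainingKeyspace: BigInt = last - next
  }

  /** Return task names in the order the queue would hand them to workers. */
  def dispatchOrder(tasks: Seq[AccountTask]): Vector[String] = {
    val queue = mutable.PriorityQueue[AccountTask](tasks: _*)(
      Ordering.by[AccountTask, BigInt](_.remainingKeyspace).reverse
    )
    val order = Vector.newBuilder[String]
    while (queue.nonEmpty) order += queue.dequeue().name
    order.result()
  }
}
```

For three tasks with 100, 5, and 50 hashes remaining, the dispatch order is the 5-hash task, then the 50-hash task, then the 100-hash task.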
Where¶
- SNAPSyncController.scala: effectiveConcurrency calculation
- AccountRangeCoordinator.scala: PriorityQueue, activePeerCount, worker cap logic
Fix 5: In-Place Pivot Refresh (Bug 15)¶
Problem¶
When all peers became stateless (serve window expired), the original code path was:
- Coordinator detects all peers stateless → sends progress snapshot → requests refresh
- Controller calls restartSnapSync() → stops coordinator actor → creates new coordinator
This stop/restart cycle destroyed all in-memory state (worker pool, adaptive byte budgets, cooling-down peer tracking). The new coordinator started cold. More importantly, there was a race condition: postStop() sent the progress snapshot, but the controller might not process it before creating the new coordinator.
Fix¶
Added a PivotRefreshed(newStateRoot: ByteString) message that updates the coordinator's state root in place without stopping the actor:
case PivotRefreshed(newStateRoot) =>
  stateRoot = newStateRoot
  // Clear stateless tracking: peers should be fresh for new root
  statelessPeers.clear()
  pivotRefreshRequested = false
  // Update pending tasks with new state root (active tasks will use new root on retry)
  // ... re-enqueue tasks that were active
  tryRedispatchPendingTasks()
The controller sends this message instead of restarting the coordinator when the pivot change is within the preserve distance:
case AccountRangeProgress(progress) if accountRangeCoordinator.isDefined =>
// In-place refresh: forward new root to existing coordinator
coordinator ! PivotRefreshed(newStateRoot)
Observed Behavior¶
On live ETC mainnet testing, we observed 7 seamless pivot refreshes over ~110 minutes with zero progress lost. The coordinator simply cleared its stateless peer list, updated the root, and continued downloading from where it left off. The postStop() path now serves as a safety net for the restartSnapSync() path (which is still used for full restarts).
Where¶
- AccountRangeCoordinator.scala: PivotRefreshed handler
- SNAPSyncController.scala: in-place refresh vs restart decision
- Messages.scala: PivotRefreshed(newStateRoot: ByteString) message
Fix 6: Stale Peer Accumulation (Bug 16)¶
Problem¶
Progress logs showed "1 workers/4 peers" when only 1 physical snap peer existed. Investigation revealed that knownAvailablePeers: mutable.Set[Peer] in all 3 coordinators (Account, Storage, Healing) only grew — peers were added on PeerAvailable but never removed on disconnect.
Root Cause¶
Peer is a case class with PeerId derived from ActorRef.path.name. Each time a physical node reconnects, it creates a new Peer instance with a different PeerId but the same remoteAddress. The set accumulated stale entries.
The controller's PeerListSupportNg trait correctly handles peer disconnects via PeerDisconnected(peerId) => removePeerById(peerId), keeping peersToDownloadFrom accurate. But it doesn't forward disconnect signals to coordinators, and removePeerById is private in the trait.
Impact¶
activePeerCount (which caps worker count) counted all entries in knownAvailablePeers that weren't marked stateless or cooling-down. Stale entries from disconnected peers passed this filter, so the worker cap was inflated — creating multiple workers all targeting the same physical peer.
Fix¶
Deduplicate by remoteAddress when a new PeerAvailable arrives — evict stale entries for the same physical node before adding the new one:
case PeerAvailable(peer) =>
  // Evict stale entry for same physical node (reconnection creates new PeerId)
  val evicted = knownAvailablePeers.filter(_.remoteAddress == peer.remoteAddress)
  knownAvailablePeers --= evicted
  evicted.foreach(p => statelessPeers -= p.id)
  knownAvailablePeers += peer
This is a 3-line addition per coordinator. No new messages, no trait modifications, no controller changes.
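The eviction can be exercised outside the actor with a stripped-down model. In this sketch `Peer` is reduced to an id and an address (both plain strings), and `PeerTracker` is a hypothetical stand-in for the coordinator's peer bookkeeping.

```scala
import scala.collection.mutable

object PeerDedupSketch {
  // Simplified peer: the real Peer carries a PeerId and a socket address.
  final case class Peer(id: String, remoteAddress: String)

  final class PeerTracker {
    val knownAvailablePeers: mutable.Set[Peer] = mutable.Set.empty
    val statelessPeers: mutable.Set[String]    = mutable.Set.empty

    /** Register a peer, first evicting any older entry for the same address. */
    def onPeerAvailable(peer: Peer): Unit = {
      // filter returns a new set, so removing from the original is safe
      val evicted = knownAvailablePeers.filter(_.remoteAddress == peer.remoteAddress)
      knownAvailablePeers --= evicted
      evicted.foreach(p => statelessPeers -= p.id)
      knownAvailablePeers += peer
    }
  }
}
```

A node that reconnects twice from the same address therefore still counts as one peer, and any stateless mark attached to its old id is dropped with it.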
Where¶
- AccountRangeCoordinator.scala: PeerAvailable handler
- StorageRangeCoordinator.scala: StoragePeerAvailable handler
- TrieNodeHealingCoordinator.scala: HealingPeerAvailable handler
Fix 7: False Stateless Marking After Pivot Refresh (Bug 17)¶
Problem¶
After an in-place pivot refresh, all peers were immediately re-marked as stateless for the new root — within milliseconds. This caused ~2 minutes of wasted thrashing per pivot refresh cycle, with peers repeatedly marked stateless, backed off, and triggering unnecessary additional pivot refreshes.
Live log evidence (ETC mainnet, 2026-03-07):
07:56:15,450 — All 4 stateless for root 13542126 → Pivot refreshes to f35e2bcb
07:56:15,487 — Peer marked stateless for NEW root f35e2bcb (37ms later!)
07:56:16-20 — All 4 re-marked stateless every second → backoff (1s/60s, 2s/60s, ...)
07:58:15 — Backoff expires → ANOTHER pivot refresh (unnecessary)
Root Cause¶
handleTaskFailed() unconditionally called markPeerStateless(peer, reason) without checking whether the failed request used the current state root or a stale one.
After PivotRefreshed updates stateRoot and clears statelessPeers, in-flight workers still have requests dispatched with the old root. These requests complete quickly (peers respond with "Missing proof for empty account range" because the proof doesn't match). The coordinator receives TaskFailed, calls markPeerStateless, and marks the peer stateless for the new root — even though the peer can serve the new root perfectly fine.
Fix¶
Added a stale-root guard: only mark peers stateless when the failing task's rootHash matches the current stateRoot.
private def handleTaskFailed(requestId: BigInt, reason: String): Unit = {
  activeTasks.remove(requestId).foreach { case (task, worker, peer) =>
    // Only mark peer stateless if the task was using the CURRENT root.
    // After pivot refresh, in-flight requests with the OLD root will fail
    // but this doesn't mean the peer can't serve the NEW root.
    if (task.rootHash == stateRoot) {
      markPeerStateless(peer, reason)
    } else {
      log.info(s"Ignoring failure from stale-root request (...)")
    }
    task.rootHash = stateRoot // Update to current root before re-enqueue
    pendingTasks.enqueue(task)
  }
}
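Reduced to its essentials, the guard is a filter on the root each failed request was issued against. The following sketch uses hypothetical names (`StaleRootGuardSketch`, `Failure`, `peersToMarkStateless`) and string roots purely for illustration.

```scala
object StaleRootGuardSketch {
  // A failed request: which peer it went to and which root it asked for.
  final case class Failure(peerId: String, taskRoot: String)

  /** Peers that should be marked stateless for `currentRoot`.
    * Failures of requests issued against any other root are ignored.
    */
  def peersToMarkStateless(failures: Seq[Failure], currentRoot: String): Set[String] =
    failures.collect { case Failure(peer, root) if root == currentRoot => peer }.toSet
}
```

Immediately after a refresh, every in-flight failure carries the old root, so the result is empty and no peer is penalised for the new root.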
Impact¶
- Eliminates ~2 minutes of wasted time per pivot refresh cycle
- Prevents unnecessary cascading pivot refreshes
- With 7+ refreshes per sync, saves ~14+ minutes of dead time
- Download rate maintained continuously instead of stop-start pattern
Where¶
AccountRangeCoordinator.scala: handleTaskFailed() method
Additional Optimizations¶
Cumulative Keyspace Tracking¶
Added consumedKeyspace: BigInt that increments monotonically as accounts are downloaded. This provides accurate progress percentage across pivot refreshes:
private var consumedKeyspace: BigInt = BigInt(0)
// In updateTaskProgress:
consumedKeyspace += (newNext - oldNext).abs
val pct = (consumedKeyspace * 1000 / totalKeyspace).toDouble / 10.0
Adaptive Response Sizing¶
Initial response size set to 2MB (core-geth's handler limit) instead of 512KB. Peers return what they can handle, and the adaptive logic scales down on failures. This maximizes throughput from the first request.
Logging Cleanup¶
Moved high-frequency chunk/response logs to DEBUG level (94% noise reduction). Added periodic 100K-account progress logs, range completion logs, and preserved range logs at INFO level.
Fix 8: SNAP OOM + Periodic Trie Flushing (Bug 18)¶
Root cause: DeferredWriteMptStorage kept ALL trie nodes in memory and flushed only once, at finalization. At ~420 bytes/account, the heap was exhausted at ~9.5M accounts (4GB heap) or ~19.3M accounts (8GB heap).
Fix: Added periodic flush after each response batch (~32K accounts), bounding peak memory to ~13MB per batch. Also wired disk persistence for account range progress via AppStateStorage.putSnapSyncProgress with serialize/deserialize, crash recovery on restart (256-block pivot safety valve), and clear on phase completion/fallback.
Files changed:
- AccountRangeCoordinator.scala — periodic flush trigger after each batch
- DeferredWriteMptStorage.scala — flush method, batch tracking
- AppStateStorage.scala — putSnapSyncProgress, getSnapSyncProgress
Verification (trie flush): 303K accounts downloaded, 7 flushes, stable 833MB RSS, zero OOMs. Progress persists across restarts.
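The memory bound follows from flushing per batch: roughly 32K accounts × ~420 bytes ≈ 13MB held at any one time. A minimal sketch of the buffering pattern, where `BufferedNodeStore` is a hypothetical stand-in for DeferredWriteMptStorage and the `persist` callback stands in for the RocksDB write:

```scala
import scala.collection.mutable

object PeriodicFlushSketch {
  /** Buffers node writes in memory until flush() hands them to `persist`. */
  final class BufferedNodeStore(persist: Map[String, Array[Byte]] => Unit) {
    private val pending = mutable.Map.empty[String, Array[Byte]]
    private var flushes = 0

    def put(key: String, value: Array[Byte]): Unit = pending.update(key, value)
    def pendingCount: Int = pending.size
    def flushCount: Int   = flushes

    /** Persist everything buffered so far and release the memory.
      * Intended to be called after each response batch.
      */
    def flush(): Unit =
      if (pending.nonEmpty) {
        persist(pending.toMap)
        pending.clear()
        flushes += 1
      }
  }
}
```

Calling `flush()` on an empty buffer is a no-op, so the coordinator can invoke it unconditionally after every batch.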
Contract Accounts OOM (second memory source)¶
Root cause: contractAccounts and contractStorageAccounts in AccountRangeCoordinator were mutable.ArrayBuffer[(ByteString, ByteString)] that grew unbounded. On ETC mainnet, ~85% of accounts are contracts (GasToken-style state bloat from pre-Mystique era), producing ~45M entries consuming ~5-6GB on a 4GB heap.
Fix: Replaced in-memory ArrayBuffers with file-backed storage:
- Two temp files: fukuii-contract-accounts-*.bin and fukuii-contract-storage-*.bin
- Fixed 64-byte entries (32-byte hash + 32-byte codeHash/storageRoot)
- BufferedOutputStream (64KB buffer) for writes during identifyContractAccounts()
- RandomAccessFile for reads when GetContractAccounts/GetContractStorageAccounts queried
- Files cleaned up in postStop()
- Only count kept in memory for logging
Verification (contract accounts): At 62% keyspace: temp files 2.8GB each on disk, JVM RSS 1.9GB, zero OOM. Previous crash at 30%.
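The fixed-width layout is what makes random access cheap: entry `i` starts at byte offset `i * 64`. The sketch below shows the write and read halves using the same JDK classes named above; `FileBackedEntriesSketch`, `EntryWriter`, and `readEntry` are hypothetical names, not the coordinator's actual API.

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream, RandomAccessFile}

object FileBackedEntriesSketch {
  val EntrySize = 64 // 32-byte account hash + 32-byte codeHash or storageRoot

  /** Append-only writer for fixed-width entries. */
  final class EntryWriter(file: File) {
    private val out     = new BufferedOutputStream(new FileOutputStream(file), 64 * 1024)
    private var written = 0L

    def append(hash: Array[Byte], value: Array[Byte]): Unit = {
      require(hash.length == 32 && value.length == 32, "both fields must be 32 bytes")
      out.write(hash)
      out.write(value)
      written += 1
    }

    def count: Long   = written // only the count stays in memory
    def close(): Unit = out.close()
  }

  /** Read the entry at `index` by seeking to index * EntrySize. */
  def readEntry(file: File, index: Long): (Array[Byte], Array[Byte]) = {
    val raf = new RandomAccessFile(file, "r")
    try {
      raf.seek(index * EntrySize)
      val buf = new Array[Byte](EntrySize)
      raf.readFully(buf)
      (buf.take(32), buf.drop(32))
    } finally raf.close()
  }
}
```

The writer must be closed (or flushed) before reading, since up to 64KB of entries may still sit in the output buffer.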
Fix 9: Bootstrap Retry Resilience (Bug 2) + Log File Resilience (Bug 19)¶
Root cause: The bootstrap retry loop (LocalPivot branch in startSnapSync()) ran at a fixed 2s interval indefinitely when no peers were available. This code path is completely separate from the PivotStateUnservable → consecutivePivotRefreshes → fallbackToFastSync() chain — it has no timeout, no backoff, and no connection to recordCriticalFailure(). Result: 5,260+ retries over 3+ hours with no escalation to fast sync, despite core-geth running on localhost.
Fix (4 changes):
1. Exponential backoff: 2s → 4s → 8s → 16s → 32s → 60s cap. Reduces log spam from 5,260 entries to ~250.
2. 5-minute timeout: After MaxBootstrapRetryDuration, calls fallbackToFastSync().
3. Periodic diagnostics: Every ~5 retries, logs handshaked peer count, snap-capable count, and elapsed time.
4. Timer reset: bootstrapRetryCount and bootstrapRetryStartMs reset when peers are found (NetworkPivot selected).
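The backoff schedule and timeout from items 1 and 2 can be sketched as two pure functions. `BootstrapBackoffSketch`, `delayFor`, and `shouldFallBack` are hypothetical names; only MaxBootstrapRetryDuration and the 2s/60s/5-minute values come from the description above.

```scala
import scala.concurrent.duration._

object BootstrapBackoffSketch {
  val MaxDelay: FiniteDuration                  = 60.seconds
  val MaxBootstrapRetryDuration: FiniteDuration = 5.minutes

  /** Delay before retry number `retryCount` (0-based):
    * 2s, 4s, 8s, 16s, 32s, then 60s for every later retry.
    */
  def delayFor(retryCount: Int): FiniteDuration = {
    // Clamp the shift so the doubling can neither overflow nor go negative.
    val shift = math.max(0, math.min(retryCount, 5))
    val secs  = math.min(2L << shift, MaxDelay.toSeconds)
    secs.seconds
  }

  /** True once the retry loop has run long enough to give up on SNAP. */
  def shouldFallBack(elapsed: FiniteDuration): Boolean =
    elapsed >= MaxBootstrapRetryDuration
}
```

The retry handler would schedule the next attempt after `delayFor(bootstrapRetryCount)` and call fallbackToFastSync() as soon as `shouldFallBack` returns true.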
Also fixed a pre-existing bug: stale retry code inside the Some(header) match case (header IS available but code was scheduling a retry instead of proceeding).
Also: log file resilience. A custom ResilientRollingFileAppender extends RollingFileAppender, checks that the log file still exists every 100 log events, and reopens it if it has been deleted. This solves silent log loss when log files are deleted while the node is running (logback's default appender keeps writing to a dangling file descriptor). Applied to all 7 logback configs.
Live Test Results¶
Test 1: Single Snap Peer (2026-03-06, pre-fix baseline)¶
- Setup: Fukuii on ETC mainnet, 1 snap-capable peer (local core-geth with --snapshot)
- Result: Downloaded ~5M accounts per window, zero progress preserved across restarts
- Failure mode: Stagnation watchdog false-triggered after ~5M accounts (180s without task completion)
Test 2: Post-Fix, Single Snap Peer (2026-03-06)¶
- Setup: Same as Test 1, all fixes applied
- Result: 7 seamless pivot refreshes over ~110 minutes
- Progress: 0.2% → 11.2% keyspace (~9.6M accounts)
- Download rate: ~1,500 accounts/sec (1 peer)
- Pivot refresh: In-place, zero progress lost
- Dynamic concurrency: 1 worker for 1 snap peer (correct)
Test 3: Multiple Snap Peers (2026-03-07)¶
- Setup: Fukuii on ETC mainnet, 4 snap-capable peers (local core-geth + 3 ETC Coop bootnodes)
- Result: 4 workers/4 peers, the correct 1:1 mapping
- Download rate: ~5,800 accounts/sec (4 peers, ~4x improvement)
- Pivot refresh: First refresh at ~16 min (4/4 peers stateless), in-place refresh, continued downloading
- Peer deduplication: After pivot refresh, peer count correctly stayed at 4 (no stale accumulation)
Architecture After Fixes¶
SNAPSyncController
├── preservedRangeProgress: Map[ByteString, ByteString] ← NEW: survives pivot refreshes
├── effectiveConcurrency = min(configured, snapPeerCount) ← NEW: dynamic
│
├── AccountRangeCoordinator
│ ├── knownAvailablePeers (deduplicated by remoteAddress) ← FIXED
│ ├── PriorityQueue (smallest-remaining-first) ← NEW
│ ├── consumedKeyspace (monotonic progress) ← NEW
│ ├── PivotRefreshed handler (in-place update) ← NEW
│ ├── postStop() → AccountRangeProgress snapshot ← NEW
│ ├── activePeerCount → worker cap ← NEW
│ └── stale-root guard in handleTaskFailed ← FIXED: prevents false stateless
│
├── StorageRangeCoordinator (peer deduplication applied)
├── ByteCodeCoordinator (unchanged)
└── TrieNodeHealingCoordinator (peer deduplication applied)
Remaining Work¶
- Full SNAP sync completion — Run to 100% keyspace on ETC mainnet (estimated ~2.5 hours with 4 peers)
- Bytecode/storage/healing phases — Not yet exercised on live network
- Multi-peer load balancing — Currently round-robin; could benefit from latency-aware peer selection
- Metrics — No Prometheus/metrics integration for monitoring SNAP sync progress in production