SNAP Sync Performance Tuning Guide

Audience: Advanced node operators and performance engineers
Estimated Time: 30-45 minutes
Prerequisites: Understanding of SNAP sync basics, system performance tuning, and monitoring

Overview

This guide provides advanced performance tuning strategies for optimizing SNAP sync on Fukuii nodes. While the default configuration works well for most deployments, specific environments may benefit from fine-tuning based on network conditions, hardware capabilities, and operational requirements.

Performance Objectives

SNAP sync performance can be optimized for different objectives:

  • Minimize sync time: Fastest possible synchronization
  • Minimize resource usage: Lower CPU, memory, disk I/O, and network bandwidth
  • Maximize stability: Reliable sync with minimal failures
  • Balance: Good performance across all metrics (default)

Hardware Optimization

CPU

Impact: Affects Merkle proof verification and RLP encoding/decoding

Recommendations by CPU Type:

CPU Type              account-concurrency  storage-concurrency  Notes
2 cores               8                    4                    Minimum for SNAP sync
4 cores               16 (default)         8 (default)          Balanced performance
8+ cores              32                   16                   Maximum throughput
High-end (16+ cores)  32                   16                   No benefit beyond 32/16

Configuration:

snap-sync {
  # High-end CPU (8+ cores)
  account-concurrency = 32
  storage-concurrency = 16

  # Low-end CPU (2 cores)
  account-concurrency = 8
  storage-concurrency = 4
}

CPU Affinity (Linux):

# Pin Fukuii to specific CPU cores for better cache locality
taskset -c 0-7 ./bin/fukuii -Dconfig.file=conf/fukuii.conf
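
The Troubleshooting section below also assumes the CPU frequency governor is set to "performance"; a minimal check-and-set sketch for Linux (sysfs paths vary by driver and may be absent on VMs):

# Check the current governor on core 0
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Switch all cores to 'performance' (resets on reboot; persist via your distro's tools)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor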

Memory

Impact: Affects LRU cache effectiveness and garbage collection

Recommendations by RAM:

RAM     JVM Heap        Cache Size        Notes
4 GB    2 GB            5,000             Minimum (may cause GC pauses)
8 GB    4 GB (default)  10,000 (default)  Balanced
16 GB   8 GB            20,000            Improved caching
32+ GB  16 GB           40,000            Maximum benefit

JVM Configuration:

# 8 GB heap for 16 GB RAM system
export JAVA_OPTS="-Xms8g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200"
./bin/fukuii -Dconfig.file=conf/fukuii.conf

Garbage Collection Tuning:

# G1GC with aggressive tuning for low latency
export JAVA_OPTS="-Xms8g -Xmx8g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:G1HeapRegionSize=16m \
  -XX:G1ReservePercent=15 \
  -XX:InitiatingHeapOccupancyPercent=45 \
  -XX:+ParallelRefProcEnabled"
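
If GC pauses need diagnosing later (see Troubleshooting), unified GC logging can be appended to the same JAVA_OPTS on JDK 9+; the log path here is an assumption, adjust to your layout:

# Append rotating GC logs (JDK 9+ unified logging syntax)
export JAVA_OPTS="$JAVA_OPTS \
  -Xlog:gc*:file=/var/log/fukuii/gc.log:time,uptime:filecount=5,filesize=20m"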

Cache Size Tuning (requires code modification):

The LRU cache is currently fixed at 10,000 entries. For systems with more memory:

// In StorageRangeDownloader.scala
// Change from:
private val storageTrieCache = new StorageTrieCache(10000)
// To:
private val storageTrieCache = new StorageTrieCache(20000)

Disk

Impact: Critical for state storage performance

Recommendations by Disk Type:

Disk Type       Expected SNAP Sync Time  Optimization
HDD (7200 RPM)  12-24 hours              Not recommended
SATA SSD        3-6 hours                Good
NVMe SSD        2-4 hours                Excellent (recommended)
NVMe RAID 0     1.5-3 hours              Maximum performance

Disk I/O Optimization:

# Check current disk scheduler
cat /sys/block/nvme0n1/queue/scheduler

# Set to 'none' for NVMe (best performance)
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler

# Set to 'mq-deadline' for SATA SSD ('deadline' on older, pre-blk-mq kernels)
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

# Disable access time updates
sudo mount -o remount,noatime,nodiratime /data
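
Scheduler settings written to sysfs reset on reboot. One way to persist them is a udev rule; the device name below assumes a single NVMe drive:

# Persist the 'none' scheduler for nvme0n1 across reboots
echo 'ACTION=="add|change", KERNEL=="nvme0n1", ATTR{queue/scheduler}="none"' | \
  sudo tee /etc/udev/rules.d/60-io-scheduler.rules
sudo udevadm control --reload && sudo udevadm trigger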

File System Tuning:

# XFS (recommended for blockchain data)
sudo mkfs.xfs -f -d agcount=4 /dev/nvme0n1
sudo mount -o noatime,nodiratime,largeio,inode64,swalloc /dev/nvme0n1 /data

# ext4 (alternative)
sudo mkfs.ext4 /dev/nvme0n1
# Warning: data=writeback and barrier=0 trade crash safety for speed; a power
# failure can corrupt the database, so use them only if you can resync from scratch
sudo mount -o noatime,nodiratime,data=writeback,barrier=0 /dev/nvme0n1 /data
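
Mount options set interactively are also lost on reboot; persist them in /etc/fstab. This sketch assumes the XFS layout above and the /data mount point:

# Persist XFS mount options (prefer a UUID over the raw device path)
echo '/dev/nvme0n1  /data  xfs  noatime,nodiratime,largeio,inode64,swalloc  0 0' | \
  sudo tee -a /etc/fstab
sudo mount -a   # verify the entry parses before rebooting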

Network

Impact: Affects download throughput and peer quality

Bandwidth Requirements:

Network Speed  SNAP Sync Performance  Peer Count  Notes
< 10 Mbps      Poor (8-12 hours)      5-10        Minimum
10-50 Mbps     Good (4-6 hours)       10-20       Typical home/office
50-100 Mbps    Excellent (2-3 hours)  20-30       Data center
100+ Mbps      Maximum (1.5-2 hours)  30+         High-end

Network Tuning:

# Increase TCP buffer sizes
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"

# Enable TCP BBR congestion control (Linux 4.9+)
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
sudo sysctl -w net.core.default_qdisc=fq

# Increase connection tracking
sudo sysctl -w net.netfilter.nf_conntrack_max=1000000
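
These sysctl values reset on reboot as well; to persist them, drop them into /etc/sysctl.d/ (the file name is arbitrary):

sudo tee /etc/sysctl.d/99-fukuii-network.conf <<'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
EOF
sudo sysctl --system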

Configuration Tuning

Concurrency Optimization

Goal: Maximize parallelism without overwhelming peers

Testing Methodology:

  1. Start with default (16/8)
  2. Monitor throughput (accounts/sec, slots/sec); a sampler sketch follows this list
  3. Increase by 50% (24/12)
  4. Re-measure and compare
  5. Continue until no improvement or errors increase
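
A rough throughput sampler for step 2, assuming progress lines containing accounts=<N> in logs/fukuii.log (adjust the pattern to the actual log format):

#!/usr/bin/env bash
# Sample the cumulative account count twice and report the delta per second
LOG=logs/fukuii.log
INTERVAL=60
a1=$(grep -o 'accounts=[0-9]*' "$LOG" | tail -1 | cut -d= -f2)
sleep "$INTERVAL"
a2=$(grep -o 'accounts=[0-9]*' "$LOG" | tail -1 | cut -d= -f2)
echo "Throughput: $(( (a2 - a1) / INTERVAL )) accounts/sec"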

Example Tuning Session:

# Baseline (default)
snap-sync {
  account-concurrency = 16
  storage-concurrency = 8
}
# Result: 5,000 accounts/sec, 100,000 slots/sec

# Test 1: Increase by 50%
snap-sync {
  account-concurrency = 24
  storage-concurrency = 12
}
# Result: 7,000 accounts/sec, 140,000 slots/sec (40% improvement)

# Test 2: Double original
snap-sync {
  account-concurrency = 32
  storage-concurrency = 16
}
# Result: 8,500 accounts/sec, 170,000 slots/sec (21% improvement)

# Test 3: Triple original
snap-sync {
  account-concurrency = 48
  storage-concurrency = 24
}
# Result: 8,600 accounts/sec, 175,000 slots/sec (1% improvement)
# Conclusion: Diminishing returns, stay with 32/16

Optimal Settings by Environment:

# Data center with excellent network and hardware
snap-sync {
  account-concurrency = 32
  storage-concurrency = 16
  storage-batch-size = 16
  healing-batch-size = 32
  timeout = 20 seconds
}

# Standard cloud VM (8 vCPU, 16 GB RAM)
snap-sync {
  account-concurrency = 16
  storage-concurrency = 8
  storage-batch-size = 8
  healing-batch-size = 16
  timeout = 30 seconds
}

# Resource-constrained environment (4 vCPU, 8 GB RAM)
snap-sync {
  account-concurrency = 8
  storage-concurrency = 4
  storage-batch-size = 4
  healing-batch-size = 8
  timeout = 45 seconds
}

Timeout Optimization

Goal: Balance quick failure detection with network latency

Factors to Consider:

  • Peer geographic distribution (local vs. global)
  • Network latency (ping times)
  • Response size (larger responses warrant longer timeouts)

Timeout Recommendations:

Network Conditions         Recommended Timeout   Notes
Low latency (<50ms)        15-20 seconds         Data center, same region
Medium latency (50-150ms)  30 seconds (default)  Typical internet
High latency (>150ms)      45-60 seconds         Intercontinental peers
Satellite/mobile           90-120 seconds        High latency, variable
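
To pick a bracket, sample round-trip times to a few connected peers; the address below is a placeholder for a peer IP taken from your logs:

# Rough RTT estimate over 10 probes
ping -c 10 203.0.113.10 | tail -2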

Configuration:

snap-sync {
  # Low latency environment
  timeout = 20 seconds

  # High latency environment
  timeout = 60 seconds
}

Retry Strategy

Goal: Recover from transient failures without wasting time

Configuration:

snap-sync {
  max-retries = 3  # Default
  # Exponential backoff: 1s, 2s, 4s
}

Tuning for Different Environments:

# Stable network with good peers
snap-sync {
  max-retries = 2  # Fewer retries, faster failure detection
}

# Unstable network or few peers
snap-sync {
  max-retries = 5  # More retries, better resilience
}

Batch Size Optimization

Goal: Maximize data per request while staying within response limits

Storage Batch Size:

# Default (8 accounts per storage request)
snap-sync {
  storage-batch-size = 8
}

# High bandwidth network
snap-sync {
  storage-batch-size = 16  # More data per request
}

# Low bandwidth or slow peers
snap-sync {
  storage-batch-size = 4   # Smaller requests
}

Healing Batch Size:

# Default (16 paths per healing request)
snap-sync {
  healing-batch-size = 16
}

# Minimize healing iterations
snap-sync {
  healing-batch-size = 32  # Larger batches
}

# Reduce memory pressure during healing
snap-sync {
  healing-batch-size = 8   # Smaller batches
}

Peer Management

Peer Selection Strategy

Goal: Connect to high-quality SNAP-capable peers

Configuration:

network {
  peer {
    # Increase max peers for more SNAP sources
    max-outgoing-peers = 30  # Default: 10-15
    max-incoming-peers = 30  # Default: 10-15

    # Faster peer discovery
    peer-discovery-interval = 10 seconds  # Default: 30s
  }
}

Peer Quality Monitoring:

# Monitor SNAP-capable peers
tail -f logs/fukuii.log | grep "supportsSnap=true"

# Check peer geographic distribution
# Diverse locations = better reliability
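
To quantify this, count SNAP-capable handshake lines over a recent window (same log format as the grep above):

# SNAP-capable handshakes in the last 1,000 log lines
tail -n 1000 logs/fukuii.log | grep -c "supportsSnap=true"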

Blacklist Management

Goal: Quickly identify and avoid problematic peers

Current automatic blacklisting thresholds:

  • 10+ total failures
  • 3+ invalid proof errors
  • 5+ malformed response errors

Tuning for aggressive blacklisting:

// In SNAPErrorHandler.scala
// Reduce thresholds for faster blacklisting
def shouldBlacklistPeer(peerId: PeerId): Boolean = {
  val failures = peerFailures.getOrElse(peerId, PeerFailureInfo.empty)
  failures.totalFailures >= 5 ||  // Reduced from 10
  failures.invalidProofErrors >= 2 ||  // Reduced from 3
  failures.malformedResponseErrors >= 3  // Reduced from 5
}

Monitoring and Profiling

Key Performance Indicators (KPIs)

Monitor these metrics during SNAP sync:

  1. Throughput:
     • Accounts/sec (target: 5,000-10,000)
     • Storage slots/sec (target: 50,000-150,000)
     • Nodes healed/sec (target: 500-2,000)

  2. Resource Usage (see the snapshot sketch after this list):
     • CPU utilization (target: 60-80%)
     • Memory usage (target: <80% of heap)
     • Disk I/O (target: <80% of disk bandwidth)
     • Network bandwidth (target: <80% of available)

  3. Reliability:
     • Request success rate (target: >95%)
     • Peer blacklist rate (target: <10%)
     • Retry rate (target: <20%)
     • Circuit breaker trips (target: 0)
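
A minimal shell sketch for tracking the resource-usage KPIs, assuming a single fukuii process:

# Log CPU%, MEM%, and RSS for the fukuii process every 10 seconds
PID=$(pgrep -f fukuii | head -1)
while sleep 10; do
  echo "$(date +%T) $(ps -o %cpu=,%mem=,rss= -p "$PID")"
done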

Profiling Tools

JVM Profiling:

# Enable JMX for secure local monitoring
export JAVA_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.host=127.0.0.1 \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.rmi.port=9999 \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.password.file=/path/jmxremote.password \
  -Dcom.sun.management.jmxremote.access.file=/path/jmxremote.access \
  -Dcom.sun.management.jmxremote.ssl=true"

# Use VisualVM or JConsole to connect
visualvm --openjmx 127.0.0.1:9999

Note:
Replace /path/jmxremote.password and /path/jmxremote.access with the actual paths to your JMX password and access files.
For remote access, use an SSH tunnel and restrict port 9999 to trusted hosts via firewall.

CPU Profiling:

# Async-profiler for low-overhead profiling
./profiler.sh -d 60 -f /tmp/flamegraph.svg $(pgrep -f fukuii)

# Analyze flamegraph to identify bottlenecks
# Common hotspots: RLP encoding, proof verification, trie operations

Memory Profiling:

# Heap dump on OutOfMemoryError
export JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/tmp/heapdump.hprof"

# Analyze with Eclipse MAT or JProfiler

Network Profiling:

# tcpdump for packet capture
sudo tcpdump -i eth0 -w /tmp/snap-sync.pcap port 30303

# Analyze with Wireshark
wireshark /tmp/snap-sync.pcap

Performance Scenarios

Scenario 1: Minimize Sync Time (Data Center)

Hardware:

  • CPU: 16 cores
  • RAM: 32 GB
  • Disk: NVMe SSD
  • Network: 1 Gbps

Configuration:

sync {
  do-snap-sync = true

  snap-sync {
    enabled = true
    pivot-block-offset = 512  # Lower for faster catch-up
    account-concurrency = 32
    storage-concurrency = 16
    storage-batch-size = 16
    healing-batch-size = 32
    state-validation-enabled = true
    max-retries = 2  # Fail fast
    timeout = 20 seconds
  }
}

network.peer {
  max-outgoing-peers = 50
  max-incoming-peers = 50
}

JVM:

export JAVA_OPTS="-Xms16g -Xmx16g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100 \
  -XX:+ParallelRefProcEnabled"

Expected Result: 1.5-2 hours sync time

Scenario 2: Minimize Resource Usage (Budget VPS)

Hardware:

  • CPU: 2 cores
  • RAM: 4 GB
  • Disk: SATA SSD
  • Network: 100 Mbps

Configuration:

sync {
  do-snap-sync = true

  snap-sync {
    enabled = true
    pivot-block-offset = 1024
    account-concurrency = 6  # Lower for 2 cores
    storage-concurrency = 3
    storage-batch-size = 4
    healing-batch-size = 8
    state-validation-enabled = true
    max-retries = 4  # More tolerant
    timeout = 45 seconds
  }
}

network.peer {
  max-outgoing-peers = 15
  max-incoming-peers = 10
}

JVM:

export JAVA_OPTS="-Xms2g -Xmx2g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=500"

Expected Result: 8-12 hours sync time, low resource usage

Scenario 3: Maximum Reliability (Production)

Hardware:

  • CPU: 8 cores
  • RAM: 16 GB
  • Disk: NVMe SSD RAID 1 (mirrored)
  • Network: 500 Mbps

Configuration:

sync {
  do-snap-sync = true

  snap-sync {
    enabled = true
    pivot-block-offset = 2048  # Extra stability
    account-concurrency = 16
    storage-concurrency = 8
    storage-batch-size = 8
    healing-batch-size = 16
    state-validation-enabled = true  # Always validate
    max-retries = 5  # More resilient
    timeout = 45 seconds  # Generous timeout
    max-snap-sync-failures = 10  # Avoid premature fallback
  }
}

network.peer {
  max-outgoing-peers = 30
  max-incoming-peers = 25
}

JVM:

export JAVA_OPTS="-Xms8g -Xmx8g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=200 \
  -XX:+HeapDumpOnOutOfMemoryError"

Expected Result: 3-4 hours sync time, maximum reliability

Benchmarking

Baseline Measurement

Before tuning, establish baseline performance:

# Start sync with default config
./bin/fukuii -Dconfig.file=conf/fukuii.conf

# Record metrics every 5 minutes
while true; do
  echo "$(date): $(grep 'SNAP Sync Progress' logs/fukuii.log | tail -1)" >> /tmp/baseline.log
  sleep 300
done

# After sync completes, calculate average throughput
# (assumes progress lines formatted as '...@<rate>'; adjust the -F
#  separator and column to match your actual log format)
grep "accounts=" /tmp/baseline.log | awk -F'@' '{sum+=$2} END {print "Avg:", sum/NR, "accounts/sec"}'

A/B Testing

Compare two configurations:

# Test A: Default config
./bin/fukuii -Dconfig.file=conf/default.conf
# Record: Total sync time, avg throughput, resource usage

# Test B: Tuned config
./bin/fukuii -Dconfig.file=conf/tuned.conf
# Record: Total sync time, avg throughput, resource usage

# Compare results
# Choose configuration with best overall performance

Troubleshooting Performance Issues

High CPU Usage

Symptom: CPU at 100%, sync progressing slowly

Diagnosis:

# Check what's consuming CPU
top -H -p $(pgrep -f fukuii)

# Profile to identify hotspots
./profiler.sh -d 60 -f /tmp/cpu-profile.svg $(pgrep -f fukuii)

Solutions:

  • Reduce concurrency if CPU-bound
  • Upgrade to a faster CPU
  • Ensure the CPU governor is set to "performance" (see the CPU section above)

High Memory Usage / GC Pauses

Symptom: Frequent GC pauses, high memory usage

Diagnosis:

# Monitor GC activity
jstat -gcutil $(pgrep -f fukuii) 1000

# Check heap usage (on JDK 9+ use: jhsdb jmap --heap --pid <pid>)
jmap -heap $(pgrep -f fukuii)

Solutions:

  • Increase heap size (-Xmx)
  • Tune G1GC parameters
  • Reduce cache sizes
  • Enable GC logging for analysis (see the GC logging sketch in the Memory section)

Disk I/O Bottleneck

Symptom: High disk wait times, slow write speeds

Diagnosis:

# Monitor disk I/O
iostat -x 1

# Check if disk is saturated
iotop -o

Solutions:

  • Upgrade to an NVMe SSD
  • Enable write caching
  • Optimize the file system (XFS, noatime)
  • Use a separate disk for logs
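
To confirm the disk itself is the limit, a quick fio random-write test against the data volume can help (size and path are examples):

# 4 KiB random writes with direct I/O at queue depth 32
fio --name=randwrite --rw=randwrite --bs=4k --size=1g --numjobs=1 \
    --iodepth=32 --ioengine=libaio --direct=1 --directory=/data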

Network Bottleneck

Symptom: Low download throughput, peer timeouts

Diagnosis:

# Monitor network bandwidth
iftop -i eth0

# Check for packet loss
ping -c 100 8.8.8.8 | grep loss

Solutions:

  • Upgrade the network connection
  • Increase TCP buffers
  • Enable BBR congestion control
  • Connect to geographically closer peers
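
Before blaming peers, verify that BBR is actually active and measure raw throughput; the iperf3 server below is a placeholder:

# Confirm the congestion control algorithm in use
sysctl net.ipv4.tcp_congestion_control

# Raw throughput to a reachable iperf3 server
iperf3 -c iperf.example.org -t 10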


Last Updated: 2025-12-03
Version: 1.0
Maintainer: Fukuii Development Team