Known Issues and Solutions¶
Audience: Operators troubleshooting common problems
Last Updated: 2025-11-26
Status: Living Document
Overview¶
This document provides practical solutions for common operational scenarios with Fukuii. Each issue includes symptoms, causes, and step-by-step resolution guides. Most issues have straightforward solutions that can be applied quickly.
Table of Contents¶
- RocksDB Operations
- Temporary Directory Configuration
- JVM Optimization
- Network Connectivity
- Issue 13: Network Sync Zero-Length BigInteger ✅ Resolved
- Issue 14: ETH68 Peer Connections ✅ Resolved
- Issue 15: ForkId Compatibility ✅ Resolved
RocksDB Operations¶
RocksDB is a robust embedded key-value database used by Fukuii for blockchain data. This section covers common operational scenarios and their solutions.
Issue 1: Database Recovery After Unclean Shutdown¶
Severity: High
Frequency: Uncommon
Impact: Node fails to start
Symptoms¶
ERROR [RocksDbDataSource] - Failed to open database
ERROR [RocksDbDataSource] - Corruption: block checksum mismatch
ERROR [RocksDbDataSource] - Corruption: bad magic number
Root Cause¶
- Power loss or system crash during write operations
- Disk errors or failing storage hardware
- Out-of-memory conditions during database writes
- Improper shutdown (SIGKILL instead of SIGTERM)
Workaround¶
Option 1: Automatic repair (try first)
RocksDB can often repair itself on restart; simply restart the node and watch the logs for recovery messages.
Option 2: Manual database repair (if auto-repair fails)
# Stop Fukuii
pkill -f fukuii
# Remove LOCK files (prevents "database is locked" errors)
find ~/.fukuii/etc/rocksdb/ -name "LOCK" -delete
# Remove WAL (Write-Ahead Log) if corrupted
# WARNING: Loses recent uncommitted transactions
# Only do this if node won't start
# rm -rf ~/.fukuii/etc/rocksdb/*/log/
# Restart
./bin/fukuii etc
Option 3: Restore from backup
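A restore might look like the following sketch. The backup location is illustrative; see backup-restore.md for the supported procedure.

```shell
# Stop Fukuii
pkill -f fukuii
# Remove the corrupted database
rm -rf ~/.fukuii/etc/rocksdb/
# Restore the database from your backup (path is illustrative)
cp -r /backup/fukuii/rocksdb ~/.fukuii/etc/rocksdb
# Restart and verify the node resumes from the backup height
./bin/fukuii etc
```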
Option 4: Resync from genesis (last resort)
# Backup keys first!
cp ~/.fukuii/etc/node.key ~/node.key.backup
cp -r ~/.fukuii/etc/keystore ~/keystore.backup
# Remove corrupted database
rm -rf ~/.fukuii/etc/rocksdb/
# Restore keys
cp ~/node.key.backup ~/.fukuii/etc/node.key
cp -r ~/keystore.backup ~/.fukuii/etc/keystore/
# Resync (takes days)
./bin/fukuii etc
Prevention (Recommended)¶
Adopt these operational practices:
- Use the proper shutdown procedure (send SIGTERM and wait for a clean exit; never SIGKILL)
- Enable a journaling filesystem (ext4 with journaling, XFS)
- Use a UPS (Uninterruptible Power Supply) for physical servers
- Take regular backups: see backup-restore.md
- Monitor disk health:
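SMART monitoring is a common way to watch disk health; the device name below is illustrative and requires the smartmontools package.

```shell
# Overall SMART health verdict
sudo smartctl -H /dev/sda
# Watch for early failure indicators such as reallocated or pending sectors
sudo smartctl -A /dev/sda | grep -Ei 'reallocated|pending'
```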
Issue 2: Optimizing RocksDB Performance¶
Severity: Medium
Frequency: Common after months of operation
Impact: Slow block imports, high disk I/O
Symptoms¶
WARN [RocksDbDataSource] - Database operation took 5000ms (expected < 100ms)
INFO [SyncController] - Block import rate: 5 blocks/second (down from 50+)
- Increasing disk usage despite stable blockchain size
- High disk I/O wait times
- Slower RPC queries
Root Cause¶
- Compaction backlog: LSM tree needs compaction but hasn't kept up
- Write amplification: Multiple rewrites of same data
- Fragmentation: SST files not optimally organized
- Insufficient free space: < 20% free prevents efficient compaction
Workaround¶
Step 1: Verify disk space
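Compaction needs headroom on disk; a quick check of free space and current database size:

```shell
# Aim for at least 20-30% free on the filesystem holding the database
df -h ~/.fukuii/etc/rocksdb/
# Current database size
du -sh ~/.fukuii/etc/rocksdb/
```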
Step 2: Allow compaction to complete
# Check compaction status in logs
grep -i compact ~/.fukuii/etc/logs/fukuii.log | tail -20
# Compaction runs automatically but may take hours
# Monitor with:
watch -n 5 "du -sh ~/.fukuii/etc/rocksdb/*"
Step 3: Force compaction (if supported)
If Fukuii exposes a compaction trigger (check documentation):
Step 4: Offline compaction via restart
# Stop node during low-traffic period
# RocksDB performs major compaction during startup
# May take 30-60 minutes
./bin/fukuii etc
Permanent Fix¶
Prevention measures:
- Maintain adequate free space (30%+ recommended)
- Use SSD/NVMe storage:
  - SST file compaction is I/O intensive
  - SSD dramatically improves compaction speed
  - HDD can create a compaction backlog
- Allocate more resources:
  - More CPU cores help parallel compaction
  - More RAM caches database operations
- Regular maintenance windows:
  - Restart weekly/monthly during low activity
  - Allows a full compaction cycle
- Monitor metrics:
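A simple way to watch disk I/O pressure and compaction activity; `iostat` comes from the sysstat package, and the log path matches the examples above.

```shell
# Extended disk statistics every 5 seconds; high %util and await indicate pressure
iostat -x 5
# Recent compaction activity in the Fukuii log
grep -i compact ~/.fukuii/etc/logs/fukuii.log | tail -20
```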
Status¶
Permanent: Inherent to LSM tree architecture. Managed through proper resource allocation and maintenance.
Issue 3: File Descriptor Configuration¶
Severity: High
Frequency: Rare
Impact: Node crashes or fails to start
Symptoms¶
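The original log excerpt was not captured here; the classic signature is the operating system's "Too many open files" error surfacing through RocksDB or the JVM, for example:

```
org.rocksdb.RocksDBException: While open a file for appending: ... (Too many open files)
java.io.IOException: Too many open files
```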
Root Cause¶
Linux file descriptor limit exceeded. RocksDB opens many SST files simultaneously.
Workaround¶
Temporary fix (current session):
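Raise the limit for the current shell session, then start Fukuii from that same shell:

```shell
# Raise the open-files soft limit for this shell only
ulimit -n 65536
./bin/fukuii etc
```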
Permanent Fix¶
For systemd service:
Edit /etc/systemd/system/fukuii.service:
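Add the limit under the `[Service]` section:

```ini
[Service]
LimitNOFILE=65536
```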
Reload and restart:
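Reload systemd, restart the service, and confirm the limit took effect for the running process (service name assumed to be `fukuii` as in the unit file above):

```shell
sudo systemctl daemon-reload
sudo systemctl restart fukuii
# Confirm the limit applied to the running process
grep 'open files' /proc/$(pgrep -f fukuii | head -n1)/limits
```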
For user (persistent):
Edit /etc/security/limits.conf:
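Add entries for the account that runs Fukuii (username is illustrative):

```
fukuii_user soft nofile 65536
fukuii_user hard nofile 65536
```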
Log out and back in, verify:
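```shell
# Should now report the configured limit (e.g. 65536)
ulimit -n
```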
For Docker:
docker run -d \
--ulimit nofile=65536:65536 \
--name fukuii \
ghcr.io/chippr-robotics/chordodes_fukuii:v1.0.0
Or in docker-compose.yml:
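A minimal sketch of the equivalent compose configuration:

```yaml
services:
  fukuii:
    image: ghcr.io/chippr-robotics/chordodes_fukuii:v1.0.0
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
```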
Status¶
Fixed: Set file descriptor limits to 65536 or higher.
Temporary Directory Configuration¶
Fukuii and its JVM may use temporary directories for various operations. This section covers proper configuration for temp directories.
Issue 4: Temp Space Configuration¶
Severity: Medium
Frequency: Uncommon
Impact: Node crashes or performance degradation
Symptoms¶
ERROR [JVM] - No space left on device: /tmp
WARN [Fukuii] - Failed to create temporary file
java.io.IOException: No space left on device
- Node hangs or crashes unexpectedly
- Slow performance during heavy operations
Root Cause¶
- /tmp partition full
- Large temporary files not cleaned up
- Small /tmp partition size
- Excessive JVM temporary file usage
Workaround¶
Immediate fix:
# Check temp space
df -h /tmp
# Clean temp files (carefully)
sudo find /tmp -type f -atime +7 -delete # Files not accessed in 7+ days
sudo rm -rf /tmp/hsperfdata_* # JVM performance data
sudo rm -rf /tmp/java_* # JVM temporary files
Permanent Fix¶
Option 1: Increase /tmp size
For tmpfs (RAM-based):
# Check current size
df -h /tmp
# Increase to 4GB (edit /etc/fstab)
tmpfs /tmp tmpfs defaults,size=4G 0 0
# Remount
sudo mount -o remount /tmp
Option 2: Use dedicated temp directory
# Create dedicated temp directory
sudo mkdir -p /var/tmp/fukuii
sudo chown fukuii_user:fukuii_group /var/tmp/fukuii
sudo chmod 700 /var/tmp/fukuii
Set in JVM options (.jvmopts or startup script):
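The standard JVM property for relocating temporary files:

```
-Djava.io.tmpdir=/var/tmp/fukuii
```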
Option 3: Automated cleanup
Create systemd timer or cron job:
#!/bin/bash
# /usr/local/bin/cleanup-fukuii-temp.sh
TEMP_DIR=/var/tmp/fukuii
find "$TEMP_DIR" -type f -mtime +1 -delete # Delete files older than 1 day
Cron:
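A crontab entry running the script daily at 03:00:

```
0 3 * * * /usr/local/bin/cleanup-fukuii-temp.sh
```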
Status¶
Fixed: Configure adequate temp space and automated cleanup.
Issue 5: Temp Directory Permissions¶
Severity: Low
Frequency: Rare
Impact: Node fails to start or certain operations fail
Symptoms¶
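The original log excerpt was not captured here; a permission problem typically surfaces as:

```
java.io.IOException: Permission denied
WARN [Fukuii] - Failed to create temporary file
```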
Root Cause¶
- Temp directory not writable by Fukuii user
- SELinux or AppArmor restrictions
- /tmp mounted with noexec flag
Workaround¶
# Fix permissions
sudo chmod 1777 /tmp # Standard /tmp permissions
# Or for dedicated temp:
sudo chown fukuii_user:fukuii_group /var/tmp/fukuii
sudo chmod 700 /var/tmp/fukuii
Permanent Fix¶
Verify mount options:
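```shell
# Show mount options for /tmp; look for a noexec flag
findmnt -no TARGET,OPTIONS /tmp
# Or, on systems without findmnt:
mount | grep ' /tmp '
```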
If /tmp has noexec, use dedicated temp directory (see Issue 4).
Check SELinux (if applicable):
# Check SELinux status
getenforce
# If enforcing, may need context change
# WARNING: Adjust path to match your actual temp directory
sudo semanage fcontext -a -t tmp_t "/var/tmp/fukuii(/.*)?"
sudo restorecon -R /var/tmp/fukuii
Status¶
Fixed: Ensure proper permissions and mount options.
JVM Optimization¶
Fukuii runs on the JVM and benefits from proper tuning for optimal performance. This section covers recommended JVM configurations.
Issue 6: Heap Size Configuration¶
Severity: High
Frequency: Common with default settings
Impact: Node crashes
Symptoms¶
ERROR [JVM] - java.lang.OutOfMemoryError: Java heap space
ERROR [JVM] - java.lang.OutOfMemoryError: Metaspace
ERROR [JVM] - java.lang.OutOfMemoryError: GC overhead limit exceeded
Node crashes, especially during:
- Initial sync
- Heavy RPC load
- Large block imports
Root Cause¶
- Heap size too small for workload
- Memory leak (rare)
- Metaspace exhaustion (many classes loaded)
Workaround¶
Immediate fix: Restart node (temporary relief)
Permanent Fix¶
Increase heap size (.jvmopts file):
Default:
For 16 GB RAM system:
For 32 GB RAM system:
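Illustrative values for the two RAM sizes above, following the 50-70% guideline below (adjust for your workload):

```
# 16 GB RAM system
-Xms8g
-Xmx8g

# 32 GB RAM system
-Xms16g
-Xmx16g
```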
Guidelines:
- -Xms (initial) = -Xmx (max) for predictable behavior
- Heap should be 50-70% of available RAM
- Leave RAM for OS, RocksDB cache, and other processes
- Minimum 4 GB heap recommended
- 8-16 GB ideal for production
For Docker:
docker run -d \
-e JAVA_OPTS="-Xms8g -Xmx16g" \
--name fukuii \
ghcr.io/chippr-robotics/chordodes_fukuii:v1.0.0
Verify settings:
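Two ways to confirm the heap configuration actually in effect, using standard JDK tooling:

```shell
# Effective defaults of the installed JVM
java -XX:+PrintFlagsFinal -version 2>/dev/null | grep -E 'InitialHeapSize|MaxHeapSize'
# Flags of the running Fukuii process
jcmd $(pgrep -f fukuii | head -n1) VM.flags
```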
Metaspace Issues¶
If specifically OutOfMemoryError: Metaspace:
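Cap and size metaspace explicitly in .jvmopts (the 1g value matches the production configuration in Issue 8):

```
-XX:MaxMetaspaceSize=1g
```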
Status¶
Fixed: Configure adequate heap size based on available RAM.
Issue 7: Garbage Collection Tuning¶
Severity: Medium
Frequency: Common with large heaps
Impact: Periodic unresponsiveness, slow sync
Symptoms¶
- Periodic freezes (seconds)
- Delayed block imports
- RPC timeouts
- Peer disconnections
Root Cause¶
- Default garbage collector not optimal for large heaps
- Full GC triggered too frequently
- Heap size too small (constant GC pressure)
Workaround¶
Monitor GC activity:
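Standard JDK tooling covers this; the log path assumes GC logging is configured as in Issue 8.

```shell
# Live GC utilization for the running JVM, sampled every second
jstat -gcutil $(pgrep -f fukuii | head -n1) 1000
# With -Xlog:gc* enabled, look for long pauses
grep -E 'Pause (Young|Full)' /var/log/fukuii-gc.log | tail -20
```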
Permanent Fix¶
Use G1GC (recommended for heaps > 4GB):
Add to .jvmopts:
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=32M
-XX:InitiatingHeapOccupancyPercent=45
Or use ZGC (JDK 21+, for large heaps and low latency):
Or use Shenandoah GC (JDK 21+, alternative low-pause collector):
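The selection flags for either collector in .jvmopts:

```
# ZGC
-XX:+UseZGC

# Shenandoah
-XX:+UseShenandoahGC
```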
Tuning recommendations:
- Heap < 8GB: Default or G1GC
- Heap 8-32GB: G1GC
- Heap > 32GB: ZGC or Shenandoah
Additional tuning:
# Reduce GC frequency by tuning thresholds
-XX:NewRatio=2 # New generation = 1/3 of heap
-XX:SurvivorRatio=8
Status¶
Fixed: Use appropriate garbage collector and tune parameters.
Issue 8: Production JVM Configuration¶
Severity: Medium
Frequency: Common without tuning
Impact: Suboptimal performance
Symptoms¶
- Slower than expected block imports
- High CPU usage
- Frequent GC pauses
- Poor throughput
Root Cause¶
Default JVM settings not optimized for Fukuii's workload.
Permanent Fix¶
Recommended production configuration (.jvmopts):
# Heap settings (adjust based on available RAM)
-Xms8g
-Xmx8g
# Garbage Collection
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=32M
# Code cache and metaspace
-XX:ReservedCodeCacheSize=1024m
-XX:MaxMetaspaceSize=1g
# Stack size
-Xss4M
# Performance optimizations
-XX:+UseStringDeduplication
-XX:+OptimizeStringConcat
-XX:+UseCompressedOops
# Monitoring (optional)
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintFlagsFinal
# GC logging (for troubleshooting)
-Xlog:gc*:file=/var/log/fukuii-gc.log:time,level,tags
# JMX monitoring (optional, for debugging)
# -Dcom.sun.management.jmxremote
# -Dcom.sun.management.jmxremote.port=9999
# -Dcom.sun.management.jmxremote.authenticate=false
# -Dcom.sun.management.jmxremote.ssl=false
For development (faster compilation, more debugging):
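A sketch of a development-oriented .jvmopts; values are illustrative, not project defaults:

```
# Smaller heap for local runs
-Xms2g
-Xmx4g
# Capture heap dumps on OOM for debugging
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp
# Faster startup at the cost of peak performance
-XX:TieredStopAtLevel=1
```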
Status¶
Fixed: Use optimized JVM configuration for production.
Issue 9: JVM Version Compatibility¶
Severity: High
Frequency: Rare
Impact: Node fails to start
Symptoms¶
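The original error text was not captured here; a JDK version mismatch typically fails with an UnsupportedClassVersionError along these lines (class file version 65.0 corresponds to JDK 21):

```
java.lang.UnsupportedClassVersionError: ... has been compiled by a more recent
version of the Java Runtime (class file version 65.0), this version of the
Java Runtime only recognizes class file versions up to 61.0
```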
Root Cause¶
- Wrong JVM version (Fukuii requires JDK 21)
- Multiple JVM installations causing confusion
Workaround¶
# Check current Java version
java -version
# Should show: openjdk version "21.x.x" or similar
# Check which Java is being used
which java
update-alternatives --display java
Permanent Fix¶
Install JDK 21:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install openjdk-21-jdk
# Set as default
sudo update-alternatives --config java
# Select JDK 21
# Verify
java -version
Explicitly set JAVA_HOME (in startup script or environment):
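The path varies by distribution; this one is typical for Ubuntu's openjdk-21 package.

```shell
export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```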
For Docker: Use official image which includes correct JDK version.
Status¶
Fixed: Ensure JDK 21 is installed and used.
Network Connectivity¶
Issue 10: Network Configuration¶
Severity: Medium
Frequency: Common for new operators
Impact: No peers, no sync
Symptoms¶
WARN [PeerManagerActor] - Disconnected from peer: incompatible network
INFO [PeerManagerActor] - Active peers: 0
All peers disconnect immediately after handshake.
Root Cause¶
Running on wrong network (e.g., trying to connect ETC node to ETH network).
Fix¶
Verify correct network:
Check logs for network ID:
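Assuming the network name is passed as the startup argument (as in the examples above) and logs live in the path used elsewhere in this document:

```shell
# Start with the intended network as the argument (etc shown here)
./bin/fukuii etc
# Confirm what network/chain the node reports at startup
grep -iE 'network|chain' ~/.fukuii/etc/logs/fukuii.log | head -20
```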
Status¶
User Error: Ensure correct network specified at startup.
Issue 11: Time Synchronization¶
Severity: Medium
Frequency: Uncommon
Impact: Peer issues, synchronization problems
Symptoms¶
WARN [Discovery] - Message expired or clock skew detected
WARN [PeerActor] - Peer timestamp out of acceptable range
Root Cause¶
System clock significantly different from network time.
Fix¶
Check time synchronization:
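On systemd-based systems:

```shell
# Look for "System clock synchronized: yes" and an active NTP service
timedatectl status
# Quick sanity check of the local clock
date -u
```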
Enable NTP:
# Ubuntu/Debian
sudo apt-get install ntp
sudo systemctl enable ntp
sudo systemctl start ntp
# Or use systemd-timesyncd
sudo systemctl enable systemd-timesyncd
sudo systemctl start systemd-timesyncd
Force sync:
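The command depends on which time daemon is in use; pick the one matching your setup.

```shell
# systemd-timesyncd
sudo systemctl restart systemd-timesyncd
# chrony
sudo chronyc makestep
# one-shot correction (ntpdate package)
sudo ntpdate pool.ntp.org
```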
Status¶
Fixed: Enable and verify NTP time synchronization.
Issue 12: Firewall Configuration¶
Severity: Medium
Frequency: Common in security-hardened environments
Impact: No incoming peers, slow peer discovery
Symptoms¶
INFO [PeerManagerActor] - Active peers: 5 (all outgoing)
WARN [ServerActor] - No incoming connections
Root Cause¶
Firewall blocking required ports (9076/TCP, 30303/UDP).
Fix¶
See peering.md and first-start.md.
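As a quick sketch with ufw, using the ports listed above (see peering.md for the authoritative list):

```shell
sudo ufw allow 9076/tcp    # P2P listening port
sudo ufw allow 30303/udp   # peer discovery
sudo ufw reload
```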
Status¶
Configuration: Open required ports in firewall.
Issue 13: Network Sync Zero-Length BigInteger ✅¶
Status: Fixed in v1.0.1
Summary¶
This issue was caused by incorrect handling of empty byte arrays in the RLP serialization layer. The fix ensures empty byte arrays correctly deserialize to zero, per Ethereum specification.
Symptoms (for reference)¶
ERROR [o.a.pekko.actor.OneForOneStrategy] - Zero length BigInteger
java.lang.NumberFormatException: Zero length BigInteger
at java.base/java.math.BigInteger.<init>(BigInteger.java:...)
Technical Details¶
- Location: src/main/scala/com/chipprbots/ethereum/domain/package.scala
- Affected component: ArbitraryIntegerMpt.bigIntSerializer.fromBytes
- Root cause: fromBytes did not handle empty byte arrays before calling BigInt(bytes)
The fix:
// Before:
override def fromBytes(bytes: Array[Byte]): BigInt = BigInt(bytes)
// After:
override def fromBytes(bytes: Array[Byte]): BigInt =
if (bytes.isEmpty) BigInt(0) else BigInt(bytes)
Test coverage added: 21+ tests covering all serialization paths.
See commit afc0626 for full implementation details.
Issue 14: ETH68 Peer Connections ✅¶
Status: Fixed in current release
Summary¶
This issue was caused by incorrect message decoder ordering. Network protocol messages must be decoded before capability-specific messages per the devp2p specification.
Symptoms (for reference)¶
DEBUG [c.c.e.n.p2p.MessageDecoder$$anon$1] - Unknown eth/68 message type: 1
INFO [c.c.e.n.rlpx.RLPxConnectionHandler] - Cannot decode message from <peer-ip>:30303, because of Cannot decode Disconnect
Technical Details¶
- Location: src/main/scala/com/chipprbots/ethereum/network/rlpx/RLPxConnectionHandler.scala
- Root cause: The capability-specific ETH68 decoder was tried before the network (devp2p) message decoder
The fix:
// Before:
val md = EthereumMessageDecoder.ethMessageDecoder(negotiated).orElse(NetworkMessageDecoder)
// After:
val md = NetworkMessageDecoder.orElse(EthereumMessageDecoder.ethMessageDecoder(negotiated))
See commit 801b236 for full implementation details.
Getting Help¶
If you encounter an issue not documented here:
- Search existing issues: https://github.com/chippr-robotics/fukuii/issues
- Collect information:
- Fukuii version
- Operating system and version
- JVM version
- Relevant log excerpts
- Steps to reproduce
- Open new issue: Provide detailed report with above information
Contributing to This Document¶
This is a living document. Your contributions help everyone! If you:
- Find a solution to an issue
- Discover a new operational pattern
- Have improved configurations
Please submit a pull request or open an issue to update this documentation.
Issue 15: ForkId Compatibility ✅¶
Status: Fixed in current release
Summary¶
This issue was caused by incompatible ForkId values being advertised during ETH64+ protocol handshake for nodes starting from low block numbers.
Symptoms (for reference)¶
INFO [c.c.e.n.handshaker.EthNodeStatus64ExchangeState] - STATUS_EXCHANGE: Sending status - bestBlock=1234
INFO [c.c.e.n.PeerManagerActor] - Handshaked 0/80, pending connection attempts 15
INFO [c.c.e.b.sync.PivotBlockSelector] - Cannot pick pivot block. Need at least 3 peers, but there are only 0
Technical Details¶
- Location: src/main/scala/com/chipprbots/ethereum/network/handshaker/EthNodeStatus64ExchangeState.scala
- Root cause: The bootstrap pivot block was only used when bestBlockNumber == 0
The fix extends bootstrap pivot usage for ForkId calculation during initial sync:
// Use bootstrap pivot for ForkId during initial sync
val forkIdBlockNumber = if (bootstrapPivotBlock > 0) {
val threshold = math.min(bootstrapPivotBlock / 10, BigInt(100000))
if (bestBlockNumber < (bootstrapPivotBlock - threshold)) bootstrapPivotBlock
else bestBlockNumber
} else bestBlockNumber
Benefits:
- Bootstrap pivot is used for ForkId calculation during the entire initial sync
- Smooth transition from the pivot to the actual block number when close to synced
- Both regular sync and fast sync now maintain stable peer connections
See CON-006: ForkId Compatibility During Initial Sync for details.
Document Version: 1.2
Last Updated: 2025-11-26
Maintainer: Chippr Robotics LLC