Log Triage Runbook
Audience: Operators diagnosing issues and troubleshooting via logs
Estimated Time: 15-45 minutes per issue
Prerequisites: Access to Fukuii logs
Overview
This runbook covers log configuration, analysis techniques, and troubleshooting common issues through log examination. Logs are your primary diagnostic tool for understanding node behavior and identifying problems.
Table of Contents
- Log Configuration
- Log Locations and Structure
- Understanding Log Levels
- Common Log Patterns
- Troubleshooting by Category
- Log Analysis Tools
- Best Practices
Log Configuration
Default Configuration
Fukuii uses Logback for logging, configured in src/main/resources/logback.xml.
Default settings:
- Format: Text with timestamp, level, logger name, and message
- Console: INFO level and above
- File: All levels (configurable)
- Rotation: 10 MB per file, max 50 files
- Location: ~/.fukuii/<network>/logs/
Configuring Log Levels
Log levels can be set via application configuration:
Via application.conf:
logging {
logs-dir = ${user.home}"/.fukuii/"${fukuii.blockchains.network}"/logs"
logs-file = "fukuii"
logs-level = "INFO" # Options: TRACE, DEBUG, INFO, WARN, ERROR
json-output = false
}
Via environment variable (if supported):
Via JVM system property:
Adjusting Specific Logger Levels
Edit your configuration or create a custom logback.xml:
<configuration>
<!-- ... other config ... -->
<!-- Set specific package to DEBUG -->
<logger name="com.chipprbots.ethereum.blockchain.sync" level="DEBUG" />
<!-- Reduce verbose logger -->
<logger name="io.netty" level="WARN"/>
<!-- Silence very verbose logger -->
<logger name="com.chipprbots.ethereum.vm.VM" level="OFF" />
</configuration>
Enabling JSON Logging
For structured logging (useful for log aggregation tools such as ELK or Splunk), set json-output = true in the logging block of application.conf.
Restart Fukuii to apply changes.
Log Rotation
Rotation is automatic with default settings:
- Size-based: Rolls over at 10 MB
- Retention: Keeps 50 archived logs
- Compression: Archives are compressed (.zip)
- Naming: fukuii.1.log.zip, fukuii.2.log.zip, etc.
To adjust, modify logback.xml:
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<fileNamePattern>${LOGSDIR}/${LOGSFILENAME}.%i.log.zip</fileNamePattern>
<minIndex>1</minIndex>
<maxIndex>100</maxIndex> <!-- Keep 100 files instead of 50 -->
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<maxFileSize>50MB</maxFileSize> <!-- 50 MB instead of 10 MB -->
</triggeringPolicy>
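Before raising these limits, it can help to estimate the worst-case disk footprint. The sketch below multiplies the per-file size by the number of files kept, plus one for the active log. Because archives are compressed, real usage should be lower. The variable and function names are illustrative and are not part of Fukuii's configuration.

```shell
#!/bin/sh
# Upper bound on log disk usage for a fixed-window rotation setup.
# MAX_FILE_MB and MAX_INDEX mirror <maxFileSize> and <maxIndex>; the names are illustrative.
MAX_FILE_MB=50
MAX_INDEX=100

log_budget_mb() {
  # $1 = max file size in MB, $2 = number of archives kept.
  # Add 1 for the active (uncompressed) log file.
  echo $(( $1 * ($2 + 1) ))
}

echo "Worst case: $(log_budget_mb "$MAX_FILE_MB" "$MAX_INDEX") MB"
```

With the defaults described above (10 MB, 50 files), `log_budget_mb 10 50` gives an upper bound of 510 MB.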
Log Locations and Structure
Log File Locations
Binary installation:
~/.fukuii/etc/logs/
├── fukuii.log # Current log
├── fukuii.1.log.zip # Most recent archive
├── fukuii.2.log.zip
└── ...
Docker installation:
# View logs
docker logs fukuii
# Follow logs
docker logs -f fukuii
# Export logs to file
docker logs fukuii > fukuii.log 2>&1
Systemd service:
# View logs
journalctl -u fukuii
# Follow logs
journalctl -u fukuii -f
# Export logs
journalctl -u fukuii --no-pager > fukuii.log
Log Entry Format
Standard format:
2025-11-02 10:30:45 INFO [com.chipprbots.ethereum.Fukuii] - Starting Fukuii client version: 1.0.0
│ │ │ │
│ │ │ └─ Message
│ │ └─ Logger name (class/package)
│ └─ Log level
└─ Timestamp
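As a minimal sketch, the standard layout can be split into its four parts with awk. It assumes the exact field order shown above (date, time, level, bracketed logger, a dash, then the message). The function name `parse_log_line` is hypothetical.

```shell
#!/bin/sh
# Split a standard-format log line into timestamp, level, logger, and message.
# Assumes the layout: <date> <time> <LEVEL> [<logger>] - <message>
parse_log_line() {
  printf '%s\n' "$1" | awk '{
    ts = $1 " " $2                           # date and time
    level = $3
    logger = substr($4, 2, length($4) - 2)   # strip the surrounding brackets
    sep = index($0, "] - ")                  # message starts after "] - "
    msg = substr($0, sep + 4)
    printf "timestamp=%s\nlevel=%s\nlogger=%s\nmessage=%s\n", ts, level, logger, msg
  }'
}

parse_log_line '2025-11-02 10:30:45 INFO [com.chipprbots.ethereum.Fukuii] - Starting Fukuii client version: 1.0.0'
```

Multi-line entries such as stack traces do not follow this layout and would need separate handling.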
JSON format (when enabled):
{
"timestamp": "2025-11-02T10:30:45.123Z",
"level": "INFO",
"logger": "com.chipprbots.ethereum.Fukuii",
"message": "Starting Fukuii client version: 1.0.0",
"hostname": "node01"
}
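Understanding Log Levels
Log Level Hierarchy
Levels in order of increasing severity: TRACE, DEBUG, INFO, WARN, ERROR. Setting a level shows that level and every more severe one; for example, WARN shows only WARN and ERROR.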
Level Descriptions
| Level | Description | When to Use | Volume |
|---|---|---|---|
| ERROR | Critical failures | Production - always monitor | Low |
| WARN | Potential issues | Production - should investigate | Low-Medium |
| INFO | Important events | Production - normal operations | Medium |
| DEBUG | Detailed diagnostic info | Development/troubleshooting | High |
| TRACE | Very detailed execution flow | Deep debugging only | Very High |
Typical Production Setup
Root level: INFO
Specific troubleshooting: DEBUG for relevant packages
Performance-critical paths: WARN or OFF (e.g., VM execution)
Common Log Patterns
Healthy Node Startup
INFO [Fukuii] - Starting Fukuii client version: 1.0.0
INFO [NodeBuilder] - Fixing database...
INFO [GenesisDataLoader] - Loading genesis data...
INFO [GenesisDataLoader] - Genesis data loaded successfully
INFO [NodeBuilder] - Starting peer manager...
INFO [ServerActor] - Server bound to /0.0.0.0:9076
INFO [NodeBuilder] - Starting server...
INFO [DiscoveryService] - Discovery service started on port 30303
INFO [NodeBuilder] - Starting sync controller...
INFO [SyncController] - Starting blockchain synchronization
INFO [NodeBuilder] - Starting JSON-RPC HTTP server on 0.0.0.0:8546...
INFO [JsonRpcHttpServer] - JSON-RPC HTTP server listening on 0.0.0.0:8546
INFO [Fukuii] - Fukuii started successfully
Normal Operation Logs
INFO [PeerManagerActor] - Connected to peer: Peer(...)
INFO [SyncController] - Imported 100 blocks in 5.2 seconds
INFO [BlockBroadcaster] - Broadcasted block #12345678 to 25 peers
INFO [PendingTransactionsManager] - Added transaction 0xabc...
Warning Signs (Need Attention)
WARN [PeerManagerActor] - Disconnected from peer: handshake timeout
WARN [SyncController] - No suitable peers for synchronization
WARN [RocksDbDataSource] - Compaction took longer than expected: 120s
WARN [PeerActor] - Received unknown message type from peer
Error Indicators (Immediate Action Needed)
ERROR [ServerActor] - Failed to bind to port 9076: Address already in use
ERROR [RocksDbDataSource] - Database corruption detected
ERROR [BlockImporter] - Failed to execute block: insufficient gas
ERROR [Fukuii] - Fatal error during startup
Troubleshooting by Category
Startup Issues
Problem: Port Already in Use
Log pattern:
ERROR [ServerActor] - Failed to bind to port 9076: Address already in use
Diagnosis:
Solution:
# Kill conflicting process or change Fukuii port
# Change port in config:
# fukuii.network.server-address.port = 9077
See: first-start.md
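For the diagnosis step, one possible approach is to pull the conflicting port out of the log and then inspect it with a standard tool such as `lsof`. The sketch below relies on the "Failed to bind to port" message shown in this runbook. The function name `bind_failure_port` and the `LOG_FILE` variable are illustrative.

```shell
#!/bin/sh
# Print the port from the most recent "Failed to bind" error in a log file.
# The message text is taken from the error example in this runbook.
bind_failure_port() {
  sed -n 's/.*Failed to bind to port \([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

LOG_FILE="${LOG_FILE:-$HOME/.fukuii/etc/logs/fukuii.log}"
if [ -r "$LOG_FILE" ]; then
  port="$(bind_failure_port "$LOG_FILE")"
  if [ -n "$port" ]; then
    # lsof -i :<port> lists the process currently holding the port.
    echo "Port in conflict: $port (inspect with: lsof -i :$port)"
  fi
fi
```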
Problem: Database Corruption
Log pattern:
ERROR [RocksDbDataSource] - Database corruption detected
Solution: See known-issues.md
Problem: Genesis Data Load Failure
Log pattern:
ERROR [GenesisDataLoader] - Failed to load genesis data
ERROR [GenesisDataLoader] - Invalid genesis configuration
Diagnosis:
Solution:
- Ensure correct network specified (etc, eth, mordor)
- Verify genesis configuration files are present
- Check for file corruption
Synchronization Issues
Problem: Slow or Stalled Sync
Log pattern:
Diagnosis:
# Check recent import activity
grep "Imported.*blocks" ~/.fukuii/etc/logs/fukuii.log | tail -20
# Check peer count
grep "peer count" ~/.fukuii/etc/logs/fukuii.log | tail -5
Common causes:
1. No peers: See peering.md
2. Disk I/O bottleneck: See disk-management.md
3. Network issues: Check bandwidth, latency
Solution:
# Enable DEBUG logging for sync
# In config: logging.logs-level = "DEBUG"
# Or specific: <logger name="com.chipprbots.ethereum.blockchain.sync" level="DEBUG" />
# Monitor for detailed sync info
tail -f ~/.fukuii/etc/logs/fukuii.log | grep -i sync
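To judge whether sync is slow rather than stalled, the import lines can be turned into a blocks-per-second figure. This is a sketch that assumes the "Imported N blocks in S seconds" wording from the Normal Operation Logs sample. The function name `import_rates` is hypothetical.

```shell
#!/bin/sh
# Compute blocks/second from "Imported N blocks in S seconds" lines.
# The message format follows the sample in this runbook and is an assumption.
import_rates() {
  sed -n 's/.*Imported \([0-9][0-9]*\) blocks in \([0-9.][0-9.]*\) seconds.*/\1 \2/p' "$1" |
    awk '$2 > 0 { printf "%d blocks in %ss = %.1f blocks/s\n", $1, $2, $1 / $2 }'
}
```

Usage: `import_rates ~/.fukuii/etc/logs/fukuii.log | tail -20`. No output at all suggests that no import lines were logged in the file.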
Problem: Block Import Failures
Log pattern:
ERROR [BlockImporter] - Failed to execute block 12345678
ERROR [BlockImporter] - Invalid block: state root mismatch
Diagnosis: This may indicate:
- Database corruption
- Bug in EVM implementation
- Fork incompatibility
Solution:
1. Check Fukuii version is up-to-date
2. Review recent hard forks - may need upgrade
3. Verify database integrity (see disk-management.md)
4. Report issue with block number to maintainers
Network and Peering Issues
Problem: No Peers
Log pattern:
Diagnosis:
# Check discovery is enabled
grep "discovery" ~/.fukuii/etc/logs/fukuii.log | tail -10
# Check for connection errors
grep -i "connection\|peer" ~/.fukuii/etc/logs/fukuii.log | grep -i error | tail -20
Solution: See peering.md
Problem: Peers Disconnecting
Log pattern:
WARN [PeerManagerActor] - Disconnected from peer: incompatible network
WARN [PeerActor] - Peer handshake timeout
INFO [PeerManagerActor] - Blacklisted peer: ...
Analysis:
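# Count disconnect reasons (the text after "Disconnected from peer: ")
# Note: cut -d: cannot be used here because the timestamp itself contains colons
grep "Disconnected from peer" ~/.fukuii/etc/logs/fukuii.log | \
sed 's/.*Disconnected from peer: //' | sort | uniq -c | sort -rn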
Common reasons:
- incompatible network - Wrong network/fork
- handshake timeout - Network latency or peer overload
- protocol error - Peer misbehavior or version incompatibility
Solution: Usually normal - node filters incompatible peers. If excessive (> 50% disconnect rate), see peering.md
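The 50% threshold can be checked roughly by comparing disconnect events with connect events. The sketch below is one possible definition of "disconnect rate", not necessarily the one intended here. It relies on the "Connected to peer" and "Disconnected from peer" messages from the samples in this runbook, and the function name is hypothetical.

```shell
#!/bin/sh
# Rough disconnect rate: disconnect events as a percentage of connect events.
# The two message prefixes come from the log samples in this runbook.
disconnect_rate() {
  connected=$(grep -c 'Connected to peer' "$1")
  disconnected=$(grep -c 'Disconnected from peer' "$1")
  if [ "$connected" -eq 0 ]; then
    echo "n/a (no connections logged)"
  else
    echo "$(( disconnected * 100 / connected ))%"
  fi
}
```

Usage: `disconnect_rate ~/.fukuii/etc/logs/fukuii.log`.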
RPC and API Issues
Problem: RPC Not Responding
Log pattern:
Diagnosis:
# Check if RPC server started
grep "JSON-RPC" ~/.fukuii/etc/logs/fukuii.log
# Test RPC endpoint
curl -X POST --data '{"jsonrpc":"2.0","method":"web3_clientVersion","params":[],"id":1}' \
http://localhost:8546
Solution:
- Verify RPC is enabled in configuration
- Check port is not in use
- Review firewall rules
Problem: RPC Errors
Log pattern:
Analysis: Check which RPC methods are failing:
Performance Issues
Problem: High Memory Usage
Log pattern:
Diagnosis:
# Check current memory usage
ps aux | grep fukuii
jps -lvm | grep fukuii
# Check JVM settings
cat .jvmopts
Solution: See known-issues.md
Problem: Slow Performance
Log pattern:
WARN [RocksDbDataSource] - Database operation took 5000ms (expected < 100ms)
WARN [SyncController] - Block import rate: 2 blocks/second (expected 50+)
Diagnosis:
# Check for disk I/O warnings
grep -i "slow\|took.*ms\|performance" ~/.fukuii/etc/logs/fukuii.log
# System diagnostics
iostat -x 1 10
top
Solution: See disk-management.md
Database Issues
Problem: RocksDB Errors
Log pattern:
ERROR [RocksDbDataSource] - RocksDB error: ...
ERROR [RocksDbDataSource] - Failed to write batch
WARN [RocksDbDataSource] - Compaction pending
Solution: See known-issues.md
Log Analysis Tools
Basic Command-Line Tools
Search for errors:
Count log levels:
Find recent activity:
Search archived logs:
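The four tasks above can be sketched with standard tools (grep, awk, tail, unzip). The function names are illustrative. The level is read from the third field, which assumes the standard log format, and archives are assumed to follow the fukuii.N.log.zip naming described earlier.

```shell
#!/bin/sh
# Search for errors: the most recent ERROR lines ($2 = how many, default 20).
recent_errors() { grep 'ERROR' "$1" | tail -n "${2:-20}"; }

# Count log levels: occurrences of each level (third field in the standard format).
count_levels() {
  awk '$3 ~ /^(TRACE|DEBUG|INFO|WARN|ERROR)$/ { n[$3]++ }
       END { for (l in n) print n[l], l }' "$1" | sort -rn
}

# Find recent activity: the last N lines ($2 = how many, default 100).
recent_activity() { tail -n "${2:-100}" "$1"; }

# Search archived logs: grep inside each zip without extracting to disk.
# $1 = pattern, $2 = log directory
search_archives() {
  for archive in "$2"/fukuii.*.log.zip; do
    [ -e "$archive" ] || continue
    unzip -p "$archive" | grep -- "$1" | awk -v f="$archive" '{ print f ": " $0 }'
  done
}
```

Usage: `count_levels ~/.fukuii/etc/logs/fukuii.log` or `search_archives "corruption" ~/.fukuii/etc/logs`.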
Time-range analysis:
# Logs from the last hour (requires GNU date; timestamps compare lexicographically)
awk -v since="$(date -d '1 hour ago' '+%Y-%m-%d %H:%M:%S')" '($1 " " $2) >= since' ~/.fukuii/etc/logs/fukuii.log
Extract stack traces:
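One way to do this, assuming every log entry starts with a timestamp as in the standard format, is to treat any line without a leading timestamp as a continuation of the previous entry and print only the entries whose level is ERROR. The function name `error_traces` is hypothetical.

```shell
#!/bin/sh
# Print each ERROR entry together with the continuation lines (stack trace) that follow it.
# Assumes every new entry starts with a "YYYY-MM-DD " timestamp.
error_traces() {
  awk '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      keep = ($3 == "ERROR")   # a new entry begins: keep it only if it is an ERROR
    }
    keep { print }             # prints the entry line and any continuation lines
  ' "$1"
}
```

Usage: `error_traces ~/.fukuii/etc/logs/fukuii.log | less`.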
Advanced Analysis Scripts
Summarize issues:
#!/bin/bash
# log-summary.sh
LOG_FILE=~/.fukuii/etc/logs/fukuii.log
echo "=== Log Summary ==="
echo "Total lines: $(wc -l < "$LOG_FILE")"
echo ""
echo "=== Log Levels ==="
awk '{print $3}' "$LOG_FILE" | sort | uniq -c | sort -rn
echo ""
echo "=== Top Errors ==="
grep ERROR "$LOG_FILE" | awk -F'\\[|\\]' '{print $2}' | sort | uniq -c | sort -rn | head -10
echo ""
echo "=== Recent Errors ==="
grep ERROR "$LOG_FILE" | tail -10
Monitor specific patterns:
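#!/bin/bash
# monitor-logs.sh
# Follow the log and flag errors, warnings, and block imports.
# IFS= and -r keep leading whitespace and backslashes in each line intact.
tail -f ~/.fukuii/etc/logs/fukuii.log | while IFS= read -r line; do
  if echo "$line" | grep -q "ERROR"; then
    echo "🔴 $line"
  elif echo "$line" | grep -q "WARN"; then
    echo "🟡 $line"
  elif echo "$line" | grep -q "Imported.*blocks"; then
    echo "✅ $line"
  fi
done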
Performance metrics extraction:
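# Extract block import rates (timestamp, block count, duration)
# Field positions assume the standard format: date time LEVEL [logger] - message
grep "Imported.*blocks" ~/.fukuii/etc/logs/fukuii.log | \
awk '{print $1, $2, $7, $8, $9, $10, $11}' | tail -20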
Log Aggregation Tools
For production environments:
1. ELK Stack (Elasticsearch, Logstash, Kibana)
2. Grafana Loki
3. Splunk
4. CloudWatch / Stackdriver
Best Practices
Logging Strategy
- Production: INFO level by default
- Troubleshooting: DEBUG for specific packages
- Development: DEBUG or TRACE
- Performance testing: WARN or ERROR only
Log Retention
- Keep logs for troubleshooting window: 7-30 days typical
- Archive old logs: Compress and move to long-term storage
- Automate cleanup: Prevent disk exhaustion
# Clean logs older than 30 days
find ~/.fukuii/etc/logs/ -name "fukuii.*.log.zip" -mtime +30 -delete
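Monitoring and Alerting
Set up alerts for:
# Critical errors: alert when the count exceeds a limit you choose (set THRESHOLD first; it is a placeholder)
[ "$(grep -c "ERROR" fukuii.log)" -gt "$THRESHOLD" ] && echo "ALERT: error count above threshold"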
# Startup failures
grep "Fatal error" fukuii.log
# Peer connectivity
grep "No peers available" fukuii.log
# Database issues
grep "RocksDB.*error\|corruption" fukuii.log
Log Rotation Best Practices
- Size-based rotation: 10-50 MB per file
- Retention count: 50-100 files
- Compression: Always enable
- Monitoring: Alert if logs stop rotating (may indicate hang)
Security Considerations
- Restrict access: chmod 640 ~/.fukuii/etc/logs/*
- No sensitive data: Avoid logging private keys, passwords
- Audit logging: Enable for production nodes
- Secure storage: Protect log archives
Debugging Workflow
- Identify symptoms: What's not working?
- Check recent logs: Look for errors around symptom time
- Increase verbosity: Enable DEBUG for relevant packages
- Reproduce issue: Observe logs during reproduction
- Analyze patterns: Look for correlations
- Test hypothesis: Make changes, observe results
- Document findings: Update runbooks
Log Analysis Checklist
When investigating an issue:
- Check latest log entries for errors
- Review startup sequence for anomalies
- Verify all services started successfully
- Check for resource warnings (memory, disk)
- Review peer connectivity messages
- Look for patterns (timing, frequency)
- Check archived logs if issue is historical
- Compare with known good logs
- Search for similar issues in documentation
- Correlate with system metrics (CPU, disk, network)
Related Runbooks
- First Start - Initial setup and startup logs
- Peering - Network and peer-related logs
- Disk Management - Database and storage logs
- Known Issues - Common log patterns and solutions
- Investigation Reports - Detailed analysis of production incidents and operational issues
Example Analysis Reports
- Sync process issues are documented in the Troubleshooting section
Document Version: 1.1
Last Updated: 2025-11-10
Maintainer: Chippr Robotics LLC