OPS-002: Logging Level Categorization Standards¶

Status: Accepted

Date: November 2024

Deciders: Chippr Robotics LLC Engineering Team

Context¶

During troubleshooting of RLPx sync and other systems, many log statements were added at INFO level without proper categorization. This created excessive log noise during normal sync operations, making it difficult for operators to identify important events and system state changes.

Problem Statement¶

The fukuii client was producing overwhelming amounts of INFO-level logs during normal operation: - Detailed protocol handshake steps logged as INFO - Routine block/header/receipt processing logged as INFO
- Each peer interaction logged as INFO - Internal state updates logged as INFO

This created several issues: 1. Operator fatigue: Production logs were too verbose to monitor effectively 2. Signal vs noise: Important events were buried in routine operational details 3. Troubleshooting difficulty: No clear distinction between normal operation and issues requiring attention 4. Storage costs: Excessive logging increased log storage requirements

Requirements¶

From Issue #512 ("reduce the noise"): 1. Evaluate all log messages in fukuii for proper categorization 2. Distinguish between debug, info, warning, and error severity levels 3. Reduce INFO-level noise during normal sync operations 4. Preserve detailed troubleshooting information at DEBUG level 5. Ensure errors are properly categorized as ERROR or WARN

Technical Context¶

The fukuii client uses two different logging frameworks depending on the actor type:

Pekko Classic Actors (ActorLogging)¶

Classic actors that extend Actor with ActorLogging use Pekko's LoggingAdapter, which provides: - log.debug() - Debug level - log.info() - Info level - log.warning() - Warning level (note: warning, not warn) - log.error() - Error level

Examples: PivotBlockSelector, BlockImporter, RLPxConnectionHandler

Pekko Typed Actors (context.log)¶

Typed actors that extend AbstractBehavior and use context.log get an SLF4J Logger, which provides: - log.trace() - Trace level - log.debug() - Debug level - log.info() - Info level - log.warn() - Warning level (note: warn, not warning) - log.error() - Error level

Examples: BlockFetcher, PoWMiningCoordinator

Other Components (SLF4J/Scala Logging)¶

Non-actor components using SLF4J directly or Scala Logging also use: - log.warn() - Warning level (note: warn, not warning)

Important: When modifying log levels, always check whether the file uses Pekko Classic ActorLogging (use log.warning) or SLF4J/Typed Actor logging (use log.warn). Using the wrong method will cause compilation errors.

Decision¶

We established comprehensive logging level categorization standards and recategorized 43 log statements across 8 files to align with these standards.

Log Level Standards¶

ERROR¶

Use for failures requiring immediate attention or indicating serious problems: - Connection failures that terminate connections - Message decoding errors that close connections - Critical system failures - Database corruption or access failures - Security-related issues

WARN¶

Use for unexpected but recoverable situations that may indicate problems: - Dismissed or invalid data from peers - Branch resolution issues - Timeout conditions that trigger retries - Blacklisting events - Configuration issues that have fallbacks

INFO¶

Use for significant state changes and important milestones: - System startup and shutdown - Sync mode changes (starting/stopping fast sync, regular sync) - Mining state changes - Successful pivot block selection - Block synchronization completion - Important configuration changes - Major component initialization

DEBUG¶

Use for detailed operational information useful during troubleshooting: - Detailed protocol handshake steps - Received/sent message counts from peers - Block/header/receipt processing details - Pivot block updates and changes - Checkpoint processing - Block import operations - State sync progress details - Peer voting and selection details - Request/response cycles

Subsystem-Specific Guidelines¶

RLPx Connection Handling¶

Connection initiation/acceptance: DEBUG
TCP connection established: DEBUG
Detailed handshake steps: DEBUG
Protocol negotiation details: DEBUG
Successful handshake completion: INFO
Full connection establishment: INFO
Connection failures/timeouts: ERROR
Message decode errors: ERROR

Sync (Fast and Regular)¶

Sync start/stop: INFO
Received block data from peers: DEBUG
Pivot block selection/updates: DEBUG (except final selection: INFO)
State sync progress: DEBUG
Sync strategy selection: INFO
Sync completion: INFO
Retry attempts: DEBUG
Sync failures requiring fallback: WARN

Block Processing¶

Block mined: INFO
Block imported: DEBUG
Checkpoint received: DEBUG
Branch resolution issues: WARN
Invalid blocks: WARN

Peer Management¶

Peer discovery start/stop: INFO
Blacklisting peers: INFO
Dismissed invalid data: WARN

Mining¶

Mining mode changes: INFO
Miner instantiation: INFO
Mining enable/disable: INFO

Database¶

Database open/close: INFO
Detailed initialization steps: DEBUG
Database errors: ERROR

General Principles¶

INFO should be sparse: A running node should produce INFO logs only occasionally for significant events
DEBUG is for troubleshooting: Enable DEBUG level when diagnosing sync or connection issues
ERROR means action needed: ERROR logs should indicate something requiring investigation or action
WARN indicates problems: WARN logs should flag unexpected but handled situations
Consider frequency: If a log would fire hundreds of times during normal operation, it's probably DEBUG
Consider audience: INFO is for operators, DEBUG is for developers

Decision Checklist for New Log Statements¶

When adding or modifying log statements, consider: 1. Will this fire frequently during normal operation? → DEBUG 2. Does this indicate a problem? → WARN or ERROR 3. Is this a major state transition? → INFO 4. Is this helpful for troubleshooting specific issues? → DEBUG 5. Would an operator need to see this in production? → INFO, otherwise DEBUG

Implementation¶

The following changes were implemented across 8 files (43 log statements total):

RLPxConnectionHandler.scala (11 changes)¶

Protocol handshake steps: INFO → DEBUG
Message decode errors that close connections: INFO → ERROR
Kept significant milestones (handshake success, connection established) as INFO

FastSync.scala (17 changes)¶

Received headers/bodies/receipts from peers: INFO → DEBUG
Pivot block updates and state changes: INFO → DEBUG
Kept sync start/completion messages as INFO

RegularSync.scala (3 changes)¶

Checkpoint and block import operations: INFO → DEBUG
Kept sync start/stop and block mined as INFO

BlockFetcher.scala (3 changes)¶

Dismissed/invalid headers from peers: INFO → WARN

BlockImporter.scala (2 changes)¶

Branch resolution issues: INFO → WARN

AdaptiveSyncStrategy.scala (1 change)¶

Retryable sync failures: INFO → DEBUG

SyncStateSchedulerActor.scala (2 changes)¶

State node requests/responses: INFO → DEBUG

PivotBlockSelector.scala (4 changes)¶

Pivot block voting details: INFO → DEBUG
Timeouts and insufficient votes: INFO → WARN

Consequences¶

Positive¶

Reduced log noise: INFO level now shows ~95% fewer messages during normal operation
Clear signal: Important events (sync start/stop, errors) stand out clearly in INFO logs
Better troubleshooting: DEBUG level contains all detailed information when needed
Proper severity: Errors are ERROR, problems are WARN, progress is DEBUG
Operator friendly: INFO level suitable for production monitoring without overwhelming detail
Storage savings: Reduced log volume in production deployments
Clear standards: Documented guidelines for future development

Negative¶

Migration effort: Required reviewing and recategorizing 43 log statements
Learning curve: Developers need to learn new categorization standards
Potential gaps: Some edge cases may not fit perfectly into categories

Neutral¶

No functionality change: All information still logged, just at appropriate levels
Backward compatible: Operators can still enable DEBUG to see all details
Dual logging systems: Must use log.warning for Pekko Classic ActorLogging and log.warn for SLF4J/Typed Actors - developers must check the actor type before modifying log statements

Future logging enhancements should follow these categorization standards
Consider structured logging (e.g., JSON) in future for better machine parsing
May need to revisit categories as new subsystems are added

References¶

Issue #512: "reduce the noise"
SLF4J Logging Documentation
Apache Pekko Logging Documentation
Original PR: Recategorize log levels to reduce INFO noise during sync operations

Appendix: Log Format Conventions¶

Use structured logging with placeholders: log.info("Block {} imported", blockNumber)
Include relevant context: peer IDs, block numbers, error details
Keep messages concise but informative
Use consistent terminology across the codebase
Prefix subsystem-specific logs with tags like [RLPx], [FastSync], etc. when helpful

Appendix: Manual Log Level Configuration¶

The logback.xml file provides comprehensive logger entries for all major components, organized by subsystem. Operators can manually set log levels for troubleshooting by editing the appropriate logger entry.

Configuration Location Rationale¶

Fukuii uses a hybrid configuration approach for logging:

Setting Type	Location	Rationale
Global log level	`base.conf` (`logging.logs-level`)	Simple runtime override via `-Dlogging.logs-level=DEBUG` or environment variable
Output format	`base.conf` (`logging.json-output`)	Deployment-specific setting (JSON for log aggregation)
Log file paths	`base.conf` (`logging.logs-dir`, `logging.logs-file`)	Environment-specific paths
Per-component log levels	`logback.xml`	Detailed troubleshooting control (see below)

Why per-component log levels belong in logback.xml:

Hierarchical control: Logback's native logger hierarchy allows setting levels at package and class granularity. HOCON config would require flattening this into string keys.
Well-documented convention: Logback's XML format is the standard for Java/Scala applications. Operators familiar with SLF4J/Logback will expect logger configuration in logback.xml.
IDE and tooling support: XML schema validation, autocompletion, and logback-specific tooling work natively with logback.xml.
Conditional logic: Logback supports <if> conditions for environment-specific behavior (e.g., JSON output toggle), which is already used in the current configuration.
Hot reload capability: Logback can reload logback.xml at runtime with scan="true", enabling log level changes without restart.

The global logging.logs-level in base.conf is bridged to logback via ConfigPropertyDefiner, providing a convenient override for the root logger while preserving fine-grained control in logback.xml.

Configuration Files¶

Production: src/main/resources/logback.xml
Unit tests: src/test/resources/logback-test.xml
Integration tests: src/it/resources/logback-test.xml
Global settings: src/main/resources/conf/base.conf (logging section)

Subsystem Categories¶

The following subsystems have dedicated logger entries in logback.xml:

Subsystem	Package	Description
Scalanet	`com.chipprbots.scalanet`	Low-level networking library
Network	`com.chipprbots.ethereum.network`	Peer connections and communication
RLPx	`com.chipprbots.ethereum.network.rlpx`	Encrypted transport protocol
Sync	`com.chipprbots.ethereum.blockchain.sync`	Block synchronization
Fast Sync	`com.chipprbots.ethereum.blockchain.sync.fast`	Fast sync mode
Regular Sync	`com.chipprbots.ethereum.blockchain.sync.regular`	Regular sync mode
SNAP Sync	`com.chipprbots.ethereum.blockchain.sync.snap`	SNAP protocol sync
Ledger	`com.chipprbots.ethereum.ledger`	Block execution and state
Database	`com.chipprbots.ethereum.db`	Storage and persistence
MPT	`com.chipprbots.ethereum.mpt`	Merkle Patricia Trie
VM	`com.chipprbots.ethereum.vm`	Ethereum Virtual Machine
Consensus	`com.chipprbots.ethereum.consensus`	Mining and validation
Transactions	`com.chipprbots.ethereum.transactions`	Pending transactions
JSON-RPC	`com.chipprbots.ethereum.jsonrpc`	API server
Faucet	`com.chipprbots.ethereum.faucet`	Test network faucet
Metrics	`com.chipprbots.ethereum.metrics`	Performance monitoring

Example: Enabling DEBUG for RLPx Troubleshooting¶

To enable DEBUG logging for RLPx connection issues, modify logback.xml:

<!-- Before: INFO (production default) -->
<logger name="com.chipprbots.ethereum.network.rlpx" level="INFO" />

<!-- After: DEBUG (for troubleshooting) -->
<logger name="com.chipprbots.ethereum.network.rlpx" level="DEBUG" />

Note on Pekko Actor Logging¶

When setting any level to DEBUG, you may also need to adjust pekko.loglevel in application.conf for actor-based components to ensure DEBUG messages are not filtered at the actor system level.