OPS-002: Logging Level Categorization Standards¶
Status: Accepted
Date: November 2024
Deciders: Chippr Robotics LLC Engineering Team
Context¶
During troubleshooting of RLPx sync and other systems, many log statements were added at INFO level without proper categorization. This created excessive log noise during normal sync operations, making it difficult for operators to identify important events and system state changes.
Problem Statement¶
The fukuii client was producing overwhelming amounts of INFO-level logs during normal operation:
- Detailed protocol handshake steps logged as INFO
- Routine block/header/receipt processing logged as INFO
- Each peer interaction logged as INFO
- Internal state updates logged as INFO
This created several issues: 1. Operator fatigue: Production logs were too verbose to monitor effectively 2. Signal vs noise: Important events were buried in routine operational details 3. Troubleshooting difficulty: No clear distinction between normal operation and issues requiring attention 4. Storage costs: Excessive logging increased log storage requirements
Requirements¶
From Issue #512 ("reduce the noise"): 1. Evaluate all log messages in fukuii for proper categorization 2. Distinguish between debug, info, warning, and error severity levels 3. Reduce INFO-level noise during normal sync operations 4. Preserve detailed troubleshooting information at DEBUG level 5. Ensure errors are properly categorized as ERROR or WARN
Technical Context¶
The fukuii client uses two different logging frameworks depending on the actor type:
Pekko Classic Actors (ActorLogging)¶
Classic actors that extend Actor with ActorLogging use Pekko's LoggingAdapter, which provides:
- log.debug() - Debug level
- log.info() - Info level
- log.warning() - Warning level (note: warning, not warn)
- log.error() - Error level
Examples: PivotBlockSelector, BlockImporter, RLPxConnectionHandler
Pekko Typed Actors (context.log)¶
Typed actors that extend AbstractBehavior and use context.log get an SLF4J Logger, which provides:
- log.trace() - Trace level
- log.debug() - Debug level
- log.info() - Info level
- log.warn() - Warning level (note: warn, not warning)
- log.error() - Error level
Examples: BlockFetcher, PoWMiningCoordinator
Other Components (SLF4J/Scala Logging)¶
Non-actor components using SLF4J directly or Scala Logging also use:
- log.warn() - Warning level (note: warn, not warning)
Important: When modifying log levels, always check whether the file uses Pekko Classic ActorLogging (use log.warning) or SLF4J/Typed Actor logging (use log.warn). Using the wrong method will cause compilation errors.
Decision¶
We established comprehensive logging level categorization standards and recategorized 43 log statements across 8 files to align with these standards.
Log Level Standards¶
ERROR¶
Use for failures requiring immediate attention or indicating serious problems: - Connection failures that terminate connections - Message decoding errors that close connections - Critical system failures - Database corruption or access failures - Security-related issues
WARN¶
Use for unexpected but recoverable situations that may indicate problems: - Dismissed or invalid data from peers - Branch resolution issues - Timeout conditions that trigger retries - Blacklisting events - Configuration issues that have fallbacks
INFO¶
Use for significant state changes and important milestones: - System startup and shutdown - Sync mode changes (starting/stopping fast sync, regular sync) - Mining state changes - Successful pivot block selection - Block synchronization completion - Important configuration changes - Major component initialization
DEBUG¶
Use for detailed operational information useful during troubleshooting: - Detailed protocol handshake steps - Received/sent message counts from peers - Block/header/receipt processing details - Pivot block updates and changes - Checkpoint processing - Block import operations - State sync progress details - Peer voting and selection details - Request/response cycles
Subsystem-Specific Guidelines¶
RLPx Connection Handling¶
- Connection initiation/acceptance: DEBUG
- TCP connection established: DEBUG
- Detailed handshake steps: DEBUG
- Protocol negotiation details: DEBUG
- Successful handshake completion: INFO
- Full connection establishment: INFO
- Connection failures/timeouts: ERROR
- Message decode errors: ERROR
Sync (Fast and Regular)¶
- Sync start/stop: INFO
- Received block data from peers: DEBUG
- Pivot block selection/updates: DEBUG (except final selection: INFO)
- State sync progress: DEBUG
- Sync strategy selection: INFO
- Sync completion: INFO
- Retry attempts: DEBUG
- Sync failures requiring fallback: WARN
Block Processing¶
- Block mined: INFO
- Block imported: DEBUG
- Checkpoint received: DEBUG
- Branch resolution issues: WARN
- Invalid blocks: WARN
Peer Management¶
- Peer discovery start/stop: INFO
- Blacklisting peers: INFO
- Dismissed invalid data: WARN
Mining¶
- Mining mode changes: INFO
- Miner instantiation: INFO
- Mining enable/disable: INFO
Database¶
- Database open/close: INFO
- Detailed initialization steps: DEBUG
- Database errors: ERROR
General Principles¶
- INFO should be sparse: A running node should produce INFO logs only occasionally for significant events
- DEBUG is for troubleshooting: Enable DEBUG level when diagnosing sync or connection issues
- ERROR means action needed: ERROR logs should indicate something requiring investigation or action
- WARN indicates problems: WARN logs should flag unexpected but handled situations
- Consider frequency: If a log would fire hundreds of times during normal operation, it's probably DEBUG
- Consider audience: INFO is for operators, DEBUG is for developers
Decision Checklist for New Log Statements¶
When adding or modifying log statements, consider: 1. Will this fire frequently during normal operation? → DEBUG 2. Does this indicate a problem? → WARN or ERROR 3. Is this a major state transition? → INFO 4. Is this helpful for troubleshooting specific issues? → DEBUG 5. Would an operator need to see this in production? → INFO, otherwise DEBUG
Implementation¶
The following changes were implemented across 8 files (43 log statements total):
RLPxConnectionHandler.scala (11 changes)¶
- Protocol handshake steps: INFO → DEBUG
- Message decode errors that close connections: INFO → ERROR
- Kept significant milestones (handshake success, connection established) as INFO
FastSync.scala (17 changes)¶
- Received headers/bodies/receipts from peers: INFO → DEBUG
- Pivot block updates and state changes: INFO → DEBUG
- Kept sync start/completion messages as INFO
RegularSync.scala (3 changes)¶
- Checkpoint and block import operations: INFO → DEBUG
- Kept sync start/stop and block mined as INFO
BlockFetcher.scala (3 changes)¶
- Dismissed/invalid headers from peers: INFO → WARN
BlockImporter.scala (2 changes)¶
- Branch resolution issues: INFO → WARN
AdaptiveSyncStrategy.scala (1 change)¶
- Retryable sync failures: INFO → DEBUG
SyncStateSchedulerActor.scala (2 changes)¶
- State node requests/responses: INFO → DEBUG
PivotBlockSelector.scala (4 changes)¶
- Pivot block voting details: INFO → DEBUG
- Timeouts and insufficient votes: INFO → WARN
Consequences¶
Positive¶
- Reduced log noise: INFO level now shows ~95% fewer messages during normal operation
- Clear signal: Important events (sync start/stop, errors) stand out clearly in INFO logs
- Better troubleshooting: DEBUG level contains all detailed information when needed
- Proper severity: Errors are ERROR, problems are WARN, progress is DEBUG
- Operator friendly: INFO level suitable for production monitoring without overwhelming detail
- Storage savings: Reduced log volume in production deployments
- Clear standards: Documented guidelines for future development
Negative¶
- Migration effort: Required reviewing and recategorizing 43 log statements
- Learning curve: Developers need to learn new categorization standards
- Potential gaps: Some edge cases may not fit perfectly into categories
Neutral¶
- No functionality change: All information still logged, just at appropriate levels
- Backward compatible: Operators can still enable DEBUG to see all details
- Dual logging systems: Must use
log.warningfor Pekko Classic ActorLogging andlog.warnfor SLF4J/Typed Actors - developers must check the actor type before modifying log statements
Related Decisions¶
- Future logging enhancements should follow these categorization standards
- Consider structured logging (e.g., JSON) in future for better machine parsing
- May need to revisit categories as new subsystems are added
References¶
- Issue #512: "reduce the noise"
- SLF4J Logging Documentation
- Apache Pekko Logging Documentation
- Original PR: Recategorize log levels to reduce INFO noise during sync operations
Appendix: Log Format Conventions¶
- Use structured logging with placeholders:
log.info("Block {} imported", blockNumber) - Include relevant context: peer IDs, block numbers, error details
- Keep messages concise but informative
- Use consistent terminology across the codebase
- Prefix subsystem-specific logs with tags like
[RLPx],[FastSync], etc. when helpful
Appendix: Manual Log Level Configuration¶
The logback.xml file provides comprehensive logger entries for all major components, organized by subsystem. Operators can manually set log levels for troubleshooting by editing the appropriate logger entry.
Configuration Location Rationale¶
Fukuii uses a hybrid configuration approach for logging:
| Setting Type | Location | Rationale |
|---|---|---|
| Global log level | base.conf (logging.logs-level) |
Simple runtime override via -Dlogging.logs-level=DEBUG or environment variable |
| Output format | base.conf (logging.json-output) |
Deployment-specific setting (JSON for log aggregation) |
| Log file paths | base.conf (logging.logs-dir, logging.logs-file) |
Environment-specific paths |
| Per-component log levels | logback.xml |
Detailed troubleshooting control (see below) |
Why per-component log levels belong in logback.xml:
-
Hierarchical control: Logback's native logger hierarchy allows setting levels at package and class granularity. HOCON config would require flattening this into string keys.
-
Well-documented convention: Logback's XML format is the standard for Java/Scala applications. Operators familiar with SLF4J/Logback will expect logger configuration in
logback.xml. -
IDE and tooling support: XML schema validation, autocompletion, and logback-specific tooling work natively with
logback.xml. -
Conditional logic: Logback supports
<if>conditions for environment-specific behavior (e.g., JSON output toggle), which is already used in the current configuration. -
Hot reload capability: Logback can reload
logback.xmlat runtime withscan="true", enabling log level changes without restart.
The global logging.logs-level in base.conf is bridged to logback via ConfigPropertyDefiner, providing a convenient override for the root logger while preserving fine-grained control in logback.xml.
Configuration Files¶
- Production:
src/main/resources/logback.xml - Unit tests:
src/test/resources/logback-test.xml - Integration tests:
src/it/resources/logback-test.xml - Global settings:
src/main/resources/conf/base.conf(logging section)
Subsystem Categories¶
The following subsystems have dedicated logger entries in logback.xml:
| Subsystem | Package | Description |
|---|---|---|
| Scalanet | com.chipprbots.scalanet |
Low-level networking library |
| Network | com.chipprbots.ethereum.network |
Peer connections and communication |
| RLPx | com.chipprbots.ethereum.network.rlpx |
Encrypted transport protocol |
| Sync | com.chipprbots.ethereum.blockchain.sync |
Block synchronization |
| Fast Sync | com.chipprbots.ethereum.blockchain.sync.fast |
Fast sync mode |
| Regular Sync | com.chipprbots.ethereum.blockchain.sync.regular |
Regular sync mode |
| SNAP Sync | com.chipprbots.ethereum.blockchain.sync.snap |
SNAP protocol sync |
| Ledger | com.chipprbots.ethereum.ledger |
Block execution and state |
| Database | com.chipprbots.ethereum.db |
Storage and persistence |
| MPT | com.chipprbots.ethereum.mpt |
Merkle Patricia Trie |
| VM | com.chipprbots.ethereum.vm |
Ethereum Virtual Machine |
| Consensus | com.chipprbots.ethereum.consensus |
Mining and validation |
| Transactions | com.chipprbots.ethereum.transactions |
Pending transactions |
| JSON-RPC | com.chipprbots.ethereum.jsonrpc |
API server |
| Faucet | com.chipprbots.ethereum.faucet |
Test network faucet |
| Metrics | com.chipprbots.ethereum.metrics |
Performance monitoring |
Example: Enabling DEBUG for RLPx Troubleshooting¶
To enable DEBUG logging for RLPx connection issues, modify logback.xml:
<!-- Before: INFO (production default) -->
<logger name="com.chipprbots.ethereum.network.rlpx" level="INFO" />
<!-- After: DEBUG (for troubleshooting) -->
<logger name="com.chipprbots.ethereum.network.rlpx" level="DEBUG" />
Note on Pekko Actor Logging¶
When setting any level to DEBUG, you may also need to adjust pekko.loglevel in application.conf for actor-based components to ensure DEBUG messages are not filtered at the actor system level.