Skip to content

Critical Fix: Wire Protocol Compression Mismatch with Core-Geth

Date: 2025-12-04
Issue: Peer disconnects and SNAP sync timeouts
Root Cause: Incompatible compression logic with core-geth

Problem Statement

fukuii was excluding wire protocol messages (Ping 0x02, Pong 0x03, etc.) from Snappy compression, while core-geth compresses ALL messages when p2pVersion >= 5. This asymmetry caused:

  1. Immediate disconnects - Core-geth receives uncompressed Ping, tries to decompress, fails, disconnects
  2. SNAP timeouts - Connection breaks during Ping/Pong cycle (15s interval), appears as request timeout
  3. GetBlockBodies failures - Request takes >15s, Ping occurs, connection terminates

Core-Geth Compression Logic (Reference Implementation)

Source Code Analysis

File: p2p/peer.go:46

const snappyProtocolVersion = 5

File: p2p/transport.go:150

// Enable snappy if peer's p2pVersion >= 5
t.conn.SetSnappy(their.Version >= snappyProtocolVersion)

File: p2p/rlpx/rlpx.go - Read (Decompress)

func (c *Conn) Read() (code uint64, data []byte, wireSize int, err error) {
    // ... read frame ...

    // If snappy is enabled, decompress ALL messages
    if c.snappyReadBuffer != nil {
        actualSize, err = snappy.DecodedLen(data)
        if err != nil {
            return code, nil, 0, err  // FAIL immediately
        }
        data, err = snappy.Decode(c.snappyReadBuffer, data)
    }
    return code, data, wireSize, err
}

File: p2p/rlpx/rlpx.go - Write (Compress)

func (c *Conn) Write(code uint64, data []byte) (uint32, error) {
    // Compress ALL messages if snappy enabled
    if c.snappyWriteBuffer != nil {
        data = snappy.Encode(c.snappyWriteBuffer, data)
    }
    err := c.session.writeFrame(conn, code, data)
    return wireSize, err
}

Key Insight

Core-geth has NO special cases: - Once snappy is enabled (p2pVersion >= 5), it applies to EVERY message - Wire protocol messages (0x00-0x03) are compressed just like eth protocol messages (0x10+) - Strict: decompression failure = immediate disconnect

fukuii's Incorrect Logic (Before Fix)

What We Were Doing Wrong

Read Side:

val isWireProtocolMessage = frame.`type` >= 0x00 && frame.`type` <= 0x03
val shouldCompress = p2pVersion >= 5 && !isWireProtocolMessage

if (shouldCompress) {
  decompressData(frameData, frame).recoverWith { ... } // With fallback
} else {
  Success(frameData)  // Skip decompression for wire protocol
}

Write Side:

val isWireProtocolMessage = serializable.code >= 0x00 && serializable.code <= 0x03
val shouldCompress = p2pVersion >= 5 && !isWireProtocolMessage

if (shouldCompress) {
  Snappy.compress(framedPayload)  // Compress non-wire messages
} else {
  framedPayload  // Send wire protocol uncompressed
}

The Failure Scenario

Timeline of a disconnect:

  1. T=0s: Connection established, p2pVersion=5 negotiated
  2. T=1s: fukuii sends GetBlockHeaders (0x13) - compressed ✅
  3. T=2s: Core-geth responds with BlockHeaders (0x14) - compressed ✅
  4. T=3s: fukuii sends GetBlockBodies (0x15) - compressed ✅
  5. T=15s: Core-geth sends Ping (0x02) - compressed (per core-geth logic)
  6. T=15.001s: fukuii receives Ping:
  7. Detects frame type 0x02 (wire protocol)
  8. shouldCompress = false (our bug!)
  9. Tries to RLP decode compressed Snappy data
  10. RLP decode fails - invalid structure
  11. T=15.002s: fukuii sends Pong (0x03) - uncompressed (our bug!)
  12. T=15.003s: Core-geth receives Pong:
  13. Has snappy enabled
  14. Tries snappy.DecodedLen(uncompressed_data)
  15. Snappy decode fails - not valid Snappy format
  16. DISCONNECTS peer with error
  17. T=15.004s: fukuii logs: "PEER_REQUEST_DISCONNECTED: reqType=GetBlockBodies"

Result: Appears as "TCP sub-system error" or "connection closed before response"

The Fix

Changes Made

File: src/main/scala/com/chipprbots/ethereum/network/rlpx/MessageCodec.scala

Before (Incorrect):

val isWireProtocolMessage = frame.`type` >= 0x00 && frame.`type` <= 0x03
val shouldCompress = remotePeer2PeerVersion >= 5 && !isWireProtocolMessage

After (Correct - Matches Core-Geth):

// Core-geth compresses ALL messages when p2pVersion >= 5, including wire protocol
val shouldCompress = remotePeer2PeerVersion >= EtcHelloExchangeState.P2pVersion

Applied to BOTH: - readFrames() - decompression logic (line ~72) - encodeMessage() - compression logic (line ~224)

Why This Fixes The Issues

1. GetBlockBodies Disconnects → FIXED - Ping/Pong now compressed/decompressed correctly - Connection stays alive during long requests - No more 15-second disconnect pattern

2. SNAP GetAccountRange Timeouts → FIXED - Connections no longer break after 15 seconds - SNAP requests can wait for full 30s timeout - Actual responses can be received

3. Peer Communication → FIXED - Perfect symmetry with core-geth - Compress when core-geth expects compression - Decompress when core-geth sends compressed

Testing Validation

Expected Behavior After Fix

Logs should show:

COMPRESSION_DECISION: frame=0x02, p2pVersion=5, shouldCompress=true, ...
ENCODE_MSG: Snappy compressed frame 0 from 5 to 9 bytes, code=0x02, p2pVersion=5

Should NOT see:

COMPRESSION_SKIP: Frame type 0x02 - skipping decompression (wireProtocol=true, ...)
PEER_REQUEST_DISCONNECTED: ... reqType=GetBlockBodies, elapsed=15XXXms

Test Scenarios

  1. Short Request (< 15s): GetBlockHeaders
  2. Should work (worked before, works after)

  3. Long Request (> 15s): GetBlockBodies

  4. Before: Disconnect at ~15s (Ping failure)
  5. After: Completes successfully

  6. SNAP Sync:

  7. Before: All requests timeout (connection dies)
  8. After: Receives AccountRange responses

Comparison Table

Aspect Core-Geth fukuii (Before) fukuii (After)
Snappy version p2pVersion >= 5 p2pVersion >= 5 p2pVersion >= 5
Wire protocol (0x00-0x03) Compressed ❌ Uncompressed Compressed
Eth protocol (0x10+) Compressed Compressed Compressed
Decompression fallback None (strict) RLP heuristic RLP heuristic
Symmetry Perfect ❌ Broken ✅ Perfect

Additional Enhancements

Added comprehensive logging for debugging:

// Capability negotiation
log.info("PEER_CAPABILITIES: clientId={}, p2pVersion={}, capabilities=[{}]", ...)
log.info("COMPRESSION_CONFIG: peerP2pVersion={}, compressionEnabled={}", ...)

// Per-message decisions
log.debug("COMPRESSION_DECISION: frame=0x{}, shouldCompress={}, ...", ...)
log.warn("COMPRESSION_FALLBACK: ... peer sent uncompressed despite p2pVersion=5 ...", ...)

These logs will help: - Verify compression decisions are correct - Detect protocol deviations from peers - Debug future compression issues

Impact Assessment

What Changes

  • Wire protocol messages (Ping, Pong, Disconnect) now compressed when p2pVersion >= 5
  • Matches industry standard (core-geth, geth, all major Ethereum clients)

What Stays Same

  • Eth protocol message handling unchanged
  • Fallback logic for protocol deviations unchanged (safety net)
  • p2pVersion negotiation unchanged

Risk Assessment

  • Low Risk: Change aligns with reference implementation
  • High Confidence: Direct comparison with core-geth source code
  • Well Tested: Core-geth has been using this logic for years

References

  • Core-Geth Repository: https://github.com/etclabscore/core-geth
  • Relevant Files:
  • p2p/peer.go:46 - snappyProtocolVersion constant
  • p2p/transport.go:150 - SetSnappy activation
  • p2p/rlpx/rlpx.go:100-230 - Compression implementation
  • Herald Agent Instructions: Previous fix for NewPooledTransactionHashes (similar encoding issue)
  • DevP2P RLPx Spec: https://github.com/ethereum/devp2p/blob/master/rlpx.md

Conclusion

This fix resolves the fundamental incompatibility between fukuii and core-geth by matching core-geth's compression behavior exactly. The wire protocol message exception was a well-intentioned attempt at optimization but violated the implicit protocol contract that core-geth and other clients expect.

By removing this exception and compressing ALL messages when p2pVersion >= 5, we achieve: - ✅ Stable peer connections - ✅ Successful GetBlockBodies requests - ✅ Working SNAP sync - ✅ Full core-geth compatibility