RLPx Hello Regression Investigation¶
Summary¶
- Validation run
ops/gorgoroth/logs/rplx-20251209-211024shows every RLPx session breaking before STATUS exchange withMessageDecoder$MalformedMessageError: Cannot decode Hello. - Our logs also show
COMPRESSION_POLICYenabling Snappy on both directions before the remote hello is fully handled, followed by each peer emittingSEND_MSG: type=Helloagain after the connection is already marked "FULLY ESTABLISHED". - When this second hello is sent from the
handshakedstate it is serialized throughMessageCodec(which compresses because both sides advertise p2pVersion ≥ 5), so the remote peer'sHelloCodectries to RLP-decode Snappy data and aborts the handshake.
Evidence from the failed run¶
2025-12-10 03:11:06,605 INFO [c.c.e.network.rlpx.MessageCodec] - COMPRESSION_POLICY: ... compressOutbound=true, expectInboundCompressed=true
2025-12-10 03:11:06,607 INFO [c.c.e.n.rlpx.RLPxConnectionHandler] - [RLPx] Connection FULLY ESTABLISHED with peer 172.25.0.11:30303, entering handshaked state
2025-12-10 03:11:06,614 INFO [c.c.e.n.rlpx.RLPxConnectionHandler] - SEND_MSG: peer=172.25.0.11:30303, type=Status, code=0x10, seqNum=1
2025-12-10 03:11:06,652 ERROR [o.a.p.a.OneForOneStrategy] - Cannot decode Hello
...
2025-12-10 03:11:21,950 INFO [c.c.e.n.rlpx.RLPxConnectionHandler] - [RLPx] Connection FULLY ESTABLISHED with peer 172.25.0.13:30303, entering handshaked state
2025-12-10 03:11:21,951 INFO [c.c.e.n.rlpx.RLPxConnectionHandler] - SEND_MSG: peer=172.25.0.13:30303, type=Hello, code=0x0, seqNum=0
2025-12-10 03:11:21,958 INFO [c.c.e.n.rlpx.RLPxConnectionHandler] - SEND_MSG: peer=172.25.0.13:30303, type=Status, code=0x10, seqNum=1
2025-12-10 03:11:21,969 INFO [c.c.e.b.sync.CacheBasedBlacklist] - Blacklisting peer ... Reason: Some other reason specific to a subprotocol
SEND_MSG type=Hello is emitted by RLPxConnectionHandler.sendMessage (see src/main/scala/.../RLPxConnectionHandler.scala lines 340-386), which means the message went through the Snappy-enabled MessageCodec. The receiving side is still in awaitInitialHello, so it crashes before handshake completion, matching the stack trace rooted at HelloCodec.extractHello (lines 575-609 of the same file).
Current Fukuii implementation¶
markHelloAckReceived(lines 91-119) flipshelloWriteAcknowledgedas soon as TCP acknowledges the outbound hello and eagerly callsmessageCodec.enableInboundCompression("hello-write-ack")the moment a codec instance exists. This enables Snappy before both sides are guaranteed to have exchanged the plain-text hello frames.registerMessageCodec(lines 121-128) immediately enables inbound compression if that flag was pre-set, so the very first DEVp2p frame processed after the scaler entershandshakedmode is forced through Snappy.- The generic
handshakedreceive block (lines 329-415) does not special-caseHelloEnc, so any late-arrivingSendMessage(HelloEnc)is serialized viaMessageCodecinstead ofHelloCodec. BecauseMessageCodeccompresses whenever both peers advertise p2p ≥ 5 (CompressionPolicy.fromHandshakeinMessageCodec.scalalines 26-70), we end up sending a compressed hello. - Actor-mailbox ordering makes this race observable: if the peer manager enqueues
SendMessage(HelloEnc)roughly when the TCP selector is also delivering the remote hello frame, whichever arrives second at the actor will run either theawaitInitialHelloorhandshakedhandler. We saw the second case in the logs.
Core-Geth reference implementation¶
- Core-Geth never re-encodes hello through the Snappy pipeline. Their
rlpxTransport.doProtoHandshake(seep2p/transport.golines 134-153 inetclabscore/core-geth) writes our hello, waits for the remote hello viareadProtocolHandshake, and only then callst.conn.SetSnappy(their.Version >= snappyProtocolVersion). rlpx.Conn.SetSnappy(p2p/rlpx/rlpx.golines 99-126 ) simply toggles the read/write buffers used during frame compression; it is invoked strictly after both sides complete the handshake, so hello/status exchanges happen uncompressed as mandated by devp2p.- Because hello is only handled inside the transport handshake, there is no opportunity for a second hello to be sent via the Snappy path, and the receiver will never try to RLP-decode compressed data while still inside its
Hellocodec.
Root cause and remediation ideas¶
- Compressed hello leak – The actor currently uses the same
SendMessageentry point for both handshake and post-handshake traffic. As soon as the state machine flips tohandshaked, any residualSendMessage(HelloEnc)in the mailbox is serialized viaMessageCodec, which compresses it. This violates devp2p (hello must be plain RLP) and causes the remote side to throwCannot decode Hello. We should always routeHelloEncthroughHelloCodec.writeHello, even if the actor has already transitioned, or ensure the hello send happens synchronously before we process any inbound frames. - Over-eager compression –
CompressionPolicysetsexpectInboundCompressed=trueimmediately whenever both peers advertise p2p ≥ 5. In contrast, Core-Geth only begins decompressing afterreadProtocolHandshakesucceeds, i.e., afterhelloexchange completes. We should defer enabling inbound compression until after we have both sent and received hello (and ideally status), or at least until we observe the remoteHelloframe being parsed.
Next steps¶
- Modify
RLPxConnectionHandler.handshakedto interceptSendMessage(h: HelloEnc)and delegate toHelloCodecso hello is never compressed. This will also make the log noise (SEND_MSG ... type=Hello) disappear after the connection is fully established. - Gate
CompressionPolicy.enableInboundCompressionon an explicit "hello exchange completed" signal (both write acknowledged and remote hello parsed) to mirror Core-Geth'sSetSnappytiming. - Re-run the Gorgoroth validation after applying the above fixes and confirm that hello decoding succeeds and peers progress into the ETC handshake/status stages.