
Testing Tags Implementation - Next Steps

Based on: Testing Tags Verification Report
Date: November 17, 2025
Status: Phase 1 & 2 Complete (65%), Phase 3-5 Pending (35%)


Executive Summary

The testing tags infrastructure is substantially complete and production-ready. All critical infrastructure (tags system, SBT commands, ethereum/tests adapter) is implemented and validated. The remaining work is primarily systematic application and execution rather than new development.

Estimated Effort to 100% Completion: 2-3 weeks


Immediate Actions (High Priority)

1. Complete Test Tagging (Phase 2 Completion)

Status: 32% complete (48/150+ files tagged)

Objective: Tag all remaining test files with appropriate ScalaTest tags.

Effort: 2-3 days

Steps:

  1. Identify all test files without a Tags import:

# Find test files without Tags import
find src -name "*Spec.scala" -o -name "*Test.scala" | \
  xargs grep -L "import.*Tags" | \
  grep -v "/target/"

  2. For each file, add appropriate tags:

    import com.chipprbots.ethereum.testing.Tags._
    
    // Unit test example
    "MyComponent" should "do something" taggedAs(UnitTest) in { ... }
    
    // Integration test example
    "Database" should "persist data" taggedAs(IntegrationTest, DatabaseTest) in { ... }
    
    // Slow test example
    "LargeSync" should "sync blocks" taggedAs(SlowTest, SyncTest) in { ... }
    

  3. Follow tagging guidelines:

    • UnitTest: Fast (< 100ms), no external dependencies
    • IntegrationTest: Multiple components, may use database/network
    • SlowTest: > 100ms execution time
    • Module tags: CryptoTest, VMTest, NetworkTest, etc.
    • Fork tags: BerlinTest, IstanbulTest, etc. (for fork-specific tests)

  4. Verify tagging:

    sbt testEssential  # Should exclude SlowTest, IntegrationTest
    sbt testStandard   # Should exclude BenchmarkTest, EthereumTest
    

Files by Priority:

  • High: VM, State, Consensus tests
  • Medium: Network, Database, MPT tests
  • Low: Utility, RLP, Crypto tests (some already tagged)
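
To track progress on the tagging step, the file search can also be scripted. The following is a minimal sketch, assuming a file counts as tagged when it contains an import matching the same `import.*Tags` pattern used by the grep command; the function names are hypothetical and not part of the project.

```python
import re
from pathlib import Path
from typing import List

# Assumption: same criterion as the grep pattern "import.*Tags".
TAG_IMPORT = re.compile(r"^\s*import\s+.*Tags", re.MULTILINE)
TEST_SUFFIXES = ("Spec.scala", "Test.scala")


def is_tagged(source: str) -> bool:
    """Return True if the Scala source contains a Tags import."""
    return TAG_IMPORT.search(source) is not None


def untagged_files(root: Path) -> List[Path]:
    """List test files under root without a Tags import, skipping target/ output."""
    result = []
    for path in sorted(root.rglob("*.scala")):
        relative_parts = path.relative_to(root).parts
        if "target" in relative_parts or not path.name.endswith(TEST_SUFFIXES):
            continue
        if not is_tagged(path.read_text(encoding="utf-8", errors="replace")):
            result.append(path)
    return result


def tagging_progress(tagged: int, total: int) -> float:
    """Percentage of test files tagged so far."""
    return 100.0 * tagged / total if total else 0.0
```

Calling `untagged_files(Path("src"))` from the repository root lists the remaining files, and `tagging_progress(48, 150)` reproduces the 32% figure quoted in the status line.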


2. Execute Full Ethereum/Tests Suite (Phase 4 Kickoff)

Status: Phase 2 complete (validation passing), Phase 3 ready

Objective: Run comprehensive ethereum/tests suites and document results.

Effort: 1-2 weeks (execution + analysis + fixes)

Steps:

2.1 Run BlockchainTests Suite

# Full suite execution
sbt "IntegrationTest / testOnly *ComprehensiveBlockchainTestsSpec"

# Monitor execution time
# Expected: 30-60 minutes

Expected Results:

  • Target: > 90% pass rate
  • Categories: ValidBlocks, InvalidBlocks, bcStateTests

2.2 Run GeneralStateTests Suite

# Full suite execution
sbt "IntegrationTest / testOnly *GeneralStateTestsSpec"

# Monitor execution time
# Expected: 30-60 minutes

Expected Results:

  • Target: > 95% pass rate
  • Categories: stArgsZeroOneBalance, stCodeSizeLimit, etc.

2.3 Run VMTests Suite

# Full suite execution
sbt "IntegrationTest / testOnly *VMTestsSpec"

# Expected: 15-30 minutes

Expected Results:

  • Target: > 95% pass rate
  • Validates all 140+ EVM opcodes

2.4 Run TransactionTests Suite

# Full suite execution
sbt "IntegrationTest / testOnly *TransactionTestsSpec"

# Expected: 10-20 minutes

Expected Results:

  • Target: > 95% pass rate
  • Exercises the transaction validation logic

2.5 Document Results

Create docs/testing/ETHEREUM_TESTS_COMPLIANCE_REPORT.md with:

  • Test suite breakdown
  • Pass/fail rates by category
  • Failure analysis
  • Network filtering statistics
  • Comparison with geth/besu (if possible)
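
The per-suite targets in steps 2.1 to 2.4 can be checked mechanically once pass counts are available. This is a minimal sketch under the assumption that a target of "> 90%" means strictly greater than the threshold; the `TARGETS` table and function names are illustrative only.

```python
# Pass-rate targets (percent) taken from steps 2.1 to 2.4.
TARGETS = {
    "BlockchainTests": 90.0,
    "GeneralStateTests": 95.0,
    "VMTests": 95.0,
    "TransactionTests": 95.0,
}


def pass_rate(passed: int, total: int) -> float:
    """Pass rate in percent; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passed / total


def meets_target(suite: str, passed: int, total: int) -> bool:
    """True if the suite's pass rate is strictly above its target."""
    return pass_rate(passed, total) > TARGETS[suite]
```

For instance, `meets_target("BlockchainTests", 103, 115)` is false, because 103/115 is about 89.6% even though it rounds to 90%.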


3. Measure KPI Baselines (Phase 3 Completion)

Status: Baselines defined, measurement pending

Objective: Measure and document actual test execution times and metrics.

Effort: 1 day

Steps:

3.1 Measure Test Execution Times

# Run each tier with timing
time sbt testEssential > test-essential-timing.log 2>&1
time sbt testStandard > test-standard-timing.log 2>&1
time sbt testComprehensive > test-comprehensive-timing.log 2>&1

3.2 Extract Metrics

# Extract test counts and timings
grep -E "Total number of tests run|Tests: succeeded|Run completed" test-*.log

# Example summary derived from the matched lines (illustrative values, not measurements):
# testEssential: 450 tests in 3m 42s
# testStandard: 1200 tests in 18m 15s
# testComprehensive: 2500+ tests in 2h 15m
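
The grep output still has to be turned into numbers. The sketch below assumes the usual ScalaTest summary lines ("Run completed in 3 minutes, 42 seconds.", "Total number of tests run: 450", "Tests: succeeded 448, failed 2, ...") and sums them when a log contains several runs, for example one per sbt module. The function names are hypothetical, and the regular expressions may need adjusting if the logs contain ANSI color codes or a different duration format.

```python
import re
from typing import Dict

UNIT_SECONDS = {"hour": 3600.0, "minute": 60.0, "second": 1.0, "millisecond": 0.001}
DURATION_PART = re.compile(r"(\d+) (hour|minute|second|millisecond)s?\b")
RUN_COMPLETED = re.compile(r"Run completed in (.+)")
TOTAL_TESTS = re.compile(r"Total number of tests run: (\d+)")
TESTS_LINE = re.compile(r"Tests: succeeded (\d+), failed (\d+)")


def parse_duration(text: str) -> float:
    """Convert text such as '3 minutes, 42 seconds' to seconds."""
    return float(sum(int(n) * UNIT_SECONDS[unit] for n, unit in DURATION_PART.findall(text)))


def parse_summary(log_text: str) -> Dict[str, float]:
    """Accumulate duration and test counts over every summary found in the log."""
    summary = {"duration_seconds": 0.0, "total": 0, "succeeded": 0, "failed": 0}
    for line in log_text.splitlines():
        match = RUN_COMPLETED.search(line)
        if match:
            summary["duration_seconds"] += parse_duration(match.group(1))
        match = TOTAL_TESTS.search(line)
        if match:
            summary["total"] += int(match.group(1))
        match = TESTS_LINE.search(line)
        if match:
            summary["succeeded"] += int(match.group(1))
            summary["failed"] += int(match.group(2))
    return summary
```

Reading a log with `parse_summary(open("test-essential-timing.log").read())` yields one dictionary per tier that can be written into KPI_BASELINES.md.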

3.3 Document Results

Update docs/testing/KPI_BASELINES.md:

  • Measured test execution times
  • Test counts per tier
  • Coverage percentages
  • Comparison against targets

3.4 Validate Against Targets

Compare measured values against ADR-017 targets:

  • Essential: < 5 minutes ✅ or ❌
  • Standard: < 30 minutes ✅ or ❌
  • Comprehensive: < 3 hours ✅ or ❌

If any tier exceeds target, analyze and optimize:

  • Profile slow tests
  • Consider parallelization
  • Move tests to higher tier if appropriate
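
The comparison against the tier targets is simple enough to automate. A minimal sketch, assuming durations are already available in seconds and using the three limits quoted above; the dictionary keys reuse the sbt command names, and `validate_tiers` is a hypothetical helper.

```python
from typing import Dict

# Tier limits in seconds: 5 minutes, 30 minutes, 3 hours.
TARGET_SECONDS = {
    "testEssential": 5 * 60,
    "testStandard": 30 * 60,
    "testComprehensive": 3 * 60 * 60,
}


def validate_tiers(measured: Dict[str, float]) -> Dict[str, str]:
    """Map each measured tier to a check mark if it is under its limit, else a cross."""
    return {
        tier: "✅" if measured[tier] < limit else "❌"
        for tier, limit in TARGET_SECONDS.items()
        if tier in measured
    }
```

Tiers missing from the input are left out of the result, so a partial measurement run does not report a false failure.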


Short-term Actions (Medium Priority)

4. Generate Compliance Report (Phase 4 Continuation)

Effort: 2-3 days

Deliverable: docs/testing/ETHEREUM_TESTS_COMPLIANCE_REPORT.md

Contents:

  1. Executive Summary

    • Overall pass rate
    • Compliance level (≥ 95% = excellent, 90% to < 95% = good, < 90% = needs work)

  2. Test Suite Breakdown (example layout; the figures are placeholders)

    BlockchainTests:
      - ValidBlocks/bcValidBlockTest: 45/50 (90%)
      - ValidBlocks/bcStateTests: 38/40 (95%)
      - InvalidBlocks: 20/25 (80%)
      Total: 103/115 (90%)
    
    GeneralStateTests:
      - stArgsZeroOneBalance: 15/15 (100%)
      - stCodeSizeLimit: 12/12 (100%)
      - ... (more categories)
      Total: 450/475 (95%)
    
    VMTests:
      - vmArithmeticTest: 25/25 (100%)
      - vmBitwiseLogicOperation: 18/18 (100%)
      - ... (more categories)
      Total: 140/150 (93%)
    
    TransactionTests:
      - ttNonce: 10/10 (100%)
      - ttData: 8/8 (100%)
      - ... (more categories)
      Total: 65/70 (93%)
    

  3. Failure Analysis

    • Common failure patterns
    • Network-specific issues
    • Known ETC divergences
    • Action items for fixes

  4. Network Filtering

    • Pre-Spiral tests included
    • Post-Spiral tests excluded
    • Network version distribution

  5. Cross-Client Comparison (if available)

    • Fukuii vs geth pass rates
    • Fukuii vs besu pass rates
    • Notable differences
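
The Test Suite Breakdown layout and the compliance levels can be generated from per-category counts instead of being typed by hand. The sketch below is one possible way to do it; `render_suite` and `compliance_level` are hypothetical helpers, and the level boundaries follow the Executive Summary thresholds with 95% itself counted as excellent.

```python
from typing import Dict, Tuple


def compliance_level(rate: float) -> str:
    """Classify an unrounded pass rate (percent) into a compliance level."""
    if rate >= 95.0:
        return "excellent"
    if rate >= 90.0:
        return "good"
    return "needs work"


def render_suite(name: str, categories: Dict[str, Tuple[int, int]]) -> str:
    """Render one suite as 'category: passed/total (pct%)' lines plus a total line."""
    lines = [f"{name}:"]
    passed_sum = total_sum = 0
    for category, (passed, total) in categories.items():
        pct = 100 * passed / total if total else 0.0
        lines.append(f"  - {category}: {passed}/{total} ({pct:.0f}%)")
        passed_sum += passed
        total_sum += total
    overall = 100 * passed_sum / total_sum if total_sum else 0.0
    lines.append(f"  Total: {passed_sum}/{total_sum} ({overall:.0f}%)")
    return "\n".join(lines)
```

The level is computed from the unrounded rate, so a suite at 103/115 is displayed as 90% but still classified as "needs work" (89.6%). Deciding whether to classify on rounded or unrounded values is worth settling before the report is published.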

5. Update CI Workflows (Phase 2 Cleanup)

Effort: 30 minutes - 1 hour

Objective: Make CI workflows explicitly use tiered test commands.

Changes:

5.1 Update .github/workflows/ci.yml

- name: Run Essential Tests
  run: sbt testEssential
  timeout-minutes: 10
  env:
    FUKUII_DEV: true

- name: Run Standard Tests with Coverage
  run: sbt testStandard
  timeout-minutes: 45
  env:
    FUKUII_DEV: true
  if: success()

Benefits:

  • Clearer test categorization
  • Explicit tier execution
  • Better alignment with ADR-017

5.2 Update .github/workflows/nightly.yml

Add comprehensive test job:

jobs:
  nightly-comprehensive-tests:
    name: Nightly Comprehensive Test Suite
    runs-on: ubuntu-latest
    timeout-minutes: 240
    steps:
      - name: Run Comprehensive Tests
        run: sbt testComprehensive
        env:
          FUKUII_DEV: true


6. Document Test Guidelines (Phase 2 Documentation)

Effort: 2-3 hours

Deliverable: docs/testing/TEST_CATEGORIZATION_GUIDELINES.md

Contents:

  1. Introduction

    • Purpose of test categorization
    • Three-tier strategy overview

  2. Tag Selection Criteria

    • Decision tree for choosing tags
    • Examples for each tag
    • Anti-patterns (tags not to use together)

  3. Best Practices

    • One test, one purpose
    • Minimize test execution time
    • Proper resource cleanup
    • Avoid flakiness

  4. Common Patterns

    // Unit test - fast, no dependencies
    "Parser" should "parse valid input" taggedAs(UnitTest, RLPTest) in {
      val result = RLP.decode(validInput)
      result shouldBe expected
    }
    
    // Integration test - multiple components
    "BlockImporter" should "import block" taggedAs(IntegrationTest, DatabaseTest) in {
      val blockchain = createBlockchain()
      val result = blockchain.importBlock(testBlock)
      result shouldBe Right(Imported)
    }
    
    // Slow test - long execution
    "Sync" should "sync 1000 blocks" taggedAs(SlowTest, SyncTest, IntegrationTest) in {
      val sync = createSyncService()
      val result = sync.syncBlocks(1000)
      result should have length 1000
    }
    

  5. Tag Reference

    • Complete list of available tags
    • Usage guidelines for each tag
    • SBT filter examples
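
The decision tree for choosing tags could start from a small rule set like the one below. It is a sketch only: it simplifies "multiple components" to "uses a database or the network", applies the 100 ms boundary from the tagging guidelines, and the function and parameter names are assumptions made for illustration.

```python
from typing import List, Optional


def suggest_tags(
    duration_ms: float,
    uses_database: bool = False,
    uses_network: bool = False,
    module_tag: Optional[str] = None,
) -> List[str]:
    """Suggest ScalaTest tag names for a test based on simple observable traits."""
    tags: List[str] = []
    if uses_database or uses_network:
        tags.append("IntegrationTest")   # external dependency rules out UnitTest
    elif duration_ms < 100:
        tags.append("UnitTest")          # fast and self-contained
    if duration_ms > 100:
        tags.append("SlowTest")
    if uses_database:
        tags.append("DatabaseTest")
    if uses_network:
        tags.append("NetworkTest")
    if module_tag and module_tag not in tags:
        tags.append(module_tag)          # e.g. RLPTest, VMTest, SyncTest
    return tags
```

A test that takes exactly 100 ms and has no external dependencies receives neither UnitTest nor SlowTest here, which highlights a gap in the guideline boundaries that the final document may want to close.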

Long-term Actions (Low Priority)

7. Implement Metrics Tracking (Phase 3 & 5)

Effort: 3-5 days

Objective: Automated KPI tracking and alerting.

Components:

7.1 Metrics Collection

  • Parse CI workflow outputs
  • Extract test timing, counts, pass rates
  • Store in time-series format (JSON/CSV)
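
As an illustration of the collection step, the sketch below reads a JUnit-style XML report and reduces it to a flat metrics record that can be appended to a JSON time series. It assumes the report uses the conventional `testsuite` attributes (`tests`, `failures`, `errors`, `skipped`, `time`); the function name and the output keys are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict


def collect_metrics(xml_text: str) -> Dict[str, float]:
    """Aggregate counts and timing over every <testsuite> element in the report."""
    root = ET.fromstring(xml_text)
    tests = failed = skipped = 0
    duration = 0.0
    for suite in root.iter("testsuite"):  # also matches a <testsuite> root element
        tests += int(suite.get("tests", 0))
        failed += int(suite.get("failures", 0)) + int(suite.get("errors", 0))
        skipped += int(suite.get("skipped", 0))
        duration += float(suite.get("time", 0.0))
    executed = tests - skipped
    pass_rate = 100.0 * (executed - failed) / executed if executed else 0.0
    return {
        "tests": tests,
        "failed": failed,
        "skipped": skipped,
        "duration_seconds": duration,
        "pass_rate": pass_rate,
    }


def to_json(metrics: Dict[str, float]) -> str:
    """Serialize one metrics record as a single JSON line."""
    return json.dumps(metrics, sort_keys=True)
```

Writing one JSON line per CI run keeps the history easy to diff and to load into a dashboard later.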

7.2 Dashboard (Optional)

  • GitHub Pages static dashboard
  • Charts for KPI trends
  • Coverage over time
  • Pass rate history

7.3 Alerting

  • Slack webhook integration
  • Email notifications
  • GitHub Issue auto-creation for regressions

Configuration:

# .github/workflows/ci.yml
- name: Track Metrics
  run: |
    python scripts/track_metrics.py \
      --test-output test-results.xml \
      --coverage-report coverage/scoverage.xml \
      --output metrics-${{ github.run_number }}.json

- name: Check for Regressions
  run: |
    python scripts/check_regressions.py \
      --current metrics-${{ github.run_number }}.json \
      --baseline metrics-baseline.json \
      --slack-webhook ${{ secrets.SLACK_WEBHOOK }}
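
The regression check referenced in the workflow is not specified further, so the following is only a sketch of the comparison it might perform. The metric keys (`duration_seconds`, `pass_rate`) and both tolerances are assumptions chosen for illustration; notification delivery is left out.

```python
from typing import Dict, List


def find_regressions(
    current: Dict[str, float],
    baseline: Dict[str, float],
    max_slowdown_pct: float = 20.0,   # assumed tolerance for run-time growth
    max_pass_rate_drop: float = 1.0,  # assumed tolerance in percentage points
) -> List[str]:
    """Return human-readable messages for each metric that regressed past its tolerance."""
    messages: List[str] = []
    base_time = baseline.get("duration_seconds", 0.0)
    if base_time > 0:
        slowdown = 100.0 * (current.get("duration_seconds", 0.0) - base_time) / base_time
        if slowdown > max_slowdown_pct:
            messages.append(f"duration_seconds regressed by {slowdown:.1f}%")
    drop = baseline.get("pass_rate", 0.0) - current.get("pass_rate", 0.0)
    if drop > max_pass_rate_drop:
        messages.append(f"pass_rate dropped by {drop:.1f} points")
    return messages
```

An empty list means no alert is needed; any returned message can be forwarded to whichever channel the alerting step uses.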


8. Establish Continuous Improvement Process (Phase 5)

Effort: Ongoing

Objective: Regular KPI review and baseline updates.

Schedule:

Monthly KPI Review (1st Monday)

  • Review test execution time trends
  • Analyze coverage changes
  • Identify flaky tests
  • Check ethereum/tests pass rate

Checklist:

  - [ ] Review GitHub Actions timing
  - [ ] Check coverage reports
  - [ ] Analyze test failures
  - [ ] Update tracking spreadsheet
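
For the "identify flaky tests" item, one common heuristic is to flag any test that both passed and failed on the same commit. The sketch below applies that heuristic to a list of run records; the record shape `(commit, test_name, passed)` is an assumption, since the document does not define how run history is stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def flaky_tests(runs: Iterable[Tuple[str, str, bool]]) -> List[str]:
    """Return test names that have both passing and failing results on one commit."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for commit, test_name, passed in runs:
        outcomes[(commit, test_name)].add(passed)
    flaky = {test for (_, test), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)
```

A test that fails consistently on a commit is not reported, which keeps genuine regressions separate from flakiness.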

Quarterly Baseline Adjustment (1st of Quarter)

  • Re-measure comprehensive test suite
  • Update KPI baselines if needed
  • Document changes
  • Get engineering team approval

Process:

  1. Run comprehensive suite (3+ iterations)
  2. Calculate new P50/P95/P99 values
  3. Compare with existing baselines
  4. Document justification for changes
  5. Update KPI_BASELINES.md and KPIBaselines.scala
  6. Create PR for team review
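
Step 2 of the process can be computed with a nearest-rank percentile, shown here as a minimal sketch. The nearest-rank method is one of several percentile definitions and is an assumption; the document does not say which one KPIBaselines.scala uses.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

With only a handful of iterations, P95 and P99 both collapse to the slowest run, so more than the minimum three runs are needed before the upper percentiles carry much information.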

Regular Ethereum/Tests Sync (Monthly)

  • Check for new ethereum/tests releases
  • Update test submodule
  • Run full suite
  • Document new test additions

Success Criteria

Phase 2 Complete (Test Categorization)

  • All test files tagged (100% coverage)
  • CI workflows use explicit tier commands
  • Test categorization guidelines documented
  • testEssential runs in < 5 minutes
  • testStandard runs in < 30 minutes

Phase 3 Complete (KPI Baseline)

  • Comprehensive test suite executed
  • Baseline metrics documented
  • KPI tracking configured in CI
  • Baselines validated against targets

Phase 4 Complete (Ethereum/Tests Integration)

  • Full BlockchainTests suite executed (> 90% pass rate)
  • Full GeneralStateTests suite executed (> 95% pass rate)
  • Full VMTests suite executed (> 95% pass rate)
  • Full TransactionTests suite executed (> 95% pass rate)
  • Compliance report generated
  • Results compared with other clients

Phase 5 Complete (Continuous Improvement)

  • Monthly KPI review process established
  • Quarterly baseline adjustment schedule set
  • Ethereum/tests sync process documented
  • Performance regression analysis automated

Resources

Documentation:

  • Testing Tags Verification Report
  • TEST-001 ADR
  • TEST-002 ADR
  • KPI Baselines

Tools:

  • ScalaTest: https://www.scalatest.org/
  • scoverage: https://github.com/scoverage/scalac-scoverage-plugin
  • ethereum/tests: https://github.com/ethereum/tests

Team Contacts:

  • Engineering Team: Chippr Robotics LLC
  • Questions: GitHub Issues


Created: November 17, 2025
Last Updated: November 17, 2025
Next Review: Upon Phase 2 completion