# Testing Tags Implementation - Next Steps
Based on: Testing Tags Verification Report
Date: November 17, 2025
Status: Phase 1 & 2 Complete (65%), Phase 3-5 Pending (35%)
## Executive Summary
The testing tags infrastructure is substantially complete and production-ready. All critical infrastructure (tags system, SBT commands, ethereum/tests adapter) is implemented and validated. The remaining work is primarily systematic application and execution rather than new development.
Estimated Effort to 100% Completion: 2-3 weeks
## Immediate Actions (High Priority)

### 1. Complete Test Tagging (Phase 2 Completion)
Status: 32% complete (48/150+ files tagged)
Objective: Tag all remaining test files with appropriate ScalaTest tags.
Effort: 2-3 days
Steps:

1. Identify all test files without tag imports:

        # Find test files without Tags import
        find src -name "*Spec.scala" -o -name "*Test.scala" | \
          xargs grep -L "import.*Tags" | \
          grep -v "/target/"

2. For each file, add appropriate tags:

        import com.chipprbots.ethereum.testing.Tags._

        // Unit test example
        "MyComponent" should "do something" taggedAs(UnitTest) in { ... }

        // Integration test example
        "Database" should "persist data" taggedAs(IntegrationTest, DatabaseTest) in { ... }

        // Slow test example
        "LargeSync" should "sync blocks" taggedAs(SlowTest, SyncTest) in { ... }

3. Follow tagging guidelines:
   - UnitTest: Fast (< 100ms), no external dependencies
   - IntegrationTest: Multiple components, may use database/network
   - SlowTest: > 100ms execution time
   - Module tags: CryptoTest, VMTest, NetworkTest, etc.
   - Fork tags: BerlinTest, IstanbulTest, etc. (for fork-specific tests; see the sketch below)
4. Verify tagging (e.g., by re-running the find command above and confirming it returns no files).

Files by Priority:
- High: VM, State, Consensus tests
- Medium: Network, Database, MPT tests
- Low: Utility, RLP, Crypto tests (some already tagged)
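Module and fork tags combine with the tier tags in the same way as the examples above; a brief sketch (the test name and body are hypothetical, the tags come from the guidelines above):

```scala
import com.chipprbots.ethereum.testing.Tags._

// Fork-specific example (hypothetical test): exercised only for Berlin-era rules
"GasCalculator" should "charge cold-access cost for SLOAD after Berlin" taggedAs (UnitTest, VMTest, BerlinTest) in {
  // ... assertions against the Berlin gas schedule ...
}
```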
### 2. Execute Full Ethereum/Tests Suite (Phase 4 Kickoff)
Status: Phase 2 complete (validation passing), Phase 3 ready
Objective: Run comprehensive ethereum/tests suites and document results.
Effort: 1-2 weeks (execution + analysis + fixes)
Steps:
#### 2.1 Run BlockchainTests Suite

    # Full suite execution
    sbt "IntegrationTest / testOnly *ComprehensiveBlockchainTestsSpec"

    # Monitor execution time
    # Expected: 30-60 minutes

Expected Results:
- Target: > 90% pass rate
- Categories: ValidBlocks, InvalidBlocks, bcStateTests
#### 2.2 Run GeneralStateTests Suite

    # Full suite execution
    sbt "IntegrationTest / testOnly *GeneralStateTestsSpec"

    # Monitor execution time
    # Expected: 30-60 minutes

Expected Results:
- Target: > 95% pass rate
- Categories: stArgsZeroOneBalance, stCodeSizeLimit, etc.
#### 2.3 Run VMTests Suite

Expected Results:
- Target: > 95% pass rate
- Validates all 140+ EVM opcodes
#### 2.4 Run TransactionTests Suite

    # Full suite execution
    sbt "IntegrationTest / testOnly *TransactionTestsSpec"

    # Expected: 10-20 minutes

Expected Results:
- Target: > 95% pass rate
- Exercises transaction validation logic
#### 2.5 Document Results

Create docs/testing/ETHEREUM_TESTS_COMPLIANCE_REPORT.md:
- Test suite breakdown
- Pass/fail rates by category
- Failure analysis
- Network filtering statistics
- Comparison with geth/besu (if possible)
### 3. Measure KPI Baselines (Phase 3 Completion)
Status: Baselines defined, measurement pending
Objective: Measure and document actual test execution times and metrics.
Effort: 1 day
Steps:
#### 3.1 Measure Test Execution Times

    # Run each tier with timing
    time sbt testEssential > test-essential-timing.log 2>&1
    time sbt testStandard > test-standard-timing.log 2>&1
    time sbt testComprehensive > test-comprehensive-timing.log 2>&1
#### 3.2 Extract Metrics

    # Extract test counts and timings
    grep -E "Total number of tests run|Tests: succeeded|Run completed" test-*.log

    # Example output:
    # testEssential: 450 tests in 3m 42s
    # testStandard: 1200 tests in 18m 15s
    # testComprehensive: 2500+ tests in 2h 15m
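If a reusable helper is preferred over the grep above, the same figures can be pulled from the logs programmatically. A minimal Scala sketch, assuming the standard ScalaTest summary lines quoted in the grep pattern:

```scala
import scala.io.Source

// Sketch: extract "Total number of tests run: N" and "Run completed in ..." from each timing log
object TimingSummary extends App {
  val totalRe = """Total number of tests run: (\d+)""".r
  val timeRe  = """Run completed in ([^.]+)""".r

  Seq("test-essential-timing.log", "test-standard-timing.log", "test-comprehensive-timing.log").foreach { file =>
    val text  = Source.fromFile(file).mkString
    val tests = totalRe.findFirstMatchIn(text).map(_.group(1)).getOrElse("?")
    val time  = timeRe.findFirstMatchIn(text).map(_.group(1)).getOrElse("?")
    println(s"$file: $tests tests in $time")
  }
}
```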
#### 3.3 Document Results
Update docs/testing/KPI_BASELINES.md:
- Measured test execution times
- Test counts per tier
- Coverage percentages
- Comparison against targets
#### 3.4 Validate Against Targets

Compare measured values against ADR-017 targets:
- Essential: < 5 minutes ✅ or ❌
- Standard: < 30 minutes ✅ or ❌
- Comprehensive: < 3 hours ✅ or ❌

If any tier exceeds its target, analyze and optimize:
- Profile slow tests (see the build.sbt sketch below)
- Consider parallelization
- Move tests to a higher tier if appropriate
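As a starting point for that analysis, ScalaTest can report per-test durations and sbt can run suites in parallel. A build.sbt sketch using standard options (shown for illustration, not necessarily the project's current settings):

```scala
// Print per-test durations (-oD) so the slowest tests stand out in the log
Test / testOptions += Tests.Argument(TestFrameworks.ScalaTest, "-oD")

// Allow test suites within the Test configuration to run in parallel
Test / parallelExecution := true
```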
## Short-term Actions (Medium Priority)

### 4. Generate Compliance Report (Phase 4 Continuation)
Effort: 2-3 days
Deliverable: docs/testing/ETHEREUM_TESTS_COMPLIANCE_REPORT.md
Contents:

1. Executive Summary
   - Overall pass rate
   - Compliance level (95%+ = excellent, 90-95% = good, < 90% = needs work)

2. Test Suite Breakdown, for example (a small helper for these percentages is sketched after this list):

        BlockchainTests:
        - ValidBlocks/bcValidBlockTest: 45/50 (90%)
        - ValidBlocks/bcStateTests: 38/40 (95%)
        - InvalidBlocks: 20/25 (80%)
        Total: 103/115 (90%)

        GeneralStateTests:
        - stArgsZeroOneBalance: 15/15 (100%)
        - stCodeSizeLimit: 12/12 (100%)
        - ... (more categories)
        Total: 450/475 (95%)

        VMTests:
        - vmArithmeticTest: 25/25 (100%)
        - vmBitwiseLogicOperation: 18/18 (100%)
        - ... (more categories)
        Total: 140/150 (93%)

        TransactionTests:
        - ttNonce: 10/10 (100%)
        - ttData: 8/8 (100%)
        - ... (more categories)
        Total: 65/70 (93%)

3. Failure Analysis
   - Common failure patterns
   - Network-specific issues
   - Known ETC divergences
   - Action items for fixes

4. Network Filtering
   - Pre-Spiral tests included
   - Post-Spiral tests excluded
   - Network version distribution

5. Cross-Client Comparison (if available)
   - Fukuii vs geth pass rates
   - Fukuii vs besu pass rates
   - Notable differences
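The per-category percentages in the breakdown are simple ratios; a small Scala helper keeps them consistent across the report (the counts below are the illustrative figures from the example breakdown, not measured results):

```scala
// Sketch: compute per-category and total pass rates for the compliance report
final case class CategoryResult(name: String, passed: Int, total: Int) {
  def rate: Double = if (total == 0) 0.0 else passed.toDouble / total * 100
}

val blockchainTests = Seq(
  CategoryResult("ValidBlocks/bcValidBlockTest", 45, 50),
  CategoryResult("ValidBlocks/bcStateTests", 38, 40),
  CategoryResult("InvalidBlocks", 20, 25)
)

blockchainTests.foreach(c => println(f"${c.name}: ${c.passed}/${c.total} (${c.rate}%.0f%%)"))

val passed = blockchainTests.map(_.passed).sum
val total  = blockchainTests.map(_.total).sum
println(f"Total: $passed/$total (${passed.toDouble / total * 100}%.0f%%)")
```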
### 5. Update CI Workflows (Phase 2 Cleanup)
Effort: 30 minutes - 1 hour
Objective: Make CI workflows explicitly use tiered test commands.
Changes:
#### 5.1 Update .github/workflows/ci.yml

    - name: Run Essential Tests
      run: sbt testEssential
      timeout-minutes: 10
      env:
        FUKUII_DEV: true

    - name: Run Standard Tests with Coverage
      run: sbt testStandard
      timeout-minutes: 45
      env:
        FUKUII_DEV: true
      if: success()

Benefits:
- Clearer test categorization
- Explicit tier execution
- Better alignment with ADR-017
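For context, tiered commands like testEssential and testStandard are typically thin sbt aliases over ScalaTest tag filters. The project's actual definitions live in its build; the sketch below is only an illustration, and the fully qualified tag names are assumptions:

```scala
// build.sbt sketch (assumed tag names): tiers as include/exclude filters over ScalaTest tags
addCommandAlias(
  "testEssential",
  "testOnly * -- -l com.chipprbots.ethereum.testing.SlowTest -l com.chipprbots.ethereum.testing.IntegrationTest"
)
addCommandAlias(
  "testStandard",
  "testOnly * -- -l com.chipprbots.ethereum.testing.SlowTest"
)
addCommandAlias("testComprehensive", ";test ;IntegrationTest / test")
```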
#### 5.2 Update .github/workflows/nightly.yml

Add a comprehensive test job:

    jobs:
      nightly-comprehensive-tests:
        name: Nightly Comprehensive Test Suite
        runs-on: ubuntu-latest
        timeout-minutes: 240
        steps:
          - name: Run Comprehensive Tests
            run: sbt testComprehensive
            env:
              FUKUII_DEV: true
### 6. Document Test Guidelines (Phase 2 Documentation)
Effort: 2-3 hours
Deliverable: docs/testing/TEST_CATEGORIZATION_GUIDELINES.md
Contents:

1. Introduction
   - Purpose of test categorization
   - Three-tier strategy overview
2. Tag Selection Criteria
   - Decision tree for choosing tags
   - Examples for each tag
   - Anti-patterns (tags not to use together)
3. Best Practices
   - One test, one purpose
   - Minimize test execution time
   - Proper resource cleanup
   - Avoid flakiness
4. Common Patterns

        // Unit test - fast, no dependencies
        "Parser" should "parse valid input" taggedAs(UnitTest, RLPTest) in {
          val result = RLP.decode(validInput)
          result shouldBe expected
        }

        // Integration test - multiple components
        "BlockImporter" should "import block" taggedAs(IntegrationTest, DatabaseTest) in {
          val blockchain = createBlockchain()
          val result = blockchain.importBlock(testBlock)
          result shouldBe Right(Imported)
        }

        // Slow test - long execution
        "Sync" should "sync 1000 blocks" taggedAs(SlowTest, SyncTest, IntegrationTest) in {
          val sync = createSyncService()
          val result = sync.syncBlocks(1000)
          result should have length 1000
        }

5. Tag Reference (a sketch of how such tags are defined follows this list)
   - Complete list of available tags
   - Usage guidelines for each tag
   - SBT filter examples
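For contributors adding new tags, ScalaTest tags are small objects wrapping a fully qualified name, and that name is what sbt's -n / -l filters match on. A sketch of the general shape (the project's actual Tags object may differ):

```scala
import org.scalatest.Tag

// Sketch: each tag wraps a fully qualified name used by ScalaTest's include/exclude filters
object Tags {
  object UnitTest        extends Tag("com.chipprbots.ethereum.testing.UnitTest")
  object IntegrationTest extends Tag("com.chipprbots.ethereum.testing.IntegrationTest")
  object SlowTest        extends Tag("com.chipprbots.ethereum.testing.SlowTest")
  object VMTest          extends Tag("com.chipprbots.ethereum.testing.VMTest")
  object BerlinTest      extends Tag("com.chipprbots.ethereum.testing.BerlinTest")
}
```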
## Long-term Actions (Low Priority)

### 7. Implement Metrics Tracking (Phase 3 & 5)
Effort: 3-5 days
Objective: Automated KPI tracking and alerting.
Components:
#### 7.1 Metrics Collection
- Parse CI workflow outputs
- Extract test timing, counts, pass rates
- Store in time-series format (JSON/CSV)
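The collection step can be as small as appending one row per CI run. The workflow configuration below delegates this to a Python script; purely to illustrate the idea, a self-contained Scala sketch (file names, CSV columns, and the JUnit-style XML attributes are assumptions):

```scala
import java.nio.file.{Files, Paths, StandardOpenOption}
import scala.io.Source

// Sketch: read aggregate counts from a JUnit-style XML report and append a CSV row
object TrackMetrics extends App {
  val xml = Source.fromFile("test-results.xml").mkString

  def attr(name: String): String =
    (name + "=\"([^\"]+)\"").r.findFirstMatchIn(xml).map(_.group(1)).getOrElse("0")

  val row = s"${java.time.Instant.now()},${attr("tests")},${attr("failures")},${attr("time")}\n"
  Files.write(
    Paths.get("metrics-history.csv"),
    row.getBytes,
    StandardOpenOption.CREATE,
    StandardOpenOption.APPEND
  )
}
```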
#### 7.2 Dashboard (Optional)
- GitHub Pages static dashboard
- Charts for KPI trends
- Coverage over time
- Pass rate history
#### 7.3 Alerting
- Slack webhook integration
- Email notifications
- GitHub Issue auto-creation for regressions
Configuration:

    # .github/workflows/ci.yml
    - name: Track Metrics
      run: |
        python scripts/track_metrics.py \
          --test-output test-results.xml \
          --coverage-report coverage/scoverage.xml \
          --output metrics-${{ github.run_number }}.json

    - name: Check for Regressions
      run: |
        python scripts/check_regressions.py \
          --current metrics-${{ github.run_number }}.json \
          --baseline metrics-baseline.json \
          --slack-webhook ${{ secrets.SLACK_WEBHOOK }}
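The regression check boils down to comparing the current metrics against the stored baseline with a tolerance. The workflow above delegates this to scripts/check_regressions.py; as a language-neutral illustration of the logic, a Scala sketch (field names and the 10% threshold are assumptions):

```scala
// Sketch: flag a regression when a metric worsens by more than a relative tolerance
final case class Metrics(durationSeconds: Double, passRate: Double)

def regressions(current: Metrics, baseline: Metrics, tolerance: Double = 0.10): List[String] =
  List(
    Option.when(current.durationSeconds > baseline.durationSeconds * (1 + tolerance))(
      f"duration regressed: ${current.durationSeconds}%.0fs vs baseline ${baseline.durationSeconds}%.0fs"
    ),
    Option.when(current.passRate < baseline.passRate * (1 - tolerance))(
      f"pass rate regressed: ${current.passRate}%.1f%% vs baseline ${baseline.passRate}%.1f%%"
    )
  ).flatten
```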
### 8. Establish Continuous Improvement Process (Phase 5)
Effort: Ongoing
Objective: Regular KPI review and baseline updates.
Schedule:
#### Monthly KPI Review (1st Monday)
- Review test execution time trends
- Analyze coverage changes
- Identify flaky tests
- Check ethereum/tests pass rate
Checklist:
- [ ] Review GitHub Actions timing
- [ ] Check coverage reports
- [ ] Analyze test failures
- [ ] Update tracking spreadsheet
#### Quarterly Baseline Adjustment (1st of Quarter)
- Re-measure comprehensive test suite
- Update KPI baselines if needed
- Document changes
- Get engineering team approval
Process:
1. Run comprehensive suite (3+ iterations)
2. Calculate new P50/P95/P99 values
3. Compare with existing baselines
4. Document justification for changes
5. Update KPI_BASELINES.md and KPIBaselines.scala
6. Create PR for team review
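For step 2, the percentiles can be computed directly from the recorded wall-clock times of the iterations; a short Scala sketch using the nearest-rank method (sample durations are purely illustrative):

```scala
// Nearest-rank percentile over measured suite durations (in seconds)
def percentile(samples: Seq[Double], p: Double): Double = {
  require(samples.nonEmpty && p > 0 && p <= 100)
  val sorted = samples.sorted
  sorted(math.ceil(p / 100.0 * sorted.size).toInt - 1)
}

// e.g. three comprehensive-suite runs; with so few samples P95/P99 simply equal the worst run
val runs = Seq(8100.0, 7920.0, 8400.0)
println(s"P50=${percentile(runs, 50)}s P95=${percentile(runs, 95)}s P99=${percentile(runs, 99)}s")
```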
#### Regular Ethereum/Tests Sync (Monthly)
- Check for new ethereum/tests releases
- Update test submodule
- Run full suite
- Document new test additions
## Success Criteria

### Phase 2 Complete (Test Categorization)

- [ ] All test files tagged (100% coverage)
- [ ] CI workflows use explicit tier commands
- [ ] Test categorization guidelines documented
- [ ] testEssential verified to run in < 5 minutes
- [ ] testStandard verified to run in < 30 minutes
### Phase 3 Complete (KPI Baseline)

- [ ] Comprehensive test suite executed
- [ ] Baseline metrics documented
- [ ] KPI tracking configured in CI
- [ ] Baselines validated against targets
### Phase 4 Complete (Ethereum/Tests Integration)

- [ ] Full BlockchainTests suite executed (> 90% pass rate)
- [ ] Full GeneralStateTests suite executed (> 95% pass rate)
- [ ] Full VMTests suite executed (> 95% pass rate)
- [ ] Full TransactionTests suite executed (> 95% pass rate)
- [ ] Compliance report generated
- [ ] Results compared with other clients
### Phase 5 Complete (Continuous Improvement)

- [ ] Monthly KPI review process established
- [ ] Quarterly baseline adjustment schedule set
- [ ] Ethereum/tests sync process documented
- [ ] Performance regression analysis automated
## Resources

Documentation:
- Testing Tags Verification Report
- TEST-001 ADR
- TEST-002 ADR
- KPI Baselines

Tools:
- ScalaTest: https://www.scalatest.org/
- scoverage: https://github.com/scoverage/scalac-scoverage-plugin
- ethereum/tests: https://github.com/ethereum/tests

Team Contacts:
- Engineering Team: Chippr Robotics LLC
- Questions: GitHub Issues
Created: November 17, 2025
Last Updated: November 17, 2025
Next Review: Upon Phase 2 completion