KPI Monitoring Guide - Fukuii Test Suite¶
Status: ✅ Active
Date: November 16, 2025
Related Documents: KPI_BASELINES.md, PERFORMANCE_BASELINES.md, TEST-002
Overview¶
This guide provides practical instructions for monitoring Key Performance Indicators (KPIs) in the Fukuii Ethereum Classic client. It covers daily monitoring workflows, threshold interpretation, regression detection, and escalation procedures.
Quick Reference¶
KPI Categories¶
- Test Execution Time - How long test suites take to complete
- Test Health - Success rates, flakiness, and coverage
- Ethereum/Tests Compliance - Pass rates for official test suites
- Performance Benchmarks - Execution speed for critical operations
- Memory Usage - Heap consumption and GC overhead
Critical Thresholds¶
- Essential Tests: > 7 minutes = Warning, > 10 minutes = Failure
- Test Success Rate: < 99% = Investigation required
- Performance Regression: > 20% slower = CI fails
- Memory: > 2.4 GB peak = Warning
- GC Overhead: > 6% = Warning
Monitoring Workflows¶
1. Pull Request Monitoring¶
Frequency: Every PR
Duration: ~15 minutes
Scope: Tier 1 (Essential) tests + KPI validation
What Gets Checked¶
- Essential tests complete in < 10 minutes
- KPI baseline definitions are valid
- No test regressions introduced
- Code formatting compliance
How to Monitor¶
Via GitHub Actions UI:
1. Navigate to the PR → "Checks" tab
2. Look for the "Test and Build (JDK 21, Scala 3.3.4)" workflow
3. Check the "Validate KPI Baselines" step (should be ✅)
4. Review test timing in the "Run tests with coverage" step
Via Command Line (local):
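A minimal sketch of the equivalent local checks (task names are assumed to mirror CI; scalafmtCheckAll requires the sbt-scalafmt plugin):
# Validate KPI baseline definitions (the same spec the CI step runs)
sbt "testOnly *KPIBaselinesSpec"
# Run the tests and note the wall-clock time against the 7/10-minute thresholds
time sbt test
# Check formatting compliance
sbt scalafmtCheckAll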
Warning Signs¶
- ⚠️ "Validate KPI Baselines" step fails
- ⚠️ Test execution exceeds 7 minutes
- ⚠️ New test failures appear
- ⚠️ Coverage drops significantly
Actions¶
- Green: PR can proceed to review
- Yellow: Investigate warnings, may need optimization
- Red: Block merge, investigate and fix issues
2. Nightly Build Monitoring¶
Frequency: Daily at 02:00 UTC
Duration: ~1-3 hours
Scope: Tier 3 (Comprehensive) tests
What Gets Checked¶
- Complete ethereum/tests suite
- Performance benchmarks
- Long-running stress tests
- Trend analysis over time
How to Monitor¶
Via GitHub Actions:
1. Navigate to Actions → "Ethereum/Tests Nightly"
2. Check latest run status
3. Download and review artifacts:
- ethereum-tests-nightly-logs-* - Execution logs
- ethereum-tests-nightly-reports-* - Test reports
Automated Notifications:
- Slack alerts on failures (if configured)
- Email summaries (daily)
- GitHub Issues for persistent failures
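The same checks can be scripted from a terminal with the GitHub CLI; a minimal sketch (assumes gh is installed and authenticated, and that the workflow name matches the Actions tab):
# List recent nightly runs and their conclusions
gh run list --workflow "Ethereum/Tests Nightly" --limit 5
# Download the logs and reports from a specific run
gh run download <run-id> --dir nightly-artifacts/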
Key Metrics to Track¶
Metric | Baseline | Warning | Failure
--------------------------------|----------|---------|--------
Total execution time | 90 min | 240 min | 300 min
Ethereum/tests pass rate | 100% | 95% | 90%
Performance regression count | 0 | 3 | 5
Memory peak | 1.5 GB | 2.4 GB | 3.0 GB
Actions¶
- All Green: Archive report, continue monitoring
- Warnings: Create tracking issue, investigate trends
- Failures: Immediate investigation, may block next release
3. Release Validation Monitoring¶
Frequency: Before each release
Duration: ~3-5 hours
Scope: Full comprehensive suite + compliance validation
What Gets Checked¶
- All test tiers (Essential, Standard, Comprehensive)
- Full ethereum/tests compliance report
- Performance benchmark comparison vs. baseline
- No performance regressions
- Coverage targets met
How to Monitor¶
Manual Trigger:
# Run comprehensive test suite
sbt testComprehensive
# Generate coverage report
sbt testCoverage
# Run benchmarks
sbt "Benchmark / test"
Via GitHub Actions:
1. Tag release candidate: git tag -a v1.x.x-rc1
2. Push tag: git push origin v1.x.x-rc1
3. Monitor the "Release Validation" workflow (or follow it from a terminal, as sketched below)
4. Review all artifacts and reports
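A sketch of following the run without leaving the terminal (assumes the GitHub CLI is installed and authenticated):
# Interactively pick the in-progress "Release Validation" run and stream its progress
gh run watch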
Validation Checklist¶
- Essential tests < 5 minutes
- Standard tests < 30 minutes
- Comprehensive tests < 3 hours
- Test success rate > 99%
- Coverage > 80% line, > 70% branch
- No performance regressions > 10%
- Ethereum/tests compliance > 95%
- Memory usage < 2 GB peak
- GC overhead < 5%
Actions¶
- Pass All: Approve release
- Minor Issues: Document known issues, approve with caveats
- Major Issues: Block release, fix critical problems
Interpreting KPI Metrics¶
Test Execution Time¶
What It Measures: Wall-clock time to complete test suites
Baseline Values:
Essential: 4 minutes (target: < 5 minutes)
Standard: 22 minutes (target: < 30 minutes)
Comprehensive: 90 minutes (target: < 3 hours)
How to Interpret:
- Within target: Normal operation
- Warning threshold: Possible inefficiency, investigate trends
- Failure threshold: Critical issue, which may indicate:
  - Test hangs (actor cleanup failure)
  - Database locks
  - Network timeouts
  - Infinite loops
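A quick local check against the Essential-tier thresholds (7-minute warning, 10-minute failure); a sketch that measures whole minutes only:
start=$(date +%s)
sbt test
elapsed_min=$(( ( $(date +%s) - start ) / 60 ))
if [ "$elapsed_min" -gt 10 ]; then
  echo "FAILURE: tests took ${elapsed_min} min"
elif [ "$elapsed_min" -gt 7 ]; then
  echo "WARNING: tests took ${elapsed_min} min"
else
  echo "OK: tests took ${elapsed_min} min"
fi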
Common Causes of Degradation:
1. Actor systems not being cleaned up (see TEST-002 Phase 1 fix)
2. Database connections leaking
3. Network tests with long timeouts
4. Excessive compilation time
How to Fix:
// Example: shut down the test actor system after each test (Akka TestKit)
import akka.testkit.TestKit

override def afterEach(): Unit = {
  TestKit.shutdownActorSystem(system, verifySystemShutdown = false)
  super.afterEach()
}
Test Health¶
What It Measures: Quality and reliability of test suite
Baseline Values:
Success Rate: 99.5% (target: > 99%)
Flakiness Rate: 0.5% (target: < 1%)
Line Coverage: 75% (target: > 80%)
Branch Coverage: 65% (target: > 70%)
How to Interpret:
- Success Rate < 99%: Tests are failing consistently
- Flakiness > 1%: Tests have intermittent failures
- Coverage < targets: Insufficient test coverage
Common Issues:
1. Flaky Network Tests: Use mocks or increase timeouts
2. Race Conditions: Add proper synchronization
3. Environment-Dependent Tests: Use test fixtures
4. Low Coverage: Add tests for uncovered code paths
How to Identify Flaky Tests:
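A simple way to confirm flakiness is to re-run the suspect suite repeatedly and count intermittent failures; a minimal sketch (MySuspectSpec is a placeholder for the real spec name):
for i in $(seq 1 20); do
  sbt "testOnly *MySuspectSpec" || echo "run $i failed" >> flaky-runs.log
done
# Any lines in flaky-runs.log indicate intermittent failures
cat flaky-runs.log 2>/dev/null || echo "no failures in 20 runs"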
Ethereum/Tests Compliance¶
What It Measures: Pass rate for official Ethereum test suites
Baseline Values:
GeneralStateTests: 100% (Phase 2: SimpleTx only)
BlockchainTests: 100% (Phase 2: SimpleTx only)
TransactionTests: N/A (Pending Phase 3)
VMTests: N/A (Pending Phase 3)
How to Interpret:
- 100% passing: Full compliance for tested categories
- 95-99% passing: Minor edge cases failing
- < 95% passing: Significant compliance issues
Expected Evolution:
- Phase 2 (Current): SimpleTx tests at 100%
- Phase 3 (Q1 2026): Full suite at > 95%
- Ongoing: Maintain > 95% as tests are added
When Tests Fail:
1. Check whether the test is ETC-compatible (pre-Spiral only)
2. Verify the test expectations match ETC consensus rules
3. Investigate the EVM implementation for bugs (re-run the failing category locally, as sketched below)
4. Compare results with reference implementations (geth, besu)
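A minimal sketch of re-running one ethereum/tests category on its own (the test-class filter is illustrative; match it to the actual spec names):
# Re-run a single category and keep the output for comparison with geth/besu
sbt "testOnly *BlockchainTests*" | tee blockchain-tests.log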
Performance Benchmarks¶
What It Measures: Execution speed for critical operations
Key Baselines:
Block Validation: 60ms P50 (target: < 100ms)
Tx Execution: 0.3ms P50 (target: < 1ms)
State Root Calc: 40ms P50 (target: < 50ms)
RLP Operations: 30μs P50 (target: < 100μs)
How to Interpret:
- Within target: Good performance
- 10-20% regression: Warning, monitor trends
- > 20% regression: CI fails, must investigate
Regression Detection:
// Programmatic regression check against the recorded baseline
val actual = measureOperation() // measured P50, in the same unit as the baseline
val baseline = KPIBaselines.PerformanceBenchmarks.BlockValidation.simpleTxBlock.p50
// Flags results that exceed the baseline by more than the allowed margin (> 20% fails CI)
val isRegression = KPIBaselines.Validation.isRegression(actual, baseline)
Common Performance Issues:
1. Inefficient algorithms: Review computational complexity
2. Memory allocations: Use object pooling
3. Database access: Batch operations, use caching
4. Serialization overhead: Optimize RLP encoding
Memory Usage¶
What It Measures: Heap consumption and GC behavior
Baseline Values:
Node Startup: 200 MB stable (300 MB peak)
Fast Sync: 800 MB stable (1.5 GB peak)
Full Sync: 1.2 GB stable (2.0 GB peak)
GC Overhead: 2.5% (target: < 5%)
How to Interpret:
- Peak < 2 GB: Normal operation
- Peak 2-2.4 GB: Warning, may need tuning
- Peak > 2.4 GB: Memory leak suspected
- GC > 5%: GC pressure, heap too small or memory leak
How to Investigate Memory Issues:
# Enable GC logging
export JAVA_OPTS="-Xlog:gc*:file=gc.log -XX:+HeapDumpOnOutOfMemoryError"
# Analyze the heap dump (written as java_pid<pid>.hprof on OOM) with VisualVM or Eclipse MAT
# Note: jhat was removed from the JDK in version 9, so it is not available on JDK 21
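For a running node, GC overhead can also be sampled live with jstat (a sketch; <pid> is the JVM process id, discoverable with jps):
jstat -gcutil <pid> 1000 60   # one sample per second for 60 seconds
# GCT is cumulative GC time in seconds; its growth relative to wall-clock time
# approximates the GC overhead percentage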
Common Memory Issues:
1. Caches not bounded: Implement LRU eviction
2. Large collections held in memory: Stream processing
3. Listeners not removed: Proper cleanup in tests
4. MPT nodes not released: Ensure proper trie pruning
Alerting and Escalation¶
Alert Levels¶
Level 1: Info¶
Trigger: Metric approaches warning threshold
Action: Document in daily summary, continue monitoring
Notification: None
Example: Test execution time increases from 4 to 5 minutes
Level 2: Warning¶
Trigger: Metric exceeds warning threshold
Action: Create GitHub issue, investigate within 2 business days
Notification: Slack (optional)
Example: Test execution time exceeds 7 minutes
Level 3: Error¶
Trigger: Metric exceeds failure threshold or critical test fails
Action: Immediate investigation, block merges/releases
Notification: Slack + Email
Example: Essential tests exceed 10 minutes, or test success rate < 99%
Level 4: Critical¶
Trigger: System-wide failure or data integrity issue
Action: Incident response, all-hands investigation
Notification: Slack + Email + On-call
Example: Ethereum/tests compliance drops to 0%, memory leak crashes CI
Escalation Paths¶
Level 1 → Level 2: Metric remains above warning for 3 consecutive days
Level 2 → Level 3: Metric exceeds failure threshold or no progress in 5 days
Level 3 → Level 4: Issue persists for 24 hours or affects production
Resolution Process¶
- Investigate: Collect logs, artifacts, and metrics
- Reproduce: Recreate issue locally if possible
- Isolate: Identify root cause through testing
- Fix: Implement minimal change to resolve issue
- Validate: Verify fix resolves issue without new regressions
- Document: Update runbooks and baselines if needed
- Close: Mark issue as resolved and verify in next build
Trend Analysis¶
Monthly KPI Review¶
Purpose: Identify long-term trends and preventive actions
Metrics to Track:
- Test execution time trends (increasing/decreasing)
- Flakiness rate over time
- Coverage trends
- Performance benchmark trends
- Memory usage trends
Analysis:
Month | Essential | Standard | Coverage | Flakiness
---------|-----------|----------|----------|----------
2025-11 | 4.0 min | 22 min | 75% | 0.5%
2025-12 | 4.2 min | 24 min | 76% | 0.6%
2026-01 | 4.5 min | 26 min | 78% | 0.4%
Trend | +12% | +18% | +4% | Stable
Action | Monitor | Optimize | On track | Good
Actions:
- Increasing Times: Review test efficiency, consider parallelization
- Decreasing Coverage: Add tests for new code
- Increasing Flakiness: Stabilize or remove flaky tests
Quarterly Baseline Review¶
Purpose: Update baselines to reflect improvements or changes
Process:
1. Collect 90 days of metrics
2. Calculate P50/P95/P99 values (see the sketch after this list)
3. Compare with current baselines
4. Propose updated baselines if significant change
5. Document rationale in KPI_BASELINES.md
6. Get engineering team approval
7. Update KPIBaselines.scala with new values
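For step 2, percentiles can be computed directly from a file of per-run durations; a rough nearest-rank sketch (durations.txt is a placeholder for however the timings are exported from CI):
sort -n durations.txt | awk '
  { v[NR] = $1 }
  END {
    print "P50:", v[int(NR * 0.50 + 0.5)]
    print "P95:", v[int(NR * 0.95 + 0.5)]
    print "P99:", v[int(NR * 0.99 + 0.5)]
  }'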
Criteria for Baseline Updates:
- Sustained improvement > 10% for 3+ months
- Architectural change that requires a new baseline
- Test suite scope change (new categories added)
Tools and Automation¶
KPI Dashboard (Future)¶
Planned Features:
- Real-time KPI metrics from CI/CD
- Historical trend charts
- Automated regression detection
- Alert configuration UI
Tech Stack:
- Grafana for visualization
- Prometheus for metrics storage
- Custom exporters for test results
Automated Reports¶
Daily Summary Email:
Subject: Fukuii KPI Summary - 2025-11-16
Essential Tests: ✅ 4.2 min (target: < 5 min)
Standard Tests: ✅ 23 min (target: < 30 min)
Nightly Build: ✅ 95 min (target: < 180 min)
Test Success Rate: ✅ 99.8%
Ethereum/Tests: ✅ 100% (4/4 SimpleTx tests)
Performance: ✅ No regressions
Memory: ✅ 1.8 GB peak
No action required.
Weekly Trend Report:
Subject: Fukuii KPI Trends - Week of 2025-11-11
Test Execution Time: 📈 Increasing (+8% vs last week)
Essential: 4.0 → 4.3 min
Standard: 22 → 24 min
Action: Investigate slow tests
Coverage: 📊 Stable
Line: 75.2% (target: 80%)
Branch: 65.8% (target: 70%)
Action: Add tests for uncovered code
Performance: ✅ Stable
No regressions detected
Recommendations:
1. Profile slow tests in standard suite
2. Add unit tests to improve coverage
Best Practices¶
For Developers¶
- Run Essential Tests Locally: Before pushing commits
- Check KPI Baselines: When adding new tests or features
- Monitor CI Feedback: Address failures promptly
- Use Benchmarks: Profile performance-critical code
For Test Authors¶
- Tag Tests Appropriately: SlowTest, IntegrationTest, etc.
- Clean Up Resources: Actor systems, databases, file handles
- Avoid Flakiness: No sleeps, use proper synchronization
- Document Performance: Note if test is performance-sensitive
For Reviewers¶
- Check Test Execution Times: Ensure PRs don't add slow tests
- Verify Coverage Changes: Coverage should not decrease
- Review Benchmark Impact: Performance regressions should be justified
- Validate KPI Impact: Check if changes affect baselines
Troubleshooting¶
"KPI Baselines validation failed"¶
Cause: KPIBaselinesSpec test failed
Check: Review test output for which assertion failed
Fix: Update KPIBaselines.scala if values are incorrect
"Test execution exceeded timeout"¶
Cause: Tests running longer than expected
Check: Look for hanging tests or actor cleanup issues
Fix: Add proper cleanup, increase timeout if justified
"Performance regression detected"¶
Cause: Operation slower than baseline by > 20%
Check: Profile the operation to find bottleneck
Fix: Optimize code or update baseline if change is intentional
"Coverage below target"¶
Cause: Code added without sufficient tests
Check: Review coverage report for uncovered lines
Fix: Add unit tests to cover new code
References¶
- KPI Baselines
- Performance Baselines
- TEST-002: Test Suite Strategy and KPIs
- Metrics and Monitoring
- GitHub Actions Documentation
Maintained by: Chippr Robotics Engineering Team
Last Updated: November 16, 2025
Next Review: February 16, 2026 (Quarterly)