Cache Design Documentation

Status: ✅ IMPLEMENTED — Released in Zaojun 1.0.0, production since 1.0.3

Overview

Zaojun's caching system improves performance for repeated dependency checks by storing PyPI API responses locally. This document provides technical details about the cache implementation, design decisions, and usage patterns.

Architecture

High-Level Design

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Zaojun CLI    │────│   PyPICache     │────│   File System   │
│                 │    │                 │    │                 │
│  - check()      │    │  - get()        │    │  - JSON files   │
│  - get_latest_  │    │  - set()        │    │  - Atomic ops   │
│    pypi_version │    │  - clear()      │    │  - TTL cleanup  │
│                 │    │  - get_stats()  │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                         ┌───────────────┐
                         │   PyPI API    │
                         │   (fallback)  │
                         └───────────────┘

Cache Flow

  1. Cache-First Strategy: When checking a package, Zaojun first checks the cache
  2. TTL Validation: Cache entries are validated against their time-to-live
  3. Fallback to PyPI: If cache miss or expired, fetch from PyPI
  4. Cache Population: Successful PyPI responses are cached for subsequent use
  5. Statistics Tracking: All cache operations are tracked for monitoring

Implementation Details

PyPICache Class

The core cache implementation is in the PyPICache class with the following key methods:

__init__(cache_dir=None, ttl_hours=24, enabled=True)

  • cache_dir: Custom cache directory or platform-specific default
  • ttl_hours: Time-to-live for cache entries (default: 24 hours)
  • enabled: Master switch for cache functionality

get(package_name) -> dict | None

  1. Check if caching is enabled
  2. Generate cache key for package
  3. Check if cache file exists
  4. Validate TTL (remove if expired)
  5. Load and return data, or None if miss/expired

set(package_name, data) -> bool

  1. Check if caching is enabled
  2. Prepare cache entry with timestamp
  3. Write to temporary file
  4. Atomically rename to final location
  5. Return success/failure status
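
One way to implement the write-then-rename steps above is shown below. It is a sketch under assumptions: `cache_set` is a hypothetical free function that takes the target `path` directly, and the exact set of exceptions Zaojun catches is not documented here.

```python
import contextlib
import json
import os
import tempfile
import time
from pathlib import Path


def cache_set(path: Path, package_name: str, data: dict, enabled: bool = True) -> bool:
    """Write a cache entry to `path` atomically; return True on success."""
    if not enabled:
        return False
    entry = {"timestamp": time.time(), "package_name": package_name, "data": data}
    tmp_name = None
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        # The temp file is created in the target directory so the final
        # rename stays on one filesystem, which is what makes it atomic.
        with tempfile.NamedTemporaryFile(
            "w", dir=path.parent, suffix=".tmp", delete=False, encoding="utf-8"
        ) as tmp:
            tmp_name = tmp.name
            json.dump(entry, tmp)
        os.replace(tmp_name, path)         # atomic swap into place
        return True
    except (OSError, TypeError, ValueError):
        # Clean up the leftover temp file, ignoring any secondary failure.
        if tmp_name is not None:
            with contextlib.suppress(OSError):
                os.unlink(tmp_name)
        return False
```

`os.replace` overwrites an existing destination on both POSIX and Windows, so a failed write leaves the previous entry untouched.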

clear(package_name=None) -> int

  • package_name: Clear specific package or all packages
  • Returns number of entries cleared
  • Handles filesystem errors gracefully

get_stats() -> dict

Returns comprehensive statistics:

  • Hits, misses, expired entries, errors
  • Total entries and cache size
  • Cache configuration (TTL, directory, enabled status)

is_expired(package_name) -> bool

Quick check of whether a package's cache entry has expired, without loading the full cached data.

Cache Storage Format

File Location

Cache files are stored in platform-appropriate directories:

  • Linux/Unix: ~/.cache/zaojun/pypi_cache/ (respects XDG_CACHE_HOME)
  • macOS: ~/Library/Caches/zaojun/pypi_cache/
  • Windows: %LOCALAPPDATA%\zaojun\pypi_cache\
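
The locations listed above could be resolved with a helper along these lines. `default_cache_dir` is a hypothetical name, and the fallback used when LOCALAPPDATA is unset is an assumption rather than documented behaviour.

```python
import os
import sys
from pathlib import Path


def default_cache_dir() -> Path:
    """Pick a per-user cache directory for the current platform."""
    if sys.platform == "win32":
        # Assumed fallback if LOCALAPPDATA is not set.
        base = Path(os.environ.get("LOCALAPPDATA") or Path.home() / "AppData" / "Local")
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Caches"
    else:
        # XDG_CACHE_HOME wins when set and non-empty, else ~/.cache.
        base = Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")
    return base / "zaojun" / "pypi_cache"
```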

File Naming

Cache keys are generated using:

import hashlib

# Lowercase and map "_" to "-" so name variants share one cache file
normalized_name = package_name.lower().replace("_", "-")
key_hash = hashlib.md5(normalized_name.encode()).hexdigest()
cache_key = f"{normalized_name}_{key_hash[:8]}.json"

Example: requests_3298f130.json

JSON Structure

{
  "timestamp": 1678886400.123456,
  "package_name": "package-name",
  "data": {
    "version": "1.0.0",
    "package_info": {
      "summary": "Package description",
      "author": "Author Name",
      "home_page": "https://example.com",
      "requires_python": ">=3.11",
      "requires_dist": ["other-package>=1.0"]
    }
  }
}

Integration Points

Modified Functions

  1. get_latest_pypi_version(): Cache-first strategy
  2. check_dependency(): Accepts optional cache parameter
  3. process_dependencies(): Propagates cache to dependency checks
  4. check(): CLI entry point with cache options

CLI Options

  • --cache / --no-cache: Enable/disable caching (default: disabled)
  • --clear-cache: Clear all cache entries before checking
  • --cache-stats: Show cache statistics after checking

Design Decisions

1. TTL-Based Expiration

Decision: Use time-based expiration rather than version-based.

Rationale:

  • Simpler implementation
  • Predictable cache behavior
  • New PyPI releases are picked up automatically once an entry expires
  • Configurable based on user needs

2. JSON Storage Format

Decision: Use JSON files rather than binary or database.

Rationale:

  • Human readable for debugging
  • Easy to inspect and modify
  • Standard Python library support
  • Cross-platform compatibility

3. Atomic File Operations

Decision: Write to a temp file, then rename it atomically.

Rationale:

  • Prevents corrupted cache from partial writes
  • Readers never see a half-written file, because the rename is atomic at the filesystem level
  • Consistent cache state

4. Cache-Through Pattern

Decision: Cache on read miss (lazy population, often called cache-aside) rather than write-through.

Rationale:

  • Only cache what's actually needed
  • No wasted storage for unused packages
  • Natural cache warming during usage

5. Default Disabled

Decision: Cache disabled by default in CLI.

Rationale:

  • Backward compatibility with existing workflows
  • Users must opt in to the new behavior
  • Clear migration path for existing users

Performance Characteristics

Cache Effectiveness Metrics

hit_rate = hits / (hits + misses)
effectiveness = hits / total_requests
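
As a worked example of the hit-rate formula, the snippet below uses illustrative counts of 42 hits and 15 misses. The zero-lookup guard is an addition for safety, and whether expired entries are counted as misses is not specified here.

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from the cache; 0.0 when nothing was looked up."""
    total = hits + misses
    return hits / total if total else 0.0


rate = hit_rate(hits=42, misses=15)
print(f"{rate:.1%}")  # 42 / 57, i.e. 73.7%
```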

Expected Performance Gains

Scenario        Without Cache               With Cache                      Improvement
First run       Network latency             Network latency + cache write   0%
Repeated runs   Network latency each time   Filesystem read                 10-100x
Offline mode    Fails                       Returns cached data             Infinite

Memory Usage

  • Minimal memory footprint
  • Only current package data loaded
  • No in-memory cache beyond current operation

Disk Usage

  • Approximately 1-10KB per package
  • Automatic cleanup of expired entries
  • Configurable TTL controls growth

Error Handling

Cache-Specific Errors

  1. Corrupted Cache Files
     • Automatically detected and removed
     • Statistics track corruption count
     • No impact on functionality

  2. Filesystem Errors
     • Permission errors logged
     • Cache operations fail gracefully
     • Fallback to PyPI API

  3. TTL Calculation Errors
     • Default to cache miss
     • Conservative approach (treat as expired)

Integration Error Handling

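# Signature shown for illustration; CacheError and fetch_from_pypi
# are assumed to be defined elsewhere in the codebase.
def get_latest_pypi_version(package_name, cache):
    try:
        # Try cache first
        cached = cache.get(package_name)
        if cached:
            return cached["version"]
    except CacheError:
        # Log but continue to PyPI
        pass

    # Fallback to PyPI
    return fetch_from_pypi(package_name)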

Security Considerations

Cache Isolation

  • User-specific cache directories
  • No shared cache between users
  • Platform-appropriate permissions

Data Validation

  • JSON schema validation on read
  • Timestamp validation for TTL
  • Package name normalization

Hash Usage

  • MD5 used for filename generation only
  • Not for cryptographic security
  • Collision risk acceptable for this use case

Testing Strategy

Unit Tests

  • Cache hit/miss scenarios
  • TTL expiration testing
  • Error condition simulation
  • Filesystem interaction mocking

Integration Tests

  • End-to-end cache functionality
  • CLI option combinations
  • Performance measurement
  • Cross-platform compatibility

Test Coverage Areas

  1. Cache Operations: get, set, clear, stats
  2. TTL Logic: Expiration, cleanup, validation
  3. Error Handling: Corruption, permissions, network
  4. Integration: CLI options, API usage, fallback

Configuration Options

Runtime Configuration

from pathlib import Path

# Programmatic configuration
cache = PyPICache(
    cache_dir=Path("/custom/cache"),
    ttl_hours=48,
    enabled=True,
)

Environment Variables

  • XDG_CACHE_HOME: Custom cache directory (Unix)
  • ZAOJUN_CACHE_DIR: Override cache directory
  • ZAOJUN_CACHE_TTL: Custom TTL in hours

Future Configuration File

Planned support for ~/.config/zaojun/config.toml:

[cache]
enabled = true
ttl_hours = 24
directory = "~/.cache/zaojun/pypi_cache"
max_size_mb = 100

Monitoring and Maintenance

Cache Statistics

zaojun --cache-stats
# Output:
# Cache Statistics:
#   Enabled: True
#   Hits: 42
#   Misses: 15
#   Expired: 3
#   Errors: 0
#   Total entries: 27
#   Cache directory: /home/user/.cache/zaojun/pypi_cache

Maintenance Tasks

  1. Regular Clearing: zaojun --clear-cache
  2. Size Monitoring: Check total entries and disk usage
  3. Hit Rate Analysis: Monitor cache effectiveness
  4. TTL Adjustment: Based on update frequency needs

Health Checks

def check_cache_health(cache: PyPICache) -> bool:
    stats = cache.get_stats()
    # Guard against division by zero when no lookups have happened yet
    lookups = max(stats["hits"] + stats["misses"], 1)

    # Warn when fewer than 30% of lookups are served from the cache
    hit_rate = stats["hits"] / lookups
    if hit_rate < 0.3:
        print(f"Low cache hit rate: {hit_rate:.1%}")

    # Warn when more than 5% of lookups hit a cache error
    error_rate = stats["errors"] / lookups
    if error_rate > 0.05:
        print(f"High cache error rate: {error_rate:.1%}")

    # Healthy means a hit rate above 50% and an error rate below 10%
    return hit_rate > 0.5 and error_rate < 0.1

Future Enhancements

Planned Features

  1. Configurable TTL: Per-package or pattern-based TTL
  2. Cache Size Limits: Automatic pruning based on size or LRU
  3. Compression: Gzip compression for cache files
  4. Batch Operations: Bulk cache operations for efficiency
  5. Cache Invalidation: Manual invalidation triggers

Integration Improvements

  1. Multiple Index Support: Cache for different package indexes
  2. Offline Mode: Explicit offline operation with cache only
  3. Cache Sharing: Read-only shared cache for teams
  4. Cache Migration: Tools for cache maintenance and migration

Performance Optimizations

  1. Memory Cache Layer: LRU in-memory cache for hot packages
  2. Prefetching: Cache warming based on usage patterns
  3. Delta Updates: Store only changed package information
  4. Concurrent Access: Thread-safe cache operations

Migration and Compatibility

Backward Compatibility

  • Cache disabled by default
  • Existing workflows unchanged
  • Gradual opt-in adoption

Upgrade Path

  1. Phase 1: Cache implementation (1.0.0) - ✅ COMPLETED
  2. Phase 2: Cache enabled by default (future consideration)
  3. Phase 3: Advanced cache features (future consideration)

Deprecation Strategy

No deprecations in the current implementation. The cache system is fully functional and available for use. Future changes may include:

  • Migration tools for cache format changes
  • Compatibility layers for API changes
  • Clear upgrade instructions

Troubleshooting

Common Issues

Cache Not Working

  1. Check if --cache flag is used
  2. Verify cache directory permissions
  3. Check cache statistics for errors

Stale Cache Data

  1. Use --clear-cache to force refresh
  2. Check TTL configuration
  3. Verify system clock accuracy

Performance Issues

  1. Monitor cache hit rate
  2. Check filesystem performance
  3. Consider SSD vs HDD impact

Debugging Commands

# Check cache directory
ls -la ~/.cache/zaojun/pypi_cache/

# Inspect cache file
cat ~/.cache/zaojun/pypi_cache/requests_*.json | jq .

# Test cache functionality
zaojun --cache --cache-stats --clear-cache

Conclusion

The Zaojun caching system provides significant performance improvements for repeated dependency checks while maintaining backward compatibility and robust error handling. The design emphasizes simplicity, reliability, and user control, with clear paths for future enhancement.

Key strengths:

  • Performance: 10-100x faster repeated executions
  • Reliability: Graceful fallback to PyPI on cache issues
  • Control: User-configurable TTL and clear cache management
  • Monitoring: Comprehensive statistics for cache effectiveness

The cache implementation follows established patterns for local caching while addressing the specific needs of PyPI dependency checking, making Zaojun more efficient for development workflows and CI/CD pipelines.