Features
HySDS (Hybrid Cloud Science Data System) is an open-source science data processing system used by multiple large-scale Earth Science missions and by other data production and analysis systems.
Key Capabilities
Scalability and Performance
- Demonstrated data processing scalability to over 8,000 parallel compute nodes
- Supports processing rates of over 3 million jobs per day
- Processes more than 300 TB/day for the NISAR mission
- Processed 2 PB in the first year of the SWOT mission
- Supports both forward processing and bulk reprocessing campaigns
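Daily throughput figures are easier to compare with queue, network, and storage limits when expressed as average rates. The sketch below only converts the numbers quoted in this list. It assumes decimal units (1 TB = 10^12 bytes), a 365-day "first year" for SWOT, and a perfectly uniform rate, which real workloads do not have.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(amount_per_day: float) -> float:
    """Convert a per-day quantity to an average per-second rate."""
    return amount_per_day / SECONDS_PER_DAY


# Figures quoted in the list above (decimal units assumed).
jobs_per_sec = per_second(3_000_000)               # 3 million jobs/day
nisar_gb_per_sec = per_second(300e12) / 1e9        # 300 TB/day
swot_tb_per_day = 2e15 / 365 / 1e12                # 2 PB over an assumed 365 days

if __name__ == "__main__":
    print(f"{jobs_per_sec:.1f} jobs/s")            # 34.7 jobs/s
    print(f"{nisar_gb_per_sec:.2f} GB/s")          # 3.47 GB/s
    print(f"{swot_tb_per_day:.2f} TB/day average") # 5.48 TB/day average
```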
Deployment Flexibility
- Hybrid cloud deployment across:
  - AWS cloud infrastructure
  - On-premises systems
  - NASA High-End Computing Capability (HECC)
- Multi-cloud support spanning:
  - Amazon Web Services (AWS)
  - Google Cloud Platform (GCP)
  - Microsoft Azure
  - NASA HECC
Cost Optimization
- Fault-tolerant operation on the AWS spot market for cost savings
- Cost-production modeling capabilities for:
  - Cost estimation
  - Resource sizing
  - Processing rates
  - Data volume projections
- Real-time cost analysis and optimization during operations
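A cost-production model relates job count, per-job runtime, a processing deadline, and instance pricing to node counts and cost. The following is a minimal sketch under stated assumptions, not the HySDS cost model: each job is assumed to occupy one node for a fixed time, `estimate` and `CostEstimate` are hypothetical names, and the prices and spot discount are placeholder inputs rather than real AWS rates.

```python
import math
from dataclasses import dataclass


@dataclass
class CostEstimate:
    node_hours: float   # total compute time across all jobs
    nodes_needed: int   # concurrent nodes required to meet the deadline
    cost_usd: float     # compute cost only; storage and egress are ignored


def estimate(jobs: int, minutes_per_job: float, deadline_hours: float,
             on_demand_usd_per_hour: float, spot_discount: float = 0.0) -> CostEstimate:
    """Hypothetical model: each job runs alone on one node for a fixed time.

    spot_discount is the fractional saving versus on-demand (0.7 = 70% cheaper).
    Reruns caused by spot interruptions are not modeled.
    """
    if deadline_hours <= 0:
        raise ValueError("deadline_hours must be positive")
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    node_hours = jobs * minutes_per_job / 60.0
    # Round up: a fractional node still has to be provisioned as a whole node.
    nodes_needed = math.ceil(node_hours / deadline_hours)
    hourly_price = on_demand_usd_per_hour * (1.0 - spot_discount)
    return CostEstimate(node_hours, nodes_needed, node_hours * hourly_price)


if __name__ == "__main__":
    # Placeholder inputs, not measured HySDS or AWS figures.
    e = estimate(jobs=100_000, minutes_per_job=30, deadline_hours=24,
                 on_demand_usd_per_hour=1.0, spot_discount=0.7)
    print(e.node_hours, e.nodes_needed, round(e.cost_usd, 2))  # 50000.0 2084 15000.0
```

Because spot interruptions force some jobs to rerun, a real estimate would add an overhead factor to `node_hours`; it is left out here to keep the arithmetic visible.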
Core Components
- GRQ (Geo Region Query):
  - Geospatial data catalog and management
  - Faceted search capabilities
  - Production rules evaluation and triggering
- Mozart:
  - Job management and scheduling
  - Queue management
  - Auto-scaling control
- Metrics:
  - Real-time job metrics
  - Worker metrics and analytics
- Factotum:
  - "Hot" helper workers for low-latency processes
- Verdi Workers:
  - Distributed compute nodes
  - Scales processing based on workload
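Auto-scaling control and workload-driven scaling of workers can be pictured as a function from queue backlog to a target worker count. The sketch below is a generic backlog-based policy assumed for illustration; `desired_workers` and its parameters (slots per worker, minimum and maximum fleet size) are hypothetical and do not describe the actual HySDS scaling configuration.

```python
import math


def desired_workers(queued_jobs: int, running_jobs: int, jobs_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Hypothetical backlog-based scaling policy.

    Sizes the fleet so every queued or running job gets a worker slot,
    then clamps the result to the range [min_workers, max_workers].
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    backlog = max(queued_jobs, 0) + max(running_jobs, 0)
    target = math.ceil(backlog / jobs_per_worker)
    return max(min_workers, min(target, max_workers))


if __name__ == "__main__":
    print(desired_workers(1_000, 200, jobs_per_worker=4, min_workers=0, max_workers=8_000))  # 300
    print(desired_workers(0, 0, jobs_per_worker=4, min_workers=2, max_workers=100))         # 2
```

Clamping to a minimum keeps some warm capacity for new work, while the maximum caps spend when a large reprocessing campaign floods the queue.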
Data Management
- Supports rolling storage of data products
- Integrated with external data sources
- Publish/subscribe capabilities
- Data staging and caching mechanisms
- Support for cloud-optimized data formats
- Integration with DAACs and archives
Automation & Orchestration
- Automated trigger rules based on:
  - Data ingestion
  - Job state changes
  - Time-based scheduling
- Workflow orchestration
- Production rule evaluation and actions
- Auto-scaling based on queue backlogs
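A production (trigger) rule pairs a condition on incoming dataset metadata with a job to submit when that condition holds. The sketch below shows the idea with plain dictionaries and equality-only matching; the rule fields (`match`, `job_type`, `enabled`), the metadata keys, and the `evaluate_rules` function are all hypothetical, and the real HySDS rule format is not modeled here.

```python
from typing import Any


def matches(metadata: dict[str, Any], conditions: dict[str, Any]) -> bool:
    """True if every condition key is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value
               for key, value in conditions.items())


def evaluate_rules(metadata: dict[str, Any], rules: list[dict[str, Any]]) -> list[str]:
    """Return the job types of all enabled rules whose conditions match."""
    return [
        rule["job_type"]
        for rule in rules
        if rule.get("enabled", True) and matches(metadata, rule["match"])
    ]


if __name__ == "__main__":
    # Hypothetical rules and metadata, for illustration only.
    rules = [
        {"match": {"dataset_type": "L1_SLC"}, "job_type": "job-l2-processing"},
        {"match": {"dataset_type": "L1_SLC", "track": 42}, "job_type": "job-qa", "enabled": False},
    ]
    ingested = {"dataset_type": "L1_SLC", "track": 42}
    print(evaluate_rules(ingested, rules))  # ['job-l2-processing']
```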
Development & Integration
- Open-source, community-driven development
- Support for multiple programming languages
- Container-based deployment (Docker/Podman)
- CI/CD integration capabilities
- API-driven architecture
Monitoring & Operations
- Real-time faceted analytics
- Operational dashboards
- Job status monitoring
- Resource utilization tracking
- Cost tracking and analysis
- Performance metrics
Security Features
- Support for multiple authentication methods
- Integration with NASA authentication systems
- VPC and firewall management
- Secure data access controls
Mission Support
Currently used by major NASA Earth Science missions, including:
- NISAR
- SWOT
- SMAP
- SNWG OPERA
- OCO-2/3 reprocessing
Project Integration
Successfully integrated into:
- 13 active projects as of 2024
- 33 total NASA-funded projects to date
- Multiple research and analysis projects
- Data production systems
- Analysis platforms
Standards & Interoperability
- OGC Application Package support
- WPS-T interface capabilities
- STAC catalog integration
- Support for standard data formats
- Integration with common Earth science tools and services
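STAC integration means a data product can be described as a STAC Item: a GeoJSON Feature with a small set of required fields (`type`, `stac_version`, `id`, `geometry`, `bbox`, `properties.datetime`, `links`, `assets`). The sketch below builds a minimal Item following the STAC 1.0.0 Item specification. The product ID, footprint, and asset URL are placeholders, and how HySDS maps its own product metadata to STAC is not shown here.

```python
import json


def bbox_of(ring: list[list[float]]) -> list[float]:
    """Bounding box [min_lon, min_lat, max_lon, max_lat] of a polygon ring."""
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    return [min(lons), min(lats), max(lons), max(lats)]


def make_stac_item(item_id: str, ring: list[list[float]],
                   datetime_utc: str, asset_href: str) -> dict:
    """Build a minimal STAC 1.0.0 Item with a single cloud-optimized GeoTIFF asset."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "bbox": bbox_of(ring),
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
                "roles": ["data"],
            }
        },
    }


if __name__ == "__main__":
    # Placeholder footprint (closed ring, lon/lat order) and asset location.
    ring = [[-120.0, 34.0], [-119.0, 34.0], [-119.0, 35.0], [-120.0, 35.0], [-120.0, 34.0]]
    item = make_stac_item("example-product-001", ring, "2024-01-01T00:00:00Z",
                          "s3://example-bucket/example-product-001.tif")
    print(json.dumps(item, indent=2))
```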
Community & Support
- Active open-source community
- 50+ developers/integrators
- 30+ contributors
- 78 GitHub repositories
- Regular community coordination meetings
- Public documentation and support resources