Community
HySDS (Hybrid Cloud Science Data Processing System) is an open-source science data processing system used by many large-scale Earth science missions and by data production and analysis systems. This page describes the HySDS community, its contributions, and its resources.
Community Overview
The HySDS community consists of:
- 50+ developers who have used HySDS for science data processing
- 30+ contributors to the HySDS Core
- 78 repositories in the organization
- 5 major versions in its evolution
- 83 releases (as of April 2024)
- 13 active projects using HySDS in 2024
- 33 total NASA-funded projects to date
Community Collaboration
The HySDS community follows these collaborative practices:
- Regular Coordination: Bi-weekly multi-mission coordination meetings across development and operations teams
- Shared Benefits: New features, bug fixes, and operational procedures contributed by any project benefit all other projects
- Active Communication: Public GitHub, JIRA, Confluence wiki, and Slack channels enable multi-project sharing and collaboration
- Open Source Approach: Community software contributions drive multi-mission benefits
Active Projects
HySDS is currently being used by several major NASA Earth Science missions and projects:
Missions
- NISAR
- SWOT
- SMAP with HySDS (SWH)
- SNWG OPERA
- OCO-2 reprocessing
- OCO-3 reprocessing
Projects
- Multi-Mission Algorithm and Analysis Platform (MAAP)
- PO.DAAC SWODLR
- ASTER AVA
- ImgSPEC
- SISTER-SBG
- And more
Key Innovations
The HySDS community has driven several significant innovations:
- First NASA Science Data System (SDS) to demonstrate data production scalability to over 8,000 parallel compute nodes in the cloud
- First SDS to run data production on lower-cost, volatile compute in the AWS spot market
- Enabled the first NASA EO Science Data Systems to run operations in the cloud
- First SDS to run data production spanning AWS, on-premises systems, and NASA HECC supercomputing
- First SDS to span AWS, GCP, Azure, and NASA HECC
- First SDS to integrate real-time faceted analytics with operations
Community Resources
Code & Documentation
- Source code: https://github.com/hysds/
- Releases: https://github.com/hysds/hysds-framework/releases
- Community wiki: https://hysds-core.atlassian.net/
- Issue tracking: https://hysds-core.atlassian.net/jira/software/c/projects/HC/issues
Communication Channels
Active Slack channels:
- #hysds-community
- #hysds-developers
- #hysds-intern
- #hysds-general
- #hysds-sa
Getting Involved
The HySDS community welcomes new contributors and users. Here are ways to get involved:
- Join the bi-weekly multi-mission coordination meetings
- Participate in community Slack channels
- Contribute code via GitHub
- Report issues and feature requests in JIRA
- Share operational procedures and best practices
- Collaborate on new features and capabilities
Project Highlights
Some notable achievements from the HySDS community:
- NISAR uses HySDS to support processing of more than 300 TB/day
- SWOT used HySDS to process 2 PB in its first year
- Data processing scalability to over 8,000 parallel nodes at more than 3 million processing jobs per day
- Fault-tolerant operations in the low-cost, volatile AWS spot market
- Support for ML, GPU, and multi-core processing at large scale
- Hybrid processing across AWS, HECC, and on-premises systems
- Low-latency urgent response and on-demand processing
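For a rough sense of scale, the throughput figures above can be converted to per-second and per-node rates. The sketch below is back-of-the-envelope arithmetic only. It assumes decimal units (1 TB = 10^12 bytes), treats the "more than" figures as lower bounds, and assumes the 3 million daily jobs were spread evenly over 8,000 nodes, which the figures above do not state. All variable names are illustrative.

```python
# Back-of-the-envelope conversion of the highlight figures into rates.
# Assumptions (not stated in the source): decimal units, figures treated
# as lower bounds, and jobs spread evenly across nodes.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a per-day quantity into a per-second rate."""
    return per_day / SECONDS_PER_DAY


# NISAR: more than 300 TB/day, expressed as sustained GB/s.
nisar_bytes_per_day = 300e12
nisar_gb_per_s = per_second(nisar_bytes_per_day) / 1e9

# Scalability highlight: more than 3 million jobs/day on over 8,000 nodes.
jobs_per_day = 3_000_000
nodes = 8_000
jobs_per_second = per_second(jobs_per_day)
jobs_per_node_per_day = jobs_per_day / nodes

print(f"NISAR sustained throughput: {nisar_gb_per_s:.2f} GB/s")  # about 3.47
print(f"System-wide job rate: {jobs_per_second:.1f} jobs/s")     # about 34.7
print(f"Jobs per node per day: {jobs_per_node_per_day:.0f}")     # 375
```

Under these assumptions, 300 TB/day corresponds to a sustained rate of roughly 3.5 GB/s, and 3 million jobs per day on 8,000 nodes averages 375 jobs per node per day.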
Citation
When referencing HySDS in publications, please cite DOI 10.5281/zenodo.11118142.
Copyright Notice
Copyright 2024, by the California Institute of Technology. ALL RIGHTS RESERVED. United States Government Sponsorship acknowledged. Any commercial use must be negotiated with the Office of Technology Transfer at the California Institute of Technology.
Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.