Getting Started with CyVerse Deployment
This guide helps you deploy and manage CyVerse infrastructure. Choose the path that matches your role and objectives.
Prerequisites
Infrastructure Requirements
Hardware/Cloud:
- Access to bare metal hardware, an OpenStack cloud, or a commercial provider (AWS, Google Cloud, Microsoft Azure)
- Minimum recommended: 8-node Kubernetes cluster with 32 CPU cores and 128 GB RAM per node
- Persistent storage capability (NFS, iRODS, or cloud block storage)
- Public IP addresses and DNS management
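As a rough sketch of what the minimum recommendation above adds up to, the following Python snippet totals the capacity of an 8-node cluster and checks a node inventory against the per-node figures. The `Node` record and both helper functions are hypothetical, and the snippet assumes that "per node" applies to both the CPU and the RAM figure.

```python
from dataclasses import dataclass

# Minimums taken from the recommendation above; "per node" is assumed to
# apply to both the CPU and the RAM figure.
MIN_NODES = 8
MIN_CORES_PER_NODE = 32
MIN_RAM_GB_PER_NODE = 128


@dataclass
class Node:
    """Hypothetical inventory record for one cluster node."""
    name: str
    cpu_cores: int
    ram_gb: int


def meets_minimum(nodes: list) -> bool:
    """Return True if the inventory satisfies the recommended minimum."""
    if len(nodes) < MIN_NODES:
        return False
    return all(
        n.cpu_cores >= MIN_CORES_PER_NODE and n.ram_gb >= MIN_RAM_GB_PER_NODE
        for n in nodes
    )


def total_capacity(nodes: list) -> tuple:
    """Return (total CPU cores, total RAM in GB) across all nodes."""
    return sum(n.cpu_cores for n in nodes), sum(n.ram_gb for n in nodes)


# Eight identical nodes at exactly the recommended size.
reference_cluster = [Node(f"node-{i}", 32, 128) for i in range(1, 9)]
print(meets_minimum(reference_cluster))   # True
print(total_capacity(reference_cluster))  # (256, 1024)
```

At the recommended minimum the cluster therefore offers 256 CPU cores and 1024 GB of RAM in total.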
Skills & Knowledge:
- Advanced understanding of Linux system administration and file permissions
- Experience with Kubernetes (K8s) cluster management
- Familiarity with Docker containerization
- Understanding of Infrastructure as Code (IaC) principles
- Experience with Ansible for configuration management
- Networking knowledge (DNS, load balancing, ingress controllers)
Tools & Access:
- GitHub or GitLab with private repositories for sensitive credentials
- kubectl, helm, and other Kubernetes CLI tools
- Access via ssh to infrastructure nodes
- Web-enabled browser for administrative interfaces
Personal Qualities:
- Patience & Perseverance - complex distributed systems require methodical troubleshooting
Deployment Roadmap
Deploying CyVerse follows a structured sequence. Each component builds upon the previous layers:
graph TD
A[1. Kubernetes Cluster] --> B[2. Core Infrastructure]
B --> C[3. Core Services]
C --> D[4. Application Services]
D --> E[5. Databases]
E --> F[6. Discovery Environment]
B --> B1[Storage: OpenEBS]
B --> B2[Networking: Ingress NGINX]
B --> B3[iRODS CSI Driver]
C --> C1[KeyCloak: Authentication]
C --> C2[RabbitMQ: Message Queue]
C --> C3[Redis: Cache]
C --> C4[ElasticSearch: Search]
D --> D1[User Portal]
D --> D2[VICE]
D --> D3[Exim4: Mail]
E --> E1[PostgreSQL Databases]
E --> E2[Grouper]
E --> E3[Unleash]
Component Dependencies
| Component | Depends On | Purpose |
|---|---|---|
| Kubernetes Cluster | Hardware/Cloud | Container orchestration foundation |
| OpenEBS | Kubernetes | Persistent volume management |
| Ingress NGINX | Kubernetes | External traffic routing |
| KeyCloak | Kubernetes, PostgreSQL | User authentication and authorization |
| RabbitMQ | Kubernetes | Message passing between services |
| Redis HA | Kubernetes | Caching and session management |
| ElasticSearch | Kubernetes | Full-text search capabilities |
| Discovery Environment | All core services | Main user-facing application |
| VICE | DE, Kubernetes | Interactive computing environments |
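One way to turn the dependency table above into a deployment order is a topological sort. The sketch below uses Python's standard `graphlib` module and is only an illustration: it assumes that "Kubernetes" and "DE" are shorthand for the Kubernetes Cluster and Discovery Environment rows, that "All core services" means KeyCloak, RabbitMQ, Redis HA, and ElasticSearch, and that PostgreSQL (which has no row of its own) runs on the Kubernetes cluster.

```python
from graphlib import TopologicalSorter

# Assumption: these four are what the table calls "All core services".
CORE_SERVICES = {"KeyCloak", "RabbitMQ", "Redis HA", "ElasticSearch"}

# Each component maps to the set of components it depends on.
DEPENDENCIES = {
    "Kubernetes Cluster": {"Hardware/Cloud"},
    "OpenEBS": {"Kubernetes Cluster"},
    "Ingress NGINX": {"Kubernetes Cluster"},
    "PostgreSQL": {"Kubernetes Cluster"},  # assumed; not a row in the table
    "KeyCloak": {"Kubernetes Cluster", "PostgreSQL"},
    "RabbitMQ": {"Kubernetes Cluster"},
    "Redis HA": {"Kubernetes Cluster"},
    "ElasticSearch": {"Kubernetes Cluster"},
    "Discovery Environment": set(CORE_SERVICES),
    "VICE": {"Discovery Environment", "Kubernetes Cluster"},
}


def deployment_order(dependencies: dict) -> list:
    """Return one valid order in which every dependency comes first."""
    return list(TopologicalSorter(dependencies).static_order())


order = deployment_order(DEPENDENCIES)
print(order)
```

Several orders satisfy the same constraints, so the exact output may differ between runs; what matters is that each component appears after everything it depends on. `TopologicalSorter` raises `CycleError` if the table ever contains a circular dependency.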
Deployment Phases
Phase 1: Foundation (Weeks 1-2)
- Set up Kubernetes cluster
- Deploy storage (OpenEBS) and networking (Ingress NGINX)
- Configure iRODS CSI driver for data storage
- Verify cluster health and resource allocation
Phase 2: Core Services (Week 3)
- Deploy and configure KeyCloak for authentication
- Set up RabbitMQ message broker
- Deploy Redis HA for caching
- Configure ElasticSearch for search functionality
- Deploy monitoring (Jaeger)
Phase 3: Databases (Week 4)
- Provision PostgreSQL databases for all services
- Initialize database schemas
- Configure backup and recovery procedures
- Set up Grouper for group management
- Deploy Unleash for feature flags
Phase 4: Applications (Weeks 5-6)
- Deploy User Portal
- Configure Discovery Environment services
- Set up VICE for interactive applications
- Deploy mail services (Exim4)
- Configure external integrations
Phase 5: Verification & Tuning (Week 7)
- End-to-end testing
- Performance optimization
- Security hardening
- Documentation of deployment specifics
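For planning purposes, the week ranges above can be mapped onto calendar dates. The following is a minimal sketch: the `phase_dates` helper and the kickoff date are hypothetical, and it assumes that week 1 starts on the kickoff date and that every week is seven days long.

```python
from datetime import date, timedelta

# (phase name, first week, last week) as listed in the phases above.
PHASES = [
    ("Foundation", 1, 2),
    ("Core Services", 3, 3),
    ("Databases", 4, 4),
    ("Applications", 5, 6),
    ("Verification & Tuning", 7, 7),
]


def phase_dates(start: date) -> list:
    """Map each phase to its first and last calendar day.

    Week 1 begins on `start`; every week is seven days long.
    """
    schedule = []
    for name, first_week, last_week in PHASES:
        begins = start + timedelta(weeks=first_week - 1)
        ends = start + timedelta(weeks=last_week) - timedelta(days=1)
        schedule.append((name, begins, ends))
    return schedule


# Hypothetical kickoff date, chosen only for illustration.
for name, begins, ends in phase_dates(date(2025, 1, 6)):
    print(f"{name}: {begins} to {ends}")
```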
Quick Start by Role
For DevOps Engineers
Goal: Deploy CyVerse infrastructure from scratch
1. Prepare Your Environment
Start with the setup guides to configure your deployment tools:
- Ansible Setup - Configure Ansible for infrastructure automation
- Docker Setup - Set up Docker and container registry access
- Database Setup - Prepare PostgreSQL deployment tools
2. Deploy Core Infrastructure
Follow the deployment sequence:
- Kubernetes Cluster - Deploy and configure your K8s cluster
- Kubernetes Resources - Set up namespaces, resource quotas, and RBAC
- Storage (OpenEBS) - Deploy persistent volume management
- iRODS CSI Driver - Connect to iRODS data storage
- Networking - Configure ingress for external access
3. Deploy Services
With infrastructure in place, deploy the service layer:
- KeyCloak - Authentication and identity management
- RabbitMQ - Message broker for service communication
- Redis HA - High-availability caching
- ElasticSearch - Search engine
- Databases - PostgreSQL for all services
4. Deploy Applications
Finally, deploy the user-facing applications:
- Discovery Environment - Main DE platform
- User Portal - User account management interface
- VICE - Interactive computing
5. Verify Deployment
- Check all pods are running: kubectl get pods --all-namespaces
- Verify services are accessible through ingress
- Test authentication flow through KeyCloak
- Review logs for errors: kubectl logs -n <namespace> <pod-name>
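The pod check above can also be scripted. The sketch below asks kubectl for JSON output (`-o json`) and lists every pod whose phase is neither `Running` nor `Succeeded`. The helper names and the sample pod names are made up for illustration, and `fetch_pod_list` is defined but not called here because it needs access to a live cluster.

```python
import json
import subprocess

# Pod phases that need no attention; anything else is reported.
HEALTHY_PHASES = {"Running", "Succeeded"}


def fetch_pod_list() -> dict:
    """Run kubectl and return the parsed pod list (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def unhealthy_pods(pod_list: dict) -> list:
    """Return (namespace, name, phase) for every pod not in a healthy phase."""
    problems = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            meta = pod["metadata"]
            problems.append((meta["namespace"], meta["name"], phase))
    return problems


# Hand-written sample in the shape kubectl emits; the names are made up.
sample = {
    "items": [
        {"metadata": {"namespace": "prod", "name": "terrain-abc"},
         "status": {"phase": "Running"}},
        {"metadata": {"namespace": "prod", "name": "apps-xyz"},
         "status": {"phase": "Pending"}},
    ]
}
print(unhealthy_pods(sample))  # [('prod', 'apps-xyz', 'Pending')]
```

Against a real cluster, call `unhealthy_pods(fetch_pod_list())`. A pod in the `Running` phase can still have containers that are not ready, so treat this as a first pass and follow up with kubectl logs for anything suspicious.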
Next Steps:
- Review DevOps operational procedures
- Set up monitoring and alerting
- Configure backup procedures for databases
- Plan disaster recovery procedures
For System Administrators
Goal: Manage users, apps, and daily operations
1. Understand the Platform
Before managing CyVerse, familiarize yourself with the architecture:
- System Overview - How CyVerse components work together
- Discovery Environment - The main user-facing platform
- Data Store - iRODS-based data management
2. Learn Administrative Tools
Review the admin guides for operational procedures:
- DE Administration - User management, app publishing, VICE access grants
- Data Store Administration - Data permissions, storage management
- User Portal Administration - Account management
3. Common Administrative Tasks
User Management:
- Grant VICE access to qualified users
- Manage user quotas and resource limits
- Process user support requests
App Publishing:
- Review and approve tool integration requests
- Publish containerized apps to the Discovery Environment
- Test and validate app functionality
Data Management:
- Process Permanent ID/DOI requests for data publishing
- Manage data sharing permissions
- Monitor storage usage and quotas
4. Resources
- FAQ - Common questions and troubleshooting
- Permanent ID Requests - DOI workflow documentation
- Terrain API - Understanding the backend API
Next Steps:
- Bookmark frequently used admin interfaces
- Join CyVerse staff communication channels
- Review common user support scenarios
- Familiarize yourself with escalation procedures
For Application Developers
Goal: Integrate with CyVerse APIs or contribute to the platform
1. Understand the API Architecture
Start with API fundamentals:
- API Overview - Introduction to Terrain API
- API Endpoint Index - Complete endpoint reference
- Developer Guide - Development environment setup
2. Authentication & Access
Learn how to authenticate with CyVerse services:
- Authentication (KeyCloak) - OAuth 2.0 flow and token management
- Error Handling - API error codes and responses
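A common piece of token management is checking whether an access token is about to expire before reusing it. The sketch below assumes the access token is a JSON Web Token (JWT) carrying the standard `exp` claim, which is how KeyCloak issues access tokens by default. The helper names and the sample token are hypothetical, and the code only reads the payload: it does not verify the signature, so it must not be used to decide whether a token is trustworthy.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    try:
        payload_b64 = token.split(".")[1]
    except IndexError:
        raise ValueError("not a JWT: missing payload segment") from None
    # base64url payloads omit padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def is_expired(token: str, now: Optional[float] = None, leeway: int = 30) -> bool:
    """Return True if the `exp` claim is at most `leeway` seconds away."""
    current = time.time() if now is None else now
    return jwt_claims(token)["exp"] <= current + leeway


# Unsigned sample token built for demonstration; real tokens come from KeyCloak.
sample_payload = (
    base64.urlsafe_b64encode(json.dumps({"exp": 1000}).encode())
    .rstrip(b"=")
    .decode()
)
sample_token = f"header.{sample_payload}.signature"
print(is_expired(sample_token, now=2000))  # True
print(is_expired(sample_token, now=500))   # False
```

The `leeway` margin makes a client refresh slightly early, so a token does not expire while a request is in flight.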
3. Common Integration Patterns
Data Operations:
- Filesystem API - Browse and manage data in the Data Store
- File I/O - Upload and download files
- Metadata - Attach metadata to data objects
Compute Operations:
- App Metadata - Query available analysis tools
- Job Submission - Launch computational analyses
- Callbacks - Receive job status updates
User Interactions:
- Notifications - Send messages to users
- Comments - Enable collaborative annotations
- Favorites - Manage user bookmarks
4. Contributing to CyVerse
If you're contributing code to CyVerse:
- Review the Developer Guide for contribution workflow
- Browse the CyVerse-DE GitHub organization
- Test against the live Terrain API
Migration Guides:
- Tapis v2 to v3 Migration - Upgrade from legacy Tapis APIs
Next Steps:
- Set up a CyVerse development account
- Review API rate limits and usage policies
- Explore example integrations in GitHub
- Join developer community channels
Installation Tools
The DevOps Guide provides a complete list of required software for managing a CyVerse deployment, including:
- Kubernetes CLI tools (kubectl, helm)
- Ansible for configuration management
- Docker for container operations
- Database administration tools
- Monitoring and logging tools
Technology Stack Overview
Authentication
CyVerse authentication relies on:
- LDAP - Directory services for user accounts
- OAuth 2.0 protocol - Standard framework for delegated authorization and token-based access
- CILogon - Federated identity for research institutions
See KeyCloak for authentication deployment details.
Security Considerations
Experience operating in a Science DMZ network architecture is beneficial for deploying CyVerse on university infrastructure. Key security topics include:
- Firewall rules for high-performance data transfer
- Network segmentation between public and private services
- TLS certificate management for secure communications
- Secrets management for API keys and credentials
Core APIs
Terrain API is the backbone service aggregating all Discovery Environment functionality. It provides a unified RESTful interface for:
- Data management (Data Store operations)
- App execution (job submission and monitoring)
- User services (preferences, notifications, favorites)
- Administrative functions (user management, resource allocation)
Detailed endpoint documentation is available in the API endpoints section.
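As an illustration of calling a RESTful interface like the one described above, the sketch below builds an authenticated GET request with Python's standard library without sending it. The base URL and the endpoint path are placeholders rather than real Terrain routes, and passing the token in an `Authorization: Bearer` header is an assumption based on common OAuth 2.0 practice. Check the endpoint documentation for the actual paths and authentication requirements.

```python
import urllib.request


def build_request(base_url: str, path: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying a bearer token."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Placeholder values: substitute your deployment's Terrain URL, a path from
# the endpoint documentation, and a token issued by KeyCloak.
request = build_request("https://terrain.example.org", "/hypothetical/endpoint", "TOKEN")
print(request.full_url)
print(request.get_header("Authorization"))
```

To send the request, pass it to `urllib.request.urlopen(request)` and parse the response body as JSON.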
Platform Components
User-Facing Products:
- Discovery Environment - Web-based data science workbench with 1000+ pre-integrated tools
- Data Store - 6+ PB iRODS-based storage with data management, hosting, and sharing
- Data Commons - Community data repository with DataCite DOI publishing
- VICE - Visual Interactive Computing Environment (Jupyter, RStudio, Shiny apps)
- BisQue - Browser-based large image analysis platform
- DNA Subway - Educational genomics software for students
Backend Services:
- Core Services - Microservices architecture overview
- Authentication (KeyCloak) - Identity and access management
- Cloud Services (CACAO) - Multi-cloud automation and orchestration
Deployment Platform
All CyVerse services are deployed on Kubernetes (K8s) using:
- Helm charts for package management
- Ansible playbooks for configuration
- GitOps practices for infrastructure as code
- Namespace isolation for service boundaries
See the complete Deployments overview for details.
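To make the namespace isolation idea concrete, the following sketch generates Kubernetes Namespace manifests as JSON, which kubectl accepts alongside YAML. The namespace names and the label are invented for illustration and are not the names a CyVerse deployment uses.

```python
import json
from typing import Optional


def namespace_manifest(name: str, labels: Optional[dict] = None) -> dict:
    """Build a Kubernetes Namespace manifest as a plain dictionary."""
    metadata = {"name": name}
    if labels:
        metadata["labels"] = dict(labels)
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": metadata}


# Hypothetical namespace names; a real deployment defines its own.
manifests = [
    namespace_manifest(name, {"managed-by": "example"})
    for name in ("core-services", "applications")
]

# Wrap the manifests in a List object so they can be applied in one step.
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`. In a GitOps workflow the manifests would normally live in version control instead of being generated on the fly.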
Database Infrastructure
CyVerse uses PostgreSQL as its primary database platform. Each service maintains its own database for isolation and independent scaling:
- DE Database - Discovery Environment core data
- Metadata Database - User-defined metadata
- Notifications Database - User notification system
- KeyCloak Database - Authentication data
- And more...
Next Steps
After completing your getting started tasks:
- Verify Deployment - Test core functionality end-to-end
- Review Troubleshooting Resources - Familiarize yourself with common issues in the FAQ
- Set Up Monitoring - Implement logging and alerting for production operations
- Join the Community - Connect with other CyVerse operators and developers via GitHub
- Plan for Scale - Review performance tuning and capacity planning for your workload
Need Help?
- Check the FAQ for common questions
- Review deployment-specific documentation in the Deployment Guide
- Explore the API Reference for integration details
- Consult the Developer Guide for contribution workflow