Frequently Asked Questions (FAQ)

Common questions about deploying, administering, and developing with CyVerse infrastructure.


Deployment & Installation

How do I deploy CyVerse on my infrastructure?

Start with the Getting Started Guide to review prerequisites, then follow the Deployment Guide for step-by-step deployment instructions. The deployment follows this sequence:

  1. Deploy Kubernetes cluster
  2. Set up storage (OpenEBS) and networking (Ingress NGINX)
  3. Deploy core services (KeyCloak, RabbitMQ, Redis, ElasticSearch)
  4. Provision databases
  5. Deploy application services (Discovery Environment, User Portal, VICE)

See the deployment roadmap for a detailed phase breakdown.
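
A rough command-line sketch of phases 2-3, assuming Helm-based installs (the chart references below are placeholders; use the charts named in the Deployment Guide):

# Phase 2: storage and networking
helm install openebs <openebs-chart> -n openebs --create-namespace
helm install ingress-nginx <ingress-nginx-chart> -n ingress-nginx --create-namespace

# Phase 3: core services, one namespace each
helm install keycloak <keycloak-chart> -n keycloak --create-namespace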

What are the minimum hardware requirements?

Minimum recommended configuration:

  • 8-node Kubernetes cluster
  • 32 CPU cores per node (256 cores total)
  • 128GB RAM per node (1TB total)
  • 10TB+ persistent storage (NFS, iRODS, or cloud block storage)
  • 10 Gbps network connectivity
  • Public IP addresses for ingress services

For smaller deployments (testing/development):

  • 3-node cluster
  • 16 cores per node
  • 64GB RAM per node
  • 1TB storage

See Prerequisites for complete requirements.

Can I deploy CyVerse on commercial cloud providers (AWS, GCP, Azure)?

Yes! CyVerse can be deployed on:

  • AWS - Use EKS for Kubernetes, EBS for storage
  • Google Cloud - Use GKE for Kubernetes, Persistent Disks for storage
  • Azure - Use AKS for Kubernetes, Azure Disks for storage
  • OpenStack - CyVerse's primary cloud platform

You'll need to adapt storage provisioning and load balancer configurations for your cloud provider. See the Kubernetes deployment guide.
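
As a concrete illustration for AWS, a suitably sized EKS cluster can be created with eksctl before following the rest of the guide (a sketch; the cluster name and region are placeholders, and the node type mirrors the minimum per-node specs above):

# 8 nodes at 32 vCPU / 128GB RAM each, per the minimum requirements
eksctl create cluster \
  --name cyverse \
  --region us-east-1 \
  --nodes 8 \
  --node-type m5.8xlarge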

What Kubernetes distribution should I use?

CyVerse is tested with:

  • Upstream Kubernetes (1.23+) - Recommended for most deployments
  • OpenShift - Enterprise Kubernetes with additional security features
  • Rancher - Kubernetes management platform
  • Cloud-managed Kubernetes - EKS, GKE, AKS

Use the latest stable Kubernetes version supported by your distribution.

How long does a full CyVerse deployment take?

Typical timeline (experienced DevOps team):

  • Weeks 1-2: Infrastructure setup (Kubernetes, storage, networking)
  • Week 3: Core services deployment (KeyCloak, RabbitMQ, Redis, ElasticSearch)
  • Week 4: Database provisioning and configuration
  • Weeks 5-6: Application deployment (DE, User Portal, VICE)
  • Week 7: Testing, tuning, and documentation

Total: 7-8 weeks for initial production deployment.

Development/testing environments can be deployed faster (2-3 weeks).

Do I need to deploy all CyVerse services?

No! CyVerse is modular. You can deploy:

  • Core only: Discovery Environment for computational workflows
  • Data Store only: iRODS-based data management
  • Custom subset: Select services for your use case

All services require Kubernetes, PostgreSQL databases, and KeyCloak authentication.

How do I update/upgrade CyVerse services?

Updates are managed through:

  1. Helm chart updates - Pull latest charts from repository
  2. Container image updates - Reference new image tags in deployments
  3. Database migrations - Run migration scripts before deploying new versions
  4. Rolling updates - Kubernetes handles zero-downtime deployments

Always test updates in a staging environment first. See deployment documentation.
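
A typical cycle, sketched with placeholder release and chart names (run any required database migrations before upgrading):

# Pull the latest charts and upgrade in place
helm repo update
helm upgrade <release> <chart> -n de -f values.yaml

# Watch Kubernetes perform the rolling update
kubectl rollout status deployment/<service> -n de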


Administration

How do I grant VICE access to a user?

VICE (Visual Interactive Computing Environment) access is granted through the Discovery Environment admin interface:

  1. Log into the Discovery Environment as an administrator
  2. Navigate to Admin → User Management
  3. Search for the user
  4. Grant "VICE Access" permission

The user must also meet the access requirements: a valid account, accepted terms of service, and sufficient quota.

See DE Administration Guide for detailed procedures.

How do I process a DOI/Permanent ID request?

DOI requests for data publishing follow this workflow:

  1. User submits request through Discovery Environment (Data → Permanent ID Request)
  2. Admin receives notification
  3. Review submission in admin interface for completeness
  4. Validate metadata and data quality
  5. Approve request - system creates DOI via DataCite
  6. Notify user of published DOI

See Permanent ID Requests documentation for complete SOP.

How do I add a new application to the Discovery Environment?

Publishing a containerized app to DE:

  1. Containerize your tool - Create Docker image with tool/software (see the sketch after this list)
  2. Push to registry - Upload to Docker Hub or Harbor
  3. Create integration - Use DE "Apps" interface to define:
     • Input files
     • Parameters
     • Output files
     • Resource requirements
  4. Test privately - Validate app functionality
  5. Publish publicly - Make available to all users
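
Steps 1-2 from the command line, with an illustrative image name and registry:

# Build the tool image (name, tag, and registry are illustrative)
docker build -t harbor.example.org/myorg/mytool:1.0 .

# Push it somewhere the DE can pull from
docker push harbor.example.org/myorg/mytool:1.0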

See DE Administration Guide for step-by-step instructions.

How do I manage user storage quotas?

Storage quotas are managed through:

  • iRODS resource management - Set per-user/per-group quotas in iRODS
  • User Portal - Display quota usage to users
  • Admin tools - Monitor and adjust quotas as needed

Contact users approaching quota limits and provide options (clean up data, request increase, archive to cold storage).
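
Per-user and per-group quotas can be set with the iRODS admin CLI; a sketch, assuming rodsadmin access (user, group, and resource names are illustrative):

# Set a 1 TB (in bytes) quota for one user on one resource
iadmin suq username demoResc 1099511627776

# Set a group quota across all resources
iadmin sgq groupname total 1099511627776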

How do I back up CyVerse databases?

Database backup strategy:

# PostgreSQL backup example
pg_dump -h db-host -U username -d de_database > de_backup.sql

# Automated daily backups
0 2 * * * /usr/local/bin/backup-cyverse-dbs.sh

Back up all service databases:

  • DE database
  • Metadata database
  • Notifications database
  • KeyCloak database
  • QMS database
  • Grouper database
  • Portal database
  • Unleash database
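
A minimal nightly dump loop over those databases, assuming a single PostgreSQL host and that the database names below match your deployment (they're illustrative):

#!/bin/bash
# backup-cyverse-dbs.sh - dump each service database to a dated file
DATE=$(date +%F)
for db in de metadata notifications keycloak qms grouper portal unleash; do
    pg_dump -h db-host -U username -d "$db" > "/backups/${db}_${DATE}.sql"
done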

See Database documentation for detailed backup procedures.

How do I troubleshoot service failures?

Systematic troubleshooting approach:

  1. Check pod status: kubectl get pods -n <namespace>
  2. View logs: kubectl logs -n <namespace> <pod-name>
  3. Check events: kubectl describe pod -n <namespace> <pod-name>
  4. Verify dependencies: Ensure databases, RabbitMQ, Redis are accessible
  5. Check resource limits: CPU/memory constraints can cause crashes
  6. Review recent changes: Configuration updates, deployments, network changes

Common issues:

  • Database connection failures (check credentials, network)
  • Out of memory (increase resource limits)
  • Image pull errors (check registry access)
  • Configuration errors (validate YAML syntax)

How do I add a new administrator?

Administrator permissions are managed through KeyCloak:

  1. Log into KeyCloak admin console
  2. Navigate to Users
  3. Find or create the user
  4. Assign the appropriate admin role:
     • de-admin for Discovery Environment
     • data-store-admin for Data Store
     • super-admin for platform-wide access
  5. User logs out and back in to activate permissions
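
The same grant can be scripted with KeyCloak's kcadm.sh admin CLI (a hedged sketch; server URL, realm, and usernames are illustrative):

# Authenticate the admin CLI against KeyCloak
kcadm.sh config credentials --server https://kc.example.org/auth \
  --realm master --user admin

# Grant the de-admin realm role to a user
kcadm.sh add-roles -r CyVerse --uusername someuser --rolename de-admin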

See KeyCloak documentation for role management.


API & Development

How do I authenticate with the Terrain API?

Terrain uses OAuth 2.0 authentication via KeyCloak:

Method 1: Get token via browser (authorization code flow)

  1. Direct user to KeyCloak authorization URL
  2. User authorizes your application
  3. Exchange authorization code for access token

Method 2: Get token via command line (password flow)

TOKEN=$(curl -s -X POST \
  https://kc.cyverse.org/auth/realms/CyVerse/protocol/openid-connect/token \
  -d "grant_type=password" \
  -d "client_id=de-client" \
  -d "username=YOUR_USERNAME" \
  -d "password=YOUR_PASSWORD" \
  | jq -r '.access_token')

Use token in API calls:

curl -H "Authorization: Bearer $TOKEN" \
  https://de.cyverse.org/terrain/filesystem/list

See Developer Guide for complete examples.

Where can I find the complete API documentation?

All Terrain endpoints are documented with request/response schemas and examples.

How do I migrate from Tapis v2 to Tapis v3?

Tapis v3 introduces breaking changes. Follow the Tapis v2 to v3 Migration Guide which covers:

  • API endpoint changes
  • Authentication updates
  • Request/response format changes
  • Deprecated features
  • New capabilities

Plan for code changes in your integrations.

What are the API rate limits?

Current rate limiting (subject to change):

  • 100 requests/minute per user for authenticated endpoints
  • 10 requests/minute for unauthenticated endpoints
  • Burst limit: 150 requests in 10 seconds

If you exceed limits, you'll receive HTTP 429 Too Many Requests. Implement exponential backoff in your client code.
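
A minimal backoff sketch in shell, reusing the filesystem listing call from earlier in this FAQ (the retry policy itself is illustrative):

# Retry on HTTP 429 with exponential backoff (1s, 2s, 4s, 8s, 16s)
for attempt in 0 1 2 3 4; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    -H "Authorization: Bearer $TOKEN" \
    https://de.cyverse.org/terrain/filesystem/list)
  [ "$status" != "429" ] && break
  sleep $((2 ** attempt))
done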

For higher limits, contact CyVerse support with your use case.

How do I upload large files via the API?

For large files (>100MB), use the chunked upload endpoint:

import requests

def upload_large_file(access_token, local_file, dest_path, chunk_size=10*1024*1024):
    """Upload large file in chunks."""
    # 1. Initialize upload session
    init_response = requests.post(
        "https://de.cyverse.org/terrain/secured/fileio/upload-init",
        headers={"Authorization": f"Bearer {access_token}"},
        json={"dest": dest_path, "filename": "largefile.dat"}
    )
    upload_id = init_response.json()["upload_id"]

    # 2. Upload chunks
    with open(local_file, "rb") as f:
        chunk_num = 0
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            requests.post(
                f"https://de.cyverse.org/terrain/secured/fileio/upload-chunk/{upload_id}",
                headers={"Authorization": f"Bearer {access_token}"},
                files={"chunk": chunk},
                data={"chunk_number": chunk_num}
            )
            chunk_num += 1

    # 3. Finalize upload
    requests.post(
        f"https://de.cyverse.org/terrain/secured/fileio/upload-complete/{upload_id}",
        headers={"Authorization": f"Bearer {access_token}"}
    )

See File I/O endpoints for details.

How do I receive notifications when my job completes?

Enable notifications when submitting a job:

payload = {
    "app_id": "app-uuid",
    "name": "My Job",
    "notify": True,  # Enable notifications
    "config": {...}
}

You'll receive notifications via:

  • In-app notifications in Discovery Environment
  • Email (if configured in user preferences)
  • Webhook callbacks (if configured)

See Notifications API and Callbacks API.

Can I run the API locally for development?

Yes! Clone the terrain repository and other services:

# Clone terrain
git clone https://github.com/cyverse-de/terrain.git
cd terrain

# Install dependencies (Clojure/Leiningen)
lein deps

# Configure local settings
cp config/terrain.properties.example config/terrain.properties
# Edit config to point to local databases

# Run locally
lein run

See repository README files for service-specific setup instructions.


Troubleshooting

Why can't users log in to the Discovery Environment?

Common authentication issues:

  1. KeyCloak service down - Check kubectl get pods -n keycloak
  2. Database connection failure - KeyCloak can't reach its database
  3. LDAP sync issues - If using LDAP, verify connection to LDAP server
  4. Browser cache/cookies - Have user clear browser cache and try incognito
  5. CILogon integration broken - Check CILogon service status
  6. Certificate issues - Verify TLS certificates are valid

Debug steps:

# Check KeyCloak logs
kubectl logs -n keycloak deployment/keycloak

# Test database connection
kubectl exec -it -n keycloak deployment/keycloak -- psql -h db-host -U keycloak

# Verify LDAP connectivity
kubectl exec -it -n keycloak deployment/keycloak -- ldapsearch -x -H ldap://ldap-server

Why is the Data Store slow?

Performance bottlenecks:

  1. iRODS server overload - Check CPU/memory on iRODS servers
  2. Network congestion - Verify network bandwidth between nodes
  3. Slow storage backend - Check underlying storage performance (NFS, SAN)
  4. Database issues - iRODS catalog queries slow
  5. Too many concurrent users - Consider scaling iRODS servers

Monitoring:

# Check iRODS server status
ils -L  # List with verbose output, observe latency

# Monitor iRODS resource usage
kubectl top pods -n irods

# Check network performance
iperf3 -c irods-server

See Data Store documentation for optimization tips.

Why are jobs stuck in "Submitted" status?

Job execution issues:

  1. HTCondor/K8s not processing jobs - Check job execution platform
  2. Insufficient resources - No available compute nodes
  3. Input staging failures - Data transfer from Data Store failed
  4. App configuration errors - Invalid app definition
  5. Permission issues - User lacks access to required resources

Debug steps:

# Check HTCondor queue (for batch jobs)
condor_q -analyze <job-id>

# Check Kubernetes pods (for VICE jobs)
kubectl get pods -n vice-apps

# Check analyses service logs
kubectl logs -n de deployment/analyses

Jobs are failing with "Out of Memory" errors

Memory issues:

  1. Insufficient container limits - Increase memory limit in app definition
  2. User requested too little memory - Guide users to request appropriate resources
  3. Node exhaustion - Add more Kubernetes nodes or increase node sizes
  4. Memory leaks in tool - Issue with the containerized tool itself

Solutions:

  • Increase memory limits: Edit app in DE Apps interface
  • Add resource quotas: Define default/max memory for different app categories
  • Scale cluster: Add more compute nodes
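
For platform services (rather than DE apps), limits can also be raised directly on a deployment; a sketch using the analyses deployment named in the previous answer:

# Raise memory limits on a service deployment
kubectl set resources deployment/analyses -n de \
  --requests=memory=2Gi --limits=memory=4Gi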

See Kubernetes resources documentation.

ElasticSearch is using too much disk space

ElasticSearch index management:

# Check index sizes
curl -X GET "localhost:9200/_cat/indices?v&s=store.size:desc"

# Delete old indices
curl -X DELETE "localhost:9200/old-index-name"

# Set up index lifecycle management (ILM)
curl -X PUT "localhost:9200/_ilm/policy/cleanup-policy" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": {"delete": {}}
      }
    }
  }
}'

See ElasticSearch deployment documentation.


Data Management

How do I share data with collaborators?

Data sharing options:

  1. Share within CyVerse - Give users/groups read or write permissions

    # Via Data Store API
    POST /terrain/secured/filesystem/share
    {
      "paths": ["/iplant/home/user/folder"],
      "permissions": ["read"],
      "users": ["collaborator1", "collaborator2"]
    }
    

  2. Public links - Create anonymous access tickets

     In DE: Right-click file/folder → Share → Create Public Link

  3. DOI publication - Publish to Data Commons with permanent identifier

See Sharing API and Data Store Guide.

How do I search for data across the entire Data Store?

Use the search API:

curl -H "Authorization: Bearer $TOKEN" \
  -X POST https://de.cyverse.org/terrain/secured/filesystem/search \
  -H "Content-Type: application/json" \
  -d '{"query": "genomics", "type": "file"}'

Search supports:

  • Full-text search in filenames and metadata
  • Filters by file type, date, size
  • User-defined metadata (AVU) searches

See Search API documentation.

How do I attach metadata to data files?

Add Attribute-Value-Unit (AVU) metadata:

import requests

# Add metadata via API
requests.post(
    "https://de.cyverse.org/terrain/secured/filesystem/metadata",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "path": "/iplant/home/user/data.csv",
        "metadata": [
            {"attr": "experiment_date", "value": "2024-01-15", "unit": ""},
            {"attr": "sample_size", "value": "1000", "unit": "samples"}
        ]
    }
)

Or use the DE interface: Right-click file → Metadata → Add

See Metadata API.

How do I recover deleted data?

Deleted files go to trash and can be restored within 30 days:

Via Discovery Environment:

  1. Navigate to Trash folder
  2. Select files/folders to restore
  3. Click "Restore"

Via API:

curl -H "Authorization: Bearer $TOKEN" \
  -X POST https://de.cyverse.org/terrain/secured/filesystem/restore \
  -H "Content-Type: application/json" \
  -d '{"paths": ["/iplant/trash/user/deleted-folder"]}'

After 30 days, files are permanently deleted. Implement backups for critical data.

See Restore API.

What's the maximum file size I can store?

File size limits:

  • Single file: No hard limit (tested up to 100+ GB)
  • Upload via web UI: 2 GB recommended (browser limitations)
  • Upload via API/CLI: No limit
  • iCommands (iRODS CLI): Recommended for files >2 GB

For very large files (>100 GB), use:

  • iCommands: iput for direct iRODS uploads
  • GoCommands: CyVerse's modern iRODS CLI
  • Globus: For scheduled large transfers
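
For example, with iCommands or GoCommands (a sketch; paths are illustrative, and -P prints transfer progress):

# Direct iRODS upload with progress reporting
iput -P largefile.dat /iplant/home/username/

# Equivalent upload with GoCommands
gocmd put largefile.dat /iplant/home/username/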

See Data Store documentation.


Still Have Questions?