Product Overview

Enterprise data platform, batteries included

Customize and deploy runtimes in three steps:

1. Define a Cluster (Hardware Control Plane)

Clusters are virtual compute groups that can be configured for distributed workloads (e.g., Spark), single- or multi-GPU model training jobs, or single-machine tasks (e.g., Pandas).
2. Define an Environment (Software Control Plane)

Environments encapsulate the software configurations of runtimes, such as their dependencies and container settings. Environments can be specified using Dockerfiles or Pip/Conda requirements files (see the sketch after step 3).
3. Connect your Cluster and Environment (Compute Application Plane)

A Cluster-Environment combination specifies a hardware and software configuration that can then be leveraged in any compute application, including Kaspian Jobs and Workflows, Airflow tasks, and Kaspian-hosted Jupyter notebooks.
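
For step 2, for example, an Environment could be described by a short Dockerfile like the sketch below; the base image and the packages installed are placeholders chosen for illustration, not a prescribed Kaspian configuration.

    # Illustrative Dockerfile for an Environment; the base image and packages
    # are placeholders, not a required Kaspian setup.
    FROM python:3.11-slim

    # Install the libraries your Jobs and notebooks will import at runtime.
    RUN pip install --no-cache-dir pandas pyarrow scikit-learn

An equivalent pip or Conda requirements file listing the same packages would serve just as well for the requirements-file route.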

Turbulent Airflow? Fly smoother with Kaspian Workflows

Optimized job orchestrator and GUI

Kaspian's Workflow builder includes a highly available and performant job orchestrator and a graph builder GUI. Workflows are composed of Jobs, which are standalone tasks similar to the nodes in an Airflow DAG.
Orchestration + Compute = Magical Developer Experience

While Kaspian Jobs can be run from Airflow, Kaspian Workflows leverage the platform's unified orchestration and compute layers to enable features like data staging, metadata management, and historical data-flow introspection and querying.
Code-first philosophy makes migrations a breeze

Kaspian is a code-first platform: Jobs are defined in code and pulled from your version control system (VCS) during execution. Kaspian also provides an optional API to streamline certain operations, such as reading from and writing to datastores and staging data.
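
As a rough sketch, assuming a pandas-based task with invented file paths and column names, a code-first Job could be nothing more than a script like this in your repository:

    # jobs/clean_orders.py - a hypothetical Job script pulled from your VCS at
    # run time; the paths and column names below are illustrative only.
    import pandas as pd

    def main() -> None:
        # A Job is ordinary code: read, transform, and write data.
        orders = pd.read_parquet("s3://example-bucket/raw/orders/")
        cleaned = orders.dropna(subset=["order_id"]).drop_duplicates("order_id")
        cleaned.to_parquet("s3://example-bucket/clean/orders/")

    if __name__ == "__main__":
        main()

The optional Kaspian API could replace the raw reads and writes, but plain pandas keeps the sketch self-contained.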

Operationalize AI without managing infrastructure

Train AI models at scale

Kaspian helps data teams train models ranging from linear regressions to foundational large language models (LLMs). Kaspian Clusters can be configured to run distributed jobs using engines like Spark or harness multiple GPUs to accelerate training.
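
As one hedged illustration of the kind of distributed job such a Cluster could run, the PySpark snippet below fits a linear regression with Spark MLlib; the dataset path and column names are invented for the example.

    # Minimal PySpark training job; the input path and columns are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.appName("train-linear-regression").getOrCreate()

    # Assemble the feature columns into a single vector and fit the model in parallel.
    df = spark.read.parquet("s3://example-bucket/training-data/")
    assembler = VectorAssembler(inputCols=["x1", "x2", "x3"], outputCol="features")
    model = LinearRegression(featuresCol="features", labelCol="y").fit(assembler.transform(df))

    # Persist the fitted model so downstream Jobs or notebooks can load it.
    model.write().overwrite().save("s3://example-bucket/models/linear-regression")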
Online learning means always-on, always-improving AI

Kaspian's native data pipelining construct, Workflows, can be used to retrain AI models on a specified cadence. This ensures that models are always informed by the most recent data available.
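
A retraining task that such a Workflow invokes on a schedule might look like the sketch below; the data path, feature columns, model choice, and 30-day window are hypothetical, not Kaspian defaults.

    # retrain.py - a hypothetical retraining task for a scheduled Workflow;
    # every path, column, and window size here is illustrative.
    import joblib
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def retrain(window_days: int = 30) -> None:
        df = pd.read_parquet("s3://example-bucket/features/")
        # Keep only the most recent window so each run learns from fresh data.
        cutoff = df["event_date"].max() - pd.Timedelta(days=window_days)
        recent = df[df["event_date"] >= cutoff]
        model = LogisticRegression(max_iter=1000)
        model.fit(recent[["x1", "x2"]], recent["label"])
        joblib.dump(model, "model.joblib")  # hand off to your model store of choice

    if __name__ == "__main__":
        retrain()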

Hosted services come bundled for free with Kaspian

JupyterHub (Data Notebooks)

JupyterHub is a popular tool for data scientists and analysts to explore datasets and develop AI models. Kaspian's hosted JupyterHub instance enables notebooks to connect to any Kaspian Environment and Cluster and allows users to share files.
Apache Superset (Data Dashboards)

Apache Superset is a popular open-source data visualization platform that enables users to create and share dashboards. Superset connects to dozens of different datastores and includes more than 40 visualization types.
Grafana (Metrics and Logs Dashboards)

Grafana is an open-source solution for creating operational dashboards. Kaspian pushes compute job metrics and logs to Prometheus and exposes these feeds to Grafana. Users can therefore create their own dashboards and alerts via Grafana.

Get started today

No credit card needed