Block Data Delivery
For clients who have purchased block row-level data, Carbon Arc provides two access methods:
- Iceberg REST Catalog — Query data directly using industry-standard Iceberg table format (Recommended)
- Amazon S3 — Direct file access via AWS S3 buckets (Legacy)
Both methods provide access to the same underlying data. For new integrations we recommend the Iceberg REST Catalog (Polaris), which provides a modern, query-ready interface with no file-ingestion pipelines to manage.
Iceberg REST Catalog (Polaris) — Recommended
Overview
Carbon Arc provides access via an Iceberg REST Catalog, allowing you to connect your data platform directly to Carbon Arc's block data warehouse. This is the recommended approach for new clients as it offers:
- No ETL required — Query tables directly without building ingestion pipelines
- Always up-to-date — Access the latest data without managing incremental updates
- Industry standard — Compatible with Snowflake, Databricks, ClickHouse, Spark, Trino, and more
- Schema evolution — Automatic handling of schema changes
Connection Details
| Parameter | Value |
|---|---|
| Catalog URI | https://bulk.apps.carbonarc.co/api/catalog |
| Warehouse | bulk |
| Auth Scope | PRINCIPAL_ROLE:ALL |
| OAuth Token Endpoint | https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens |
Credentials
Your Client ID and Client Secret will be provided via a secure 1Password link after purchase. Keep these credentials secure and do not share them.
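For quick connectivity checks, or for platforms without a built-in Iceberg REST connector, the token endpoint above can be exercised directly. The sketch below builds the token request with only the Python standard library; the `client_credentials` grant type and form encoding are assumptions based on typical Iceberg REST catalogs (Snowflake's `TYPE = OAUTH` authentication above implies the same flow), not Carbon Arc-confirmed specifics.

```python
# Sketch: build the OAuth2 client_credentials token request for the
# Carbon Arc catalog. Assumes a standard form-encoded token exchange;
# <your_client_id>/<your_client_secret> come from the 1Password link.
from urllib.parse import urlencode
from urllib.request import Request, urlopen  # urlopen used at runtime only

TOKEN_URL = "https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens"

def build_token_request(client_id: str, client_secret: str) -> Request:
    """Build the POST request that exchanges credentials for a bearer token."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "PRINCIPAL_ROLE:ALL",
    }).encode()
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

# At runtime, the bearer token would be extracted from the JSON response:
# token = json.load(urlopen(build_token_request(cid, secret)))["access_token"]
```

A `200` response with an `access_token` field confirms your credentials are active before you configure a full platform integration.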
Platform Connection Guides
Select your data platform below for specific connection instructions:
- Snowflake
- ClickHouse
- Databricks
- Apache Spark
- Trino / Starburst
Snowflake Integration
Step 1: Create Catalog Integration
```sql
CREATE OR REPLACE CATALOG INTEGRATION carbon_arc
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://bulk.apps.carbonarc.co/api/catalog'
    WAREHOUSE = 'bulk'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<your_client_id>'
    OAUTH_CLIENT_SECRET = '<your_client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;
```
Step 2: Create Linked Database
```sql
CREATE DATABASE carc
  LINKED_CATALOG = (
    CATALOG = 'carbon_arc'
  );
```
Step 3: Query Data
Once connected, you can query tables directly:
```sql
SELECT * FROM carc.sloth.app_performance_data_daily LIMIT 100;
```
Replace `<your_client_id>` and `<your_client_secret>` with the credentials provided via 1Password.
ClickHouse Integration
Step 1: Enable Experimental Feature
```sql
SET allow_experimental_database_iceberg = 1;
```
Step 2: Create Database Connection
```sql
CREATE DATABASE carc
ENGINE = DataLakeCatalog('https://bulk.apps.carbonarc.co/api/catalog')
SETTINGS
    catalog_type = 'rest',
    catalog_credential = '<your_client_id>:<your_client_secret>',
    warehouse = 'bulk',
    auth_scope = 'PRINCIPAL_ROLE:ALL',
    oauth_server_uri = 'https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens';
```
Step 3: Query Data
```sql
SELECT * FROM carc.sloth.app_performance_data_daily LIMIT 100;
```
Replace `<your_client_id>` and `<your_client_secret>` with the credentials provided via 1Password.
Databricks Integration
Coming soon — Documentation for Databricks integration is in progress.
For immediate assistance, please contact your Carbon Arc representative.
Apache Spark Integration
Coming soon — Documentation for Apache Spark integration is in progress.
For immediate assistance, please contact your Carbon Arc representative.
Trino / Starburst Integration
Coming soon — Documentation for Trino and Starburst integration is in progress.
For immediate assistance, please contact your Carbon Arc representative.
Amazon S3 — Legacy
S3 file delivery is maintained for existing integrations. For new implementations, we recommend using Polaris instead.
Overview
Block data is delivered to dedicated S3 buckets with a standardized folder structure. Your AWS IAM user or role is granted read-only access to the bucket containing your purchased data assets.
Bucket Access
After purchase, you'll receive:
- Bucket ARN: The S3 bucket location (e.g., `arn:aws:s3:::carc-ext-{dataset}`)
- IAM Access: Your AWS principal is granted read access to the bucket
Delivery Structure
Data is organized into two delivery patterns:
Incremental Updates
For ongoing data updates, we follow a standardized incremental delivery pattern:
| Attribute | Value |
|---|---|
| Path Structure | {drop_date}/Incremental/[data_files] |
| Content | Contains only the new records received from the vendor |
Example path:
```
s3://carc-ext-sloth/20260203/Incremental/sloth_app_performance_data_daily/
```
Full Reinstatement Deliveries
Complete data reinstatements are delivered when upstream data is updated:
| Attribute | Value |
|---|---|
| Path Structure | {drop_date}/Full/[data_files] |
| Content | Complete data asset including all historical data ingested to date |
Example paths:
```
s3://carc-ext-sloth/20260129/Full/sloth_app_performance_data_daily/
s3://carc-ext-sloth/20260129/Full/sloth_app_performance_data_monthly/
```
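Automation around these deliveries often starts by splitting object keys into their components. This is an illustrative stdlib-Python helper based only on the path structure documented above; the `Delivery` type and function name are hypothetical, not part of any Carbon Arc SDK.

```python
# Parse a Carbon Arc S3 object key into its documented components:
# {drop_date}/{Incremental|Full}/{table_dir}/...
from typing import NamedTuple

class Delivery(NamedTuple):
    drop_date: str   # e.g. "20260129" (YYYYMMDD)
    mode: str        # "Incremental" or "Full"
    table: str       # e.g. "sloth_app_performance_data_daily"

def parse_delivery_key(key: str) -> Delivery:
    """Split an object key (bucket name already stripped) into parts."""
    parts = key.strip("/").split("/")
    drop_date, mode, table = parts[:3]
    if mode not in ("Incremental", "Full"):
        raise ValueError(f"unexpected delivery mode: {mode}")
    return Delivery(drop_date, mode, table)
```

For example, `parse_delivery_key("20260129/Full/sloth_app_performance_data_daily/")` yields a `Full` delivery dated `20260129` for the daily table.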
When is Data Reinstated?
We generally reinstate data on the first Monday of the month if there are changes to the ontology or significant upstream data corrections.
Data Consumption Guidelines
Recommended Approach:
1. Start with a Full Reinstatement: Always consume the most recent `Full` reinstatement as your baseline.
2. Append Incrementals: Apply the `Incremental` deliveries that occurred after the latest `Full` refresh.
3. Re-ingest on a New Full: When a new `Full` reinstatement is available, delete your existing ingested data and re-ingest the complete `Full` reinstatement.

Monitor the S3 bucket for new `Full` directories. When one appears, schedule a complete re-ingestion to ensure data consistency.
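The baseline-plus-incrementals selection can be sketched as a pure key-filtering step. This is a hypothetical helper (not a Carbon Arc API); it assumes keys are relative to the bucket root in the `{drop_date}/{Full|Incremental}/...` shape, and relies on YYYYMMDD drop dates sorting correctly as strings.

```python
# Select which S3 object keys to ingest: the latest Full reinstatement
# as the baseline, plus every Incremental delivery dated after it.
def keys_to_ingest(keys: list[str]) -> list[str]:
    """Pick the most recent Full drop and all later Incremental drops."""
    full_dates = {k.split("/")[0] for k in keys if k.split("/")[1] == "Full"}
    if not full_dates:
        raise ValueError("no Full reinstatement found to use as a baseline")
    baseline = max(full_dates)  # YYYYMMDD sorts lexicographically
    return sorted(
        k for k in keys
        if (k.split("/")[1] == "Full" and k.split("/")[0] == baseline)
        or (k.split("/")[1] == "Incremental" and k.split("/")[0] > baseline)
    )
```

Incrementals older than the chosen `Full` drop are excluded by construction, which matches the re-ingestion guidance above: a new `Full` delivery supersedes everything before it.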
Available Datasets
The specific tables and feeds available depend on your purchased data package.
Contact your Carbon Arc representative for the complete schema documentation for your purchased data assets.
Support
For questions about block data access or connection issues:
- Email: support@carbonarc.ai