Library API

The Library API provides programmatic access to every data asset available on the Carbon Arc platform. Use it to discover data assets, inspect their structure, preview sample data, check update schedules, and track changes between library versions.

Available Methods

Method	Description
`client.data.get_datasets()`	List all available datasets (paginated)
`client.data.get_dataset_information()`	Get details for a specific dataset
`client.data.get_insights_by_dataset()`	Retrieve all insights associated with a dataset
`client.data.get_data_dictionary()`	Get column definitions and metadata
`client.data.get_data_sample()`	Preview sample data rows
`client.data.get_dataset_schedule()`	Get the update schedule for a dataset
`client.data.get_library_version_changes()`	Check for updates and changes

Quick Start

from carbonarc import CarbonArcClient

# Initialize the client
client = CarbonArcClient(
    host="https://api.carbonarc.co",
    token="YOUR_API_TOKEN"
)
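
A token written directly into source code is easy to leak. The following is a minimal sketch of one alternative, assuming the token is kept in an environment variable. The variable name CARBONARC_API_TOKEN and the helper load_token are illustrative choices, not something the carbonarc package defines.

```python
import os


def load_token(env_var: str = "CARBONARC_API_TOKEN") -> str:
    """Return the API token from the environment, or raise a clear error.

    The variable name is a hypothetical convention, not one defined by carbonarc.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return token


# Usage with the client shown above (left as a comment so the sketch
# runs without the carbonarc package or network access):
# client = CarbonArcClient(host="https://api.carbonarc.co", token=load_token())
```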

List All Datasets

Retrieve a paginated list of all available data assets in the Carbon Arc library.

response = client.data.get_datasets()

Response Structure

{
  "page": 1,
  "size": 25,
  "total_pages": 3,
  "datasources": [
    {
      "dataset_id": ["CA0056"],
      "dataset_name": "Credit Card – US Complete Panel",
      "description": "US credit card transaction data...",
      "provider_name": "Facteus",
      "last_updated_timestamp": "2026-02-18 21:45:08",
      "blocked": false,
      "is_current": true,
      "data": {
        "Topics": ["Core Panel", "by Payment Method"],
        "Key Metrics": ["Credit Card Spend", "Credit Card Transactions", "Credit Card Users"],
        "Coverage": {"Product Brands": "2k+", "Retailers": "1.6k+"}
      }
    }
  ]
}

Example: Display All Datasets

import pandas as pd

response = client.data.get_datasets()

# Extract datasets from response
datasets = response.get('datasources', [])

# len(datasets) counts only the entries on the current page
print(f"Datasets on this page: {len(datasets)}")
print(f"Page {response.get('page')} of {response.get('total_pages')}")

# Create a summary DataFrame
df = pd.DataFrame([{
    'dataset_id': ds.get('dataset_id', ['N/A'])[0] if isinstance(ds.get('dataset_id'), list) else ds.get('dataset_id'),
    'dataset_name': ds.get('dataset_name'),
    # Guard against a null description before slicing
    'description': (ds.get('description') or '')[:80] + '...'
} for ds in datasets])

print(df.to_string(index=False))
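
Each entry in the response also carries a nested data block with Topics and Key Metrics. The sketch below shows one way to flatten those fields into plain rows without pandas. The helper names first_id and summarize_datasets are illustrative, and the payload is trimmed from the Response Structure above rather than fetched live.

```python
def first_id(value):
    """Return a single ID whether the API sent a list or a plain string."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def summarize_datasets(response: dict) -> list:
    """Flatten the `datasources` entries into one small dict per dataset."""
    rows = []
    for ds in response.get("datasources", []):
        details = ds.get("data") or {}
        rows.append({
            "dataset_id": first_id(ds.get("dataset_id")),
            "dataset_name": ds.get("dataset_name"),
            "provider_name": ds.get("provider_name"),
            "topics": ", ".join(details.get("Topics", [])),
            "key_metrics": ", ".join(details.get("Key Metrics", [])),
        })
    return rows


# Sample payload trimmed from the documented response structure
sample_response = {
    "page": 1,
    "size": 25,
    "total_pages": 3,
    "datasources": [{
        "dataset_id": ["CA0056"],
        "dataset_name": "Credit Card – US Complete Panel",
        "provider_name": "Facteus",
        "data": {
            "Topics": ["Core Panel", "by Payment Method"],
            "Key Metrics": ["Credit Card Spend", "Credit Card Transactions", "Credit Card Users"],
        },
    }],
}

for row in summarize_datasets(sample_response):
    print(row["dataset_id"], "|", row["provider_name"], "|", row["topics"])
```

With a live client, the same function could be applied to the result of client.data.get_datasets().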

Get Dataset Information

Retrieve detailed information about a specific data asset, including available topics.

dataset_info = client.data.get_dataset_information(dataset_id="CA0056")

Parameters

Parameter	Type	Required	Description
`dataset_id`	string	Yes	The unique dataset identifier (e.g., "CA0056")

Response Structure

{
  "dataset_id": ["CA0056"],
  "dataset_name": "Credit Card – US Complete Panel",
  "description": "US credit card transaction data from 9 provider sources...",
  "entity_topics": [
    {
      "entity_topic_id": 1,
      "entity_topic_label": "Core Panel"
    },
    {
      "entity_topic_id": 2,
      "entity_topic_label": "by Payment Method"
    }
  ]
}

Example: Explore Dataset Details

dataset_info = client.data.get_dataset_information(dataset_id="CA0056")

# Handle dataset_id that might be returned as a list
ds_id = dataset_info.get('dataset_id', 'N/A')
if isinstance(ds_id, list):
    ds_id = ds_id[0] if ds_id else 'N/A'

print(f"Dataset ID:  {ds_id}")
print(f"Name:        {dataset_info.get('dataset_name', 'N/A')}")
print(f"Description: {dataset_info.get('description', 'N/A')}")

# List available topics
topics = dataset_info.get('entity_topics', dataset_info.get('topics', []))
if topics:
    print(f"\nAvailable Topics ({len(topics)}):")
    for topic in topics:
        topic_id = topic.get('entity_topic_id', topic.get('id', 'N/A'))
        topic_name = topic.get('entity_topic_label', topic.get('name', 'N/A'))
        print(f"  • {topic_name} (entity_topic_id: {topic_id})")

Tip: The entity_topic_id values returned here can be used to filter results in get_data_dictionary() and get_data_sample().
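
Since the filters take a numeric entity_topic_id while people usually think in topic labels, a small lookup table can help. This is a sketch under the assumption that the response has the entity_topics shape shown above. The helper name topic_ids_by_label is illustrative.

```python
def topic_ids_by_label(dataset_info: dict) -> dict:
    """Map each entity_topic_label to its entity_topic_id."""
    return {
        topic["entity_topic_label"]: topic["entity_topic_id"]
        for topic in dataset_info.get("entity_topics", [])
        if "entity_topic_label" in topic and "entity_topic_id" in topic
    }


# Sample payload trimmed from the documented response structure
dataset_info = {
    "dataset_id": ["CA0056"],
    "entity_topics": [
        {"entity_topic_id": 1, "entity_topic_label": "Core Panel"},
        {"entity_topic_id": 2, "entity_topic_label": "by Payment Method"},
    ],
}

topic_ids = topic_ids_by_label(dataset_info)
print(topic_ids)

# The looked-up ID can then be passed to the documented filter parameter:
# client.data.get_data_dictionary(
#     dataset_id="CA0056",
#     entity_topic_id=topic_ids["by Payment Method"],
# )
```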

Get Insights by Dataset

Retrieve all insights (metric definitions) associated with a specific dataset. Each insight represents a queryable metric, such as spend, transactions, or users, scoped to a particular topic within the dataset. The response is paginated.

insights = client.data.get_insights_by_dataset(dataset_id="CA0056")

Parameters

Parameter	Type	Required	Description
`dataset_id`	string	Yes	The unique dataset identifier (e.g., "CA0056")

Response Structure

{
  "total": 33,
  "page": 1,
  "size": 25,
  "pages": 2,
  "items": [
    {
      "insight_id": 629,
      "label": "Credit Card Spend",
      ...
    }
  ]
}

Example: Display Dataset Insights

insights_response = client.data.get_insights_by_dataset(dataset_id="CA0056")

total = insights_response.get('total', 0)
pages = insights_response.get('pages', 1)
items = insights_response.get('items', [])

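# Report the page number from the response instead of assuming page 1
print(f"Total Insights: {total} (page {insights_response.get('page', 1)} of {pages})")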

for insight in items:
    print(f"• [{insight.get('insight_id')}] {insight.get('label')} — {insight.get('topic_label')}")
    print(f"  insight_name: {insight.get('insight_name')}")
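
Because each insight is scoped to a topic, grouping the items by topic can make a long list easier to scan. The sketch below assumes each item carries the topic_label field used in the example above. Only the first item mirrors the documented response; the other items are invented for illustration, and the helper name is hypothetical.

```python
from collections import defaultdict


def group_insights_by_topic(items: list) -> dict:
    """Group insight labels under their topic_label."""
    grouped = defaultdict(list)
    for insight in items:
        topic = insight.get("topic_label") or "Unknown"
        grouped[topic].append(insight.get("label"))
    return dict(grouped)


# Illustrative items: only insight_id 629 appears in the documented response;
# the remaining IDs and labels are placeholders.
items = [
    {"insight_id": 629, "label": "Credit Card Spend", "topic_label": "Core Panel"},
    {"insight_id": 1001, "label": "Credit Card Transactions", "topic_label": "Core Panel"},
    {"insight_id": 1002, "label": "Credit Card Spend", "topic_label": "by Payment Method"},
]

grouped = group_insights_by_topic(items)
for topic, labels in grouped.items():
    print(f"{topic}: {', '.join(labels)}")
```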

Get Data Dictionary

Retrieve column definitions and metadata for a data asset. Optionally filter by topic.

# Get dictionary for entire dataset
data_dict = client.data.get_data_dictionary(dataset_id="CA0056")

# Filter by specific topic
data_dict = client.data.get_data_dictionary(
    dataset_id="CA0056",
    entity_topic_id=1
)

Parameters

Parameter	Type	Required	Description
`dataset_id`	string	Yes	The unique dataset identifier
`entity_topic_id`	integer	No	Filter to a specific topic (from `get_dataset_information()`)

Response Structure

The response is typically a list of column definitions:

[
  {
    "column_name": "date",
    "data_type": "DATE",
    "description": "Transaction date"
  },
  {
    "column_name": "brand_name",
    "data_type": "STRING",
    "description": "Name of the brand"
  },
  {
    "column_name": "spend",
    "data_type": "FLOAT",
    "description": "Total dollar value of purchases"
  }
]

Example: Display Column Definitions

import pandas as pd

data_dict = client.data.get_data_dictionary(dataset_id="CA0056")

if isinstance(data_dict, list) and data_dict:
    df = pd.DataFrame(data_dict)
    print(f"Total Columns: {len(df)}")
    
    # Display key columns
    display_cols = ['column_name', 'data_type', 'description']
    available = [c for c in display_cols if c in df.columns]
    if available:
        print(df[available].to_string(index=False))
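
When the dictionary is used programmatically rather than printed, an index keyed by column name is often more convenient than a list. The following sketch assumes the list-of-definitions shape shown above. The helper names index_columns and columns_of_type are illustrative.

```python
def index_columns(data_dict: list) -> dict:
    """Map column_name to its full definition for quick lookups."""
    return {col["column_name"]: col for col in data_dict if "column_name" in col}


def columns_of_type(data_dict: list, data_type: str) -> list:
    """Return the names of columns whose data_type matches, ignoring case."""
    wanted = data_type.upper()
    return [
        col["column_name"]
        for col in data_dict
        if "column_name" in col and (col.get("data_type") or "").upper() == wanted
    ]


# Sample payload copied from the documented response structure
data_dict = [
    {"column_name": "date", "data_type": "DATE", "description": "Transaction date"},
    {"column_name": "brand_name", "data_type": "STRING", "description": "Name of the brand"},
    {"column_name": "spend", "data_type": "FLOAT", "description": "Total dollar value of purchases"},
]

columns = index_columns(data_dict)
print(columns["spend"]["description"])
print(columns_of_type(data_dict, "string"))
```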

Get Data Sample

Preview sample data rows from a data asset. Useful for understanding data structure before building frameworks.

# Get sample for entire dataset
sample = client.data.get_data_sample(dataset_id="CA0056")

# Filter by specific topic
sample = client.data.get_data_sample(
    dataset_id="CA0056",
    entity_topic_id=1
)

Parameters

Parameter	Type	Required	Description
`dataset_id`	string	Yes	The unique dataset identifier
`entity_topic_id`	integer	No	Filter to a specific topic

Response Structure

{
  "dataset_id": "CA0056",
  "samples": [
    {
      "date": "2025-11-04",
      "brand_id": 56290,
      "brand_name": "Under Armour",
      "company_id": 65116,
      "company_name": "Under Armour, Inc",
      "ticker_id": 1234,
      "ticker_name": "UAA",
      "spend": 125.50,
      "transactions": 3,
      "users": 2
    }
  ]
}

Example: Preview Sample Data

import pandas as pd

sample_response = client.data.get_data_sample(dataset_id="CA0056")

# Extract samples from nested response
if isinstance(sample_response, dict) and 'samples' in sample_response:
    samples = sample_response['samples']
    
    if isinstance(samples, list) and samples:
        df = pd.DataFrame(samples)
        print(f"Sample Size: {len(df)} rows")
        print(f"Columns: {list(df.columns)}")
        print(df.head(10))
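
A sample can also be cross-checked against the data dictionary to spot columns that lack a definition. This is a sketch assuming the samples and column_name shapes shown in this document. The helper name undocumented_columns is illustrative, and both payloads are abbreviated excerpts, so the columns flagged here do not indicate real gaps in the dataset.

```python
def undocumented_columns(sample_response: dict, data_dict: list) -> list:
    """List sample columns with no matching column_name in the dictionary."""
    documented = {col.get("column_name") for col in data_dict}
    missing = []
    for row in sample_response.get("samples", []):
        for name in row:
            if name not in documented and name not in missing:
                missing.append(name)
    return missing


# Abbreviated excerpts of the documented payloads
data_dict = [
    {"column_name": "date", "data_type": "DATE"},
    {"column_name": "brand_name", "data_type": "STRING"},
    {"column_name": "spend", "data_type": "FLOAT"},
]
sample_response = {
    "dataset_id": "CA0056",
    "samples": [
        {"date": "2025-11-04", "brand_id": 56290, "brand_name": "Under Armour",
         "spend": 125.50, "transactions": 3},
    ],
}

print(undocumented_columns(sample_response, data_dict))
```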

Get Dataset Schedule

Retrieve the update schedule for a specific dataset, including when the next data refresh is expected and when the last update occurred.

schedule = client.data.get_dataset_schedule(dataset_id="CA0056")

Parameters

Parameter	Type	Required	Description
`dataset_id`	string	Yes	The unique dataset identifier (e.g., "CA0056")

Response Structure

{
  "schedules": [
    {
      "next_run_start": "2026-06-13T03:34:27.570759+00:00",
      "next_run_end": "2026-06-13T04:04:27.570759+00:00",
      "last_update": "2026-06-11T18:31:13+00:00"
    }
  ]
}

Example: Display Dataset Schedule

schedule_response = client.data.get_dataset_schedule(dataset_id="CA0056")

for entry in schedule_response.get('schedules', []):
    print(f"Last Update:     {entry.get('last_update')}")
    print(f"Next Run Start:  {entry.get('next_run_start')}")
    print(f"Next Run End:    {entry.get('next_run_end')}")
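
The timestamps are returned as strings, so any comparison needs them parsed first. The sketch below assumes they always arrive in the ISO 8601 form shown above, with a UTC offset, and uses the standard library's datetime.fromisoformat. The helper name describe_schedule and the fixed reference time are illustrative.

```python
from datetime import datetime, timezone


def describe_schedule(entry: dict, now: datetime) -> dict:
    """Parse one schedule entry and derive a few convenience durations."""
    start = datetime.fromisoformat(entry["next_run_start"])
    end = datetime.fromisoformat(entry["next_run_end"])
    last = datetime.fromisoformat(entry["last_update"])
    return {
        "window": end - start,             # length of the expected refresh window
        "since_last_update": now - last,   # age of the current data
        "until_next_run": start - now,     # time remaining before the next refresh
    }


# Sample entry copied from the documented response structure
entry = {
    "next_run_start": "2026-06-13T03:34:27.570759+00:00",
    "next_run_end": "2026-06-13T04:04:27.570759+00:00",
    "last_update": "2026-06-11T18:31:13+00:00",
}

# A fixed, timezone-aware reference time keeps the example reproducible;
# in practice this would be datetime.now(timezone.utc).
now = datetime(2026, 6, 12, 0, 0, 0, tzinfo=timezone.utc)

summary = describe_schedule(entry, now)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The reference time must be timezone-aware, because Python raises a TypeError when an aware datetime is subtracted from a naive one.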

Check Library Version Changes

Track updates and changes to the library. Useful for monitoring when entities or data assets are added, modified, or deprecated.

# Get latest version changes
changes = client.data.get_library_version_changes(version="latest")

# With filters and pagination
changes = client.data.get_library_version_changes(
    version="latest",
    dataset_id="CA0056",  # Optional: filter to specific dataset
    page=1,
    size=100
)

Parameters

Parameter	Type	Required	Description
`version`	string	Yes	Version to check (e.g., `"latest"`, `"2026.1.1"`)
`dataset_id`	string	No	Filter to a specific dataset
`topic_id`	integer	No	Filter to a specific topic
`entity_representation`	string	No	Filter by entity representation (e.g., `"company"`, `"ticker"`)
`page`	integer	No	Page number for pagination
`size`	integer	No	Number of results per page
`order`	string	No	Sort direction (`"asc"` or `"desc"`)

Response Structure

{
  "total": 20278,
  "page": 1,
  "size": 25,
  "pages": 812,
  "entities": [
    {
      "entity_id": "12345",
      "entity_name": "Taylor Swift",
      "entity_representation_name": "entertainer",
      "entity_topic_id": 1,
      "entity_topic_label": "Core Panel",
      "dataset_id": "CA0010",
      "status": "Entity Added",
      "prev_ontology_version": "2026.1.1",
      "current_ontology_version": "2026.1.2",
      "version_release_date": "2026-01-29T22:36:26"
    }
  ]
}

Example: Display Recent Changes

changes = client.data.get_library_version_changes(version="latest")

total = changes.get('total', 0)
page = changes.get('page', 1)
pages = changes.get('pages', 1)

print(f"Total Changes: {total}")
print(f"Page {page} of {pages}")

entities = changes.get('entities', [])
if entities:
    print(f"\n{'Status':<20} | {'Entity Name':<30} | {'Dataset':<10} | {'Topic'}")
    print("-" * 80)
    
    for item in entities[:15]:
        status = item.get('status', 'N/A')
        entity_name = item.get('entity_name', 'N/A')[:30]
        dataset_id = item.get('dataset_id', 'N/A')
        topic = item.get('entity_topic_label', '')[:25]
        print(f"{status:<20} | {entity_name:<30} | {dataset_id:<10} | {topic}")
    
    # Show version info
    if 'current_ontology_version' in entities[0]:
        print(f"\nCurrent Version: {entities[0].get('current_ontology_version')}")
        print(f"Release Date: {entities[0].get('version_release_date', 'N/A')}")
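
The example above reads only the first page. With tens of thousands of changes, a full scan has to walk every page using the documented page and size parameters. The following is a sketch of that loop. The helper iter_all_entities and its fetch_page callable are illustrative, and a stand-in fetcher replaces the live client so the code runs offline.

```python
from collections import Counter


def iter_all_entities(fetch_page, max_pages=None):
    """Yield every entity across pages.

    fetch_page is a callable that takes a 1-based page number and returns
    a response dict with `entities` and `pages` keys, as documented above.
    """
    page = 1
    while True:
        response = fetch_page(page)
        yield from response.get("entities", [])
        total_pages = response.get("pages", 1)
        if page >= total_pages or (max_pages is not None and page >= max_pages):
            break
        page += 1


# Stand-in for the API: two small pages shaped like the documented response
_FAKE_PAGES = {
    1: {"page": 1, "pages": 2, "entities": [
        {"dataset_id": "CA0010", "status": "Entity Added"},
        {"dataset_id": "CA0056", "status": "Entity Added"},
    ]},
    2: {"page": 2, "pages": 2, "entities": [
        {"dataset_id": "CA0010", "status": "Entity Added"},
    ]},
}


def fake_fetch(page):
    return _FAKE_PAGES[page]


# With a live client, the fetcher could instead be:
# fetch = lambda page: client.data.get_library_version_changes(
#     version="latest", page=page, size=100)

counts = Counter(entity.get("dataset_id") for entity in iter_all_entities(fake_fetch))
print(dict(counts))
```

The max_pages argument caps the scan, which is worth setting when the response reports hundreds of pages.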

Complete Workflow Example

Here's a complete workflow demonstrating how to explore a data asset from discovery to data preview:

import pandas as pd
from carbonarc import CarbonArcClient

# Initialize client
client = CarbonArcClient(
    host="https://api.carbonarc.co",
    token="YOUR_API_TOKEN"
)

TARGET_DATASET_ID = "CA0056"

# ─────────────────────────────────────────────────────────────────
# Step 1: Get dataset information and available topics
# ─────────────────────────────────────────────────────────────────
print("STEP 1: Dataset Information")
info = client.data.get_dataset_information(dataset_id=TARGET_DATASET_ID)

print(f"Name: {info.get('dataset_name')}")
# Guard against a missing description before slicing
print(f"Description: {(info.get('description') or '')[:100]}...")

topics = info.get('entity_topics', info.get('topics', []))
print(f"\nAvailable Topics ({len(topics)}):")
for topic in topics:
    print(f"  • {topic.get('entity_topic_label')} (ID: {topic.get('entity_topic_id')})")

# ─────────────────────────────────────────────────────────────────
# Step 2: Get data dictionary to understand columns
# ─────────────────────────────────────────────────────────────────
print("\nSTEP 2: Data Dictionary")

# Use first topic if available
if topics:
    first_topic_id = topics[0].get('entity_topic_id')
    data_dict = client.data.get_data_dictionary(
        dataset_id=TARGET_DATASET_ID,
        entity_topic_id=first_topic_id
    )
else:
    data_dict = client.data.get_data_dictionary(dataset_id=TARGET_DATASET_ID)

if isinstance(data_dict, list):
    print(f"Columns found: {len(data_dict)}")
    for col in data_dict[:10]:
        print(f"  • {col.get('column_name')}: {col.get('data_type')}")

# ─────────────────────────────────────────────────────────────────
# Step 3: Preview sample data
# ─────────────────────────────────────────────────────────────────
print("\nSTEP 3: Sample Data Preview")

sample = client.data.get_data_sample(dataset_id=TARGET_DATASET_ID)

if isinstance(sample, dict) and 'samples' in sample:
    samples = sample['samples']
    if isinstance(samples, list) and samples:
        df = pd.DataFrame(samples)
        print(f"Sample rows: {len(df)}")
        print(f"Columns: {list(df.columns)[:8]}...")
        print(df.head(5))

print("\nExploration complete!")

Common Dataset IDs

Here are some commonly used dataset IDs for reference. To get the full list of all available data assets, use client.data.get_datasets().

Dataset ID	Name	Type
`CA0056`	Credit Card – US Complete Panel	Wallet
`CA0028`	Credit Card – US Detailed Panel	Wallet
`CA0029`	POS - Convenience Stores	Wallet
`CA0034`	POS - Instore and Online	Wallet
`CA0030`	Clickstream	Attention
`CA0013`	Mobile App	Attention
`CA0054`	App Intelligence	Attention
`CA009`	Digital Advertising	Attention
`CA0049`	Medical & Pharmacy Open Claims	Balance Sheet
`CA0041`	Medicare Claims & Commercial Price Transparency	Balance Sheet
`CA0040`	Trade Claims	Logistics
`CA0025`	Freight Volume - North America	Logistics

Error Handling

Always wrap API calls in try-except blocks to handle potential errors gracefully:

try:
    dataset_info = client.data.get_dataset_information(dataset_id="CA0056")
    print(f"Dataset: {dataset_info.get('dataset_name')}")
except Exception as e:
    print(f"Error retrieving dataset information: {e}")
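
For transient failures such as network timeouts, a retry with exponential backoff is a common complement to the try-except pattern. This document does not list the exception types the client raises, so the sketch below catches Exception broadly, as the example above does. The helper call_with_retries is illustrative and not part of the carbonarc package.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on an exception, wait and retry with exponential backoff.

    The last failure is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)


# Stand-in for an API call that fails twice and then succeeds
calls = {"n": 0}


def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky_call, attempts=3, base_delay=0)
print(result)

# With a live client, the wrapped call could be:
# call_with_retries(lambda: client.data.get_dataset_information(dataset_id="CA0056"))
```

Retrying on every exception also repeats calls that can never succeed, such as a request with an invalid dataset ID, so narrowing the except clause is advisable once the client's error types are known.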

Next Steps

Once you've explored the Library and identified the data assets you need:

- Use the Ontology API to search for specific entities within data assets
- Build Frameworks to query and purchase data
- Set up Scheduled Deliveries for recurring data needs