Working with CDF Files


CDF (Common Data Format) is a self-describing binary format developed by NASA, widely used in space physics and heliophysics. A CDF file contains one or more variables, each accompanied by metadata attributes (units, description, fill value, etc.). Variables are independent arrays; the relationship between them (e.g., which variable is the time axis for another) is expressed through attributes rather than file structure. This playbook shows how to inspect an unfamiliar CDF file, identify the time variable, and plot numeric data against time.


Requirements

pip install cdflib astropy matplotlib numpy

Step 1: Inspect the file

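Open the CDF and list its contents without loading variable data into memory. The attribute-style access used below (info.zVariables, vi.Data_Type_Description) assumes cdflib 1.0 or later; earlier releases return plain dictionaries instead.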

import cdflib

path = "/path/to/file.cdf"   # local path; see "Working with remote files" for URLs

cdf  = cdflib.CDF(path)
info = cdf.cdf_info()

print("Variables :", info.zVariables)
print("Global attributes:", list(cdf.globalattsget()))   # names of the global attributes

Examine individual variables:

for var in info.zVariables:
    vi = cdf.varinq(var)
    va = cdf.varattsget(var)
    dtype = vi.Data_Type_Description
    shape = vi.Dim_Sizes
    units = va.get("UNITS", va.get("units", "—"))
    desc  = va.get("CATDESC", va.get("FIELDNAM", ""))
    print(f"  {var:30s} {dtype:30s} units={units!r:15s} {desc}")

Identifying the time variable: The time axis is usually a variable with data type CDF_EPOCH, CDF_EPOCH16, or CDF_TT2000 (reported by cdflib as CDF_TIME_TT2000), and by convention it is often named Epoch. Check vi.Data_Type_Description for these strings. In ISTP-compliant files, the DEPEND_0 attribute of a data variable names its time variable directly; failing that, look for VAR_TYPE = 'support_data' in combination with an epoch data type.
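
The type check described above can be sketched as a small helper. It works on a plain mapping of variable name to data-type string, so the name find_time_variables and the sample mapping are illustrative assumptions rather than part of cdflib; with cdflib you would build the mapping from cdf.varinq(var).Data_Type_Description. Substring matching is used because cdflib typically reports the TT2000 type as CDF_TIME_TT2000.

```python
# Markers that appear in the data-type descriptions of CDF time variables.
# "EPOCH" covers CDF_EPOCH and CDF_EPOCH16; "TT2000" covers CDF_TIME_TT2000.
EPOCH_MARKERS = ("EPOCH", "TT2000")


def find_time_variables(type_by_var):
    """Return the names of variables whose data type looks like a CDF epoch.

    type_by_var: dict mapping variable name -> data-type description string
    (hypothetical input; build it from cdf.varinq(var).Data_Type_Description).
    """
    return [
        name
        for name, dtype in type_by_var.items()
        if any(marker in str(dtype).upper() for marker in EPOCH_MARKERS)
    ]


# Illustrative mapping, not taken from a real file.
example_types = {
    "Epoch": "CDF_TIME_TT2000",
    "RATE": "CDF_REAL4",
    "LABEL": "CDF_CHAR",
}
time_vars = find_time_variables(example_types)
print(time_vars)
```

If the helper returns more than one name, the DEPEND_0 attribute of the variable you want to plot is the better guide to which time axis applies.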


Step 2: Load data and convert time

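cdflib.cdfepoch.to_datetime converts CDF epoch values to datetimes regardless of whether the epoch type is CDF_EPOCH, CDF_EPOCH16, or CDF_TT2000. Recent cdflib releases (1.x) return a NumPy datetime64 array rather than Python datetime objects; matplotlib plots either form directly.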

import numpy as np
import cdflib

cdf        = cdflib.CDF(path)
epoch_raw  = cdf.varget("Epoch")          # adjust variable name if needed
times      = np.array(cdflib.cdfepoch.to_datetime(epoch_raw))

# Load one or more numeric variables
rate = cdf.varget("RATE")                 # adjust to the variable you need
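
To make clear what the conversion does, the sketch below converts CDF_EPOCH values by hand. It relies on the CDF_EPOCH definition (milliseconds since 0000-01-01T00:00:00) and handles only that type; CDF_TT2000 involves leap seconds and is best left to cdflib. The helper name cdf_epoch_to_datetime64 is hypothetical, and fill values should be removed before calling it.

```python
import numpy as np

# CDF_EPOCH value (milliseconds since year 0) at the Unix epoch, 1970-01-01T00:00:00.
CDF_EPOCH_AT_UNIX_ZERO_MS = 62167219200000.0


def cdf_epoch_to_datetime64(epoch_ms):
    """Convert CDF_EPOCH values (float milliseconds) to datetime64[ms]."""
    unix_ms = np.asarray(epoch_ms, dtype=np.float64) - CDF_EPOCH_AT_UNIX_ZERO_MS
    # Round to whole milliseconds, then reinterpret as a millisecond timestamp.
    return np.round(unix_ms).astype("int64").astype("datetime64[ms]")


# 63113904000000.0 ms after year 0 is midnight on 2000-01-01.
example = cdf_epoch_to_datetime64([63113904000000.0])
print(example)
```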

Handling fill values: Many CDF variables define a fill value (FILLVAL attribute) that marks missing data. Replace fill values with NaN before plotting:

va   = cdf.varattsget("RATE")
fill = va.get("FILLVAL")
if fill is not None:
    rate = rate.astype(float)
    rate[rate == fill] = np.nan
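
Exact equality can miss fill values when the attribute and the data were stored at different precisions, for example a float32 variable whose FILLVAL reads back as a double. A tolerant comparison is one way around this. The sketch below assumes a numeric variable and a scalar (or one-element) fill value; mask_fill is an illustrative name, not a cdflib function.

```python
import numpy as np


def mask_fill(data, fill):
    """Return a float copy of data with entries close to fill set to NaN."""
    out = np.array(data, dtype=float)          # copy, promoted to float64
    if fill is None:
        return out
    fill = float(np.asarray(fill).ravel()[0])  # accept scalars or 1-element arrays
    out[np.isclose(out, fill)] = np.nan        # relative tolerance absorbs rounding
    return out


# float32 data: the fill value -1e31 is not exactly representable in float32,
# so comparing the promoted data with the Python float -1e31 using == finds nothing.
raw = np.array([1.5, -1.0e31, 3.0], dtype=np.float32)
clean = mask_fill(raw, -1.0e31)
print(clean)
```

The default tolerance of np.isclose is relative, so it scales with the magnitude of the fill value; tighten it if genuine data can lie very close to the fill value.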

Step 3: Plot

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(times, rate, lw=0.8)
ax.set_xlabel("Time (UTC)")
ax.set_ylabel(va.get("UNITS", ""))
ax.set_title(va.get("CATDESC", "RATE"))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
plt.tight_layout()
plt.show()   # in a non-interactive session, use fig.savefig("rate.png") instead

Working with remote files

Use astropy.utils.data.download_file to fetch a CDF from a URL and cache it locally so repeated runs avoid re-downloading:

from astropy.utils.data import download_file
import cdflib

url   = "https://..."
local = download_file(url, cache=True)
cdf   = cdflib.CDF(local)

Generic helper

The function below wraps the steps above into a reusable loader:

import numpy as np
import cdflib
from astropy.utils.data import download_file


def load_cdf(url, time_var="Epoch", value_vars=None):
    """
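    Load a CDF file from a URL or local path and return a dict of arrays.

    time_var  : name of the epoch variable (CDF_EPOCH, CDF_EPOCH16, or CDF_TT2000)
                to use as the time axis; its values are returned under "time".
    value_vars: list of variable names to load; if None, loads every zVariable
                except time_var. Numeric values are cast to float and fill
                values become NaN.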
    """
    local = download_file(url, cache=True) if url.startswith("http") else url
    cdf   = cdflib.CDF(local)
    info  = cdf.cdf_info()

    if value_vars is None:
        value_vars = [v for v in info.zVariables if v != time_var]

    epoch_raw = cdf.varget(time_var)
    times     = np.array(cdflib.cdfepoch.to_datetime(epoch_raw))

    result = {"time": times}
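    for var in value_vars:
        data = np.asarray(cdf.varget(var))
        if not np.issubdtype(data.dtype, np.number):
            # Strings and other non-numeric variables cannot be cast to float.
            result[var] = data
            continue
        data = data.astype(float)
        va   = cdf.varattsget(var)
        fill = va.get("FILLVAL")
        if fill is not None:
            data[data == fill] = np.nan
        result[var] = data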

    return result

Usage:

d = load_cdf("https://...", time_var="Epoch", value_vars=["FLUX", "ENERGY"])

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(d["time"], d["FLUX"])
plt.show()

Note on ISTP/SPDF conventions

Many heliophysics CDF files follow the ISTP metadata guidelines. Under these conventions:

  • Each time-varying data variable has a DEPEND_0 attribute naming its time variable.
  • UNITS holds the physical unit string.
  • FILLVAL marks bad/missing samples.
  • VALIDMIN / VALIDMAX give the expected physical range.
  • VAR_TYPE distinguishes plottable data ('data'), supporting variables such as time axes ('support_data'), and descriptive labels ('metadata').

You can use these attributes to automate axis labelling and fill-value masking without hardcoding variable names.
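
As a sketch of that automation, the helper below picks out plottable variables and their time axes from attribute dictionaries alone. It takes a mapping of variable name to attribute dict (with cdflib, something like {v: cdf.varattsget(v) for v in info.zVariables}); the function name plan_plots and the sample attributes are illustrative assumptions.

```python
def plan_plots(attrs_by_var):
    """List (variable, time_variable, y_label) for every plottable data variable.

    attrs_by_var: dict mapping variable name -> dict of its attributes.
    A variable qualifies if VAR_TYPE is 'data' and DEPEND_0 names a known variable.
    """
    plans = []
    for name, attrs in attrs_by_var.items():
        if str(attrs.get("VAR_TYPE", "")).strip().lower() != "data":
            continue
        time_var = attrs.get("DEPEND_0")
        if time_var not in attrs_by_var:
            continue  # no usable time axis declared
        units = str(attrs.get("UNITS", "")).strip()
        label = str(attrs.get("FIELDNAM", name))
        if units:
            label = f"{label} [{units}]"
        plans.append((name, time_var, label))
    return plans


# Illustrative attributes, not taken from a real file.
example_attrs = {
    "Epoch": {"VAR_TYPE": "support_data"},
    "RATE": {"VAR_TYPE": "data", "DEPEND_0": "Epoch",
             "UNITS": "counts/s", "FIELDNAM": "Count rate"},
    "Note": {"VAR_TYPE": "metadata"},
}
for var, time_var, label in plan_plots(example_attrs):
    print(f"{var} vs {time_var}: {label}")
```

Each tuple gives the variable to load, the epoch variable to convert for the x-axis, and a ready-made y-axis label.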


References