Working with CDF Files
CDF (Common Data Format) is a self-describing binary format developed by NASA, widely used in space physics and heliophysics. A CDF file contains one or more variables, each accompanied by metadata attributes (units, description, fill value, etc.). Variables are independent arrays; the relationship between them (e.g., which variable is the time axis for another) is expressed through attributes rather than file structure. This playbook shows how to inspect an unfamiliar CDF file, identify the time variable, and plot numeric data against time.
Requirements
pip install cdflib astropy matplotlib numpy requests
Step 1: Inspect the file
Open the CDF and list its contents without loading data into memory.
import cdflib
path = "/path/to/file.cdf" # or a local path downloaded from a URL
cdf = cdflib.CDF(path)
info = cdf.cdf_info()
print("Variables :", info.zVariables)
print("Global attributes:", list(info.Attributes)) # or info.Attributes depending on cdflib version
Examine individual variables:
for var in info.zVariables:
    vi = cdf.varinq(var)        # structural info: data type, dimensions
    va = cdf.varattsget(var)    # variable attributes as a dict
    dtype = vi.Data_Type_Description
    shape = vi.Dim_Sizes
    units = va.get("UNITS", va.get("units", "—"))
    desc = va.get("CATDESC", va.get("FIELDNAM", ""))
    print(f"  {var:30s} {dtype:30s} shape={shape!s:10s} units={units!r:15s} {desc}")
Identifying the time variable: The time axis is usually a variable with data type CDF_EPOCH, CDF_EPOCH16, or CDF_TIME_TT2000 (often shortened to TT2000). It is often named Epoch by convention. Check whether vi.Data_Type_Description contains "EPOCH" or "TT2000", or look for the VAR_TYPE = 'support_data' attribute in combination with an epoch data type.
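The check described above can be automated. The following is a minimal sketch, not part of cdflib: find_time_variables and EPOCH_MARKERS are hypothetical names, and the function only assumes that the object passed in offers cdf_info() and varinq() the way the cdf object in Step 1 does.

```python
# Substrings that mark a time-like CDF data type. Matching on substrings
# (an assumption of this sketch) tolerates naming differences such as
# "CDF_TIME_TT2000" versus "CDF_TT2000".
EPOCH_MARKERS = ("EPOCH", "TT2000")


def find_time_variables(cdf):
    """Return the names of zVariables whose data type looks like a CDF epoch.

    `cdf` is expected to behave like an open cdflib.CDF object, i.e. to
    provide cdf_info() and varinq() as used in Step 1.
    """
    names = []
    for name in cdf.cdf_info().zVariables:
        type_desc = str(cdf.varinq(name).Data_Type_Description).upper()
        if any(marker in type_desc for marker in EPOCH_MARKERS):
            names.append(name)
    return names
```

With a real file this would be called as find_time_variables(cdflib.CDF(path)). If it returns more than one name, the file probably carries several time axes, and the attributes of each data variable are the better guide to which one applies.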
Step 2: Load data and convert time
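cdflib.cdfepoch.to_datetime converts CDF epoch values to datetime values regardless of whether the epoch type is CDF_EPOCH, CDF_EPOCH16, or CDF_TIME_TT2000. Depending on the cdflib version, the result is a list of Python datetime objects or a NumPy datetime64 array; wrapping it in np.array, as below, gives an array that matplotlib can plot in either case.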
import numpy as np
import cdflib
cdf = cdflib.CDF(path)
epoch_raw = cdf.varget("Epoch") # adjust variable name if needed
times = np.array(cdflib.cdfepoch.to_datetime(epoch_raw))
# Load one or more numeric variables
rate = cdf.varget("RATE") # adjust to the variable you need
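For intuition about the raw numbers that varget returns for a time variable: a CDF_EPOCH value counts milliseconds since 0000-01-01T00:00:00. The sketch below converts a single such value by hand using only the standard library. The names cdf_epoch_to_datetime and CDF_EPOCH_AT_UNIX_ZERO_MS are hypothetical, and the code covers CDF_EPOCH only; CDF_EPOCH16 and TT2000 use different units and origins (TT2000 also accounts for leap seconds), so in practice the cdflib conversion above is the safer route.

```python
from datetime import datetime, timedelta

# Milliseconds between 0000-01-01T00:00:00 (the CDF_EPOCH origin) and
# 1970-01-01T00:00:00 (the Unix epoch): 719528 days * 86,400,000 ms.
CDF_EPOCH_AT_UNIX_ZERO_MS = 62_167_219_200_000


def cdf_epoch_to_datetime(ms):
    """Convert one CDF_EPOCH value (ms since year 0) to a naive UTC datetime."""
    # Python's datetime cannot represent year 0, so offset from 1970 instead.
    return datetime(1970, 1, 1) + timedelta(milliseconds=ms - CDF_EPOCH_AT_UNIX_ZERO_MS)
```

A quick sanity check is that the constant itself maps to midnight on 1 January 1970.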
Handling fill values: Many CDF variables define a fill value (FILLVAL attribute) that marks missing data. Replace fill values with NaN before plotting:
va = cdf.varattsget("RATE")
fill = va.get("FILLVAL")
if fill is not None:
    rate = rate.astype(float)
    rate[rate == fill] = np.nan
Step 3: Plot
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(times, rate, lw=0.8)
ax.set_xlabel("Time (UTC)")
ax.set_ylabel(va.get("UNITS", ""))
ax.set_title(va.get("CATDESC", "RATE"))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()
plt.tight_layout()
plt.show()  # in a non-interactive session, save instead, e.g. fig.savefig("rate.png")
Working with remote files
Use astropy.utils.data.download_file to fetch a CDF from a URL and cache it locally so repeated runs avoid re-downloading:
from astropy.utils.data import download_file
import cdflib
url = "https://..."
local = download_file(url, cache=True)
cdf = cdflib.CDF(local)
Generic helper
The function below wraps the steps above into a reusable loader:
import numpy as np
import cdflib
from astropy.utils.data import download_file
def load_cdf(url, time_var="Epoch", value_vars=None):
    """
    Load a CDF file from a URL (or local path) and return a dict of arrays.

    time_var  : name of the CDF_EPOCH/TT2000 variable to use as the time axis.
    value_vars: list of variable names to load; if None, loads all other
                numeric zVariables.
    """
    local = download_file(url, cache=True) if url.startswith("http") else url
    cdf = cdflib.CDF(local)
    info = cdf.cdf_info()
    if value_vars is None:
        value_vars = [v for v in info.zVariables if v != time_var]
    epoch_raw = cdf.varget(time_var)
    times = np.array(cdflib.cdfepoch.to_datetime(epoch_raw))
    result = {"time": times}
    for var in value_vars:
        raw = np.asarray(cdf.varget(var))
        if not np.issubdtype(raw.dtype, np.number):
            continue  # skip string/label variables, which cannot be cast to float
        data = raw.astype(float)
        va = cdf.varattsget(var)
        fill = va.get("FILLVAL")
        if fill is not None:
            data[data == fill] = np.nan  # mask missing samples
        result[var] = data
    return result
Usage:
d = load_cdf("https://...", time_var="Epoch", value_vars=["FLUX", "ENERGY"])
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(d["time"], d["FLUX"])
plt.show()  # in a non-interactive session, save instead, e.g. fig.savefig("flux.png")
Note on ISTP/SPDF conventions
Many heliophysics CDF files follow the ISTP metadata guidelines. Under these conventions:
- Each data variable has a DEPEND_0 attribute naming its time variable.
- UNITS holds the physical unit string.
- FILLVAL marks bad/missing samples.
- VALIDMIN/VALIDMAX give the expected physical range.
- VAR_TYPE distinguishes data ('data'), time axes ('support_data'), and metadata ('metadata').
You can use these attributes to automate axis labelling and fill-value masking without hardcoding variable names.
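As an illustration of that idea, the sketch below collects, for every variable marked VAR_TYPE = 'data', the time variable named by DEPEND_0 together with its units and fill value. The helper name describe_istp_layout is hypothetical, and the code assumes that varattsget returns a dict-like object, as in the earlier steps. Files that do not follow the ISTP guidelines may simply yield an empty result.

```python
def describe_istp_layout(cdf):
    """Map each VAR_TYPE == 'data' variable to its time axis, units and fill.

    `cdf` is expected to behave like an open cdflib.CDF object, i.e. to
    provide cdf_info() and varattsget().
    """
    layout = {}
    for name in cdf.cdf_info().zVariables:
        attrs = cdf.varattsget(name)
        # Skip support_data and metadata variables (and untyped ones).
        if str(attrs.get("VAR_TYPE", "")).strip().lower() != "data":
            continue
        layout[name] = {
            "time_var": attrs.get("DEPEND_0"),  # None if the attribute is absent
            "units": attrs.get("UNITS", ""),
            "fill": attrs.get("FILLVAL"),
        }
    return layout
```

Each entry then supplies what the plotting step needs: layout[name]["time_var"] can be passed to varget for the time axis, "units" can label the y-axis, and "fill" can drive the NaN masking.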