Skip to content

How to bring your own data

Resolve against your own entities and attach external data to an existing resolver — using Resolver.from_records and Resolver.augment.

Resolve against your own records

Resolver.from_records mints one entity per row and hands you a fully operational resolver over your data. No database, no server, no schema up front.

The resolved entity ID follows the pattern <domain>/<id>:

'custom/w1'
'custom/w1'
from resolvekit import Resolver

r = Resolver.from_records(
    [{"id": "w1", "label": "Widget", "sku": "abc"},
     {"id": "w2", "label": "Gadget", "sku": "xyz"}],
    domain="custom",     # zero-config standalone domain
    name="label",        # required: which column is the canonical name
    id="id",             # entity-ID seed column (sequential ints if omitted)
    codes=["sku"],       # code columns (list → system name == column name)
)
r.resolve("Widget").entity_id      # 'custom/w1'
r.entity(sku="abc").entity_id      # 'custom/w1'

Column mapping

Kwarg What it does Required?
name Column whose value becomes the canonical name Yes
id Column to use as the entity-ID seed; sequential ints if omitted No
codes List of column names to expose as code systems, or a {"system": "column"} dict No
aliases Column or list of columns holding extra resolvable aliases No
attrs Columns to attach as attributes; "rest" keeps all unlisted columns (default drops them) No
entity_type Column name to read the type from each row, or a literal string to stamp on every entity No
namespace Override the entity-ID prefix (defaults to domain) No

codes — list vs dict form. Pass a list when the column name is also the system name you want to query by:

codes=["sku"]           # r.entity(sku="abc")

Pass a dict to map an external system name onto a different column:

codes={"iso3": "country_code"}   # r.entity(iso3="KEN")

Note

Code lookup via entity() is case-insensitive — r.entity(sku="ABC") and r.entity(sku="abc") both match. Store code values in whatever case your data uses; resolvekit casefolds on lookup.

Accepted data forms

data (the first positional argument) accepts any of:

  • list[dict] — one dict per row
  • dict — mapping of id → record dict
  • A file path (string or Path) to a CSV, JSON, or JSONL file
  • A pandas.DataFrame (requires resolvekit[pandas])
  • A polars.DataFrame (requires resolvekit[polars])

Attach external data to an existing resolver

augment links rows from an external dataset onto a resolver's existing entities, attaching new codes, aliases, or attributes without rebuilding the resolver.

This example joins per-country GDP figures onto the bundled country entities by ISO-3 code. The base is a single-domain geo resolver (Resolver.lite()); rows are matched on iso3, and unmatched rows (here "ZZZ") are skipped:

linked=2 minted=0 skipped=1 ambiguous=0
113.4
from resolvekit import Resolver

base = Resolver.lite()   # single-domain geo resolver

report = base.augment(
    [{"iso3": "KEN", "gdp_usd_bn": 113.4},
     {"iso3": "UGA", "gdp_usd_bn": 49.3},
     {"iso3": "ZZZ", "gdp_usd_bn": 0.0}],   # ZZZ matches nothing -> skipped
    link_on=["iso3"],      # ordered code systems (or "name") to match on
    add_attrs=["gdp_usd_bn"],  # columns to attach to linked entities
    on_miss="skip",        # "skip" (default) | "mint" | "error"
    return_report=True,    # return AugmentResult; omit to get the resolver directly
)
report.linked, report.minted, report.skipped   # (2, 0, 1)
report.resolver.entity("Kenya").attributes["gdp_usd_bn"]   # 113.4

Matching is case-insensitive on code values, so "KEN", "ken", and "Ken" all link to the same entity.

Augment parameters

Kwarg What it does
link_on Ordered list of code systems to match on, or "name" for canonical-name matching
columns {"role_or_system": "column"} rename map — use when incoming column names differ from the system names in link_on
add_codes Columns to register as new code systems on matched entities
add_aliases Columns to add as resolvable aliases on matched entities
add_attrs Columns to attach as attributes on matched entities
on_miss What to do when a row doesn't match: "skip" (default), "mint" (create a new entity), or "error"
return_report If True, returns an AugmentResult instead of the resolver directly

AugmentResult fields: resolver, linked, minted, skipped, ambiguous, errors.

link_on="name". Canonical-name matching needs at least one of add_aliases or add_codes so augment can identify which column holds the name to match.

on_miss="mint". Rows that don't link to an existing entity are minted as new entities in the same domain. Their count appears in report.minted.

Heads up

augment requires a single-domain base resolver — pass Resolver.lite(), Resolver.auto(domains=["geo"]), or a from_records base, not a multi-domain Resolver.auto(). link_on accepts code systems (iso3, dcid, wikidata, …) and the "name" sentinel. Code-based linking normalises values case-insensitively (so "FRA", "fra", and "Fra" all link to the same entity). Name-based linking (link_on=["name"]) requires at least one of add_aliases or add_codes to identify the name column, and also links case-insensitively.

Next