How to bring your own data¶
Resolve against your own entities and attach external data to an existing resolver —
using Resolver.from_records and Resolver.augment.
Resolve against your own records¶
Resolver.from_records mints one entity per row and hands you a fully operational
resolver over your data. No database, no server, no schema up front.
The resolved entity ID follows the pattern <domain>/<id>:
from resolvekit import Resolver
r = Resolver.from_records(
[{"id": "w1", "label": "Widget", "sku": "abc"},
{"id": "w2", "label": "Gadget", "sku": "xyz"}],
domain="custom", # zero-config standalone domain
name="label", # required: which column is the canonical name
id="id", # entity-ID seed column (sequential ints if omitted)
codes=["sku"], # code columns (list → system name == column name)
)
r.resolve("Widget").entity_id # 'custom/w1'
r.entity(sku="abc").entity_id # 'custom/w1'
Column mapping¶
| Kwarg | What it does | Required? |
|---|---|---|
name |
Column whose value becomes the canonical name | Yes |
id |
Column to use as the entity-ID seed; sequential ints if omitted | No |
codes |
List of column names to expose as code systems, or a {"system": "column"} dict |
No |
aliases |
Column or list of columns holding extra resolvable aliases | No |
attrs |
Columns to attach as attributes; "rest" keeps all unlisted columns (default drops them) |
No |
entity_type |
Column name to read the type from each row, or a literal string to stamp on every entity | No |
namespace |
Override the entity-ID prefix (defaults to domain) |
No |
codes — list vs dict form. Pass a list when the column name is also the system
name you want to query by:
Pass a dict to map an external system name onto a different column:
Note
Code lookup via entity() is case-insensitive — r.entity(sku="ABC") and
r.entity(sku="abc") both match. Store code values in whatever case your data
uses; resolvekit casefolds on lookup.
Accepted data forms¶
data (the first positional argument) accepts any of:
list[dict]— one dict per rowdict— mapping of id → record dict- A file path (string or
Path) to a CSV, JSON, or JSONL file - A
pandas.DataFrame(requiresresolvekit[pandas]) - A
polars.DataFrame(requiresresolvekit[polars])
Attach external data to an existing resolver¶
augment links rows from an external dataset onto a resolver's existing entities,
attaching new codes, aliases, or attributes without rebuilding the resolver.
This example joins per-country GDP figures onto the bundled country entities by ISO-3
code. The base is a single-domain geo resolver (Resolver.lite()); rows are matched on
iso3, and unmatched rows (here "ZZZ") are skipped:
from resolvekit import Resolver
base = Resolver.lite() # single-domain geo resolver
report = base.augment(
[{"iso3": "KEN", "gdp_usd_bn": 113.4},
{"iso3": "UGA", "gdp_usd_bn": 49.3},
{"iso3": "ZZZ", "gdp_usd_bn": 0.0}], # ZZZ matches nothing -> skipped
link_on=["iso3"], # ordered code systems (or "name") to match on
add_attrs=["gdp_usd_bn"], # columns to attach to linked entities
on_miss="skip", # "skip" (default) | "mint" | "error"
return_report=True, # return AugmentResult; omit to get the resolver directly
)
report.linked, report.minted, report.skipped # (2, 0, 1)
report.resolver.entity("Kenya").attributes["gdp_usd_bn"] # 113.4
Matching is case-insensitive on code values, so "KEN", "ken", and "Ken" all link
to the same entity.
Augment parameters¶
| Kwarg | What it does |
|---|---|
link_on |
Ordered list of code systems to match on, or "name" for canonical-name matching |
columns |
{"role_or_system": "column"} rename map — use when incoming column names differ from the system names in link_on |
add_codes |
Columns to register as new code systems on matched entities |
add_aliases |
Columns to add as resolvable aliases on matched entities |
add_attrs |
Columns to attach as attributes on matched entities |
on_miss |
What to do when a row doesn't match: "skip" (default), "mint" (create a new entity), or "error" |
return_report |
If True, returns an AugmentResult instead of the resolver directly |
AugmentResult fields: resolver, linked, minted, skipped, ambiguous, errors.
link_on="name". Canonical-name matching needs at least one of add_aliases or
add_codes so augment can identify which column holds the name to match.
on_miss="mint". Rows that don't link to an existing entity are minted as new
entities in the same domain. Their count appears in report.minted.
Heads up
augment requires a single-domain base resolver — pass Resolver.lite(),
Resolver.auto(domains=["geo"]), or a from_records base, not a multi-domain
Resolver.auto(). link_on accepts code systems (iso3, dcid, wikidata, …) and
the "name" sentinel. Code-based linking normalises values case-insensitively (so "FRA",
"fra", and "Fra" all link to the same entity). Name-based linking (link_on=["name"])
requires at least one of add_aliases or add_codes to identify the name column, and also
links case-insensitively.
Next¶
- Resolver reference — full parameter listings for
from_recordsandaugment. - API reference — module-level functions and the
AugmentResulttype.