Hanzi Timeline

Implementation Notes

This app shows modern, bronze, seal, and oracle forms with stable IDs and asset-backed rendering. It does not depend on historical-script Unicode coverage.

Data Sources

Modern form: Unicode CJK character.
Oracle forms: JiaGuWen SQLite + oracle image assets.
Bronze + seal forms: EVOBC metadata + local EVOBC image corpus.
Origin summary text: English Wiktionary Chinese “Glyph origin” section.
Origin references per character: Academia Sinica Xiaoxue + Academia Sinica CharDB + Wiktionary links.
EVOBC download source: https://figshare.com/s/ce2cf55b35a2f8ecc4c6

Current Coverage

Generated runtime records: 793
Oracle: 1602 variants across 793 records
Bronze: 19647 variants across 527 records
Seal: 1497 variants across 728 records
Variant drawers are available in the UI for each stage (Bronze, Seal, Oracle Bone).

Canonical Record Model

Each record is keyed by modern character and codepoint and carries one row per stage:

{
  id,
  modernChar,
  modernCodepoint,
  dataset,
  stages: [{ stageName, glyphId, assetType, assetRef }],
  variants: { bronze: [...], seal: [...], oracle: [...] },
  origin: { summary, source, sourceUrl, license, confidence },
  originReferences: [{ id, label, url }]
}

`glyphId` is the canonical identifier for a glyph. Private Use Area (PUA) codepoints are never used as the source of truth in the database.
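The shape above could be typed roughly as follows. This is a sketch only: every field type is an assumption inferred from the field names, the `"U+XXXX"` codepoint notation is assumed, and `toCodepoint` is a hypothetical helper that is not part of the app.

```typescript
// Sketch of the record shape. All types are assumptions inferred from
// the field names in the model above, not taken from the app's source.
type HistoricalStage = "bronze" | "seal" | "oracle";

interface StageRow {
  stageName: string; // e.g. "modern", "bronze", "seal", "oracle"
  glyphId: string;   // canonical identifier for the glyph
  assetType: string; // assumed values such as "text" or "svg"
  assetRef: string;  // Unicode text or a reference to a static asset
}

interface Origin {
  summary: string;
  source: string;
  sourceUrl: string;
  license: string;
  confidence: number; // assumed numeric score
}

interface OriginReference {
  id: string;
  label: string;
  url: string;
}

interface EvolutionRecord {
  id: string;
  modernChar: string;
  modernCodepoint: string; // assumed "U+XXXX" notation
  dataset: string;
  stages: StageRow[];
  // Assumption: variant entries share the stage-row shape.
  variants: Record<HistoricalStage, StageRow[]>;
  // Assumed nullable, since not every record has an origin summary.
  origin: Origin | null;
  originReferences: OriginReference[];
}

// Hypothetical helper: format a character's first codepoint as "U+XXXX".
function toCodepoint(ch: string): string {
  const cp = ch.codePointAt(0);
  if (cp === undefined) {
    throw new Error("expected a non-empty string");
  }
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}
```

Using `codePointAt` rather than `charCodeAt` keeps the helper correct for characters outside the Basic Multilingual Plane, which matters for rarer CJK extension characters.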

Ingest and Build Pipeline

Download and extract the EVOBC image corpus (use the extracted `Data-EN` directory as the image root).
Extract rows from JiaGuWen DB and group by modern character.
Select subset or full JiaGuWen source (`--target-records=500` default, or `--target-records=all`).
Append EVOBC bronze/seal rows for matching modern characters.
Write normalized NDJSON rows to `data/raw/evolution-rows.ndjson`.
Vectorize oracle JPGs into SVG in one batch.
Vectorize EVOBC bronze/seal rasters into SVG in one batch.
Enrich lexical fields from Unihan (`meaning`, `pinyin`, radical/strokes).
Enrich origin summaries from Wiktionary (`originSummary` + citations).
Build generated records to `data/evolution-records.generated.json`.
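The grouping, subset selection, and NDJSON steps above might look something like the sketch below. The `RawRow` shape and all three function names are hypothetical, and taking the first N groups is only one plausible reading of `--target-records`; the actual selection rule is not described here.

```typescript
// Hypothetical row shape; the real JiaGuWen and EVOBC columns are not shown here.
interface RawRow {
  modernChar: string;
  stageName: string;
  glyphId: string;
}

// Group extracted rows by modern character, preserving first-seen order.
function groupByModernChar(rows: RawRow[]): Map<string, RawRow[]> {
  const groups = new Map<string, RawRow[]>();
  for (const row of rows) {
    const bucket = groups.get(row.modernChar);
    if (bucket) {
      bucket.push(row);
    } else {
      groups.set(row.modernChar, [row]);
    }
  }
  return groups;
}

// Keep the first `target` groups, or every group when target is "all".
// Assumption: "first N" stands in for whatever rule the importer really uses.
function selectGroups(
  groups: Map<string, RawRow[]>,
  target: number | "all" = 500
): Map<string, RawRow[]> {
  if (target === "all") {
    return groups;
  }
  return new Map(Array.from(groups.entries()).slice(0, target));
}

// Serialize rows as NDJSON: one JSON object per line, each newline-terminated.
function toNdjson(rows: RawRow[]): string {
  return rows.map((row) => JSON.stringify(row) + "\n").join("");
}
```

Terminating every line (rather than joining with newlines) means an empty input produces an empty file and files can be concatenated safely.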

Lexical Metadata Status

`meaning` records populated: 758
`pinyin` records populated: 793
Source: Unihan `kDefinition` + `kMandarin`.

Historical Origin Metadata

`origin.summary` records populated: 509
Source extractor: English Wiktionary Chinese Glyph-origin section.
Stored with source URL, license label, and confidence score.
Every record also carries direct Xiaoxue + CharDB + Wiktionary lookup links.

Origin Source Acronyms

`WK`: English Wiktionary (Chinese section) - https://en.wiktionary.org/wiki/一#Chinese
`XS`: Academia Sinica Xiaoxue - https://xiaoxue.iis.sinica.edu.tw/yanbian?char=一
`CDB`: Academia Sinica CharDB - https://chardb.iis.sinica.edu.tw/search.jsp?char=一
The UI shows acronyms to stay compact; the full names are listed on this page.
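As an illustration, the three lookup links could be generated per character from the URL patterns listed above. `buildOriginReferences` is a hypothetical name, and percent-encoding the character is an assumption (the examples above show the raw character, which browsers encode equivalently).

```typescript
interface OriginReference {
  id: string;
  label: string;
  url: string;
}

// Hypothetical helper: build the WK / XS / CDB lookup links for one character.
function buildOriginReferences(modernChar: string): OriginReference[] {
  // Percent-encode so the URL is safe to embed in attributes and logs.
  const c = encodeURIComponent(modernChar);
  return [
    {
      id: "WK",
      label: "English Wiktionary (Chinese section)",
      url: `https://en.wiktionary.org/wiki/${c}#Chinese`,
    },
    {
      id: "XS",
      label: "Academia Sinica Xiaoxue",
      url: `https://xiaoxue.iis.sinica.edu.tw/yanbian?char=${c}`,
    },
    {
      id: "CDB",
      label: "Academia Sinica CharDB",
      url: `https://chardb.iis.sinica.edu.tw/search.jsp?char=${c}`,
    },
  ];
}
```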

Asset Strategy

Bronze, seal, and oracle glyphs are rendered from static SVG assets.
Asset URLs are cache-busted during the data build with a token derived from each file's mtime (`?v=...`), so browsers do not serve stale assets.
The modern stage uses Unicode text rendering.
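A minimal sketch of the cache-busting step, assuming the token is simply the file's mtime in milliseconds encoded in base 36. The exact token format used by the build is not stated here, and `withCacheToken` is a hypothetical name; a real build would read the mtime from the file system (for example via `fs.statSync(path).mtimeMs` in Node).

```typescript
// Hypothetical helper: append a cache-busting token derived from a file mtime.
// Assumption: the token is the mtime in whole milliseconds, base-36 encoded.
function withCacheToken(assetUrl: string, mtimeMs: number): string {
  const token = Math.floor(mtimeMs).toString(36);
  // Respect an existing query string if the URL already has one.
  const separator = assetUrl.includes("?") ? "&" : "?";
  return `${assetUrl}${separator}v=${token}`;
}
```

Because the token changes only when the file is rewritten, unchanged assets keep the same URL and stay cached across builds.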

Search Behavior

Primary matching: modern character and modern Unicode codepoint.
Secondary scoring: record IDs and glyph IDs.
Pinyin query mode: exact full-pinyin match, tone-insensitive.
English meaning is not used as a search key.
Search runs on IDs/metadata, not rendered oracle glyph strings.
Origin summary text is currently display-only (not used as a search key).
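The matching rules above can be sketched as below. The `SearchableRecord` shape, the function names, and the score values are all illustrative assumptions; only the ordering (character and codepoint first, IDs second) and the exact, tone-insensitive pinyin match come from the rules above.

```typescript
// Hypothetical flattened view of a record for search purposes.
interface SearchableRecord {
  id: string;
  modernChar: string;
  modernCodepoint: string; // assumed "U+XXXX" notation
  glyphIds: string[];
  pinyin: string[];
}

// Remove tone marks (grave, acute, macron, caron) and tone digits,
// but keep the diaeresis so "lü" and "lu" stay distinct.
function normalizePinyin(input: string): string {
  return input
    .trim()
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300\u0301\u0304\u030c]/g, "")
    .normalize("NFC")
    .replace(/[1-5]/g, "");
}

// Pinyin mode: the whole syllable must match, ignoring tone.
function matchesPinyin(record: SearchableRecord, query: string): boolean {
  const wanted = normalizePinyin(query);
  if (wanted === "") return false;
  return record.pinyin.some((p) => normalizePinyin(p) === wanted);
}

// Primary match on character or codepoint, secondary match on IDs.
// The numeric weights are arbitrary and only express the ordering.
function scoreRecord(record: SearchableRecord, query: string): number {
  const q = query.trim();
  if (q === "") return 0;
  if (q === record.modernChar) return 100;
  if (q.toUpperCase() === record.modernCodepoint.toUpperCase()) return 100;
  const lower = q.toLowerCase();
  const idHit =
    record.id.toLowerCase().includes(lower) ||
    record.glyphIds.some((g) => g.toLowerCase().includes(lower));
  return idHit ? 10 : 0;
}
```

Stripping only the four tone diacritics, instead of every combining mark, is a deliberate choice here: a blanket strip would collapse `ü` into `u`.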

Deep Links

URL state is shareable via query params: `q`, `char`, `theme`, `lang`, `variants`.
`variants` supports a comma list of open variant drawers, e.g. `bronze,oracle`.
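For illustration, the query-param state could be parsed and serialized with the standard `URLSearchParams` API. The `ViewState` type and function names are hypothetical, and the accepted drawer names (`bronze`, `seal`, `oracle`) are assumed from the stage list; only `bronze,oracle` appears in the example above.

```typescript
type VariantStage = "bronze" | "seal" | "oracle";
// Assumed set of valid drawer names.
const VARIANT_STAGES: string[] = ["bronze", "seal", "oracle"];

interface ViewState {
  q: string;
  char: string;
  theme: string;
  lang: string;
  variants: VariantStage[];
}

// Parse a location.search-style string into view state, dropping unknown drawers.
function parseViewState(search: string): ViewState {
  const params = new URLSearchParams(search);
  const variants = (params.get("variants") ?? "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    .filter((name): name is VariantStage => VARIANT_STAGES.includes(name));
  return {
    q: params.get("q") ?? "",
    char: params.get("char") ?? "",
    theme: params.get("theme") ?? "",
    lang: params.get("lang") ?? "",
    variants,
  };
}

// Serialize view state back to a query string, omitting empty values.
function serializeViewState(state: ViewState): string {
  const params = new URLSearchParams();
  if (state.q) params.set("q", state.q);
  if (state.char) params.set("char", state.char);
  if (state.theme) params.set("theme", state.theme);
  if (state.lang) params.set("lang", state.lang);
  if (state.variants.length > 0) params.set("variants", state.variants.join(","));
  return params.toString();
}
```

Note that `URLSearchParams` percent-encodes the comma on output; parsing decodes it again, so a link survives a round trip unchanged in meaning.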

Commands

npm run data:import:multistage:full -- --skip-download --evobc-image-root=/path/to/Data-EN
npm run data:import:multistage:full
npm run data:import:multistage
npm run data:import:full
npm run data:import:e2e
npm run data:enrich:origins
npm run data:build
npm run dev
