Data Schema Reference

A condensed one-page reference for developers. Each section links to the authoritative architecture doc where the concern is discussed in depth.


MQTT Topics

Sensor → ingester (raw, per-source)

wesense/v2/wifi/{country}/{subdivision}/{device_id}          # WeSense ESP32 over WiFi, protobuf
wesense/v2/lora/{device_id}                                   # WeSense LoRa readings, protobuf
wesense/v2/lora/metadata/{device_id}                          # LoRa device metadata

The payload on the WiFi topic is the SensorArrayReading protobuf, which carries multiple readings per message (one reading per chip). The full protobuf definition and the Zenoh key expressions are documented in Topic Structure.
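As a rough illustration of the raw topic layout, the sketch below splits a topic string into its parts. The function name parse_raw_topic and the keys of the returned dictionary are hypothetical and not part of any WeSense library; only the three topic patterns come from this page.

```python
def parse_raw_topic(topic: str) -> dict:
    """Split a raw sensor topic into its components (illustrative only)."""
    parts = topic.split("/")
    if parts[:2] != ["wesense", "v2"]:
        raise ValueError(f"not a wesense/v2 topic: {topic!r}")
    rest = parts[2:]

    # wesense/v2/wifi/{country}/{subdivision}/{device_id}
    if len(rest) == 4 and rest[0] == "wifi":
        return {
            "transport": "wifi",
            "kind": "reading",
            "country": rest[1],
            "subdivision": rest[2],
            "device_id": rest[3],
        }

    # wesense/v2/lora/metadata/{device_id}
    if len(rest) == 3 and rest[:2] == ["lora", "metadata"]:
        return {"transport": "lora", "kind": "metadata", "device_id": rest[2]}

    # wesense/v2/lora/{device_id}
    if len(rest) == 2 and rest[0] == "lora":
        return {"transport": "lora", "kind": "reading", "device_id": rest[1]}

    raise ValueError(f"unrecognised raw topic: {topic!r}")
```

Note that only the WiFi topic carries country and subdivision in the topic itself; for LoRa topics the sketch returns the device ID alone.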

Ingester → network (decoded, per-source)

wesense/decoded/{source}/{country}/{subdivision}/{device_id}

The payload is a decoded JSON document, one per reading type. Subscribers include the map, archive workflows, and the live P2P transport. {source} identifies the data source, for example wesense, meshtastic, homeassistant, or govaq.
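A subscriber that only wants part of the decoded stream can use the standard MQTT single-level wildcard (+) in any position of this topic. The sketch below builds such a filter and shows which topics it would select. Both helper names are hypothetical, and the matcher handles only the + wildcard; a real MQTT client does the matching on the broker side.

```python
def decoded_filter(source="+", country="+", subdivision="+", device_id="+") -> str:
    """Build an MQTT topic filter for decoded readings; '+' matches any one level."""
    return f"wesense/decoded/{source}/{country}/{subdivision}/{device_id}"


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if topic matches a filter that uses only the '+' wildcard."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    if len(f_parts) != len(t_parts):
        return False
    return all(f == "+" or f == t for f, t in zip(f_parts, t_parts))


# All decoded readings from New Zealand, regardless of source or device.
nz_filter = decoded_filter(country="nz")
```

Here nz_filter is wesense/decoded/+/nz/+/+, which selects every source, subdivision, and device under the nz country code.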

Registries (OrbitDB)

The wesense.nodes and wesense.trust OrbitDB stores hold network identity and trust records. See Topic Structure → Node/Trust Registration.


ClickHouse — sensor_readings table

Primary analytical table. Canonical SQL lives in wesense-clickhouse-live/create_sensor_readings.sql; migrations in the same repo's migrations/ folder.

  • Engine: ReplacingMergeTree(timestamp). The most recent version of each (device_id, reading_type, timestamp) key wins on merge.
  • Partition: toYYYYMM(timestamp), one partition per month.
  • Order: (device_id, reading_type, timestamp).
  • TTL: 3 years on timestamp.
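To make these settings concrete, here is a small pure-Python emulation of the partition key and the merge-time deduplication. It is not ClickHouse code, and the helper names are hypothetical. It assumes that when several rows share the full ordering key (and therefore the same timestamp version), the most recently inserted row survives the merge.

```python
from datetime import datetime, timezone


def partition_id(ts: datetime) -> int:
    """Mimic toYYYYMM(timestamp): 2024-03-15 becomes 202403."""
    return ts.year * 100 + ts.month


def merge_rows(rows):
    """Keep one row per (device_id, reading_type, timestamp); the last insert wins."""
    merged = {}
    for row in rows:  # rows are assumed to be in insertion order
        key = (row["device_id"], row["reading_type"], row["timestamp"])
        merged[key] = row  # a later row with the same key replaces the earlier one
    return list(merged.values())


ts = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
rows = [
    {"device_id": "dev1", "reading_type": "temperature", "timestamp": ts, "value": 21.4},
    {"device_id": "dev1", "reading_type": "humidity", "timestamp": ts, "value": 55.0},
    {"device_id": "dev1", "reading_type": "temperature", "timestamp": ts, "value": 21.5},
]
deduplicated = merge_rows(rows)
```

ClickHouse runs merges in the background, so duplicate keys can remain visible to queries until a merge has happened.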

Key columns

| Column | Type | Notes |
| --- | --- | --- |
| timestamp | DateTime64(3, 'UTC') | Sensor timestamp, millisecond precision |
| device_id | String | Unique per sensor, usually the MAC-derived ID |
| data_source | LowCardinality(String) | Lowercase source token (wesense, meshtastic, …) |
| data_source_name | LowCardinality(String) | Human-readable display name for the source |
| network_source | LowCardinality(String) | How the sensor is connected: wifi, lora, meshtastic, … |
| ingestion_node_id | LowCardinality(String) | Which station ingested this reading |
| reading_type | LowCardinality(String) | Canonical type token: temperature, humidity, co2, pm2_5, voc, nox, pressure, … |
| reading_type_name | LowCardinality(String) | Human-readable display name (e.g. PM2.5 for pm2_5) |
| value | Float64 | Reading value in canonical units |
| unit | LowCardinality(String) | Unit string (e.g. °C, %, ppm, µg/m³) |
| sample_count | UInt16 | Samples aggregated into this reading |
| sample_interval_avg | UInt16 | Average sample interval in seconds |
| value_min, value_max | Float64 | Min/max across aggregated samples |
| latitude, longitude | Float64 | WGS84, device-reported or geocoded |
| altitude | Nullable(Float32) | Metres above sea level when known |
| geo_country | LowCardinality(String) | ISO 3166-1 alpha-2 (lowercase): nz, au, gb |
| geo_subdivision | LowCardinality(String) | ISO 3166-2 (lowercase): auk, wko, eng |
| geo_h3_res8 | UInt64 | H3 index at resolution 8 for fast spatial queries |
| sensor_model, board_model | LowCardinality(String) | Hardware identifiers |
| calibration_status | LowCardinality(String) | unknown, factory, user_calibrated, … |
| data_quality_flag | LowCardinality(String) | Default unvalidated; see Data Quality |
| deployment_type | LowCardinality(String) | indoor, outdoor, unknown (classifier output) |
| deployment_type_source | LowCardinality(String) | How deployment_type was determined |
| transport_type | LowCardinality(String) | mqtt, lora, webhook, … |
| location_source | LowCardinality(String) | How the lat/lon was established |
| firmware_version | Nullable(String) | Reporting firmware version |
| deployment_location, node_name | Nullable(String) | Owner-set labels |
| node_info, node_info_url | Nullable(String) | Optional owner metadata |
| signature | String | Ed25519 signature (hex); see Data Integrity |
| public_key | LowCardinality(String) | Ed25519 public key (base64), stored per row so archives are self-contained |
| ingester_id | LowCardinality(String) | Signing ingester identifier (wsi_xxxxxxxx) |
| key_version | UInt32 | Signing key version |
| signing_payload_version | UInt16 | Which version of the canonical signing schema was used |
| received_via | LowCardinality(String) | local if ingested here, p2p if synced from another station |
| data_license | LowCardinality(String) | SPDX identifier, default CC-BY-4.0; see Data Licensing |

See Storage & Archives for partitioning strategy, archive export, deduplication, and the 200-year aggregation roadmap.


Reading ID

A content-based deduplication identifier used at the archive / IPFS layer. The same physical measurement always produces the same ID, regardless of which station received it.

python
from hashlib import sha256  # sha256() hashes bytes, so the string is encoded first
reading_id = sha256(f"{device_id}|{sensor_timestamp}|{reading_type}|{value}".encode()).hexdigest()[:32]

Implementation: wesense-ingester-core/wesense_ingester/ids/reading_id.py. Note that sensor_timestamp is the sensor's own timestamp in Unix seconds, not the receive time.

Reading IDs are used by archives (one Parquet row per reading ID) and by the P2P replication layer for deduplication; ClickHouse itself deduplicates via the ReplacingMergeTree on the (device_id, reading_type, timestamp) key.
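The following sketch shows how two copies of the same measurement, received by different stations, collapse to one entry when keyed by reading ID. It is a simplified illustration rather than the code in reading_id.py: the function names and row layout are hypothetical, and it assumes value is interpolated with Python's default string formatting, so an integer 21 and a float 21.0 would hash differently.

```python
import hashlib


def compute_reading_id(device_id, sensor_timestamp, reading_type, value) -> str:
    """Hash the measurement's identity fields; receive-side fields are excluded."""
    payload = f"{device_id}|{sensor_timestamp}|{reading_type}|{value}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]


def dedupe_by_reading_id(rows) -> dict:
    """Map reading_id to the first row seen with that ID."""
    unique = {}
    for row in rows:
        rid = compute_reading_id(
            row["device_id"], row["sensor_timestamp"], row["reading_type"], row["value"]
        )
        unique.setdefault(rid, row)  # later copies of the same reading are dropped
    return unique


rows = [
    {"device_id": "dev1", "sensor_timestamp": 1710504000, "reading_type": "co2",
     "value": 612.0, "received_via": "local"},
    {"device_id": "dev1", "sensor_timestamp": 1710504000, "reading_type": "co2",
     "value": 612.0, "received_via": "p2p"},
]
unique = dedupe_by_reading_id(rows)
```

Because received_via is not part of the hashed payload, the local and p2p copies produce the same ID and only one row is kept.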


Geocoding

WeSense uses ISO 3166 for all location codes, lowercased throughout:

  • geo_country: ISO 3166-1 alpha-2. Examples: nz, au, gb, us, de.
  • geo_subdivision: ISO 3166-2 region code (without the country prefix). Examples: auk (Auckland, NZ-AUK), wko (Waikato, NZ-WKO), eng (England, GB-ENG).

The mapping table used by all ingesters lives in wesense-ingester-core/wesense_ingester/geocoding/iso3166.py. Reverse geocoding from lat/lon uses GeoNames with an LRU cache.
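As an illustration of the code convention described in this section, the sketch below turns a full ISO 3166-2 code such as NZ-AUK into the lowercase (geo_country, geo_subdivision) pair. The function name is hypothetical and this is not the mapping logic in iso3166.py; it only demonstrates the prefix stripping and lowercasing.

```python
def split_iso3166_2(code: str) -> tuple[str, str]:
    """Split 'NZ-AUK' into ('nz', 'auk'); raise ValueError on malformed input."""
    country, sep, subdivision = code.strip().partition("-")
    if not sep or len(country) != 2 or not subdivision:
        raise ValueError(f"expected an ISO 3166-2 code like 'NZ-AUK', got {code!r}")
    return country.lower(), subdivision.lower()


geo_country, geo_subdivision = split_iso3166_2("NZ-AUK")
```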


Parquet archive schema

Daily archives are exported as Parquet files keyed by content-addressed CID. Column layout mirrors sensor_readings with two additions:

  • Each row is keyed by reading_id for deduplication across replicas.
  • Compression is ZSTD (benchmarked as optimal — see Phase 5 roadmap).

Archives are partitioned by (country, subdivision, date). Full scheme: Storage & Archives → Long-Term Archival Strategy.
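The partitioning can be pictured as a grouping step over rows before export. The sketch below is a minimal illustration with hypothetical helper names; it assumes the date component is the UTC calendar day of the row's timestamp, which this page does not state explicitly, and it does not write any Parquet files.

```python
from collections import defaultdict
from datetime import datetime, timezone


def archive_partition(row) -> tuple:
    """Return (country, subdivision, date) for a row with a timezone-aware timestamp."""
    # Assumption: archive days are UTC calendar days.
    day = row["timestamp"].astimezone(timezone.utc).date().isoformat()
    return (row["geo_country"], row["geo_subdivision"], day)


def group_for_archive(rows) -> dict:
    """Group rows by archive partition, preserving input order within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[archive_partition(row)].append(row)
    return dict(groups)


rows = [
    {"geo_country": "nz", "geo_subdivision": "auk",
     "timestamp": datetime(2024, 3, 15, 23, 30, tzinfo=timezone.utc)},
    {"geo_country": "nz", "geo_subdivision": "auk",
     "timestamp": datetime(2024, 3, 16, 0, 10, tzinfo=timezone.utc)},
    {"geo_country": "nz", "geo_subdivision": "wko",
     "timestamp": datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)},
]
groups = group_for_archive(rows)
```

In this example the two Auckland rows fall into different partitions because they straddle midnight UTC.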


Signing and trust

Every reading is signed with the ingester's Ed25519 key before leaving the station. The signature and the public key are stored per row so any archive is self-verifiable without external lookup. Trust semantics, key rotation, and revocation are covered in Data Integrity and Governance & Trust.
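Before attempting real verification, a consumer can cheaply check that the per-row fields are well formed. The sketch below decodes the hex signature and base64 public key and checks their lengths: Ed25519 signatures are 64 bytes and raw public keys are 32 bytes. It assumes the key is stored as a raw 32-byte value rather than a wrapped encoding, and the function name is hypothetical. It does not verify the signature; that requires an Ed25519 library and the canonical signing payload described in Data Integrity.

```python
import base64
import binascii

ED25519_SIGNATURE_BYTES = 64
ED25519_PUBLIC_KEY_BYTES = 32  # assumption: raw key, not a DER/PEM wrapper


def looks_well_formed(signature_hex: str, public_key_b64: str) -> bool:
    """Return True if both fields decode and have Ed25519-sized lengths."""
    try:
        signature = bytes.fromhex(signature_hex)
        public_key = base64.b64decode(public_key_b64, validate=True)
    except (ValueError, binascii.Error):
        return False  # not valid hex or not valid base64
    return (
        len(signature) == ED25519_SIGNATURE_BYTES
        and len(public_key) == ED25519_PUBLIC_KEY_BYTES
    )
```

A row that fails this check can be rejected without any cryptographic work; a row that passes still needs full signature verification.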


All WeSense data is free and open, forever.