Accessing Data

All WeSense data is free and open — published under CC-BY-4.0 by default, and signed end-to-end so anyone can verify where each reading came from. This page explains the ways to get at it, from a quick visual browse through to pulling the full archive for your own research.

Which option is right for me?

Just looking around → Live map.
Building a dashboard or bot → Live MQTT subscription.
Research, historical analysis, long time series → Historical Parquet archives.
Complex ad-hoc queries, joining with your own data → Run a station and query locally.

Live map

The fastest way to look at data is map.wesense.earth. It streams live readings over MQTT under the hood and visualises sensor locations, current values, and short-term trends. Handy for operators checking their own sensors, and for anyone curious about local air quality.

No account needed.

Live MQTT subscription

WeSense publishes decoded readings on a public MQTT topic tree. Every reading flows through wesense/decoded/{source}/{country}/{subdivision}/{device_id} as a JSON payload, and you can subscribe to whichever slice you care about.

Public broker: mqtt.wesense.earth:8883 (TLS). Contact a hub operator for credentials, or run your own hub and subscribe locally — both work equally well.

Topic examples

bash

# Everything, firehose:
wesense/decoded/#

# Only WeSense-branded sensors:
wesense/decoded/wesense/#

# Everything from New Zealand:
wesense/decoded/+/nz/#

# Auckland only:
wesense/decoded/+/nz/auk/#

# One specific device:
wesense/decoded/+/nz/auk/office_301274c0e8fc

Quick-start examples

mosquitto_sub (CLI):

bash

mosquitto_sub -h mqtt.wesense.earth -p 8883 --capath /etc/ssl/certs \
  -u "your_user" -P "your_pass" \
  -t 'wesense/decoded/+/nz/#' -v

Python (paho-mqtt):

python

import json
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    reading = json.loads(msg.payload)
    print(f"{reading['timestamp']} {reading['reading_type']}={reading['value']}{reading.get('unit','')}")

client = mqtt.Client()
client.username_pw_set("your_user", "your_pass")
client.tls_set()
client.on_message = on_message
client.connect("mqtt.wesense.earth", 8883, 60)
client.subscribe("wesense/decoded/+/nz/#")
client.loop_forever()

Payload shape — one reading per message, with full provenance:

json

{
  "timestamp": "2026-04-18T03:17:00Z",
  "device_id": "office_301274c0e8fc",
  "reading_type": "co2",
  "reading_type_name": "CO2",
  "value": 850.0,
  "unit": "ppm",
  "latitude": -36.8485,
  "longitude": 174.7633,
  "geo_country": "nz",
  "geo_subdivision": "auk",
  "sensor_model": "SCD4X",
  "data_license": "CC-BY-4.0",
  "signature": "…",
  "public_key": "…"
}

Full payload documentation: Data Schema Reference and Topic Structure.

Why the source segment comes after decoded/

The {source} segment is the data's origin (WeSense ESP32s, Meshtastic nodes, Home Assistant bridges, government air-quality APIs, etc). Use it to filter to a specific source, or wildcard it with + if you want everything.

Historical Parquet archives

Every day, each guardian station exports its region's data to a Parquet archive and publishes the content-addressed identifier (CID) to the WeSense distributed registry. You download once and verify forever — the CID is the file's SHA-256, so you always know you've got the bytes the network produced.

What's in the archive

Same columns as the ClickHouse schema — timestamps, values, units, location, signatures.
ZSTD-compressed Parquet, partitioned by (country, subdivision, date).
One row per reading (deduplicated across replicas by content-based reading_id).

File sizes are small: a single region's entire day of readings is typically a few MB to a few hundred MB depending on sensor density.

Getting an archive

Archives live in IPFS. The CIDs are discoverable via the WeSense P2P registry — practical access is either through a WeSense station or via a public IPFS gateway.

If you run a WeSense station (see Operate a Station), the Archive Replicator already fetches and pins archives for your configured GUARDIAN_SCOPE. Point your tools at the local IPFS node or the ./data/archives/ directory.

Without a station, use a public IPFS gateway. Once you have a CID (bafybei…), the file is reachable at:

https://ipfs.io/ipfs/<CID>
https://cloudflare-ipfs.com/ipfs/<CID>
https://dweb.link/ipfs/<CID>

Discovering CIDs without a station is currently awkward — there's no public HTTP index yet. The easiest path for pure consumers is either to run a station (it's lightweight — see Deployment Profiles), or ask a friendly station operator for the CID list for your region. A public index is on the roadmap.

Reading Parquet

Any Parquet reader works. A few one-liners:

DuckDB (zero-config, fast):

sql

-- Query a local or remote Parquet directly:
SELECT geo_country, reading_type, avg(value), count(*)
FROM 'nz-auk-2026-04-18.parquet'
GROUP BY 1, 2;

Python (pandas + pyarrow):

python

import pandas as pd
df = pd.read_parquet('nz-auk-2026-04-18.parquet')
print(df.groupby(['reading_type'])['value'].describe())

Python (polars, fast + low memory):

python

import polars as pl
df = pl.read_parquet('nz-auk-2026-04-18.parquet')

ClickHouse local (no server, reads Parquet directly):

bash

clickhouse local --query "
  SELECT geo_subdivision, reading_type, avg(value)
  FROM file('nz-auk-2026-04-18.parquet', Parquet)
  GROUP BY 1, 2"

Run a station and query locally

If you want ad-hoc SQL access, or you're joining WeSense data with your own datasets, the cleanest path is to run a guardian station. You get:

A local ClickHouse replica of everything in your GUARDIAN_SCOPE (your region, your country, or the whole world).
Archive Replicator pinning the historical Parquet archives.
Live MQTT subscriber pulling fresh readings directly.

The minimum hardware is modest — see Run a Bootstrap Node for a reference VPS spec.

Local ClickHouse

Once your station is running, ClickHouse listens on port 8123 (HTTP) on the local network:

bash

clickhouse-client -h localhost -u wesense --password <your-password> \
  -q "SELECT count() FROM wesense.sensor_readings"

Or via HTTP:

bash

curl -u wesense:<your-password> \
  'http://localhost:8123/?query=SELECT+count()+FROM+wesense.sensor_readings'

Why there's no central query endpoint

WeSense is deliberately decentralised — no single organisation owns an API that everyone queries. Running your own replica is how you get full query freedom, and how the network stays resilient to any single participant vanishing. The trade-off is the modest setup step, and we think it's the right one for a 200-year archival system. See Architecture Overview → Decentralization Principles for the fuller argument.

Data license and citation

Default licence is CC-BY-4.0. Individual rows carry a data_license field, so if any data was contributed under a different licence you can identify it per reading. When publishing analysis or derivatives, attribute "WeSense community sensor network" and link back to wesense.earth.

Full terms: Data Licensing.

Data Schema Reference — the shape of every reading, column by column
Topic Structure — MQTT / Zenoh / OrbitDB message formats
Storage & Archives — how archives are built and what's in them
Data Integrity — signing and verification so you can trust what you download
Operate a Station — setup guide if you want local access

Accessing Data ​

Live map ​

Live MQTT subscription ​

Topic examples ​

Quick-start examples ​

Historical Parquet archives ​

What's in the archive ​

Getting an archive ​

Reading Parquet ​

Run a station and query locally ​

Local ClickHouse ​

Why there's no central query endpoint ​

Data license and citation ​

Related references ​

Accessing Data

Live map

Live MQTT subscription

Topic examples

Quick-start examples

Historical Parquet archives

What's in the archive

Getting an archive

Reading Parquet

Run a station and query locally

Local ClickHouse

Why there's no central query endpoint

Data license and citation

Related references