Libreloc


Libre Geolocation is an initiative to provide a FOSS, privacy-friendly, community-driven alternative to Mozilla Location Services.

Libreloc is a server and client implementation, currently under development.

Please see the work-in-progress documentation at: https://libregeolocation.org/libreloc.html

Overview and FAQ

Libreloc provides an API compatible with Mozilla Location Services (MLS) that can be used on various mobile devices.

Will Libreloc publish WiFi MAC addresses and SSIDs around me?

No. Data will be obfuscated before release and/or published in aggregated formats.

Libreloc aims to be GDPR compliant and to avoid privacy leaks such as the one described in this research paper.

Are you storing MAC addresses and SSIDs?

No. Such data is hashed in a non-reversible manner before it reaches the database.

Can hashed data be bruteforced using powerful GPUs?

Probably not. We plan to use hashes short enough that each individual datapoint does not lead to significant privacy loss. However, please be aware that the service is still under development and the hashing parameters have yet to be tuned.
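One way to reason about "short enough" is the pigeonhole principle: if the part of the input unknown to an attacker spans `input_space_bits` bits and the hash is truncated to `hash_bits` bits, each truncated value is shared by about 2^(input_space_bits - hash_bits) possible inputs, so brute-forcing a hash yields that many equally valid candidates rather than a single answer. The sketch below assumes uniformly distributed hash outputs; the bit widths are illustrative, not Libreloc's parameters.

```rust
/// Pigeonhole estimate: number of possible inputs that map to each truncated
/// hash value, assuming the hash spreads inputs uniformly over its outputs.
fn inputs_per_hash_value(input_space_bits: u32, hash_bits: u32) -> u128 {
    assert!(hash_bits <= input_space_bits && input_space_bits < 128);
    1u128 << (input_space_bits - hash_bits)
}

fn main() {
    // Illustrative only: a fully unknown 48-bit MAC address vs. a 24-bit hash.
    println!("{}", inputs_per_hash_value(48, 24));
    // An attacker who already knows the 24-bit vendor prefix (OUI) and the
    // other hashed fields faces a much smaller space.
    println!("{}", inputs_per_hash_value(24, 16));
}
```

With these example numbers a fully unknown MAC address leaves 16,777,216 candidates per hash value, while a known vendor prefix shrinks that to 256, which is why the truncation length has to be tuned against what an attacker may already know.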

Do you know my location?

No. The server side does not perform geolocation. The client will provide an MLS-compatible geolocation API, /v1/geolocate, and perform the location lookup locally.
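For reference, an MLS-compatible /v1/geolocate endpoint answers with a JSON body of the form `{"location": {"lat": ..., "lng": ...}, "accuracy": ...}`, where accuracy is a radius in meters. The std-only sketch below only illustrates that response shape; the function name is hypothetical and it is not the Libreloc client's actual serializer (a real implementation would more likely use a JSON library and reject NaN values, which are not valid JSON).

```rust
/// Build a JSON body shaped like the MLS /v1/geolocate response.
/// Hypothetical helper: assumes finite inputs (NaN/inf would produce invalid JSON).
fn geolocate_response(lat: f64, lng: f64, accuracy_m: f64) -> String {
    format!(r#"{{"location":{{"lat":{lat},"lng":{lng}}},"accuracy":{accuracy_m}}}"#)
}

fn main() {
    println!("{}", geolocate_response(51.5, -0.1, 600.0));
}
```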

What if the server goes down or runs out of capacity?

We are planning to support geographical and logical sharding and failover.

Can I host an instance for my organization?

Yes, and instance maintainers are encouraged to do so.

End-to-end demo

A simple end-to-end demo using the client and server can be run using WiFi data from NeoStumbler:

cargo build --release

# Run the server on localhost and monitor journald logs
./target/release/libreloc_server &

# Upload a large enough dataset
CONF_FN=testbed.json ./target/release/libreloc_client upload-csv wifis.csv

# Simulate lookup using nearby emitters. The first 9 entries in the CSV must be from the same location
head wifis.csv > my_wifis.csv
CONF_FN=testbed.json ./target/release/libreloc_client locate-csv my_wifis.csv

Goals

  • Provide geolocation for a diverse family of devices across Android, Linux, etc.

  • Manage privacy issues; do not breach the GDPR

  • Keep server requirements (CPU/memory/storage) reasonably low

  • Limit single points of failure at the technical and organizational level

Difficult use-cases

IoT or laptop: a device without GSM or GPS. It relies only on WiFi/BT and therefore depends on the quality of the data captured by GPS-enabled devices.

Traveller: a mobile device with limited or no Internet access, for which pre-caching phone/WiFi/BT maps is possible.

Mobile access point: a mobile router, or a phone acting as an access point, can cause a privacy breach because it can be used to track its owner's location. See https://www.cs.umd.edu/~dml/papers/wifi-surveillance-sp24.pdf

See the threat model for details.

Threat model

WiFi- and Bluetooth-based geolocation is an accidental feature arising from the spread of access points and other devices; it was not built by design. As of today it is still difficult to define a threat model around it, as the security and privacy expectations are unclear.

Nonetheless, we want to mitigate risks in the following scenarios:

"The stalker"

Alice travels carrying her mobile AP with her. Mallory knows its MAC address and SSID and wants to track Alice’s location around the world.

"The follower"

Bob uploads datapoints. Mallory wants to track Bob’s location based on his upload traffic.

"The decoy"

Mallory wants to generate a false location for Alice by uploading artificially crafted datapoints.

"The troll"

Mallory wants to disable Alice’s location by uploading large amounts of false datapoints.

"The curious"

Mallory wants to extract a list of emitters (macaddr/SSIDs) at a given location.

Other constraints

Devices cannot store billions of hashed WiFi/BT datapoints locally. However, data for the local area can be fetched and cached, and users can accept that the initial cache warmup takes a few seconds. Most users have a home/work/school routine in which location data is highly local.

Design

Geolocation based on WiFi, Bluetooth and GSM cells represents the most complex part of Libreloc.

The key consideration is that MAC addresses and SSIDs, taken individually, cannot be considered secret: it is safer to assume that a motivated attacker knows a number of them. However, the full set of emitters that a client can detect at a given location and time is information that no remote attacker has.

As such, Libreloc is designed to allow clients to find their location by correlating data based on multiple nearby emitters.

To provide the privacy described above, the backend service neither receives nor stores client location data, WiFi and Bluetooth MAC addresses, WiFi SSIDs, etc.

Instead, it receives and publishes aggregated, anonymized data using Bloom filters and Count-min sketches.
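A Count-min sketch estimates how often each item occurs using a fixed-size table of counters: each of `depth` rows maps the item to one of `width` counters, an update increments one counter per row, and a query returns the minimum across rows. Collisions can make the estimate too high, but it never undercounts (as long as counters do not saturate). The sketch below is a generic, std-only illustration; which quantities Libreloc counts and its table dimensions are not described here, and seeding `DefaultHasher` with the row number is an assumed stand-in for independent hash functions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Generic Count-min sketch: `depth` rows of `width` counters, stored row-major.
struct CountMinSketch {
    width: usize,
    depth: usize,
    counters: Vec<u32>,
}

impl CountMinSketch {
    fn new(width: usize, depth: usize) -> Self {
        CountMinSketch { width, depth, counters: vec![0; width * depth] }
    }

    /// Counter index of `item` in `row`; hashing the row number first makes
    /// each row behave like a different hash function.
    fn index<T: Hash>(&self, row: usize, item: &T) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        item.hash(&mut hasher);
        row * self.width + (hasher.finish() as usize) % self.width
    }

    /// Add `count` occurrences of `item`: one counter per row is incremented.
    fn add<T: Hash>(&mut self, item: &T, count: u32) {
        for row in 0..self.depth {
            let i = self.index(row, item);
            self.counters[i] = self.counters[i].saturating_add(count);
        }
    }

    /// Estimated count: minimum over rows (may overestimate, never underestimates
    /// unless counters saturated).
    fn estimate<T: Hash>(&self, item: &T) -> u32 {
        (0..self.depth)
            .map(|row| self.counters[self.index(row, item)])
            .min()
            .unwrap_or(0)
    }
}

fn main() {
    let mut cms = CountMinSketch::new(1024, 4);
    cms.add(&"emitter-a", 5);
    cms.add(&"emitter-b", 2);
    println!("a ~ {}, b ~ {}", cms.estimate(&"emitter-a"), cms.estimate(&"emitter-b"));
}
```

The memory used is fixed at `width * depth` counters regardless of how many distinct items are added, which fits the goal of keeping server requirements low while publishing only aggregated counts.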

Data such as location, MAC addresses and WiFi SSIDs is hashed together on the client side using the cryptographic hash function BLAKE3. Hashes are aggressively truncated in order to create collisions. A loosely similar approach has been taken in WiGLE’s [m8b].
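The hash-then-truncate step can be sketched as follows. To keep the example dependency-free it swaps in the standard library's `DefaultHasher` (SipHash, not a cryptographic hash) where Libreloc uses BLAKE3; which fields are hashed, their encoding and the `bits` value are assumptions for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hash several fields together and keep only the lowest `bits` bits.
/// Stand-in: DefaultHasher replaces BLAKE3 here; field choice and encoding
/// are hypothetical.
fn truncated_hash(fields: &[&str], bits: u32) -> u64 {
    assert!((1..=64).contains(&bits), "bits must be in 1..=64");
    let mut hasher = DefaultHasher::new();
    for field in fields {
        // Length prefix keeps ("ab", "c") and ("a", "bc") distinct inputs.
        hasher.write_u64(field.len() as u64);
        hasher.write(field.as_bytes());
    }
    let full = hasher.finish();
    // Aggressive truncation: many different inputs share each output value.
    if bits == 64 { full } else { full & ((1u64 << bits) - 1) }
}

fn main() {
    // Hypothetical MAC address and SSID; location could be added as another field.
    let h = truncated_hash(&["aa:bb:cc:dd:ee:ff", "HomeNet"], 20);
    println!("{h:05x}");
}
```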

Libreloc uses Geohash to create a hierarchical world map. Location precision increases with the length of the geohash: each additional character divides a cell into 32 smaller cells.
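Geohash alternately bisects the longitude and latitude ranges (starting with longitude) and encodes every 5 resulting bits as one base32 character, so a shorter geohash of a point is always a prefix of a longer one. The self-contained encoder below illustrates the standard algorithm; it is not taken from the Libreloc codebase.

```rust
/// Standard geohash base32 alphabet (no a, i, l, o).
const BASE32: &[u8; 32] = b"0123456789bcdefghjkmnpqrstuvwxyz";

/// Encode a latitude/longitude pair as a geohash of `len` characters.
fn geohash_encode(lat: f64, lon: f64, len: usize) -> String {
    let (mut lat_lo, mut lat_hi): (f64, f64) = (-90.0, 90.0);
    let (mut lon_lo, mut lon_hi): (f64, f64) = (-180.0, 180.0);
    let mut out = String::with_capacity(len);
    let mut even = true; // bits alternate, starting with longitude
    let mut ch = 0usize;
    let mut nbits = 0;
    while out.len() < len {
        if even {
            let mid = (lon_lo + lon_hi) / 2.0;
            if lon >= mid { ch = (ch << 1) | 1; lon_lo = mid; } else { ch <<= 1; lon_hi = mid; }
        } else {
            let mid = (lat_lo + lat_hi) / 2.0;
            if lat >= mid { ch = (ch << 1) | 1; lat_lo = mid; } else { ch <<= 1; lat_hi = mid; }
        }
        even = !even;
        nbits += 1;
        if nbits == 5 {
            // Every 5 bits become one base32 character.
            out.push(BASE32[ch] as char);
            ch = 0;
            nbits = 0;
        }
    }
    out
}

fn main() {
    println!("{}", geohash_encode(42.6, -5.6, 5)); // ezs42
}
```

For example, `geohash_encode(42.6, -5.6, 5)` returns "ezs42", and truncating the same point to 3 characters gives "ezs", a larger cell that contains it. This prefix property is what makes the map hierarchical.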

The server acts as a key-value datastore that associates each truncated hash with a small structure, called a minimap, that represents geohash cells with only a few characters of precision. Each cell in a minimap holds a boolean value: 1 if there is any device matching that hash value in the cell, 0 otherwise. Multiple devices around the world can cause a match. This effectively implements a Bloom filter.

Crucially, a minimap does not represent a specific, fixed area in the world.
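To make the minimap idea concrete, the sketch below assumes a hypothetical layout in which a minimap is a 1024-bit map with one bit per 2-character geohash cell, covering the whole world. It also shows one way a client could correlate several nearby emitters: fetch the minimap for each emitter's truncated hash and count, per cell, how many minimaps match. Any single minimap may match in many places because of colliding devices elsewhere, but the cell on which most nearby emitters agree is likely the client's area. `Minimap`, `best_cell` and the layout are illustrative assumptions, not Libreloc's actual format.

```rust
/// Geohash base32 alphabet.
const BASE32: &[u8] = b"0123456789bcdefghjkmnpqrstuvwxyz";
/// Hypothetical layout: one bit per 2-character geohash cell (32 * 32 cells).
const CELLS: usize = 32 * 32;

/// Bit `i` is set if any emitter with this truncated hash was seen in cell `i`.
#[derive(Clone)]
struct Minimap {
    bits: [u64; CELLS / 64],
}

impl Minimap {
    fn new() -> Self {
        Minimap { bits: [0; CELLS / 64] }
    }

    /// Map the first two characters of a geohash (e.g. "u4") to a cell index.
    fn cell_index(geohash: &str) -> Option<usize> {
        let b = geohash.as_bytes();
        if b.len() < 2 {
            return None;
        }
        let hi = BASE32.iter().position(|&c| c == b[0])?;
        let lo = BASE32.iter().position(|&c| c == b[1])?;
        Some(hi * 32 + lo)
    }

    /// Server side: mark the cell as containing a matching emitter.
    fn insert(&mut self, geohash: &str) {
        if let Some(i) = Self::cell_index(geohash) {
            self.bits[i / 64] |= 1u64 << (i % 64);
        }
    }

    fn contains(&self, cell: usize) -> bool {
        self.bits[cell / 64] & (1u64 << (cell % 64)) != 0
    }
}

/// Client side: vote per cell across the minimaps of all nearby emitters
/// and return (cell, votes) for the cell with the most votes.
fn best_cell(minimaps: &[Minimap]) -> Option<(usize, usize)> {
    (0..CELLS)
        .map(|cell| (cell, minimaps.iter().filter(|m| m.contains(cell)).count()))
        .filter(|&(_, votes)| votes > 0)
        .max_by_key(|&(_, votes)| votes)
}

fn main() {
    let mut a = Minimap::new();
    a.insert("u4pru");
    a.insert("9q8yy"); // collision with an unrelated device elsewhere
    let mut b = Minimap::new();
    b.insert("u4prv");
    println!("{:?}", best_cell(&[a, b]));
}
```

Counting votes instead of intersecting the minimaps (a bitwise AND) tolerates nearby emitters that are missing from the dataset, at the cost of needing enough emitters to outvote random collisions.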

<TODO>

The client

Unlike other location services, Libreloc performs part of the geolocation process on the client side. This is crucial to guarantee the privacy of the user and the WiFi and Bluetooth emitters that clients detect.

This client-side design allows:

  • limiting the upload of GDPR-sensitive data such as full AP MAC addresses, by uploading hashed values instead

  • fallbacks where Internet access, GSM or GPS are not available

  • sharding, load-balancing and failover across servers

Location discovery resources

Type            Accuracy       Availability   Internet   GSM   Data plan   GPS
Previous loc    Variable       High
GPS             5 meters       Low                                         y
Phone Cells     5 km           Low-medium                y
GSM country     Country        Low-medium                y
Wifi/BT nodes   Meters         Low-medium
GeoIP           City/Country   Medium         y                y?
DNS Anycast     Continent      Medium         y                y?
RTT             Continent      Medium         y                y?

Table description: the last four columns indicate whether Internet access, GSM/LTE, a paid mobile data plan or a GPS receiver is required ("y?" means it may be required).

Previous loc: last known location, stored with a timestamp and accuracy. When it is used, its accuracy is degraded according to the elapsed time.
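The README does not specify how the accuracy of the last known location is degraded. A simple model, sketched below purely as an assumption, treats accuracy as an uncertainty radius in meters that grows linearly with elapsed time at an assumed maximum travel speed (`max_speed_mps` is a hypothetical tuning parameter).

```rust
/// Hypothetical decay model: the uncertainty radius grows by at most
/// `max_speed_mps` meters for every second since the fix was stored.
fn degraded_accuracy_m(stored_accuracy_m: f64, elapsed_s: f64, max_speed_mps: f64) -> f64 {
    // Clamp negative elapsed times (e.g. clock adjustments) to zero.
    stored_accuracy_m + elapsed_s.max(0.0) * max_speed_mps
}

fn main() {
    // A 50 m fix from 10 minutes ago, assuming walking speed (1.5 m/s).
    println!("{}", degraded_accuracy_m(50.0, 600.0, 1.5)); // 950
}
```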

Phone Cells: phone tower database, cached locally. Works without a data plan.

GSM country: Mobile Country Code (MCC). Works without a data plan.

GeoIP: lookup based on the public IP address. Usually reliable at country granularity [unless VPNs are in use]. Some databases are available without significant licensing restrictions: https://archive.org/download/dbip-country-lite

DNS Anycast: many cloud providers offer inexpensive DNS anycast, which can direct clients to the closest server and also reveal the client's network location, both at continent granularity.

RTT: clients can ping or TCP-ping 3-4 endpoints and immediately tell whether they are close to one of them by applying a latency threshold. Very reliable at continent level [unless VPNs are in use].
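The RTT check can be sketched with the standard library alone: time a TCP handshake to each endpoint, then pick the fastest endpoint whose round trip is under a threshold. The region names, endpoints and threshold are hypothetical, and `tcp_ping` is shown for completeness only (it needs network access, so the example below feeds in fixed measurements instead).

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

/// "TCP ping": time how long a TCP handshake to `addr` takes.
#[allow(dead_code)]
fn tcp_ping(addr: &SocketAddr, timeout: Duration) -> Option<Duration> {
    let start = Instant::now();
    let _stream = TcpStream::connect_timeout(addr, timeout).ok()?;
    Some(start.elapsed())
}

/// Return the region of the fastest endpoint if its RTT is below `threshold`;
/// `None` means "not close to any endpoint" (or all probes failed).
fn nearby_region<'a>(rtts: &[(&'a str, Option<Duration>)], threshold: Duration) -> Option<&'a str> {
    rtts.iter()
        .filter_map(|&(region, rtt)| rtt.map(|r| (region, r)))
        .filter(|&(_, r)| r <= threshold)
        .min_by_key(|&(_, r)| r)
        .map(|(region, _)| region)
}

fn main() {
    // Hypothetical measurements against three regional endpoints.
    let rtts = [
        ("eu", Some(Duration::from_millis(12))),
        ("us", Some(Duration::from_millis(95))),
        ("asia", None), // probe timed out
    ];
    println!("{:?}", nearby_region(&rtts, Duration::from_millis(30))); // Some("eu")
}
```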

The client implements an "incremental" geolookup process in which each source of geolocation contributes to increasing the accuracy of the location:

  1. Attempt to use readily available data: GPS location, last known location, GSM-based location, GeoIP-based positioning, etc.

  2. If needed, download GSM tower cell data and cache it locally

  3. If needed, download hashed wifi/BT data and cache it locally

Having determined its location at country/continent level in step 1, the client can connect to the closest server. This allows sharding geographic data across macro-areas and also increases reliability.
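The incremental process can be sketched as trying location sources in order of cost, keeping the most accurate fix so far and stopping once it is accurate enough, so that costlier steps (downloading cell or WiFi/BT data) are skipped when they are not needed. `Fix`, `incremental_locate` and the target accuracy are hypothetical names and parameters, not the Libreloc client's API.

```rust
/// A location estimate with an uncertainty radius in meters.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Fix {
    lat: f64,
    lon: f64,
    accuracy_m: f64,
}

/// Try sources in order (cheapest first), keep the most accurate fix, and stop
/// as soon as it meets `target_accuracy_m`, skipping the remaining sources.
fn incremental_locate(sources: &[&dyn Fn() -> Option<Fix>], target_accuracy_m: f64) -> Option<Fix> {
    let mut best: Option<Fix> = None;
    for source in sources {
        if let Some(fix) = source() {
            if best.map_or(true, |b| fix.accuracy_m < b.accuracy_m) {
                best = Some(fix);
            }
        }
        if best.map_or(false, |b| b.accuracy_m <= target_accuracy_m) {
            break;
        }
    }
    best
}

fn main() {
    // Hypothetical sources: nothing cached, then GeoIP, then cached cell data.
    let last_known = || -> Option<Fix> { None };
    let geoip = || Some(Fix { lat: 52.5, lon: 13.4, accuracy_m: 25_000.0 });
    let cells = || Some(Fix { lat: 52.52, lon: 13.40, accuracy_m: 3_000.0 });
    let sources: [&dyn Fn() -> Option<Fix>; 3] = [&last_known, &geoip, &cells];
    println!("{:?}", incremental_locate(&sources, 5_000.0));
}
```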

Contributing

The Rust documentation for the codebase is published at:

When contributing to the codebase, update the licensing data in .reuse/dep5 and use a commit message style compatible with Git Cliff.

Running integration tests

Integration tests use a local test database.

cargo test

Running in development mode

Build and run locally with CONF_FN=testbed.toml cargo run

Monitor with:

sudo journalctl -f --identifier libreloc

The service emits metrics locally using the StatsD protocol. Run a StatsD receiver such as Netdata on UDP port 8125.
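StatsD metrics are plain-text UDP datagrams of the form `<name>:<value>|<type>`, for example `c` for counters, `ms` for timings and `g` for gauges. The std-only sketch below shows what such an emitter can look like; the metric name is hypothetical and this is not necessarily how Libreloc emits its metrics. With a local Netdata receiver the target would be `127.0.0.1:8125`.

```rust
use std::io;
use std::net::UdpSocket;

/// Format one StatsD line: `<name>:<value>|<type>`.
fn statsd_line(name: &str, value: i64, kind: &str) -> String {
    format!("{name}:{value}|{kind}")
}

/// Send one metric as a UDP datagram (fire-and-forget: StatsD has no acks).
fn send_metric(socket: &UdpSocket, target: &str, line: &str) -> io::Result<usize> {
    socket.send_to(line.as_bytes(), target)
}

fn main() -> io::Result<()> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    // Hypothetical counter; sending succeeds even if no receiver is listening.
    send_metric(&socket, "127.0.0.1:8125", &statsd_line("libreloc.requests", 1, "c"))?;
    Ok(())
}
```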

Building a Debian package for testing or deployment

The service is started and managed by a systemd unit.

make debian_install_build_deps
make debian_build_deb

Benchmarking

Use Samply with a release build:

sudo apt install samply
cargo build --release
samply record ./target/release/libreloc_server
samply record ./target/release/libreloc_client

Running the server in a container

sudo mmdebstrap --include=dbus-broker,systemd-container unstable /srv/libreloc_root
sudo systemd-nspawn  -D /srv/libreloc_root --machine libreloc -U --boot
systemctl restart libreloc-server.service
systemctl status libreloc-server.service

Roadmap

  • Basic CI

  • Research lookup maps

  • Metrics

  • Generate docs from CI

  • Benchmark databases

  • Deployment tools and documentation

  • Public metrics dashboard

  • Full CI

  • Privacy-aware API and caching

  • Data backup

References