Libreloc
Libre Geolocation is an initiative to provide a FOSS, privacy-friendly, community-driven alternative to Mozilla Location Services.
Libreloc is an implementation of a server and client currently under development.
Please see the work-in-progress documentation at: https://libregeolocation.org/libreloc.html
Overview and FAQ
Libreloc provides an API compatible with Mozilla Location Services that can be used on various mobile devices:
- Android (including GrapheneOS, microG, LineageOS)
- Linux mobile devices
- Linux and Windows laptops
Will Libreloc publish the WiFi MAC addresses and SSIDs around me?
No. Data will be obfuscated before release and/or published in aggregated formats.
Libreloc aims to be GDPR compliant and to avoid privacy leaks such as those described by Rye and Levin (see References).
Are you storing MAC addresses and SSIDs?
No, such data is hashed in a non-reversible manner before it reaches the database.
Can hashed data be bruteforced using powerful GPUs?
Probably not. We are planning to use short enough hashes so that each individual datapoint would not lead to significant privacy loss. Yet please be aware that the service is currently under development and the parameters around hashing are still to be tuned.
Do you know my location?
No. The server side does not perform geolocation.
The client is going to provide an MLS-compatible geolocation API, /v1/geolocate,
and perform geolocation locally.
What if the server goes down or runs out of capacity?
We are planning to support geographical and logical sharding and failover.
Can I host an instance for my organization?
Yes, organizations are encouraged to host and maintain their own instances.
End-to-end demo
A simple end-to-end demo using the client and server can be run with WiFi data from NeoStumbler:
cargo build --release
# Run the server on localhost and monitor journald logs
./target/release/libreloc_server &
# Upload a large enough dataset
CONF_FN=testbed.json ./target/release/libreloc_client upload-csv wifis.csv
# Simulate lookup using nearby emitters. The first 9 entries in the CSV must be from the same location
head wifis.csv > my_wifis.csv
CONF_FN=testbed.json ./target/release/libreloc_client locate-csv my_wifis.csv
Goals
- Provide geolocation for a diverse family of devices across Android, Linux, etc.
- Manage privacy issues; do not breach GDPR
- Keep server requirements (CPU/memory/storage) reasonably low
- Limit single points of failure on technical and organizational level
Difficult use-cases
IoT or laptop: a device without GSM and GPS. Relies only on WiFi/BT, therefore depends on the quality of data captured by GPS-enabled devices.
Traveller: a mobile device with limited or no access to the Internet where pre-caching phone/wifi/bt maps is possible.
Mobile access point: a mobile router or phone can create a privacy breach and be used to track the location of the owner. See https://www.cs.umd.edu/~dml/papers/wifi-surveillance-sp24.pdf
See the threat model for details.
Threat model
WiFi and Bluetooth based geolocation is an accidental feature: it exists because access points and other devices are so widespread, not because it was built by design. As of today it is still difficult to define a threat model around it, as the security and privacy expectations are unclear.
Nonetheless we want to mitigate risk in the following scenarios:
"The stalker"
Alice travels carrying her mobile AP with her. Mallory knows its macaddr and SSID and wants to track Alice’s location around the world.
"The follower"
Bob uploads datapoints. Mallory wants to track Bob’s location based on upload traffic.
"The decoy"
Mallory wants to generate a false location for Alice by uploading artificially crafted datapoints.
"The troll"
Mallory wants to disable Alice’s location by uploading large amounts of false datapoints.
"The curious"
Mallory wants to extract a list of emitters (macaddr/SSIDs) at a given location.
Other constraints
Devices cannot store billions of hashed WiFi/BT datapoints locally. However, local data can be fetched and cached, and users can accept an initial cache warmup that takes seconds. Most users have a home/work/school routine where location data is highly local.
Design
Geolocation based on WiFi, Bluetooth and GSM cells represents the most complex part of Libreloc.
The key consideration is that MAC addresses and SSIDs, taken individually, cannot be considered secret and instead it is safer to assume a motivated attacker has knowledge of a number of them. However, the set of emitter devices that can be detected by a client at a given location and time provides an amount of data unknown to any remote attacker.
As such, Libreloc is designed to allow clients to find their location by correlating data based on multiple nearby emitters.
To provide privacy as described above, the backend service does not receive or store client location data, WiFi and Bluetooth MAC addresses, WiFi SSIDs, etc.
Instead, it receives and publishes aggregated, anonymized data using Bloom filters and Count-min sketches.
Data such as location, MAC addresses and WiFi SSIDs is hashed together on the client side using the cryptographic hash BLAKE3. Hashes are aggressively truncated in order to create collisions. A remotely similar approach has been taken in WiGLE’s m8b format (see References).
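The hash-and-truncate step can be sketched as follows. This is an illustration under stated assumptions, not the Libreloc implementation: the Rust standard library has no BLAKE3, so a 64-bit FNV-1a hash is used purely as a stand-in (it is not cryptographic). The function names, the field order, the length-prefixed encoding and the 16-bit truncation are all hypothetical; the actual parameters are still to be tuned.

```rust
/// 64-bit FNV-1a. Stand-in only: NOT cryptographic.
/// The design described above calls for BLAKE3 instead.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Hash a coarse location together with the emitter identity,
/// then keep only the lowest `bits` bits so that many emitters collide.
/// The field layout is an assumption made for this example.
fn truncated_emitter_hash(geohash_prefix: &str, mac: &str, ssid: &str, bits: u32) -> u64 {
    assert!((1..=64).contains(&bits));
    let mut buf = Vec::new();
    for field in [geohash_prefix, mac, ssid] {
        // Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        buf.extend_from_slice(&(field.len() as u32).to_le_bytes());
        buf.extend_from_slice(field.as_bytes());
    }
    let full = fnv1a64(&buf);
    if bits == 64 { full } else { full & ((1u64 << bits) - 1) }
}
```

Calling `truncated_emitter_hash("u4pr", "aa:bb:cc:dd:ee:ff", "cafe", 16)` yields one of only 65536 possible values, so a single value on its own matches a very large number of possible emitters.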
Libreloc uses Geohash to create a hierarchical world map. Location precision increases with the length of the geohash: each additional character subdivides a cell into 32 smaller cells.
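For readers unfamiliar with Geohash, the standard public encoding can be sketched in a few lines. The function name `geohash_encode` is chosen for this example, and how Libreloc itself computes or stores geohashes is not shown here.

```rust
/// Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
const BASE32: &[u8; 32] = b"0123456789bcdefghjkmnpqrstuvwxyz";

/// Encode a latitude/longitude pair as a geohash of `precision` characters.
/// Bits alternate between longitude and latitude, starting with longitude;
/// each bit halves the current interval, and every 5 bits form one character.
fn geohash_encode(lat: f64, lon: f64, precision: usize) -> String {
    let (mut lat_lo, mut lat_hi) = (-90.0_f64, 90.0_f64);
    let (mut lon_lo, mut lon_hi) = (-180.0_f64, 180.0_f64);
    let mut out = String::with_capacity(precision);
    let (mut bits, mut bit_count) = (0usize, 0u8);
    let mut use_lon = true;
    while out.len() < precision {
        let (value, lo, hi) = if use_lon {
            (lon, &mut lon_lo, &mut lon_hi)
        } else {
            (lat, &mut lat_lo, &mut lat_hi)
        };
        let mid = (*lo + *hi) / 2.0;
        bits <<= 1;
        if value >= mid {
            bits |= 1; // upper half of the interval
            *lo = mid;
        } else {
            *hi = mid; // lower half of the interval
        }
        use_lon = !use_lon;
        bit_count += 1;
        if bit_count == 5 {
            out.push(BASE32[bits] as char);
            bits = 0;
            bit_count = 0;
        }
    }
    out
}
```

A shorter geohash is always a prefix of a longer one for the same point, which is what makes the map hierarchical: `geohash_encode(57.64911, 10.40744, 3)` is a prefix of `geohash_encode(57.64911, 10.40744, 5)`.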
The server acts as a key-value datastore that associates each truncated hash with a small structure, called a minimap, that represents geohash values with only a few digits of precision. Each item in a minimap has a boolean value: 1 if there is any device matching that hash value in that cell, 0 otherwise. Multiple devices around the world can cause a match. This effectively implements a Bloom filter.
Crucially, a minimap does not represent a specific, fixed area in the world.
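The sketch below illustrates the minimap idea and how a client could correlate several emitters. It is a simplified model under assumptions that the text does not confirm: a minimap is taken to be a 32-bit bitmap (one bit per cell), truncated hashes are 16 bits wide, and the question of which area a minimap covers is deliberately left out. All type and function names are hypothetical.

```rust
use std::collections::HashMap;

/// One bit per cell. 32 cells is an assumption made for this example.
type Minimap = u32;

/// Key-value store from truncated hash to minimap.
#[derive(Default)]
struct MinimapStore {
    maps: HashMap<u16, Minimap>,
}

impl MinimapStore {
    /// Record that some emitter with this truncated hash was seen in `cell`.
    /// Distinct emitters that share a truncated hash set bits in the same map.
    fn insert(&mut self, truncated_hash: u16, cell: u8) {
        assert!(cell < 32);
        *self.maps.entry(truncated_hash).or_insert(0) |= 1u32 << cell;
    }

    /// Unknown hashes yield an empty minimap.
    fn lookup(&self, truncated_hash: u16) -> Minimap {
        self.maps.get(&truncated_hash).copied().unwrap_or(0)
    }
}

/// Client side: cells in which every observed emitter hash has its bit set.
/// A single hash matches many cells; the intersection narrows them down.
fn candidate_cells(store: &MinimapStore, observed: &[u16]) -> Vec<u8> {
    if observed.is_empty() {
        return Vec::new();
    }
    let common = observed
        .iter()
        .fold(u32::MAX, |acc, h| acc & store.lookup(*h));
    (0..32u8).filter(|&c| (common & (1u32 << c)) != 0).collect()
}
```

As with a Bloom filter, a set bit can be a false positive caused by an unrelated emitter elsewhere in the world, but a cleared bit rules the cell out for that hash.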
<TODO>
The client
Unlike other location services, Libreloc performs part of the geolocation process on the client side. This is crucial to guarantee the privacy of the user and the WiFi and Bluetooth emitters that clients detect.
This client design:
- limits the upload of GDPR-sensitive data such as full AP macaddrs, i.e. hashed values are uploaded instead
- provides fallbacks where Internet access / GSM / GPS are not available
- allows sharding/load-balancing of servers and failover
Location discovery resources
Type | Accuracy | Availability | Internet | GSM | Data Plan | GPS |
---|---|---|---|---|---|---|
Previous loc | Variable | High | | | | |
GPS | 5 m | Low | | | | y |
Phone Cells | 5 km | Low-medium | | y | | |
GSM country | Country | Low-medium | | y | | |
Wifi/BT nodes | Meters | Low-medium | | | | |
GeoIP | City/Country | Medium | y | | y? | |
DNS Anycast | Continent | Medium | y | | y? | |
RTT | Continent | Medium | y | | y? | |
Table description: the last 4 columns flag whether Internet, GSM/LTE, a paid mobile data plan or a GPS receiver is required.
Previous loc: last known location, stored with a timestamp and accuracy. When used, the accuracy value is decreased based on the elapsed time.
Phone Cells: phone tower database, cached locally. Works without a dataplan.
GSM country: Mobile Country Code (MCC). Works without a dataplan.
GeoIP: public-ipaddr based lookup. Usually pretty reliable at country granularity [unless VPNs are in use]. Some databases are available without significant licensing restrictions: https://archive.org/download/dbip-country-lite
DNS Anycast: many cloud providers offer inexpensive DNS anycast that can both direct clients to the closest server while also discovering the client network location, both with continent granularity.
RTT: clients can ping or tcp-ping 3-4 endpoints and immediately tell if they are close to one of them using a threshold on latency. Very reliable on continent level [unless VPNs are in use].
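The RTT check described above can be sketched as a simple threshold on the fastest probe. Measuring the round-trip times (ping or TCP connect) is not shown; the `Probe` type, the region labels and the threshold value are assumptions made for this example.

```rust
use std::time::Duration;

/// Hypothetical probe result: a region label and its measured round-trip time.
struct Probe<'a> {
    region: &'a str,
    rtt: Duration,
}

/// Return the region of the fastest probe, but only if its RTT is at or
/// below `threshold`. Otherwise the client is not clearly close to any endpoint.
fn nearest_region<'a>(probes: &[Probe<'a>], threshold: Duration) -> Option<&'a str> {
    probes
        .iter()
        .min_by_key(|p| p.rtt)
        .filter(|p| p.rtt <= threshold)
        .map(|p| p.region)
}
```

With probes of 18 ms, 95 ms and 240 ms and a 50 ms threshold, the function returns the region of the 18 ms endpoint. A VPN would shift all measurements, which is why the result is only trusted at continent level.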
The client implements an "incremental" geolookup process in which each source of geolocation contributes to increasing the accuracy of the location:
1. Attempt to use readily available data: GPS location, last known location, GSM-based location, GeoIP-based positioning, etc.
2. If needed, download GSM tower cell data and cache it locally
3. If needed, download hashed wifi/BT data and cache it locally
By having discovered the location on country/continent level in step 1, the client can connect to the closest server. This allows sharding geographical data across macroareas and also increases reliability.
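A minimal sketch of the incremental process, assuming each step is modelled as a closure that returns zero or more estimates and that the client stops as soon as the best estimate is accurate enough. It also shows one possible way to age the last known location by widening its accuracy radius with elapsed time. The names `Estimate`, `aged` and `incremental_lookup`, the linear decay and the target accuracy are all hypothetical.

```rust
/// A location estimate with an accuracy radius in meters (smaller is better).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Estimate {
    lat: f64,
    lon: f64,
    accuracy_m: f64,
}

/// Age a stored fix: widen the accuracy radius by the distance the device
/// could have travelled since then (linear model, an assumption).
fn aged(prev: Estimate, elapsed_s: f64, max_speed_mps: f64) -> Estimate {
    Estimate {
        accuracy_m: prev.accuracy_m + elapsed_s * max_speed_mps,
        ..prev
    }
}

/// Run the stages in order, keeping the most accurate estimate seen so far.
/// Later (more expensive) stages are skipped once `target_m` is reached.
fn incremental_lookup(
    stages: &[&dyn Fn() -> Vec<Estimate>],
    target_m: f64,
) -> Option<Estimate> {
    let mut best: Option<Estimate> = None;
    for stage in stages {
        for e in stage() {
            if best.map_or(true, |b| e.accuracy_m < b.accuracy_m) {
                best = Some(e);
            }
        }
        if matches!(best, Some(b) if b.accuracy_m <= target_m) {
            break;
        }
    }
    best
}
```

In this model the first stage would return cheap estimates (aged previous location, GeoIP, and so on), and the download stages would only run when the result is still too coarse.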
Contributing
The Rust documentation for the codebase is published at:
When contributing to the codebase, update the licensing data in .reuse/dep5
and use a commit message style compatible with
Git Cliff.
Running in development mode
Build and run locally with CONF_FN=testbed.toml cargo run
Monitor with:
sudo journalctl -f --identifier libreloc
It generates metrics locally using the StatsD protocol. Run a StatsD receiver like Netdata on UDP port 8125
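For reference, a StatsD counter is a single plain-text line sent over UDP. The sketch below shows the wire format using only the Rust standard library; the metric name `libreloc.uploads` is hypothetical and does not reflect the metrics the service actually emits.

```rust
use std::io;
use std::net::UdpSocket;

/// Format a StatsD counter increment: "<name>:<value>|c".
fn statsd_counter(name: &str, value: u64) -> String {
    format!("{}:{}|c", name, value)
}

/// Send one metric line to a StatsD receiver over UDP (fire-and-forget).
/// `receiver` is an address such as "127.0.0.1:8125".
fn send_metric(receiver: &str, line: &str) -> io::Result<()> {
    let socket = UdpSocket::bind("127.0.0.1:0")?; // ephemeral local port
    socket.send_to(line.as_bytes(), receiver)?;
    Ok(())
}
```

For example, `send_metric("127.0.0.1:8125", &statsd_counter("libreloc.uploads", 1))` would send `libreloc.uploads:1|c` to a local receiver. Because UDP is connectionless, the call succeeds even when no receiver is listening.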
Building a Debian package for testing or deployment
The service is started and managed by a Systemd unit.
make debian_install_build_deps
make debian_build_deb
Benchmarking
Use Samply with release build as:
sudo apt install samply
cargo build --release
samply record ./target/release/libreloc_server
samply record ./target/release/libreloc_client
Running the server in a container
sudo mmdebstrap --include=dbus-broker,systemd-container unstable /srv/libreloc_root
sudo systemd-nspawn -D /srv/libreloc_root --machine libreloc -U --boot
systemctl restart libreloc-server.service
systemctl status libreloc-server.service
Roadmap
- Basic CI
- Research lookup maps
- Metrics
- Generate docs from CI
- Benchmark databases
- Deployment tools and documentation
- Public metrics dashboard
- Full CI
- Privacy-aware API and caching
- Data backup
References
- Rye, E., & Levin, D. (2024). "Surveilling the Masses with Wi-Fi-Based Positioning Systems". DOI: https://doi.org/10.48550/arXiv.2405.14975. Online: https://arxiv.org/abs/2405.14975
- Boutet, A., & Cunche, M. (2021). "Privacy protection for Wi-Fi location positioning systems". Journal of Information Security and Applications. DOI: https://doi.org/10.1016/J.JISA.2020.102635. Online: https://www.sciencedirect.com/science/article/abs/pii/S2214212620307985
- Yang, X. Y., Luo, Y., Xu, M., Fu, S., & Chen, Y. (2022). "Privacy-preserving WiFi Fingerprint Localization Based on Spatial Linear Correlation". Lecture Notes in Computer Science, Wireless Algorithms, Systems, and Applications. DOI: https://doi.org/10.1007/978-3-031-19208-1_33. Online: https://link.springer.com/chapter/10.1007/978-3-031-19208-1_33
- Zhang, G., Zhao, P., & Zhang, A. (2024). "Lightweight Privacy-Preserving Scheme in WiFi Fingerprint-Based Indoor Localization". Signals and Communication Technology. DOI: https://doi.org/10.1007/978-3-031-58013-0_5. Online: https://link.springer.com/chapter/10.1007/978-3-031-58013-0_5
- Rusca, R., Carluccio, A., Casetti, C., & Giaccone, P. (2024). "Privacy-preserving WiFi-based crowd monitoring". Transactions on Emerging Telecommunications Technologies. DOI: https://doi.org/10.1002/ett.4956. Online: https://onlinelibrary.wiley.com/doi/10.1002/ett.4956
- Rusca, R., Carluccio, A., Casetti, C., & Giaccone, P. (2023). "Privacy-preserving WiFi-based Crowd Monitoring". Preprint. DOI: https://doi.org/10.22541/au.169960264.44881930/v1. Online: https://www.researchsquare.com/article/rs-2762130/v1
- WiGLE’s m8b format. Online: https://github.com/wiglenet/m8b