Open Dataset for AI Model Training Includes Six Million Images
Satellogic Releases Extensive Earth Observation Dataset
Satellogic has released a large open dataset of high-resolution imagery, curated from its archive, to support the training of AI foundation models.
The dataset contains around 3 million Satellogic images of unique locations from around the world (about 6 million images when location revisits are included). Each image measures 384 by 384 pixels, for a total of roughly 900 gigapixels spanning a range of land-use types, objects, geographies, and seasons. The full dataset is available on Hugging Face.
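As a quick consistency check, the headline total follows from the image count and image size. The sketch below uses the approximate figure of 6 million images quoted above (the exact count is not given), so the result is an estimate rather than an official number.

```python
# Approximate figures from the article; the exact image count is not stated.
num_images = 6_000_000   # total images, including location revisits
side_px = 384            # each image is 384 x 384 pixels

pixels_per_image = side_px * side_px          # pixels in a single image
total_pixels = num_images * pixels_per_image  # pixels across the whole dataset
total_gigapixels = total_pixels / 1e9         # 1 gigapixel = 10^9 pixels

print(f"{pixels_per_image:,} px per image")
print(f"{total_gigapixels:.1f} gigapixels in total")
```

At about 147,000 pixels per image, 6 million images come to roughly 885 gigapixels, in line with the rounded figure of about 900 gigapixels.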
“Following a stream of recent publications, with the release of this large dataset we aim to accelerate the development of foundational models in the field of EO,” said Javier Marin, Applied AI Director at Satellogic. “Instead of relying on analysts to manually select and process satellite images, we will soon start interacting with large Earth Observation AI models with access to high-resolution, real-time imagery of our planet to derive those insights.”
The data is released under a Creative Commons Attribution 4.0 (CC BY 4.0) license, which permits commercial use provided attribution is given.
A paper presenting the dataset will be published alongside a baseline foundation model built on top of it: a masked autoencoder, a scalable self-supervised learning approach for computer vision. The paper describes how the dataset was built, as well as the model architecture and experimental setup. This work is the result of Satellogic’s collaboration with an exceptional team of researchers at ServiceNow, led by Alexandre Lacoste under Yoshua Bengio’s guidance.
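A masked autoencoder learns by hiding most of an image and training the model to reconstruct the missing parts. The sketch below illustrates only the random patch-masking step for a single 384 by 384 image. The patch size (16 pixels) and masking ratio (75%) are assumptions borrowed from common masked-autoencoder settings, not confirmed details of Satellogic’s baseline model, and `random_patch_mask` is a hypothetical helper written for illustration.

```python
import random


def random_patch_mask(image_size=384, patch_size=16, mask_ratio=0.75, seed=0):
    """Split an image into square patches and randomly choose which to hide.

    Hypothetical helper: patch_size and mask_ratio are assumed values,
    not parameters taken from Satellogic's model.
    Returns (visible, masked) lists of patch indices.
    """
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    num_patches = (image_size // patch_size) ** 2  # patches per image
    num_masked = int(num_patches * mask_ratio)     # patches hidden from the encoder

    rng = random.Random(seed)  # seeded so the mask is reproducible
    order = list(range(num_patches))
    rng.shuffle(order)

    masked = sorted(order[:num_masked])   # the decoder must reconstruct these
    visible = sorted(order[num_masked:])  # only these are fed to the encoder
    return visible, masked


visible, masked = random_patch_mask()
print(len(visible), len(masked))
```

Under these assumptions, each image yields 576 patches, of which the encoder sees only 144. Processing just the visible patches is what makes this style of pretraining comparatively cheap to scale to a dataset of this size.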