
Photograph Dataset Scripts

Scripts to build photograph datasets from Wikimedia Commons. For more information, see https://zpl.fi/high-quality-photograph-dataset/.

Usage

download-data.py fetches search results from the Wikimedia Commons API and writes them to standard output as JSON. create-dataset.py reads that JSON from standard input, filters the results, and writes CSV to standard output.
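The request parameters download-data.py actually sends are not shown in this README. The following is a minimal sketch, assuming the standard MediaWiki Action API search generator, of how a Commons search could be fetched with continuation. The helper names (`search_params`, `build_url`, `fetch_all`), the `gsrlimit` value, the `iiprop` fields and the User-Agent string are all illustrative assumptions, not taken from the script.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# MediaWiki Action API endpoint for Wikimedia Commons.
API_URL = "https://commons.wikimedia.org/w/api.php"


def search_params(query, limit=50):
    """Base parameters for a File-namespace search returning image metadata.

    Hypothetical helper; the chosen fields are assumptions for illustration.
    """
    return {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": query,        # CirrusSearch query, e.g. 'incategory:CC-Zero'
        "gsrnamespace": 6,         # namespace 6 is File:
        "gsrlimit": limit,
        "prop": "imageinfo",
        "iiprop": "url|size|extmetadata",
    }


def build_url(params):
    """Encode the parameters into a GET URL for the API endpoint."""
    return API_URL + "?" + urlencode(params)


def fetch_all(query):
    """Yield page objects for every result, following 'continue' tokens."""
    params = search_params(query)
    while True:
        # Hypothetical User-Agent; Wikimedia asks clients to identify themselves.
        req = Request(build_url(params),
                      headers={"User-Agent": "photo-dataset-sketch/0.1 (example)"})
        with urlopen(req) as resp:
            data = json.load(resp)
        # With the default formatversion, "pages" is a dict keyed by page ID.
        yield from data.get("query", {}).get("pages", {}).values()
        if "continue" not in data:
            break
        # Merge the continuation tokens into the next request's parameters.
        params = {**search_params(query), **data["continue"]}
```

For example, `json.dump(list(fetch_all('incategory:Quality_images incategory:CC-Zero')), sys.stdout)` would print all matching pages as one JSON list, mirroring the redirect-to-file usage below.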

The datasets in this repository are created by running:

python download-data.py 'incategory:Quality_images incategory:CC-Zero' > quality.json
python create-dataset.py < quality.json > quality.csv

and

python download-data.py 'incategory:Featured_pictures_on_Wikimedia_Commons incategory:CC-Zero' > featured.json
python create-dataset.py < featured.json > featured.csv

and

python download-data.py 'incategory:Pictures_of_the_Year_(first_place)' 'incategory:Pictures_of_the_Year_(second_place)' 'incategory:Pictures_of_the_Year_(third_place)' > poty.json
python create-dataset.py --no-restrictions < poty.json > poty.csv
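The filtering stage in the commands above reads JSON on standard input and writes CSV on standard output. The README does not document create-dataset.py's actual filter rules, its CSV columns, or what `--no-restrictions` changes, so the sketch below models only the stdin-to-stdout shape of that stage. It assumes the input is a JSON list of API page objects with `imageinfo` entries, and its minimum-size filter, thresholds and column names are hypothetical.

```python
import csv
import json
import sys

# Hypothetical thresholds; create-dataset.py's real filter rules are not shown here.
MIN_WIDTH = 2000
MIN_HEIGHT = 2000


def rows_from_pages(pages, min_width=MIN_WIDTH, min_height=MIN_HEIGHT):
    """Turn page objects into (title, url, width, height) rows, dropping small images."""
    rows = []
    for page in pages:
        infos = page.get("imageinfo") or []
        if not infos:
            continue  # no image metadata was returned for this page
        info = infos[0]
        width, height = info.get("width", 0), info.get("height", 0)
        if width < min_width or height < min_height:
            continue  # below the hypothetical size threshold
        rows.append((page["title"], info["url"], width, height))
    return rows


def write_csv(rows, out=sys.stdout):
    """Write a header row followed by the data rows as CSV."""
    writer = csv.writer(out)
    writer.writerow(["title", "url", "width", "height"])
    writer.writerows(rows)


def main():
    # Assumes standard input holds a JSON list of page objects.
    pages = json.load(sys.stdin)
    write_csv(rows_from_pages(pages))
```

Wired up as a script's entry point, `main()` would be invoked with the same redirections as above, for example `python filter-sketch.py < quality.json > quality.csv`, where `filter-sketch.py` is a hypothetical file name.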