Photograph Dataset Scripts
Scripts to build photograph datasets from Wikimedia Commons. For more information, see https://zpl.fi/high-quality-photograph-dataset/.
Usage
download-data.py fetches search results from the Wikimedia Commons API and writes them to standard output as JSON.
create-dataset.py reads that JSON data from standard input, filters the results, and outputs them as CSV.
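As a rough illustration of the download step, here is a minimal sketch, assuming download-data.py queries the MediaWiki Action API at https://commons.wikimedia.org/w/api.php using a search generator restricted to the File namespace and follows API continuation until all results are fetched. The specific parameters (`gsrlimit`, the `iiprop` fields), the combined JSON array output, and the names `build_params`, `http_get_json` and `fetch_all` are assumptions for illustration, not taken from the actual script.

```python
import json
import sys
import urllib.parse
import urllib.request

API_URL = "https://commons.wikimedia.org/w/api.php"


def build_params(query, cont=None):
    """Parameters for one page of File-namespace search results (assumed, not the script's own)."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": query,          # e.g. 'incategory:Quality_images incategory:CC-Zero'
        "gsrnamespace": "6",         # namespace 6 is File:
        "gsrlimit": "50",
        "prop": "imageinfo",
        "iiprop": "url|size|mime|extmetadata",
    }
    if cont:
        params.update(cont)          # continuation tokens returned by the previous response
    return params


def http_get_json(params):
    """Perform one GET request against the API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Wikimedia asks clients to send a descriptive User-Agent.
    req = urllib.request.Request(url, headers={"User-Agent": "photo-dataset-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fetch_all(query, get_json=http_get_json):
    """Yield every page object for `query`, following the API's 'continue' mechanism."""
    cont = None
    while True:
        data = get_json(build_params(query, cont))
        yield from data.get("query", {}).get("pages", {}).values()
        if "continue" not in data:
            break
        cont = data["continue"]


if __name__ == "__main__":
    # Assumption: each command-line argument is a separate search, and all
    # results are written out as one JSON array.
    results = [page for q in sys.argv[1:] for page in fetch_all(q)]
    json.dump(results, sys.stdout)
```

Injecting `get_json` keeps the paging logic testable without network access.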
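The filtering step can be sketched in the same spirit. This is a minimal sketch only: it assumes the input is a JSON array of API page objects carrying an `imageinfo` list, and the filter criteria (JPEG only, at least two megapixels), the CSV columns, and the names `keep`, `to_rows` and `main` are hypothetical; the real create-dataset.py may filter on entirely different properties.

```python
import csv
import json

# Hypothetical thresholds; the real create-dataset.py criteria may differ.
MIN_PIXELS = 2_000_000
ALLOWED_MIME = {"image/jpeg"}


def keep(page):
    """Return True if a search result looks like a usable photograph (assumed criteria)."""
    info = (page.get("imageinfo") or [{}])[0]
    width, height = info.get("width", 0), info.get("height", 0)
    return info.get("mime") in ALLOWED_MIME and width * height >= MIN_PIXELS


def to_rows(pages):
    """Yield (title, url, width, height) for every page that passes `keep`."""
    for page in pages:
        if keep(page):
            info = page["imageinfo"][0]
            yield page["title"], info["url"], info["width"], info["height"]


def main(infile, outfile):
    """Read a JSON array of pages from `infile` and write the filtered CSV to `outfile`."""
    pages = json.load(infile)
    writer = csv.writer(outfile)
    writer.writerow(["title", "url", "width", "height"])
    writer.writerows(to_rows(pages))
```

Calling `main(sys.stdin, sys.stdout)` from a script mirrors the `create-dataset.py < quality.json > quality.csv` usage.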
The datasets in this repository are created by running:
python download-data.py 'incategory:Quality_images incategory:CC-Zero' > quality.json
python create-dataset.py < quality.json > quality.csv
and
python download-data.py 'incategory:Featured_pictures_on_Wikimedia_Commons incategory:CC-Zero' > featured.json
python create-dataset.py < featured.json > featured.csv
and
python download-data.py 'incategory:Pictures_of_the_Year_(first_place)' 'incategory:Pictures_of_the_Year_(second_place)' 'incategory:Pictures_of_the_Year_(third_place)' > poty.json
python create-dataset.py --no-restrictions < poty.json > poty.csv
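The three pipelines above can also be driven from Python. The sketch below only reproduces the documented commands; the `DATASETS` table and the `build_commands` and `run` helpers are hypothetical names, and it assumes it is run from the repository directory containing both scripts.

```python
import subprocess
import sys

# The three recipes from this README: search queries, then extra create-dataset.py flags.
DATASETS = {
    "quality": (["incategory:Quality_images incategory:CC-Zero"], []),
    "featured": (["incategory:Featured_pictures_on_Wikimedia_Commons incategory:CC-Zero"], []),
    "poty": (
        [
            "incategory:Pictures_of_the_Year_(first_place)",
            "incategory:Pictures_of_the_Year_(second_place)",
            "incategory:Pictures_of_the_Year_(third_place)",
        ],
        ["--no-restrictions"],
    ),
}


def build_commands(name, python=sys.executable):
    """Return the download-data.py and create-dataset.py argv lists for one dataset."""
    queries, create_flags = DATASETS[name]
    download = [python, "download-data.py", *queries]
    create = [python, "create-dataset.py", *create_flags]
    return download, create


def run(name):
    """Equivalent of: download-data.py ... > NAME.json, then create-dataset.py ... < NAME.json > NAME.csv."""
    download, create = build_commands(name)
    with open(f"{name}.json", "w") as json_out:
        subprocess.run(download, stdout=json_out, check=True)
    with open(f"{name}.json") as json_in, open(f"{name}.csv", "w") as csv_out:
        subprocess.run(create, stdin=json_in, stdout=csv_out, check=True)
```

Running `for name in DATASETS: run(name)` rebuilds all three CSV files. Passing argv lists instead of a shell string means the parentheses in the Pictures of the Year category names need no quoting.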