I would like to make a backup of all the art that has been submitted to this website. I have a feeling, though, that if I start using a script to download everything, it could result in an IP ban or be highly frowned upon due to the extra bandwidth costs. Any suggestions on how I could go about doing this? I've learned that if you don't back things up now, you'll regret it later. I estimate it will be around 1 terabyte of data or less for all the art assets.
I'll ask the host about a backup for you, but your guess about an IP ban is correct.
--Medicine Storm
I suppose if I had to, I could download what archive.org has saved. That would obviously not be all the files, but it would be a start. The other hard part is downloading the licensing for each file. There's not a lot of good in having a backup if you don't know the licensing needed to use it.
If your script is smart enough to grab the asset itself from archive.org, why couldn't it also grab the license for that asset? They're listed together, and there are even script-generated licensing info files available for any asset you download. Regardless, the following may be a more desirable solution:
Sys admin says that if you feel like making a Drupal module that automatically commits any new submissions to a GitHub repo, he'd allow that. From there you could script a download and/or pull the entire archive of OGA assets from GitHub.
--Medicine Storm
I was just thinking that it might be easier than I thought: I could download every file from https://opengameart.org/sites/default/files/* via the Wayback Machine, since it supports wildcards and can list every file/URL it knows that matches a wildcard pattern. It would be as easy as downloading every file from the list of URLs it gives. But then I realized that wouldn't get the licensing for the files. I would instead need to look at the https://opengameart.org/content/* pages to get the licensing and then scan those for the file links.
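For anyone who wants to try the wildcard listing, here is a rough Python sketch of how it could be done through the Wayback Machine's CDX search endpoint. It only builds the query and parses the result; the helper names (build_cdx_query, parse_cdx_rows, snapshot_url, fetch_listing) are made up for this post, and the chosen filters (only HTTP 200 captures, one row per unique URL) are assumptions about what a backup would want, not anything the site or archive.org prescribes.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Wayback Machine CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_pattern, limit=None):
    """Build a CDX query URL for a wildcard pattern such as
    'opengameart.org/sites/default/files/*'."""
    params = {
        "url": url_pattern,
        "output": "json",              # JSON array of rows, header row first
        "fl": "timestamp,original",    # only the fields needed here
        "filter": "statuscode:200",    # assumption: skip redirects and errors
        "collapse": "urlkey",          # assumption: one row per unique URL
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(json_text):
    """Turn the CDX JSON response into (timestamp, original_url) tuples."""
    rows = json.loads(json_text) if json_text.strip() else []
    return [tuple(row) for row in rows[1:]]  # rows[0] is the header


def snapshot_url(timestamp, original):
    """URL of the archived copy; the 'id_' suffix asks for the raw file
    without the Wayback toolbar or rewritten links."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def fetch_listing(url_pattern, limit=None):
    """Network call: fetch and parse the listing for a pattern."""
    with urllib.request.urlopen(build_cdx_query(url_pattern, limit), timeout=60) as resp:
        return parse_cdx_rows(resp.read().decode("utf-8"))
```

Calling fetch_listing("opengameart.org/content/*", limit=10) first would be a cheap way to check the output before asking for the full list. The same function works for the /sites/default/files/* pattern, and the licensing would still have to be scraped from the archived content pages afterwards.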
I'm honestly not that great of a scripter. I was gifted a bunch of storage space and have been making backups of things that I think might get lost to time.
Have a look at WinHTTrack. It's a free program that will let you mirror an entire website, if configured properly. With conservative rate limits, you can download slowly enough that you should not risk getting banned.
https://www.httrack.com/
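If you end up scripting it yourself instead of using HTTrack, the same idea (never hit the server faster than an agreed pace) can be sketched in a few lines of Python. This is not how HTTrack works internally, just a hand-rolled illustration; the Throttle class, download_all, http_fetch, and the User-Agent string are all made up for this post, and the right delay is something to agree on with the site admin rather than a number I can give you.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def download_all(urls, fetch, throttle):
    """Fetch each URL in order, pausing between requests as required."""
    results = {}
    for url in urls:
        throttle.wait()
        results[url] = fetch(url)
    return results


def http_fetch(url, user_agent="oga-backup-sketch (hypothetical)"):
    """Plain HTTP GET that identifies itself, so an admin can spot it in logs."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

Something like download_all(url_list, http_fetch, Throttle(5.0)) would then make at most one request every five seconds; the five is only a placeholder.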
That could work. Please PM me when you plan on using it so I can make sure it isn't getting too close to hindering site performance.
--Medicine Storm
Speaking of backups, is there a public backup or a way to restart the site in case something goes wrong?
Recently the website was down, and that is usually when one realizes how important OGA is :p
I just started wondering this too! Archive.org is a pretty well funded project, and I think it'd be super spiffy if anything posted here /also/ ended up on the archive.