I recognize the "backup snapshot scrubbed of user data" is likely the preferred option, and one that does reduce resource costs for our sysadmin team. However, it does still require some work on the part of our sysadmins, so we may not be able to give that solution a try until much later. If time is of the essence, you may elect to try Isaiah's throttled webscraper option, since it doesn't require you to wait on us to have some free time. I'll present it to the sysadmins, but can't promise any sort of timeframe.
As Isaiah 65:8 said, we spoke about his plan prior to implementation. There is no issue with doing the same thing as long as you are placing a reasonable throttle on it, just as he did.
We have a way to access all the files, and we have a way to access the database information containing authorship, license, etc. But we don't have the resources to build scripts that arrange asset files and information .txt files together. However, the code for Drupal 7 is available here: https://www.drupal.org/download and the custom module code for OGA is available here: https://github.com/OpenGameArt/OGA2-Modules
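For anyone wondering what "a reasonable throttle" might look like in practice, here is a minimal Python sketch. It is not Isaiah's actual scraper, and the names `Throttle` and `fetch_all` are made up for illustration. The interval value is also an assumption: nothing above states a specific number, so check with the admins before picking one.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Call fetch(url) for each URL, one at a time, pausing between calls."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

The `fetch` argument is whatever download function you already use; keeping it separate means the throttle can be tried out without touching the site at all.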
If you create a tool you'd like to try, we'll take a look.
The files by themselves are incomplete. They are unusable without the license and attribution that go with them. To preserve the information, you must preserve the files as well as the license, author, and what other assets the files were derived from.
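As a rough illustration of keeping that information next to each file, here is a small Python sketch that writes an information .txt file beside an asset. The function name `write_attribution`, the field labels, and the `<filename>.txt` naming are all assumptions made for this example; they are not an existing OGA format or tool.

```python
from pathlib import Path


def write_attribution(asset_path, title, author, licenses, derived_from=()):
    """Write '<asset filename>.txt' next to the asset with its attribution details.

    Hypothetical layout: one 'Label: value' line per field.
    """
    asset_path = Path(asset_path)
    sidecar = asset_path.with_name(asset_path.name + ".txt")
    lines = [
        f"Title: {title}",
        f"Author: {author}",
        "License(s): " + ", ".join(licenses),
    ]
    # One line per source work, so derivative chains are not lost.
    for source in derived_from:
        lines.append(f"Derived from: {source}")
    sidecar.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sidecar
```

Something like `write_attribution("tiles.png", "Tiles", "someone", ["CC-BY 4.0"], ["Older Tiles"])` would produce `tiles.png.txt` in the same folder, so the file and its terms travel together.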
The respect is appreciated. Scraping the site would result in an automated IP restriction. I'll work with you in any way I can, though. Please let me know what sort of resources you would like and I'll see what is available.
I would like this, but it is low feasibility with our current level of human resources. Each snapshot would have to be manually produced each time, and it can't just be a copy of the site files & database: we would not be able to share sensitive data like user account information, so such data would have to be filtered out.
Then again, I don't really know how Wikipedia handles it. If there are tools to facilitate that, I'm willing to listen. Just know that this site is not based on a MediaWiki framework.
Ace Studio: The "License Free" license says "use them for commercial or non-commercial purposes for free" but that is not the same as PD or CC0. Often such licenses have extra conditions like "...but you may not redistribute them as-is nor resell them". Reasonable terms, and you could use such assets in your own project, but it would make them ineligible for hosting on OGA as we are technically a stock asset hosting service and all licenses on OGA permit resale, et cetera. I don't actually know if those extra conditions are present because they don't list the actual license text. It seems you must "apply" for a copy of the full license text by filling out personal details, which I am unwilling to do. I'd love to take a look at the actual license text if anyone else is willing to apply for it and share it here.
However! The license granted by Ace Studio is not the same thing as a license (if any) granted by the owners of the training data. As is common in AI these days, AI trainers will scrape publicly available assets without obtaining permission for their use. "Publicly available" is not the same as "Public Domain", e.g. images from Google Image Search are publicly available, but 90% are copyrighted and non-free. As eugeneloza mentioned, this may be considered Fair-Use.... Buuuuut 1) Fair-Use is not Public Domain and it comes with caveats on how it can be used, and 2) this Fair-Use defense is an assumption being generally made by AI trainers. Everyone is just assuming the courts will conclude it's OK to not ask permission from the owners of the training data. OGA can make no such assumptions.
Voca DB: I can't seem to find any clear indications on the terms of use nor any information about their training data. That doesn't mean it isn't there, I just didn't find it. If you see what I'm missing, by all means direct me to the details. However, in the absence of that information, we must assume the terms are "non-free" despite blurbs or license deeds simply saying "it's free!". As with Ace Studio, we can't trust statements of freedom without seeing the full license text.
Vsinger: I wasn't able to find any terms of use at all. In fact, the page scared my malware protection system and halted the site from fully loading. Not a great endorsement of trust to start with, but let me know if anyone else has better luck locating the details of the licensing and training data origins.
These are assessments from the perspective of OGA policy and do not necessarily mean individual users would be unable to legally use such assets in their projects. What OGA is allowed to do is not the same as what you are allowed to do. That being said, until we have more details on those licenses and training dataset origins, the answer to this question:
"Would it be possible to share cloned voices as assets in this site...?"
Then you're free to change the license from CC-BY 4.0 to CC0 with no conflicts. It is up to the user of the asset to decide if they want to continue using the assets under the old terms or switch to the new terms.
Bumped for new content.
... is "no", unfortunately.
It depends. Is the content whose license is changing derived from anyone else's work?