Improvements to content curation

Edit: Now that we're over the $500 mark, this is our current goal.

So, let's face it.  Right now, OGA's art search feature isn't very good.  The advanced search interface is cumbersome to work with, and sometimes search results that ought to be there just don't show up.  For instance, you might think that if you typed the words "lpc base" in the search box, you'd find the LPC Base Assets, but in fact you get no results at all (or you might get a huge flood, YMMV).  (FYI, the work-around in this case is to search for it again in the advanced search box on the left side of the search results, but even then you get too many results, which still isn't particularly useful.)  The other option is to browse by category, but the categories are so broad that browsing isn't particularly helpful either.  This needs to be fixed.

Unfortunately, fixing this will require a large time investment, for several reasons:

  • Right now, we use a Drupal module called Views to do art searches.  I actually like Views a lot; it's great for quickly searching through and displaying lots of data, but frankly it's better suited to simpler types of searches.  What I can do with the Advanced Search form, for instance, is currently limited by the capabilities of Views.
  • To make narrower categories, we have to start collecting a lot more metadata.  The problem with this is that people don't like entering a lot of data when they submit art.  One frequent complaint I get from artists is that the form is already kind of a pain to fill out.  On the other hand, a complaint we get from people looking for art is that there isn't enough metadata to help them find things.  So at once, we're collecting too much information, and not enough.  In order to rectify this situation, someone (myself, specifically) will need to go through new art when it's added and add the appropriate metadata to it.  (As a side note, ages ago, when OGA was very young, we had narrower categories and let people classify things themselves, but art was constantly being miscategorized, so we dumped categories in favor of tags.)

I'd love to be able to automate some generation of metadata, but unfortunately, metadata is an inherently complicated thing.  For instance, the metadata required for vector art is different from the metadata you'd need for pixel art.  For pixel art map tiles, you probably want to know the per-tile resolution, but in the case of vector map tiles, resolution is irrelevant.  For music, you might want to know the length and genre of the song.  For 3D models, you probably want to know the polycount, texture resolution, whether it's rigged or static, etc.  If we're smart, we can programmatically guess some of these things, but certain things, like musical genre, would require a much more sophisticated algorithm than we have the processing power to run.
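To make the "different metadata for different art" point concrete, here is a minimal Python sketch.  The type names and field names are made up for this example (they are not OGA's actual schema); they simply mirror the cases mentioned above.

```python
# Hypothetical per-type metadata requirements, based only on the examples
# in this post.  None of these names come from the real OGA database.
REQUIRED_FIELDS = {
    "pixel_tiles": {"tile_width", "tile_height"},
    "vector_tiles": set(),  # resolution is irrelevant for vector art
    "music": {"length_seconds", "genre"},
    "3d_model": {"polycount", "texture_resolution", "rigged"},
}


def missing_fields(art_type, metadata):
    """Return the required fields that have not been filled in yet."""
    required = REQUIRED_FIELDS.get(art_type, set())
    return sorted(required - set(metadata))


# A song with a genre but no length still needs one more field.
print(missing_fields("music", {"genre": "chiptune"}))
```

Something along these lines would let a curation form ask only for the fields that make sense for the type of art being edited.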

So as I said, the solution to all this is that I'm going to have to go through and enter metadata as new art is submitted (not to mention going through the archives and adding it there too -- something that will likely take many, many hours).  But before I even get to actually entering metadata, I need to figure out what is going to be the best way to store it, and then build a reasonably usable web form so that I can enter it without inducing any more headaches than absolutely necessary.  In the process of doing all this, I'd also like to rebuild the search interface into something that a) actually works, and b) is more appropriate for searching through art.

I recently spoke with my friend Clint Bellanger (developer of FLARE), who has a lot of experience with metadata and content curation, and he gave me some really good suggestions.  I'd like to switch our searching and indexing over to Apache Solr, which should be a big performance win, and will also allow some major improvements to the search form itself (not to mention vastly better results).  Ultimately, what I'd like to arrive at is a search system that works a lot more like, for example, this one at the Auburn University Library.  Note how quickly and easily you can add and remove search filters.  Now, imagine that you're searching for art on OGA, and you can do that with all sorts of data that's specific to individual types of art, as well as universal things like license, favorite count, download count (which we'll be re-adding), and submission date.  Here's a mockup image:

Since this isn't implemented yet, you'll have to imagine that the results returned in the image are accurate and relevant, but that should give you a general idea of what we were thinking.  And just to reiterate, this is a mockup, so it's subject to change.
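For anyone unfamiliar with this style of search, the idea of adding and removing filters can be sketched in a few lines of Python.  This is only an in-memory illustration with made-up sample entries and field names; it is not Solr and not how the real search would be implemented, since a search server would do the equivalent work on its side.

```python
from collections import Counter

# Made-up sample entries; the titles and field names are assumptions.
ART = [
    {"title": "Sample tileset", "type": "2D Art", "license": "CC0"},
    {"title": "Sample sprite", "type": "2D Art", "license": "CC-BY 3.0"},
    {"title": "Sample song", "type": "Music", "license": "CC0"},
]


def apply_filters(items, filters):
    """Keep only the items that match every active filter."""
    return [
        item for item in items
        if all(item.get(name) == value for name, value in filters.items())
    ]


def facet_counts(items, field):
    """Count how many items fall under each value of a field."""
    return Counter(item[field] for item in items if field in item)


filters = {"license": "CC0"}          # the user clicks the "CC0" filter
results = apply_filters(ART, filters)
print(len(results), dict(facet_counts(results, "type")))

filters["type"] = "Music"             # then narrows the search to music
del filters["license"]                # and drops the license filter again
print([item["title"] for item in apply_filters(ART, filters)])
```

The counts next to each filter value are what let you see how many results a filter would leave before you click it.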

How long will this take?  It's hard to say, but since it'll be such a huge change, I can say that it's likely to take weeks of actual programming time (which could translate into several months out here in The Real World).  Beyond that, new art isn't going to curate itself, which means that even after it's done, there will be a constant (and probably growing) workload of making sure that new art is properly curated.

People have suggested gamification (that is, reputation points) to encourage people to help out, and I think that's an excellent idea, but when we eventually go that route, I'll have to put a lot of thought into ways to make sure that items aren't miscategorized or categorized inconsistently (a common problem if multiple people are sorting things into categories).  Even if we enlist the help of users through reputation points, ultimately I'll still need to review their metadata for consistency.
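As one possible way to prioritize that review (purely a sketch, not a plan), entries where contributors disagree could be flagged first.  The function name and the 75% agreement threshold below are hypothetical.

```python
from collections import Counter


def needs_review(submissions, min_agreement=0.75):
    """Flag a field for review when contributors disagree too much.

    `submissions` is the list of values that different users proposed for
    one metadata field on one piece of art.  The threshold is an arbitrary
    assumption chosen for this example.
    """
    if not submissions:
        return True  # nobody has entered anything yet
    top_count = Counter(submissions).most_common(1)[0][1]
    return top_count / len(submissions) < min_agreement


print(needs_review(["pixel art", "pixel art", "pixel art", "vector"]))
print(needs_review(["rock", "metal"]))
```

Agreement between users doesn't guarantee that a value is right, so a check like this would only help decide what to look at first, not replace the review itself.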

So, for those of you who have been wondering why the content curation goal (which everyone understandably wants) has been set so high, it's because it's going to require a huge initial time investment and then a fairly constant investment of time later on (on top of the few hours per week of basic site management and maintenance).

If you've been wondering whether there's a good reason for you to donate to the OGA Patreon fund drive, this might be it.  We're just about halfway to this goal at the time of this post, so if you want to help us out, go to our Patreon page, or help spread the word. :)

Peace,

Bart