A blast from my past: Google reimplements CoFIND

While searching for a movie using Google Search last night I got (for the first time that I can recall) the option to tag the result, as described in this article. I was pleased to discover that the tool they provide for this is virtually identical (albeit with a much slicker and more refined modern interface) to the CoFIND system that underpinned my PhD, which I built over 20 years ago now. You are presented with a list of tags, and can select one or more that describe the movie, and/or suggest your own, effectively creating a multi-dimensional rating system that other users can use to judge what the movie is like. When I rated the movie last night, for instance, popular tags presented to me included 'terrible acting', 'bad writing', 'clichéd', 'boring' and so on. Having seen the movie, I agree about the bad writing and clichés - it was at the terrible end of the scale - but I actually think most of the acting was fairly good, and it was not very boring. What is interestingly different about this, compared with other tagging systems currently available, is that this kind of tag is fuzzy - it represents a value statement about the movie that exists on a continuum, not a simple categorization. The sorting algorithm for the list of tags presented to you appears (like my original CoFIND) to be based mainly on simple popularity, though it is possible that (like CoFIND) it uses other metrics such as tag age and perhaps even a user model as well. It's vastly more useful and powerful than the typical thumbs-up/thumbs-down that Google normally provides. The feature has sadly not reappeared on subsequent movie searches, so I am guessing that Google is either still testing it or trying to build up a sufficient base of recommendations by occasionally showing it to people, before opening it up to everyone.

Just in case Google or anyone else has tried to patent this, and to assert my prior art, you can find a description and screenshots (p183 and p184) of my original CoFIND system in chapter 6 of my PhD thesis, as well as in many papers before and since, not to mention a fair few blog posts. It's out there in the public domain for anyone to use. The interface of my system was, even by the standards of the day, pretty awful and not even a fraction as good as the one provided by Google, but those were different times: it did work in exactly the same way, though. As I developed it further, the interface actually became much worse. Over the course of a few years I experimented with quite a range of methods to get and display ratings/tags, including an ill-conceived Likert scale as well as a much more successful early use of tag clouds, all of which added complexity and reduced usability. Some of these later systems are described and discussed in my PhD too. In its final, refactored, and heavily evolved form, which postdates my PhD by several years, a version of CoFIND (last modified 2007) is actually still available. It almost reverts to the Google-style tag selection approach of the original, with the slight tweak that, in CoFIND, you can disagree about any particular tag use (for instance, if you don't believe it to be inane then you can cast a vote against that tag). The interface remains at least as awful as the original, though, and not a patch on Google's. The other main differences, apart from interface variations, are that the nomenclature differs (I used 'qualities' rather than 'tags'), and that CoFIND could be used for anything with a URL, not just movies. If you're interested, click on any resource link in the system and you'll see my primitive, ugly, frame-based attempt to do very much the same as Google is doing for movies (NB unless you are logged in you cannot add new qualities but, for authorized users, a field appears at the end that is just like Google's). Though primarily intended to share and recommend educational resources, CoFIND was very flexible and was, over the years, used for a range of other purposes, from comparing interface designs to discovering images and videos. It was always flaky, ugly, and unscalable, but it worked well enough for my research and teaching purposes, and (because it provides RSS feeds) it was my go-to tool for sharing interesting links right up until 2007, after which I reverted to more conventional but better-maintained tools like the Landing or WordPress.

A little bit of CoFIND background

I've written a fair bit about CoFIND, formally and informally, but not for a few years now, so here's a little background for anyone that might be interested, and to remind myself of a little of what I learned all those years ago in the light of what I know now.

An evolving, self-organizing, social bookmarking tool

I started my PhD research in 1997 with the observation that, even then, there was a vast amount of stuff to learn from that could be easily found on the Web, but that it was really difficult to find good stuff, let alone stuff that was actually useful to a particular learner at a particular stage in their development. Remember that this was before Google even started, so things were significantly worse then than they are now. Infoseek was as good as it got.

I had also observed that, in any group of learners, people would find different things and, between them, discover a much larger range of useful resources than any one learner (or teacher) could do alone, a fact that I use in my teaching to this day. These would likely be (and, as it turned out, really were) better than what a teacher could find alone because, though individual learners might be less able to distinguish low from high quality, they would know what worked for them, and sufficient numbers of eyes would weed out the bad stuff as long as there was a mechanism for it. This was where I came in.

The only such mechanisms widely available at the time were simple rating systems. However, learners have very different learning needs, so I immediately realized that 'thumbs-up' or simple Likert scales would not work. This was not about finding the one 'best' solution for everyone, but was instead concerned with finding a range of alternatives to fill different ecological niches, and somehow discovering the most useful solution in that niche for a given learner at a given time.  My initial idea was to make use of a crowd, not an individual curator, and to employ a process closely akin to natural evolution to kill bad suggestions and promote good ones, in order to create an ecosystem of learning resources rather than a simple database. CoFIND was a series of software solutions that explored and extended this initial idea.

CoFIND was, on the face of it, what would eventually come to be called a social bookmarking system - a means for learners to find and to share Web resources (and, later, other things) with one another, along with a mechanism for other learners to recommend or critique them. It was by no means the first social bookmarking system, but the genre was certainly not common at the time, and I don't think such a dedicated system had ever been used in education before (for all such assertions, I stand to be corrected), though other means of sharing links, from simple web pages or wikis or discussion forums to purpose-built teacher-curated tools, were not that uncommon. A lot of my early research involved learning about self-organization and complex systems, in particular focusing on evolution and stigmergy (self-organization through signs left in the environment). As well as the survival-of-the-fittest dynamic, evolution furnished me with many useful concepts that I made good use of, such as the importance of parcellation, the necessity of death, ways to avoid skyhooks, the benefits of spandrels, ways to leverage chance (including extinction events), and various approaches to supporting speciation. As a result of learning about stigmergy I independently developed what later came to be known as tag clouds. I don't believe that mine were the first ever tag clouds - weighted lists of one sort or another had been around for a few years - but, though mine didn't then use the name, they were likely the first uses of such things in educational software, and almost certainly the first with this particular theoretical model to support them (again, I am happy to be corrected).

A collaborative filter

The name CoFIND is an acronym for 'collaborative filter in n-dimensions'. The n dimensions were substantiated through what we (my supervisors and I) called qualities. We went through a long list of possible names for these, and I was drawn for a while to calling them 'values', but (unfortunately) we never thought of 'tags', because the term was not in common use for this kind of purpose at the time. After a phase of calling them q-tags, I now call qualities by the much more accessible name of 'fuzzy tags'. Fuzzy tags are not just binary classifications of a topic but tags that describe what we value, or don't value, in a resource, and how much we value it. While people may sometimes disagree about binary classifications (conventional tags), it is always possible to have different opinions about the application of fuzzy tags: some may find something interesting, for instance, while others may not, and others may feel it to be quite interesting, or incredibly so. Fuzzy tags relate to fuzzy sets, which have a continuum of grades of membership, and that is where the name comes from. Different versions of CoFIND used different ways to establish the fuzziness of a tag - the Likert scale used in a few mid-period versions was my failed attempt to make it explicit, but this was a nightmare for people to actually use. The first versions used the same kind of frequency-based weighting as Google's movie tags, but that was a bit coarse: I was uncomfortable with the averaging effect and the unbridled Matthew Effect that threatened to keep early tags at the top of the list for all time, which I rather crudely kept in check with a simple age-related weighting that was boosted only when tags were used (the unfortunate side effect being that, if a system was not used for a few weeks, all the tags vanished in a huge extinction event, though they could be revived if anyone ever used one of the dead ones again). The final version was somewhere in between, allowing an indefinitely large scale via simple up/down ratings, balanced with an algorithm that included a decaying but renewable novelty weighting that adjusted to the frequency of use of the system as a whole. This still had the peculiar effect of evening out/reinitializing all of the tags over time if no one used the system, but at least it caused fewer catastrophes.
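To make that final weighting scheme a little more concrete, here is a minimal sketch of the general idea - popularity from up/down votes plus a novelty bonus that decays with overall system activity (rather than calendar time) and is renewed whenever a quality is used again. The names, constants, and structure are my own illustrative assumptions here, not the original CoFIND code:

```python
import math

# Hypothetical sketch of CoFIND-style quality (fuzzy tag) ordering.
# Each quality is scored by its up/down votes plus a novelty bonus that
# decays with system-wide activity and is renewed whenever it is used.

class Quality:
    def __init__(self, name):
        self.name = name
        self.votes_for = 0
        self.votes_against = 0
        self.last_used_tick = 0   # value of the activity counter when last used

def score(quality, current_tick, novelty_weight=5.0, half_life=50):
    """Popularity plus a decaying-but-renewable novelty bonus.

    current_tick counts uses of the system as a whole, so decay slows
    down when nobody is using it, avoiding mass extinction events.
    """
    popularity = quality.votes_for - quality.votes_against
    age = current_tick - quality.last_used_tick
    novelty = novelty_weight * math.exp(-age / half_life)
    return popularity + novelty

def use_quality(quality, current_tick, agree=True):
    """Record a vote for or against a quality and renew its novelty."""
    if agree:
        quality.votes_for += 1
    else:
        quality.votes_against += 1
    quality.last_used_tick = current_tick

# Example: order the qualities shown to a user after a few votes
qualities = [Quality("terrible acting"), Quality("bad writing"), Quality("boring")]
tick = 0
for q in qualities:
    tick += 1
    use_quality(q, tick)
use_quality(qualities[1], tick := tick + 1)                # another vote for 'bad writing'
use_quality(qualities[0], tick := tick + 1, agree=False)   # disagree with 'terrible acting'

ranked = sorted(qualities, key=lambda q: score(q, tick), reverse=True)
print([q.name for q in ranked])
```

The point of tying decay to use of the system rather than to the clock is exactly the one described above: a quiet few weeks no longer wipes out every tag, but genuinely neglected qualities still sink down the list.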

'Traditional' collaborative filters simply discover whether things are likely to be more valued or less valued on a usually implicit single dimension (good-bad, liked-disliked, useful-useless, etc). CoFIND's qualities/fuzzy tags allowed people to express in what ways they were better or worse - more interesting, less helpful, more complex, less funny, etc, just as Google's movie tagging allows you to express what you like or dislike about a movie, not just whether you liked it or not. In many tag-based systems, people tend to use quite a few simple tags that are inherently fuzzy (e.g. Flickr photos tagged as 'beautiful') but they are seldom differentiated in the software from those that simply classify a resource as fitting a particular category, so they are rarely particularly helpful in finding stuff to help with, say, learning.

I was building CoFIND just as the field of collaborative filtering was coming out of its infancy, so the precise definition of the term had yet to be settled. At the time, a collaborative filter (then usually called an 'automated collaborative filter') was simply any system that used prior explicit and/or implicit preferences of a number of previous users (a usually anonymous crowd) to help make better recommendations and/or filter out weaker recommendations for the current users. The PageRank algorithm that still underpins Google Search would perhaps have then been described as a collaborative filter, as was one of its likely inspirations, PHOAKS (People Helping One Another Know Stuff), that mined Usenet newsgroups for links, taking them as an implicit recommendation within the newsgroup topic area. By this definition, CoFIND was in fact a semi-automated collaborative filter that combined explicit preferences with automated matching. Nowadays the term 'collaborative filter' tends to only apply to a specific subset of recommender systems that automatically predict future interests by matching individual patterns of behaviour with those of multiple others, whether by item (people who bought this also bought...) or user (people whose past or expressed preferences seem to be like yours also liked...). I think that, if I built CoFIND today, I would simply refer to it more generically as a recommender system, to avoid confusion.
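For contrast with CoFIND's approach, here is a minimal sketch of what 'collaborative filter' usually means in the narrower modern sense: item-based matching over a ratings matrix ("people who liked this also liked..."). The data and function names are invented purely for illustration:

```python
from math import sqrt

# Minimal item-based collaborative filtering sketch. Ratings are invented.
ratings = {
    "alice": {"movie_a": 5, "movie_b": 4, "movie_c": 1},
    "bob":   {"movie_a": 4, "movie_b": 5},
    "carol": {"movie_b": 2, "movie_c": 5},
}

def item_similarity(item1, item2):
    """Cosine similarity between two items over the users who rated both."""
    common = [u for u in ratings if item1 in ratings[u] and item2 in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][item1] * ratings[u][item2] for u in common)
    norm1 = sqrt(sum(ratings[u][item1] ** 2 for u in common))
    norm2 = sqrt(sum(ratings[u][item2] ** 2 for u in common))
    return dot / (norm1 * norm2)

def recommend(user):
    """Score unrated items by similarity to the items this user rated."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - seen.keys():
        scores[candidate] = sum(
            item_similarity(candidate, rated) * value for rated, value in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("bob"))  # e.g. ['movie_c']
```

Note how the user's model here is implicit and hidden in their rating history, which is precisely what the next section argues is a poor fit for learners.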

Disembodied user models

Rather than a collaborative filter, back in the late 90s Peter Brusilovsky saw CoFIND as a new species of educational adaptive hypermedia, as it was perhaps the first (or at least one of the first) that worked on an open corpus rather than a closed corpus of linked resources. However, he and I were both puzzled about where to find the user model, which was part of Peter's definition of adaptive hypermedia. I didn't feel that it needed one, because users chose the things that mattered to them at runtime. In retrospect, I think that the trick behind CoFIND, and what still distinguishes it from almost all other systems apart from this fairly new Google tool, is that it disembodied and exposed the user model. Qualities were, in essence, the things that would normally be invisibly stored in a user model, but I made them visible, in an extreme variant of what Judy Kay later described as scrutable adaptation. In effect, a learner chose their own learner model at the time they needed it. The reasoning behind doing so was that, for learners, past behaviour is usually a poor predictor of future needs, mainly because 1) learning changes people (so past preferences may have little bearing on future preferences), and 2) learning is driven by a vast number of things other than taste or past actions: we often have a need for it thrust upon us by an extrinsic agency, like a teacher, or a legislative demand for a driving licence, for instance. Qualities (fuzzy tags) allow us to express the current value of something to us, in a form that we can leave behind without a lot of sticky residue, and that future users can use. In fact, later versions did tend to slightly emphasize things similar to those a person had added, categorized, or rated (fuzzily tagged) earlier, but this was just a pragmatic attempt to make the system more valuable as a personal bookmark store, and therefore to encourage more use of it, rather than an attempt to build a full-blown collaborative filter in the modern sense of the term.
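To make the 'disembodied user model' idea concrete, here is a minimal sketch of the basic move (again with invented names and vote counts, not CoFIND's actual code): instead of inferring what a learner wants from stored history, the learner picks the qualities that matter right now, and resources are ranked on just those qualities.

```python
# Hypothetical sketch of the 'disembodied user model': the learner chooses
# the qualities that matter *now*, and resources are ranked only on those.
# Vote counts are invented for illustration.
resources = {
    "intro_tutorial":  {"clear": 12, "detailed": 2,  "challenging": 0},
    "reference_paper": {"clear": 3,  "detailed": 15, "challenging": 9},
    "practice_quiz":   {"clear": 6,  "detailed": 1,  "challenging": 11},
}

def rank_for(chosen_qualities):
    """Rank resources by votes on the qualities the learner chose at runtime."""
    def score(votes):
        return sum(votes.get(q, 0) for q in chosen_qualities)
    return sorted(resources, key=lambda r: score(resources[r]), reverse=True)

# A beginner who wants clarity and a learner who wants a challenge get
# different orderings from the same shared data, with no stored profile.
print(rank_for(["clear"]))
print(rank_for(["detailed", "challenging"]))
```

Nothing about the individual persists between queries: the 'model' lives in the visible qualities themselves, which is the scrutable, leave-no-residue property described above.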

Moving on

I still believe that, in principle, this is an excellent approach and I have been a little disappointed that more people have not taken up the idea and improved on it. The big and, at the time, insurmountable obstacles that I hit were 1) that it demands a lot of its users, who must provide both tags and resources with little obvious personal benefit, so it is unlikely to get a lot of use; 2) that the cold-start problem that affects most collaborative filters (they rely on many users to be useful but no one will use them until they are useful) is magnified exponentially by every one of those n dimensions, so it demands a very large number of users; and 3) that it is fiendishly hard to represent the complex ecological niches effectively in an interface, making the cognitive load unusably high. Google seems to have made good progress on the last point (an evolution enabled by improved web standards and browsers combined with a simplification of the process, which together are enough to reduce the cognitive load by a sizeable amount), and has more than enough users to cope with the first and second points, at least with regard to movie recommendations. It remains hard to see how this would work in an educational setting in anything less than the largest of MOOCs or the most passionately focused of user bases. However, I would love to see Google extend this mechanism to OERs, courses, and other educational resources, from Quora answers to Khan Academy tutorials, because they do have the numbers, and it would work well. For the same reasons, it would also be great to see it applied to something like StackExchange or similar large-scale systems (Reddit, perhaps) where people go to seek solutions to learning problems. I doubt that I will build a new version of CoFIND as such, but the ideas behind it should live on, I think, and it's great to see them back on a system as big as Google Search, even if it is, so far, only experimental and only used to recommend movies.

Jon Dron
