The most interesting thing about my current job is seeing what happens to "the metadata" of our digital collections when it (collectively speaking) goes out into the wild. Some recent projects in my office have shown me that there is an incredible array of options for distributing metadata (and hence access to your collections). Some of these options take considerable effort to set up, and some take little or none. They are also not equivalent in their implementations and outcomes: they differ quite a bit in how they let users search your collections and in how they bring users to your collections. Although I have some opinions based on anecdotal evidence, I can't say precisely which of these services are better or worse because, as far as I know, a thorough study hasn't been done. At the very least, this issue needs more analysis. Meanwhile, I have some interesting observations.
1) Metadata that is "Open Access" gets proliferated throughout the web in a very efficient manner. The collections I'm speaking of are our ETD repository and the Journal of Digital Information. Over the past few months, the metadata for both has been spreading across the web through various means. First of all, both collections are harvested automatically by the major search engines such as Google, Yahoo, and MSN. Second of all, they are harvested by specialty search engines focused on scholarly resources: Google Scholar, Elsevier's Scirus, and Thomson ISI's Current Web Contents, and these are just the ones I know about for sure. So the information that WE host is now included in these search engines, from the broadest web search to sites that are focused on serving the needs of scholars. This wouldn't have come about except for the fact that our content is available Open Access. Not just the full text of the documents themselves, but our metadata as well! I think this is great. Since we are Open Access anyway, we don't measure success by profits, we measure it by use. We want to be used, hit, and downloaded as much as possible. We want our content to be cited and to have scholarly impact. One thing that is interesting about this is that, because we were offering our metadata "Open Access" anyway, we didn't sign any contracts or licensing agreements with any of these search engines. Furthermore, Open Access metadata is easier to manipulate in an automated way. Contrast this with the next anecdote:
2) A lot of metadata in library systems is not "Open Access." This makes it much harder, less efficient, and more expensive to manage. Here are three examples where metadata is not open access: WorldCat, CrossRef, and A&I/full-text databases. Here are some specifics. First example: We want to update our institutional repository in a way that will slightly change the URLs to our ETDs. For systems that automate the management of metadata, like OAI harvesters, this kind of change is trivial. But we have about 1000 ETD MARC records in WorldCat. Technically, it wouldn't be hard to write a script that updates all of these URLs, but WorldCat is locked down. We can't gain access to the database to automate this process. We have to use Connexion, which was designed strictly for humans to interact with. OCLC does NOT want to share its metadata for free, because that is how they make their money. Although technically we could update these ETD records quite efficiently, OCLC's security apparatus prevents us from doing so. Second example: We would like our ETDs to function as OpenURL targets. One way to do this is to register our metadata with CrossRef. We have to pay money to join CrossRef before *we can send them our metadata*. In other words, we are paying them to allow us to send them a commodity. Granted, they are used to working with commercial publishers that are much more restrictive with their data, but we don't function like that. Our metadata is freely available anyway. They want us to sign a contract about the proper use of our metadata, which is ironic considering that Google crawled us without any kind of formal agreement. Dealing with some of these old-style metadata managers is simply frustrating. We are trying to *give away* our metadata, but they are throwing up all of these obstacles. We are in a similar situation with a certain commercial A&I database...
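For illustration, here is a minimal Python sketch of the kind of URL-updating script the first example has in mind. Everything specific in it is an assumption: the old and new URL prefixes, the record layout (plain dictionaries with a "url" key standing in for the link a MARC record would carry in field 856 $u), and the function names are all hypothetical. It runs against in-memory sample records only and does not talk to WorldCat or any other real system.

```python
# Hypothetical URL prefixes; the real old and new ETD URLs are not given here.
OLD_PREFIX = "http://repository.example.edu/etd/"
NEW_PREFIX = "http://repository.example.edu/handle/etd/"


def rewrite_url(url, old_prefix=OLD_PREFIX, new_prefix=NEW_PREFIX):
    """Swap old_prefix for new_prefix; return the URL unchanged if it does not match."""
    if url.startswith(old_prefix):
        return new_prefix + url[len(old_prefix):]
    return url


def rewrite_records(records):
    """Return (copies of the records with rewritten URLs, number of records changed)."""
    updated = []
    changed = 0
    for record in records:
        new_url = rewrite_url(record["url"])
        if new_url != record["url"]:
            changed += 1
        # Copy the record so the caller's original data is left untouched.
        updated.append({**record, "url": new_url})
    return updated, changed


# Made-up sample records standing in for ETD catalog records.
SAMPLE_RECORDS = [
    {"id": "etd-0001", "url": "http://repository.example.edu/etd/0001"},
    {"id": "etd-0002", "url": "http://repository.example.edu/etd/0002"},
    {"id": "other-01", "url": "http://elsewhere.example.org/item/1"},
]

if __name__ == "__main__":
    updated, changed = rewrite_records(SAMPLE_RECORDS)
    for record in updated:
        print(record["id"], record["url"])
    print("records changed:", changed)
```

With open access to the records, a thousand URLs become one loop. Without it, the same change means a thousand manual edits in a client built for humans.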
3) What is significant about these observations? In the future, I think metadata is going to become freer, while the full-text content will be more controlled. I suspect that it will become beneficial for all parties involved to have "free" metadata while enhancing access controls on the content itself. It is the content that is so valuable, not the metadata. I also think that the control of access to content will be determined less by publishers and more by content creators. Some publishers (specifically Open Access publishers) will start providing more levels of access to content, and in this competitive environment, content creators will have a lot of options. The standard agreements that authors have to make with journal publishers, for example, are going to be broken to pieces and become more like menus instead. Furthermore, if it turns out to be true that "Open Access" correlates with higher impact, decisions on access levels will be influenced more by "impact" than by commercial considerations. The power will be in the hands of the content creators. Finally, rights management languages/metadata/systems like ODRL and Shibboleth will enable this environment.
4) But I could be wrong, so I'm going to think about it some more.
Friday, July 22, 2005
1 comments:
Brian
nice post, and I was particularly interested in your comments around opening up access to library system metadata.
I've quoted part of your post in a post of my own on Panlibus, and since I can't trackback to blogger thought I should leave this comment...
http://blogs.talis.com/panlibus/archives/2006/07/looking_forward.php