While I am new to the world of libraries, I work as a bookseller in my other life and am spending increasing amounts of my time buying second-hand books and cataloguing them so they can be put out to sell. Before a book reaches that point, I need to allocate it a category and decide how much to charge for it. We generally aim to make between 40% and 50% margin on a book, but its condition is really important when pricing it. One of my great bugbears when doing this, however, is the cataloguing software we use and the ludicrously small text box it provides for the book’s title. This means that only a small proportion of the title appears on the label when we print it out, and also that only the part that can be printed is saved within the database. Sometimes we are asked whether we have a particular second-hand book, we enter the full title into our software, and it spits it out again! This is pretty infuriating and a prime example of how inadequate bibliographic data can make our lives needlessly difficult.
As discussed in one of my previous posts, we have grown accustomed to hearing the term “Web 2.0” to describe the period in the history of the internet when user-generated content became prevalent. This has, of course, led to a vast increase in the total amount of data generated worldwide and made the job of information professionals considerably more difficult as they attempt to steer their audiences towards the useful and relevant. Naturally, the average web user is even more overwhelmed.
Thankfully, there have been huge gains in the amount of computational power available. Moore’s Law states that we can expect the number of transistors in an integrated circuit to double roughly every two years. Although this rate of growth seems to be slowing down, we still have much more power to play with than we did, say, a decade ago. The whole concept of the “Semantic Web” or “Web 3.0” is to use this extra power in an intelligent way in order to make the data work for us.
At the moment, most of the content published on the Web is in HTML rather than as raw data. I have no coding experience myself, but my understanding is that HTML describes how content should be displayed to human readers, so its elements are of limited intelligibility to computers. Given the upsurge in data I have already discussed, this is becoming very inefficient. If machines were given more access to the raw data itself, the need for human input to extract meaning would be reduced. Librarians and information professionals have been working for some time to make the plethora of data available to them more freely accessible to the wider public. I also get the impression that various libraries and their parent organisations are now working more collaboratively than they have done in the past.
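To make the contrast concrete, here is a minimal sketch in Python, using an invented example record of my own, of the same book described first as HTML (which a program can only treat as text to be displayed) and then as labelled, structured data (which a program can query directly):

```python
# The same book described two ways. The book and field names here are
# my own illustration, not taken from any real catalogue.

# 1. As HTML: perfectly readable for humans, but a program sees only
#    markup and a run of text, with no idea which part is the author.
html_snippet = "<p><i>The Name of the Rose</i> by Umberto Eco (Vintage, 1998)</p>"

# 2. As structured data: every fact carries a label, so a program can
#    pick out exactly the piece it needs.
book = {
    "title": "The Name of the Rose",
    "author": "Umberto Eco",
    "publisher": "Vintage",
    "date": "1998",
}

# Extracting the author from the HTML would need fragile text parsing;
# from the structured record it is a simple lookup.
print(book["author"])  # Umberto Eco
```

The point is not the particular format (real linked-data systems use richer vocabularies than a simple dictionary) but that labelled data lets the machine do the work a human would otherwise have to.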
Hence the development of BIBFRAME by the Library of Congress, which it is hoped will become a new set of standards to replace the MARC standards that are currently widespread. It is at this point that I begin to struggle to keep up! The MARC standards were designed in the 1960s so that library records could be shared digitally, but commentators now say they are not up to the task of presenting bibliographic data in the user-friendly way information professionals hope will become the new normal. More simply put, people’s expectations have changed and they want to be able to make connections between various data sets.
The LoC website states in its FAQ section that “the BIBFRAME Model is the library community’s formal entry point for becoming part of a much wider web of data, where links between things are paramount”. Apparently, the goal is to present the data we hold about books and other media at what the LoC calls “three core levels of abstraction”: Work, Instance and Item. The Work level contains things like subjects, authors and languages. The Instance level covers the various manifestations a Work may take, such as a printed publication or a Web document, together with the data, such as date and publisher, needed to identify them. The Item level allows us to access a specific copy of a Work, either physically on a library shelf or virtually. I think the aim of opening access to this data is a laudable one, provided BIBFRAME is integrated with the search engines already widely used by the public. Of course, this then leads to another debate on the power companies such as Google hold over the consumer!
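As I understand the three levels, they form a chain of links from an abstract Work down to a copy in your hand. The sketch below is my own simplification in Python: the class and field names are invented for illustration and are not the actual BIBFRAME vocabulary, but they show how the levels connect.

```python
from dataclasses import dataclass

# An illustrative model of BIBFRAME's three levels of abstraction.
# Class and field names are my own simplification, not BIBFRAME's
# actual RDF vocabulary.

@dataclass
class Work:
    """The conceptual essence: title, authors, subjects, languages."""
    title: str
    author: str
    subjects: list

@dataclass
class Instance:
    """A particular manifestation of a Work, e.g. a printed edition."""
    work: Work
    publisher: str
    date: str
    carrier: str  # e.g. "print" or "web document"

@dataclass
class Item:
    """An actual copy of an Instance, on a shelf or online."""
    instance: Instance
    location: str

# An invented example record:
rose = Work("The Name of the Rose", "Umberto Eco",
            ["Monasticism", "Mystery fiction"])
paperback = Instance(rose, "Vintage", "1998", "print")
shelf_copy = Item(paperback, "Fiction shelf 3")

# Following the links back up, from the copy in hand to the abstract Work:
print(shelf_copy.instance.work.author)  # Umberto Eco
```

What appeals to me about this arrangement is that the links run both ways in principle: from one Work you could find every Instance and every Item a library holds, which is exactly the “web of data” the LoC describes.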