Why Metadata Matters
Metadata is often defined, simplistically, as data about data. More precisely, it is information about an object or its digital surrogate that provides context and greater insight into the item. Types of metadata include, but are not limited to, title, author, date, subjects/keywords, item type, source, size, and copyright information. While some types of metadata are close to universal, others are content specific. For example, number of pages is used when describing a book, article, or journal, but not a sculpture or painting.
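The split between general and content-specific metadata can be sketched as a small record type. This is only an illustration: the class name, field names, and sample values are hypothetical and do not come from any particular metadata schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ItemMetadata:
    # General fields that apply to most kinds of items.
    title: str
    author: str
    date: str
    item_type: str
    subjects: List[str] = field(default_factory=list)
    # Content-specific fields, filled in only where they make sense.
    extra: Dict[str, object] = field(default_factory=dict)


# Hypothetical sample items, invented for illustration.
book = ItemMetadata(
    title="An Example Book",
    author="A. Writer",
    date="1999",
    item_type="book",
    subjects=["tea trade"],
    extra={"number_of_pages": 212},
)

sculpture = ItemMetadata(
    title="An Example Sculpture",
    author="A. Sculptor",
    date="1905",
    item_type="sculpture",
    extra={"material": "bronze"},
)
```

Both items share the general fields, while number of pages appears only on the book and material only on the sculpture.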
Metadata is not only for description; it can also be used to increase discoverability. Databases, controlled vocabularies, and metadata schemas allow consistent metadata to be applied across a collection. Users can then use the metadata fields to browse for similar material or to run simple and complex searches. How well this works depends on how the metadata is structured, applied, and made available to users.
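As a rough sketch of how a controlled vocabulary keeps metadata consistent, the snippet below accepts a value only if it matches a fixed list of terms. The term list and the function name are assumptions made for illustration, not an actual published vocabulary.

```python
# Hypothetical controlled vocabulary for a "source type" field.
SOURCE_TYPES = {"newspaper", "magazine", "scholarly journal", "report", "book"}


def normalize_source_type(value: str) -> str:
    """Return the vocabulary term for value, or raise if it is not allowed."""
    cleaned = value.strip().lower()
    if cleaned not in SOURCE_TYPES:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return cleaned
```

Because every record ends up with one of the same few terms, browsing or filtering on the field returns all matching items rather than only those that happen to share a spelling.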
The online database ProQuest uses metadata to describe the content found within its collection. ProQuest supports scholarship and research by making available academic journals, newspapers, magazines, reports, and books. Additionally, other materials (images, audio, etc.) have records with metadata and links to the actual location of the digitized material. The metadata collected for each item is detailed. The more general types of metadata are collected (title, author, subject, and copyright, to name a few), but much of the metadata is more granular. For example, with regard to date, the year, publication year, and publication date are all collected. Furthermore, information about the publisher and publication is also included: publication title, publisher, place of publication, and country of publication.
As a result of the metadata assigned to the digital records, an individual is able to perform basic text searches that look for text within all of the fields associated with a record. The advanced search option similarly permits text searches; however, the user is able to select a particular field to search and to include Boolean operators. While the text searching is powerful, the ability to apply search filters based on the metadata makes searching the material more robust. Advanced searches use the following metadata fields to enhance the searching capabilities: source type, document type, and language. Publication date is also included, with a number of operator options. Potential search combinations include historical newspapers before 1930 with the phrase “young hyson tea”, or marriage announcements after 1950 written in English. The number of possible combinations is very large.
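The two example searches can be sketched as a phrase search combined with metadata filters. This is a minimal in-memory illustration, not ProQuest's actual search interface: the Record fields, the search function, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    title: str
    full_text: str
    source_type: str
    document_type: str
    language: str
    publication_year: int


def search(
    records: List[Record],
    phrase: Optional[str] = None,
    source_type: Optional[str] = None,
    document_type: Optional[str] = None,
    language: Optional[str] = None,
    before_year: Optional[int] = None,
    after_year: Optional[int] = None,
) -> List[Record]:
    """Return records that satisfy every criterion that was supplied."""
    results = []
    for r in records:
        text = (r.title + " " + r.full_text).lower()
        if phrase is not None and phrase.lower() not in text:
            continue
        if source_type is not None and r.source_type != source_type:
            continue
        if document_type is not None and r.document_type != document_type:
            continue
        if language is not None and r.language != language:
            continue
        if before_year is not None and not r.publication_year < before_year:
            continue
        if after_year is not None and not r.publication_year > after_year:
            continue
        results.append(r)
    return results


# Invented sample records, for illustration only.
records = [
    Record("Shipping notices", "Just arrived: chests of young hyson tea.",
           "historical newspaper", "advertisement", "English", 1825),
    Record("Smith-Jones wedding", "The marriage was announced on Sunday.",
           "newspaper", "marriage announcement", "English", 1962),
    Record("Tea trade review", "A survey of young hyson tea imports.",
           "magazine", "article", "English", 1975),
]

# Historical newspapers before 1930 containing the phrase.
tea_hits = search(records, phrase="young hyson tea",
                  source_type="historical newspaper", before_year=1930)

# Marriage announcements after 1950 written in English.
wedding_hits = search(records, document_type="marriage announcement",
                      language="English", after_year=1950)
```

Each filter narrows the result set independently, so adding one more field multiplies the number of distinct searches a user can express.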
Some of the metadata fields allow users to browse similar materials, such as all titles under a specific publication title or by a specific author. Discoverability is still enhanced; however, in searches these fields can only be used as text searches. Unlike the controlled-vocabulary select lists available for source type, document type, or language, searching by author or publication title requires the user to know the subject matter already or to stumble upon the result. Additionally, because the search for these fields is text based, multiple options within a field cannot be combined, which somewhat limits the robustness of the searching capabilities.
There could be a number of reasons why particular fields were not offered as select lists. One reason could be that too many options lead to a cluttered and difficult-to-use interface. Limiting select lists to a few preselected fields allows for a cleaner and more approachable user interface. Additionally, too many options can restrict the search results too much, which could in fact hinder discoverability. Having users browse some fields, rather than filter on them, creates a way for similar information to be found. The ProQuest database demonstrates that the metadata of an item can be rather detailed; however, not every field needs to be exposed as a search option to users looking for materials within the collection.