R3 Recycling: Metadata Schema Notes

Required Elements and Schema

Development of the R3 metadata schema began with a review of the themes that emerged from our user stories. Findability was a key user concern, with particular emphasis on easily locating data that applies to specific locations or geographic ranges and data that deals with specific recyclable materials. We therefore prioritized a tag or keyword metadata element that could take advantage of CKAN's 'Tag Vocabularies' feature to help users locate datasets addressing the recycling of a particular material (CKAN, n.d.-d). Similarly, because the user stories highlighted the importance of associating datasets with the geographic area of the jurisdictions to which they apply, location metadata that could be translated to the map interface offered by CKAN was also a priority (CKAN, n.d.-a).
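As a rough illustration of the tag-vocabulary idea, the sketch below builds the kind of request body that CKAN's `vocabulary_create` action accepts: a vocabulary `name` plus a list of tag dictionaries. The vocabulary name `recyclable_materials` and the sample terms are assumptions chosen for illustration, not R3's actual configuration, and no request is sent.

```python
import json


def build_vocabulary_payload(vocabulary_name, terms):
    """Return a dict shaped like the body of CKAN's vocabulary_create action."""
    # Each tag is represented as a small dictionary holding its name.
    tags = [{"name": term.strip()} for term in terms if term.strip()]
    return {"name": vocabulary_name, "tags": tags}


# Hypothetical vocabulary name and sample material terms.
payload = build_vocabulary_payload(
    "recyclable_materials", ["Glass", "Mixed paper", "Aluminum cans"]
)
print(json.dumps(payload, indent=2))
```

In a running CKAN instance, a body like this would be posted to the site's action API by a suitably authorized user; the exact endpoint and credentials depend on the deployment.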

Additional elements needed to support R3 were generated by examining the datasets identified for potential inclusion in the repository, as well as schemas utilized by related repositories, in order to create a working list of desired metadata elements:

  • Geographic scope
  • Author and/or publisher
  • Keywords indicating recyclable item or material types mentioned
  • Unique identifier
  • Title
  • Description
  • Contact point/responsible party
  • Temporal scope
  • Date of publication
  • Date of most recent update
  • Frequency of updates
  • Format(s) available
  • URL for access and/or download
  • Rights and/or license information
  • Category/type of dataset (to differentiate between broad categories of recycling datasets, such as recycling routes, recycling facilities, recycling rules/policies, recycling rates/measurements, etc., observed in our survey of potential repository datasets)

This element list was then compared against established schemas in order to create an application profile that would meet R3's metadata needs. The Data Catalog Vocabulary (DCAT), a schema widely used in the public data sectors relevant to recycling, provided all of our desired metadata elements, including flexible elements that could accommodate keyword tagging with a custom controlled vocabulary. Moreover, numerous extensions and modifications of DCAT exist for public sector data, suggesting that it can be adapted as R3's needs change. Although additional desirable metadata elements were identified as the repository protocol developed (such as elements indicating file size and dataset version), suitable equivalents were found without looking beyond DCAT.
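One way to picture the comparison is as a lookup from each working-list element to a DCAT or Dublin Core property, followed by a completeness check on a record. The sketch below is illustrative only: of the properties shown, these notes name just dct:spatial, dcat:keyword, and dcat:theme, so the remaining choices are assumptions about a plausible mapping. The record is also flattened for brevity, whereas DCAT attaches format and access URL to a distribution rather than to the dataset itself.

```python
# Assumed mapping from the working element list to DCAT / Dublin Core terms.
# Only dct:spatial, dcat:keyword, and dcat:theme are named in these notes;
# every other property choice here is illustrative.
ELEMENT_TO_PROPERTY = {
    "Geographic scope": "dct:spatial",
    "Author and/or publisher": "dct:publisher",
    "Keywords": "dcat:keyword",
    "Unique identifier": "dct:identifier",
    "Title": "dct:title",
    "Description": "dct:description",
    "Contact point/responsible party": "dcat:contactPoint",
    "Temporal scope": "dct:temporal",
    "Date of publication": "dct:issued",
    "Date of most recent update": "dct:modified",
    "Frequency of updates": "dct:accrualPeriodicity",
    "Format(s) available": "dct:format",
    "URL for access and/or download": "dcat:accessURL",
    "Rights and/or license information": "dct:license",
    "Category/type of dataset": "dcat:theme",
}


def missing_properties(record):
    """List mapped properties that are absent or empty in a record."""
    return [p for p in ELEMENT_TO_PROPERTY.values() if not record.get(p)]


# Hypothetical, partially filled record.
record = {
    "dct:title": "Example recycling tonnage report",
    "dct:spatial": "King County",
    "dcat:keyword": ["Glass", "Mixed paper"],
    "dcat:theme": ["recycling rates/measurements"],
}
print(missing_properties(record))
```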

Controlled Vocabularies

Three R3 elements, dct:spatial, dcat:keyword, and dcat:theme, use controlled vocabularies. Defining dct:spatial terms was relatively straightforward, but the controlled vocabularies for dcat:keyword and dcat:theme required significantly more customization. To establish initial vocabularies for this protocol, with the aim of continuing to develop, extend, and refine them over time, we determined the goals for each vocabulary and gathered terms from both the datasets in our repository and related sources. This follows the first two steps of Fast, Leise, and Steckel's "Creating a Controlled Vocabulary" (Fast, Leise, & Steckel, 2003); the results are included below for reference.

1) Developing a strategy: a sampling of questions to make decisions

Question | dcat:keyword | dcat:theme
What do you want your controlled vocabulary to accomplish? | Make datasets easily searchable | Make datasets easily browsable
How much vocabulary control do you want to provide? | Enough to apply 5-10 tags to each dataset | Four to five basic categories, with each dataset falling into at least one category
Content-specificity: The more items that are similar, the more specific you need to be. | Items tend to be very similar within themes; need more specificity | Themes tend to be well-differentiated; need less specificity
Content-stability: Do the concepts and names for them change often? | No | No
Users: who is the target audience? Web-savvy? Need to do a lot of research? | Audience likely only moderately technically proficient and favors fast, easy searching | Audience likely only moderately technically proficient and favors fast, easy searching
Maintenance: who will maintain your controlled vocabulary? How much time will that take? | Vocabulary will be maintained by repository staff; prefer a vocabulary that will remain relatively stable over time to minimize burden | Vocabulary will be maintained by repository staff; prefer a vocabulary that will remain relatively stable over time to minimize burden
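The goals recorded above can be read as simple checks on a dataset record. The following is a minimal sketch under stated assumptions: it treats "5-10 tags" and "at least one category" as hard rules, and both vocabularies are hypothetical samples (the theme names reuse the dataset categories mentioned earlier in these notes; the keyword set is a small illustrative subset).

```python
# Hypothetical vocabularies for illustration only.
KEYWORD_VOCABULARY = {
    "glass", "mixed paper", "aluminum cans", "plastic bottles",
    "food scraps", "yard trimmings", "tires", "carpet",
}
THEME_VOCABULARY = {
    "recycling routes", "recycling facilities",
    "recycling rules/policies", "recycling rates/measurements",
}


def check_dataset(keywords, themes, min_tags=5, max_tags=10):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Keywords must come from the controlled vocabulary (case-insensitive).
    unknown = [k for k in keywords if k.casefold() not in KEYWORD_VOCABULARY]
    if unknown:
        problems.append(f"keywords not in vocabulary: {unknown}")
    # The number of keywords must fall inside the target range.
    if not min_tags <= len(keywords) <= max_tags:
        problems.append(
            f"expected {min_tags}-{max_tags} keywords, got {len(keywords)}"
        )
    # At least one theme must come from the theme vocabulary.
    if not any(t.casefold() in THEME_VOCABULARY for t in themes):
        problems.append("dataset needs at least one theme from the vocabulary")
    return problems


print(check_dataset(
    ["Glass", "Mixed paper", "Tires", "Styrofoam"],
    ["Recycling facilities"],
))
```

Here the sample record is flagged twice: "Styrofoam" is not in the sample keyword vocabulary, and four keywords fall short of the assumed minimum of five.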

2) Gathering terms

A) Gather terms used in items identified for inclusion in the collection

  • Recycling
  • Recycle
  • Materials
  • King county
  • Solid waste
  • Disposal
  • Diversion
  • Tonnages
  • Solid waste
  • Energy and greenhouse gas savings
  • Recycled material
  • Tons recycled
  • BTUs saved
  • GHGs avoided
  • Aluminum cans
  • Ferrous metals
  • Plastic bags
  • Plastic containers
  • Electronics
  • Appliances
  • Steel cans
  • Glass
  • Plastics
  • Corrugated cardboard
  • Mixed paper
  • Wood
  • Yard trimmings
  • Food scraps
  • Other organics
  • Mixed metals
  • Landclearing debris
  • Carpet
  • Computers/electronics
  • Construction and demolition debris
  • Tires
  • Newspaper
  • Aluminum
  • Tin
  • Plastic bottles
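Because terms were gathered from several datasets, some recur (for example, "Solid waste" appears twice above). A small tally like the sketch below could help surface recurring terms as candidates for preferred vocabulary entries. The normalization rule (trim whitespace, compare case-insensitively) is an assumption, and the input is only a short sample of the list above.

```python
from collections import Counter


def tally_terms(raw_terms):
    """Count gathered terms, ignoring case and stray whitespace."""
    normalized = (" ".join(term.split()).casefold() for term in raw_terms)
    return Counter(term for term in normalized if term)


# Short sample drawn from the gathered terms above.
sample = ["Recycling", "Solid waste", "Disposal", "Solid waste", "Glass"]
counts = tally_terms(sample)

# Terms seen more than once across the sampled datasets.
recurring = [term for term, n in counts.items() if n > 1]
print(recurring)
```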

B) Gather terms used by others for related content
