DCMI Registry Working Group report

DC2001 Tokyo October 2001

Co-chairs of DCMI Registry Working Group

Rachel Heery, UKOLN, University of Bath

Harry Wagner, OCLC

The emphasis during the DC-2001 workshop at Tokyo was on progressing the DCMI Registry from the prototypes now available to an operational service. In order to achieve this we had a plenary session scheduled during the workshop to demonstrate and get feedback on the two prototypes developed over the last months, followed up by a break-out session to allow for clarification of requirements. Various discussions prior to the workshop also led to a break-out session for consideration of work flow between the DCMI Usage Board, Architecture Working Group and the Registry Working Group. Following the various breakouts a meeting was arranged between the Registry WG chairs and the Usage Board to move towards a common view on the way forward.

Background

Since the Registry WG was chartered post-DC7 Frankfurt, in late 1999, it has had two main tasks

Throughout this time the WG has been committed to exploring the possibilities offered by RDF. A first prototype was demonstrated by Eric Miller at DC8 in Ottawa in October 2000, and since then further work has been done on functionality and software. Throughout, the Registry WG has supported development of prototype software at OCLC. Harry Wagner has further developed the early EOR prototype and carried forward the work of Eric Miller and Tod Matola, who have now moved on from OCLC to new challenges.

Prototype demonstration

Harry demonstrated two prototypes at Tokyo, one building on the original EOR toolkit, the other a new light-weight approach, see

http://wip.dublincore.org:8080/registry/Registry

The first prototype was based on joint work with Eric Miller and Tod Matola on an Extensible Open Registry (EOR) open source tool-kit; the other on a simpler light-weight solution. These prototypes were developed to fulfil the functional requirements as at

http://homes.ukoln.ac.uk/~lisrmh/DCMI-registry/funreq.html

The demonstration raised some interesting issues regarding the functionality of the user interface such as

and perhaps more fundamentally raised issues regarding the content of the registry and workflow, such as

There was positive feedback on the demonstrators as an illustration of what a registry might offer to the DCMI community, both as an information source on DCMI terms and as a resource for software to access.

It seems that the use of RDFS as a means to capture the semantics of the DCMI vocabulary and to infuse this data into the Registry reveals a tension. RDFS offers a standard schema language which contributes to interoperability between schema definitions; RDF offers a simple data model within the prototype registries, together with a means for machine access and standard outputs. In what I would hope is not a grandiose suggestion, RDFS offers a means to enable registries to become a building block for the semantic web. But with RDF we are working with an immature technology: there are bits missing and the tools are not fully developed.

What is more, if we are to use the RDFS schemas as input to the registry (both prototypes 'infuse' all data from RDFS schemas), then there is an overwhelming need for clarity of expression within the DCMI schema(s). It was felt that before moving to an operational DCMI Registry some confirmation that we had got the schemas right was required. And who would be responsible for this?
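To make the idea of 'infusing' data from a schema concrete, here is a minimal sketch in Python. It is not the code of either prototype: the function name `infuse`, the record layout and the heavily simplified schema fragment are all assumptions made for illustration. It only shows why the registry content can be no clearer than the schema it is read from.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical, heavily simplified schema fragment (illustration only,
# not one of the DCMI draft schemas).
SCHEMA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
    <rdfs:label>Title</rdfs:label>
    <rdfs:comment>A name given to the resource.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>"""


def infuse(schema_xml):
    """Read rdf:Property descriptions into a dict keyed by term URI."""
    registry = {}
    root = ET.fromstring(schema_xml)
    for prop in root.findall(f"{{{RDF}}}Property"):
        uri = prop.get(f"{{{RDF}}}about")
        registry[uri] = {
            # findtext returns None when the schema omits the element,
            # so gaps in the schema become gaps in the registry.
            "label": prop.findtext(f"{{{RDFS}}}label"),
            "comment": prop.findtext(f"{{{RDFS}}}comment"),
        }
    return registry


registry = infuse(SCHEMA)
print(registry["http://purl.org/dc/elements/1.1/title"]["label"])
```

Anything the human interface needs to display has to be present in the schema under this model, which is the source of the concern about schema content.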

In addition, in building the prototypes we had to acknowledge that a lot of work was required to provide a user friendly human interface to the registry, free of RDF jargon. The functional requirements insisted that the structure and relations between terms needed to be expressed by way of the DCMI data model (using the concepts of element, element refinement, scheme etc). This was proving onerous in terms of layering this information on top of the RDF data model, and arguably was against the spirit of an 'open registry' anyway.

Registry/Architecture/Usage Board break-out group discussion

This break-out group was chaired by Makx Dekkers and had in attendance representatives of the groups concerned as well as others with an interest in architecture. A wide ranging discussion of issues regarding the relationship and work flow between these groups led to various points being highlighted. Of particular relevance to the Registry WG was

After some discussion the following roles were proposed with regard to development of Registry:

Note it was acknowledged that these WGs have other roles as well for other tasks!

Registry WG breakout session

The main objective of the breakout session was to clarify the requirements for the phase one operational registry. Some particular issues for clarification had already been raised on the agenda, and others emerged in particular in relation to the registry as a 'vocabulary management tool'.

1 Content

1.1 Deriving other documentation from RDFS schema

There was a general feeling in the meeting that there needs to be one master version of all information regarding terms. Is the registry meant to include all information regarding terms? Should all the information describing terms be included as now contained in the reference documents that exist on the web site? This would mean for each term including

And possibly in addition

If all these were to be accessible in the registry while keeping to the current model of infusing data from schema, then would all this data need to be included in the RDFS schema? Or should we be looking to construct different schemas, light-weight and heavy-weight? The general feeling seemed to be that the schema needed to be kept as simple as possible, and that the schemas now in draft form should not be 'over-loaded' with additional data.

It was agreed that the Registry WG needs to get definite clarification as to whether the existing draft schemas would need to be more detailed or whether we take another route. The feeling of the meeting was that the WG needs to get agreement on this from the Usage Board or Architecture WG.

It was decided that the Registry WG chairs should take the opportunity to discuss this at a subsequent Usage Board meeting while we were all in Tokyo.

1.2 Inclusion of proposed terms, application profiles

The feeling of attendees was conservative on inclusion of proposed terms and application profiles. Certainly including application profiles raises issues of scope: for example, what profiles are in scope? How would a DCMI registry deal with non-DCMI terms?

There was a proposal from the chair (Rachel) that a shared approach to declaring local terms would enable early registration of ‘local namespaces’. This would require ‘new’ schemas to be infused, these might be domain specific or language specific. Once again it was felt that this might not be appropriate in view of Usage Board policy.

2 Integrating usage and other information

There seemed general support for integrating information from usage guidelines, so that users could be pointed to good practice in use of terms.

In addition it was felt that the registry might be a pedagogical tool for showing novice users of RDF correct RDF encoding of terms for creating instance metadata. Also it was felt important to enable users of XML to view best practice guidelines for encoding terms in XML. It was agreed that this functionality would be appropriate for a subsequent phase of the registry.

3 Multilinguality

It was agreed that translations of the user interface, definitions, etc. were required in phase 1. There was a suggestion that automatic query of locale might be used to set the language default (in a subsequent phase).

There was some discussion of the issue of distributed 'ownership' and storage of translations. To ensure that translations stay in synch, do we need to link these to the ‘RDF schema’? Could they be linked as separate RDF schema (annotations)? Or could they reside in XML files local to the database?

There was some support for the requirement to search for a term and display translations in all languages available.
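As a rough illustration of these two requirements (showing all available translations for a term, and choosing a default language from the user's locale), here is a minimal Python sketch. The storage layout, the function names and the sample labels are assumptions for illustration only; in particular the labels are not official DCMI translations, and the sketch takes no position on where translations should be stored.

```python
# Illustrative labels only; not official DCMI translations.
TRANSLATIONS = {
    "http://purl.org/dc/elements/1.1/title": {
        "en": "Title",
        "fr": "Titre",
        "de": "Titel",
    },
}


def all_translations(term_uri):
    """Return every (language, label) pair held for a term, sorted by language."""
    return sorted(TRANSLATIONS.get(term_uri, {}).items())


def default_label(term_uri, locale, fallback="en"):
    """Pick a label from a locale string such as 'fr-CA' or 'fr_CA'.

    Falls back to the `fallback` language when no translation exists.
    """
    labels = TRANSLATIONS.get(term_uri, {})
    language = locale.replace("_", "-").split("-")[0].lower()
    return labels.get(language, labels.get(fallback))


TITLE = "http://purl.org/dc/elements/1.1/title"
for language, label in all_translations(TITLE):
    print(language, label)
print(default_label(TITLE, "fr-CA"))
```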

4 Supporting Usage Board work

Change control for Usage Board revisions was identified as a requirement, including an audit trail (subsequent phase) tracking changes to descriptions of terms, new terms, etc.

Discussion followed as to whether this additional ‘metadata’ for terms should be expressed in the RDFS schema, or just dealt with internally by the registry. It was felt this needs to be decided as part of the move to a ‘canonical schema’, and once again clarification was required regarding the content of the schema.

5 Liaison with registry effort elsewhere

It was agreed that it is important to ensure liaison with related initiatives such as

Meeting with Usage Board

The Registry WG asked to attend a subsequent Usage Board meeting to discuss issues raised in the break-outs.

1 Content of schemas

The level of detail within the schema, and where responsibility for the content of the schema should lie, was the first issue for discussion. The overall opinion of the Usage Board was that there was no need to infuse the Registry by means of RDFS schemas, therefore this was in effect a non-issue. The Usage Board felt that straightforward input of data using traditional database building tools would be easier and more appropriate. This would enable a variety of data to be loaded as and when required, and it would also mean there would be no necessity to overload the schemas with any additional data. It was agreed that outputting of RDFS schemas would be useful functionality for the Registry, as well as outputting of XML schemas.
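The approach favoured by the Usage Board (data entered through ordinary database tools, with RDFS produced as an output) might look something like the following Python sketch. This is only an illustration under stated assumptions, not a proposed design: the table layout, the function names `build_db` and `export_rdfs`, and the choice of SQLite are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def build_db():
    """Create a hypothetical one-table registry and load a single term."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE term (uri TEXT PRIMARY KEY, label TEXT, definition TEXT)"
    )
    db.execute(
        "INSERT INTO term VALUES (?, ?, ?)",
        (
            "http://purl.org/dc/elements/1.1/title",
            "Title",
            "A name given to the resource.",
        ),
    )
    return db


def export_rdfs(db):
    """Serialise every stored term as an rdf:Property description."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    root = ET.Element(f"{{{RDF}}}RDF")
    rows = db.execute("SELECT uri, label, definition FROM term ORDER BY uri")
    for uri, label, definition in rows:
        prop = ET.SubElement(root, f"{{{RDF}}}Property", {f"{{{RDF}}}about": uri})
        ET.SubElement(prop, f"{{{RDFS}}}label").text = label
        ET.SubElement(prop, f"{{{RDFS}}}comment").text = definition
    return ET.tostring(root, encoding="unicode")


rdfs_xml = export_rdfs(build_db())
print(rdfs_xml)
```

In this arrangement the database can hold extra columns (usage notes, history and so on) without any of it appearing in the exported schema, which is the point made about not overloading the schemas.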

From the Registry WG viewpoint, Rachel expressed concern that we had worked for more than a year on the assumption that we wanted an 'open, extensible' solution and that DCMI is committed to exploring RDF based solutions. She was concerned that the traditional database solution was not a 'generic' solution for registries in an RDF environment. However, the overwhelming opinion appeared to be that output of RDF schema was sufficient for the DCMI registry.

In the interest of early and robust implementation the agreement was to consider a more traditional approach for a software platform for the registry. Harry felt this was feasible and was willing to work towards offering a solution.

2 Content of registry

The Usage Board felt that at this stage there was no requirement to register application profiles, although this might come at a later stage. The Board felt it unlikely that there would ever be a requirement for WGs or implementors to register proposed terms. This did not fit in with the current workflow or policy of the Usage Board.

Rachel, as WG chair, stated that she saw managing the 'evolution' of the DCMI vocabulary as a significant function of the DCMI registry, and felt that in the medium to long term this would be one of the exciting benefits of a registry. Her view was that a registry would facilitate the approval process by recording terms as proposed, recommended or under review. This would enable other DCMI implementors to see what terms had been proposed for approval. However she acknowledged that maybe this was too ambitious an objective for phase 1 of the implementation.
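To illustrate what recording the status of terms might involve, here is a minimal Python sketch. The three status values are those mentioned above; everything else (the `TermRecord` class, the history list, the term names) is hypothetical and is not part of any agreed requirement.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    UNDER_REVIEW = "under review"
    RECOMMENDED = "recommended"


@dataclass
class TermRecord:
    """Hypothetical registry entry that remembers each status change."""

    name: str
    status: Status = Status.PROPOSED
    history: list = field(default_factory=list)

    def set_status(self, new_status, note=""):
        # Keep (old status, new status, note) so changes can be audited later.
        self.history.append((self.status, new_status, note))
        self.status = new_status


def terms_with_status(records, status):
    """List the names of terms currently at the given status."""
    return [record.name for record in records if record.status is status]


# Term names below are placeholders, not real proposals.
records = [TermRecord("exampleTermA"), TermRecord("exampleTermB")]
records[0].set_status(Status.UNDER_REVIEW, "hypothetical review note")
print(terms_with_status(records, Status.PROPOSED))
```

A list of this kind is what would let other implementors see which terms have been proposed for approval.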

It was agreed (later at the Advisory Committee meeting) that the Usage Board would review the functional requirements for the Registry.

Future plans

Development of two working Registry prototypes based on an open, RDF approach is, I believe, a satisfactory achievement for the WG efforts. I am happy to have contributed to this in the context of an international collaboration, working with colleagues at OCLC and others in the Working Group. I think we can now say we have, to a significant extent, successfully delivered the original charter of the Registry WG.

My personal opinion is that subsequent development of the DCMI Registry might need to re-visit the requirements and priorities of the DCMI Executive and Usage Board, taking into account the formative work already done by the Registry WG. As the registry moves to an operational service, the interests of WG members, whether in development of generic registry software, open registry systems, or exploration of RDFS, perhaps have to give way to more pragmatic decisions, not least in providing early implementation of an easily maintained, reliable system.

In addition, as the Usage Board policy and processes are articulated, their requirements for a 'vocabulary management tool' emerge. If the development path is a traditional approach there seems little need to impose a WG context for this work. But how far can a traditional database approach achieve the requirements in a sustainable and scalable way? It seems to me that at the very least the Registry WG can bring to the table the pros and cons of an open, extensible approach, and indicate how far an RDF environment contributes to this.

In addition, if we are to pass responsibility for functional requirements to the Usage Board, it seems to me there needs to be a period of transition, whereby the WG passes on recommendations to that group.

I also firmly believe there is still a role for a forward looking and innovative approach to registry issues. Whether this would form a sufficient role for a Registry WG I am unclear, but it seems there is a requirement to look forward and outward to inform DCMI about registry initiatives elsewhere, and to introduce suggestions for future development.

I hope this report gives readers a feel for the discussions and debate in Tokyo. It is of course my own subjective view and I realise that others may have a different perspective. In particular I would welcome comments from Harry, who was there, and from others who were in Tokyo. Those of you who were not able to be there, please comment and ask for more details. Please give your views!