Semantic interoperability in a federated Crystallography information service
Traugott Koch, UKOLN
Presentation at "Digital repositories supporting eResearch: exploring the eCrystals Federation
Model", eBank/R4L/SPECTRa Joint Consultation Workshop, London, 2006-10-20
eBank UK project
Contents:
1 Three areas of semantic interoperability
2 Data structures: metadata engineering
3 Categorial data: subject access
4 Factual data: named entity standards/authorities
5 Common standardisation issues
6 Application issues
1 Three areas of semantic interoperability
- Need to enhance interoperability between data, images, text and metadata (both open and proprietary, free and fee-based) in:
- Data repositories (institutional, disciplinary, national, international)
- Publication repositories
- Data and publications on the Internet
- Aggregators
- Databases
- Services
- New requirements and user studies are needed for the transition to a federated service, to define the scope,
the levels of interoperability, the functionality and the user interface solutions
- Semantic vs. syntactic interoperability:
Syntactic interoperability is about applying common formats and protocols for data transfer and merging
(e.g. CIF, XML, OAI, Z39.50, SRW).
Semantic interoperability is about shared meaning of the content.
- Approaches to enhance semantic interoperability:
- Agreed common standards at all sites
- Conversion/normalization
- Mappings between site-specific solutions
- Metadata enhancement (vocabularies, schemes, mapping, names)
When and where this is done depends on the selected architecture
- For full semantic interoperability, all three areas need to be addressed:
- A Data structures (metadata profiles)
- B Categorial data (topics, classification)
- C Factual data (names, formulae, other named entities)
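As an illustration of the "mappings between site-specific solutions" approach listed above, the following minimal Python sketch renames the fields of records from two sites into one shared set of field names. The site names, field names and the common schema are all hypothetical and chosen only for illustration; they are not taken from eBank or from any existing profile.

```python
# Hypothetical crosswalks: site-specific field name -> common field name.
CROSSWALKS = {
    "site_a": {
        "compound": "chemical_name",
        "formula": "chemical_formula",
        "creator": "author",
    },
    "site_b": {
        "name": "chemical_name",
        "sum_formula": "chemical_formula",
        "depositor": "author",
    },
}


def to_common(site, record):
    """Rename the fields of one record according to the site's crosswalk.

    Fields without a mapping are collected under "unmapped" so that
    nothing is silently lost during conversion.
    """
    crosswalk = CROSSWALKS[site]
    common = {}
    unmapped = {}
    for field, value in record.items():
        if field in crosswalk:
            common[crosswalk[field]] = value
        else:
            unmapped[field] = value
    if unmapped:
        common["unmapped"] = unmapped
    return common


# Two invented records describing the same compound in different local schemas.
record_a = {"compound": "benzoic acid", "formula": "C7 H6 O2", "creator": "A. Author"}
record_b = {"name": "benzoic acid", "sum_formula": "C7 H6 O2",
            "depositor": "B. Author", "lab_code": "X17"}

common_a = to_common("site_a", record_a)
common_b = to_common("site_b", record_b)
```

After conversion both records expose `chemical_name` and `chemical_formula`, so they can be merged or searched together. Note that this only aligns the data structures (area A); whether the values themselves agree is a matter for areas B and C.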
2 Data structures: metadata engineering
Multiplicity of metadata profiles in use (documented or not)
Potential actions:
- Agree on metadata solution incl. value encoding for a future service
- Develop common Application Profile for Crystallography data
- Define different degrees of interoperability/adherence to common model, incl. a minimum level
- Take steps towards harmonization with the profiles of related publication servers
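The idea of "different degrees of adherence, incl. a minimum level" could be checked mechanically. The sketch below is one possible reading, assuming two invented levels ("minimum" and "full") and invented required fields; a real Application Profile would define its own levels and elements.

```python
# Hypothetical adherence levels, ordered from most to least demanding.
# Each level lists the fields a record must carry to qualify.
LEVELS = [
    ("full", {"identifier", "chemical_name", "chemical_formula",
              "author", "inchi", "keywords"}),
    ("minimum", {"identifier", "chemical_name", "chemical_formula"}),
]


def adherence_level(record):
    """Return the highest level whose required fields are all present and non-empty."""
    present = {field for field, value in record.items() if value}
    for level, required in LEVELS:
        if required <= present:  # subset test: all required fields are present
            return level
    return "non-conformant"


# Invented example records.
partial = {"identifier": "id-1", "chemical_name": "benzoic acid",
           "chemical_formula": "C7 H6 O2", "author": "A. Author"}
incomplete = {"chemical_name": "benzoic acid", "chemical_formula": ""}
```

Here `adherence_level(partial)` yields `"minimum"` and `adherence_level(incomplete)` yields `"non-conformant"`, which a federated service could use to decide how a record is exposed or whether it is returned to the source site for enhancement.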
3 Categorial data: subject access
Potential actions:
- Clarify the data model question: what do the keywords characterize: elements of the data or usage/problem
context?
- Develop and maintain common controlled keyword system and classification for Crystallography, with coverage of
data-related topics
- Develop and maintain mapping to related discipline and generic classifications
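A minimal sketch of how a controlled keyword system and a mapping to a generic classification might work together at indexing time. The vocabulary entries and the class codes (`GEN-001`, `GEN-002`) are invented for illustration and do not belong to any real scheme.

```python
# Hypothetical controlled vocabulary: normalised variant -> preferred term.
PREFERRED = {
    "x-ray diffraction": "X-ray diffraction",
    "xrd": "X-ray diffraction",
    "single crystal": "single-crystal structure",
    "single-crystal structure": "single-crystal structure",
}

# Hypothetical mapping from preferred terms to codes of a generic
# classification; the codes are invented.
CLASS_CODES = {
    "X-ray diffraction": "GEN-001",
    "single-crystal structure": "GEN-002",
}


def index_keywords(free_keywords):
    """Map free keywords to preferred terms and classification codes.

    Returns (preferred_terms, class_codes, unknown_keywords). Unknown
    keywords are reported rather than dropped, so that the vocabulary
    can be maintained.
    """
    terms, unknown = [], []
    for keyword in free_keywords:
        key = " ".join(keyword.lower().split())  # lower-case, collapse whitespace
        preferred = PREFERRED.get(key)
        if preferred is None:
            unknown.append(keyword)
        elif preferred not in terms:
            terms.append(preferred)
    codes = [CLASS_CODES[term] for term in terms if term in CLASS_CODES]
    return terms, codes, unknown


terms, codes, unknown = index_keywords(
    ["XRD", "x-ray  diffraction", "Single crystal", "polymorphism"]
)
```

In this example the two spellings of X-ray diffraction collapse into one preferred term, and "polymorphism" is returned as unknown, a candidate for addition to the vocabulary.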
4 Factual data: named entity standards/authorities
- Author and institutional names
- "Names" of the objects of study and their components: crystal structures, chemical compounds
- IUPAC Chemical Names (Colour books)
- InChI (International Chemical Identifier)
- Chemical Formula (based on CIF)
- Names used by others in the future federation?
Potential actions:
- Cooperate in building and improving name authority databases
- Contribute to further standardisation (InChI, Gold Book etc.)
- Use name authorities for verification and metadata enhancement
- Build authorities into metadata creation tools (repository submission toolset, DSpace, EPrints), e.g. via web services
- Experiment with mash-ups of crystallography data. What would be
candidate reference schemes? Consequences for repositories?
5 Common standardisation issues
- To what degree can existing conventions be seen as standard?
- Does the convention/standard lead consistently to the same "name"?
- Use of proprietary systems such as CAS numbers?
- General problems with common and standardised solutions:
standardisation processes;
adoption;
validation;
maintenance.
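The question of whether a convention leads consistently to the same "name" can be made concrete with chemical formulae. The sketch below normalises a simple sum formula to Hill order (carbon first, hydrogen second, remaining elements alphabetically; purely alphabetical if no carbon is present), so that differently ordered inputs produce one string. The function name and the space-separated output style are assumptions, and the parser only handles element symbols with optional integer counts (no brackets, charges or fractional counts).

```python
import re


def hill_formula(formula):
    """Normalise a simple sum formula such as 'H12 C6 O6' to Hill order."""
    counts = {}
    # Each match is (element symbol, optional integer count).
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)

    if "C" in counts:
        # Carbon first, then hydrogen, then all other elements alphabetically.
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # Without carbon, every element (including H) is sorted alphabetically.
        order = sorted(counts)

    # A count of 1 is left implicit.
    return " ".join(f"{s}{counts[s] if counts[s] != 1 else ''}" for s in order)


same_1 = hill_formula("H12 C6 O6")
same_2 = hill_formula("O6C6H12")
```

Both calls return `C6 H12 O6`. A formula is still not a unique name (isomers share one formula), which is part of the case for identifiers such as InChI.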
6 Application issues
- Metadata creation and subject assignment
- Same or similar tools as the ones supporting discovery (e.g. semi-automated indexing)
- "Cataloguing" rules in accordance with the data model and the Application Profile. Harmonization
- Explore benefits of text and data markup (e.g. CML) in combination with keyword indexing and fulltext-searching
- Experiments with participatory indexing (social tagging, folksonomies) carried out by research groups
in specified research areas
- Investigate the need for formal ontologies and logical reasoning over the data and the
literature in Crystallography
- Discovery: searching, browsing, linking
- Searching in components and substructures, using strings; graphical search support (JChemPaint, Marvin applet) etc.
- Searching of (and filtering with) key characteristics of crystal structures (cf. Reciprocal Net,
Crystallography Open Database)
- Searching inside the data files and the CIF format
- Searching in information services with much broader topical coverage (university-wide repositories,
national data archives, OAIster, Google)
- Investigate the potential benefits of text and data mining for indexing and searching support
- Steps towards knowledge extraction, hypothesis creation and larger-scale computational processing
eBank terminology report (internal)
Traugott Koch
Created: 2006-10-11
Last modified: 2006-10-18
URL: http://homes.ukoln.ac.uk/~tk213/pres/ebank-ws200610.html