STUDY OF CURRENT APPROACHES FOR WEB PUBLISHING OF OPEN SCIENTIFIC DATA
For citation: Mouromtsev D.I., Lehmann J., Semerkhanov I.A., Navrotskiy M.A., Ermilov I.S. Study of current approaches for Web publishing of open scientific data. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2015, vol. 15, no. 6, pp. 1081–1087.
Subject of Study. This work addresses the development of tools and technologies for publishing open data on the Internet in machine-readable formats, focusing on data from universities, educational and research organizations, and scientific laboratories. We analyze trends in the most commonly used publishing formats, including not only popular formats such as PDF, CSV and Excel but also Semantic Web formats such as RDF. Using the import and conversion of information from a university database as an example, the paper describes how scientific data can be published in semantic formats.

Methods. We describe a method for publishing open scientific data on the Web as a sequence of transformations that take the original data sets to their final semantic representation. These steps include loading data from a relational database, mapping the data onto an ontological model (schema), and generating a set of RDF triples corresponding to the original database fragment. Popular open data publishing systems such as CKAN and VIVO are reviewed. The OpenLink Virtuoso system is selected as the primary platform for storing and publishing the data, and the RDF data model is used to represent the open data of ITMO University.

Main Results. The authors describe methods of publishing open scientific data and identify their shortcomings. To demonstrate the efficiency of the proposed approach to publishing university open data, a software prototype has been developed and is available online at: http://lod.ifmo.ru/. An example of using the system is also given.

Practical Relevance. Implementing the proposed approach will significantly improve the impact of publishing university open data and make the data available to third-party applications, such as applications for retrieving information about educational activities and research results, or for analyzing the scientific activities of universities and their research departments.
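The three transformation steps named in the Methods (loading data from a relational database, mapping it onto an ontological model, and generating RDF triples) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `staff` table, its columns and sample rows, the base IRI `http://example.org/university/` and the `ontology#position` property are all hypothetical placeholders, FOAF is used only as a familiar example vocabulary, and an in-memory SQLite database stands in for the university database. The output is plain N-Triples rather than a load into Virtuoso.

```python
import sqlite3

# Hypothetical IRIs for illustration only; not the vocabulary of lod.ifmo.ru.
BASE = "http://example.org/university/"
ONT = BASE + "ontology#"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Step 2 (mapping): each table maps to an ontology class, a subject IRI
# template keyed on the primary key, and column -> property pairs.
MAPPING = {
    "staff": {
        "class": FOAF + "Person",
        "subject": BASE + "person/{id}",
        "columns": {"name": FOAF + "name", "position": ONT + "position"},
    },
}


def literal(value):
    """Serialize a value as an N-Triples plain literal with basic escaping."""
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def table_to_triples(conn, table, spec):
    """Steps 1 and 3: read rows, emit a type triple plus one triple per non-NULL mapped cell."""
    cols = ["id"] + list(spec["columns"])
    # Table and column names come from the trusted MAPPING, not user input.
    query = f"SELECT {', '.join(cols)} FROM {table} ORDER BY id"
    triples = []
    for row in conn.execute(query):
        record = dict(zip(cols, row))
        subject = "<" + spec["subject"].format(**record) + ">"
        triples.append(f"{subject} <{RDF_TYPE}> <{spec['class']}> .")
        for col, prop in spec["columns"].items():
            if record[col] is not None:  # NULL cells produce no triple
                triples.append(f"{subject} <{prop}> {literal(record[col])} .")
    return triples


def build_demo():
    """Build a hypothetical in-memory database and convert it to triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, position TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?, ?)",
        [(1, "Researcher One", "Professor"), (2, "Researcher Two", None)],
    )
    triples = []
    for table, spec in MAPPING.items():
        triples.extend(table_to_triples(conn, table, spec))
    conn.close()
    return triples


if __name__ == "__main__":
    print("\n".join(build_demo()))
```

The resulting N-Triples could then be loaded into a triple store such as OpenLink Virtuoso and exposed through a SPARQL endpoint. In a production setting, a declarative mapping language such as W3C R2RML would typically replace the hand-written `MAPPING` dictionary used here.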