AN APPROACH FOR CLONE DETECTION IN DOCUMENTATION REUSE

D. V. Lutsiv, D. V. Koznov, H. A. Basit, E. L. Ouh, M. N. Smirnov, K. Y. Romanovsky


Abstract

The paper presents a method for detecting repetitions in DocBook/DRL and plain-text documents. The algorithm is based on software clone detection techniques and filters its results: clone groups whose clones are shorter than 5 symbols are rejected, intersections between clone groups are eliminated, meaningless clones are removed, and groups containing clones that consist only of XML markup are discarded. Residual search is also supported: the detected clones are extracted from the documentation and the clone search is repeated; it is proved that a single additional search step is sufficient. The adaptive reuse technique of Paul Bassett and Stan Jarzabek has been implemented. A software tool based on the algorithm has been developed; it lets the user set repetition detection parameters and visualizes the results. The tool is integrated into the DocLine documentation development environment and supports refactoring documents using the detected clones. Clone search is performed with the Clone Miner clone detection utility. The method has been evaluated on the Linux Kernel Documentation (29 documents, 25,000 lines). Five semantic kinds of clones have been identified: terms (abbreviations, one-word and two-word terms), hyperlinks, license agreements, functionality descriptions, and code examples. In total, 451 meaningful clone groups were found; the average clone length is 4.43 tokens, and the average number of clones per group is 3.56.
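The filtering stage described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Clone` record, the offset-based overlap test, the longest-group-first rule for resolving intersections, and the stop-word criterion for "meaningless" clones are all hypothetical, since the abstract does not specify them or describe Clone Miner's output format. Only the 5-symbol threshold comes from the text.

```python
import re
from dataclasses import dataclass

MIN_LENGTH = 5  # minimum clone length in symbols, as stated in the abstract
# Hypothetical stop-word list: an assumed stand-in criterion for "meaningless" clones.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "or", "the", "to"}
TAG_RE = re.compile(r"<[^>]+>")  # crude XML tag matcher, enough for a sketch

@dataclass(frozen=True)
class Clone:
    """Hypothetical clone record; not Clone Miner's real output format."""
    doc: str    # document identifier
    start: int  # offset of the first symbol
    end: int    # offset one past the last symbol
    text: str   # the cloned fragment

def strip_markup(text: str) -> str:
    """Remove XML tags, keeping only character content."""
    return TAG_RE.sub("", text)

def is_xml_only(clone: Clone) -> bool:
    """True if only whitespace remains after removing tags."""
    return strip_markup(clone.text).strip() == ""

def is_meaningless(clone: Clone) -> bool:
    """Assumed heuristic: every word of the clone is a stop word."""
    words = re.findall(r"\w+", strip_markup(clone.text).lower())
    return all(w in STOP_WORDS for w in words)

def overlaps(a: Clone, b: Clone) -> bool:
    """Two clones intersect if they share a document and their spans overlap."""
    return a.doc == b.doc and a.start < b.end and b.start < a.end

def drop_intersections(groups: list[list[Clone]]) -> list[list[Clone]]:
    """Assumed policy: keep longer groups first, drop any group overlapping a kept one."""
    kept: list[list[Clone]] = []
    kept_clones: list[Clone] = []
    by_length = sorted(groups, key=lambda g: max(len(c.text) for c in g), reverse=True)
    for group in by_length:
        if any(overlaps(c, k) for c in group for k in kept_clones):
            continue
        kept.append(group)
        kept_clones.extend(group)
    return kept

def filter_groups(groups: list[list[Clone]]) -> list[list[Clone]]:
    result = []
    for group in groups:
        # Reject groups whose clones are shorter than MIN_LENGTH symbols.
        if any(len(c.text) < MIN_LENGTH for c in group):
            continue
        # Reject groups containing clones that consist only of XML markup.
        if any(is_xml_only(c) for c in group):
            continue
        # Remove meaningless clones; a group needs at least two clones to remain.
        group = [c for c in group if not is_meaningless(c)]
        if len(group) < 2:
            continue
        result.append(group)
    # Finally, resolve intersections between the surviving groups.
    return drop_intersections(result)

# Toy input: offsets are illustrative, not taken from real documents.
SAMPLE_GROUPS = [
    [Clone("a.xml", 0, 12, "Linux kernel"), Clone("b.xml", 40, 52, "Linux kernel")],
    [Clone("a.xml", 0, 5, "Linux"), Clone("c.xml", 7, 12, "Linux")],  # overlaps group 1
    [Clone("a.xml", 100, 103, "GPL"), Clone("b.xml", 200, 203, "GPL")],  # too short
    [Clone("a.xml", 300, 313, "<para></para>"),
     Clone("b.xml", 400, 413, "<para></para>")],  # XML markup only
    [Clone("a.xml", 500, 510, "of the and"), Clone("c.xml", 80, 90, "of the and")],  # stop words
    [Clone("c.xml", 50, 61, "DocBook XML"), Clone("d.xml", 0, 11, "DocBook XML")],
]

for g in filter_groups(SAMPLE_GROUPS):
    print(g[0].text, len(g))
```

On the toy input, only the "Linux kernel" and "DocBook XML" groups survive. The short "Linux" group passes the length filter but is discarded because it overlaps the longer "Linux kernel" group in `a.xml`. Resolving intersections greedily by clone length is just one plausible policy; a real tool might instead prefer groups with more clones or let the user decide.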


Keywords: software documentation, documentation reuse, software clone detection, adaptive reuse, refactoring, DocBook, DocLine, DRL

Copyright 2001-2017 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.
