doi: 10.17586/2226-1494-2020-20-5-714-721


SEARCH OF CLONES IN PROGRAM CODE 
 

A. O. Osadchaya, I. V. Isaev


Article in Russian

For citation:
Osadchaya A.O., Isaev I.V. Search of clones in program code. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 5, pp. 714–721 (in Russian). doi: 10.17586/2226-1494-2020-20-5-714-721


Abstract
Subject of Research. The paper reviews existing approaches and methods for detecting clones in program code. Based on this review, a method is developed that implements a semantic approach to finding duplicated fragments and targets all clone types. Method. The method is based on analysis of program dependency graphs built from the source code files. To detect duplicated fragments, a program dependency graph is generated for each source file, and its nodes are hashed on the basis of their content properties, which groups the nodes into equivalence classes. Pairs of nodes are selected from each equivalence class, and for each pair two isomorphic subgraphs containing those nodes are identified. A pair of clones that is contained in another pair is removed from the set of found pairs of duplicated fragments. Pairs of duplicated fragments that share the same isomorphic subgraphs are then merged into a set of clones, that is, the clone pairs are expanded. Main Results. To evaluate the effectiveness of the method, files were compared to determine which clone types a system based on it detects, and the system was tested on components of a real system. The results produced by the developed system were compared with the actual ones. Practical Relevance. The proposed algorithm makes it possible to automate the analysis of source files. Clone detection is a priority area of code analysis, since finding duplicated fragments helps counter unscrupulous copying of program code.

Keywords: clones in program code, code duplication, duplicated fragments, code clone types, refactoring, code analysis, code reuse
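
The steps listed under Method can be illustrated with a minimal Python sketch. It is not the authors' implementation. The graph format, the labels and all function names are hypothetical. The lockstep growth used here is a greedy simplification that only approximates a search for isomorphic subgraphs.

```python
from collections import defaultdict
from itertools import combinations

# Toy dependency graph (hypothetical format): node id -> (label, successor ids).
# The label stands in for the content properties a node is hashed on.
GRAPH = {
    1: ("assign", [2]), 2: ("add", [3]), 3: ("return", []),
    4: ("assign", [5]), 5: ("add", [6]), 6: ("return", []),
    7: ("print", []),
}

def equivalence_classes(graph):
    """Group node ids by label; the dict hashes each label for us."""
    classes = defaultdict(list)
    for node, (label, _) in graph.items():
        classes[label].append(node)
    return [nodes for nodes in classes.values() if len(nodes) > 1]

def grow_pair(graph, a, b):
    """Greedily grow two disjoint, label-matching fragments from seeds a and b."""
    mapping = {a: b}   # node of the first fragment -> node of the second
    used = {b}         # nodes already taken by the second fragment
    stack = [(a, b)]
    while stack:
        x, y = stack.pop()
        for sx in graph[x][1]:
            if sx in mapping or sx in used:
                continue
            for sy in graph[y][1]:
                if (sy != sx and sy not in mapping and sy not in used
                        and graph[sx][0] == graph[sy][0]):
                    mapping[sx] = sy
                    used.add(sy)
                    stack.append((sx, sy))
                    break
    first, second = frozenset(mapping), frozenset(mapping.values())
    # Normalise the order so the same pair is never stored twice.
    return tuple(sorted((first, second), key=min))

def subsumed(p, q):
    """True if clone pair p lies inside a different pair q (either orientation)."""
    return p != q and ((p[0] <= q[0] and p[1] <= q[1]) or
                       (p[0] <= q[1] and p[1] <= q[0]))

def find_clone_pairs(graph):
    """Seed a fragment pair from every node pair of every class, then filter."""
    pairs = set()
    for nodes in equivalence_classes(graph):
        for a, b in combinations(nodes, 2):
            pairs.add(grow_pair(graph, a, b))
    return [p for p in pairs if not any(subsumed(p, q) for q in pairs)]

def expand_to_sets(pairs):
    """Merge clone pairs that share a fragment into clone sets."""
    groups = []
    for a, b in pairs:
        hit = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*hit)
        groups = [g for g in groups if g not in hit] + [merged]
    return groups

if __name__ == "__main__":
    print(expand_to_sets(find_clone_pairs(GRAPH)))
```

In the toy graph, nodes 1 to 3 and nodes 4 to 6 form one clone pair. The smaller pairs seeded from nodes 2 and 5 or from nodes 3 and 6 are dropped because they lie inside it, and node 7 has no counterpart, so it never seeds a pair.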



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License