Nikiforov
Vladimir O.
D.Sc., Prof.
doi: 10.17586/2226-1494-2019-19-2-299-305
CRITERIA FOR TEXT CONFORMITY TO SCIENTIFIC STYLE
Abstract
Criteria for text conformity to scientific style were studied. We examine three criteria: the repetition rate of keywords and phrases in a text document, the percentage of stop words relative to the total number of words in the text, and the deviation of the word-frequency plot of the text from the ideal Zipf curve. The study used an executable script that checks a text against several criteria. In an experimental study on a sample of 2500 articles published in HAC/RSCI sources, the distributions of the criteria values were obtained, tested for normality with several tests, and checked for correlation between them. Based on the analysis of these data, threshold values for the criteria were obtained and mathematically substantiated, and then applied to a test sample consisting of undergraduate works by students of St. Petersburg Electrotechnical University “LETI”, the pseudoscientific article “Rooter: A Methodology for the Typical Unification of Access Points and Redundancy”, technical articles from the Habr Internet IT community, "Capital" by Karl Marx, and a number of other texts not written in the scientific style. A necessary but not sufficient condition for an article's compliance with the scientific style was formulated.
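The three criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' script: the function names, the tiny `STOP_WORDS` list, the tokenizer, and in particular the way the deviation from the Zipf curve is measured (mean relative deviation from `f1 / rank`) are assumptions made here for illustration, since the abstract does not give the exact formulas.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
STOP_WORDS = frozenset(
    {"the", "a", "an", "of", "in", "and", "to", "is", "are", "for", "on", "with"}
)


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def stop_word_ratio(tokens):
    """Percentage of tokens that are stop words."""
    if not tokens:
        return 0.0
    return 100.0 * sum(t in STOP_WORDS for t in tokens) / len(tokens)


def keyword_repetition_rate(tokens, keywords):
    """Percentage of tokens that match one of the given keywords."""
    if not tokens:
        return 0.0
    wanted = {k.lower() for k in keywords}
    return 100.0 * sum(t in wanted for t in tokens) / len(tokens)


def zipf_deviation(tokens):
    """Mean relative deviation of ranked word counts from the ideal f1 / rank.

    Assumed metric: the ideal Zipf curve is anchored at the count of the
    most frequent word, so a perfectly Zipfian text scores 0.0.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    if not counts:
        return 0.0
    top = counts[0]
    total = 0.0
    for rank, count in enumerate(counts, start=1):
        ideal = top / rank
        total += abs(count - ideal) / ideal
    return total / len(counts)


if __name__ == "__main__":
    sample = tokenize("The cat and the dog and the bird")
    print(stop_word_ratio(sample))
    print(keyword_repetition_rate(sample, ["cat", "dog"]))
    print(zipf_deviation(sample))
```

In a check of the kind the abstract describes, each value would be compared against a threshold derived from the reference sample of articles; the thresholds themselves are not stated in the abstract and are therefore left out of the sketch.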