Function introduction
rostcm mainly includes three parts: full network search, local literature database comparison and comparison of a small number of documents.
The software columns include chat analysis, network analysis, website analysis, browsing analysis, Weibo analysis and journal analysis.
Through this system, we can determine whether a paper is plagiarized. When analyzing whether a paper is plagiarized, you can also perform functional analysis (word frequency analysis, English word frequency analysis, Chinese word frequency analysis, social network and semantic network analysis, sentiment analysis, traffic analysis, TF/IDF batch word frequency analysis, similarity analysis), This leaves no place for plagiarized papers to hide.
Basic principles:
The anti-plagiarism software automatically cuts the document into multiple small texts of 50 to 200 words (customizable), and uses a hybrid engine to fuzzy match it with 18.8 billion web pages and 4.9 million documents, marking each text block and document The maximum similarity of some documents in the library. From this software, the software calculates the proportion of the total number of words with a similarity of ≥95% (basically plagiarized) and a similarity of ≥80% (plagiarized with slight modifications). We use this ratio as a measure of plagiarism (similarity).
The system requires XP system and word2003 environment.
advantage
Broad coverage, covering approximately 18.8 billion web pages and 4.9 million papers through a hybrid engine. The system uses the self-developed ROST WebSpider and ROST SEAT algorithms to achieve broad coverage of the Internet and some journal networks.
Fuzzy detection and flexible matching. In order to prevent plagiarists from replacing some characters and deleting some punctuation marks, the system makes judgments based on similarity. The system uses the self-developed ROST Similar algorithm to achieve high-speed similarity detection and measurement. The system uses the self-developed QingQing algorithm to extract information fingerprints. On P3 and 512MB PC, the word segmentation speed is 13MB/S. An evaluation version has been provided on the Internet for industry evaluation.
The detection results of this software can only be used as a reference. You can right-click on the table to export the detailed inspection results and send them to the person being inspected. This software does not make a conclusion on whether it is plagiarized. It only tells you the proportion of texts that are more than 80% similar to existing literature. What is the total proportion. Text with a similarity higher than 80% is what needs attention. Values below this can be completely ignored.
Standardize the removal of citations and references to reduce the possibility of misjudgment.
The customized block detection mechanism accurately represents the similarity between each text block of the article and other documents. Each text block ranges from about 50 to 200 words (customizable). Red indicates extreme Similar (similarity greater than 80%), clear at a glance, clear and eye-catching. When set to 50 words per block, potentially plagiarized or similar documents can be found at a lower information granularity.
Similar document module tracking technology can directly locate which content in similar documents has been plagiarized or copied through simple operations, which is intuitive and clear.
The result analysis function automatically analyzes similar results of documents and gives evaluation opinions.
Supports multiple file formats, including PDF, DOC, PPT, XLS, TXT and other documents.
Proprietary data files are saved, so there is no need to repeatedly check and waste time.
shortcoming
This anti-plagiarism system cannot cover all Chinese and English documents in the world. The correlation between coverage rate and recall rate is under study.
The detection time is slightly longer. The software takes 7 seconds to detect every 200 words. An 8,000-word document takes at least about 5 minutes. It requires a little patience.
There is a small error in the detection results of this software. Using smaller document blocks for detection can reduce the error, but the time required will increase accordingly. After our trials in many editorial offices, the block size is set to 200 words. Appropriate, the error rate is acceptable at this time, and the document similarity rate is generally lower than the actual one.
it works
it works
it works