o(╥﹏╥)oSad

WeChat cannot downloadictclas, but just click on the upper right corner and select open in browser or Safari!
ictclas

ictclas

  • Size: 64.32M
  • Language: Simplified Chinese
  • category: Liberal Arts Tools
  • System: Win7/XP/2000/2003/Vista

The official version of ictclas is a powerful word segmentation system. The latest version of ictclas supports Chinese word segmentation, part-of-speech tagging, named entity recognition, new word recognition, user dictionaries and other functions, which can help users conduct analysis and research on Chinese language morphology. The ictclas software also provides users with functions such as part-of-speech standards, keyword extraction, and interface expansion to meet the needs of different users.

Introduction to ictclas software

Based on years of research work, the Institute of Computing Technology of the Chinese Academy of Sciences has developed a Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System). Its main functions include Chinese word segmentation; part-of-speech tagging; named entity recognition; new word recognition; and also supports user dictionaries. . We have carefully built it for five years and upgraded the kernel 7 times. Currently, it has been upgraded to ICTCLAS2009 user dictionary interface extension. Users can dynamically add and delete words in the user dictionary and adjust the effect of word segmentation. Improved the flexibility of user dictionary usage.

Since 2009, ICTCLAS lexical analysis system has been renamed NLPIR word segmentation system in order to distinguish it from previous work and promote the NLPIR natural language processing and information retrieval sharing platform. Dr. Zhang Huaping has worked hard for more than ten years and upgraded the kernel more than ten times. He has won the first prize of the Qian Weichang Chinese Information Processing Science and Technology Award in 2010, the overall first place in the International SIGHAN Word Segmentation Competition in 2003, and the 2002 Domestic 973 Evaluation Overall first place. The number of global users has exceeded 300,000, including enterprises such as China Mobile, Huawei, China Sou, 3721, NEC, China Business Network, Silicon Valley Power, Yunnan Daily, and institutions such as Tsinghua University, Xinjiang University, South China Institute of Technology, and the University of Massachusetts: At the same time, ICTCLAS is widely used It has been reported by many media including Science Times, People's Daily Overseas Edition, Science and Technology Daily and many other media. You can visit Google to learn more about the application of ICTCLAS.

ictclas software functions

​​​1. Fingerprint extraction

Based on the content, structure, and relationship between words of the article, the semantic fingerprint that can represent the article is analyzed and represented by a numerical sequence.

​​​2. The word segmentation granularity is adjustable

You can control the granularity of word segmentation results. The shared version provides two word segmentation granularities, standard granularity and coarse granularity, to meet the needs of different users.

3. User dictionary interface extension

Users can dynamically add and delete words in the user dictionary and adjust the effect of word segmentation. Improved the flexibility of user dictionary usage.

​​​4. Enhanced part-of-speech tagging function

There are multiple annotation levels to choose from. The annotation levels available for the system include: Institute of Computing Technology first-level annotation level, Institute of Computing Technology second-level annotation set, Peking University first-level annotation set, and Peking University second-level annotation set.

5. Keyword extraction

Automatically extract several words or phrases that can well represent the topic of the document. Keyword extraction technology is widely used in various intelligent text information processing fields such as information retrieval, text classification/clustering, information filtering, document summarization, etc., and has great application value.

​​​6. New word discovery and adaptive word segmentation function

From longer text content, new feature languages ​​are automatically discovered based on information cross-entropy, and the language probability distribution model of the test corpus is adaptively tested to achieve adaptive word segmentation.

ictclas software advantages

​ 1. Public evaluation by domestic and international authorities, recognition by 30,000 customers

For commercial purposes, some companies close their doors and conduct self-tests, claiming that the accuracy is 99.50%, without introducing the test environment and test methods. It is not surprising that the accuracy of closed tests or small-scale open tests is 100%. ICTCLAS1.0 has been approved by the domestic 973 expert group The organization won first place in the evaluation activities. ICTCLAS2.0 won multiple first places in the evaluation organized by SigHan, the first international Chinese processing research institution. For details, please refer to the system evaluation section. These are the results of large-scale on-site open testing conducted by authoritative organizations and are authentic and credible.

ICTCLAS has issued more than 30,000 authorizations to domestic and foreign enterprises and academic institutions, including 3721, NEC, China Business Network, Silicon Valley Power, Yunnan Daily and other enterprises, Xinjiang University, Tsinghua University, South China Institute of Technology, University of Massachusetts; at the same time, ICTCLAS has been widely reported by many media including Science Times, People's Daily Overseas Edition, Science and Technology Daily, etc. You can visit Google to learn more about the application of ICTCLAS.

2. Optimum overall performance

Whether the word segmentation system can meet practical requirements mainly depends on two factors: word segmentation accuracy and analysis speed. The two restrict each other and are difficult to balance. Most systems tend to fall into the dilemma of "fast but not accurate, accurate but not fast". We have developed the perfect PDAT large-scale knowledge base management technology, which has made a major breakthrough between high speed and high accuracy. This technology can manage millions of dictionary knowledge bases. A single machine can query 1 million entries per second, and the memory Consumes less than 1.5 times the size of the knowledge base. Based on this technology, ICTCLAS3.0 has a word segmentation speed of 996KB/s on a single machine, a word segmentation accuracy of 98.45%, an API of no more than 200KB, and various dictionary data after compression of less than 3M. It is currently the best Chinese lexical analyzer in the world.

3. Unified linguistic computing theoretical framework

Chinese word segmentation involves many factors such as Chinese word segmentation, undefined word recognition, part-of-speech tagging, and language special cases. Most systems lack a unified processing method and often use loosely coupled module combinations. The final model cannot accurately and effectively express the vast diversity of words. language phenomenon, and ICTCLAS uses a Hierarchical Hidden Markov Model (Hierarchical Hidden Markov Model), unifies all aspects of Chinese lexical analysis into a complete theoretical framework to achieve the best overall effect. Relevant theoretical research has been published in top international conferences and magazines, confirming the model both theoretically and practically. advanced nature.

4. Comprehensive support for application development in various environments

ICTCLAS is all written in C/C++, supports Linux, FreeBSD and Windows series operating systems, and supports mainstream development languages ​​such as C/C++/C#/Delphi/Java.

5. Change according to needs and tailor-made

All functional modules can be disassembled and assembled. ICTCLAS has GB2312 and BIG5 versions, which can handle simplified and traditional Chinese respectively; it supports currently widely recognized word segmentation and part-of-speech standards, including the Computing Institute’s part-of-speech annotation set ICTPOS3.0, Peking University standard, Binzhou University standard, National Language Commission standards, "Academia Sinica" in Taiwan, and "City University" in Hong Kong; users can directly customize the output part-of-speech standards and define the output format; users can customize a word segmentation system that suits them based on their own needs.

ictclas update log

​ 1. Optimized some functions

​ 2. Solved many unbearable bugs

Huajun editor recommends:

Looking around, there are software similar to this software everywhere on the Internet. If you are not used to this software, you might as well try the electronic version of Chinese Idiom Dictionary, .NET, Cloud Machine Manager and other software. I hope you like it!

Version: 2016 official version | Update time: 2022-06-30

Similar recommendations

Latest updates

ictclas comments

  • 1st floor Huajun netizen 2021-10-05 07:47:25
    ictclas is awesome! 100 million likes! ! !
  • 2nd floor Huajun netizen 2011-10-06 15:29:11
    The overall feeling of ictclas is good and I am quite satisfied with it. The installation and operation are very smooth! Followed the installation step-by-step instructions and it went very smoothly!
  • 3rd floor Huajun netizen 2009-04-17 01:51:38
    ictclas is not bad, the download is very fast, I give you a good review!

Recommended products

Everyonealso likeThese:

+

This application has not passed real-name authentication. You can download it through the mobile assistant.

CancelOK