Research on Design and Sharing of Yi Language Corpus Resources Database Based on Syntactic Rules
Abstract
With the development of statistics-based natural language processing in information science and of descriptive linguistics in the humanities, research institutions, enterprises, organizations, and even individuals have built and accumulated a large number of corpora of varying scales. This paper proposes a scheme for designing and sharing a Yi language corpus resource database based on syntactic rules. The system comprises document corpus, corpus entry, corpus classification, corpus editing, corpus retrieval, corpus tools, and user management modules, each of which handles a specific task in corpus construction. It operates on corpora stored as XML documents, which is of great practical value for Chinese research institutions. Although different corpora mark up their XML documents differently, they can be converted with only minor changes to the corpus converter. With the last 3530 of the treebank sentences not used for training taken as the test set, the labeled attachment score (LAS) of dependency parsing increased from 72.33% to 77.12%, and the unlabeled attachment score (UAS) increased from 79.81% to 81.04%.
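For readers unfamiliar with the two metrics, UAS is the percentage of tokens assigned the correct head, and LAS is the percentage assigned both the correct head and the correct dependency label. The sketch below computes both over a test set. It assumes each token is a hypothetical `(head_index, label)` pair with head 0 denoting the root, and it counts every token (including punctuation); the paper's exact evaluation conventions are not stated here, so this illustrates the metrics rather than reproducing its numbers.

```python
from typing import List, Tuple

# Hypothetical token representation: (head_index, dependency_label), head 0 = root.
Token = Tuple[int, str]
Sentence = List[Token]


def attachment_scores(gold: List[Sentence], pred: List[Sentence]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages, counting every token in the test set."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    total = correct_head = correct_both = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_head += 1          # counts toward UAS
                if g_label == p_label:
                    correct_both += 1      # head and label both right: counts toward LAS
    if total == 0:
        raise ValueError("empty test set")
    return 100.0 * correct_head / total, 100.0 * correct_both / total


if __name__ == "__main__":
    # Toy four-token sentence with illustrative labels (not from the Yi treebank).
    gold = [[(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]]
    pred = [[(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]]
    print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

Because LAS requires the label to match in addition to the head, it can never exceed UAS, which is consistent with the figures reported above.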