A New Approach to Extracting Bilingual Corpora


Author: Quang Hùng Lê

Language: vie

Classification: 423.9 Dictionaries of standard English

Publication: Khoa học (Đại học Quy Nhơn), 2014

Physical description: 13-24

Collection: Metadata

ID: 567471

A parallel corpus is a crucial resource for many Natural Language Processing (NLP) systems, such as statistical machine translation and cross-language information retrieval. Building such corpora manually is very costly, yet large amounts of parallel text are already available on the Web in various forms, such as the pages of bilingual websites. Automatically extracting parallel texts from the Web has therefore become an important task in NLP research. In this paper, the authors develop a new approach based on extending the definition of parallel texts to match translation segments, which helps extract proper translation units from bilingual web pages. The authors also formulate the task as a classification problem that draws on two kinds of knowledge resources: structural information from web pages and translation information between the two languages. Experiments conducted on the English-Vietnamese language pair showed significant results.
