IMPROVEMENT OPTIMIZATION ALGORITHMS APPLIED FOR SOLVING THE POSTERIOR INFERENCE PROBLEM IN TOPIC MODELS


Authors: Thị Thanh Xuân Bùi, Thị Nhung Dương

Language: Vietnamese

Classification number:

Publication information: Tạp chí Khoa học và Công nghệ Đại học Thái Nguyên (Thai Nguyen University Journal of Science and Technology), 2019

Physical description:

Collection: Newspapers, Journals

ID: 418493


Posterior inference for each individual document plays an important role in topic models. However, this inference problem is usually cast as a nonconvex optimization problem over large datasets, and it is therefore often NP-hard. Many methods have been proposed to solve it approximately, such as Variational Bayes (VB), collapsed variational Bayes (CVB), and collapsed Gibbs sampling (CGS), but most of them offer no guarantee on either the quality of the solution or the convergence rate.

Building on the ideas behind the Online Frank-Wolfe (OFW) algorithm and the Online Maximum a Posteriori Estimation (OPE) algorithm, we propose two improved algorithms, IOPE1 and IOPE2, that solve the posterior inference problem in topic models efficiently. By using stochastic bounds, stochastic approximation, and probability distributions such as the uniform and Bernoulli distributions, our improvements lead to new, effective methods for learning topic models such as LDA from large text collections. Experimental results show that our approaches are often more effective than previous methods, OPE in particular.
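The abstract names OPE as the baseline but does not spell out its updates, nor those of IOPE1 and IOPE2. As a rough illustration only, the sketch below follows OPE as it is commonly described for LDA: the per-document objective is assumed to be f(theta) = sum_j d_j log(sum_k theta_k beta_kj) + (alpha - 1) sum_k log theta_k, one of its two terms is drawn uniformly at each iteration, and a Frank-Wolfe step moves theta toward the best vertex of the simplex. The function name `ope_infer`, its parameters, and the step size 1/(t + 1) are assumptions made for this example; the IOPE variants are not reproduced here.

```python
import random


def ope_infer(doc, beta, alpha=0.01, iters=50, seed=0):
    """Estimate topic proportions theta for one document (OPE-style sketch).

    doc   : dict mapping word id -> count in the document
    beta  : list of K topics; beta[k][w] is P(word w | topic k), assumed > 0
    alpha : Dirichlet hyperparameter (assumed symmetric)
    """
    rng = random.Random(seed)
    num_topics = len(beta)
    theta = [1.0 / num_topics] * num_topics  # uniform start, strictly inside the simplex
    n_lik = 0    # times the likelihood term has been drawn so far
    n_prior = 0  # times the prior term has been drawn so far

    for t in range(1, iters + 1):
        # Stochastic approximation: draw one of the two terms uniformly.
        if rng.random() < 0.5:
            n_lik += 1
        else:
            n_prior += 1

        # Gradient of F_t = (2/t) * (n_lik * likelihood + n_prior * prior) at theta.
        grad = [n_prior * (alpha - 1.0) / theta[k] for k in range(num_topics)]
        for word, count in doc.items():
            mix = sum(theta[k] * beta[k][word] for k in range(num_topics))
            for k in range(num_topics):
                grad[k] += n_lik * count * beta[k][word] / mix
        grad = [2.0 * g / t for g in grad]

        # Frank-Wolfe step: the linear maximizer over the simplex is a vertex.
        best = max(range(num_topics), key=lambda k: grad[k])
        step = 1.0 / (t + 1)  # assumed step size; keeps every theta[k] > 0
        theta = [(1.0 - step) * x for x in theta]
        theta[best] += step

    return theta
```

A call such as `ope_infer({0: 10}, [[0.9, 0.1], [0.1, 0.9]])` returns a point on the simplex that puts most of its mass on the topic favoring word 0. Shrinking the step as 1/(t + 1) instead of 1/t is a deliberate choice in this sketch: it stops the first iterate from landing exactly on a vertex, where the log theta term would be undefined.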
