ScholarBank@NUS
https://scholarbank.nus.edu.sg
The DSpace digital repository system captures, stores, indexes, preserves, and distributes digital research material.
Tue, 30 Nov 2021 18:41:56 GMT

Title: Comparing keyword extraction techniques for WEBSOM text archives
Authors: Azcarraga, A.P.; Yap Jr., T.N.
Abstract: The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical. A fast alternative method is based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used for input data compression. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of keywords extracted using this method match the keywords extracted for the same map units using the original WEBSOM method.
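The frequency comparison described in this abstract can be illustrated with a minimal sketch. It assumes a toy archive of tokenized documents already assigned to map units. The function names and the score f_in / (f_in + f_out) are illustrative choices, not the goodness measure used by WEBSOM.

```python
from collections import Counter

def relative_frequencies(docs):
    """Return each word's share of all word occurrences in a list of tokenized documents."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def label_units(unit_docs, top_k=2):
    """Pick top_k labels per unit by comparing in-unit and out-of-unit relative frequencies.

    unit_docs maps a unit id to the tokenized documents assigned to that unit.
    The score f_in / (f_in + f_out) is a hypothetical stand-in, not the WEBSOM formula.
    """
    labels = {}
    for unit, docs in unit_docs.items():
        inside = relative_frequencies(docs)
        # Pool the documents of every other unit. Repeating this pass over the
        # archive for each unit is what makes the approach slow at scale.
        others = [d for u, ds in unit_docs.items() if u != unit for d in ds]
        outside = relative_frequencies(others) if others else {}
        scores = {w: f / (f + outside.get(w, 0.0)) for w, f in inside.items()}
        # Rank by score, then by in-unit frequency, then alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], -inside[w], w))
        labels[unit] = ranked[:top_k]
    return labels

unit_docs = {
    0: [["oil", "price", "rise"], ["oil", "export", "price"]],
    1: [["bank", "rate", "rise"], ["bank", "loan", "rate"]],
}
labels = label_units(unit_docs)
print(labels)
```

Because every unit is compared against the words of all other units, the work grows with both the number of units and the size of the archive, which is the cost the abstract objects to.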
Date: Mon, 01 Jan 2001 00:00:00 GMT
URL: https://scholarbank.nus.edu.sg/handle/10635/40401

Title: Extracting meaningful labels for WEBSOM text archives
Authors: Azcarraga, A.P.; Yap Jr., T.N.
Abstract: Self-Organizing Maps, being used mainly with data that are not pre-labeled, need automatic procedures for extracting keywords as labels for each of the map units. The WEBSOM methodology for building very large text archives has a very slow method for extracting such unit labels. It computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of all the other units of the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical. This paper describes how the meaningful labels per map unit can be deduced by analyzing the relative weight distribution of the SOM weight vectors and by taking advantage of some characteristics of the random projection method used in dimensionality reduction. The effectiveness of this technique is demonstrated on archives of the well-studied Reuters and CNN collections. Comparisons with the WEBSOM method are provided.
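One plausible reading of labeling from weight vectors and the random projection matrix is sketched below. The abstract does not give the procedure, so everything here is an assumption: a dense Gaussian projection with unit-length columns, a stand-in weight vector in place of a trained map, and scoring each word by the dot product of its projection column with the unit's weight vector. It relies only on the general fact that random high-dimensional columns are nearly orthogonal.

```python
import math
import random

def random_projection(n_words, k, seed=0):
    """Build one random unit-length k-dimensional column per vocabulary word."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_words):
        col = [rng.gauss(0.0, 1.0) for _ in range(k)]
        norm = math.sqrt(sum(c * c for c in col))
        columns.append([c / norm for c in col])
    return columns

def project(word_weights, columns):
    """Compress a vocabulary-sized vector to k dimensions (y = R x)."""
    k = len(columns[0])
    return [sum(w * col[i] for w, col in zip(word_weights, columns)) for i in range(k)]

def label_from_weights(unit_weights, columns, vocab, top_k=2):
    """Score each word by the dot product of its column with the unit's weights (R^T m)."""
    scores = [sum(m * c for m, c in zip(unit_weights, col)) for col in columns]
    ranked = sorted(range(len(vocab)), key=lambda j: -scores[j])
    return [vocab[j] for j in ranked[:top_k]]

vocab = ["oil", "price", "bank", "rate", "loan", "export"]
columns = random_projection(len(vocab), k=256)
# Hypothetical stand-in for a trained weight vector: the projection of a word
# profile dominated by "oil" and "price". A real map would supply this vector.
unit_weights = project([5.0, 4.0, 0.0, 0.0, 0.0, 0.0], columns)
labels = label_from_weights(unit_weights, columns, vocab)
print(labels)
```

In this sketch the labels come from the map's weight vectors and the projection matrix alone, so no pass over the documents is needed, which is the kind of saving the abstract describes.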
Date: Mon, 01 Jan 2001 00:00:00 GMT
URL: https://scholarbank.nus.edu.sg/handle/10635/40405

Title: Generalized Associative Memory Models for Data Fusion
Authors: Yap Jr., T.N.; Azcarraga, A.P.
Abstract: The Hopfield and bi-directional associative memory (BAM) models are well developed and carefully studied models for associative memory that are patterned after the memory structure of the animal brain. Their basic limitation is that they can only perform associations between at most two sets of patterns. Several different models for generalized associative memory are proposed here. These models are all extensions of the Hopfield and BAM models that can perform multiple associations. Extensive software simulations are conducted to evaluate the different models, using the memory capacity as the basis for comparing their performance. The Widrow-Hoff gradient descent error correction algorithm is introduced to improve the memory capacities of the various models. Potential application of these models as data fusion systems is explored.
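For orientation, the standard two-pattern-set BAM that the proposed models extend can be sketched as follows. This is the textbook Hebbian BAM with bipolar patterns, not any of the generalized models or the Widrow-Hoff correction from the paper. The example patterns are made up and chosen so that recall is exact.

```python
def sign(v):
    """Bipolar threshold; ties go to +1 (an arbitrary choice for this sketch)."""
    return 1 if v >= 0 else -1

def train_bam(pairs):
    """Hebbian storage: W[i][j] is the sum over stored pairs of x[i] * y[j]."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += x[i] * y[j]
    return W

def recall_forward(W, x):
    """Recall the y-layer pattern associated with x: y = sign(W^T x)."""
    return [sign(sum(x[i] * W[i][j] for i in range(len(x)))) for j in range(len(W[0]))]

def recall_backward(W, y):
    """Recall the x-layer pattern associated with y: x = sign(W y)."""
    return [sign(sum(W[i][j] * y[j] for j in range(len(y)))) for i in range(len(W))]

# Two made-up bipolar pairs; the x patterns are orthogonal to each other.
pairs = [
    ([1, 1, 1, 1], [1, -1, 1]),
    ([1, -1, 1, -1], [-1, -1, 1]),
]
W = train_bam(pairs)
for x, y in pairs:
    print(recall_forward(W, x) == y, recall_backward(W, y) == x)
```

A single weight matrix links exactly two layers here, which is the limitation to two sets of patterns that the abstract sets out to remove.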
Date: Wed, 01 Jan 2003 00:00:00 GMT
URL: https://scholarbank.nus.edu.sg/handle/10635/40334

Title: Assessing self-organization using order metrics
Authors: Azcarraga, Arnulfo P.
Abstract: Self-Organizing Maps (SOM) are proving to be useful as data analysis and visualization tools because they can visually render the data analysis results in 2D or 3D, and do not need category information for each input pattern. But this unsupervised nature of the training process makes it difficult to have separate training and test sets to determine the quality of the training process, which is done quite naturally for supervised Neural Network learning algorithms. In applications like data analysis, where there is little clue as to what the SOM is supposed to look like after training, it is important to be able to assess the quality of the self-organization process independent of the application, and without need for category information. The Average Unit Disorder has been used to assess the quality of the ordering of a self-organized map. It is shown here that this same order metric can be used to assess the quality of the self-organization process itself. Based on this order metric, it can be determined whether the SOM has adequately learned, whether the parameters used to train the SOM have been correctly specified, and whether the SOM variant used is well-suited to the specific problem at hand.
Date: Sat, 01 Jan 2000 00:00:00 GMT
URL: https://scholarbank.nus.edu.sg/handle/10635/40375

Title: An effective method for generating multiple linear regression rules from artificial neural networks
Authors: Setiono, R.; Azcarraga, A.
Abstract: We describe a method for multivariate function approximation which combines neural network learning, clustering and multiple regression. Neural networks with a single hidden layer are universal function approximators. However, due to the complexity of the network topology and the nonlinear transfer function used in computing the activation of the hidden units, the predictions of a trained network are difficult to comprehend. On the other hand, predictions from a multiple linear regression equation are easy to understand but are not accurate when the underlying relationship between the input variables and the output variable is nonlinear. The method presented in this paper generates a set of multiple linear regression equations using neural networks. The number of regression equations is determined by clustering the weighted input variables. The predictions for samples in the same cluster are computed by the same regression equation. Experimental results on real-world data demonstrate that the new method generates relatively few regression equations from the training data samples. The errors in prediction using these equations are comparable to or lower than those achieved by existing function approximation methods.
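The idea of one regression equation per cluster can be shown with a deliberately reduced sketch. No network is trained here. A fixed threshold on a single hypothetical weighted input `w * x` stands in for clustering the weighted input variables, and each cluster gets an ordinary least-squares line. The data, the weight, and the function names are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept in one variable."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cluster_of(x, w=1.0):
    """Hypothetical clustering rule: split on the sign of the weighted input w * x."""
    return 0 if w * x < 0 else 1

def fit_rules(xs, ys):
    """Fit one regression equation per cluster of samples."""
    rules = {}
    for c in (0, 1):
        members = [(x, y) for x, y in zip(xs, ys) if cluster_of(x) == c]
        rules[c] = fit_line([x for x, _ in members], [y for _, y in members])
    return rules

def predict(rules, x):
    """Samples in the same cluster share the same regression equation."""
    slope, intercept = rules[cluster_of(x)]
    return slope * x + intercept

# Made-up nonlinear target y = |x|: a single line fits it poorly, two lines fit it exactly.
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [abs(x) for x in xs]
rules = fit_rules(xs, ys)
print(rules, predict(rules, 2.5), predict(rules, -1.5))
```

Each rule stays as readable as a single linear equation, while the set of rules together follows the nonlinear relationship.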
Date: Mon, 01 Jan 2001 00:00:00 GMT
URL: https://scholarbank.nus.edu.sg/handle/10635/42916