semanticvectors
The Semantic Vectors Package

Semantic Vectors creates semantic vector indexes by applying a Random Projection algorithm to term-document matrices built with Apache Lucene. The package was created as part of a project by the University of Pittsburgh Office of Technology Management to explore the potential for automatically matching related concepts in the technology management domain, e.g., mapping new technologies to potentially interested licensors. That project can be found at http://real.hsls.pitt.edu.
The package creates a WordSpace model, of the kind developed by Stanford University's Infomap Project and other researchers during the 1990s and early 2000s. Such models are designed to represent words and documents in terms of underlying concepts, and as such can be used for many semantic (concept-aware) matching tasks such as automatic thesaurus generation, knowledge representation, and concept matching.
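As a rough illustration of how a WordSpace model supports concept matching, the Java sketch below represents each term by its row in a small term-document count matrix and compares terms by cosine similarity. The class name `WordSpaceDemo`, the terms, and the counts are hypothetical, and this is not the SemanticVectors API; the package builds its vectors from Lucene indexes and reduces them with Random Projection rather than comparing raw count rows.

```java
/**
 * Hypothetical toy WordSpace: each term is its row in a term-document
 * count matrix, and relatedness between terms is cosine similarity.
 * Not the SemanticVectors API; terms and counts are invented for illustration.
 */
public class WordSpaceDemo {

    /** Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros. */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Rows are terms; columns are documents d0..d3 (hypothetical counts).
        String[] terms = {"patent", "license", "protein"};
        double[][] counts = {
            {2, 1, 0, 0}, // patent
            {1, 2, 0, 0}, // license
            {0, 0, 3, 1}, // protein
        };
        // Print the similarity of every pair of terms.
        for (int i = 0; i < terms.length; i++) {
            for (int j = i + 1; j < terms.length; j++) {
                System.out.printf("%s ~ %s: %.3f%n",
                        terms[i], terms[j], cosine(counts[i], counts[j]));
            }
        }
    }
}
```

Terms that share documents (patent and license) get a similarity of 0.8, while terms with no documents in common get 0. Real word spaces reduce these sparse rows to dense, lower-dimensional vectors so that terms can match even when they never co-occur directly.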
The Semantic Vectors package uses a Random Projection algorithm, a form of automatic semantic analysis similar to Latent Semantic Analysis (LSA) and its variants such as Probabilistic Latent Semantic Analysis (PLSA). Unlike these methods, however, Random Projection does not rely on computationally intensive matrix decomposition algorithms such as Singular Value Decomposition (SVD), which makes it a much more scalable technique in practice. Our application of Random Projection to Natural Language Processing (NLP) descends from Pentti Kanerva's work on Sparse Distributed Memory; in semantic analysis and text mining, this method has also been called Random Indexing. A growing number of researchers have applied Random Projection to NLP tasks, demonstrating:

- Semantic performance comparable with other forms of Latent Semantic Analysis.
- Significant computational performance advantages in creating and maintaining models.

Documentation

Java API documentation is at http://semanticvectors.googlecode.com/svn/trunk/doc/index.html. Installation help can be found in the InstallationInstructions. There is also help on using SemanticVectors for DocumentSearch, and a page with links to more RelatedResearch.

The package requires Apache Ant and Apache Lucene to be installed, and the Lucene classes must be available in your CLASSPATH.
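The Random Projection (Random Indexing) approach described above can be sketched in Java under simplifying assumptions. Each document is assigned a sparse random index vector with a few +1 and -1 entries, and each term vector is the frequency-weighted sum of the index vectors of the documents it occurs in. This is equivalent to multiplying the term-document matrix by a random (documents x dimensions) matrix, so no SVD is needed. The class name `RandomIndexingDemo`, the dimension of 200, the 5 nonzero entries of each sign, and the toy counts are hypothetical choices for illustration, not the package's actual implementation or defaults.

```java
import java.util.Random;

/**
 * Minimal Random Indexing sketch (hypothetical; not the SemanticVectors API).
 * Term vectors are frequency-weighted sums of sparse random document vectors,
 * i.e. the term-document matrix multiplied by a random (docs x dim) matrix.
 */
public class RandomIndexingDemo {

    /** Sparse ternary vector with `seeds` entries of +1, `seeds` of -1, and zeros elsewhere. */
    static double[] randomIndexVector(int dim, int seeds, Random rng) {
        double[] v = new double[dim];
        int placed = 0;
        while (placed < 2 * seeds) {
            int pos = rng.nextInt(dim);
            if (v[pos] != 0.0) {
                continue; // position already used; draw another
            }
            v[pos] = (placed < seeds) ? 1.0 : -1.0;
            placed++;
        }
        return v;
    }

    /** Projects a (terms x docs) count matrix down to `dim` dimensions per term. */
    public static double[][] project(double[][] termDoc, int dim, int seeds, long seed) {
        if (2 * seeds > dim) {
            throw new IllegalArgumentException("dim must be at least 2 * seeds");
        }
        Random rng = new Random(seed); // fixed seed makes the projection reproducible
        int numDocs = termDoc[0].length;
        double[][] docVectors = new double[numDocs][];
        for (int d = 0; d < numDocs; d++) {
            docVectors[d] = randomIndexVector(dim, seeds, rng);
        }
        double[][] termVectors = new double[termDoc.length][dim];
        for (int t = 0; t < termDoc.length; t++) {
            for (int d = 0; d < numDocs; d++) {
                double weight = termDoc[t][d];
                if (weight == 0.0) {
                    continue; // term does not occur in this document
                }
                for (int k = 0; k < dim; k++) {
                    termVectors[t][k] += weight * docVectors[d][k];
                }
            }
        }
        return termVectors;
    }

    /** Cosine similarity; returns 0 if either vector is all zeros. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical term-document counts: rows are terms, columns are documents.
        double[][] counts = {
            {2, 1, 0, 0}, // patent
            {1, 2, 0, 0}, // license
            {0, 0, 3, 1}, // protein
        };
        double[][] vectors = project(counts, 200, 5, 42L);
        System.out.printf("patent ~ license: %.3f%n", cosine(vectors[0], vectors[1]));
        System.out.printf("patent ~ protein: %.3f%n", cosine(vectors[0], vectors[2]));
    }
}
```

Because sparse random vectors in a high-dimensional space are nearly orthogonal, the projected cosines approximate the unreduced ones: patent and license should land near their count-vector similarity of 0.8, and patent and protein near 0, with the exact values depending on the random seed. Adding a new document only requires drawing one more random vector and updating the affected term vectors, which is one reason the method is cheap to maintain compared with recomputing an SVD.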
User Group

Issues and bugs can be posted using the project's Issues tab. More general questions and discussions may be posted at the group webpage, http://groups.google.com/group/semanticvectors.
Originally written by Dominic Widdows, in collaboration with Kathleen Ferraro and the University of Pittsburgh. The project is now maintained and extended by a small group of developers, as listed in the SemanticVectors AUTHORS file.
Projects Using Semantic Vectors

We're starting a list of ProjectsUsingSemanticVectors. We're aware of a few more that we'll try to add in due course: please visit this page and leave comments if you know of any.