jtextextract
Contributors: 0 | Discussions: 0 | Code commits: 3
Overview
Allows developers to extract text from files in multiple formats, such as MS-Office pre-2007 (Word, PPT, XLS), MS-Visio, MS-Office 2007 (DOCX, PPTX, XLSX), PDF, RTF, XML, HTML, plain text and others.
Usage scenarios:
1. Extract text from files in any of the formats above and pass it to an indexing engine such as Lucene, so that attachments can be indexed and searched later.
2. Extract text from email attachments.
3. Use it in any general-purpose search engine that supports searching inside file attachments.
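Scenario 1 pairs text extraction with an indexing engine. The sketch below is a toy stand-in for that second step, not Lucene's API: a hypothetical `AttachmentIndex` class that maps each lowercase term to the names of the attachments whose extracted text contains it. Tokenization simply splits on non-word characters (ASCII letters, digits and underscore only), which is far cruder than a real search engine's analyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index (NOT Lucene): term -> sorted set of attachment names.
class AttachmentIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index text that has already been extracted from an attachment.
    void add(String attachmentName, String extractedText) {
        // Crude tokenizer: lowercase, then split on runs of non-word characters.
        for (String term : extractedText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue; // split() yields a leading "" when text starts with a separator
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(attachmentName);
        }
    }

    // Return the attachments whose text contains the term (case-insensitive).
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.emptySet() : Collections.unmodifiableSet(hits);
    }
}
```

For example, after indexing `budget.xlsx` and `minutes.docx`, whose extracted text both contain the word "budget", `search("BUDGET")` returns both names.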
jtextextract builds on several complementary libraries, such as Apache POI, OpenXML4J and PDFBox.
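This page does not show jtextextract's own API. As a hedged sketch of how a multi-format extractor can be organized, the Java code below dispatches on file extension to a per-format extractor. `TextExtractor`, `PlainTextExtractor`, `MarkupTextExtractor` and `ExtractorRegistry` are hypothetical names. Only plain text and a naive tag-stripping extractor for HTML/XML are implemented, using the standard library alone; the Office, Visio and PDF formats would need extractors backed by libraries such as Apache POI or PDFBox, which are not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical contract: one extractor per supported format.
interface TextExtractor {
    String extract(byte[] content) throws IOException;
}

// Plain-text files: decode bytes as UTF-8 (assumption; real files may use other encodings).
class PlainTextExtractor implements TextExtractor {
    public String extract(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}

// Naive HTML/XML extractor: replaces tags with spaces and collapses whitespace.
// Illustrative only: it does not decode entities or skip <script>/<style> bodies.
class MarkupTextExtractor implements TextExtractor {
    public String extract(byte[] content) {
        String raw = new String(content, StandardCharsets.UTF_8);
        return raw.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}

// Hypothetical registry that picks an extractor by (case-insensitive) file extension.
class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    String extract(String fileName, byte[] content) throws IOException {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            throw new IOException("No extractor registered for: " + fileName);
        }
        return extractor.extract(content);
    }

    // Convenience overload: read the whole file from disk, then dispatch.
    String extract(Path file) throws IOException {
        return extract(file.getFileName().toString(), Files.readAllBytes(file));
    }
}
```

Registering `MarkupTextExtractor` under `html` and calling `extract("report.HTML", bytes)` on `<p>Quarterly  report</p>` yields `Quarterly report`. Keying on the extension keeps the sketch short; a more robust extractor might also inspect file content, since file names can be misleading.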
Created: 2014-05-11 23:48