Package org.apache.poi.xssf.extractor
Class XSSFEventBasedExcelExtractor
java.lang.Object
org.apache.poi.extractor.POITextExtractor
org.apache.poi.ooxml.extractor.POIXMLTextExtractor
org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor
- All Implemented Interfaces:
Closeable, AutoCloseable, ExcelExtractor
- Direct Known Subclasses:
XSSFBEventBasedExcelExtractor
Implementation of a text extractor for OOXML Excel
files that uses SAX event-based parsing.
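As a rough illustration of what SAX event-based parsing means here, the sketch below uses the JDK's own SAX parser rather than POI, on a hand-written and simplified sheet fragment with inline strings. The class name SaxSheetTextSketch, the method extractText and the fragment itself are assumptions made for illustration; this is not POI's actual handler, which additionally works with shared strings, styles and comments.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of POI.
public class SaxSheetTextSketch {

    /** Collects cell text: a tab between cells, a newline after each row. */
    static String extractText(String sheetXml) throws Exception {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inText;      // true while inside a <t> element
            private boolean firstInRow;  // true until the first <c> of a row is seen

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("row".equals(qName)) {
                    firstInRow = true;
                } else if ("c".equals(qName)) {
                    if (!firstInRow) {
                        out.append('\t');
                    }
                    firstInRow = false;
                } else if ("t".equals(qName)) {
                    inText = true;
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("t".equals(qName)) {
                    inText = false;
                } else if ("row".equals(qName)) {
                    out.append('\n');
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) {
                    out.append(ch, start, length);
                }
            }
        };
        // The default factory is not namespace-aware, so qName carries the plain element name.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sheetData><row>"
                + "<c t=\"inlineStr\"><is><t>Hello</t></is></c>"
                + "<c t=\"inlineStr\"><is><t>World</t></is></c>"
                + "</row></sheetData>";
        System.out.print(extractText(xml));
    }
}
```

Because the handler only reacts to events as they stream past, the sheet never has to be held in memory as a complete object tree.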
-
Nested Class Summary
Nested Classes -
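Field Summary
Fields

| Modifier and Type | Field |
| --- | --- |
| protected boolean | concatenatePhoneticRuns |
| protected OPCPackage | container |
| protected boolean | formulasNotResults |
| protected boolean | includeCellComments |
| protected boolean | includeHeadersFooters |
| protected boolean | includeSheetNames |
| protected boolean | includeTextBoxes |
| protected Locale | locale |
| protected POIXMLProperties | properties |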
-
Constructor Summary
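Constructors

| Constructor |
| --- |
| XSSFEventBasedExcelExtractor(String path) |
| XSSFEventBasedExcelExtractor(OPCPackage container) |

-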
Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | close() | Frees the extractor's resources as soon as it is no longer needed. |
| protected SharedStrings | createSharedStringsTable(XSSFReader xssfReader, OPCPackage container) | |
| POIXMLProperties.CoreProperties | getCoreProperties() | Returns the core document properties. |
| POIXMLProperties.CustomProperties | getCustomProperties() | Returns the custom document properties. |
| POIXMLProperties.ExtendedProperties | getExtendedProperties() | Returns the extended document properties. |
| boolean | getFormulasNotResults() | |
| boolean | getIncludeCellComments() | |
| boolean | getIncludeHeadersFooters() | |
| boolean | getIncludeSheetNames() | |
| boolean | getIncludeTextBoxes() | |
| Locale | getLocale() | |
| OPCPackage | getPackage() | Returns the opened OPCPackage container. |
| String | getText() | Processes the file and returns the text. |
| static void | main(String[] args) | |
| void | processSheet(XSSFSheetXMLHandler.SheetContentsHandler sheetContentsExtractor, Styles styles, Comments comments, SharedStrings strings, InputStream sheetInputStream) | Processes the given sheet. |
| void | setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns) | Concatenate text from <rPh> text elements in the SharedStringsTable. Default is true. |
| void | setFormulasNotResults(boolean formulasNotResults) | Should we return the formula itself, and not the result it produces? Default is false. |
| void | setIncludeCellComments(boolean includeCellComments) | Should cell comments be included? Default is false. |
| void | setIncludeHeadersFooters(boolean includeHeadersFooters) | Should headers and footers be included? Default is true. |
| void | setIncludeSheetNames(boolean includeSheetNames) | Should sheet names be included? Default is true. |
| void | setIncludeTextBoxes(boolean includeTextBoxes) | Should text from text boxes be included? Default is true. |
| void | setLocale(Locale locale) | |
Methods inherited from class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
checkMaxTextSize, getDocument, getMetadataTextExtractor
Methods inherited from class org.apache.poi.extractor.POITextExtractor
setFilesystem
-
Field Details
-
container
protected OPCPackage container -
properties
protected POIXMLProperties properties -
locale
protected Locale locale -
includeTextBoxes
protected boolean includeTextBoxes -
includeSheetNames
protected boolean includeSheetNames -
includeCellComments
protected boolean includeCellComments -
formulasNotResults
protected boolean formulasNotResults -
concatenatePhoneticRuns
protected boolean concatenatePhoneticRuns
-
-
Constructor Details
-
XSSFEventBasedExcelExtractor
public XSSFEventBasedExcelExtractor(String path) throws org.apache.xmlbeans.XmlException, OpenXML4JException, IOException
- Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
IOException
-
XSSFEventBasedExcelExtractor
public XSSFEventBasedExcelExtractor(OPCPackage container) throws org.apache.xmlbeans.XmlException, OpenXML4JException, IOException
- Throws:
org.apache.xmlbeans.XmlException
OpenXML4JException
IOException
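A minimal sketch of using the OPCPackage constructor follows. It assumes poi-ooxml is on the classpath; the class name ExtractTextSketch and the helper extractText are hypothetical. Opening the package with PackageAccess.READ is a choice made for this sketch, on the assumption that a text extractor should not write the file back.

```java
import java.io.File;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;

// Hypothetical helper class for illustration only.
public class ExtractTextSketch {

    /** Opens the workbook read-only and returns all text found by the extractor. */
    static String extractText(File xlsx) throws Exception {
        OPCPackage pkg = OPCPackage.open(xlsx, PackageAccess.READ);
        // try-with-resources calls close() on the extractor; this sketch relies on
        // close() to free the resources it holds, as its description states.
        try (XSSFEventBasedExcelExtractor extractor = new XSSFEventBasedExcelExtractor(pkg)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        // The path is supplied by the caller; no particular file is assumed.
        System.out.print(extractText(new File(args[0])));
    }
}
```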
-
-
Method Details
-
main
public static void main(String[] args) throws Exception
- Throws:
Exception
-
setIncludeSheetNames
public void setIncludeSheetNames(boolean includeSheetNames)
Should sheet names be included? Default is true.
- Specified by:
setIncludeSheetNames in interface ExcelExtractor
- Parameters:
includeSheetNames - true if the sheet names should be included
-
getIncludeSheetNames
public boolean getIncludeSheetNames()
- Returns:
whether to include sheet names
- Since:
3.16-beta3
-
setFormulasNotResults
public void setFormulasNotResults(boolean formulasNotResults)
Should we return the formula itself, and not the result it produces? Default is false.
- Specified by:
setFormulasNotResults in interface ExcelExtractor
- Parameters:
formulasNotResults - true if the formula itself is returned
-
getFormulasNotResults
public boolean getFormulasNotResults()
- Returns:
whether to include formulas but not results
- Since:
3.16-beta3
-
setIncludeTextBoxes
public void setIncludeTextBoxes(boolean includeTextBoxes)
Should text from text boxes be included? Default is true.
-
getIncludeTextBoxes
public boolean getIncludeTextBoxes()
- Returns:
whether or not to extract text boxes
- Since:
3.16-beta3
-
setIncludeCellComments
public void setIncludeCellComments(boolean includeCellComments)
Should cell comments be included? Default is false.
- Specified by:
setIncludeCellComments in interface ExcelExtractor
- Parameters:
includeCellComments - true if cell comments should be included
-
getIncludeCellComments
public boolean getIncludeCellComments()
- Returns:
whether cell comments should be included
- Since:
3.16-beta3
-
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Concatenate text from <rPh> text elements in the SharedStringsTable. Default is true.
- Parameters:
concatenatePhoneticRuns - true if runs should be concatenated, false otherwise
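The option setters are meant to be called before getText(). The sketch below shows one possible configuration that keeps cell values and comments while dropping sheet names, headers and footers, and text boxes. It assumes poi-ooxml is on the classpath; the class name ExtractorOptionsSketch and the method extractCellsAndComments are hypothetical, and the particular combination of flags is only an example.

```java
import java.io.File;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;

// Hypothetical helper class for illustration only.
public class ExtractorOptionsSketch {

    /** Extracts cell text and comments, without sheet names, headers/footers or text boxes. */
    static String extractCellsAndComments(File xlsx) throws Exception {
        OPCPackage pkg = OPCPackage.open(xlsx, PackageAccess.READ);
        try (XSSFEventBasedExcelExtractor extractor = new XSSFEventBasedExcelExtractor(pkg)) {
            extractor.setIncludeSheetNames(false);      // default is true
            extractor.setIncludeHeadersFooters(false);  // default is true
            extractor.setIncludeTextBoxes(false);       // default is true
            extractor.setIncludeCellComments(true);     // default is false
            extractor.setFormulasNotResults(false);     // keep computed results (the default)
            extractor.setConcatenatePhoneticRuns(false); // default is true
            return extractor.getText();
        }
    }
}
```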
-
setLocale
public void setLocale(Locale locale)
-
getLocale
public Locale getLocale()
- Returns:
the locale
- Since:
3.16-beta3
-
getPackage
public OPCPackage getPackage()
Returns the opened OPCPackage container.
- Overrides:
getPackage in class POIXMLTextExtractor
- Returns:
the opened OPCPackage
-
getCoreProperties
public POIXMLProperties.CoreProperties getCoreProperties()
Returns the core document properties.
- Overrides:
getCoreProperties in class POIXMLTextExtractor
- Returns:
the core document properties
-
getExtendedProperties
public POIXMLProperties.ExtendedProperties getExtendedProperties()
Returns the extended document properties.
- Overrides:
getExtendedProperties in class POIXMLTextExtractor
- Returns:
the extended document properties
-
getCustomProperties
public POIXMLProperties.CustomProperties getCustomProperties()
Returns the custom document properties.
- Overrides:
getCustomProperties in class POIXMLTextExtractor
- Returns:
the custom document properties
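The property getters can be used without extracting any sheet text. The sketch below reads two core properties; it assumes poi-ooxml is on the classpath and that getCoreProperties() returns POIXMLProperties.CoreProperties with getTitle() and getCreator() accessors. The class name DocumentPropertiesSketch and the method describe are hypothetical.

```java
import java.io.File;
import org.apache.poi.ooxml.POIXMLProperties;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.openxml4j.opc.PackageAccess;
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;

// Hypothetical helper class for illustration only.
public class DocumentPropertiesSketch {

    /** Returns a one-line summary of the workbook's title and creator. */
    static String describe(File xlsx) throws Exception {
        OPCPackage pkg = OPCPackage.open(xlsx, PackageAccess.READ);
        try (XSSFEventBasedExcelExtractor extractor = new XSSFEventBasedExcelExtractor(pkg)) {
            POIXMLProperties.CoreProperties core = extractor.getCoreProperties();
            // Either value may be null when the document does not set it.
            return "title=" + core.getTitle() + ", creator=" + core.getCreator();
        }
    }
}
```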
-
getText
public String getText()
Processes the file and returns the text.
- Specified by:
getText in interface ExcelExtractor
- Specified by:
getText in class POITextExtractor
- Returns:
all the text from the document
-
close
public void close() throws IOException
Description copied from class: POITextExtractor
Frees the resources of the extractor as soon as it is no longer needed. This may include closing open file handles and freeing memory. The extractor cannot be used after close has been called.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class POIXMLTextExtractor
- Throws:
IOException
-