Class ExcelExtractor

All Implemented Interfaces:
Closeable, AutoCloseable, ExcelExtractor

public class ExcelExtractor extends POIOLE2TextExtractor implements ExcelExtractor
A text extractor for Excel files.

Returns the textual content of the file, suitable for indexing by something like Lucene, but not really intended for display to the user.

To turn an Excel file into CSV or a similar format, see the XLS2CSVmra example.
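
As a rough illustration of the intended use, the sketch below builds a tiny workbook in memory and pulls its text out for indexing. It assumes Apache POI is on the classpath and that the `ExcelExtractor(HSSFWorkbook)` constructor is available (the constructor list is not reproduced on this page); the class name `ExtractTextSketch`, the sheet name, and the cell values are hypothetical.

```java
import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

import java.io.IOException;

public class ExtractTextSketch {

    /** Builds a small in-memory .xls workbook and returns its extracted text. */
    public static String extractSampleText() throws IOException {
        try (HSSFWorkbook workbook = new HSSFWorkbook()) {
            // Hypothetical sample data: one sheet with two text cells
            HSSFSheet sheet = workbook.createSheet("Inventory");
            sheet.createRow(0).createCell(0).setCellValue("widget");
            sheet.createRow(1).createCell(0).setCellValue("gadget");

            // The extractor is Closeable, so try-with-resources applies
            // (constructor taking an HSSFWorkbook is assumed here)
            try (ExcelExtractor extractor = new ExcelExtractor(workbook)) {
                return extractor.getText();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The raw text is meant for an indexer, not for display
        System.out.print(extractSampleText());
    }
}
```

With the default settings described under Method Details, the returned text should contain the sheet name as well as the cell contents.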

  • Constructor Details

  • Method Details

    • main

      public static void main(String[] args) throws IOException
      Command-line extractor.
      Parameters:
      args - the command line parameters
      Throws:
      IOException - if the file can't be read or contains errors
    • setIncludeSheetNames

      public void setIncludeSheetNames(boolean includeSheetNames)
      Description copied from interface: ExcelExtractor
      Should sheet names be included? Default is true
      Specified by:
      setIncludeSheetNames in interface ExcelExtractor
      Parameters:
      includeSheetNames - true if the sheet names should be included
    • setFormulasNotResults

      public void setFormulasNotResults(boolean formulasNotResults)
      Description copied from interface: ExcelExtractor
      Should we return the formula itself, and not the result it produces? Default is false
      Specified by:
      setFormulasNotResults in interface ExcelExtractor
      Parameters:
      formulasNotResults - true if the formula itself is returned
    • setIncludeCellComments

      public void setIncludeCellComments(boolean includeCellComments)
      Description copied from interface: ExcelExtractor
      Should cell comments be included? Default is false
      Specified by:
      setIncludeCellComments in interface ExcelExtractor
      Parameters:
      includeCellComments - true if cell comments should be included
    • setIncludeBlankCells

      public void setIncludeBlankCells(boolean includeBlankCells)
      Should blank cells be output? Default is to only output cells that are present in the file and are non-blank.
      Parameters:
      includeBlankCells - true if blank cells should be included
    • setIncludeHeadersFooters

      public void setIncludeHeadersFooters(boolean includeHeadersFooters)
      Description copied from interface: ExcelExtractor
      Should headers and footers be included in the output? Default is true
      Specified by:
      setIncludeHeadersFooters in interface ExcelExtractor
      Parameters:
      includeHeadersFooters - true if headers and footers should be included
    • getText

      public String getText()
      Description copied from class: POITextExtractor
      Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the Javadocs for the specific project for details.
      Specified by:
      getText in interface ExcelExtractor
      Specified by:
      getText in class POITextExtractor
      Returns:
      All the text from the document
    • _extractHeaderFooter

      public static String _extractHeaderFooter(HeaderFooter hf)
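
The setters listed above are typically called before getText(). The following is a minimal sketch of how they might be combined, assuming Apache POI is on the classpath and that the `ExcelExtractor(HSSFWorkbook)` constructor is available; the class `ExtractorOptionsSketch`, the method `extract`, and the sample sheet are hypothetical and exist only for illustration.

```java
import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

import java.io.IOException;

public class ExtractorOptionsSketch {

    /** Extracts text from a small sample workbook using the given option values. */
    public static String extract(boolean includeSheetNames, boolean formulasNotResults)
            throws IOException {
        try (HSSFWorkbook workbook = new HSSFWorkbook()) {
            // Hypothetical sample data: two numbers and a formula that sums them
            HSSFSheet sheet = workbook.createSheet("Totals");
            sheet.createRow(0).createCell(0).setCellValue(2.0);
            sheet.createRow(1).createCell(0).setCellValue(3.0);
            sheet.createRow(2).createCell(0).setCellFormula("SUM(A1:A2)");

            // Evaluate formulas so a cached result exists when results are requested
            workbook.getCreationHelper().createFormulaEvaluator().evaluateAll();

            // Constructor taking an HSSFWorkbook is assumed here
            try (ExcelExtractor extractor = new ExcelExtractor(workbook)) {
                extractor.setIncludeSheetNames(includeSheetNames);
                extractor.setFormulasNotResults(formulasNotResults);
                extractor.setIncludeCellComments(false);
                extractor.setIncludeHeadersFooters(false);
                return extractor.getText();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Formula text, no sheet names
        System.out.print(extract(false, true));
        // Formula results, with sheet names
        System.out.print(extract(true, false));
    }
}
```

Passing `true` for `formulasNotResults` should yield the formula text for the third row, while `false` yields whatever result the formula produced; the exact separators between cells are implementation-specific, as noted for getText().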