Package org.apache.poi.hwpf
Class HWPFDocumentCore
java.lang.Object
org.apache.poi.POIDocument
org.apache.poi.hwpf.HWPFDocumentCore
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
HWPFDocument, HWPFOldDocument
This class holds much of the core of a Word document, but without some of the table structure information.
You generally want to work with one of its subclasses, HWPFDocument or HWPFOldDocument.
-
Field Summary
Fields (modifier and type, field, description)
protected CHPBinTable _cbt
    Contains formatting properties for text
protected FileInformationBlock _fib
    The FIB
protected FontTable _ft
    Holds fonts for this document.
protected ListTables _lt
    Holds list tables
protected byte[] _mainStream
    Main document stream buffer
protected ObjectPoolImpl _objectPool
    Holds OLE2 objects
protected PAPBinTable _pbt
    Contains formatting properties for paragraphs
protected StyleSheet _ss
    Holds styles for this document.
protected SectionTable _st
    Contains formatting properties for sections.
protected static final int FIB_BASE_LEN
    Size of the unencrypted part of the FIB
protected static final int RC4_REKEYING_INTERVAL
    [MS-DOC] 2.2.6.2/3 Office Binary Document ...
protected static final String STREAM_OBJECT_POOL
protected static final String STREAM_TABLE_0
protected static final String STREAM_TABLE_1
protected static final String STREAM_WORD_DOCUMENT
-
Constructor Summary
Constructors (modifier, constructor, description)
protected HWPFDocumentCore()
HWPFDocumentCore(InputStream istream)
    This constructor loads a Word document from an InputStream.
HWPFDocumentCore(DirectoryNode directory)
    This constructor loads a Word document from a specific point in a POIFSFileSystem, probably not the default.
HWPFDocumentCore(POIFSFileSystem pfilesystem)
    This constructor loads a Word document from a POIFSFileSystem
-
Method Summary
Methods (modifier and type, method, description)
protected byte[] getDocumentEntryBytes(String name, int encryptionOffset, int len)
    Reads an OLE stream into a byte array. If an EncryptionInfo is available, the bytes are decrypted starting at encryptionOffset.
String getDocumentText()
    Returns document text, i.e. text information from all text pieces, including OLE descriptions and field codes
byte[] getMainStream()
abstract Range getOverallRange()
    Returns the range that covers all text in the file, including main text, footnotes, headers and comments
abstract Range getRange()
    Returns the range which covers the whole of the document, but excludes any headers and footers.
abstract StringBuilder getText()
    Internal method to access document text
abstract TextPieceTable getTextTable()
protected void updateEncryptionInfo()
static POIFSFileSystem verifyAndBuildPOIFS(InputStream istream)
    Takes an InputStream, verifies that it's not RTF or PDF, builds a POIFSFileSystem from it, and returns that.
Methods inherited from class org.apache.poi.POIDocument
clearDirectory, close, createInformationProperties, getDirectory, getDocumentSummaryInformation, getEncryptedPropertyStreamName, getPropertySet, getPropertySet, getSummaryInformation, initDirectory, readProperties, replaceDirectory, validateInPlaceWritePossible, write, write, write, writeProperties, writeProperties, writeProperties
-
Field Details
-
STREAM_OBJECT_POOL
protected static final String STREAM_OBJECT_POOL
-
STREAM_WORD_DOCUMENT
protected static final String STREAM_WORD_DOCUMENT
-
STREAM_TABLE_0
protected static final String STREAM_TABLE_0
-
STREAM_TABLE_1
protected static final String STREAM_TABLE_1
-
FIB_BASE_LEN
protected static final int FIB_BASE_LEN
Size of the unencrypted part of the FIB
-
RC4_REKEYING_INTERVAL
protected static final int RC4_REKEYING_INTERVAL
[MS-DOC] 2.2.6.2/3 Office Binary Document ... Encryption: "... The block number MUST be set to zero at the beginning of the stream and MUST be incremented at each 512 byte boundary. ..."
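The rule quoted for RC4_REKEYING_INTERVAL can be illustrated with a little arithmetic. The following is a minimal sketch using only the JDK, not POI's implementation: the class name RekeyingSketch and its methods are hypothetical, and the value 512 is taken from the quoted specification text rather than read from the POI constant.

```java
final class RekeyingSketch {
    // 512 comes from the quoted [MS-DOC] text; this constant is illustrative only.
    static final int REKEYING_INTERVAL = 512;

    /** Block number that applies to the byte at the given stream offset. */
    static int blockNumber(long streamOffset) {
        if (streamOffset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        // Zero at the start of the stream, plus one for every 512-byte boundary crossed.
        return (int) (streamOffset / REKEYING_INTERVAL);
    }

    /** Number of bytes, starting at the offset, that still belong to the current block. */
    static int bytesUntilNextBlock(long streamOffset) {
        return REKEYING_INTERVAL - (int) (streamOffset % REKEYING_INTERVAL);
    }

    public static void main(String[] args) {
        for (long offset : new long[] {0, 511, 512, 1300}) {
            System.out.println("offset " + offset + ": block " + blockNumber(offset)
                    + ", " + bytesUntilNextBlock(offset) + " bytes left in block");
        }
    }
}
```

A decryptor following this rule would presumably re-derive its key whenever blockNumber changes, which is why reads are naturally processed in chunks of at most bytesUntilNextBlock bytes.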
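-
_objectPool
protected ObjectPoolImpl _objectPool
Holds OLE2 objects
-
_fib
protected FileInformationBlock _fib
The FIB
-
_ss
protected StyleSheet _ss
Holds styles for this document.
-
_cbt
protected CHPBinTable _cbt
Contains formatting properties for text
-
_pbt
protected PAPBinTable _pbt
Contains formatting properties for paragraphs
-
_st
protected SectionTable _st
Contains formatting properties for sections.
-
_ft
protected FontTable _ft
Holds fonts for this document.
-
_lt
protected ListTables _lt
Holds list tables
-
_mainStream
protected byte[] _mainStream
Main document stream buffer
-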
Constructor Details
-
HWPFDocumentCore
protected HWPFDocumentCore()
-
HWPFDocumentCore
public HWPFDocumentCore(InputStream istream) throws IOException
This constructor loads a Word document from an InputStream.
- Parameters:
istream - The InputStream that contains the Word document.
- Throws:
IOException - If there is an unexpected IOException from the passed in InputStream.
-
HWPFDocumentCore
public HWPFDocumentCore(POIFSFileSystem pfilesystem) throws IOException
This constructor loads a Word document from a POIFSFileSystem
- Parameters:
pfilesystem - The POIFSFileSystem that contains the Word document.
- Throws:
IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.
-
HWPFDocumentCore
public HWPFDocumentCore(DirectoryNode directory) throws IOException
This constructor loads a Word document from a specific point in a POIFSFileSystem, probably not the default. Typically used to open embedded documents.
- Parameters:
directory - The DirectoryNode that contains the Word document.
- Throws:
IOException - If there is an unexpected IOException from the passed in POIFSFileSystem.
-
Method Details
-
verifyAndBuildPOIFS
public static POIFSFileSystem verifyAndBuildPOIFS(InputStream istream) throws IOException
Takes an InputStream, verifies that it's not RTF or PDF, builds a POIFSFileSystem from it, and returns that.
- Throws:
IOException
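The "not RTF or PDF" check described for verifyAndBuildPOIFS can be pictured as a look at the first few bytes of the stream. The sketch below is an assumption about how such a check might work, not POI's code: the class HeaderSniffSketch, the Kind enum and both methods are hypothetical. It relies only on the well-known file signatures `{\rtf` for RTF and `%PDF` for PDF.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HeaderSniffSketch {
    enum Kind { RTF, PDF, OTHER }

    private static final byte[] RTF_MAGIC = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PDF_MAGIC = "%PDF".getBytes(StandardCharsets.US_ASCII);

    /** Classifies a file by its leading bytes. */
    static Kind classify(byte[] head) {
        if (startsWith(head, RTF_MAGIC)) return Kind.RTF;
        if (startsWith(head, PDF_MAGIC)) return Kind.PDF;
        return Kind.OTHER;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    /** Peeks at the first bytes and rewinds, so the caller can still read the whole stream. */
    static Kind peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[8];
        in.mark(head.length);
        int n = in.readNBytes(head, 0, head.length);
        in.reset();
        return classify(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        InputStream rtf = new ByteArrayInputStream(
                "{\\rtf1\\ansi".getBytes(StandardCharsets.US_ASCII));
        System.out.println(peek(rtf));          // classification of the sample bytes
        System.out.println((char) rtf.read());  // first byte is still readable after the peek
    }
}
```

A caller would reject the input when the result is RTF or PDF and only otherwise hand the stream to the OLE2 file system parser.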
-
getRange
public abstract Range getRange()
Returns the range which covers the whole of the document, but excludes any headers and footers.
-
getOverallRange
public abstract Range getOverallRange()
Returns the range that covers all text in the file, including main text, footnotes, headers and comments
-
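The difference between getRange() and getOverallRange() can be pictured with a toy model. The sketch below assumes, purely for illustration, that all text lives in one buffer with the main document first and the other parts appended after it; the class RangeSketch, its sample strings and its methods are hypothetical and do not use POI's Range type.

```java
final class RangeSketch {
    // Hypothetical layout: main text first, other parts appended to the same buffer.
    static final String MAIN = "Body text.";
    static final String FOOTNOTES = "Footnote.";
    static final String HEADERS = "Header.";
    static final String COMMENTS = "Comment.";
    static final String ALL_TEXT = MAIN + FOOTNOTES + HEADERS + COMMENTS;

    /** Analogue of getRange(): only the main document portion. */
    static String mainRange() {
        return ALL_TEXT.substring(0, MAIN.length());
    }

    /** Analogue of getOverallRange(): every character in the buffer. */
    static String overallRange() {
        return ALL_TEXT.substring(0, ALL_TEXT.length());
    }

    public static void main(String[] args) {
        System.out.println("main:    " + mainRange());
        System.out.println("overall: " + overallRange());
    }
}
```

Under this model, code that only needs body text would use the narrower range, while a full-text search would use the overall one.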
getDocumentText
public String getDocumentText()
Returns document text, i.e. text information from all text pieces, including OLE descriptions and field codes
-
getText
public abstract StringBuilder getText()
Internal method to access document text
-
getCharacterTable
-
getParagraphTable
-
getSectionTable
-
getStyleSheet
-
getListTables
-
getFontTable
-
getFileInformationBlock
-
getObjectsPool
-
getTextTable
-
getMainStream
-
getEncryptionInfo
public EncryptionInfo getEncryptionInfo() throws IOException
- Overrides:
getEncryptionInfo in class POIDocument
- Returns:
the encryption info if the document is encrypted, otherwise null
- Throws:
IOException - If retrieving the encryption information fails
-
updateEncryptionInfo
protected void updateEncryptionInfo()
-
getDocumentEntryBytes
protected byte[] getDocumentEntryBytes(String name, int encryptionOffset, int len) throws IOException
Reads an OLE stream into a byte array. If an EncryptionInfo is available, the bytes are decrypted starting at encryptionOffset. If encryptionOffset is -1, no decryption is attempted.
- Parameters:
name - the name of the stream
encryptionOffset - the offset from which to start decrypting, use -1 for no decryption
len - length of the bytes to be read, use Integer.MAX_VALUE for all bytes
- Returns:
the read bytes
- Throws:
IOException - if the stream can't be found
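The two sentinel values of getDocumentEntryBytes (-1 for "no decryption" and Integer.MAX_VALUE for "all bytes") can be sketched with plain arrays. This is an illustration of the parameter conventions only: the class EntryBytesSketch and its read method are hypothetical, the input is a byte array rather than an OLE stream, and a fixed-key XOR stands in for the real cipher.

```java
import java.util.Arrays;

final class EntryBytesSketch {
    /**
     * Copies up to len bytes from raw and, unless encryptionOffset is -1,
     * transforms every byte at or after encryptionOffset.
     * XOR with a fixed key is a placeholder, not a real cipher.
     */
    static byte[] read(byte[] raw, int encryptionOffset, int len, byte xorKey) {
        // Integer.MAX_VALUE simply means "no cap": the copy stops at the end of the data.
        byte[] out = Arrays.copyOf(raw, Math.min(len, raw.length));
        if (encryptionOffset > -1) {
            // Bytes before the offset stay untouched, like an unencrypted header.
            for (int i = encryptionOffset; i < out.length; i++) {
                out[i] ^= xorKey;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3, 4};
        System.out.println(Arrays.toString(read(raw, -1, Integer.MAX_VALUE, (byte) 0x0F)));
        System.out.println(Arrays.toString(read(raw, 2, Integer.MAX_VALUE, (byte) 0x0F)));
        System.out.println(Arrays.toString(read(raw, -1, 2, (byte) 0x0F)));
    }
}
```

Starting the transform at an offset rather than at zero matches the idea behind FIB_BASE_LEN, where a leading part of the data is stored unencrypted.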
-