Class OfficeReader


  • public class OfficeReader
    extends java.lang.Object

    This class reads and collects global information about an OOo document. This includes styles, forms, and information about indexes and references.

    • Constructor Detail

      • OfficeReader

        public OfficeReader​(OfficeDocument oooDoc,
                            boolean bAllParagraphsAreSoft)
        Constructs the reader and reads the given document.
    • Method Detail

      • isTextElement

        public static boolean isTextElement​(org.w3c.dom.Node node)
        Checks whether a node is an element in the text namespace.
        Parameters:
        node - the node to check
        Returns:
        true if this is a text element
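
        The namespace checks in this class (text, table, draw) can be pictured with a small standalone sketch. It is an illustration only, not necessarily how OfficeReader implements them: it assumes a namespace-aware DOM and the OpenDocument 1.x namespace URIs, and the class NamespaceCheckSketch and its helper methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of the documented API.
public class NamespaceCheckSketch {
    // OpenDocument 1.x namespace URIs (assumption: the document is OpenDocument, not OOo 1.0)
    public static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    public static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // True if the node is an element whose namespace URI equals nsUri
    public static boolean isElementInNamespace(Node node, String nsUri) {
        return node != null
            && node.getNodeType() == Node.ELEMENT_NODE
            && nsUri.equals(node.getNamespaceURI());
    }

    // Parses an XML string into a namespace-aware DOM document
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<text:p xmlns:text='" + TEXT_NS + "'>Hello</text:p>");
        Node paragraph = doc.getDocumentElement();
        System.out.println(isElementInNamespace(paragraph, TEXT_NS));                 // true
        System.out.println(isElementInNamespace(paragraph, TABLE_NS));                // false
        System.out.println(isElementInNamespace(paragraph.getFirstChild(), TEXT_NS)); // false: a text node, not an element
    }
}
```

        With the real class, the equivalent call would simply be OfficeReader.isTextElement(node), which is static and needs no reader instance.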
      • isTableElement

        public static boolean isTableElement​(org.w3c.dom.Node node)
        Checks whether a node is an element in the table namespace.
        Parameters:
        node - the node to check
        Returns:
        true if this is a table element
      • isDrawElement

        public static boolean isDrawElement​(org.w3c.dom.Node node)
        Checks whether a node is an element in the draw namespace.
        Parameters:
        node - the node to check
        Returns:
        true if this is a draw element
      • isNoteElement

        public static boolean isNoteElement​(org.w3c.dom.Node node)
        Checks whether a node is an element representing a note (footnote or endnote).
        Parameters:
        node - the node to check
        Returns:
        true if this is a note element
      • isSingleParagraph

        public static boolean isSingleParagraph​(org.w3c.dom.Node node)
        Checks whether this node contains at most one element and, if it contains one, that the element is a paragraph.
        Parameters:
        node - the node to check
        Returns:
        true if the node contains a single paragraph or nothing
      • isWhitespaceContent

        public static boolean isWhitespaceContent​(org.w3c.dom.Node node)

        Checks whether the only text content of this node is whitespace.

        Parameters:
        node - the node to check (should be a paragraph node or a child of a paragraph node)
        Returns:
        true if the node contains whitespace only
      • isWhitespace

        public static boolean isWhitespace​(java.lang.String s)

        Checks whether this text consists of whitespace only.

        Parameters:
        s - the String to check
        Returns:
        true if the String contains whitespace only
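
        A whitespace-only test of this kind can be sketched as follows. This is an assumption about the general technique, not the library's code: the sketch takes Character.isWhitespace as its definition of whitespace and treats the empty string as whitespace-only, and the class and method names are hypothetical.

```java
// Hypothetical illustration; not part of the documented API.
public class WhitespaceSketch {
    // True if every character of s is whitespace (vacuously true for the empty string).
    // Note: Character.isWhitespace does not count the non-breaking space U+00A0 as whitespace.
    public static boolean isWhitespaceOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWhitespaceOnly(" \t\n")); // true
        System.out.println(isWhitespaceOnly(" a "));   // false
        System.out.println(isWhitespaceOnly(""));      // true
    }
}
```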
      • getCharacterCount

        public static int getCharacterCount​(org.w3c.dom.Node node)
        Counts the number of characters in the text nodes of this element, excluding footnotes etc.
        Parameters:
        node - the node to count in
        Returns:
        the number of characters
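
        Counting characters while skipping notes can be sketched as a recursive walk over the DOM. The sketch below is a simplified illustration under stated assumptions: it identifies notes by the qualified names text:note, text:footnote and text:endnote, it ignores elements such as text:s or text:tab that also stand for characters, and the class CharacterCountSketch is hypothetical. The real method may apply different rules.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of the documented API.
public class CharacterCountSketch {
    // Assumption: notes are recognised by these qualified element names
    static boolean isNote(Node node) {
        String name = node.getNodeName();
        return name.equals("text:note") || name.equals("text:footnote") || name.equals("text:endnote");
    }

    // Sums the lengths of all text nodes below node, skipping note subtrees
    public static int countCharacters(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue().length();
        }
        if (node.getNodeType() != Node.ELEMENT_NODE || isNote(node)) {
            return 0;
        }
        int count = 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countCharacters(child);
        }
        return count;
    }

    // Parses an XML string; prefixes are compared as plain qualified names
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<text:p xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
                   + "abc<text:note>xy</text:note>de</text:p>";
        // "abc" and "de" are counted, the note content "xy" is skipped
        System.out.println(countCharacters(parse(xml).getDocumentElement())); // 5
    }
}
```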
      • getTextContent

        public java.lang.String getTextContent​(org.w3c.dom.Node node)
      • getNextChar

        public static char getNextChar​(org.w3c.dom.Node node)
        Return the next character in logical order
      • isPackageFormat

        public boolean isPackageFormat()
        Checks whether or not this document is in package format
        Returns:
        true if it's in package format
      • isInPackage

        public boolean isInPackage​(java.lang.String sUrl)
        Checks whether this url is internal to the package
        Parameters:
        sUrl - the url to check
        Returns:
        true if the url is internal to the package
      • getFontDeclarations

        public OfficeStyleFamily getFontDeclarations()

        Get the collection of all font declarations.

        Returns:
        the OfficeStyleFamily of font declarations
      • getFontDeclaration

        public FontDeclaration getFontDeclaration​(java.lang.String sName)

        Get a specific font declaration

        Parameters:
        sName - the name of the font declaration
        Returns:
        a FontDeclaration representing the font
      • getPresentationStyle

        public StyleWithProperties getPresentationStyle​(java.lang.String sName)
      • getDrawingPageStyle

        public StyleWithProperties getDrawingPageStyle​(java.lang.String sName)
      • getListStyle

        public ListStyle getListStyle​(java.lang.String sName)
      • getPageLayout

        public PageLayout getPageLayout​(java.lang.String sName)
      • getMasterPage

        public MasterPage getMasterPage​(java.lang.String sName)
      • getOutlineStyle

        public ListStyle getOutlineStyle()
      • getFootnotesConfiguration

        public PropertySet getFootnotesConfiguration()
      • getEndnotesConfiguration

        public PropertySet getEndnotesConfiguration()
      • getHeadingStyle

        public StyleWithProperties getHeadingStyle​(int nLevel)

        Returns the paragraph style associated with headings of a specific level. Returns null if no such style is known.

        In principle, different styles can be used for each heading; in practice, the same (soft) style is used for all headings of a specific level.

        Parameters:
        nLevel - the level of the heading
        Returns:
        a StyleWithProperties object representing the style
      • getFirstMasterPage

        public MasterPage getFirstMasterPage()

        Returns the first master page used in the document. If no master page is used explicitly, the first master page found in the styles is returned. Returns null if no master page exists.

        Returns:
        a MasterPage object representing the master page
      • getMajorityLanguage

        public java.lang.String getMajorityLanguage()
        Returns the ISO language used in most paragraph styles (in a well-structured document this will be the default language). TODO: Base on content rather than style.
        Returns:
        the ISO language
      • getTocReader

        public TocReader getTocReader​(org.w3c.dom.Element onode)

        Returns a reader for a specific table of contents (TOC).

        Parameters:
        onode - the text:table-of-content node
        Returns:
        the reader, or null
      • isIndexSourceStyle

        public boolean isIndexSourceStyle​(java.lang.String sStyleName)

        Is this style used in some table of contents as an index source style?

        Parameters:
        sStyleName - the name of the style
        Returns:
        true if this is an index source style
      • isFigureSequenceName

        public boolean isFigureSequenceName​(java.lang.String sName)

        Does this sequence name belong to a list of figures (LOF)?

        Parameters:
        sName - the name of the sequence
        Returns:
        true if it belongs to an index
      • isTableSequenceName

        public boolean isTableSequenceName​(java.lang.String sName)

        Does this sequence name belong to a list of tables (LOT)?

        Parameters:
        sName - the name of the sequence
        Returns:
        true if it belongs to an index
      • addTableSequenceName

        public void addTableSequenceName​(java.lang.String sName)

        Add a sequence name for table captions.

        OpenDocument has a very weak notion of table captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of tables. If there's no list of tables, captions cannot be identified. Thus this method lets the user add a sequence name to identify the table captions.

        Parameters:
        sName - the name to add
      • addFigureSequenceName

        public void addFigureSequenceName​(java.lang.String sName)

        Add a sequence name for figure captions.

        OpenDocument has a very weak notion of figure captions: A caption is a paragraph containing a text:sequence element. Moreover, the only source to identify which sequence number to use is the list(s) of figures. If there's no list of figures, captions cannot be identified. Thus this method lets the user add a sequence name to identify the figure captions.

        Parameters:
        sName - the name to add
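
        The caption rule described for addTableSequenceName and addFigureSequenceName (a caption is a paragraph containing a text:sequence element whose sequence name is known to belong to figures or tables) can be sketched as below. This is a standalone illustration, not the OfficeReader implementation: the class CaptionSketch and its methods are hypothetical, and it assumes the sequence name is carried by the text:name attribute of text:sequence.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of the documented API.
public class CaptionSketch {
    // Sequence names registered by the user as belonging to figure captions
    private final Set<String> figureSequenceNames = new HashSet<>();

    public void registerFigureSequence(String name) {
        figureSequenceNames.add(name);
    }

    // Returns the name of the first text:sequence inside the paragraph, or null if there is none
    public static String sequenceNameOf(Element paragraph) {
        NodeList sequences = paragraph.getElementsByTagName("text:sequence");
        if (sequences.getLength() == 0) {
            return null;
        }
        return ((Element) sequences.item(0)).getAttribute("text:name");
    }

    // A paragraph counts as a figure caption if its sequence name has been registered
    public boolean isFigureCaption(Element paragraph) {
        String name = sequenceNameOf(paragraph);
        return name != null && figureSequenceNames.contains(name);
    }

    public static Element parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element par = parse("<text:p xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
                          + "Figure <text:sequence text:name='Illustration'>1</text:sequence>: A tree</text:p>");
        CaptionSketch sketch = new CaptionSketch();
        System.out.println(sketch.isFigureCaption(par)); // false: no sequence name registered yet
        sketch.registerFigureSequence("Illustration");
        System.out.println(sketch.isFigureCaption(par)); // true
    }
}
```

        The sketch shows why registration matters: without a list of figures or an explicitly added name, nothing marks the paragraph as a caption.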
      • getSequenceName

        public java.lang.String getSequenceName​(org.w3c.dom.Element par)

        Get the sequence name associated with a paragraph

        Parameters:
        par - the paragraph to look up
        Returns:
        the sequence name or null
      • getSequenceFromRef

        public java.lang.String getSequenceFromRef​(java.lang.String sRefName)

        Get the sequence name associated with a reference name

        Parameters:
        sRefName - the reference name to use
        Returns:
        the sequence name or null
      • hasFootnoteRefTo

        public boolean hasFootnoteRefTo​(java.lang.String sId)

        Is there a reference to this footnote id?

        Parameters:
        sId - the id of the footnote
        Returns:
        true if there is a reference
      • hasEndnoteRefTo

        public boolean hasEndnoteRefTo​(java.lang.String sId)

        Is there a reference to this endnote?

        Parameters:
        sId - the id of the endnote
        Returns:
        true if there is a reference
      • referenceMarkInHeading

        public boolean referenceMarkInHeading​(java.lang.String sName)
        Is this reference mark contained in a heading?
        Parameters:
        sName - the name of the reference mark
        Returns:
        true if so
      • hasReferenceRefTo

        public boolean hasReferenceRefTo​(java.lang.String sName)
        Is there a reference to this reference mark?
        Parameters:
        sName - the name of the reference mark
        Returns:
        true if there is a reference
      • bookmarkInHeading

        public boolean bookmarkInHeading​(java.lang.String sName)
        Is this bookmark contained in a heading?
        Parameters:
        sName - the name of the bookmark
        Returns:
        true if so
      • hasBookmarkRefTo

        public boolean hasBookmarkRefTo​(java.lang.String sName)

        Is there a reference to this bookmark?

        Parameters:
        sName - the name of the bookmark
        Returns:
        true if there is a reference
      • hasSequenceRefTo

        public boolean hasSequenceRefTo​(java.lang.String sId)

        Is there a reference to this sequence field?

        Parameters:
        sId - the id of the sequence field
        Returns:
        true if there is a reference
      • hasLinkTo

        public boolean hasLinkTo​(java.lang.String sName)

        Is there a link to this sequence anchor name?

        Parameters:
        sName - the name of the anchor
        Returns:
        true if there is a link
      • isOpenDocument

        public boolean isOpenDocument()

        Is this an OASIS OpenDocument or an OOo 1.0 document?

        Returns:
        true if it's an OASIS OpenDocument
      • isText

        public boolean isText()

        Is this a text document?

        Returns:
        true if it's a text document
      • isSpreadsheet

        public boolean isSpreadsheet()

        Is this a spreadsheet document?

        Returns:
        true if it's a spreadsheet document
      • isPresentation

        public boolean isPresentation()

        Is this a presentation document?

        Returns:
        true if it's a presentation document
      • getContent

        public org.w3c.dom.Element getContent()

        Get the content element

        In the old file format this means the office:body element.

        In the OpenDocument format this means an office:text, office:spreadsheet or office:presentation element.

        Returns:
        the content Element
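
        The difference between the two formats can be sketched with plain DOM calls. The example is illustrative only and is not the OfficeReader implementation: the class ContentElementSketch and its methods are hypothetical, and elements are matched by qualified name on the assumption that the usual office: prefix is used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of the documented API.
public class ContentElementSketch {
    // Returns office:text, office:spreadsheet or office:presentation if office:body has
    // such a child (OpenDocument); otherwise returns office:body itself (old format).
    public static Element findContent(Document doc) {
        NodeList bodies = doc.getElementsByTagName("office:body");
        if (bodies.getLength() == 0) {
            return null;
        }
        Element body = (Element) bodies.item(0);
        for (Node child = body.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String name = child.getNodeName();
                if (name.equals("office:text") || name.equals("office:spreadsheet")
                        || name.equals("office:presentation")) {
                    return (Element) child;
                }
            }
        }
        return body;
    }

    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String odf = "<office:document-content><office:body><office:text>"
                   + "<text:p>Hi</text:p></office:text></office:body></office:document-content>";
        String old = "<office:document-content><office:body>"
                   + "<text:p>Hi</text:p></office:body></office:document-content>";
        System.out.println(findContent(parse(odf)).getNodeName()); // office:text
        System.out.println(findContent(parse(old)).getNodeName()); // office:body
    }
}
```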
      • getForms

        public FormsReader getForms()

        Get the forms belonging to this document.

        Returns:
        a FormsReader representing the forms
      • getTableReader

        public TableReader getTableReader​(org.w3c.dom.Element node)

        Read a table from a table:table node

        Parameters:
        node - the table:table Element node
        Returns:
        a TableReader object representing the table