Class FST<T>

  • All Implemented Interfaces:
    Accountable

    public final class FST<T>
    extends java.lang.Object
    implements Accountable
    Represents an finite state machine (FST), using a compact byte[] format.

    The format is similar to what's used by Morfologik (https://github.com/morfologik/morfologik-stemming).

    See the package documentation for some simple examples.

    • Constructor Detail

      • FST

        public FST​(DataInput in,
                   Outputs<T> outputs)
            throws java.io.IOException
        Load a previously saved FST.
        Throws:
        java.io.IOException
      • FST

        public FST​(DataInput in,
                   Outputs<T> outputs,
                   FSTStore fstStore)
            throws java.io.IOException
        Load a previously saved FST; maxBlockBits allows you to control the size of the byte[] pages used to hold the FST bytes.
        Throws:
        java.io.IOException
    • Method Detail

      • flag

        private static boolean flag​(int flags,
                                    int bit)
      • ramBytesUsed

        public long ramBytesUsed()
        Description copied from interface: Accountable
        Return the memory usage of this object in bytes. Negative values are illegal.
        Specified by:
        ramBytesUsed in interface Accountable
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • finish

        void finish​(long newStartNode)
             throws java.io.IOException
        Throws:
        java.io.IOException
      • getEmptyOutput

        public T getEmptyOutput()
      • setEmptyOutput

        void setEmptyOutput​(T v)
      • save

        public void save​(DataOutput out)
                  throws java.io.IOException
        Throws:
        java.io.IOException
      • save

        public void save​(java.nio.file.Path path)
                  throws java.io.IOException
        Writes an automaton to a file.
        Throws:
        java.io.IOException
      • read

        public static <T> FST<T> read​(java.nio.file.Path path,
                                      Outputs<T> outputs)
                               throws java.io.IOException
        Reads an automaton from a file.
        Throws:
        java.io.IOException
      • writeLabel

        private void writeLabel​(DataOutput out,
                                int v)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • readLabel

        public int readLabel​(DataInput in)
                      throws java.io.IOException
        Reads one BYTE1/2/4 label from the provided DataInput.
        Throws:
        java.io.IOException
      • targetHasArcs

        public static <T> boolean targetHasArcs​(FST.Arc<T> arc)
        returns true if the node at this address has any outgoing arcs
      • shouldExpandNodeWithFixedLengthArcs

        private boolean shouldExpandNodeWithFixedLengthArcs​(Builder<T> builder,
                                                            Builder.UnCompiledNode<T> node)
        Returns whether the given node should be expanded with fixed length arcs. Nodes will be expanded depending on their depth (distance from the root node) and their number of arcs.

        Nodes with fixed length arcs use more space, because they encode all arcs with a fixed number of bytes, but they allow either binary search or direct addressing on the arcs (instead of linear scan) on lookup by arc label.

      • shouldExpandNodeWithDirectAddressing

        private boolean shouldExpandNodeWithDirectAddressing​(Builder<T> builder,
                                                             Builder.UnCompiledNode<T> nodeIn,
                                                             int numBytesPerArc,
                                                             int maxBytesPerArcWithoutLabel,
                                                             int labelRange)
        Returns whether the given node should be expanded with direct addressing instead of binary search.

        Prefer direct addressing for performance if it does not oversize binary search byte size too much, so that the arcs can be directly addressed by label.

        See Also:
        Builder.getDirectAddressingMaxOversizingFactor()
      • writeNodeForBinarySearch

        private void writeNodeForBinarySearch​(Builder<T> builder,
                                              Builder.UnCompiledNode<T> nodeIn,
                                              long startAddress,
                                              int maxBytesPerArc)
      • writeNodeForDirectAddressing

        private void writeNodeForDirectAddressing​(Builder<T> builder,
                                                  Builder.UnCompiledNode<T> nodeIn,
                                                  long startAddress,
                                                  int maxBytesPerArcWithoutLabel,
                                                  int labelRange)
      • getNumPresenceBytes

        private static int getNumPresenceBytes​(int labelRange)
        Gets the number of bytes required to flag the presence of each arc in the given label range, one bit per arc.
      • readPresenceBytes

        private int readPresenceBytes​(FST.Arc<T> arc,
                                      FST.BytesReader in)
                               throws java.io.IOException
        Reads the presence bits of a direct-addressing node, store them in the provided arc FST.Arc.bitTable() and returns the number of presence bytes.
        Throws:
        java.io.IOException
      • getNumArcsDirectAddressing

        private int getNumArcsDirectAddressing​(FST.Arc<T> arc)
      • assertPresenceBytesAreValid

        private boolean assertPresenceBytesAreValid​(FST.Arc<T> arc)
      • getFirstArc

        public FST.Arc<T> getFirstArc​(FST.Arc<T> arc)
        Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node
      • readLastTargetArc

        FST.Arc<T> readLastTargetArc​(FST.Arc<T> follow,
                                     FST.Arc<T> arc,
                                     FST.BytesReader in)
                              throws java.io.IOException
        Follows the follow arc and reads the last arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
        Returns:
        Returns the second argument (arc).
        Throws:
        java.io.IOException
      • readUnpackedNodeTarget

        private long readUnpackedNodeTarget​(FST.BytesReader in)
                                     throws java.io.IOException
        Throws:
        java.io.IOException
      • readFirstTargetArc

        public FST.Arc<T> readFirstTargetArc​(FST.Arc<T> follow,
                                             FST.Arc<T> arc,
                                             FST.BytesReader in)
                                      throws java.io.IOException
        Follow the follow arc and read the first arc of its target; this changes the provided arc (2nd arg) in-place and returns it.
        Returns:
        Returns the second argument (arc).
        Throws:
        java.io.IOException
      • readFirstRealTargetArc

        public FST.Arc<T> readFirstRealTargetArc​(long nodeAddress,
                                                 FST.Arc<T> arc,
                                                 FST.BytesReader in)
                                          throws java.io.IOException
        Throws:
        java.io.IOException
      • isExpandedTarget

        boolean isExpandedTarget​(FST.Arc<T> follow,
                                 FST.BytesReader in)
                          throws java.io.IOException
        Returns whether arc's target points to a node in expanded format (fixed length arcs).
        Throws:
        java.io.IOException
      • readNextArc

        public FST.Arc<T> readNextArc​(FST.Arc<T> arc,
                                      FST.BytesReader in)
                               throws java.io.IOException
        In-place read; returns the arc.
        Throws:
        java.io.IOException
      • readNextArcLabel

        int readNextArcLabel​(FST.Arc<T> arc,
                             FST.BytesReader in)
                      throws java.io.IOException
        Peeks at next arc's label; does not alter arc. Do not call this if arc.isLast()!
        Throws:
        java.io.IOException
      • readArcByIndex

        public FST.Arc<T> readArcByIndex​(FST.Arc<T> arc,
                                         FST.BytesReader in,
                                         int idx)
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • readArcByDirectAddressing

        public FST.Arc<T> readArcByDirectAddressing​(FST.Arc<T> arc,
                                                    FST.BytesReader in,
                                                    int rangeIndex)
                                             throws java.io.IOException
        Reads a present direct addressing node arc, with the provided index in the label range.
        Parameters:
        rangeIndex - The index of the arc in the label range. It must be present. The real arc offset is computed based on the presence bits of the direct addressing node.
        Throws:
        java.io.IOException
      • readNextRealArc

        public FST.Arc<T> readNextRealArc​(FST.Arc<T> arc,
                                          FST.BytesReader in)
                                   throws java.io.IOException
        Never returns null, but you should never call this if arc.isLast() is true.
        Throws:
        java.io.IOException
      • readArc

        private FST.Arc<T> readArc​(FST.Arc<T> arc,
                                   FST.BytesReader in)
                            throws java.io.IOException
        Reads an arc.
        Precondition: The arc flags byte has already been read and set; the given BytesReader is positioned just after the arc flags byte.
        Throws:
        java.io.IOException
      • findTargetArc

        public FST.Arc<T> findTargetArc​(int labelToMatch,
                                        FST.Arc<T> follow,
                                        FST.Arc<T> arc,
                                        FST.BytesReader in)
                                 throws java.io.IOException
        Finds an arc leaving the incoming arc, replacing the arc in place. This returns null if the arc was not found, else the incoming arc.
        Throws:
        java.io.IOException
      • seekToNextNode

        private void seekToNextNode​(FST.BytesReader in)
                             throws java.io.IOException
        Throws:
        java.io.IOException