Class URIEncoder


  • class URIEncoder
    extends Encoder
    URIEncoder -- An encoder for URI-based contexts.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  URIEncoder.Mode
      Encoding mode of operation for URI encoding.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private long _highMask
      The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 64 to 127.
      private long _lowMask
      The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 0 to 63.
      private URIEncoder.Mode _mode
      The encoding mode for this encoder, used primarily by toString().
      (package private) static int CHARS_0_TO_9
      Number of characters in the range '0' to '9'.
      (package private) static int CHARS_A_TO_Z
      Number of characters in the range 'a' to 'z'.
      (package private) static char INVALID_REPLACEMENT_CHARACTER
      The character to use when replacing an invalid character.
      (package private) static int LONG_BITS
      Number of bits in a long.
      (package private) static int MAX_ENCODED_CHAR_LENGTH
      Maximum number of characters required to encode a single input character.
      (package private) static int MAX_UTF8_2_BYTE
      Maximum code-point value that can be encoded with 2 UTF-8 bytes.
      (package private) static int PERCENT_ENCODED_LENGTH
      Number of characters used to percent-encode a single byte ('%' followed by two hexadecimal digits).
      (package private) static long RESERVED_MASK_HIGH
      RFC 3986 Reserved Characters with code points 64 to 127.
      (package private) static long RESERVED_MASK_LOW
      RFC 3986 Reserved Characters with code points 0 to 63.
      (package private) static char[] UHEX
      RFC 3986 -- "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively."
      (package private) static long UNRESERVED_MASK_HIGH
      RFC 3986 Unreserved Characters with code points 64 to 127.
      (package private) static long UNRESERVED_MASK_LOW
      RFC 3986 Unreserved Characters with code points 0 to 63.
      (package private) static int UTF8_2_BYTE_FIRST_MSB
      When the encoded output requires 2 bytes, these are the high-order bits of the first byte.
      (package private) static int UTF8_3_BYTE_FIRST_MSB
      When the encoded output requires 3 bytes, these are the high-order bits of the first byte.
      (package private) static int UTF8_4_BYTE_FIRST_MSB
      When the encoded output requires 4 bytes, these are the high-order bits of the first byte.
      (package private) static int UTF8_BYTE_MSB
      For every byte after the first in a 2- to 4-byte encoded sequence, these are the high-order bits of that output byte.
      (package private) static int UTF8_MASK
      Mask with all of the lower 6 bits set.
      (package private) static int UTF8_SHIFT
      Each UTF-8 continuation byte carries 6 bits of the code point, so this is the shift between successive 6-bit groups.
    • Constructor Summary

      Constructors 
      Constructor Description
      URIEncoder()
      Constructor equivalent to URIEncoder(Mode.FULL_URI).
      URIEncoder(URIEncoder.Mode mode)
      Constructor for the URIEncoder that specifies the encoding mode the URIEncoder will use.
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected java.nio.charset.CoderResult encodeArrays(java.nio.CharBuffer input, java.nio.CharBuffer output, boolean endOfInput)
      The core encoding loop used when both the input and output buffers are array backed.
      protected int firstEncodedOffset(java.lang.String input, int off, int len)
      Scans the input string for the first character index that requires encoding.
      protected int maxEncodedLength(int n)
      Returns the maximum encoded length (in chars) of an input sequence of n characters.
      java.lang.String toString()  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • CHARS_0_TO_9

        static final int CHARS_0_TO_9
        Number of characters in the range '0' to '9'.
        See Also:
        Constant Field Values
      • CHARS_A_TO_Z

        static final int CHARS_A_TO_Z
        Number of characters in the range 'a' to 'z'.
        See Also:
        Constant Field Values
      • MAX_ENCODED_CHAR_LENGTH

        static final int MAX_ENCODED_CHAR_LENGTH
        Maximum number of characters required to encode a single input character.
        See Also:
        Constant Field Values
      • PERCENT_ENCODED_LENGTH

        static final int PERCENT_ENCODED_LENGTH
        Number of characters used to percent-encode a single byte ('%' followed by two hexadecimal digits).
        See Also:
        Constant Field Values
      • MAX_UTF8_2_BYTE

        static final int MAX_UTF8_2_BYTE
        Maximum code-point value that can be encoded with 2 UTF-8 bytes.
        See Also:
        Constant Field Values
      • UTF8_2_BYTE_FIRST_MSB

        static final int UTF8_2_BYTE_FIRST_MSB
        When the encoded output requires 2 bytes, these are the high-order bits of the first byte.
        See Also:
        Constant Field Values
      • UTF8_3_BYTE_FIRST_MSB

        static final int UTF8_3_BYTE_FIRST_MSB
        When the encoded output requires 3 bytes, these are the high-order bits of the first byte.
        See Also:
        Constant Field Values
      • UTF8_4_BYTE_FIRST_MSB

        static final int UTF8_4_BYTE_FIRST_MSB
        When the encoded output requires 4 bytes, these are the high-order bits of the first byte.
        See Also:
        Constant Field Values
      • UTF8_BYTE_MSB

        static final int UTF8_BYTE_MSB
        For every byte after the first in a 2- to 4-byte encoded sequence, these are the high-order bits of that output byte.
        See Also:
        Constant Field Values
      • UTF8_SHIFT

        static final int UTF8_SHIFT
        Each UTF-8 continuation byte carries 6 bits of the code point, so this is the shift between successive 6-bit groups.
        See Also:
        Constant Field Values
      • UTF8_MASK

        static final int UTF8_MASK
        Mask with all of the lower 6 bits set.
        See Also:
        Constant Field Values
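The UTF-8 constants above fit together as follows. The first byte carries a length marker (UTF8_2/3/4_BYTE_FIRST_MSB) plus the top bits of the code point. Every following byte is UTF8_BYTE_MSB plus the next UTF8_SHIFT bits, extracted with UTF8_MASK. The sketch below is a minimal illustration that assumes the standard UTF-8 values for these constants; the class `Utf8Sketch` and method `toUtf8` are hypothetical and are not URIEncoder's own code.

```java
// Hypothetical sketch: the constant values are the standard UTF-8 ones and are
// assumed, not confirmed, to match those in URIEncoder.
final class Utf8Sketch {
    static final int MAX_UTF8_2_BYTE = 0x7ff;       // largest code point that fits in 2 bytes
    static final int UTF8_2_BYTE_FIRST_MSB = 0xc0;  // 110xxxxx
    static final int UTF8_3_BYTE_FIRST_MSB = 0xe0;  // 1110xxxx
    static final int UTF8_4_BYTE_FIRST_MSB = 0xf0;  // 11110xxx
    static final int UTF8_BYTE_MSB = 0x80;          // 10xxxxxx continuation prefix
    static final int UTF8_SHIFT = 6;                // bits carried by each continuation byte
    static final int UTF8_MASK = 0x3f;              // selects the low 6 bits

    /** Returns the UTF-8 bytes (as ints 0..255) of a valid, non-surrogate code point. */
    static int[] toUtf8(int cp) {
        if (cp < 0x80) {
            return new int[] {cp};
        } else if (cp <= MAX_UTF8_2_BYTE) {
            return new int[] {
                UTF8_2_BYTE_FIRST_MSB | (cp >>> UTF8_SHIFT),
                UTF8_BYTE_MSB | (cp & UTF8_MASK)
            };
        } else if (cp < 0x10000) {
            return new int[] {
                UTF8_3_BYTE_FIRST_MSB | (cp >>> 2 * UTF8_SHIFT),
                UTF8_BYTE_MSB | ((cp >>> UTF8_SHIFT) & UTF8_MASK),
                UTF8_BYTE_MSB | (cp & UTF8_MASK)
            };
        } else {
            return new int[] {
                UTF8_4_BYTE_FIRST_MSB | (cp >>> 3 * UTF8_SHIFT),
                UTF8_BYTE_MSB | ((cp >>> 2 * UTF8_SHIFT) & UTF8_MASK),
                UTF8_BYTE_MSB | ((cp >>> UTF8_SHIFT) & UTF8_MASK),
                UTF8_BYTE_MSB | (cp & UTF8_MASK)
            };
        }
    }
}
```

For example, `toUtf8(0xE9)` ('é') yields the bytes 0xC3 0xA9. A URI encoder would then emit them as `%C3%A9`.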
      • INVALID_REPLACEMENT_CHARACTER

        static final char INVALID_REPLACEMENT_CHARACTER
        The character to use when replacing an invalid character.
        See Also:
        Constant Field Values
      • UHEX

        static final char[] UHEX
        RFC 3986 -- "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively. If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent. For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
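Following that recommendation, each output byte becomes '%' plus two digits looked up in an uppercase table. That is why PERCENT_ENCODED_LENGTH characters are produced per byte. This is a minimal sketch that assumes UHEX holds "0123456789ABCDEF" and that PERCENT_ENCODED_LENGTH is 3. The class `PercentSketch` and its methods are hypothetical.

```java
// Hypothetical sketch of percent-encoding one octet with uppercase hex digits.
final class PercentSketch {
    // Assumed contents of UHEX: index i holds the uppercase hex digit for value i.
    static final char[] UHEX = "0123456789ABCDEF".toCharArray();
    // '%' plus two hex digits.
    static final int PERCENT_ENCODED_LENGTH = 3;

    /** Writes "%XY" for byte value b (0..255) into out at pos; returns the next free position. */
    static int percentEncode(int b, char[] out, int pos) {
        out[pos] = '%';
        out[pos + 1] = UHEX[(b >>> 4) & 0xf]; // high nibble
        out[pos + 2] = UHEX[b & 0xf];         // low nibble
        return pos + PERCENT_ENCODED_LENGTH;
    }

    /** Convenience form returning the three-character escape as a String. */
    static String percentEncode(int b) {
        char[] buf = new char[PERCENT_ENCODED_LENGTH];
        percentEncode(b, buf, 0);
        return new String(buf);
    }
}
```

With this sketch, `percentEncode(0x2F)` returns "%2F" rather than "%2f", which matches the RFC's preference for uppercase.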
      • UNRESERVED_MASK_LOW

        static final long UNRESERVED_MASK_LOW
        RFC 3986 Unreserved Characters with code points 0 to 63 (the first 64):
             unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
         
        See Also:
        Constant Field Values
      • UNRESERVED_MASK_HIGH

        static final long UNRESERVED_MASK_HIGH
        RFC 3986 Unreserved Characters with code points 64 to 127 (the second 64):
             unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
         
        See Also:
        Constant Field Values
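The two unreserved masks can be built from the grammar above. Bit i of the low mask marks code point i, and bit i of the high mask marks code point 64 + i, so each 64-bit long covers half of ASCII. This is one plausible use for LONG_BITS, CHARS_0_TO_9 and CHARS_A_TO_Z, but that is an assumption. The class `UnreservedMasks` and method `isUnreserved` are hypothetical, and URIEncoder may well write its masks as literal constants instead of computing them.

```java
// Hypothetical sketch: builds the unreserved masks from the RFC 3986 grammar.
// Bit i of LOW marks code point i; bit i of HIGH marks code point 64 + i.
final class UnreservedMasks {
    static final int CHARS_0_TO_9 = 10;  // '0'..'9'
    static final int CHARS_A_TO_Z = 26;  // 'a'..'z', and likewise 'A'..'Z'
    static final int LONG_BITS = 64;     // bits per long, i.e. code points per mask
    static final long LOW;
    static final long HIGH;

    static {
        long low = 0L;
        long high = 0L;
        // DIGIT: '0'..'9' are code points 48..57, so they live in the low mask.
        for (int i = 0; i < CHARS_0_TO_9; i++) {
            low |= 1L << ('0' + i);
        }
        // "-" (45) and "." (46) are also below 64.
        low |= 1L << '-';
        low |= 1L << '.';
        // ALPHA: 'A'..'Z' (65..90) and 'a'..'z' (97..122) live in the high mask.
        for (int i = 0; i < CHARS_A_TO_Z; i++) {
            high |= 1L << ('A' + i - LONG_BITS);
            high |= 1L << ('a' + i - LONG_BITS);
        }
        // "_" (95) and "~" (126).
        high |= 1L << ('_' - LONG_BITS);
        high |= 1L << ('~' - LONG_BITS);
        LOW = low;
        HIGH = high;
    }

    /** Returns true if c is an RFC 3986 unreserved character. */
    static boolean isUnreserved(char c) {
        if (c < LONG_BITS) {
            return (LOW & (1L << c)) != 0;
        }
        if (c < 2 * LONG_BITS) {
            return (HIGH & (1L << (c - LONG_BITS))) != 0;
        }
        return false; // non-ASCII is never unreserved
    }
}
```

Testing membership then costs one comparison, one shift and one AND, with no table lookups or branches per character class.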
      • RESERVED_MASK_LOW

        static final long RESERVED_MASK_LOW
        RFC 3986 Reserved Characters with code points 0 to 63 (the first 64):
           reserved    = gen-delims / sub-delims
        
           gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
        
           sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                       / "*" / "+" / "," / ";" / "="
         
        See Also:
        Constant Field Values
      • RESERVED_MASK_HIGH

        static final long RESERVED_MASK_HIGH
        RFC 3986 Reserved Characters with code points 64 to 127 (the second 64).
        See Also:
        Constant Field Values
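_lowMask and _highMask mark the characters that need no escaping, so a mode can be modeled as a choice of which masks to OR together. The sketch below assumes that Mode.FULL_URI lets both reserved and unreserved characters through. It also assumes a second mode, called COMPONENT here (a name this page does not confirm), that lets only unreserved characters through. `ReservedMasks`, `masksOf` and `passesThrough` are hypothetical names.

```java
// Hypothetical sketch: reserved/unreserved masks and an assumed mode-to-mask mapping.
final class ReservedMasks {
    static final int LONG_BITS = 64;

    /** Builds a {low, high} mask pair for a set of ASCII characters. */
    static long[] masksOf(String chars) {
        long low = 0L;
        long high = 0L;
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (c < LONG_BITS) {
                low |= 1L << c;
            } else {
                high |= 1L << (c - LONG_BITS);
            }
        }
        return new long[] {low, high};
    }

    static final String GEN_DELIMS = ":/?#[]@";
    static final String SUB_DELIMS = "!$&'()*+,;=";
    static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    static final long[] RESERVED = masksOf(GEN_DELIMS + SUB_DELIMS);
    static final long[] UNRESERVED_MASKS = masksOf(UNRESERVED);

    // Assumed semantics: FULL_URI keeps reserved and unreserved characters;
    // COMPONENT (hypothetical name) keeps only unreserved characters.
    enum Mode {
        COMPONENT(UNRESERVED_MASKS[0], UNRESERVED_MASKS[1]),
        FULL_URI(UNRESERVED_MASKS[0] | RESERVED[0], UNRESERVED_MASKS[1] | RESERVED[1]);

        final long lowMask;   // pass-through bits for code points 0..63
        final long highMask;  // pass-through bits for code points 64..127

        Mode(long lowMask, long highMask) {
            this.lowMask = lowMask;
            this.highMask = highMask;
        }

        /** True if c may be written without percent-encoding in this mode. */
        boolean passesThrough(char c) {
            if (c < LONG_BITS) return (lowMask & (1L << c)) != 0;
            if (c < 2 * LONG_BITS) return (highMask & (1L << (c - LONG_BITS))) != 0;
            return false; // non-ASCII is always encoded
        }
    }
}
```

Under these assumptions, '/' is left as-is in FULL_URI mode but percent-encoded in COMPONENT mode. '%' and space are encoded in both.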
      • _lowMask

        private final long _lowMask
        The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 0 to 63.
      • _highMask

        private final long _highMask
        The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 64 to 127.
      • _mode

        private final URIEncoder.Mode _mode
        The encoding mode for this encoder, used primarily by toString().
    • Constructor Detail

      • URIEncoder

        URIEncoder()
        Constructor equivalent to URIEncoder(Mode.FULL_URI).
      • URIEncoder

        URIEncoder(URIEncoder.Mode mode)
        Constructor for the URIEncoder that specifies the encoding mode the URIEncoder will use.
        Parameters:
        mode - the encoding mode for this encoder.
    • Method Detail

      • maxEncodedLength

        protected int maxEncodedLength(int n)
        Description copied from class: Encoder
        Returns the maximum encoded length (in chars) of an input sequence of n characters.
        Specified by:
        maxEncodedLength in class Encoder
        Parameters:
        n - the number of characters of input
        Returns:
        the worst-case number of characters required to encode n input characters
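The worst case can be derived from the constants. A single char (a BMP code point) needs at most 3 UTF-8 bytes, and each byte becomes PERCENT_ENCODED_LENGTH = 3 characters, which gives 9 characters per input char. A surrogate pair is 2 chars that encode to 4 bytes, or 12 characters, which is only 6 per char. So 9 per char bounds every input. This sketch assumes MAX_ENCODED_CHAR_LENGTH is 9. The class `MaxLengthSketch` and the constant `MAX_UTF8_BYTES_PER_CHAR` are hypothetical.

```java
// Hypothetical sketch of the worst-case bound; the constant value is derived
// here from UTF-8 and percent-encoding rules, not copied from URIEncoder.
final class MaxLengthSketch {
    static final int PERCENT_ENCODED_LENGTH = 3;   // "%XY"
    static final int MAX_UTF8_BYTES_PER_CHAR = 3;  // one BMP char needs at most 3 UTF-8 bytes
    static final int MAX_ENCODED_CHAR_LENGTH =
        MAX_UTF8_BYTES_PER_CHAR * PERCENT_ENCODED_LENGTH; // 9

    /** Worst-case output length for n input chars (no guard against int overflow). */
    static int maxEncodedLength(int n) {
        return n * MAX_ENCODED_CHAR_LENGTH;
    }
}
```

A caller can size an output buffer with this bound once and then encode without checking for overflow mid-loop.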
      • firstEncodedOffset

        protected int firstEncodedOffset(java.lang.String input,
                                         int off,
                                         int len)
        Description copied from class: Encoder
        Scans the input string for the first character index that requires encoding. If no character in the range requires encoding, off+len is returned. This method is used by the Encode.forXYZ methods to return input strings unchanged when possible.
        Specified by:
        firstEncodedOffset in class Encoder
        Parameters:
        input - the input to check for encoding
        off - the offset of the first character to check
        len - the number of characters to check
        Returns:
        the index of the first character to encode. The return value will be off+len if no characters in the input require encoding.
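The scan can be sketched under a few assumptions: a set bit in _lowMask/_highMask means the character may pass through unescaped, and every non-ASCII character needs encoding. The class `ScanSketch` is hypothetical. Like the method described above, it returns off+len when nothing in the range needs encoding, and that return value is what lets a caller hand back the original string.

```java
// Hypothetical sketch of firstEncodedOffset; a set mask bit is assumed to mean
// "no encoding needed" for that ASCII code point.
final class ScanSketch {
    private final long lowMask;   // code points 0..63
    private final long highMask;  // code points 64..127

    ScanSketch(long lowMask, long highMask) {
        this.lowMask = lowMask;
        this.highMask = highMask;
    }

    /** Index of the first char in [off, off+len) that needs encoding, else off+len. */
    int firstEncodedOffset(String input, int off, int len) {
        final int end = off + len;
        for (int i = off; i < end; i++) {
            final char ch = input.charAt(i);
            if (ch < 64) {
                if ((lowMask & (1L << ch)) == 0) {
                    return i;
                }
            } else if (ch < 128) {
                if ((highMask & (1L << (ch - 64))) == 0) {
                    return i;
                }
            } else {
                return i; // non-ASCII always needs percent-encoding
            }
        }
        return end;
    }
}
```

For example, with masks that allow only 'a' to 'z', scanning "abc def" returns 3, the index of the space.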
      • encodeArrays

        protected java.nio.charset.CoderResult encodeArrays(java.nio.CharBuffer input,
                                                            java.nio.CharBuffer output,
                                                            boolean endOfInput)
        Description copied from class: Encoder
        The core encoding loop used when both the input and output buffers are array backed. The loop is expected to fetch the arrays and interact with the arrays directly for performance.
        Overrides:
        encodeArrays in class Encoder
        Parameters:
        input - the input buffer.
        output - the output buffer.
        endOfInput - when true, this is the last input to encode
        Returns:
        UNDERFLOW or OVERFLOW
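The following is a simplified sketch of such a loop, using the same mask convention as above. It swaps in `String.getBytes(StandardCharsets.UTF_8)` for hand-written bit arithmetic and assumes '-' as the invalid-replacement character. Both of those choices are assumptions, not URIEncoder's actual behavior. The loop returns OVERFLOW when the next encoded unit does not fit. It returns UNDERFLOW when the input is exhausted, or when a high surrogate at the end of a non-final chunk must wait for its low surrogate. `EncodeLoopSketch` is a hypothetical name.

```java
// Hypothetical, simplified array-backed encode loop; not URIEncoder's actual code.
final class EncodeLoopSketch {
    static final char[] UHEX = "0123456789ABCDEF".toCharArray();
    static final char INVALID_REPLACEMENT_CHARACTER = '-'; // assumed value
    static final int PERCENT_ENCODED_LENGTH = 3;

    private final long lowMask;   // pass-through bits for code points 0..63
    private final long highMask;  // pass-through bits for code points 64..127

    EncodeLoopSketch(long lowMask, long highMask) {
        this.lowMask = lowMask;
        this.highMask = highMask;
    }

    private boolean passesThrough(char ch) {
        if (ch < 64) return (lowMask & (1L << ch)) != 0;
        if (ch < 128) return (highMask & (1L << (ch - 64))) != 0;
        return false;
    }

    /** Requires array-backed buffers; both positions are updated before returning. */
    java.nio.charset.CoderResult encodeArrays(
            java.nio.CharBuffer input, java.nio.CharBuffer output, boolean endOfInput) {
        final char[] in = input.array();
        final char[] out = output.array();
        int i = input.arrayOffset() + input.position();
        final int iEnd = input.arrayOffset() + input.limit();
        int j = output.arrayOffset() + output.position();
        final int jEnd = output.arrayOffset() + output.limit();
        try {
            while (i < iEnd) {
                final char ch = in[i];
                if (passesThrough(ch)) {
                    if (j >= jEnd) return java.nio.charset.CoderResult.OVERFLOW;
                    out[j++] = ch;
                    i++;
                    continue;
                }
                int cp = ch;      // code point to encode, or -1 if invalid
                int consumed = 1; // input chars used by this code point
                if (Character.isHighSurrogate(ch)) {
                    if (i + 1 >= iEnd) {
                        if (!endOfInput) {
                            // Leave the high surrogate unconsumed until more input arrives.
                            return java.nio.charset.CoderResult.UNDERFLOW;
                        }
                        cp = -1;
                    } else if (Character.isLowSurrogate(in[i + 1])) {
                        cp = Character.toCodePoint(ch, in[i + 1]);
                        consumed = 2;
                    } else {
                        cp = -1;
                    }
                } else if (Character.isLowSurrogate(ch)) {
                    cp = -1;
                }
                if (cp < 0) {
                    if (j >= jEnd) return java.nio.charset.CoderResult.OVERFLOW;
                    out[j++] = INVALID_REPLACEMENT_CHARACTER;
                    i++;
                    continue;
                }
                final byte[] bytes = new String(Character.toChars(cp))
                        .getBytes(java.nio.charset.StandardCharsets.UTF_8);
                if (jEnd - j < bytes.length * PERCENT_ENCODED_LENGTH) {
                    return java.nio.charset.CoderResult.OVERFLOW;
                }
                for (byte b : bytes) {
                    out[j++] = '%';
                    out[j++] = UHEX[(b >>> 4) & 0xf];
                    out[j++] = UHEX[b & 0xf];
                }
                i += consumed;
            }
            return java.nio.charset.CoderResult.UNDERFLOW;
        } finally {
            input.position(i - input.arrayOffset());
            output.position(j - output.arrayOffset());
        }
    }
}
```

The buffers must come from `CharBuffer.wrap(char[])` or `CharBuffer.allocate`, because `CharBuffer.wrap(String)` is not array backed. Updating the positions in `finally` means a caller can drain the output and call again after either result.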
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object