Package org.owasp.encoder
Class URIEncoder
- java.lang.Object
-
- org.owasp.encoder.Encoder
-
- org.owasp.encoder.URIEncoder
-
class URIEncoder extends Encoder
URIEncoder -- An encoder for URI based contexts.
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class
URIEncoder.Mode
Encoding mode of operation for URI encodes.
-
Field Summary
Fields
Modifier and Type    Field    Description
private long
_highMask
The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 64 to 127.
private long
_lowMask
The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 0 to 63.
private URIEncoder.Mode
_mode
The encoding mode for this encoder--used primarily for toString().
(package private) static int
CHARS_0_TO_9
Number of characters in the range '0' to '9'.
(package private) static int
CHARS_A_TO_Z
Number of characters in the range 'a' to 'z'.
(package private) static char
INVALID_REPLACEMENT_CHARACTER
The character to use when replacing an invalid character.
(package private) static int
LONG_BITS
Number of bits in a long.
(package private) static int
MAX_ENCODED_CHAR_LENGTH
Maximum number of characters required to encode a single input character.
(package private) static int
MAX_UTF8_2_BYTE
Maximum code-point value that can be encoded with 2 utf-8 bytes.
(package private) static int
PERCENT_ENCODED_LENGTH
Number of characters used to '%' encode a single hex-value.
(package private) static long
RESERVED_MASK_HIGH
The second 64 RFC 3986 Reserved characters.
(package private) static long
RESERVED_MASK_LOW
RFC 3986 Reserved Characters.
(package private) static char[]
UHEX
RFC 3986 -- "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively."
(package private) static long
UNRESERVED_MASK_HIGH
RFC 3986 Unreserved Characters.
(package private) static long
UNRESERVED_MASK_LOW
RFC 3986 Unreserved Characters.
(package private) static int
UTF8_2_BYTE_FIRST_MSB
When the encoded output requires 2 bytes, this is the high bits of the first byte.
(package private) static int
UTF8_3_BYTE_FIRST_MSB
When the encoded output requires 3 bytes, this is the high bits of the first byte.
(package private) static int
UTF8_4_BYTE_FIRST_MSB
When the encoded output requires 4 bytes, this is the high bits of the first byte.
(package private) static int
UTF8_BYTE_MSB
For all characters in a 2-4 byte encoded sequence after the first, this is the high bits of the input bytes.
(package private) static int
UTF8_MASK
This is the mask containing 6-ones in the lower 6-bits.
(package private) static int
UTF8_SHIFT
UTF-8 encodes 6-bits of the code-point in each output UTF-8 byte.
-
Constructor Summary
Constructors
Constructor    Description
URIEncoder()
Constructor equivalent to URIEncoder(Mode.FULL_URI).
URIEncoder(URIEncoder.Mode mode)
Constructor for the URIEncoder that specifies the encoding mode the URIEncoder will use.
-
Method Summary
All Methods    Instance Methods    Concrete Methods
Modifier and Type    Method    Description
protected java.nio.charset.CoderResult
encodeArrays(java.nio.CharBuffer input, java.nio.CharBuffer output, boolean endOfInput)
The core encoding loop used when both the input and output buffers are array backed.
protected int
firstEncodedOffset(java.lang.String input, int off, int len)
Scans the input string for the first character index that requires encoding.
protected int
maxEncodedLength(int n)
Returns the maximum encoded length (in chars) of an input sequence of n characters.
java.lang.String
toString()
-
Methods inherited from class org.owasp.encoder.Encoder
encode, encodeBuffers, overflow, underflow
-
-
-
-
Field Detail
-
CHARS_0_TO_9
static final int CHARS_0_TO_9
Number of characters in the range '0' to '9'.
- See Also:
- Constant Field Values
-
CHARS_A_TO_Z
static final int CHARS_A_TO_Z
Number of characters in the range 'a' to 'z'.
- See Also:
- Constant Field Values
-
LONG_BITS
static final int LONG_BITS
Number of bits in a long.
- See Also:
- Constant Field Values
-
MAX_ENCODED_CHAR_LENGTH
static final int MAX_ENCODED_CHAR_LENGTH
Maximum number of characters required to encode a single input character.
- See Also:
- Constant Field Values
-
PERCENT_ENCODED_LENGTH
static final int PERCENT_ENCODED_LENGTH
Number of characters used to '%' encode a single hex-value.
- See Also:
- Constant Field Values
-
MAX_UTF8_2_BYTE
static final int MAX_UTF8_2_BYTE
Maximum code-point value that can be encoded with 2 utf-8 bytes.
- See Also:
- Constant Field Values
-
UTF8_2_BYTE_FIRST_MSB
static final int UTF8_2_BYTE_FIRST_MSB
When the encoded output requires 2 bytes, this is the high bits of the first byte.
- See Also:
- Constant Field Values
-
UTF8_3_BYTE_FIRST_MSB
static final int UTF8_3_BYTE_FIRST_MSB
When the encoded output requires 3 bytes, this is the high bits of the first byte.
- See Also:
- Constant Field Values
-
UTF8_4_BYTE_FIRST_MSB
static final int UTF8_4_BYTE_FIRST_MSB
When the encoded output requires 4 bytes, this is the high bits of the first byte.
- See Also:
- Constant Field Values
-
UTF8_BYTE_MSB
static final int UTF8_BYTE_MSB
For all characters in a 2-4 byte encoded sequence after the first, this is the high bits of the input bytes.
- See Also:
- Constant Field Values
-
UTF8_SHIFT
static final int UTF8_SHIFT
UTF-8 encodes 6-bits of the code-point in each output UTF-8 byte.
- See Also:
- Constant Field Values
-
UTF8_MASK
static final int UTF8_MASK
This is the mask containing 6-ones in the lower 6-bits.
- See Also:
- Constant Field Values
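The UTF-8 constants above fit together as in the following minimal sketch. The constant values are the standard UTF-8 bit patterns and are assumed here, since this page does not list them; the class name Utf8Sketch and the helper tail are hypothetical. Surrogate code points are not rejected, so this is an illustration rather than a validating encoder.

```java
final class Utf8Sketch {
    // Assumed values: standard UTF-8 bit patterns, not read from the real class.
    static final int MAX_UTF8_2_BYTE = 0x7ff;
    static final int UTF8_2_BYTE_FIRST_MSB = 0xc0;
    static final int UTF8_3_BYTE_FIRST_MSB = 0xe0;
    static final int UTF8_4_BYTE_FIRST_MSB = 0xf0;
    static final int UTF8_BYTE_MSB = 0x80;
    static final int UTF8_SHIFT = 6;
    static final int UTF8_MASK = 0x3f;

    /** Continuation byte carrying the 6 bits of cp that start at bit position shift. */
    static byte tail(int cp, int shift) {
        return (byte) (UTF8_BYTE_MSB | ((cp >>> shift) & UTF8_MASK));
    }

    /** Encodes one code point (0 to 0x10FFFF) into 1 to 4 UTF-8 bytes. */
    static byte[] encode(int cp) {
        if (cp < 0x80) {
            return new byte[] { (byte) cp };
        }
        if (cp <= MAX_UTF8_2_BYTE) {
            return new byte[] {
                (byte) (UTF8_2_BYTE_FIRST_MSB | (cp >>> UTF8_SHIFT)),
                tail(cp, 0) };
        }
        if (cp < 0x10000) {
            return new byte[] {
                (byte) (UTF8_3_BYTE_FIRST_MSB | (cp >>> (2 * UTF8_SHIFT))),
                tail(cp, UTF8_SHIFT),
                tail(cp, 0) };
        }
        return new byte[] {
            (byte) (UTF8_4_BYTE_FIRST_MSB | (cp >>> (3 * UTF8_SHIFT))),
            tail(cp, 2 * UTF8_SHIFT),
            tail(cp, UTF8_SHIFT),
            tail(cp, 0) };
    }
}
```

Each output byte would then be written as a '%' followed by two hex digits, which is why a 3-byte sequence expands to 9 output characters.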
-
INVALID_REPLACEMENT_CHARACTER
static final char INVALID_REPLACEMENT_CHARACTER
The character to use when replacing an invalid character.
- See Also:
- Constant Field Values
-
UHEX
static final char[] UHEX
RFC 3986 -- "The uppercase hexadecimal digits 'A' through 'F' are equivalent to the lowercase digits 'a' through 'f', respectively. If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent. For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
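As an illustration of the uppercase-hex recommendation, here is a minimal sketch of percent-encoding a single octet. The table contents and the value 3 for PERCENT_ENCODED_LENGTH are assumptions (the page does not show them), and the class PercentHexSketch and method percentEncode are hypothetical names.

```java
final class PercentHexSketch {
    // Assumed contents: the sixteen uppercase hexadecimal digits.
    static final char[] UHEX = "0123456789ABCDEF".toCharArray();

    // Assumed value: '%' plus two hex digits.
    static final int PERCENT_ENCODED_LENGTH = 3;

    /** Percent-encodes the low 8 bits of b using uppercase hex digits. */
    static String percentEncode(int b) {
        char[] out = new char[PERCENT_ENCODED_LENGTH];
        out[0] = '%';
        out[1] = UHEX[(b >>> 4) & 0xf]; // high nibble
        out[2] = UHEX[b & 0xf];         // low nibble
        return new String(out);
    }
}
```

For example, percentEncode(0x2f) yields "%2F" rather than "%2f".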
-
UNRESERVED_MASK_LOW
static final long UNRESERVED_MASK_LOW
RFC 3986 Unreserved Characters. The first 64 (code-points 0 to 63).
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
- See Also:
- Constant Field Values
-
UNRESERVED_MASK_HIGH
static final long UNRESERVED_MASK_HIGH
RFC 3986 Unreserved Characters. The second 64 (code-points 64 to 127).
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
- See Also:
- Constant Field Values
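A pair of 64-bit masks like these can be derived from the unreserved grammar and queried with one shift and one AND per character. The following is a sketch under assumptions: it rebuilds the masks at class-load time rather than using literal constants, and the class name UnreservedMaskSketch and method isUnreserved are hypothetical.

```java
final class UnreservedMaskSketch {
    static final int LONG_BITS = 64;
    static final long UNRESERVED_MASK_LOW;   // bit i set => code-point i is unreserved
    static final long UNRESERVED_MASK_HIGH;  // bit i set => code-point 64 + i is unreserved

    static {
        // unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
        StringBuilder chars = new StringBuilder("-._~");
        for (char c = 'a'; c <= 'z'; c++) {
            chars.append(c).append(Character.toUpperCase(c));
        }
        for (char c = '0'; c <= '9'; c++) {
            chars.append(c);
        }
        long low = 0L;
        long high = 0L;
        for (int i = 0; i < chars.length(); i++) {
            int c = chars.charAt(i);
            if (c < LONG_BITS) {
                low |= 1L << c;
            } else {
                high |= 1L << (c - LONG_BITS);
            }
        }
        UNRESERVED_MASK_LOW = low;
        UNRESERVED_MASK_HIGH = high;
    }

    /** Returns true when ch may appear in the output without escaping. */
    static boolean isUnreserved(char ch) {
        if (ch < LONG_BITS) {
            return (UNRESERVED_MASK_LOW & (1L << ch)) != 0;
        }
        if (ch < 2 * LONG_BITS) {
            return (UNRESERVED_MASK_HIGH & (1L << (ch - LONG_BITS))) != 0;
        }
        return false; // non-ASCII always needs encoding
    }
}
```

The low mask ends up holding the digits plus "-" and ".", while the letters, "_" and "~" all land in the high mask.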
-
RESERVED_MASK_LOW
static final long RESERVED_MASK_LOW
RFC 3986 Reserved Characters. The first 64.
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
- See Also:
- Constant Field Values
-
RESERVED_MASK_HIGH
static final long RESERVED_MASK_HIGH
The second 64 RFC 3986 Reserved characters.
- See Also:
- Constant Field Values
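To see why the reserved set needs a high mask at all, the sketch below splits the RFC 3986 reserved characters across the two 64-bit ranges. It computes the masks from the grammar instead of using literal constants, so the values are derived rather than copied from the real class; the class name ReservedMaskSketch and the helper mask are hypothetical.

```java
final class ReservedMaskSketch {
    static final String GEN_DELIMS = ":/?#[]@";
    static final String SUB_DELIMS = "!$&'()*+,;=";

    /** Builds a mask with bit (c - base) set for every c in chars that lies in [base, base + 64). */
    static long mask(String chars, int base) {
        long m = 0L;
        for (int i = 0; i < chars.length(); i++) {
            int bit = chars.charAt(i) - base;
            if (bit >= 0 && bit < 64) {
                m |= 1L << bit;
            }
        }
        return m;
    }

    // Code-points 0 to 63.
    static final long RESERVED_MASK_LOW = mask(GEN_DELIMS + SUB_DELIMS, 0);
    // Code-points 64 to 127.
    static final long RESERVED_MASK_HIGH = mask(GEN_DELIMS + SUB_DELIMS, 64);
}
```

Only "@" (64), "[" (91) and "]" (93) have code-points of 64 or more, so the high mask has three bits set and the low mask carries the other fifteen reserved characters.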
-
_lowMask
private final long _lowMask
The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 0 to 63.
-
_highMask
private final long _highMask
The bit-mask of characters that do not need to be escaped, for characters with code-points in the range 64 to 127.
-
_mode
private final URIEncoder.Mode _mode
The encoding mode for this encoder--used primarily for toString().
-
-
Constructor Detail
-
URIEncoder
URIEncoder()
Constructor equivalent to URIEncoder(Mode.FULL_URI).
-
URIEncoder
URIEncoder(URIEncoder.Mode mode)
Constructor for the URIEncoder that specifies the encoding mode the URIEncoder will use.
- Parameters:
mode - the encoding mode for this encoder.
-
-
Method Detail
-
maxEncodedLength
protected int maxEncodedLength(int n)
Description copied from class: Encoder
Returns the maximum encoded length (in chars) of an input sequence of n characters.
- Specified by:
maxEncodedLength in class Encoder
- Parameters:
n - the number of characters of input
- Returns:
- the worst-case number of characters required to encode
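The worst case can be reasoned about as follows: a single Java char outside the surrogate range needs at most 3 UTF-8 bytes, and each byte becomes 3 output characters, while a surrogate pair spends 2 chars on 4 bytes (6 output characters per char). The sketch below assumes the resulting bound of 9 per input char; the page does not state the actual constant values, and the class MaxLengthSketch and method fullyEscapedLength are hypothetical. Overflow for very large n is not handled.

```java
import java.nio.charset.StandardCharsets;

final class MaxLengthSketch {
    // Assumed values, not read from the real class.
    static final int PERCENT_ENCODED_LENGTH = 3;                            // "%XX"
    static final int MAX_ENCODED_CHAR_LENGTH = 3 * PERCENT_ENCODED_LENGTH;  // three UTF-8 bytes

    /** Upper bound on output chars for n input chars (no int-overflow guard). */
    static int maxEncodedLength(int n) {
        return n * MAX_ENCODED_CHAR_LENGTH;
    }

    /** Output length if every UTF-8 byte of s were percent-encoded. */
    static int fullyEscapedLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length * PERCENT_ENCODED_LENGTH;
    }
}
```

A caller can size an output buffer with maxEncodedLength(input.length()) and be sure it never overflows, at the cost of over-allocating for mostly-ASCII input.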
-
firstEncodedOffset
protected int firstEncodedOffset(java.lang.String input, int off, int len)
Description copied from class: Encoder
Scans the input string for the first character index that requires encoding. If the entire input does not require encoding then the length is returned. This method is used by the Encode.forXYZ methods to return input strings unchanged when possible.
- Specified by:
firstEncodedOffset in class Encoder
- Parameters:
input - the input to check for encoding
off - the offset of the first character to check
len - the number of characters to check
- Returns:
- the index of the first character to encode. The return value will be off+len if no characters in the input require encoding.
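The scan contract described above can be sketched as follows. This is a simplified illustration, not the class's implementation: it treats only the RFC 3986 unreserved characters as safe and uses plain comparisons instead of bit-masks, and the class FirstOffsetSketch and helper isUnreserved are hypothetical names.

```java
final class FirstOffsetSketch {
    /** Simplified safe-character test: RFC 3986 unreserved characters only. */
    static boolean isUnreserved(char ch) {
        return (ch >= 'a' && ch <= 'z')
            || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9')
            || ch == '-' || ch == '.' || ch == '_' || ch == '~';
    }

    /** Index of the first char in [off, off + len) that needs encoding, or off + len if none. */
    static int firstEncodedOffset(String input, int off, int len) {
        final int end = off + len;
        for (int i = off; i < end; i++) {
            if (!isUnreserved(input.charAt(i))) {
                return i;
            }
        }
        return end;
    }

    /** True when the whole string can be returned unchanged, avoiding any allocation. */
    static boolean canReturnUnchanged(String input) {
        return firstEncodedOffset(input, 0, input.length()) == input.length();
    }
}
```

Returning off+len rather than a sentinel such as -1 lets the caller copy the leading safe prefix and start encoding at the returned index without a special case.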
-
encodeArrays
protected java.nio.charset.CoderResult encodeArrays(java.nio.CharBuffer input, java.nio.CharBuffer output, boolean endOfInput)
Description copied from class: Encoder
The core encoding loop used when both the input and output buffers are array backed. The loop is expected to fetch the arrays and interact with the arrays directly for performance.
- Overrides:
encodeArrays in class Encoder
- Parameters:
input - the input buffer.
output - the output buffer.
endOfInput - when true, this is the last input to encode
- Returns:
- UNDERFLOW or OVERFLOW
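The shape of such an array-backed loop can be sketched with the standard java.nio API. This is a deliberately reduced illustration, not the class's code: it handles ASCII input only (throwing on anything else), escapes every character outside the RFC 3986 unreserved set, and ignores endOfInput because no state is carried between calls. The class EncodeArraysSketch and helper isUnreserved are hypothetical names.

```java
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

final class EncodeArraysSketch {
    static final char[] UHEX = "0123456789ABCDEF".toCharArray();

    static boolean isUnreserved(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
            || (ch >= '0' && ch <= '9')
            || ch == '-' || ch == '.' || ch == '_' || ch == '~';
    }

    /** Both buffers must be array backed (hasArray() returns true). */
    static CoderResult encodeArrays(CharBuffer input, CharBuffer output, boolean endOfInput) {
        final char[] in = input.array();
        final char[] out = output.array();
        int i = input.arrayOffset() + input.position();
        final int n = input.arrayOffset() + input.limit();
        int j = output.arrayOffset() + output.position();
        final int m = output.arrayOffset() + output.limit();

        CoderResult result = CoderResult.UNDERFLOW; // all input consumed unless we run out of room
        for (; i < n; i++) {
            final char ch = in[i];
            if (ch >= 128) {
                throw new IllegalArgumentException("sketch handles ASCII only");
            }
            if (isUnreserved(ch)) {
                if (j >= m) { result = CoderResult.OVERFLOW; break; }
                out[j++] = ch;
            } else {
                if (j + 3 > m) { result = CoderResult.OVERFLOW; break; }
                out[j++] = '%';
                out[j++] = UHEX[ch >>> 4];
                out[j++] = UHEX[ch & 0xf];
            }
        }
        // Publish progress so the caller can resume after an OVERFLOW.
        input.position(i - input.arrayOffset());
        output.position(j - output.arrayOffset());
        return result;
    }
}
```

On OVERFLOW the input position points at the first character that did not fit, so the caller can drain the output buffer and call again with the same input buffer.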
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-