- All Implemented Interfaces:
Serializable, Comparable<Character>, Constable
The Character class wraps a value of the primitive type char in an object. An object of class Character contains a single field whose type is char. In addition, this class provides a large number of static methods for determining a character's category (lowercase letter, digit, etc.) and for converting characters from uppercase to lowercase and vice versa.
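As a rough illustration of the wrapping and the static helpers described above, the following sketch classifies a char and converts its case. The class name CharacterBasics and the helper describe are hypothetical names chosen for this example; only the Character calls come from the API listed on this page.

```java
public class CharacterBasics {
    /** Describes a char using a few of the static classification methods. */
    static String describe(char ch) {
        if (Character.isDigit(ch)) return "digit";
        if (Character.isUpperCase(ch)) return "uppercase letter";
        if (Character.isLowerCase(ch)) return "lowercase letter";
        if (Character.isWhitespace(ch)) return "whitespace";
        return "other";
    }

    public static void main(String[] args) {
        Character boxed = Character.valueOf('q'); // wrap a primitive char in an object
        char raw = boxed.charValue();             // read the single char field back

        System.out.println(describe(raw));              // lowercase letter
        System.out.println(Character.toUpperCase(raw)); // Q
        // getType returns one of the general-category constants listed under Fields.
        System.out.println(Character.getType('7') == Character.DECIMAL_DIGIT_NUMBER); // true
    }
}
```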
Unicode Conformance
The fields and methods of class Character are defined in terms of character information from the Unicode Standard, specifically the UnicodeData file that is part of the Unicode Character Database. This file specifies properties including name and category for every assigned Unicode code point or character range. The file is available from the Unicode Consortium at http://www.unicode.org.
Character information is based on the Unicode Standard, version 15.0.
The Java platform has supported different versions of the Unicode Standard over time. Upgrades to newer versions of the Unicode Standard occurred in the following Java releases, each indicating the new version:
| Java release | Unicode version |
|---|---|
| Java SE 20 | Unicode 15.0 |
| Java SE 19 | Unicode 14.0 |
| Java SE 15 | Unicode 13.0 |
| Java SE 13 | Unicode 12.1 |
| Java SE 12 | Unicode 11.0 |
| Java SE 11 | Unicode 10.0 |
| Java SE 9 | Unicode 8.0 |
| Java SE 8 | Unicode 6.2 |
| Java SE 7 | Unicode 6.0 |
| Java SE 5.0 | Unicode 4.0 |
| Java SE 1.4 | Unicode 3.0 |
| JDK 1.1 | Unicode 2.0 |
| JDK 1.0.2 | Unicode 1.1.5 |
Unicode Character Representations
The char data type (and therefore the value that a Character object encapsulates) is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar value. (Refer to the definition of the U+n notation in the Unicode Standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
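The surrogate-pair encoding described above can be observed directly with the conversion methods listed on this page. This is a small sketch; the class name SurrogatePairs, the helper utf16Length, and the sample code point U+1F600 are choices made for this example only.

```java
public class SurrogatePairs {
    /** Returns how many char values (UTF-16 code units) encode the given code point. */
    static int utf16Length(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    public static void main(String[] args) {
        int cp = 0x1F600; // a supplementary code point, i.e. greater than U+FFFF
        char[] units = Character.toChars(cp);

        System.out.println(units.length);                                    // 2
        System.out.println(Character.isHighSurrogate(units[0]));             // true
        System.out.println(Character.isLowSurrogate(units[1]));              // true
        System.out.println(Character.toCodePoint(units[0], units[1]) == cp); // true
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]);    // D83D DE00
    }
}
```

A BMP code point such as 'A' needs only one char, so utf16Length('A') is 1.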
A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. The lower (least significant) 21 bits of int are used to represent Unicode code points and the upper (most significant) 11 bits must be zero. Unless otherwise specified, the behavior with respect to supplementary characters and surrogate char values is as follows:
- The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value, if followed by any low-surrogate value in a string, would represent a letter.
- The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
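The difference between the char and int overloads can be sketched as follows, reusing the two values from the list above. The class name CharVersusInt and both helper methods are hypothetical; they only contrast per-char testing with per-code-point testing.

```java
public class CharVersusInt {
    /** Tests each UTF-16 code unit separately, so surrogates count as non-letters. */
    static boolean allLettersByChar(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isLetter(text.charAt(i))) return false;
        }
        return true;
    }

    /** Tests whole code points, so supplementary letters are recognized. */
    static boolean allLettersByCodePoint(String text) {
        return text.codePoints().allMatch(cp -> Character.isLetter(cp));
    }

    public static void main(String[] args) {
        char loneHighSurrogate = (char) 0xD840;
        System.out.println(Character.isLetter(loneHighSurrogate)); // false
        System.out.println(Character.isLetter(0x2F81A));           // true

        String ideograph = new String(Character.toChars(0x2F81A));
        System.out.println(allLettersByChar(ideograph));      // false
        System.out.println(allLettersByCodePoint(ideograph)); // true
    }
}
```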
In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding. For more information on Unicode terminology, refer to the Unicode Glossary.
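To make the code point versus code unit distinction concrete, the sketch below counts both for the same string and then walks it one code point at a time. The class name UnitsVersusPoints, its helpers, and the sample string are assumptions made for this example.

```java
public class UnitsVersusPoints {
    /** Number of UTF-16 code units (char values) in the string. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        // 'a', one supplementary character (U+1F600), then 'b'
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";

        System.out.println(codeUnits(s));  // 4
        System.out.println(codePoints(s)); // 3

        // Advance by charCount so a surrogate pair is consumed as one code point.
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            System.out.printf("U+%04X%n", cp);
            i += Character.charCount(cp);
        }
    }
}
```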
This is a value-based class; programmers should treat instances that are equal as interchangeable and should not use instances for synchronization, or unpredictable behavior may occur. For example, in a future release, synchronization may fail.
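One way to follow the value-based guidance above is to compare boxed characters by value only. The sketch below assumes a hypothetical helper named sameCharacter in a hypothetical class ValueBasedUsage; it never relies on object identity and never synchronizes on a Character instance.

```java
import java.util.Objects;

public class ValueBasedUsage {
    /** Compares two boxed characters by value; identity (==) is deliberately avoided. */
    static boolean sameCharacter(Character a, Character b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        Character first = Character.valueOf((char) 0x03A9);  // GREEK CAPITAL LETTER OMEGA
        Character second = Character.valueOf((char) 0x03A9);

        System.out.println(sameCharacter(first, second));          // true
        System.out.println(Character.compare(first, second) == 0); // true
        System.out.println(first.hashCode() == second.hashCode()); // true
    }
}
```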
- Since:
- 1.0
- External Specifications
- See Also:
Nested Types
- ✓ Subset
- ✓ UnicodeBlock
- ✓ UnicodeScript
Fields
- ✓public static final int BYTES = 2
- ✓public static final byte COMBINING_SPACING_MARK = 8
- ✓public static final byte CONNECTOR_PUNCTUATION = 23
- ✓public static final byte CONTROL = 15
- ✓public static final byte CURRENCY_SYMBOL = 26
- ✓public static final byte DASH_PUNCTUATION = 20
- ✓public static final byte DECIMAL_DIGIT_NUMBER = 9
- ✓public static final byte DIRECTIONALITY_ARABIC_NUMBER = 6
- ✓public static final byte DIRECTIONALITY_BOUNDARY_NEUTRAL = 9
- ✓public static final byte DIRECTIONALITY_COMMON_NUMBER_SEPARATOR = 7
- ✓public static final byte DIRECTIONALITY_EUROPEAN_NUMBER = 3
- ✓public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_SEPARATOR = 4
- ✓public static final byte DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR = 5
- ✓public static final byte DIRECTIONALITY_FIRST_STRONG_ISOLATE = 21
- ✓public static final byte DIRECTIONALITY_LEFT_TO_RIGHT = 0
- ✓public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING = 14
- ✓public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_ISOLATE = 19
- ✓public static final byte DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE = 15
- ✓public static final byte DIRECTIONALITY_NONSPACING_MARK = 8
- ✓public static final byte DIRECTIONALITY_OTHER_NEUTRALS = 13
- ✓public static final byte DIRECTIONALITY_PARAGRAPH_SEPARATOR = 10
- ✓public static final byte DIRECTIONALITY_POP_DIRECTIONAL_FORMAT = 18
- ✓public static final byte DIRECTIONALITY_POP_DIRECTIONAL_ISOLATE = 22
- ✓public static final byte DIRECTIONALITY_RIGHT_TO_LEFT = 1
- ✓public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC = 2
- ✓public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING = 16
- ✓public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_ISOLATE = 20
- ✓public static final byte DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE = 17
- ✓public static final byte DIRECTIONALITY_SEGMENT_SEPARATOR = 11
- ✓public static final byte DIRECTIONALITY_UNDEFINED = -1
- ✓public static final byte DIRECTIONALITY_WHITESPACE = 12
- ✓public static final byte ENCLOSING_MARK = 7
- ✓public static final byte END_PUNCTUATION = 22
- ✓public static final byte FINAL_QUOTE_PUNCTUATION = 30
- ✓public static final byte FORMAT = 16
- ✓public static final byte INITIAL_QUOTE_PUNCTUATION = 29
- ✓public static final byte LETTER_NUMBER = 10
- ✓public static final byte LINE_SEPARATOR = 13
- ✓public static final byte LOWERCASE_LETTER = 2
- ✓public static final byte MATH_SYMBOL = 25
- ✓public static final int MAX_CODE_POINT = 1114111
- ✓public static final char MAX_HIGH_SURROGATE = '\udbff'
- ✓public static final char MAX_LOW_SURROGATE = '\udfff'
- ✓public static final int MAX_RADIX = 36
- ✓public static final char MAX_SURROGATE = '\udfff'
- ✓public static final char MAX_VALUE = '\uffff'
- ✓public static final int MIN_CODE_POINT = 0
- ✓public static final char MIN_HIGH_SURROGATE = '\ud800'
- ✓public static final char MIN_LOW_SURROGATE = '\udc00'
- ✓public static final int MIN_RADIX = 2
- ✓public static final int MIN_SUPPLEMENTARY_CODE_POINT = 65536
- ✓public static final char MIN_SURROGATE = '\ud800'
- ✓public static final char MIN_VALUE = '\u0000'
- ✓public static final byte MODIFIER_LETTER = 4
- ✓public static final byte MODIFIER_SYMBOL = 27
- ✓public static final byte NON_SPACING_MARK = 6
- ✓public static final byte OTHER_LETTER = 5
- ✓public static final byte OTHER_NUMBER = 11
- ✓public static final byte OTHER_PUNCTUATION = 24
- ✓public static final byte OTHER_SYMBOL = 28
- ✓public static final byte PARAGRAPH_SEPARATOR = 14
- ✓public static final byte PRIVATE_USE = 18
- ✓public static final int SIZE = 16
- ✓public static final byte SPACE_SEPARATOR = 12
- ✓public static final byte START_PUNCTUATION = 21
- ✓public static final byte SURROGATE = 19
- ✓public static final byte TITLECASE_LETTER = 3
- ✓public static final java.lang.Class<java.lang.Character> TYPE
- ✓public static final byte UNASSIGNED = 0
- ✓public static final byte UPPERCASE_LETTER = 1
Constructors
- ✓@Deprecated(since="9", forRemoval=true)
public Character(char arg0)
Methods
- ✓public static int charCount(int arg0)
- ✓public char charValue()
- ✓public static int codePointAt(char[] arg0, int arg1)
- ✓public static int codePointAt(char[] arg0, int arg1, int arg2)
- ✓public static int codePointAt(java.lang.CharSequence arg0, int arg1)
- ✓public static int codePointBefore(char[] arg0, int arg1)
- ✓public static int codePointBefore(char[] arg0, int arg1, int arg2)
- ✓public static int codePointBefore(java.lang.CharSequence arg0, int arg1)
- ✓public static int codePointCount(char[] arg0, int arg1, int arg2)
- ✓public static int codePointCount(java.lang.CharSequence arg0, int arg1, int arg2)
- ✓public static int codePointOf(java.lang.String arg0)
- ✓public static int compare(char arg0, char arg1)
- ✓public int compareTo(java.lang.Character arg0)
- ✓public java.util.Optional<java.lang.constant.DynamicConstantDesc<java.lang.Character>> describeConstable()
- ✓public static int digit(int arg0, int arg1)
- ✓public static int digit(char arg0, int arg1)
- ✓public boolean equals(java.lang.Object arg0)
- ✓public static char forDigit(int arg0, int arg1)
- ✓public static byte getDirectionality(int arg0)
- ✓public static byte getDirectionality(char arg0)
- ✓public static java.lang.String getName(int arg0)
- ✓public static int getNumericValue(int arg0)
- ✓public static int getNumericValue(char arg0)
- ✓public static int getType(int arg0)
- ✓public static int getType(char arg0)
- ✓public int hashCode()
- ✓public static int hashCode(char arg0)
- ✓public static char highSurrogate(int arg0)
- ✓public static boolean isAlphabetic(int arg0)
- ✓public static boolean isBmpCodePoint(int arg0)
- ✓public static boolean isDefined(int arg0)
- ✓public static boolean isDefined(char arg0)
- ✓public static boolean isDigit(int arg0)
- ✓public static boolean isDigit(char arg0)
- ①Only in: jdk-21+35; not in: jdk-20-ga.
public static boolean isEmoji(int arg0)
isEmoji
public static boolean isEmoji(int codePoint)
Determines if the specified character (Unicode code point) is an Emoji.
A character is considered to be an Emoji if and only if it has the Emoji property, defined in Unicode Emoji (Technical Standard #51).
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character is an Emoji; false otherwise.
- Since:
- 21
- ①Only in: jdk-21+35; not in: jdk-20-ga.
public static boolean isEmojiComponent(int arg0)
isEmojiComponent
public static boolean isEmojiComponent(int codePoint)
Determines if the specified character (Unicode code point) is an Emoji Component.
A character is considered to be an Emoji Component if and only if it has the Emoji_Component property, defined in Unicode Emoji (Technical Standard #51).
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character is an Emoji Component; false otherwise.
- Since:
- 21
- ①Only in: jdk-21+35; not in: jdk-20-ga.
public static boolean isEmojiModifier(int arg0)
isEmojiModifier
public static boolean isEmojiModifier(int codePoint)
Determines if the specified character (Unicode code point) is an Emoji Modifier.
A character is considered to be an Emoji Modifier if and only if it has the Emoji_Modifier property, defined in Unicode Emoji (Technical Standard #51).
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character is an Emoji Modifier; false otherwise.
- Since:
- 21
- ①Only in: jdk-21+35; not in: jdk-20-ga.
public static boolean isEmojiModifierBase(int arg0)
isEmojiModifierBase
public static boolean isEmojiModifierBase(int codePoint)
Determines if the specified character (Unicode code point) is an Emoji Modifier Base.
A character is considered to be an Emoji Modifier Base if and only if it has the Emoji_Modifier_Base property, defined in Unicode Emoji (Technical Standard #51).
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character is an Emoji Modifier Base; false otherwise.
- Since:
- 21
- ①Only in: jdk-21+35; not in: jdk-20-ga.
public static boolean isEmojiPresentation(int arg0)
isEmojiPresentation
public static boolean isEmojiPresentation(int codePoint)
Determines if the specified character (Unicode code point) has the Emoji Presentation property by default.
A character is considered to have the Emoji Presentation property if and only if it has the Emoji_Presentation property, defined in Unicode Emoji (Technical Standard #51).
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character has the Emoji Presentation property; false otherwise.
- Since:
- 21
- ①Only in: jdk-21+35; not in: jdk-20-ga.
public static boolean isExtendedPictographic(int arg0)
isExtendedPictographic
public static boolean isExtendedPictographic(int codePoint)
Determines if the specified character (Unicode code point) is an Extended Pictographic.
A character is considered to be an Extended Pictographic if and only if it has the Extended_Pictographic property, defined in Unicode Emoji (Technical Standard #51).
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character is an Extended Pictographic; false otherwise.
- Since:
- 21
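The six emoji property methods above can be combined to inspect a single code point. The following is only a sketch and compiles only on JDK 21 or later, since the methods do not exist in jdk-20-ga. The class name EmojiProperties, the helper propertiesOf, and the sample code points are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class EmojiProperties {
    /** Lists the Unicode Emoji properties that hold for one code point (JDK 21+). */
    static List<String> propertiesOf(int codePoint) {
        List<String> names = new ArrayList<>();
        if (Character.isEmoji(codePoint)) names.add("Emoji");
        if (Character.isEmojiPresentation(codePoint)) names.add("Emoji_Presentation");
        if (Character.isEmojiModifier(codePoint)) names.add("Emoji_Modifier");
        if (Character.isEmojiModifierBase(codePoint)) names.add("Emoji_Modifier_Base");
        if (Character.isEmojiComponent(codePoint)) names.add("Emoji_Component");
        if (Character.isExtendedPictographic(codePoint)) names.add("Extended_Pictographic");
        return names;
    }

    public static void main(String[] args) {
        int grinningFace = 0x1F600; // GRINNING FACE
        int skinTone = 0x1F3FB;     // EMOJI MODIFIER FITZPATRICK TYPE-1-2
        int thumbsUp = 0x1F44D;     // THUMBS UP SIGN

        System.out.println(propertiesOf(grinningFace));
        System.out.println(propertiesOf(skinTone));
        System.out.println(propertiesOf(thumbsUp));
        System.out.println(propertiesOf('A')); // []
    }
}
```

All six methods take an int code point; there are no char overloads, so supplementary emoji are handled without surrogate bookkeeping.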
- ✓public static boolean isHighSurrogate(char arg0)
- ✓public static boolean isISOControl(int arg0)
- ✓public static boolean isISOControl(char arg0)
- ✓public static boolean isIdentifierIgnorable(int arg0)
- ✓public static boolean isIdentifierIgnorable(char arg0)
- ✓public static boolean isIdeographic(int arg0)
- ✓public static boolean isJavaIdentifierPart(int arg0)
- ✓public static boolean isJavaIdentifierPart(char arg0)
- ✓public static boolean isJavaIdentifierStart(int arg0)
- ✓public static boolean isJavaIdentifierStart(char arg0)
- ✓@Deprecated(since="1.1")
public static boolean isJavaLetter(char arg0)
- ✓@Deprecated(since="1.1")
public static boolean isJavaLetterOrDigit(char arg0)
- ✓public static boolean isLetter(int arg0)
- ✓public static boolean isLetter(char arg0)
- ✓public static boolean isLetterOrDigit(int arg0)
- ✓public static boolean isLetterOrDigit(char arg0)
- ✓public static boolean isLowSurrogate(char arg0)
- ✓public static boolean isLowerCase(int arg0)
- ✓public static boolean isLowerCase(char arg0)
- ✓public static boolean isMirrored(int arg0)
- ✓public static boolean isMirrored(char arg0)
- ✓@Deprecated(since="1.1")
public static boolean isSpace(char arg0)
- ✓public static boolean isSpaceChar(int arg0)
- ✓public static boolean isSpaceChar(char arg0)
- ✓public static boolean isSupplementaryCodePoint(int arg0)
- ✓public static boolean isSurrogate(char arg0)
- ✓public static boolean isSurrogatePair(char arg0, char arg1)
- ✓public static boolean isTitleCase(int arg0)
- ✓public static boolean isTitleCase(char arg0)
- ✗public static boolean isUnicodeIdentifierPart(int arg0)
Comparing jdk-20-ga and jdk-21+35
isUnicodeIdentifierPart
public static boolean isUnicodeIdentifierPart(int codePoint)
Determines if the specified character (Unicode code point) may be part of a Unicode identifier as other than the first character.
A character may be part of a Unicode identifier if and only if one of the following statements is true:
- it is a letter
- it is a connecting punctuation character (such as '_')
- it is a digit
- it is a numeric letter (such as a Roman numeral character)
- it is a combining mark
- it is a non-spacing mark
- isIdentifierIgnorable returns true for this character.
- it is an Other_ID_Start character.
- it is an Other_ID_Continue character.
This method conforms to UAX31-R1: Default Identifiers requirement of the Unicode Standard, with the following profile of UAX31:
Continue := Start + ID_Continue + ignorable
Medial := empty
ignorable := isIdentifierIgnorable(int) returns true for the character
ignorable is added to Continue for backward compatibility.
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character may be part of a Unicode identifier; false otherwise.
- Since:
- 1.5
- External Specifications
- See Also:
- ✗public static boolean isUnicodeIdentifierPart(char arg0)
Comparing jdk-20-ga and jdk-21+35
isUnicodeIdentifierPart
public static boolean isUnicodeIdentifierPart(char ch)
Determines if the specified character may be part of a Unicode identifier as other than the first character.
A character may be part of a Unicode identifier if and only if one of the following statements is true:
- it is a letter
- it is a connecting punctuation character (such as '_')
- it is a digit
- it is a numeric letter (such as a Roman numeral character)
- it is a combining mark
- it is a non-spacing mark
- isIdentifierIgnorable returns true for this character.
- it is an Other_ID_Start character.
- it is an Other_ID_Continue character.
This method conforms to UAX31-R1: Default Identifiers requirement of the Unicode Standard, with the following profile of UAX31:
Continue := Start + ID_Continue + ignorable
Medial := empty
ignorable := isIdentifierIgnorable(char) returns true for the character
ignorable is added to Continue for backward compatibility.
Note: This method cannot handle supplementary characters. To support all Unicode characters, including supplementary characters, use the isUnicodeIdentifierPart(int) method.
- Parameters:
ch - the character to be tested.
- Returns:
true if the character may be part of a Unicode identifier; false otherwise.
- Since:
- 1.1
- External Specifications
- See Also:
- ✗public static boolean isUnicodeIdentifierStart(int arg0)
Comparing jdk-20-ga and jdk-21+35
isUnicodeIdentifierStart
public static boolean isUnicodeIdentifierStart(int codePoint)
Determines if the specified character (Unicode code point) is permissible as the first character in a Unicode identifier.
A character may start a Unicode identifier if and only if one of the following conditions is true:
- isLetter(codePoint) returns true
- getType(codePoint) returns LETTER_NUMBER.
- it is an Other_ID_Start character.
This method conforms to UAX31-R1: Default Identifiers requirement of the Unicode Standard, with the following profile of UAX31:
Start := ID_Start + 'VERTICAL TILDE' (U+2E2F)
'VERTICAL TILDE' is added to Start for backward compatibility.
- Parameters:
codePoint - the character (Unicode code point) to be tested.
- Returns:
true if the character may start a Unicode identifier; false otherwise.
- Since:
- 1.5
- External Specifications
- See Also:
- ✗public static boolean isUnicodeIdentifierStart(char arg0)
Comparing jdk-20-ga and jdk-21+35
isUnicodeIdentifierStart
public static boolean isUnicodeIdentifierStart(char ch)
Determines if the specified character is permissible as the first character in a Unicode identifier.
A character may start a Unicode identifier if and only if one of the following conditions is true:
- isLetter(ch) returns true
- getType(ch) returns LETTER_NUMBER.
- it is an Other_ID_Start character.
This method conforms to UAX31-R1: Default Identifiers requirement of the Unicode Standard, with the following profile of UAX31:
Start := ID_Start + 'VERTICAL TILDE' (U+2E2F)
'VERTICAL TILDE' is added to Start for backward compatibility.
Note: This method cannot handle supplementary characters. To support all Unicode characters, including supplementary characters, use the isUnicodeIdentifierStart(int) method.
- Parameters:
ch - the character to be tested.
- Returns:
true if the character may start a Unicode identifier; false otherwise.
- Since:
- 1.1
- External Specifications
- See Also:
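The start and part predicates documented above are typically used together to validate a whole identifier. The sketch below shows one way to do that with the int overloads, so supplementary characters are handled. The class name UnicodeIdentifiers and the helper isUnicodeIdentifier are hypothetical, and treating the empty string as invalid is an assumption of this example rather than something the API specifies.

```java
public class UnicodeIdentifiers {
    /**
     * Returns true if the first code point may start a Unicode identifier
     * and every following code point may be part of one.
     */
    static boolean isUnicodeIdentifier(String s) {
        if (s.isEmpty()) {
            return false; // assumption: an empty string is not an identifier
        }
        int first = s.codePointAt(0);
        if (!Character.isUnicodeIdentifierStart(first)) {
            return false;
        }
        // Step by charCount so surrogate pairs are tested as single code points.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeIdentifier("x_1")); // true
        System.out.println(isUnicodeIdentifier("1x"));  // false: a digit cannot start
        System.out.println(isUnicodeIdentifier("_x"));  // false: '_' is only a part
        System.out.println(isUnicodeIdentifier("a-b")); // false: '-' is not a part

        // U+20000, a supplementary CJK ideograph, followed by 'x'
        String supplementary = new String(Character.toChars(0x20000)) + "x";
        System.out.println(isUnicodeIdentifier(supplementary)); // true
    }
}
```

Unlike isJavaIdentifierStart, the Unicode start predicate rejects '_', which is why "_x" fails here.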
- ✓public static boolean isUpperCase(int arg0)
- ✓public static boolean isUpperCase(char arg0)
- ✓public static boolean isValidCodePoint(int arg0)
- ✓public static boolean isWhitespace(int arg0)
- ✓public static boolean isWhitespace(char arg0)
- ✓public static char lowSurrogate(int arg0)
- ✓public static int offsetByCodePoints(char[] arg0, int arg1, int arg2, int arg3, int arg4)
- ✓public static int offsetByCodePoints(java.lang.CharSequence arg0, int arg1, int arg2)
- ✓public static char reverseBytes(char arg0)
- ✓public static char[] toChars(int arg0)
- ✓public static int toChars(int arg0, char[] arg1, int arg2)
- ✓public static int toCodePoint(char arg0, char arg1)
- ✓public static int toLowerCase(int arg0)
- ✓public static char toLowerCase(char arg0)
- ✓public java.lang.String toString()
- ✓public static java.lang.String toString(int arg0)
- ✓public static java.lang.String toString(char arg0)
- ✓public static int toTitleCase(int arg0)
- ✓public static char toTitleCase(char arg0)
- ✓public static int toUpperCase(int arg0)
- ✓public static char toUpperCase(char arg0)
- ✓public static java.lang.Character valueOf(char arg0)
Serialized Form
✓serialVersionUID
✓3786198910865385080
Serialized Fields
- ✓char value
Summary
| Elements | Comments | Descriptions | Total | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Added | Changed | Removed | Added | Changed | Removed | Added | Changed | Removed | ||
| Character | 3 | 3 | ||||||||
| isEmoji(int) | 1 | 1 | 2 | |||||||
| isEmojiComponent(int) | 1 | 1 | 2 | |||||||
| isEmojiModifier(int) | 1 | 1 | 2 | |||||||
| isEmojiModifierBase(int) | 1 | 1 | 2 | |||||||
| isEmojiPresentation(int) | 1 | 1 | 2 | |||||||
| isExtendedPictographic(int) | 1 | 1 | 2 | |||||||
| isUnicodeIdentifierPart(int) | 3 | 3 | ||||||||
| isUnicodeIdentifierPart(char) | 3 | 3 | ||||||||
| isUnicodeIdentifierStart(int) | 3 | 3 | ||||||||
| isUnicodeIdentifierStart(char) | 3 | 3 | ||||||||
| Total | 6 | 21 | 27 | |||||||