Doxygen
Various UTF8 related helper functions.
#include <cstdint>
#include <string>
Functions
std::string convertUTF8ToLower(const std::string &input)
Converts the input string into a lower case version, also taking into account non-ASCII characters that have a lower case variant.
std::string convertUTF8ToUpper(const std::string &input)
Converts the input string into an upper case version, also taking into account non-ASCII characters that have an upper case variant.
std::string getUTF8CharAt(const std::string &input, size_t pos)
Returns the UTF8 character found at byte position pos in the input string.
uint32_t getUnicodeForUTF8CharAt(const std::string &input, size_t pos)
Returns the 32-bit Unicode code point of the character at byte position pos in the UTF8-encoded input.
uint8_t getUTF8CharNumBytes(char firstByte)
Returns the number of bytes making up a single UTF8 character given the first byte in the sequence.
const char * writeUTF8Char(TextStream &t, const char *s)
Writes the UTF8 character pointed to by s to stream t and returns a pointer to the next character.
bool lastUTF8CharIsMultibyte(const std::string &input)
Returns true iff the last character in input is a multibyte character.
bool isUTF8CharUpperCase(const std::string &input, size_t pos)
Returns true iff the input string at byte position pos holds an upper case character.
int isUTF8NonBreakableSpace(const char *input)
Check if the first character pointed at by input is a non-breakable whitespace character.
bool isUTF8PunctuationCharacter(uint32_t unicode)
Check if the given Unicode character represents a punctuation character.
Various UTF8 related helper functions.
See https://en.wikipedia.org/wiki/UTF-8 for details on UTF8 encoding.
Definition in file utf8.h.
std::string convertUTF8ToLower(const std::string &input)
Converts the input string into a lower case version, also taking into account non-ASCII characters that have a lower case variant.
Definition at line 187 of file utf8.cpp.
References asciiToLower(), caseConvert(), and convertUnicodeToLower().
Referenced by SearchIndexInfo::add(), Index::addClassMemberNameToIndex(), Index::addFileMemberNameToIndex(), Index::addModuleMemberNameToIndex(), Index::addNamespaceMemberNameToIndex(), AnchorGenerator::generate(), QCString::lower(), FileNameFn::searchKey(), and SearchTerm::termEncoded().
std::string convertUTF8ToUpper(const std::string &input)
Converts the input string into an upper case version, also taking into account non-ASCII characters that have an upper case variant.
Definition at line 192 of file utf8.cpp.
References asciiToUpper(), caseConvert(), and convertUnicodeToUpper().
Referenced by Translator::createNoun(), QCString::upper(), and writeAlphabeticalClassList().
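Both conversions reference caseConvert() together with a per-character mapping (asciiToLower()/convertUnicodeToLower() or their upper case counterparts). The sketch below illustrates that general shape: decode each character, map its code point, re-encode it. It is not Doxygen's implementation. The names lowerCp, upperCp, caseConvertSketch, toLowerSketch and toUpperSketch are hypothetical, and the mappings deliberately cover only ASCII and the Latin-1 Supplement letters; the real functions rely on much larger Unicode case tables.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical code-point mappings covering ASCII and Latin-1 Supplement
// letters only. U+00D7 and U+00F7 (multiplication/division signs) are skipped,
// and U+00DF / U+00FF are left alone because their case partners are not a
// fixed 0x20 offset away. Real code needs the full Unicode case tables.
static uint32_t lowerCp(uint32_t c)
{
  const bool upper = (c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7);
  return upper ? c + 0x20 : c;
}

static uint32_t upperCp(uint32_t c)
{
  const bool lower = (c >= 'a' && c <= 'z') || (c >= 0xE0 && c <= 0xFE && c != 0xF7);
  return lower ? c - 0x20 : c;
}

// Generic driver in the spirit of caseConvert(): decode, map, re-encode.
// Only 1- and 2-byte sequences are decoded, because the mappings above never
// touch code points >= U+0100; longer or malformed sequences pass through.
static std::string caseConvertSketch(const std::string &in, uint32_t (*mapCp)(uint32_t))
{
  std::string out;
  out.reserve(in.size());
  for (size_t i = 0; i < in.size();)
  {
    const unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) // 1-byte sequence (ASCII)
    {
      out += static_cast<char>(mapCp(c));
      i += 1;
    }
    else if ((c & 0xE0) == 0xC0 && i + 1 < in.size() &&
             (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80) // 2-byte sequence
    {
      uint32_t cp = ((c & 0x1Fu) << 6) | (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
      cp = mapCp(cp); // result stays below U+0800, so it re-encodes as 2 bytes
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
      i += 2;
    }
    else // 3/4-byte sequence or malformed byte: copy unchanged
    {
      out += in[i];
      i += 1;
    }
  }
  return out;
}

std::string toLowerSketch(const std::string &s) { return caseConvertSketch(s, lowerCp); }
std::string toUpperSketch(const std::string &s) { return caseConvertSketch(s, upperCp); }
```

For example, toLowerSketch turns "ÉTÉ" into "été", while toUpperSketch leaves "ß" untouched because it has no single-character upper case form.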
uint32_t getUnicodeForUTF8CharAt(const std::string &input, size_t pos)
Returns the 32-bit Unicode code point of the character at byte position pos in the UTF8-encoded input.
Definition at line 135 of file utf8.cpp.
References convertUTF8CharToUnicode(), and getUTF8CharAt().
Referenced by AnchorGenerator::generate().
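To show what decoding a character into its code point involves, here is a minimal sketch based on the UTF-8 bit layout (RFC 3629). The name unicodeAtSketch is hypothetical, and the error handling (0 for an out-of-range pos, the raw lead byte for an invalid or truncated sequence) is an assumption of this sketch, not documented behaviour of getUnicodeForUTF8CharAt().

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch: decode the UTF-8 sequence starting at byte offset pos into a
// 32-bit code point. Assumed error handling (not taken from Doxygen):
// 0 for an out-of-range pos, the raw lead byte for an invalid or
// truncated sequence.
uint32_t unicodeAtSketch(const std::string &input, size_t pos)
{
  if (pos >= input.size()) return 0;
  const unsigned char lead = static_cast<unsigned char>(input[pos]);
  size_t len = 1;
  uint32_t cp = lead;
  if      ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; } // 110xxxxx
  else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; } // 1110xxxx
  else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; } // 11110xxx
  // len == 1: ASCII or an invalid lead byte; otherwise check for truncation.
  if (len == 1 || pos + len > input.size()) return lead;
  for (size_t i = 1; i < len; ++i)
  {
    const unsigned char c = static_cast<unsigned char>(input[pos + i]);
    if ((c & 0xC0) != 0x80) return lead; // expected a 10xxxxxx continuation byte
    cp = (cp << 6) | (c & 0x3F);         // append its 6 payload bits
  }
  return cp;
}
```

For instance, the three bytes 0xE2 0x82 0xAC decode to U+20AC (EURO SIGN).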
std::string getUTF8CharAt(const std::string &input, size_t pos)
Returns the UTF8 character found at byte position pos in the input string.
The resulting string can be a multibyte sequence.
Definition at line 127 of file utf8.cpp.
References getUTF8CharNumBytes().
Referenced by SearchIndexInfo::add(), Index::addClassMemberNameToIndex(), Index::addFileMemberNameToIndex(), Index::addModuleMemberNameToIndex(), Index::addNamespaceMemberNameToIndex(), Translator::createNoun(), AnchorGenerator::generate(), getUnicodeForUTF8CharAt(), and writeAlphabeticalClassList().
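Since getUTF8CharAt() references getUTF8CharNumBytes(), extracting a character amounts to reading the sequence length from the lead byte and taking that many bytes. The sketch below assumes pos points at the start of a character and returns an empty string when pos is past the end; utf8CharAtSketch and numBytes are hypothetical names, and how the real function treats a truncated trailing sequence is not stated here, so this sketch simply returns the bytes that are available.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: sequence length from the lead byte (RFC 3629, max 4).
static uint8_t numBytes(unsigned char c)
{
  if ((c & 0xE0) == 0xC0) return 2; // 110xxxxx
  if ((c & 0xF0) == 0xE0) return 3; // 1110xxxx
  if ((c & 0xF8) == 0xF0) return 4; // 11110xxx
  return 1;                         // ASCII, continuation or invalid byte
}

// Sketch of the getUTF8CharAt() contract: return the (possibly multibyte)
// character starting at byte offset pos, or "" if pos is out of range.
std::string utf8CharAtSketch(const std::string &input, size_t pos)
{
  if (pos >= input.size()) return std::string();
  const size_t len = numBytes(static_cast<unsigned char>(input[pos]));
  return input.substr(pos, len); // substr clamps len at the end of the string
}
```

Note that pos is a byte offset: in "aéz" the "z" sits at byte 3, not at character index 2.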
uint8_t getUTF8CharNumBytes(char firstByte)
Returns the number of bytes making up a single UTF8 character given the first byte in the sequence.
Definition at line 23 of file utf8.cpp.
Referenced by detab(), escapeCharsInString(), AnchorGenerator::generate(), getUTF8CharAt(), nextUTF8CharPosition(), updateColumnCount(), and writeUTF8Char().
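The sequence length is encoded in the high-order bits of the first byte. A minimal sketch following RFC 3629 (at most 4 bytes) is shown below; utf8CharNumBytesSketch is a hypothetical name, and returning 1 for continuation or invalid bytes is an assumption chosen so that a caller stepping through a string always makes progress.

```cpp
#include <cstdint>

// Sketch: number of bytes in a UTF-8 sequence, derived from the lead byte.
uint8_t utf8CharNumBytesSketch(char firstByte)
{
  const unsigned char c = static_cast<unsigned char>(firstByte);
  if (c < 0x80)           return 1; // 0xxxxxxx: ASCII
  if ((c & 0xE0) == 0xC0) return 2; // 110xxxxx
  if ((c & 0xF0) == 0xE0) return 3; // 1110xxxx
  if ((c & 0xF8) == 0xF0) return 4; // 11110xxx
  return 1;                         // 10xxxxxx continuation or invalid byte
}
```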
bool isUTF8CharUpperCase(const std::string &input, size_t pos)
Returns true iff the input string at byte position pos holds an upper case character.
Definition at line 218 of file utf8.cpp.
References convertUnicodeToLower(), and convertUTF8CharToUnicode().
Referenced by DefinitionImpl::_setBriefDescription().
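Because this function references convertUTF8CharToUnicode() and convertUnicodeToLower(), one plausible reading is: decode the character, lower-case it, and report upper case when the code point changed. The sketch below shows that idea only. The names lowerCp, decodeShort and isUpperAtSketch are hypothetical, and both the mapping (ASCII plus Latin-1 Supplement letters) and the decoder (1- and 2-byte sequences) are deliberately limited.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical lower-case mapping: ASCII plus Latin-1 Supplement letters only.
static uint32_t lowerCp(uint32_t c)
{
  const bool upper = (c >= 'A' && c <= 'Z') || (c >= 0xC0 && c <= 0xDE && c != 0xD7);
  return upper ? c + 0x20 : c;
}

// Hypothetical decoder for 1- and 2-byte sequences; anything else yields 0,
// which lowerCp leaves unchanged (so it counts as "not upper case").
static uint32_t decodeShort(const std::string &s, size_t pos)
{
  if (pos >= s.size()) return 0;
  const unsigned char c = static_cast<unsigned char>(s[pos]);
  if (c < 0x80) return c;
  if ((c & 0xE0) == 0xC0 && pos + 1 < s.size())
  {
    const unsigned char c2 = static_cast<unsigned char>(s[pos + 1]);
    if ((c2 & 0xC0) == 0x80) return ((c & 0x1Fu) << 6) | (c2 & 0x3Fu);
  }
  return 0;
}

// Sketch: a character is upper case when lower-casing it changes its code point.
bool isUpperAtSketch(const std::string &s, size_t pos)
{
  const uint32_t cp = decodeShort(s, pos);
  return lowerCp(cp) != cp;
}
```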
int isUTF8NonBreakableSpace(const char *input)
Check if the first character pointed at by input is a non-breakable whitespace character.
Returns the byte size of the character if there is a match, or 0 if not.
Definition at line 228 of file utf8.cpp.
Referenced by detab().
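A minimal sketch of this contract is shown below, assuming that only U+00A0 NO-BREAK SPACE (UTF8 bytes 0xC2 0xA0) is recognised; whether the real function also matches other non-breaking spaces, such as U+202F NARROW NO-BREAK SPACE, is not stated here. nbspSizeSketch is a hypothetical name.

```cpp
// Sketch: return 2 if input starts with U+00A0 (0xC2 0xA0), else 0.
// Assumption: other non-breaking spaces (e.g. U+202F) are not recognised.
int nbspSizeSketch(const char *input)
{
  if (input == nullptr) return 0;
  const unsigned char *p = reinterpret_cast<const unsigned char *>(input);
  // Short-circuit evaluation: p[1] is only read when p[0] matched, so a
  // NUL-terminated string is never read past its terminator.
  return (p[0] == 0xC2 && p[1] == 0xA0) ? 2 : 0;
}
```

Returning the byte size rather than a bool lets a caller skip over the matched character directly.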
bool isUTF8PunctuationCharacter(uint32_t unicode)
Check if the given Unicode character represents a punctuation character.
Definition at line 234 of file utf8.cpp.
References isPunctuationCharacter().
Referenced by AnchorGenerator::generate().
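The coverage of isPunctuationCharacter() is not described on this page. As an illustration of what "punctuation" means in Unicode terms, the sketch below returns true for code points in the punctuation General Categories (Pc, Pd, Ps, Pe, Pi, Pf, Po), but only for ASCII and a few hand-picked non-ASCII examples. isPunctSketch is a hypothetical name, and treating ASCII symbols such as $ or + as non-punctuation follows the Unicode categories, which may differ from Doxygen's choice.

```cpp
#include <cstdint>

// Sketch: true for code points whose Unicode General Category is P*.
// Covers ASCII plus a few non-ASCII examples only.
bool isPunctSketch(uint32_t u)
{
  switch (u)
  {
    // ASCII punctuation. $ + < = > ^ ` | ~ are symbols (S*), not P*.
    case '!': case '"': case '#': case '%': case '&': case '\'':
    case '(': case ')': case '*': case ',': case '-': case '.':
    case '/': case ':': case ';': case '?': case '@': case '[':
    case '\\': case ']': case '_': case '{': case '}':
    // A few non-ASCII punctuation characters.
    case 0x00A1:              // INVERTED EXCLAMATION MARK
    case 0x00BF:              // INVERTED QUESTION MARK
    case 0x2013: case 0x2014: // EN DASH, EM DASH
    case 0x2018: case 0x2019: // single quotation marks
    case 0x201C: case 0x201D: // double quotation marks
    case 0x2026:              // HORIZONTAL ELLIPSIS
      return true;
    default:
      return false;
  }
}
```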
bool lastUTF8CharIsMultibyte(const std::string &input)
Returns true iff the last character in input is a multibyte character.
Definition at line 212 of file utf8.cpp.
Referenced by DefinitionImpl::_setBriefDescription().
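In UTF8 every byte of a multibyte sequence has its high bit set, while ASCII bytes never do, so for well-formed input it is enough to inspect the final byte. The sketch below, with the hypothetical name lastCharIsMultibyteSketch, relies on that property; note that it also returns true for a truncated trailing sequence.

```cpp
#include <string>

// Sketch: the last character is multibyte iff the final byte has bit 7 set.
bool lastCharIsMultibyteSketch(const std::string &input)
{
  return !input.empty() &&
         (static_cast<unsigned char>(input.back()) & 0x80) != 0;
}
```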
const char * writeUTF8Char(TextStream &t, const char *s)
Writes the UTF8 character pointed to by s to stream t and returns a pointer to the next character.
Definition at line 197 of file utf8.cpp.
References getUTF8CharNumBytes(), and TextStream::write().
Referenced by HtmlCodeGenerator::codify(), ManCodeGenerator::codify(), RTFCodeGenerator::codify(), HtmlDocVisitor::operator()(), HtmlDocVisitor::writeObfuscatedMailAddress(), and writeXMLCodeString().