A question about character encoding

BT_Dana 2013-09-24 10:00:00
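When a string is declared in a program, which encoding does it use, and what determines that?
Is it the encoding of the source file, the compiler configuration, or the operating system?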

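For example, in a Visual Studio program, declare char a[] = "你好吗"; and sizeof(a) is 7, so the string is evidently encoded as GB2312 (three 2-byte characters plus the terminating '\0').
What do I have to change to give this string a different encoding, for example UTF-8? On Linux in a UTF-8 environment, char a[] = "你好吗" normally gives sizeof(a) == 10, which is the UTF-8 encoding (three 3-byte characters plus the terminator).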
14 replies
attilax 2013-09-26
The encoding of Chinese characters inside a program follows the encoding the source file was saved in...
modyaj 2013-09-24
[quote=Quoting reply #9 by aa2650:]
Isn't that just the three options under the project's Properties --> Configuration Properties --> General --> Character Set? They are "Not Set", "Use Unicode Character Set", and "Use Multi-Byte Character Set", right? I tried all three, so why is sizeof(a) always 7, with no change at all? Where does your 8 come from? No encoding would produce 8 here, and this isn't wchar_t either.
[/quote]
With wchar_t it is 8.
BT_Dana 2013-09-24
[quote=Quoting reply #8 by modyaj:]
I tried it too: with char a[] = "你好吗", sizeof(a) comes out as 8, not the 10 the original poster mentioned. Aren't Unicode and the multi-byte character set enough for what the original poster needs?
[/quote]
Isn't that just the three options under the project's Properties --> Configuration Properties --> General --> Character Set? They are "Not Set", "Use Unicode Character Set", and "Use Multi-Byte Character Set", right? I tried all three, so why is sizeof(a) always 7, with no change at all? Where does your 8 come from? No encoding would produce 8 here, and this isn't wchar_t either.
modyaj 2013-09-24
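[quote=Quoting reply #7 by aa2650:]
I changed the Unicode / multi-byte character set option and tried it; it had no effect. What is that option for, anyway?
[/quote]
I tried it too: with char a[] = "你好吗", sizeof(a) comes out as 8, not the 10 the original poster mentioned. Aren't Unicode and the multi-byte character set enough for what the original poster needs?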
BT_Dana 2013-09-24
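[quote=Quoting reply #4 by modyaj:]
In VS the only choices are Unicode and the multi-byte character set. I don't know whether there is any other way, but that should be enough, shouldn't it? With those, sizeof(a) can only come out as 8.
[/quote]
I changed the Unicode / multi-byte character set option and tried it; it had no effect. What is that option for, anyway?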
BT_Dana 2013-09-24
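[quote=Quoting reply #5 by yuelengdihai:]
The encoding is presumably set by the compiler's defaults, and it can be changed.
[/quote]
How do I change it...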
cocoabird 2013-09-24
The encoding is presumably set by the compiler's defaults, and it can be changed.
modyaj 2013-09-24
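In VS the only choices are Unicode and the multi-byte character set. I don't know whether there is any other way, but that should be enough, shouldn't it? With those, sizeof(a) can only come out as 8.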
BT_Dana 2013-09-24
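[quote=Quoting reply #2 by aa2650:]
I looked up some more material and learned that what I'm after is called the "program internal encoding" (程序内码). http://blog.csdn.net/qq522842083/article/details/9202871
[/quote]
On Linux you can control which encoding string literals use through compiler options, but Windows has always been more of a closed box, and there seems to be no way to specify it explicitly. This "program internal encoding" appears to depend on the compiler, and the compiler's exact policy is not something I can determine: it may follow the source file's encoding, or it may follow the system's encoding. For the concepts of source file encoding and program internal encoding, see this post: http://soft.chinabyte.com/database/29/12325029.shtml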
BT_Dana 2013-09-24
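[quote=Quoting reply #1 by max_min_:]
File -> Advanced Save Options -> Encoding: you should be able to change it there. Give it a try.
[/quote]
I looked up some more material and learned that what I'm after is called the "program internal encoding" (程序内码). http://blog.csdn.net/qq522842083/article/details/9202871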
max_min_ 2013-09-24
File -> Advanced Save Options -> Encoding: you should be able to change it there. Give it a try.
赵4老师 2013-09-24
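I recommend using WinHex to inspect the raw bytes on disk, in a file, or in memory.
To the computer there is no such thing as garbled text, only binary bytes; text is garbled only to the human brain.
For the character 啊: GBK: 0xB0 0xA1; UTF-16 LE: 0x4A 0x55; UTF-16 BE: 0x55 0x4A; UTF-8: 0xE5 0x95 0x8A
A computer's memory or a file's content is just a one-dimensional array of binary bytes and their corresponding binary addresses.
It is the human brain that treats certain parts of that byte array and its addresses as integers, signed/unsigned numbers, floating-point numbers, complex numbers, English letters, Arabic numerals, Chinese/Korean/French... characters and strings, assembly instructions, functions, function parameters, heap, stack, arrays, pointers, pointers to arrays, arrays of pointers, arrays of arrays, pointers to pointers, two-dimensional arrays, character bitmaps, coordinates of character strokes, bilevel black-and-white images, grayscale images, color images, audio recordings, video, fingerprint data, ID card data...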
赵4老师 2013-09-24
C++ String Literals

A string literal consists of zero or more characters from the source character set surrounded by double quotation marks ("). A string literal represents a sequence of characters that, taken together, form a null-terminated string.

Syntax

string-literal :
    "s-char-sequence(opt)"
    L"s-char-sequence(opt)"
s-char-sequence :
    s-char
    s-char-sequence s-char
s-char :
    any member of the source character set except the double quotation mark ("), backslash (\), or newline character
    escape-sequence

C++ strings have these types:

  • Array of char[n], where n is the length of the string (in characters) plus 1 for the terminating '\0' that marks the end of the string
  • Array of wchar_t, for wide-character strings

The result of modifying a string constant is undefined. For example:

    char *szStr = "1234";
    szStr[2] = 'A';      // Results undefined

Microsoft Specific

In some cases, identical string literals can be “pooled” to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. The /Gf compiler option enables string pooling.

END Microsoft Specific

When specifying string literals, adjacent strings are concatenated. Therefore, this declaration:

    char szStr[] = "12" "34";

is identical to this declaration:

    char szStr[] = "1234";

This concatenation of adjacent strings makes it easy to specify long strings across multiple lines:

    cout << "Four score and seven years "
            "ago, our forefathers brought forth "
            "upon this continent a new nation.";

In the preceding example, the entire string Four score and seven years ago, our forefathers brought forth upon this continent a new nation. is spliced together. This string can also be specified using line splicing as follows:

    cout << "Four score and seven years \
    ago, our forefathers brought forth \
    upon this continent a new nation.";

After all adjacent strings in the constant have been concatenated, the NULL character, '\0', is appended to provide an end-of-string marker for C string-handling functions.

When the first string contains an escape character, string concatenation can yield surprising results. Consider the following two declarations:

    char szStr1[] = "\01" "23";
    char szStr2[] = "\0123";

Although it is natural to assume that szStr1 and szStr2 contain the same values, the values they actually contain are shown in Figure 1.1.

[Figure 1.1: Escapes and String Concatenation]

Microsoft Specific

The maximum length of a string literal is approximately 2,048 bytes. This limit applies to strings of type char[] and wchar_t[]. If a string literal consists of parts enclosed in double quotation marks, the preprocessor concatenates the parts into a single string, and for each line concatenated, it adds an extra byte to the total number of bytes.

For example, suppose a string consists of 40 lines with 50 characters per line (2,000 characters), and one line with 7 characters, and each line is surrounded by double quotation marks. This adds up to 2,007 bytes plus one byte for the terminating null character, for a total of 2,008 bytes. On concatenation, an extra character is added to the total number of bytes for each of the first 40 lines. This makes a total of 2,048 bytes. (The extra characters are not actually written to the string.) Note, however, that if line continuations (\) are used instead of double quotation marks, the preprocessor does not add an extra character for each line.

END Microsoft Specific

Determine the size of string objects by counting the number of characters and adding 1 for the terminating '\0' or 2 for type wchar_t.

Because the double quotation mark (") encloses strings, use the escape sequence (\") to represent enclosed double quotation marks. The single quotation mark (') can be represented without an escape sequence. The backslash character (\) is a line-continuation character when placed at the end of a line. If you want a backslash character to appear within a string, you must type two backslashes (\\). (See Phases of Translation in the Preprocessor Reference for more information about line continuation.)

To specify a string of type wide-character (wchar_t[]), precede the opening double quotation mark with the character L. For example:

    wchar_t wszStr[] = L"1a1g";

All normal escape codes listed in Character Constants are valid in string constants. For example:

    cout << "First line\nSecond line";
    cout << "Error! Take corrective action\a";

Because the escape code terminates at the first character that is not a hexadecimal digit, specification of string constants with embedded hexadecimal escape codes can cause unexpected results. The following example is intended to create a string literal containing ASCII 5, followed by the characters five:

    "\x05five"

The actual result is a hexadecimal 5F, which is the ASCII code for an underscore, followed by the characters ive. The following example produces the desired results:

    "\005five"     // Use octal constant.
    "\x05" "five"  // Use string splicing.
赵4老师 2013-09-24
C++ Character Constants

Character constants are one or more members of the “source character set,” the character set in which a program is written, surrounded by single quotation marks ('). They are used to represent characters in the “execution character set,” the character set on the machine where the program executes.

Microsoft Specific

For Microsoft C++, the source and execution character sets are both ASCII.

END Microsoft Specific

There are three kinds of character constants:

  • Normal character constants
  • Multicharacter constants
  • Wide-character constants

Note: Use wide-character constants in place of multicharacter constants to ensure portability.

Character constants are specified as one or more characters enclosed in single quotation marks. For example:

    char    ch   = 'x';    // Specify normal character constant.
    int     mbch = 'ab';   // Specify system-dependent
                           //  multicharacter constant.
    wchar_t wcch = L'ab';  // Specify wide-character constant.

Note that mbch is of type int. If it were declared as type char, the second byte would not be retained. A multicharacter constant has four meaningful characters; specifying more than four generates an error message.

Syntax

character-constant :
    'c-char-sequence'
    L'c-char-sequence'
c-char-sequence :
    c-char
    c-char-sequence c-char
c-char :
    any member of the source character set except the single quotation mark ('), backslash (\), or newline character
    escape-sequence
escape-sequence :
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence
simple-escape-sequence : one of
    \' \" \? \\ \a \b \f \n \r \t \v
octal-escape-sequence :
    \octal-digit
    \octal-digit octal-digit
    \octal-digit octal-digit octal-digit
hexadecimal-escape-sequence :
    \xhexadecimal-digit
    hexadecimal-escape-sequence hexadecimal-digit

Microsoft C++ supports normal, multicharacter, and wide-character constants. Use wide-character constants to specify members of the extended execution character set (for example, to support an international application). Normal character constants have type char, multicharacter constants have type int, and wide-character constants have type wchar_t. (The type wchar_t is defined in the standard include files STDDEF.H, STDLIB.H, and STRING.H. The wide-character functions, however, are prototyped only in STDLIB.H.)

The only difference in specification between normal and wide-character constants is that wide-character constants are preceded by the letter L. For example:

    char    schar = 'x';          // Normal character constant
    wchar_t wchar = L'\x81\x19';  // Wide-character constant

Table 1.2 shows reserved or nongraphic characters that are system dependent or not allowed within character constants. These characters should be represented with escape sequences.

Table 1.2 C++ Reserved or Nongraphic Characters

    Character               ASCII Representation   ASCII Value   Escape Sequence
    Newline                 NL (LF)                10 or 0x0a    \n
    Horizontal tab          HT                     9             \t
    Vertical tab            VT                     11 or 0x0b    \v
    Backspace               BS                     8             \b
    Carriage return         CR                     13 or 0x0d    \r
    Formfeed                FF                     12 or 0x0c    \f
    Alert                   BEL                    7             \a
    Backslash               \                      92 or 0x5c    \\
    Question mark           ?                      63 or 0x3f    \?
    Single quotation mark   '                      39 or 0x27    \'
    Double quotation mark   "                      34 or 0x22    \"
    Octal number            ooo                    -             \ooo
    Hexadecimal number      hhh                    -             \xhhh
    Null character          NUL                    0             \0

If the character following the backslash does not specify a legal escape sequence, the result is implementation defined. In Microsoft C++, the character following the backslash is taken literally, as though the escape were not present, and a level 1 warning (“unrecognized character escape sequence”) is issued.

Octal escape sequences, specified in the form \ooo, consist of a backslash and one, two, or three octal characters. Hexadecimal escape sequences, specified in the form \xhhh, consist of the characters \x followed by a sequence of hexadecimal digits. Unlike octal escape constants, there is no limit on the number of hexadecimal digits in an escape sequence.

Octal escape sequences are terminated by the first character that is not an octal digit, or when three characters are seen. For example:

    wchar_t och = L'\076a';  // Sequence terminates at a
    char    ch  = '\233';    // Sequence terminates after 3 characters

Similarly, hexadecimal escape sequences terminate at the first character that is not a hexadecimal digit. Because hexadecimal digits include the letters a through f (and A through F), make sure the escape sequence terminates at the intended digit.

Because the single quotation mark (') encloses character constants, use the escape sequence \' to represent enclosed single quotation marks. The double quotation mark (") can be represented without an escape sequence. The backslash character (\) is a line-continuation character when placed at the end of a line. If you want a backslash character to appear within a character constant, you must type two backslashes in a row (\\). (See Phases of Translation in the Preprocessor Reference for more information about line continuation.)
